Introduction: The Reality of AI in Organisations Today
AI is everywhere—embedded in tools, teams, and workflows. Yet, for most organisations, the journey from experimentation to real, scalable value is fraught with challenges. Despite significant investment, the majority of AI initiatives stall at the pilot stage, fail to deliver measurable business outcomes, or create new risks and inefficiencies. Why is this happening, and what can organisations do to break the cycle?
1. Pilots Everywhere, Production Nowhere
The Problem:
AI adoption is high, but value realisation is low. Few organisations manage to move beyond the pilot stage of agentic workflows.
Many organisations run numerous team-level pilots that rarely make it to stable production or translate into measurable business outcomes. Common patterns include:
- Fragmented ownership and accountability across teams
- Shadow AI and tool sprawl without governance
- No clear, repeatable pathways from idea → pilot → production
- Local optimisations that don’t translate to enterprise value
- Duplicated efforts and wasted investment across teams
- Difficulty scaling successful pilots beyond their originating area
Industry insights:
Independent analyses (e.g., MIT commentary on enterprise AI) report high spend but limited realised value: often fewer than 25% of initiatives achieve measurable impact. The common pattern is many human- and AI-assisted pilots without the governance, operating model, and value cadence needed to cross the production chasm.
What To Do:
- Establish clear, repeatable pathways from idea to production.
- Centralise visibility of all AI initiatives to reduce duplication (a minimal sketch follows this list).
- Focus on scaling proven use cases, not just launching new pilots.
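As a rough illustration of what centralised visibility and a repeatable idea → pilot → production pathway could look like, here is a minimal Python sketch of a shared initiative registry. The stage names, fields, and example initiatives are assumptions made for illustration, not a prescribed tool or standard.

```python
from dataclasses import dataclass
from collections import Counter

# Illustrative lifecycle stages; the names are an assumption, not a standard.
STAGES = ["idea", "pilot", "production"]

@dataclass
class Initiative:
    name: str
    owner: str                    # one accountable owner counters fragmented ownership
    stage: str = "idea"
    outcome_hypothesis: str = ""  # the business outcome this work is expected to move

class Registry:
    """A single shared list of AI initiatives, so duplication is visible."""

    def __init__(self) -> None:
        self.initiatives: list[Initiative] = []

    def add(self, initiative: Initiative) -> None:
        self.initiatives.append(initiative)

    def stage_counts(self) -> Counter:
        # Quick view of how much work sits at each stage of the pathway.
        return Counter(i.stage for i in self.initiatives)

registry = Registry()
registry.add(Initiative("Support-ticket triage", owner="CX", stage="pilot",
                        outcome_hypothesis="Cut median resolution time by 20%"))
registry.add(Initiative("Contract clause extraction", owner="Legal"))
print(registry.stage_counts())  # Counter({'pilot': 1, 'idea': 1})
```

Even a simple shared list like this makes duplication visible: two teams proposing the same use case show up side by side, and the stage counts show whether anything is actually crossing into production.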
2. When AI Experiments Go Nowhere
The Problem:
Despite heavy investment, many AI pilots never translate into measurable business value, and leaders struggle to show what is being achieved. Common symptoms include:
- Inability to prove ROI or business impact
- Gaps in risk, compliance, and audit readiness
- Difficulty communicating innovation and progress to stakeholders
Why It Matters:
Leaders need timely, accurate, and aggregated data to make informed decisions, secure funding, and show responsible AI use. Without this, pilots fail and value is lost.
If producing quarterly reports, especially collecting and aggregating data on AI use cases and innovation, takes weeks or months, your ability to adapt is limited and decisions lag behind the pace of change.
What To Do:
- Automate data aggregation and reporting for AI initiatives (a rough sketch follows this list).
- Build real-time dashboards that connect AI activity to business outcomes and risk.
- Make governance part of the delivery flow, not an afterthought.
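To make that automation concrete, here is a hedged sketch of rolling initiative-level records up into one summary that a dashboard or quarterly report could consume. The record shape, field names, and figures are illustrative assumptions; in practice the inputs would come from your delivery, finance, and risk systems rather than a hard-coded list.

```python
from statistics import mean

# Illustrative initiative records; real data would be pulled from source systems.
initiatives = [
    {"name": "Claims triage assistant", "stage": "production",
     "monthly_value_estimate": 40_000, "open_risk_findings": 1},
    {"name": "Marketing copy drafts", "stage": "pilot",
     "monthly_value_estimate": 5_000, "open_risk_findings": 0},
    {"name": "Code review helper", "stage": "pilot",
     "monthly_value_estimate": 12_000, "open_risk_findings": 3},
]

def summarise(records):
    """Aggregate initiative data into one report instead of weeks of manual collation."""
    return {
        "total_initiatives": len(records),
        "in_production": sum(r["stage"] == "production" for r in records),
        "estimated_monthly_value": sum(r["monthly_value_estimate"] for r in records),
        "avg_open_risk_findings": round(mean(r["open_risk_findings"] for r in records), 1),
    }

print(summarise(initiatives))
# {'total_initiatives': 3, 'in_production': 1,
#  'estimated_monthly_value': 57000, 'avg_open_risk_findings': 1.3}
```

The point is not the specific fields but the pattern: if the roll-up is automated and connected to risk data, the quarterly report becomes a query, not a month-long collection exercise.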
3. Organisational Bottlenecks: Where Progress Gets Stuck
The Problem:
As organisations grow their AI capabilities, bottlenecks shift across teams and functions. For example, writing code was often blamed as the bottleneck, but with AI-assisted coding many of those bottlenecks have moved elsewhere. Now, more than ever, product discovery and rapid prototyping for feedback are key skills, to avoid cramming in countless features that no one actually wanted.
- Planning and prioritisation — code speeds up, but prioritisation cadence doesn’t. Result: more features started than finished, longer cycle time, and stalled value delivery.
- Adoption and change management — customers and internal users can’t absorb the rate of change. Result: low feature usage, rework, and increased support load.
- Operations, risk, finance, and security — deployment, controls, and runbooks don’t keep pace. Result: operational incidents, audit gaps, unplanned cost, and release freezes.
Theory of Constraints:
Optimising non-bottleneck steps (e.g., code generation) yields little benefit if decision-making, governance, and customer validation remain the constraint. Focus where flow actually stalls.
Examples of mixed-methods metrics:
- Outcomes: retention lift, revenue per user, task success, resolution time, policy compliance rate.
- Outputs: deployment frequency, lead time for change, MTTR, escaped defects, model drift rate.
- Qualitative: survey pulses, diary studies, field observations, sense-making workshops, ethics reviews.
Align outcomes and outputs to impact: connect every initiative to an outcome hypothesis, define leading indicators and guardrails, and review them in a regular value cadence so the organisation orients around strategy—not activity.
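As one way of writing down an outcome hypothesis with leading indicators and guardrails, the following Python sketch may help. The metric names, targets, and the convention that a guardrail target is an upper limit are assumptions made for illustration, not a prescribed framework.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    target: float
    current: float

@dataclass
class OutcomeHypothesis:
    """One initiative, one outcome hypothesis, reviewed on a regular value cadence."""
    initiative: str
    hypothesis: str
    leading_indicators: list[Metric]  # early signals that the outcome is moving
    guardrails: list[Metric]          # limits that must not be breached (target = upper bound)

    def review(self) -> dict:
        # A cadence review: which indicators are on track, which guardrails are breached.
        return {
            "on_track": [m.name for m in self.leading_indicators if m.current >= m.target],
            "guardrail_breaches": [m.name for m in self.guardrails if m.current > m.target],
        }

review = OutcomeHypothesis(
    initiative="Support-ticket triage",
    hypothesis="AI triage cuts median resolution time by 20% without hurting CSAT",
    leading_indicators=[Metric("triage_accuracy", target=0.85, current=0.88)],
    guardrails=[Metric("escalation_rate", target=0.10, current=0.07)],
).review()
print(review)  # {'on_track': ['triage_accuracy'], 'guardrail_breaches': []}
```

Run the review at the regular value cadence: a breached guardrail or a stalled indicator is the prompt to adapt or stop, not to keep shipping.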
Measurement guidance (educational sources): Prefer evidence from user-centred and ethical research practices (e.g., human-centred design standards such as ISO 9241-210, Stanford d.school methods, and academic HCI literature) over purely promotional industry reports.
- If code generation speeds up but decision-making doesn’t: track lead time from idea → decision → release; implement weekly prioritisation and a clear stop-start-continue cadence to unblock teams.
- If technical debt isn’t addressed: set WIP limits and a standing debt budget (e.g. 20%); measure operational toil and defect escape rate; only scale what you can reliably run.
- If operations can’t keep up: move to trunk-based delivery with automated tests, SLOs, and error budgets; gate releases on runbook readiness and on-call coverage.
- If customers can’t absorb changes: adopt feature flags and progressive delivery; monitor adoption, task success, and time-to-value, not just releases shipped.
- How do you know you’re solving real problems? Run continuous discovery: weekly customer interviews, in-product feedback prompts, concierge tests, and A/B tests tied to outcome metrics.
What To Do:
- Track lead time from idea to decision to release, not just code velocity (see the sketch after this list).
- Implement regular, outcome-focused prioritisation and stop-start-continue reviews.
- Limit work-in-progress and focus on what delivers measurable impact.
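To illustrate the first bullet above, here is a minimal sketch of lead-time tracking from timestamped events. The work items and dates are invented for the example; the point is that a long idea-to-decision gap sitting next to a short decision-to-release gap signals that prioritisation, not coding, is the constraint.

```python
from datetime import date

# Illustrative timeline per work item: when it was proposed, decided on, and released.
items = [
    {"name": "Churn-risk alerts", "idea": date(2024, 1, 8),
     "decision": date(2024, 2, 19), "release": date(2024, 3, 4)},
    {"name": "Invoice extraction", "idea": date(2024, 1, 15),
     "decision": date(2024, 1, 22), "release": date(2024, 2, 12)},
]

def lead_times(item):
    """Split end-to-end lead time into decision latency and delivery time."""
    return {
        "idea_to_decision_days": (item["decision"] - item["idea"]).days,
        "decision_to_release_days": (item["release"] - item["decision"]).days,
    }

for item in items:
    print(item["name"], lead_times(item))
# Churn-risk alerts {'idea_to_decision_days': 42, 'decision_to_release_days': 14}
# Invoice extraction {'idea_to_decision_days': 7, 'decision_to_release_days': 21}
```

In this made-up example, "Churn-risk alerts" waited six weeks for a decision and then shipped in two: the bottleneck is the prioritisation cadence, not the build.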
4. Measuring Outcomes and Impact, Not Just Outputs
The Problem:
Many organisations fall into the trap of measuring activity (outputs, vanity metrics) rather than real business outcomes and impact. This leads to:
- Misaligned incentives that drive local optimisation
- Failure to scale successful initiatives and turn what works into broader capability
- Inability to demonstrate value to finance, product, or audit
Emerging practice:
It’s too early to declare universal best practices in AI measurement; what works today will evolve (think Cynefin: complex, novel, and context‑dependent). Focus on emerging and good practices that fit your context, and adapt as you learn. Participate in communities of practice and local meetups to share and learn, both with colleagues internally and with peers externally.
Be cautious with metrics: it’s easy to track vanity metrics that don’t matter internally, but avoid the opposite extreme—waiting for perfect, fully automated outcome measures.
Use mixed methods where useful (surveys, sense‑making, sensor networks, qualitative feedback). Align both outcomes and outputs to impact so the whole organisation orients around value and strategy, not just activity.
What To Do:
- Define outcome metrics for every AI initiative up front.
- Use mixed methods: combine quantitative (e.g., revenue lift, retention) and qualitative (e.g., user feedback, adoption) measures.
- Review metrics regularly and adapt based on what’s working.
5. The Golden Thread for AI Transformation
Many AI transformation methods repeat familiar themes—customer focus, end-to-end value, outcome orientation, and platform thinking. What’s new is the speed, complexity, and uncertainty of AI. Organisations need a focused way to prioritise key AI initiatives.
The Golden Thread for AI:
A practical approach creates a “golden thread” linking strategy to top AI use cases—and back from outcomes to strategy. This means:
- Mapping the golden thread: Explicitly link strategy → portfolios → initiatives → teams → AI use cases. Trace outcomes back to strategy for fast learning and course correction (a data-model sketch follows this list).
- Prioritising high-impact AI: Use the golden thread to focus on AI opportunities with the greatest business value, avoiding scattered pilots.
- Visualising dependencies and risks: Show links across people, teams, products, data, models, and controls in dashboards to spot bottlenecks, duplication, and risks in real time.
- Embedding governance in the flow: Integrate lightweight risk, compliance, and security controls into AI delivery with clear ownership and fast feedback.
- Ruthless prioritisation: Leaders must deprioritise low-value work, reduce work-in-progress, and focus teams on outcome-driven AI experiments.
- Short, outcome-focused probes: Set outcome metrics upfront, run brief experiments, sense results fast, and scale what works—stop or adapt what doesn’t.
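Here is a hedged sketch of how the golden thread could be represented as data, so dashboards can trace any AI use case back to strategy and flag work with no thread at all. The use cases, portfolios, and goals are illustrative assumptions, not a recommended taxonomy.

```python
from dataclasses import dataclass

# A minimal golden-thread model: each AI use case carries an explicit chain
# back to the strategic goal it serves and the outcome metric it should move.
@dataclass
class UseCase:
    name: str
    team: str
    initiative: str
    portfolio: str
    strategic_goal: str
    outcome_metric: str

use_cases = [
    UseCase("Claims triage assistant", team="Claims Ops", initiative="Faster claims",
            portfolio="Customer operations", strategic_goal="Reduce cost to serve",
            outcome_metric="median_resolution_time"),
    UseCase("Next-best-offer model", team="Digital Sales", initiative="Personalisation",
            portfolio="Growth", strategic_goal="Grow revenue per customer",
            outcome_metric="revenue_per_user"),
]

def trace(use_case: UseCase) -> str:
    """Trace one use case back up the thread, from team to strategy."""
    return " → ".join([use_case.name, use_case.team, use_case.initiative,
                       use_case.portfolio, use_case.strategic_goal])

def orphans(cases):
    # Use cases with no strategic goal or outcome metric are candidates to deprioritise.
    return [c.name for c in cases if not c.strategic_goal or not c.outcome_metric]

print(trace(use_cases[0]))
print("No golden thread:", orphans(use_cases))
```

The traceability matters in both directions: leaders can see which goals each use case serves, and teams can see why their work matters; anything that cannot be traced is a candidate for ruthless deprioritisation.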
Why this matters:
This ensures AI isn’t scattered pilots but a value-driven transformation. Focusing on the golden thread lets organisations:
- Align every AI initiative to strategic goals and measurable outcomes.
- Accelerate learning and scaling of successful AI use cases.
- Cut wasted effort and risk by deprioritising low-impact work.
- Enable real-time governance and auditability as AI scales.
Anecdote:
A large financial firm told us it takes a month to compile data for one AI progress report. In fast-moving AI, this delay stops leaders from timely evaluation or course correction—showing the need for real-time traceability and focused execution.
This approach turns AI from scattered pilots into a disciplined, value-driven transformation. It enables real-time governance, faster scaling of what works, and clear accountability for outcomes.
Where to next?
If these challenges resonate, we’re building approaches to help organisations move from AI chaos to accountable outcomes - enabling you to see, govern, and scale your human–AI workforce with confidence.