
Luca Bartoccini for Superdots

Originally published at superdots.sh

Why AI Deployments Fail — What Research Shows

Every major consulting firm has now run the numbers, and they tell the same story. McKinsey's 2025 State of AI survey found that only 39% of organizations report any enterprise-level financial impact from AI, despite 88% now using it regularly in at least one business function. BCG surveyed more than 1,800 C-level executives across 19 markets and found that 74% of companies have yet to show tangible value from their AI investments. Gartner placed generative AI in the Trough of Disillusionment on its 2025 Hype Cycle, and separately predicted that at least 30% of generative AI proof-of-concept projects will be abandoned before reaching production. The gap between AI adoption and AI impact is wide, and it is not closing on its own.

The obvious interpretation is that AI is overhyped. That enterprise tools aren't mature enough. That companies need better models, better interfaces, more time. This is what most post-mortems say. It is mostly wrong.

The interesting question isn't whether AI is delivering value — some companies are clearly extracting it at scale, consistently, across multiple functions. The interesting question is what those companies are doing differently. When you look at the data carefully, the answer has almost nothing to do with the technology they chose.

The Numbers, Precisely

Before diagnosing the failure pattern, it's worth being exact about what the research actually measures, because different surveys capture different things and conflating them produces the wrong conclusions.

McKinsey's 2025 report distinguishes between organizations that use AI and organizations that profit from it. Of the 88% now using AI in at least one function, only 39% report any EBIT impact at all. And only about 6% of organizations surveyed, roughly 1 in 16, say AI accounts for more than 5% of their total EBIT. McKinsey labels these "high performers." They are not a different type of company. They are not bigger, more tech-forward, or running more sophisticated models. They are making different organizational choices.

BCG's October 2024 report, which surveyed 1,000 executives across 59 countries and 20 sectors, found a similar split from a different angle. Only 4% of companies have developed cutting-edge AI capabilities and consistently generate significant value. The rest exist on a spectrum from "running experiments" to "deployed and underwhelmed." The 4% are not clustered in any particular industry or geography. What clusters them is their approach.

Gartner's February 2025 analysis added a third data point: through 2026, organizations will abandon 60% of AI projects that lack AI-ready data, and 63% of organizations either don't have or aren't sure whether they have the data management practices needed to support AI at scale. This is not a model quality problem. Data readiness is an organizational infrastructure problem — one that precedes the AI decision by years and cannot be solved by switching vendors.

The pattern across these three data sources is consistent enough to be treated as a finding rather than noise: most AI deployments are not failing because AI does not work. They are failing because the conditions required for AI to work have not been created.

What Companies Are Actually Deploying

An AI implementation failure occurs when a deployed AI system does not deliver measurable business value within the expected timeframe. It is distinct from a technical failure, where the model underperforms, and from a pilot failure, where a proof of concept is abandoned before deployment. Most documented cases are implementation failures, not technical ones. The model works. The organizational context does not.

When you look at where the bulk of enterprise AI is deployed, a pattern is visible. The highest-adoption use cases are the most visible ones: email drafting, report generation, meeting summaries, customer service templates, code autocomplete. These are tasks with clear inputs and outputs, easy to prototype, easy to demo, and easy to present to a budget committee. They generate slides that show hours saved per person per week, multiplied by headcount, multiplied by average salary — a number that looks compelling in a business case.
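
To make that arithmetic concrete, here is a minimal sketch of the calculation these business cases rest on. Every figure in it is hypothetical.

```python
# The "hours saved" business case, with hypothetical figures. It produces
# exactly the kind of number that lands on a slide, and it measures
# activity (time freed), not outcomes (revenue, decision quality, error rates).

HOURS_SAVED_PER_PERSON_PER_WEEK = 3   # usually extrapolated from a pilot
HEADCOUNT = 500                       # employees touched by the tool
AVERAGE_HOURLY_COST = 60              # fully loaded cost, in dollars
WORK_WEEKS_PER_YEAR = 48

projected_annual_savings = (
    HOURS_SAVED_PER_PERSON_PER_WEEK
    * HEADCOUNT
    * AVERAGE_HOURLY_COST
    * WORK_WEEKS_PER_YEAR
)
print(f"Projected annual savings: ${projected_annual_savings:,}")  # $4,320,000
```

The arithmetic is correct. What it assumes, silently, is that every freed hour converts into equally valuable work.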

What these use cases are not, in most cases, is high-leverage.

Writing an email faster does not change how a sales team qualifies leads. Summarizing a meeting does not change how a product team decides what to build. Generating a quarterly report in minutes rather than hours does not change how a leadership team interprets the numbers or what they decide to do about them. The visible work accelerates. The decisions underneath it stay the same. And it is the decisions — not the documentation of the decisions — where organizational outcomes are actually set.

McKinsey's data addresses this directly. The single factor most strongly correlated with AI delivering measurable EBIT impact is workflow redesign. Not model selection. Not vendor choice. Not AI budget. Redesign. Only 21% of organizations using generative AI have redesigned at least some of the workflows that AI now touches. These are, in effect, the same organizations that are reporting financial impact.

The other 79% are using AI as an accelerated typewriter. Faster output, same process, same decisions, same outcomes.

The Pilot-to-Production Gap

There is a second failure mode that aggregate numbers tend to obscure. Many AI projects that are counted as "deployed" have never left pilot status in any meaningful operational sense.

A generative AI pilot is easy. You write a prompt, you get an output, you show it to a stakeholder. The feedback loop is immediate and visible. A production-grade AI deployment that changes how decisions get made is harder by an order of magnitude — it requires clean data, integration into existing systems, process redesign, change management, and measurement infrastructure. The skills required for the former are widely distributed across organizations. The skills required for the latter are not.

BCG found that 60% of companies reporting no AI value have not defined any financial KPIs for their AI programs. They know what they are spending. They have no system for measuring what they are getting back. When there is no measurement infrastructure, there is no feedback loop. When there is no feedback loop, projects persist on optimism rather than evidence. The 60% figure is striking not because it shows negligence, but because it shows how little of the work of deployment — the organizational and measurement work — is being done relative to the work of acquisition.

This measurement gap is partly structural. The easiest AI deployments to measure are the ones where output is the product: words written, hours saved, tasks completed. These are activity metrics. The deployments with the highest strategic leverage — forecasting accuracy, decision speed, error reduction in high-stakes processes — require outcome metrics that are harder to isolate, take longer to manifest, and require a measurement baseline that most organizations didn't establish before deployment started.

You cannot retroactively measure the impact of an AI that has been running for six months without a pre-deployment baseline. The organizations that are measuring impact defined their success criteria before they deployed. This is a minority.
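
What "defined their success criteria before they deployed" means in practice can be surprisingly small: a named outcome metric, a baseline measured over a pre-deployment window, and a target agreed in advance. The sketch below shows one way to structure that; the metric name, numbers, and threshold direction are all illustrative.

```python
from dataclasses import dataclass

@dataclass
class OutcomeKPI:
    """A success criterion defined before deployment, not reconstructed after."""
    name: str
    baseline: float                 # measured over a pre-deployment window
    target: float                   # agreed before the AI goes live
    measured: float | None = None   # filled in post-deployment

    def met(self) -> bool:
        # Simplification: assumes higher is better; a real KPI would carry a direction.
        if self.measured is None:
            raise ValueError(f"{self.name}: no post-deployment measurement yet")
        return self.measured >= self.target

# Hypothetical example: forecast accuracy before and after an AI-assisted process.
kpi = OutcomeKPI(name="forecast_accuracy", baseline=0.71, target=0.80)
kpi.measured = 0.83   # observed over the same window length as the baseline
print(kpi.met())      # True, and meaningful only because the baseline predates deployment
```

The structure enforces the ordering the research points to: the baseline can only be filled in honestly before the system goes live.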

Focus as a Differentiator

The BCG AI Radar from January 2025 surfaced a finding that doesn't receive enough attention in post-implementation reviews: high-performing AI companies focus on an average of 3.5 use cases. Companies not seeing value focus on an average of 6.1.

This pattern appears in most large-scale enterprise technology transitions. The organizations that extract value do so by concentrating effort on a small number of processes, redesigning them completely, measuring outcomes from the start, and iterating. The organizations that don't see value run portfolios of shallow pilots — each one visible enough to report upward, none of them deep enough to change how work actually gets done.

BCG's data also shows that leading AI companies allocate more than 80% of their AI investment to reshaping core business functions. Laggards allocate 56% to individual productivity tools — applications that help individuals work faster but don't change organizational processes or decision quality. The distinction matters because individual productivity gains are extremely hard to capture as business value. If a salesperson spends 30 minutes less per day on email, the company sees a financial benefit only if that 30 minutes shifts to measurably higher-value activity. If it gets absorbed into lower-priority work, or if it simply reduces effort without changing outcomes, the AI spend produces no measurable return.

This is the efficiency trap: making things faster without changing what you're doing doesn't improve outcomes. It improves activity metrics. Activity metrics are not business outcomes.

Where AI Actually Has Leverage

Across the McKinsey, BCG, and Gartner findings, three deployment patterns appear consistently in organizations that show measurable AI ROI.

Process redesign before deployment. Organizations with the highest AI ROI don't add AI to existing workflows — they redesign the workflow first, then build AI into the redesigned version. This is slower, more expensive upfront, and harder to demo in a leadership meeting. It is also, consistently, the only approach that delivers financial impact at scale. For organizations thinking through AI change management, the research suggests that the order of operations matters more than most implementation guides acknowledge: define the process, define the outcome metric, select the AI. Most companies reverse this sequence.

Narrow scope, deep integration. High performers choose fewer use cases and integrate them more completely. They change the data flows that feed the AI, the decision processes that act on its outputs, and the incentive structures around using it. AI workflow automation platforms — the tools that wire AI into live data sources, CRM systems, and operational processes rather than running it as a standalone interface — are growing precisely because deep integration is where the leverage is. Connecting an AI to a live operational dataset is harder than running prompts against documents. It requires data quality, integration work, and organizational change. But this is consistently where the research shows returns that exceed the cost.

Measurement from day one. The best AI tools for operations aren't necessarily the most impressive in demos; they're the ones that touch the decision layer of a process rather than the output layer, with success criteria defined before launch. Generating a formatted report of last quarter's numbers is output layer. Generating a ranked list of which customers are most likely to churn next quarter, delivered to the person who can act on it, and measured against actual churn outcomes, is decision layer. The difference in leverage is not marginal.
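
As a minimal sketch of that decision-layer pattern, assuming a hypothetical churn use case: the scoring function below is a stub standing in for whatever trained model an organization actually runs, and the customer data is invented.

```python
# Decision layer in miniature: rank customers by churn risk, hand the top of
# the list to someone who can act on it, then score the predictions against
# what actually happened. All names and data here are illustrative.

def score_churn_risk(customer: dict) -> float:
    """Stub scorer; a real deployment would call a trained model here."""
    return 0.9 if customer["support_tickets"] > 3 else 0.2

customers = [
    {"id": "c1", "support_tickets": 5},
    {"id": "c2", "support_tickets": 0},
    {"id": "c3", "support_tickets": 4},
]

# 1. Rank: a prioritized list delivered to the account owner, not a formatted report.
ranked = sorted(customers, key=score_churn_risk, reverse=True)
top_k = ranked[:2]

# 2. Measure: next quarter, compare predictions to observed churn (precision@k).
actually_churned = {"c1"}  # outcomes recorded after the fact
hits = sum(1 for c in top_k if c["id"] in actually_churned)
print(f"precision@{len(top_k)}: {hits / len(top_k):.2f}")  # 0.50, an outcome metric
```

The last three lines are the part most deployments skip: an outcome metric computed against what actually happened, which is exactly the measurement infrastructure the surveys find missing.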

The Invisible Leverage Problem

There is a structural reason why organizations keep deploying AI in low-leverage areas even when the evidence points toward higher-leverage alternatives.

Low-leverage use cases are easy to propose, easy to approve, and easy to demonstrate. Email writing is visible. Report generation is visible. The ROI case writes itself: X hours saved per person per week, Y people, Z dollars. The number lands on a slide and gets approved.

High-leverage use cases — AI that touches how decisions get made, how processes are prioritized, how resources are allocated — are invisible by comparison. You cannot easily demo better decisions. You cannot put improved judgment quality on a slide. The measurement infrastructure typically doesn't exist yet. The process redesign required is uncomfortable because it involves changing what people are accountable for, not just how fast they work. These projects require organizational changes that are slower, harder to quantify in advance, and more politically complex than buying a new software subscription.

This is why the AI automation playbook that actually works for businesses looks different from the one that gets most of the airtime. It isn't about deploying AI everywhere and measuring activity. It's about choosing fewer, higher-leverage deployment points, redesigning the processes around them, and measuring outcomes rather than outputs.

What the Research Means for Organizations Still Accumulating Pilots

Gartner's April 2026 survey of infrastructure and operations leaders found that only 28% of AI use cases in that function fully succeed and meet ROI expectations, and that 57% of failures were attributed to expecting too much, too fast. This is consistent with where generative AI sits on the Hype Cycle. Organizations adopted it expecting immediate transformation. Transformation, in every prior technology cycle, comes after a technology has been absorbed into normal practice and workflows have been redesigned around it — not at the point of initial deployment.

The honest read of the research is not that AI doesn't work. It is that AI deployment, as most organizations are currently practicing it, doesn't work. The technology is not the constraint. The strategy is.

BCG's finding that only 4% of companies are consistently generating significant value from AI is striking not because of how few are succeeding, but because of how consistent the pattern of success is. Those companies are not using better models. They are making different choices about where to deploy, how to integrate, how to measure, and how many use cases to pursue at once.

For organizations that have been running AI pilots for a year or more without measurable financial impact, the research suggests a single diagnostic question: of the workflows where AI is currently deployed, how many have been redesigned around it, rather than just supplemented by it? The answer, McKinsey's data suggests, will explain most of the gap between what organizations are spending on AI and what they are getting back.

The technology will keep improving. The organizational bottleneck will not resolve itself.


Sources: McKinsey & Company, "The State of AI," March 2025; BCG, "Where's the Value in AI?", October 2024; BCG AI Radar, "Closing the AI Impact Gap," January 2025; Gartner press release, July 29, 2024; Gartner press release, February 26, 2025; Gartner press release, April 7, 2026.


