Mohammed Ali Chherawalla

Originally published at mobile.wednesday.is

The Mobile AI Projects That Deliver ROI vs the Ones That Become Board Presentations

This piece was written for enterprise technology leaders and originally published on the Wednesday Solutions mobile development blog. Wednesday is a mobile development staffing agency that helps US mid-market enterprises ship reliable iOS, Android, and cross-platform apps — with AI-augmented workflows built in.

Seven in ten enterprise AI pilots never reach production. The money is spent. The vendor presents a demo. The board asks for a launch date. The answer is a revised timeline. Six months later, the project is described as "learnings" in a quarterly review and the budget is allocated elsewhere.

This is not a technology problem. The technology works. It is a scoping problem, a vendor selection problem, and sometimes a project sequencing problem. The four characteristics below describe what the projects that actually ship have in common. The three that follow describe what the ones that stall have in common instead.

Key findings
Projects that reach production start with a user problem, not a technology choice. Projects that stall start with a technology choice and then look for a problem to apply it to.
The vendor's AI capability in their development process — faster code, better testing — is separate from their ability to ship AI features in a production app. Evaluate both, but do not conflate them.
A 90-day pilot that ends with a demo is not a pilot. It is a proof of concept. Only count it as success when real users interact with the feature in a production environment.

The seven in ten problem

Gartner's research on enterprise AI consistently shows that the majority of AI projects initiated do not reach production. The causes vary in the details but converge on the same pattern: the project was scoped before the fundamentals were assessed, the vendor was selected for general capability rather than specific AI track record, and the success criteria were defined loosely enough that it was never clear when the project had actually succeeded.

The consequence is a cycle that repeats: board mandate, vendor engagement, pilot that does not ship, regrouping, second pilot. Each cycle costs time and budget and erodes the board's confidence in the organization's ability to execute on AI.

Breaking the cycle requires understanding what separates the projects that ship from the ones that do not.

Four characteristics of projects that ship

1. The feature solves a problem users already have, using data the app already collects.

The AI features that reach production fastest are not the most technologically ambitious. They are the ones where the user problem is already visible in the app's data — users failing to find what they search for, users abandoning a manual data entry step, users asking for something the app does not yet surface automatically. The AI layer adds intelligence to a problem that is already established. The data required to train or run the model already exists in the system.

Contrast this with a feature that requires users to change their behavior to generate the data the model needs. That feature requires adoption before it becomes useful, which means it provides no value until a threshold of users has engaged with it — a threshold that may not be reached before the board loses patience.

2. The success metric is defined before the build starts.

"Users engage with the AI feature" is not a success metric. "Search success rate — defined as a search session that ends with the user opening a result — increases from 42 percent to 60 percent within 60 days of launch" is a success metric. The specificity does two things: it forces clarity about what the feature is actually trying to accomplish, and it gives the board something observable to track rather than a narrative to evaluate.

Projects with a defined, measurable success metric ship because the team knows what done looks like. Projects without one drift because every iteration can be described as progress.

3. The vendor has shipped the same class of feature before.

A vendor that has shipped an on-device document classification feature in a production insurance app understands the accuracy threshold required for user trust, the App Store disclosure requirements for features that process document content, and the edge cases that appear at scale. A vendor encountering these for the first time will learn on your project.

The gap between a vendor that has shipped the feature before and one that has not is not a quality gap — it is a timeline gap. The second vendor will eventually figure out what the first vendor already knows. The question is whether your timeline allows for that learning curve.

4. The compliance and App Store implications are assessed before the build starts.

AI features in enterprise mobile apps trigger three categories of review that do not apply to standard features. First, data handling and residency requirements: who processes the data the model uses, where it is stored, and what happens if the model is updated. Second, App Store AI disclosure requirements: Apple requires disclosure of AI-generated content and certain categories of AI feature. Third, internal compliance review: healthcare, financial services, and other regulated industries have specific requirements around AI-generated outputs presented to users.

Projects that assess these before the build starts absorb the compliance work into the timeline accurately. Projects that discover them mid-build absorb them as delays.

Three characteristics of projects that stall

1. The project starts with a technology choice rather than a user problem.

"We are going to add a large language model" is a technology choice. "We are going to reduce the time users spend on manual data entry in the claims flow" is a user problem. The first produces a project looking for a home. The second produces a project with a clear destination.

Technology-first projects tend to produce impressive demos. The demo shows the model doing something sophisticated. What it does not show is whether the output is accurate enough, consistent enough, and relevant enough for real users to rely on it. That gap between demo and production is where most AI pilots stall.

2. The vendor's AI capability was evaluated on their pitch, not their track record.

Every vendor will claim AI capability in 2026. The claim is not evidence. The evidence is a production app with a named AI feature that real users interact with today. Ask for it. If the vendor redirects to their AI development workflow — faster code review, better testing — that is a different capability. Valuable, but different.

A vendor that uses AI in their development process ships faster. A vendor that has shipped AI features in production apps can build yours. Both claims can be true for the same vendor. Evaluate each separately.

3. The pilot scope was set to match the timeline rather than the work.

"We have 90 days and a fixed budget, so we will scope something that fits." This produces a feature that was designed to be buildable rather than to be useful. The feature ships — or reaches a demo state — within the timeline. Users ignore it. The metric does not move. The board receives a presentation about what was learned.

A pilot scoped to produce learnings rather than to ship a useful feature is not a pilot. It is a hedge against the risk of failing to deliver something real. The projects that produce ROI are scoped around what users need, then timed accordingly — not scoped around the timeline and justified afterward.

Read more case studies at mobile.wednesday.is/work

The decision framework

Before approving budget for an AI pilot, ask four questions:

One: what specific user problem does this feature solve, and what data do we already have that the feature would use? If either cannot be answered specifically, the project is not ready to scope.

Two: what is the measurable success metric, and what is the threshold at which we would call the pilot a success? If this cannot be agreed before the build starts, it cannot be evaluated after.

Three: has the vendor shipped this class of feature in a production app? If not, what is the realistic additional timeline for the learning curve, and is that acceptable?

Four: what are the compliance and App Store implications, and have they been assessed? If not, build the assessment into the discovery phase before any build work begins.

A project that cannot answer all four questions clearly is not ready to start. Pushing it to start anyway produces one of the seven in ten.

What to tell your board

The most effective response to a board AI mandate is not a roadmap. It is a single, well-scoped feature with a specific metric and a realistic timeline.

"We are going to add AI-powered document extraction to the claims flow. It will reduce manual entry time by 35 percent, which maps directly to the operational cost reduction goal. The feature will be in production in 14 weeks. We have selected a vendor with three prior production deployments of the same class of feature. Here is what success looks like at week 14."

That is a response the board can evaluate, track, and hold accountable. A roadmap of five features with a 12-month timeline is not.

The board does not want AI in the abstract. They want evidence that the organization can execute on AI at a speed and quality that matters to the business. One shipped feature, with a real outcome, delivers that evidence. Five demos do not.


Want to go deeper? The full version — with related tools, case studies, and decision frameworks — lives at mobile.wednesday.is/writing/mobile-ai-projects-that-deliver-roi-vs-become-board-presentations-2026.
