Every AI project I've run has a moment that feels like a finish line and isn't. Someone in the room sees the demo work, the answer comes back clean, and the mood shifts to "great, when can we ship it." I've learned to brace for that moment, because the gap between a demo that works once and a feature people can depend on every day is where most of the real project actually lives.
The numbers back up the feeling. Depending on whose study you read, somewhere between 80% and 90% of AI pilots never reach production. The reason is rarely the model. The model is usually the part that works. What stalls projects is everything around it: the data that's messier in production than in the curated demo set, the integrations into tools nobody scoped, the question of who's accountable when an answer drifts six weeks after launch. (For more on why so many stall, see the things AI projects that don't get cancelled do differently.)
So when a client expects magic, my job isn't to deflate them. It's to redraw the finish line somewhere honest.
Scoping when the expectation is "it just knows"
The trickiest conversations happen early, before any code, when the client describes what they saw a competitor do or what ChatGPT did for them over the weekend. That's a real reference point and I don't dismiss it. But a weekend prompt and a production feature are different animals, and the difference is almost all the work.
What I do instead of promising the magic: I get specific about the unglamorous list. Where does the data come from, and who keeps it clean? What happens when the model is confident and wrong? Who reviews the edge cases, and how often? None of that shows up in a demo. All of it shows up in week three of production. Putting it on the table during scoping isn't pessimism. It's the only way the estimate means anything.
LLM accuracy is a range, not a number
This is the expectation I reset most often. Traditional software is deterministic. You give it the same input, you get the same output, and "done" is a thing you can point at. LLMs aren't like that. The same question can return a slightly different answer, and "correct" depends on context. Hallucination rates in real, live conditions vary wildly by domain. A general assistant might be fine. Anything touching legal or medical content carries much higher risk and needs heavy guardrails before it goes anywhere near a user.
I've stopped saying "the AI will be accurate." I say: here's the accuracy band we're targeting, here's how we'll measure it, and here's what we do with the cases that fall outside it. A human review step, a confidence threshold, a fallback. Clients are far more comfortable with a known failure mode than with a promise that has no floor. The teams that get burned are the ones who sold a number they couldn't hold.
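One way to make the band and the fallback concrete is sketched below in Python. It is a minimal sketch under stated assumptions, not a description of any particular project: the `Answer` type, its `confidence` score, the 0.8 threshold, and the function names are all hypothetical. The band is computed as a Wilson score interval over a labelled evaluation set, which is one common choice among several.

```python
import math
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    # Assumption: some upstream step yields a score in [0, 1].
    # How that score is produced is out of scope for this sketch.
    confidence: float


def route(answer: Answer, threshold: float = 0.8) -> str:
    """Send confident answers straight through; everything else goes to a person."""
    return "auto" if answer.confidence >= threshold else "human_review"


def accuracy_band(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for accuracy measured on a labelled evaluation set.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (centre - half_width, centre + half_width)


# Usage with made-up inputs: 90 correct out of 100 reviewed answers.
low, high = accuracy_band(correct=90, total=100)
decision = route(Answer(text="example answer", confidence=0.55))  # goes to human review
```

Reporting the interval rather than the single observed figure is the point: a small evaluation set produces a wide band, and that width is part of what the client should see before anyone commits to a target.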
Where the budget actually goes
The other expectation gap is cost, and it's brutal because it's invisible early. The proof of concept is cheap. It's a sliver of the budget and it makes everyone optimistic. Then production shows up with the bill nobody itemized: monitoring, a rollback plan, retraining when behavior drifts, incident response when it breaks at an awkward hour. I've seen the real first-year cost of running a feature land at several times what the POC suggested, and the surprise is what damages trust, not the number itself.
So I price the boring parts out loud, in the first estimate, even when it makes the proposal less exciting. A POC that omits the cost of keeping the thing alive isn't a smaller version of the project. It's a different project.
What "done" means now
For a normal feature, done is when it passes QA and ships. For an AI feature, I treat done as the point where it runs reliably without me standing next to it. That means it's monitored, someone owns it, there's a clear path when an answer goes sideways, and the people using it understand what it can and can't do. The demo proved the idea was possible. Done proves we can live with it.
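That definition can be written down as an explicit checklist so that "done" is something the team can query rather than argue about. The sketch below is illustrative only: the `ReadinessCheck` type and its field names are hypothetical, chosen to mirror the four conditions in the paragraph above.

```python
from dataclasses import dataclass, fields


@dataclass
class ReadinessCheck:
    # Hypothetical field names, one per condition named in the text.
    monitored: bool
    has_owner: bool
    has_escalation_path: bool
    users_know_limits: bool


def missing_items(check: ReadinessCheck) -> list[str]:
    """List the conditions that are not yet met, in declaration order."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def is_done(check: ReadinessCheck) -> bool:
    """Done only when every condition holds."""
    return not missing_items(check)


# Usage: a feature that shipped without a path for bad answers is not done.
status = ReadinessCheck(
    monitored=True,
    has_owner=True,
    has_escalation_path=False,
    users_know_limits=True,
)
gaps = missing_items(status)
```

Returning the list of gaps, rather than only a yes or no, keeps the conversation on what is still missing.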
This is the part of delivery I care most about at Shanti Infosoft, because it's the part clients feel every day after the excitement of launch wears off. The demo earns the meeting. The reliable version earns the renewal. My whole job is making sure we don't confuse the two, and that the client doesn't either.
The demo always works. Treat that as the start of the hard part, not the end of it, and the rest of the project gets a lot more honest.