Every time I ask this, the same thing happens. Someone pulls up a slide with projections, or the conversation moves on. Nobody has a straight answer.
And I get it. Most teams have shipped something. They have put real time and money into it. But somewhere along the way, the conversation about what success actually looks like never happened. Not because anyone was careless. Just because there was always something more pressing. And now a few months later, leadership is asking harder questions and the answers are not there.
A medical equipment marketplace came to us last year. Their job was to connect buyers to suppliers — hospitals and procurement teams on one side, manufacturers and distributors on the other. Every match was taking four to eight months, and the conversation they wanted to have was about automating it with AI.
Before anyone started building anything, I asked for a week. No code. No proposals. Just sitting with the team and watching how things actually got done, and then getting on calls with their users.
There is a moment in those conversations that I wait for. Someone starts by explaining how the process works, and then at some point they stop doing that and just tell you what it actually costs them. Not complaining. Just saying it plainly.
That deal did not close because we could not find the right supplier in time. That supplier went with someone else because we took too long to respond. That quarter was slower than it should have been.
That is the thing you are listening for. Not where does this feel slow, but where does slow actually hurt, and what does that cost.
For this team it kept coming back to the matching. A buyer would come in with a requirement, someone on the team would go through the supplier database manually, factor in location, certifications, availability, past transactions, and put together a shortlist. Same logic, done from scratch every single time.
So that is where we started. We built something that could do that matching automatically, and what used to be part of a months-long process now took a few minutes.
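If you want to see the shape of it, here is a minimal sketch of the checklist the team had been running by hand: hard filters first, then a weighted ranking. The field names and weights are hypothetical, not the client's actual schema; the real system factored in more signals than this.

```python
from dataclasses import dataclass

# Hypothetical shapes; the real database had many more fields.
@dataclass
class Supplier:
    name: str
    region: str
    certifications: set[str]
    in_stock: bool
    past_deals: int  # completed transactions on the platform

@dataclass
class Requirement:
    region: str
    required_certs: set[str]

def score(supplier: Supplier, req: Requirement) -> float:
    """The team's manual checklist, rewritten as a weighted score."""
    if not req.required_certs <= supplier.certifications:
        return 0.0  # hard filter: missing a required certification
    s = 0.0
    s += 3.0 if supplier.region == req.region else 0.0  # location
    s += 2.0 if supplier.in_stock else 0.0              # availability
    s += min(supplier.past_deals, 10) * 0.1             # track record, capped
    return s

def shortlist(suppliers: list[Supplier], req: Requirement, k: int = 5) -> list[Supplier]:
    scored = [(score(s, req), s) for s in suppliers]
    scored = [(v, s) for v, s in scored if v > 0]  # drop hard-filtered suppliers
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]
```

Nothing clever in there. That is the point. The value was not in the algorithm, it was in running the same logic in minutes instead of weeks.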
But before we wrote a single line of code, we had sat down and agreed on the exact numbers we were trying to move. Time to first qualified match. Deals that reached the next stage within a certain window. Revenue that closed in the quarter versus what was getting pushed out. We had named those things, written them down, and put someone's name next to each one.
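What we wrote down was not fancy. Something like the sketch below, where the names and owners are placeholders and the baselines come from what we measured before the build; the real figures and the real names were agreed in that room.

```python
from dataclasses import dataclass

@dataclass
class SuccessMetric:
    name: str      # the business number that is supposed to move
    baseline: str  # what it looks like today, measured before the build
    target: str    # what it should look like after
    owner: str     # the person whose name goes next to it

# Placeholder values for illustration only.
metrics = [
    SuccessMetric("time_to_first_qualified_match", "~3 weeks", "days, not weeks", "ops lead"),
    SuccessMetric("deals_advancing_within_30_days", "~30%", "materially higher", "sales lead"),
    SuccessMetric("same_quarter_revenue_close_rate", "slipping quarterly", "closes in-quarter", "finance lead"),
]
```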
When we looked at the numbers after, time to first qualified match had come down from around three weeks to under two days. Deals moving to the next stage within the first month went from about 30 percent to close to 60. And the revenue that had been slipping into the next quarter because the process was too slow started closing in the same quarter it came in.
The numbers were there because we had decided what to track before we started. That is the only reason we could say with any confidence that it worked.
An insurer had been working on bringing AI into their underwriting process. Underwriting involves going through a lot of information for every application, and the teams doing it were stretched. The process was slower than anyone wanted. It was a real problem and the people working on it knew what they were doing.
They had built a pilot, spent months on it, got it working. The results inside the pilot looked promising and leadership had been supportive through most of it. But when they came to us, the pilot had not moved. It had been sitting in the same place for a while and nobody was quite sure what to do with it.
We started by just spending time with them. Going through what had been built, how it worked, what the last year had looked like. And somewhere in those early conversations the cost came up.
Before the pilot, processing one application was costing around fifty cents. After running it through the AI system, even just within the pilot, that number had gone up to around five dollars.
I remember the person who told us this just stopping for a moment after he said it. Not dramatically. Just sitting with it. A year of work and that is where things stood.
So we spent a few days going through the whole pipeline, not to find fault but just to understand what was happening at each step. Where the data was coming from, what was being sent across, how many calls were being made per application, what each of those was costing.
What we found was not one thing that had gone wrong. It was a few things, each of which had made sense at the time, that had added up in a way nobody had seen until someone looked at the full picture.
The policy system was old and it could not send data selectively, so every time the AI needed to process an application, the integration was pulling the entire record. Documents, scanned attachments, old notes, fields that had nothing to do with the underwriting decision, all of it going across every single time, with the model working through far more than it needed to.
The team had also used a vendor platform to build the pilot on, which made sense early on because it helped them move faster. But that platform sat in the middle of everything and every call went through it. Every retry, every status check, all of it being billed, and it added up faster than anyone had tracked.
And because the core system could not be touched without going through a formal change process that took months, the team had built workarounds to keep things moving. Each workaround added steps, each step added to the cost, and because the billing was coming from a few different places, the total had not been visible until we put it all in one place.
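Putting it in one place looked, in spirit, like the rollup below. Every number here is invented, and the assumed model price is just for illustration. The point is the shape: no single line item looks alarming, and the total only appears once model charges, platform fees, retries, and status checks are summed across everything one application triggers.

```python
# All figures below are invented, for illustration only.
COST_PER_1K_TOKENS = 0.01  # assumed model price

def application_cost(steps: list[dict]) -> float:
    """Sum model and platform charges across every call one application triggers."""
    total = 0.0
    for step in steps:
        model_cost = step["tokens"] / 1000 * COST_PER_1K_TOKENS
        total += (model_cost + step["platform_fee"]) * step["calls"]
    return total

pilot_steps = [
    {"name": "ingest full policy record",   "tokens": 200_000, "calls": 1, "platform_fee": 0.05},
    {"name": "field extraction workaround", "tokens": 60_000,  "calls": 2, "platform_fee": 0.05},
    {"name": "underwriting assessment",     "tokens": 100_000, "calls": 1, "platform_fee": 0.05},
    {"name": "retries and status checks",   "tokens": 0,       "calls": 6, "platform_fee": 0.05},
]

print(f"${application_cost(pilot_steps):.2f} per application")  # -> $4.70
```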
When we walked them through what we had found, it was a quiet conversation. Not because anyone was shocked, but because seeing it all together made it clear that this was not something a few fixes would sort out. The approach itself needed to change. We were straight with them about that, and it was not easy to hear after a year of work, but it was the honest answer.
What we suggested was moving away from the vendor platform and working directly with the model, building something that only pulled the fields that actually mattered for the decision. And for the straightforward applications, which turned out to be most of their volume, using a smaller, lighter model and keeping the heavier processing for the cases that actually needed it.
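In sketch form it looked like this. The field names, model names, and the triage rule are all placeholders, not the client's actual setup; in practice the triage criteria came from the underwriters themselves.

```python
# Placeholder field names, model names, and triage rule throughout.
DECISION_FIELDS = ["age", "coverage_amount", "prior_claims", "risk_class"]

def trim(record: dict) -> dict:
    """Send only the fields the underwriting decision actually uses."""
    return {k: record[k] for k in DECISION_FIELDS if k in record}

def call_model(model: str, payload: dict) -> str:
    # Stand-in for a direct call to the model provider's API,
    # with no vendor platform sitting in the middle.
    return f"[{model}] decision for {sorted(payload)}"

def is_straightforward(app: dict) -> bool:
    # Made-up rule for illustration; the real criteria were the underwriters'.
    return app.get("prior_claims", 0) == 0 and app.get("coverage_amount", 0) < 500_000

def route(application: dict) -> str:
    trimmed = trim(application)
    if is_straightforward(trimmed):
        return call_model("small-model", trimmed)  # cheap path covers most of the volume
    return call_model("large-model", trimmed)      # heavier processing only where needed
```

The split matters because it changes what the cost scales with. It scales with the hard cases, not with total volume.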
I think about that team sometimes. They were not careless. They had made decisions that were reasonable at the time and worked hard through all of it. The situation they ended up in came down to one question that had not been asked early enough: what is this going to cost when we are running it at full scale, not in the pilot, but in production, on every application, every day. And more than that, what does the business look like if this works. Nobody had written that down before they started.
That question got asked eventually. It just got asked at a point where the answer was harder to sit with.
Before the build starts, someone needs to name the business metric that is supposed to move, what it looks like today, and what it should look like after. Not as a formality. Because without it, there is no way to know if the work did anything at all. And by the time someone asks, it is usually a year in, and the answer is harder to sit with than it needed to be.
Would love to hear what this looks like where you are.