No Generative AI was used in writing this article. If you enjoyed reading it, please let me know. It helps me dedicate more time to things that resonate with my readers.
It took Uber 4 months to burn through its entire AI budget for 2026. The cost of AI, its ROI, and the question of AI investment have entered the picture.
Uber’s COO, Andrew MacDonald, said in a recent interview that the costs became “harder to justify” because the link between spending on AI tokens and creating more useful features “was not there”.
It started with all-you-can-eat subscriptions.
Since the early days of tech, we’ve been conditioned to expect all-you-can-eat subscriptions. For a month, you could watch Netflix non-stop, 24 hours a day, 7 days a week, and you’d still pay the same price as if you had watched for only one hour.
That’s a great deal!
In fact, it’s such a good deal that we see it everywhere: on Netflix, on Amazon Prime, and in pretty much all software.
The magic of zero marginal cost (in digital goods)
In traditional businesses, selling one more unit of physical product costs the company an extra unit of resources.
Each flower bouquet of 5 roses sold requires 5 roses. Selling 100 bouquets will require 100 bouquets x 5 roses = 500 roses. The extra cost of producing one more unit is called marginal cost.
But that’s not the case in the digital world!
Once Netflix pays the massive upfront cost to produce Stranger Things ($30 million an episode), it costs them exactly the same amount of money whether 10 people watch it or 100 million people watch it. As long as they get enough subscribers to cover the initial investment, every new subscriber after that is 100% pure profit. This is zero marginal cost.
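To make the contrast concrete, here is a minimal sketch of the two cost curves. The price of $2 per rose is made up purely for illustration, and the function name `total_cost` is hypothetical. The $30 million figure is the per-episode cost quoted above.

```python
def total_cost(fixed_cost: float, marginal_cost: float, units: int) -> float:
    """Total cost = one-off fixed cost + marginal cost per unit * units sold."""
    return fixed_cost + marginal_cost * units


# Flower shop: no big upfront cost, but every bouquet consumes 5 roses.
ROSES_PER_BOUQUET = 5
PRICE_PER_ROSE = 2.0  # assumption for illustration only
bouquet_marginal_cost = ROSES_PER_BOUQUET * PRICE_PER_ROSE

# Streaming: a huge upfront cost, then (in this simplified model) nothing per viewer.
EPISODE_COST = 30_000_000.0
viewer_marginal_cost = 0.0

for units in (10, 100, 1_000):
    print("bouquets:", units, "cost:", total_cost(0.0, bouquet_marginal_cost, units))

for viewers in (10, 100_000_000):
    print("viewers:", viewers, "cost:", total_cost(EPISODE_COST, viewer_marginal_cost, viewers))
```

The flower shop's cost grows with every sale, while the streaming cost stays flat no matter how many people watch.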
Large Language Models are more like a flower shop
Generating tokens requires a significant amount of energy, which makes token generation expensive. Unlike streaming one more episode, every extra token has a real marginal cost.
This week’s news cycle brought us a story about an anonymous company accidentally burning through $500M worth of tokens.
The very nature of LLMs means that all-you-can-eat subscriptions have to be subsidized.
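A rough sketch of why a flat subscription breaks down when every token costs money. All the numbers here (a $20 monthly price and $10 per million tokens) are hypothetical and chosen only to keep the arithmetic easy. The function names are also made up for this example.

```python
def monthly_margin(price: float, tokens_used: int, cost_per_million: float) -> float:
    """Provider's margin on one subscriber: flat price minus the cost of their tokens."""
    return price - (tokens_used / 1_000_000) * cost_per_million


def break_even_tokens(price: float, cost_per_million: float) -> float:
    """Token volume at which the subscriber stops being profitable."""
    return price / cost_per_million * 1_000_000


PRICE = 20.0             # hypothetical flat monthly subscription, in USD
COST_PER_MILLION = 10.0  # hypothetical cost to serve one million tokens, in USD

light_user = monthly_margin(PRICE, 1_000_000, COST_PER_MILLION)
heavy_user = monthly_margin(PRICE, 50_000_000, COST_PER_MILLION)

print("light user margin:", light_user)
print("heavy user margin:", heavy_user)
print("break-even tokens:", break_even_tokens(PRICE, COST_PER_MILLION))
```

Under these assumptions the light user is profitable and the heavy user is not, so someone has to cover the difference.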
Why is it a good thing?
Ultimately, it’s about the unit economics.
We went through a very similar pain with the shift to cloud computing. The solution was not to abandon AWS completely because of spiraling bills.
Instead, we built dashboards and tracked usage, and an entire discipline was born: FinOps.
When the costs are real, the value has to be real too.
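The same tracking habit can be applied to tokens. Below is a minimal sketch of a usage ledger that attributes spend to a team and feature. The model names, prices, and the `UsageLedger` class are all hypothetical. Real prices vary by provider, and a real setup would read token counts from the provider's usage data.

```python
from collections import defaultdict

# Hypothetical prices in USD per one million tokens.
PRICE_PER_MILLION = {
    "small-model": {"input": 0.5, "output": 2.0},
    "large-model": {"input": 5.0, "output": 20.0},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single call, given its token counts and the price table above."""
    price = PRICE_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


class UsageLedger:
    """Accumulates spend per (team, feature) so every dollar has an owner."""

    def __init__(self) -> None:
        self.spend_by_owner = defaultdict(float)

    def record(self, team: str, feature: str, model: str,
               input_tokens: int, output_tokens: int) -> float:
        cost = call_cost(model, input_tokens, output_tokens)
        self.spend_by_owner[(team, feature)] += cost
        return cost

    def top_spenders(self, n: int = 3) -> list:
        """Owners sorted by spend, largest first."""
        ranked = sorted(self.spend_by_owner.items(), key=lambda item: item[1], reverse=True)
        return ranked[:n]


ledger = UsageLedger()
ledger.record("search", "autocomplete", "small-model", 2_000_000, 1_000_000)
ledger.record("support", "chat-summary", "large-model", 1_000_000, 500_000)
for owner, spend in ledger.top_spenders():
    print(owner, spend)
```

Even a ledger this small answers the first FinOps question: who is spending what, and on which feature.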
Where would you start?
I'll leave you with the FinOps for AI framework from Narev.
Top comments (1)
FinOps for AI is going to be a real discipline for the same reason cloud FinOps became one: a usage-based bill with no visibility quietly balloons until someone panics, and AI tokens are the new untracked cloud spend. The parallel is exact: it is easy to start, hard to attribute, and the waste hides in defaults nobody revisits.
The AI-specific twist is that the cost drivers are different from compute: over-provisioned models (frontier prices for work a cheap model handles), context bloat (paying to send tokens the task didn't need), and retry waste (paying twice for one correct answer because nothing verified the first). So AI FinOps isn't just dashboards, it's the engineering levers: per-call attribution so cost has an owner, routing so each task uses the cheapest model that clears the bar, context discipline, and caching. Visibility comes first (you can't cut what you can't see), then the structural optimizations the visibility points you to.
The cultural part matters too. Like cloud FinOps, it only works when the people writing the prompts can see what their choices cost. Measure, attribute, then optimize the biggest line. That treat-AI-spend-as-an-engineering-discipline instinct is core to how I think about cost in Moonshift. As FinOps-for-AI matures, do you see attribution (per-team/feature cost) or routing/optimization becoming the bigger lever first?