Jonathan Murray

Posted on May 25

If Microsoft and Uber can't afford AI coding, what chance do the rest of us have?

#ai #productivity #devtools #opensource

DeepSeek cited as a budget alternative

Two stories landed in the same news cycle.

Microsoft cancelled most internal Claude Code licenses. Windows, Surface, Teams, Outlook, all migrating to GitHub Copilot CLI by June 30. Reporting is consistent on the why: usage exploded, the bills got indefensible, and the company that owns Azure and is one of Anthropic's biggest partners decided it was cheaper to migrate thousands of engineers than to keep paying the meter.

Uber's CTO Praveen Neppalli Naga said the company is "back to the drawing board" on AI coding. They burned through their planned 2026 AI budget within months. R&D was $3.4B last year and is still climbing. Engineers were ranked on internal leaderboards for AI tool usage. Claude Code became dominant. Costs went vertical.

Read that twice. Two of the best-capitalized, most AI-bullish companies on the planet just hit the wall on AI coding cost, and we're still in the first inning.

If they can't make the math work, what happens to the rest of us?

The thing nobody is saying out loud

The current generation of AI coding tools is built on an assumption: more tokens equals better output.

Bigger context windows. Longer reasoning chains. More tool calls per task. The whole industry is in a token-maxing arms race, and the pricing model is perfectly aligned with that race. Every additional token the agent burns is revenue for the model provider. Every re-fetch of the same file, every redundant reasoning loop, every "let me re-read your codebase to remember what we discussed", that's the meter running.

This is the part where I'm supposed to be diplomatic. I'm not going to be.

Claude Code is excellent. Cursor is excellent. Codex is excellent. The engineering is genuinely impressive. But the business model is a parking meter and you are the car. The longer your session, the deeper the agent goes, the more files it touches, the more money the vendor makes. Productivity and cost are positively correlated. That's not a bug. That's the design.

Microsoft figured this out at scale and pulled the plug. Uber figured it out and is rebuilding from scratch. If you're a developer reading this thinking "well, my $200/month plan is fine for now", I have bad news. Your plan is fine because somebody upstream is eating the difference between what you pay and what your usage actually costs. That subsidy ends the moment these companies need real margins. Anthropic is reportedly raising at a $900B valuation. OpenAI just raised again. The investor math doesn't close at "we lose money on every power user forever."

You're not the customer in this model. You're the funnel.

Bigger context is not the answer

The industry's response so far has been to make the context window bigger. 200K. 1M. 2M. Look at all this room.

This is a category error.

A bigger context window doesn't help you, it helps the bill. You're paying to stuff your entire repo into a prompt every turn so the model can "remember" what file structure you have. That's not memory. That's amnesia with a credit card attached.

Real memory, the kind your brain runs on, doesn't reload everything every time you think. It selectively recalls what's relevant. It compresses. It forgets things that don't matter. It builds a model of the world that persists across sessions.
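As a rough illustration of "recall selectively, forget what doesn't matter", here is a minimal sketch. It is not how Backboard or any other tool implements memory: the `MemoryStore` class, its keyword-overlap scoring, and its least-recently-used forgetting rule are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    last_used: int = 0  # turn number when this note was stored or last recalled


@dataclass
class MemoryStore:
    """Hypothetical toy store: keyword recall plus least-recently-used forgetting."""
    max_items: int = 100
    turn: int = 0
    items: list[Memory] = field(default_factory=list)

    def remember(self, text: str) -> None:
        self.items.append(Memory(text, self.turn))
        # Forget: once over capacity, drop the notes used longest ago.
        if len(self.items) > self.max_items:
            self.items.sort(key=lambda m: m.last_used)
            self.items = self.items[-self.max_items:]

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Return up to k notes that share at least one word with the query."""
        self.turn += 1
        words = set(query.lower().split())
        scored = [(len(words & set(m.text.lower().split())), m) for m in self.items]
        hits = sorted((s for s in scored if s[0] > 0),
                      key=lambda s: s[0], reverse=True)[:k]
        for _, m in hits:
            m.last_used = self.turn  # recalled notes stay fresh
        return [m.text for _, m in hits]


store = MemoryStore(max_items=2)
store.remember("auth.ts token refresh bug fixed by rotating keys")
store.remember("team prefers named exports")
hits = store.recall("auth token bug")           # only the relevant note comes back
store.remember("decision: use postgres for sessions")  # over capacity: stalest note is dropped
print(hits)
print([m.text for m in store.items])
```

A real system would use better relevance scoring and some form of summarization. The point of the sketch is only that the prompt receives a handful of recalled notes instead of the whole repo.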

When your coding agent actually remembers your codebase architecture, your conventions, the decision you made last Tuesday, the bug you fixed in auth.ts three weeks ago, the patterns your team prefers, it doesn't need to re-read 400K tokens of context to do the next task. It already knows. The token bill collapses. Quality goes up, not down, because the agent isn't drowning in fresh context every turn.
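To put rough numbers on that, here is a back-of-the-envelope sketch. Only the 400K figure comes from the paragraph above. The price per million tokens, the number of turns, and the 8K recall size are assumptions picked for illustration, and the sketch ignores prompt caching and output tokens.

```python
# Illustrative arithmetic only: the price, turn count and recall size below
# are assumptions, not figures from any vendor or from this post.

def session_cost(tokens_per_turn: int, turns: int, usd_per_million_tokens: float) -> float:
    """Input-token cost of a session that sends `tokens_per_turn` on every turn."""
    return tokens_per_turn * turns * usd_per_million_tokens / 1_000_000


PRICE = 3.00  # hypothetical USD per 1M input tokens
TURNS = 50    # hypothetical number of agent turns in one session

full_reload = session_cost(400_000, TURNS, PRICE)  # re-read 400K tokens every turn
recall_only = session_cost(8_000, TURNS, PRICE)    # hypothetical 8K tokens of recalled memory

print(f"full reload: ${full_reload:.2f}")
print(f"recall only: ${recall_only:.2f}")
print(f"ratio: {full_reload / recall_only:.0f}x")
```

Whatever price you plug in, the ratio between the two sessions is just the ratio of tokens sent per turn (here 400,000 / 8,000 = 50), which is why shrinking what gets sent matters more than shaving the per-token rate.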

This is the part of the stack the hyperscalers don't want to build.

Memory is harder than context. Memory is opinionated. Memory requires you to commit to architecture decisions about what to retain, what to compress, what to forget. And critically, memory cuts token revenue. It's a direct conflict of interest for any vendor whose margin depends on you burning tokens.

If you're a vendor making money per token, why would you ever ship the feature that uses fewer tokens?

You wouldn't. And they haven't.

Silicon Valley can afford this. The rest of the world cannot.

A Brazilian developer earning R$15K/month does not have a $200/month Claude Max budget. A two-person Jakarta startup is not dropping $1,500/month per seat on agentic coding. An indie hacker in Lagos is not running a Cursor team plan. The math doesn't work and it isn't going to start working because OpenAI raises another $40B at a higher valuation.

The current AI coding market is a luxury product priced for San Francisco salaries and venture-subsidized burn. That's a real market and the companies serving it should keep serving it. But pretending that's the whole market is delusional.

There are roughly 30 million developers globally. Maybe 2 million of them work at companies that can sustainably absorb token-metered agentic coding at current prices. The other 28 million need a different solution. Not a worse one. A different one. One whose architecture isn't designed to extract maximum revenue per keystroke.

And let's be honest with each other for a second. The "AI levels the playing field for developers in emerging markets" narrative has been one of the dominant talking points of the last two years. Every keynote. Every blog post. Every "the future of work is global" panel.

How is that going? Right now, with current pricing, the playing field is the most tilted it has ever been. A junior developer in Toronto on a Pro plan has more leverage per dollar than a senior developer in São Paulo on a budget. That's not democratization. That's a new caste system with better marketing.

The "aligned with the driver, not the parking meter" test

I keep coming back to this framing because it keeps being right. The question to ask any AI tool you adopt going forward is whose side the economics are actually on.

If the vendor makes more money when you use it more, you have a parking meter. Your interests and theirs diverge the moment you scale.

If the vendor makes more money when you succeed (you ship faster, retain users, build better), you have a partner. Your interests align.

Most of the AI coding industry right now is parking meters wearing partner costumes. Microsoft just got billed for the parking. Uber too. The smart play for everyone else is to pick tools where the architecture itself, not just the marketing copy, is on your side.

What we're doing about it

We're opening the alpha of our CLI at Backboard. Memory-first. Built for the 28 million developers who are not the target market of the current generation of tools.

I'm not going to pitch you here. I'm telling you we're taking this problem on, and we'd rather have a smaller post and a bigger fight than the other way around.

If the Microsoft and Uber stories landed wrong for you, if you're tired of token bills that look like rent, if you think memory is more interesting than context, come find us.

We're aligned with the driver. Not the parking meter.

backboard.io

Top comments (15)

Mykola Kondratiuk • May 30

the Microsoft case is backwards as a warning for small teams. the cost problem at enterprise scale is procurement overhead and standardization pressure, not token price. individual devs can mix tools per task with zero constraint. completely different economics.

Stephen Dicks • May 27

We need better programming languages, and AI is not the answer. We have spent over 10 years going backwards with JavaScript (and TypeScript) / React / Angular arguments rather than finding better solutions. IMHO Java was going the right way in the mid-1990s (thanks to one visionary) but lost its way. We need better abstractions so that lower-skilled people can write software using plug-and-play Lego; it's just that hardly anyone is actually trying to do that.

Dvorah • May 29

A developer might decide to pay $200 per month if they are getting more than $200 worth of value. I do agree that the economics have to pencil out - but if humans are very expensive (as so many CEOs state confidently) then the balance has to be between total output on the one hand, and human + AI cost on the other.

Still, I'm curious to try your tool.

Ranjan Dailata • May 26 • Edited

Innovation comes with a price. Companies like DeepSeek are making it happen at the lowest possible price. We live in a highly competitive world where everything is possible. That said, there is no going back to manual coding. AI has taken over, and companies are already seeing the advantage of it. Remember, innovation is the key that makes things happen.

Jonathan Murray • May 26

We've got a strong harness with DeepSeek in our CLI, hitting ~70 +/-3% on Terminal Bench 2.1, which is the highest open source score we've seen so far. Excited to have you give it a stress test. Would you be open to it?

Ranjan Dailata • May 26

Wow, that's really amazing. You can count on me for sure :)

I am happy to run the load/stress testing against your product.

Harjot Singh • May 29

the cost-trajectory part of this is what most ppl skip past. when MSFT + Uber both flinch at the bill, the indie dev story is way uglier. for solos i think the unlock is decoupling from the all-u-can-eat subscription entirely. been building moonshift on that bet: $3 per shipped SaaS instead of a monthly plan, code lands in ur own github + vercel, so unit cost = a function of WHAT u shipped not how much u tinkered. first run completely free if u want to see what the math looks like on a real prompt, no card needed.

xulingfeng • May 26

The cost-per-call framing is the right lens. We ran a similar experiment with DeepSeek (3.4B tokens for 7) and the economi[...] 'good enough for this specific task at this price point.'

Jonathan Murray • May 26

Totally agree. As I mentioned in the comment above, we're seeing DeepSeek in our CLI hit ~70 +/-3% on Terminal Bench 2.1. Would you be open to stress-testing it?

xulingfeng • May 27

We actually did try with smaller models (DeepSeek V4 Flash vs Pro) — the Flash is about 20-50x cheaper per token and handles 95

Jonathan Murray • May 27

Love to see it. Making coding more accessible and aligning with devs is our goal. Great to see like-minded people emerging!

xulingfeng • May 27

Appreciate you sharing that data point — it aligns with what we've seen too. The fixed cost of context assembly often dwarfs the per-call inference cost in practice, especially for multi-step agent tasks. Would be curious how your results look with shorter context windows, say 4-8k vs 32k+.

Dhruv Patil • May 29

I think cost is one part of the problem, but there’s another layer too: even if tokens get cheaper, most people still don’t know how to use these tools well. Bad context + weak review can still produce code that looks clean but creates hidden complexity.

I wrote a related post from the skill/hiring angle: if AI coding becomes normal, what proves someone is actually good at working with AI? Prompt logs? PRs? decision memos? tests? I don’t think we have a strong proof system for that yet.

Would be curious how you think about this: does memory-first tooling also help reveal better engineering judgment, or is the main win cost/control?

Mike Ritchie • May 27

I’ve done a little bit of research into using Claude Code with local models, and it looks like, if your machine can handle it, you can get quite substantial savings that way. What’s your take on it?

Jonathan Murray • May 27

Certainly. If you have a strong coding harness in your terminal, it's definitely the way to go. Our R-CLI can hit ~70 on the Terminal Bench coding benchmark with DeepSeek 4, so if you have enough space to host it and you point the R-CLI endpoint at that model, you effectively have zero cost. That's what we're trying to enable. The cost for the CLI then is just memory reads and writes per, so $0.003
