DEV Community

The Meter Was Always Running

Daniel Nwaneri on February 20, 2026

On the All-In podcast (episode #261), Jason Calacanis revealed his AI agents cost $300 a day. Each. At 10-20% capacity. Chamath Palihapitiya is no...
 
Ben Halpern

That's not a technology question. It's a judgment question.

That applies to most decisions like this.

Very well written overall!

 
Daniel Nwaneri

Appreciate that, and you're right, most are.
What makes this one feel different is the speed. Bad technology decisions used to have lag: months before the debt showed up.

 
Harsh

This is exactly the "cloud cost wake-up call" all over again — except compressed into 18 months instead of a decade.

With cloud, we had years to optimize. With AI agents, teams are getting $300/day invoices before they even know what "token budget" means.

The real shift coming in 2026: every PR will include estimated token cost alongside test coverage.
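If that prediction lands, the gate could work the way a coverage threshold does. A minimal sketch, using the rough ~4-characters-per-token rule of thumb for English text; every name, price, and budget number here is hypothetical, not an existing tool:

```javascript
// Hypothetical PR-level token cost gate, analogous to a coverage gate.
// estimateTokens uses a crude heuristic, not a real tokenizer.

function estimateTokens(text) {
  return Math.ceil(text.length / 4); // ~4 chars per token, rule of thumb
}

function estimatePrTokenCost(diffText, dollarsPerMillionTokens = 3) {
  const tokens = estimateTokens(diffText);
  const dollars = (tokens / 1_000_000) * dollarsPerMillionTokens;
  return { tokens, dollars };
}

// Fail CI the same way a coverage threshold does.
function checkTokenBudget(diffText, maxTokens = 50_000) {
  const { tokens, dollars } = estimatePrTokenCost(diffText);
  if (tokens > maxTokens) {
    throw new Error(
      `Estimated ${tokens} tokens (~$${dollars.toFixed(2)}) exceeds budget of ${maxTokens}`
    );
  }
  return tokens;
}
```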

 
Daniel Nwaneri

The cloud parallel is right, but the compression is what changes everything.
Cloud gave teams years to build cost culture: FinOps, tagging, reserved instances, the whole apparatus. Teams had time to fail slowly and learn.

$300/day invoices arriving before you've even defined what an agent session should cost means the reckoning is happening before the infrastructure to handle it exists.

The PR token cost idea is interesting. Curious who owns that number in your mental model: the dev who wrote the prompt, the team that approved the architecture, or the infra budget that pays the bill?

 
Matthew Hou

The cloud cost parallel is useful but I think there's a difference people are missing: with cloud, the cost scaled with users. More traffic meant more spend, but also more revenue to justify it. With AI agents, the cost scales with complexity of the task, not usage. A single internal workflow with zero users can burn through tokens faster than a user-facing feature.

That changes the calculus. You can't just slap a usage-based pricing model on it and call it a day. The teams that figure out how to cap agent reasoning depth without killing usefulness are going to have a real edge.

 
Daniel Nwaneri

The scaling axis distinction is the thing I didn't make explicit and should have.
Cloud cost and revenue moved together: more spend meant more users meant more justification. Agent cost is decoupled from value delivery entirely. An internal workflow nobody uses can run the meter as hard as your highest-traffic feature. That breaks the FinOps playbook, because you can't tag your way to insight when the cost driver is reasoning depth, not request volume.

The reasoning depth cap question is the one I don't have a clean answer to. Curious whether you think that's solvable at the prompt layer, the architecture layer, or whether it has to come from the providers: token budgets baked into the API itself rather than enforced by the team calling it.

 
Matthew Hou

The decoupling of cost from value delivery is the key insight. With traditional infrastructure, spend correlates with usage which correlates with revenue. With agents, a poorly designed workflow can burn tokens on reasoning that produces nothing useful — and your monitoring dashboard won't flag it because the requests "succeeded." The reasoning depth cap question is hard because it's task-dependent. A one-size-fits-all token budget doesn't work. I've been experimenting with per-task budgets based on expected complexity, but it's still more art than science.
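A minimal sketch of what per-task budgets might look like, assuming an agent loop where each step reports its token usage. `runWithBudget`, `runStep`, and every task type and number here are illustrative, not a real library:

```javascript
// Illustrative per-task token budgets for an agent loop.
const BUDGETS = {
  triage: 2_000,    // simple, bounded tasks get tight caps
  refactor: 25_000, // deep-reasoning tasks get more headroom
};

// runStep executes one agent step and returns { usage, done, result }.
async function runWithBudget(taskType, runStep) {
  const budget = BUDGETS[taskType] ?? 5_000; // conservative default
  let spent = 0;

  for (;;) {
    const { usage, done, result } = await runStep();
    spent += usage.totalTokens;

    if (done) return { result, spent };
    if (spent >= budget) {
      // Halt loudly instead of silently burning tokens on open-ended reasoning.
      throw new Error(`Token budget exceeded for "${taskType}": ${spent}/${budget}`);
    }
  }
}
```

The calibration problem lives in the `BUDGETS` table: the numbers only get trustworthy once production data shows what each task type actually needs.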

 
Daniel Nwaneri • Edited

"Succeeded" is doing a lot of work there. The request completed. The output was useless. Monitoring can't tell the difference.
Per-task complexity budgets are the right architecture. Flat token limits kill legitimate deep reasoning on hard problems. The art-vs-science problem is real, though: until you have enough production data to calibrate expected complexity per task type, you're guessing.

Has the per-task approach caught any silent burns in practice?

 
Johnu Marattil

This line hit hard: "Human engineers are expensive upfront and cheaper over time. Agents are cheap upfront and expensive over time." I run a SaaS solo and use AI heavily across my workflow - content, code, outreach. But the moment I stopped reviewing what the AI produced, quality tanked, and costs crept up. The AI handles volume. The judgment of when to ship, when to rewrite, and when to scrap entirely - that's still the expensive part. Great piece.

 
Daniel Nwaneri

"When to ship, when to rewrite, when to scrap entirely" is the clearest description of the judgment layer I've seen in these comments.
Those three decisions look simple from the outside. They're not. They require knowing your users, your technical debt, your tolerance for risk, and what "done" actually means for this specific thing.

The AI handles volume precisely because it doesn't have to know any of that.
The moment you stopped reviewing is the moment you delegated those three decisions without realizing it. Quality tanked because judgment got outsourced along with execution.

 
Chris Wakefield (Chris)

As a software engineer myself, I find this really insightful when everyone around is talking about how we're all about to lose our jobs to AI. I've noticed my own role shifting more toward architecture, building out solution proposals as opposed to writing code. Interesting take, thanks.

 
Daniel Nwaneri • Edited

"Solution proposals as opposed to writing code" is the clearest description of the shift I've seen in the comments.
The role didn't shrink; the center of gravity moved. The judgment about what to build and how to structure it was always the valuable part. It's just more visible now that the execution layer is cheaper.
The engineers who feel most displaced are usually the ones whose contribution was always the proposal layer but never got credit for it, because it was bundled with the coding work.

 
Matthew Hou

This reframing of error handling as an architectural choice hit home. I had a side project last year with a chain of API calls, each wrapped in its own try/catch. One of them failed silently — caught the error, logged it, returned undefined — and everything downstream kept running on bad data. Took way longer than it should have to figure out what was wrong.

Switched to a single top-level boundary that just halts everything on an unexpected throw. Less "safe" looking code, but the system fails honestly now instead of quietly producing garbage.
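That silent-catch failure mode is easy to reproduce in miniature. A hedged sketch (all function names hypothetical) contrasting the two styles:

```javascript
// Style 1: the error is swallowed locally, so the call "succeeds" with
// undefined and downstream code keeps running on bad data.
async function fetchUserSilent() {
  try {
    throw new Error("upstream API down"); // stand-in for a failing call
  } catch (err) {
    console.error("fetch failed:", err.message);
    return undefined; // the caller never learns anything went wrong
  }
}

async function greetSilent() {
  const user = await fetchUserSilent();
  // No throw anywhere: garbage quietly flows into the output.
  return `Welcome back, ${user && user.name}`;
}

// Style 2: no local try/catch. The error propagates to a single
// top-level boundary, which halts the pipeline instead of limping on.
async function fetchUserHonest() {
  throw new Error("upstream API down");
}

async function greetHonest() {
  const user = await fetchUserHonest(); // rethrows all the way up
  return `Welcome back, ${user.name}`;
}
```

The second version looks less defensive, but an unexpected failure now stops the run at the boundary instead of surfacing later as `undefined` in the output.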