DEV Community

I asked Cursor to rename a function. It sent 8,400 tokens. I checked.

GDS K S on May 13, 2026

The afternoon I learned what my AI subscription was actually doing, and the 200 lines that took my next bill down 41 percent. I had been using Cur...
 
Mary Olowu

The spreadsheet detail is what makes this: measuring it instead of assuming is the whole post.

One thing the 4k–14k variance hints at: a lot of that context is the routing layer compensating for the fact that the model has no durable state to retrieve, so it re-ships ambient buffer state every call just in case.

When the load-bearing context lives in a record the agent can pull deliberately, the “play it safe, send everything” default has less to do.

Doesn’t fix Cursor’s economics for you, but it’s a strong argument for not letting the chat window be the only memory.

Saving the token-audit approach.

Ben Halpern

Holy wow

HARD IN SOFT OUT

I laughed because I’ve felt that same silent budget hemorrhage. Thank you for ripping open the token details—it’s a warning everyone using AI coding tools needs. Most devs don’t realize the context re‑sent includes huge chunks of unrelated code. Can we teach such tools to send only the AST of what actually changed? If not, maybe we need a local proxy that slims the payload before it ever hits the network.
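
A rough sketch of the "slim the payload" idea for Python sources. It assumes a local proxy that already knows which symbol the request targets; `slim_context` and that assumption are hypothetical, not any editor's API. It uses the standard `ast` module to pull out just the one definition instead of shipping the whole buffer:

```python
import ast
from typing import Optional


def slim_context(source: str, target_name: str) -> Optional[str]:
    """Return only the source of the top-level function or class named target_name.

    A local proxy could send this snippet instead of the whole file.
    Returns None if the name is not defined at the top level of source.
    """
    tree = ast.parse(source)
    for node in tree.body:
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        if is_def and node.name == target_name:
            # Uses the node's line/column span; decorators are not included.
            return ast.get_source_segment(source, node)
    return None
```

This only covers what gets sent to the model. A real rename also has to touch every call site, which is the part an LSP rename already does without spending any tokens.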

Imagine a “token budget” mode hard‑coded into the editor. If the projected context exceeds your cap, it simply refuses the request. That would force more modular coding habits from the start.
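
A minimal sketch of that budget mode as a pre-flight check. It assumes roughly 4 characters per token, which is only a heuristic; a real gate should use the target model's own tokenizer. `BudgetGate` and `BudgetExceeded` are hypothetical names:

```python
from dataclasses import dataclass

# Rough heuristic: about 4 characters per token for English text and code.
# This is an assumption; real tokenizers (and each vendor's) will differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


class BudgetExceeded(Exception):
    """Raised when the projected context is over the user's cap."""


@dataclass
class BudgetGate:
    cap_tokens: int  # hypothetical per-request cap chosen by the user

    def check(self, prompt: str, context_chunks: list[str]) -> int:
        """Return the projected token count, or refuse the request if it is over the cap."""
        projected = estimate_tokens(prompt) + sum(estimate_tokens(c) for c in context_chunks)
        if projected > self.cap_tokens:
            raise BudgetExceeded(
                f"projected {projected} tokens exceeds cap of {self.cap_tokens}"
            )
        return projected
```

Raising instead of silently truncating is deliberate here: it surfaces the oversized request to the developer, which is the "force more modular habits" effect described above.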

GDS K S

Feels like we need a tiny AI sitting on the laptop or PC that decides what context to send and what to leave out, managing it locally to improve the overall process instead of bloating the context.
Because if you keep cleaning out the context, the AI doesn't know the whole picture and starts hallucinating as well...

HARD IN SOFT OUT

Check the post "Engineering Agent Memory": dev.to/kenwalger/engineering-agent...

See my comment in there; we're talking about exactly the problem you're facing in this blog.

Harjot Singh

8,400 tokens to rename a function is the perfect tiny example of the whole problem. The model doesn't "rename" - it re-reads a pile of context to be safe, then regenerates, and you pay frontier-model rates for what is mechanically a find-and-replace. The mismatch between task complexity and tokens spent is enormous on exactly this kind of trivial edit.

This is the strongest case for routing: a rename, an import fix, a formatting pass - none of it needs a reasoning model. A cheap model (or honestly an LSP/refactor tool) handles it for a fraction of the tokens. You'd reserve the expensive model for the 5% of asks that genuinely require thinking. Checking the actual token count like you did is how people wake up to this - most never look. Great little investigation.
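
A toy version of that routing decision, assuming simple keyword matching is good enough for a first cut (in practice the classifier is the hard part). The tier names are placeholders, not real model IDs:

```python
import re

# Hypothetical tier names; substitute whatever models or tools you actually use.
CHEAP_TIER = "cheap-model"
FRONTIER_TIER = "frontier-model"

# Naive keyword patterns for mechanical edits. A real router needs a better
# intent classifier; this only illustrates the routing decision itself.
MECHANICAL_PATTERNS = [
    r"\brename\b",
    r"\b(fix|add|sort|remove) (an? )?imports?\b",
    r"\bformat(ting)?\b",
]


def route(request: str) -> str:
    """Send mechanical edits to the cheap tier and everything else to the frontier tier."""
    text = request.lower()
    if any(re.search(p, text) for p in MECHANICAL_PATTERNS):
        return CHEAP_TIER
    return FRONTIER_TIER
```

For a pure rename, the cheap tier could just as well be a call into an LSP or refactoring tool rather than any model at all.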

Max Quimby

Love that you actually measured this. The 4x-7x overhead over the bare API call lines up with what we see when we sniff Cursor/Cline-style clients — the system prompt + workspace indexing + tool definitions roughly dominate the prompt for any short request, and the marginal cost of your actual edit is in the noise.

One nuance worth surfacing for anyone building their own router: the cost-routing math is the easy part. The hard part is the intent classifier itself. A 200-line classifier that routes ~80% correctly is fantastic when it's right, and corrosive when it's wrong, because the failure mode is silent — a slightly worse answer that the user accepts because they don't know Opus would have caught the bug. We started instrumenting "router regret" by re-running a sampled 5% of routed requests on the next tier up and diffing the outputs. It costs a bit but it's the only way to know whether your savings are real or whether you're just downgrading quality you can't see.
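
One way the "router regret" sampling could look, as a sketch rather than Max's actual setup: hash the request ID so the same ~5% of requests are always audited, re-run those one tier up, and diff the two outputs with `difflib`. `call_model` is a hypothetical callable you would wire to your own client, and the tier list is made up:

```python
import difflib
import hashlib
from typing import Callable, Optional

# Hypothetical tier ordering; "next tier up" is the following entry.
TIERS = ["cheap-model", "mid-model", "frontier-model"]


def should_audit(request_id: str, rate: float = 0.05) -> bool:
    """Deterministically sample about `rate` of requests by hashing their ID."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate


def next_tier(tier: str) -> Optional[str]:
    """Return the tier above `tier`, or None if it is already the top."""
    i = TIERS.index(tier)
    return TIERS[i + 1] if i + 1 < len(TIERS) else None


def router_regret(
    request_id: str,
    prompt: str,
    routed_tier: str,
    routed_output: str,
    call_model: Callable[[str, str], str],  # hypothetical: (tier, prompt) -> output
    rate: float = 0.05,
) -> Optional[list[str]]:
    """Re-run a sampled request one tier up and return a unified diff of the outputs.

    Returns None when the request is not sampled or is already on the top tier;
    an empty list means both tiers produced identical output.
    """
    upper = next_tier(routed_tier)
    if upper is None or not should_audit(request_id, rate):
        return None
    upgraded_output = call_model(upper, prompt)
    return list(
        difflib.unified_diff(
            routed_output.splitlines(),
            upgraded_output.splitlines(),
            fromfile=routed_tier,
            tofile=upper,
            lineterm="",
        )
    )
```

Hashing the ID instead of calling `random` keeps the audit set stable across retries, so the same request is never half-audited.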

Also: the 41% cost win is a great result, but the unit you actually want to optimize is cost per accepted edit, not cost per call. Cheap calls that get rejected and re-prompted are a worse deal than expensive calls that land first try.
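
To make the cost-per-accepted-edit point concrete, here is a small calculation with invented prices (not real model pricing): a cheap model that needs three tries loses to a pricier one that lands on the first attempt, even though it is cheaper per call.

```python
from dataclasses import dataclass


@dataclass
class Call:
    cost_usd: float  # what this call cost (illustrative figures below)
    accepted: bool   # did the user keep the resulting edit?


def cost_per_accepted_edit(calls: list[Call]) -> float:
    """Total spend divided by edits that actually landed; inf if none landed."""
    accepted = sum(1 for c in calls if c.accepted)
    total = sum(c.cost_usd for c in calls)
    return total / accepted if accepted else float("inf")


# Invented numbers: per call, 0.01 vs 0.02; per accepted edit, 0.03 vs 0.02.
cheap = [Call(0.01, False), Call(0.01, False), Call(0.01, True)]
expensive = [Call(0.02, True)]
```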

HR Pulsar

This is exactly why I still don’t buy the “full AI engineer replacement” narrative. At some point a human notices something’s off.

And honestly, the lack of observability in a lot of commercial AI tooling is weird. You start a process, wait forever, your laptop fans enter takeoff mode, and 20 minutes later you get either nonsense or a beautifully formatted disaster.

Feels like these tools desperately need basic monitoring primitives:
token spikes, loop detection, some typical alerts, intermediate reasoning snapshots, kill switches mid-run.
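
A rough sketch of three of those primitives (token-spike alert, loop detection, and a total-token kill switch) for a step-by-step agent run. The class name, the thresholds, and the idea of identifying a step by a string action label are all assumptions for illustration:

```python
from collections import deque
from statistics import median


class RunMonitor:
    """Watches an agent run step by step; all thresholds are illustrative guesses."""

    def __init__(self, spike_factor: float = 3.0, loop_repeats: int = 3,
                 max_total_tokens: int = 200_000):
        self.spike_factor = spike_factor
        self.loop_repeats = loop_repeats
        self.max_total_tokens = max_total_tokens
        self.step_tokens: list[int] = []
        self.recent_actions: deque = deque(maxlen=loop_repeats)
        self.total = 0

    def record(self, tokens: int, action: str) -> list[str]:
        """Record one step and return any alerts it triggers."""
        alerts = []
        # Spike: this step used far more tokens than the median step so far.
        if self.step_tokens and tokens > self.spike_factor * median(self.step_tokens):
            alerts.append("token_spike")
        self.step_tokens.append(tokens)
        self.total += tokens
        # Loop: the last `loop_repeats` actions were all identical.
        self.recent_actions.append(action)
        if len(self.recent_actions) == self.loop_repeats and len(set(self.recent_actions)) == 1:
            alerts.append("loop")
        # Kill switch: the whole run has blown through its token budget.
        if self.total > self.max_total_tokens:
            alerts.append("kill_switch")
        return alerts
```

The caller decides what each alert does; a "kill_switch" alert would be the point to stop the run mid-flight rather than discover the spend afterwards.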

At the same time, I genuinely believe AI agents are the next real platform shift. The value is obvious already. But right now they’re still more like copilots with a weirdly broad skill matrix — not Terminators with independent judgment and a stable personality.

Right now a lot of AI workflows feel less like pair programming and more like sending a junior dev into the basement and hoping they come back with the right file 🥴

Andrii Krugliak

Ran four agents in parallel last week and one of them ate roughly 60% of the budget on a single rename refactor. Same surprise. The instrumentation gap is the actual problem - you don't know which agent burnt the tokens until 3 hours after the fact, by which point the diff is already merged. Did you find a way to flag token-disproportionate edits in real time, or only post-hoc?
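
One possible real-time check, assuming your client reports tokens used per response and you can see the file before and after the proposed edit: compare tokens spent against lines actually changed, and flag the edit before it is accepted. The threshold and function names are hypothetical:

```python
import difflib

# Hypothetical threshold: flag any edit that spends more than this many
# tokens per changed line. Tune it against your own history.
MAX_TOKENS_PER_CHANGED_LINE = 1000


def changed_lines(before: str, after: str) -> int:
    """Count lines removed plus lines added between two versions of a file."""
    matcher = difflib.SequenceMatcher(a=before.splitlines(), b=after.splitlines())
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def is_disproportionate(tokens_used: int, before: str, after: str) -> bool:
    """Flag an edit as soon as its response arrives if tokens per changed line is too high."""
    changed = max(changed_lines(before, after), 1)  # avoid dividing by zero on no-op edits
    return tokens_used / changed > MAX_TOKENS_PER_CHANGED_LINE
```

With these numbers, an 8,400-token rename that changes one line (counted as one removal plus one addition) comes out at 4,200 tokens per changed line and gets flagged.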

John

The “cost per accepted edit” point in the comments feels like the missing metric here. Raw token spend is useful, but the real alarm should be “this tiny rename used 8k tokens and produced one accepted diff.” That would make waste obvious without forcing devs to inspect every request manually.

John

This matches what I keep seeing too: the expensive part is not the visible prompt, it is the safe default context bundle. The practical fix is less “use a cheaper model” and more “make context selection observable,” even if it is just logging files included, estimated tokens, and the reason they were pulled in.
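
A minimal version of that log, assuming you can intercept the context bundle before it is sent. It records each file, a rough token estimate (the 4-characters-per-token heuristic, not a real tokenizer), and the reason it was pulled in, largest contributors first. The function and field names are illustrative:

```python
import json
from dataclasses import asdict, dataclass

CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


@dataclass
class ContextEntry:
    path: str    # file pulled into the prompt
    tokens: int  # estimated tokens it adds
    reason: str  # why the selector included it, e.g. "open buffer"


def log_context(request_id: str, files: dict[str, tuple[str, str]]) -> str:
    """Build one JSON log line describing every file in the context bundle.

    `files` maps path -> (file_text, reason). Returns the line so it can be
    written wherever you keep logs.
    """
    entries = [
        ContextEntry(path, -(-len(text) // CHARS_PER_TOKEN), reason)
        for path, (text, reason) in files.items()
    ]
    entries.sort(key=lambda e: e.tokens, reverse=True)  # biggest contributors first
    record = {
        "request_id": request_id,
        "total_tokens": sum(e.tokens for e in entries),
        "files": [asdict(e) for e in entries],
    }
    return json.dumps(record)
```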

Prasenjeet Kumar

If you are really concerned about saving tokens and optimising context, you can check out Ogcode. In my experience it saves me lots of tokens per session. It is on GitHub.

API Peek

AI is burning more money than a developer costs. 🤫

HR Pulsar

It depends a lot on your developers 😂