Agdex AI

Posted on Jun 23 • Originally published at agdex.ai

Why Your Claude Code & Cursor API Bills Explode, and How to Cut Them by 70%

#ai #cursor #programming #claude

Last month, I switched my team's development workflow entirely to agentic coding tools—specifically Claude Code and Cursor's Agent mode. The productivity boost was immediate. Tasks that used to take three hours were getting done in fifteen minutes.

But two weeks later, I checked our AWS Bedrock and Anthropic API consoles.

Our bill had spiked to over $1,200. One dev had managed to run up a $90 bill in a single afternoon.

If you've been using these tools, you've probably felt this anxiety. You're hesitant to run them because you don't know if a task will cost $0.05 or $15.00.

After spending a week diving into our API call logs and debugging the prefix cache, I mapped out the exact math of why these bills explode—and built a workflow that cut our token consumption by over 70% without hurting output quality.

Here is the engineering breakdown of what is happening under the hood.

The O(N²) Context Tax

Most devs assume AI costs scale linearly: you send a prompt, you pay for the tokens, you get a response.

Agentic systems like Claude Code or Cursor Agent mode do not work this way. They operate on a quadratic cost model. Because these tools need to maintain state, every single turn (every new message) re-sends the entire conversation history, including system prompts and tool definitions.

If each turn adds ~500 new tokens of code/discussion to the history, and your system prompt + config is 2,000 tokens:

Turn 1: 2,000 (System) + 500 = 2,500 input tokens
Turn 10: 2,000 + 5,000 = 7,000 input tokens
Turn 30: 2,000 + 15,000 = 17,000 input tokens
Turn 50: 2,000 + 25,000 = 27,000 input tokens

By Turn 50, a single simple prompt like "fix that typo" costs you 27,000 input tokens. Across a 50-turn session, the cumulative input consumption is 737,500 tokens.

On Claude 3.5 Sonnet ($3/million input tokens, $15/million output tokens), a single 50-turn session costs you $13.27. Run 15 of these sessions a day, and you're looking at $200/day.

Here is how to stop the bleeding.

1. Structure for Prefix Cache Hits (The 90% Discount)

Anthropic supports prompt caching, which charges only 1/10th of the normal input token price for cache hits ($0.30/MTok instead of $3.00/MTok).

However, Claude's prompt cache is prefix-based. This means the cache matches from the very first token down. The moment a single character changes early in the prompt, the entire cache downstream is invalidated.

To make the most of this:

Keep your CLAUDE.md file static. Every time you tweak CLAUDE.md during a session, you invalidate the cache for all subsequent turns. Write your rules once, and leave them alone.
Put dynamic content at the bottom. Ensure the system prompt, tool definitions, and large library documentations are loaded first (top of the context), and your specific file edits and queries are appended at the very end. (Fortunately, Claude Code handles this ordering automatically, but if you write custom scripts or use Cursor, keep this layout in mind).

2. The Checkpointing Pattern (Externalizing State)

Instead of keeping a long, multi-turn conversation active in your terminal, move the state to your local files. I call this checkpointing.

When a task gets long (past 15-20 turns):

Ask the agent: "Write the current implementation plan to plan.md and the status of files to status.json."
Run the /clear command to wipe the conversation history.
Start a fresh session: "Read plan.md and status.json. Continue from step 4."

This simple loop wipes out the accumulated O(N²) history, dropping your input token cost back to the baseline while keeping the agent fully informed.

3. Track and Limit Session Spend Locally

Never run an agent in an open-ended loop without constraints.

First, use ccusage, a fantastic open-source CLI tool to monitor your local API logs offline. It shows your daily, weekly, and per-session costs across Claude Code, Copilot, and other tools.

# Run ccusage to check daily spend
bunx ccusage claude daily

Second, when launching autonomous loops, enforce boundaries in your prompt:

"Fix the failing tests in src/auth/tests. Stop after fixing them or after 8 iterations, whichever comes first."

DEV Community

Why Your Claude Code & Cursor API Bills Explode, and How to Cut Them by 70%

The O(N²) Context Tax

1. Structure for Prefix Cache Hits (The 90% Discount)

2. The Checkpointing Pattern (Externalizing State)

3. Track and Limit Session Spend Locally

Further Reading

Top comments (0)