Cursor's Composer 2 shipped in March 2026 as the centerpiece of the Cursor 2.0 overhaul. The headline numbers ($0.50 per 1M input tokens, SWE-bench Multilingual scores ahead of frontier models) look like marketing. The real story is the cache read mechanism.
## Why a Specialized Model at All
Prior Cursor versions proxied Claude or GPT-4. Composer 2 is trained exclusively on coding data via continued pre-training and reinforcement learning. The obvious question is: what's cut?
Everything that isn't code. Composer 2 has no meaningful capability for poetry, history, ethics debates, or anything outside software development. That constraint lets Anysphere run a model that:
- Understands intra-repo dependency graphs (if you fix A, B also needs updating)
- Navigates hundreds of files in a single long-horizon task
- Runs natively in sandboxed terminals and a built-in browser loop
- Costs a fraction of what a general-purpose frontier model costs to serve
The pricing reflects this. As of May 2026:
| Model | Input (1M tokens) | Output (1M tokens) |
|---|---|---|
| Composer 2 Standard | $0.50 | $2.50 |
| Composer 2 Fast | $1.50 | $7.50 |
| Claude 4.6 Opus | $5.00 | $25.00 |
| GPT-5.4 | $2.50 | $15.00 |
## Standard vs Fast: Same Weights, Different Queue
Anysphere's own language is unambiguous: "Same intelligence." The two variants share identical model weights and parameters. Fast gets priority placement in the queue for high-end GPUs (H800/B200 class); Standard runs on lower-priority compute with a higher latency tolerance.
This is a deliberate architectural choice. Inference cost scales with compute priority, not model capability. If you can tolerate a 10–30 second response delay, you get the same output for 1/3 the price.
The practical split that Cursor power users have settled on:
- Interactive sessions (Fast): You're watching the output in real time. Latency kills flow.
- Fire-and-forget tasks (Standard): Refactor 100 test files, generate JSDoc across the repo, migrate an entire API surface. Start it, close the laptop, come back to results.
## The Cache Read Economy
This is the mechanism that makes Standard compelling for large codebases.
Every request to Composer 2 sends context: directory structure, recently opened files, conversation history. On the second, fifth, tenth turn of the same session, the majority of that context is identical to what was already sent. That's the cache.
Cache read rates as of May 2026:
| Tier | New input | Cache read |
|---|---|---|
| Standard | $0.50/1M | $0.20/1M |
| Fast | $1.50/1M | $0.35/1M |
By turn 5 of a non-trivial session, 80%+ of your input tokens are cache reads, not fresh input. Standard's cache read rate ($0.20) is 43% cheaper than Fast's ($0.35), and 60% cheaper than Standard's own new input rate.
Concrete impact: A refactoring session with 10 back-and-forth turns on a large codebase might consume 10M tokens. With Standard and healthy cache hits, that lands around $1.50–$2.00. The same session on Fast: $4.00–$5.00. On Claude 4.6 Opus: potentially $20+.
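The arithmetic above can be sketched as a small cost model. The per-token rates come from the pricing tables in this post; the cache-hit ratio and output-token volume are assumptions you should replace with your own numbers:

```python
# Per-session cost model implied by the rate tables above.
# Rates are the May 2026 numbers quoted in this post ($ per 1M tokens);
# the 90% cache-hit ratio and 100K output tokens are illustrative assumptions.

RATES = {  # tier: (new input, cache read, output), all $ per 1M tokens
    "standard": (0.50, 0.20, 2.50),
    "fast": (1.50, 0.35, 7.50),
}

def session_cost(tier, input_tokens, cache_ratio, output_tokens):
    """Estimate total session cost in dollars for a given cache-hit ratio."""
    new_rate, cache_rate, out_rate = RATES[tier]
    cached = input_tokens * cache_ratio
    fresh = input_tokens - cached
    return (fresh * new_rate + cached * cache_rate + output_tokens * out_rate) / 1_000_000

# A 10-turn session: 10M input tokens at a 90% cache-hit rate, 100K output tokens.
print(round(session_cost("standard", 10_000_000, 0.90, 100_000), 2))  # → 2.55
print(round(session_cost("fast", 10_000_000, 0.90, 100_000), 2))      # → 5.4
```

The gap between tiers narrows as the cache-hit ratio rises, because the cache read rates ($0.20 vs $0.35) are closer together than the new input rates ($0.50 vs $1.50).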
## The Cache Bug (March–April 2026)
The cache story has a footnote worth documenting.
From late March through early April 2026, a backend bug caused Composer 2 Standard to emit cache read counts of zero—every request treated as fresh input at $0.50/1M even when the context was identical to the previous turn. Users reported credit burn rates 10x higher than expected. The irony: switching to Fast (which costs 3x more per token) actually resulted in lower total cost because cache was functioning there.
Cursor's team (Dean and Mohit on the forum thread) acknowledged the bug and pushed a fix around April 7. As of v2.1.116+, the behavior appears stable.
The diagnostic check: open cursor.com/settings → Usage. If the cache read share of input tokens is consistently below 40% on a multi-turn session against the same codebase, something is wrong. The expected range is 40–90%, depending on how varied your requests are.
If you hit zero cache read consistently, copy the Request ID from the chat header and contact support. Cursor has been issuing credit refunds for the overbilling period.
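That check reduces to a ratio with two thresholds. A minimal sketch, where the token counts are whatever the usage page reports and the cutoffs are the 40–90% range quoted above:

```python
# The cache-health check described above, as a function. Token counts come
# from the usage page; the thresholds mirror the 40% floor quoted in this post.

def cache_health(cache_read_tokens, total_input_tokens):
    """Return the cache-read share and a rough verdict on cache behavior."""
    ratio = cache_read_tokens / total_input_tokens
    if cache_read_tokens == 0:
        verdict = "broken: grab the Request ID and contact support"
    elif ratio < 0.40:
        verdict = "suspicious for a multi-turn session on one codebase"
    else:
        verdict = "healthy"
    return ratio, verdict

print(cache_health(3_430_000, 3_900_000))  # ratio ≈ 0.88, verdict "healthy"
```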
## Comparing with Claude Code's Cache
Claude Code (Anthropic's CLI tool) has its own prompt caching via `cache_control` markers, but with a key structural difference: the cache TTL.
| Setting | Write cost | Read cost | TTL |
|---|---|---|---|
| Default | 1.25× input | ~10% of input | 5 minutes |
| `ENABLE_PROMPT_CACHING_1H=1` | 2.0× input | ~10% of input | 1 hour |
The 5-minute default is brutal for any session where you read documentation, test code, or think between turns. The 1-hour option (available since Claude Code v2.1.108) adds to the write cost but eliminates repeated cache misses across the kind of natural pauses that happen in real work.
To enable it:
```bash
# ~/.zshrc or ~/.bashrc
export ENABLE_PROMPT_CACHING_1H=1
```
Verify with the usage output during a session: look for `ephemeral_1h_input_tokens` in the log. If you only see `ephemeral_5m_` entries, the variable isn't being picked up.
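A quick way to classify which TTL is active, assuming usage records are available as JSON objects. The `ephemeral_1h_input_tokens` / `ephemeral_5m_input_tokens` field names are the log keys mentioned above; the record shape is otherwise hypothetical:

```python
# Classify the active cache TTL from a usage record. The ephemeral_* field
# names come from the log keys discussed above; everything else is a sketch.
import json

def active_cache_ttl(record_json: str) -> str:
    usage = json.loads(record_json)
    if usage.get("ephemeral_1h_input_tokens", 0) > 0:
        return "1h"
    if usage.get("ephemeral_5m_input_tokens", 0) > 0:
        return "5m"  # the environment variable isn't being picked up
    return "none"

print(active_cache_ttl('{"ephemeral_1h_input_tokens": 52000}'))  # → 1h
```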
Note: there were also TTL-related bugs in this period that forced resets to 5-minute behavior. Keep Claude Code at the latest version.
## My Usage Data
I exported my own Cursor usage history and analyzed it. Here's what a month looks like across models (442 requests):
| Model | Requests | Avg cost/request | Cache read ratio |
|---|---|---|---|
| Composer 2 Standard | 73 | $0.19 | 88.3% |
| Composer 2 Fast | 25 | $0.32 | 78.1% |
| Claude 4.6 Sonnet | 212 | $0.37 | 84.7% |
| Claude 4.6 Opus | 93 | $0.90 | 79.5% |
The 88.3% cache read ratio on Standard is the headline. For an average request consuming ~390K tokens, 88% of those are cache reads at $0.20/1M rather than fresh input at $0.50/1M. Without that cache hit rate, the average cost per request would be ~$0.40 instead of $0.19.
The top Opus requests peaked at $4.25/request (3.9M total tokens, 3.8M of which were cache reads). Even with excellent cache ratios, Opus's higher base rates mean the same cache-heavy session costs 4–5× more than Composer 2 Standard.
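A sketch of how a summary table like the one above can be produced from an export. The CSV column names (`model`, `cost`, `input_tokens`, `cache_read_tokens`) are hypothetical; adjust them to whatever Cursor's export actually emits:

```python
# Summarize a usage export by model: request count, average cost per request,
# and cache read ratio. Column names are assumed, not Cursor's actual schema.
import csv
from collections import defaultdict

def summarize(path):
    stats = defaultdict(lambda: {"n": 0, "cost": 0.0, "inp": 0, "cached": 0})
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            s = stats[row["model"]]
            s["n"] += 1
            s["cost"] += float(row["cost"])
            s["inp"] += int(row["input_tokens"])
            s["cached"] += int(row["cache_read_tokens"])
    summary = {}
    for model, s in sorted(stats.items()):
        summary[model] = (s["n"], s["cost"] / s["n"], s["cached"] / s["inp"])
        print(f'{model}: {s["n"]} reqs, '
              f'avg ${s["cost"] / s["n"]:.2f}/req, '
              f'cache ratio {s["cached"] / s["inp"]:.1%}')
    return summary
```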
## The Actual Decision
Composer 2 is not "Claude but cheap." It's a purpose-built agent runtime that has traded general intelligence for deep coding capability and cost efficiency at the infrastructure level. The Standard/Fast split exists because long-horizon agentic tasks don't need millisecond response times—and charging for that latency premium on 10-turn refactoring sessions is wasteful.
The model choice that makes sense given this:
- Default to Standard for any multi-file task where you'll have more than 3–4 turns
- Switch to Fast for interactive chat where you're watching output incrementally
- Use frontier models (Opus, Claude 4.7) only when Composer 2 hits a genuine capability ceiling—complex algorithmic reasoning, architecture decisions that span non-code domains
The cache makes Standard not just "slower Fast," but a qualitatively different operational mode: background processing with cost amortized over a long context window that grows cheaper the more you reuse it.
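The decision list above reduces to a rule of thumb. A sketch with hypothetical tier names; this encodes the heuristic, not any Cursor API:

```python
# The model-selection heuristic from the bullets above, as a function.
# Tier names are placeholders; the >3-turn threshold mirrors the list.

def pick_model(expected_turns: int, interactive: bool, frontier_needed: bool) -> str:
    if frontier_needed:       # genuine capability ceiling (algorithms, architecture)
        return "frontier"
    if interactive:           # watching output incrementally; latency kills flow
        return "composer-2-fast"
    if expected_turns > 3:    # multi-file, many turns: cache reads amortize the cost
        return "composer-2-standard"
    return "composer-2-fast"
```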