I passed all 12 Anthropic Academy certifications. Then I looked at my wrong answers.
This is part 1 of a series where I break down the things I thought I knew about Claude's API — but didn't. Starting with the one that was silently costing me money.
The 1,024-Token Minimum Nobody Told Me About
I'd been adding cache_control to everything. Short system prompts. Small tool definitions. Felt like free optimization.
Wrong. Prompt caching requires at least 1,024 tokens in the cached block. Anything shorter gets silently ignored. No error. No warning. Just no caching.
The quiz question: "You have a 500-token prompt with a cache breakpoint. What happens?"
My answer: it gets cached. The real answer: nothing. 500 tokens is too short.
What makes this insidious: there's no feedback. Your API calls work fine. You just don't get the 90% cost reduction you thought you were getting. I was paying full price for prompts I assumed were cached.
The minimum exists because caching has overhead — storing, indexing, and retrieving cached content. Below 1,024 tokens, the overhead exceeds the savings. It makes sense once you know it. But nothing in the API response tells you your cache breakpoint was ignored.
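Since the API won't warn you, you can at least catch it yourself after the fact. A minimal sketch — the usage field names below match the Messages API response object; the helper name and classification logic are my own:

```python
def check_cache_effect(usage: dict) -> str:
    """Classify a response's usage block: did caching actually happen?

    `usage` mirrors the Messages API usage object, e.g.
    {"input_tokens": 500, "cache_creation_input_tokens": 0,
     "cache_read_input_tokens": 0}.
    """
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    if read > 0:
        return "cache hit"
    if written > 0:
        return "cache write"
    # You set a breakpoint but both fields are zero: the block was
    # probably under the 1,024-token minimum and silently ignored.
    return "no caching"
```

Log a warning whenever a request you expected to cache comes back as "no caching" and this class of silent failure becomes visible.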
Where You Put the Breakpoint Matters More Than Whether You Use One
I knew about cache breakpoints. I did not know where to put them.
Here's what I was doing:
```json
{
  "system": [
    {
      "type": "text",
      "text": "You are a helpful assistant...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "tools": [
    { "name": "tool_1", ... },
    { "name": "tool_15", ... }
  ]
}
```
The breakpoint is on the system prompt. Caching covers the entire prompt up to and including the marked block, in processing order (tools, then system, then messages), so my 15 tool definitions did ride along in the cached prefix. But they were welded to the system prompt: change one character of it, and the whole prefix, tools included, gets invalidated and rewritten.

The more resilient placement: on the last tool definition.
```json
{
  "system": [
    { "type": "text", "text": "You are a helpful assistant..." }
  ],
  "tools": [
    { "name": "tool_1", ... },
    {
      "name": "tool_15",
      ...,
      "cache_control": { "type": "ephemeral" }
    }
  ]
}
```
Why this works: internally, Claude processes the prompt in a fixed order: tools first, then system prompt, then messages. A cache breakpoint says "cache everything up to and including this point." A breakpoint on the last tool makes all the tool definitions a cached prefix of their own, one that survives system-prompt edits; a second breakpoint at the end of the system prompt caches tools + system for the calls where neither has changed.
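If you assemble the request payload in code, the placement is one line. A hypothetical helper (the function name is mine; `cache_control` and `ephemeral` are the real API field and value):

```python
def add_cache_breakpoint(tools: list) -> list:
    """Attach a cache breakpoint to the last tool definition,
    leaving the original list untouched."""
    if not tools:
        return tools
    cached = [dict(t) for t in tools]
    # "Cache everything up to and including this point" -- so the
    # marker goes on the final tool, covering all of them.
    cached[-1] = {**cached[-1], "cache_control": {"type": "ephemeral"}}
    return cached
```

Pass the result as the `tools` parameter of your request and the whole tool block becomes one cacheable prefix.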
You can have up to 4 breakpoints. A common pattern:
- Last tool definition (caches tools)
- End of system prompt (caches tools + system)
- A point in conversation history (caches the above + earlier messages)
The key insight from the course video: your follow-up request must have identical content up to the breakpoint. Change one character in your system prompt? Cache miss for everything. That's why stable content (tools, system prompts) should be cached, and volatile content (recent messages) should be after the last breakpoint.
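You can think of the cache key as a hash over everything before the breakpoint. This is a toy model, not Anthropic's actual implementation, but it shows why a one-character edit misses:

```python
import hashlib
import json

def prefix_key(tools: list, system: str) -> str:
    # Toy cache key: hash everything up to the breakpoint. Any byte
    # that differs yields a different key -- a miss for the entire
    # prefix, not just the block you changed.
    blob = json.dumps([tools, system], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Identical inputs reproduce the key; flipping one character anywhere in the prefix produces a completely different one.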
One more gotcha: you have to use the long-form content syntax ({"type": "text", "text": "..."}) to attach cache_control. The shorthand ("content": "just a string") has nowhere to put it.
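Side by side, shortened. The shorthand, with nowhere to hang the field:

```json
{ "system": "You are a helpful assistant." }
```

And the long form, which has a place for it:

```json
{
  "system": [
    {
      "type": "text",
      "text": "You are a helpful assistant.",
      "cache_control": { "type": "ephemeral" }
    }
  ]
}
```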
What This Actually Costs
Cache hits cost 10% of normal input pricing. That's a 90% reduction. Cache writes cost 125%, a 25% premium on the first request. The cache persists for 5 minutes by default, refreshed each time it's used. You can extend it to 1 hour with "ttl": "1h" in the cache_control object; 1-hour writes carry a steeper premium (2x base input pricing instead of 1.25x).
For a typical autonomous session with 15 tools and a system prompt totaling ~3,000 tokens: caching saves roughly $0.003 per request. Over a 100-request conversation, that's $0.30. Scale that across multiple sessions per day, and it adds up.
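The arithmetic, as a sketch you can plug your own model's input price into. The function name and structure are mine; the 1.25x write and 0.10x read multipliers are the standard 5-minute-cache pricing described above:

```python
def caching_savings(prompt_tokens: int, price_per_mtok: float,
                    requests: int) -> dict:
    """Back-of-envelope savings from caching a stable prefix.

    Assumes the first request pays the 25% write premium and every
    later request reads the cache at 10% of normal input price.
    price_per_mtok is your model's input price in $/million tokens.
    """
    full = prompt_tokens * price_per_mtok / 1e6   # cost per uncached request
    uncached_total = full * requests
    cached_total = full * 1.25 + full * 0.10 * (requests - 1)
    return {
        "uncached": round(uncached_total, 4),
        "cached": round(cached_total, 4),
        "saved": round(uncached_total - cached_total, 4),
    }
```

Run it with your prefix size, your model's real input price, and your typical session length to see what the breakpoint is worth to you.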
I was leaving that on the table for months.
Next in the series: Extended Thinking returns two blocks, not one — and why that matters for your parser.
Anthropic Academy is free: anthropic.skilljar.com
Related:
🛠 Free: claude-code-hooks — 11 production hooks, open source.
Running Claude Code autonomously? Claude Code Ops Kit ($19) — 11 hooks + 6 templates + 3 tools. Production-ready in 15 minutes.
Have you run into caching gotchas with the Claude API? What did you learn the hard way?