Claude's prompt caching cuts token costs by 90% on repeated context. Anthropic/2024/Prompt Caching documentation shows the drop from $15 to $1.50 per million tokens when your prefix is already warm. This is not a volume discount. It is a structural price break for anyone running RAG pipelines or multi-turn agents.
The mechanism is the cache_control parameter. You tag your static prefix, system prompt, or retrieved document block on the first request. Anthropic writes that block to cache. Every follow-up call that reuses the same prefix pays the cached
Top comments (0)