Sangmin Lee

Posted on • Originally published at claudeguide.io

Claude 1M Context Window: What It Can Do and What It Costs

Originally published at claudeguide.io/claude-1m-context-window

Claude Opus 4.7 and Claude Sonnet 4.6 support a 1 million token context window — roughly 750,000 words, or the equivalent of 10 average novels. This guide explains what that actually means for your use case, what it costs, and when the extended context is worth it. For guidance on picking the right model tier, see Haiku vs Sonnet vs Opus: Which Model?.

What 1M tokens looks like in practice

Content type | Fits in 1M tokens
Words (English prose) | ~750,000 words
Pages (standard 250 words/page) | ~3,000 pages
Code (Python, ~100 tokens/KB) | ~10 MB of source code
GitHub repo (median size) | ~3-5 repos in full
Legal documents | ~500 standard contracts
Emails | ~5,000 average emails
Slack messages | ~20,000 messages
PDF pages (no images) | ~2,500 pages

Practical upper bound: 1M tokens is the technical limit. In practice, Anthropic recommends staying under 800K for reliable output quality. The model's attention degrades as the context approaches its maximum length.
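The ratios in the table can be turned into a rough pre-flight check. This is a minimal sketch, assuming the ~0.75 words per token implied by the first row (750,000 words per 1M tokens) and using the 800K practical bound as the default limit. The function names are hypothetical, and a real tokenizer count will differ from this estimate.

```python
# Heuristic from the table above: ~750,000 English words per 1M tokens.
WORDS_PER_TOKEN = 0.75  # assumption; actual tokenizer counts vary by content


def estimate_tokens(text: str) -> int:
    """Estimate the token count of English prose from its word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def fits_in_window(text: str, limit: int = 800_000) -> bool:
    """Return True if the estimated token count is within the limit."""
    return estimate_tokens(text) <= limit


sample = "word " * 750  # 750 words
print(estimate_tokens(sample))  # 1000
print(fits_in_window(sample))   # True
```

Treat the result as a first filter only; an exact count should come from the API before you commit to a design.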

Pricing for extended context

Standard context (0-200K tokens) is billed at the normal rate. Beyond 200K, the per-token input rate doubles; the output rate is the same in both tiers.

Model | 0-200K input | 200K-1M input | Output
Sonnet 4.6 | $3.00/1M | $6.00/1M | $15.00/1M
Opus 4.7 | $5.00/1M | $10.00/1M | $25.00/1M

Real cost example — 800K token request on Opus:

  • First 200K: 200,000 tokens × $5/1M = $1.00
  • Remaining 600K: 600,000 tokens × $10/1M = $6.00
  • Total input: $7.00 per request
  • Plus output: if the response is 2,000 tokens → $0.05
  • Single request total: ~$7.05

At 100 requests/month, that is $700/month on input alone ($705 with output). At this scale, being selective about what goes into the context matters enormously.
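The arithmetic above can be wrapped in a small calculator. This is a sketch under the tiered billing described in this article (first 200K input tokens at the base rate, the remainder at the extended rate). The `RATES` keys are illustrative labels, not API model IDs, and `request_cost` is a hypothetical helper.

```python
# Per-million-token rates copied from the pricing table above (USD).
RATES = {
    "sonnet-4.6": {"base": 3.00, "extended": 6.00, "output": 15.00},
    "opus-4.7": {"base": 5.00, "extended": 10.00, "output": 25.00},
}
TIER_BOUNDARY = 200_000  # input tokens billed at the base rate


def request_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Cost in USD of one request under the tiered scheme described here."""
    rates = RATES[model]
    base_tokens = min(input_tokens, TIER_BOUNDARY)
    extended_tokens = max(input_tokens - TIER_BOUNDARY, 0)
    total = (
        base_tokens * rates["base"]
        + extended_tokens * rates["extended"]
        + output_tokens * rates["output"]
    ) / 1_000_000
    return round(total, 2)


print(request_cost("opus-4.7", 800_000))         # 7.0
print(request_cost("opus-4.7", 800_000, 2_000))  # 7.05
print(request_cost("opus-4.7", 800_000) * 100)   # 700.0
```

Changing the token counts or the monthly multiplier makes it easy to see how quickly the extended tier dominates the bill.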

When 1M context is worth it

1. Whole-codebase analysis

When you need Claude to reason across an entire codebase — not just find a file, but understand how components interact — you need the whole thing in context at once.

Use cases:

  • Security audit: finding vulnerability chains across modules
  • Architecture review: identifying circular dependencies, anti-patterns
  • Refactoring plan: understanding all callers before changing a shared function
  • Onboarding doc generation: summarizing the entire codebase for new hires

Alternative to consider first: Claude Code's built-in file navigation (Read, Glob, Grep) lets it explore code without putting everything in context. For 80% of coding tasks, targeted file reading is faster and cheaper.

2. Multi-document synthesis

Legal due diligence, medical record review, financial document analysis, research literature synthesis — tasks where the answer depends on relationships across hundreds of documents.

Use cases:

  • Summarizing 200 earnings calls to find recurring themes
  • Finding discrepancies across 50 supplier contracts
  • Synthesizing 100 research papers into a literature review
  • Analyzing a complete audit trail (logs, tickets, emails) for an incident investigation

3. Long conversation history

Agents that run for many turns can use the full history as context for decision-making. A research agent that has made 50 tool calls, read 30 documents, and produced intermediate results can load the entire history for a final synthesis step.

4. Large structured data

Full context can help when you need Claude to reason over a large dataset and the reasoning requires seeing all the data rather than a sample. A 100K-row export in CSV form is ~500K tokens, though the exact figure depends on how wide the rows are. (Note: for data analysis at scale, a database + targeted query is almost always better than loading raw data into context.)
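To illustrate the database-plus-targeted-query alternative, here is a minimal sketch using Python's built-in `sqlite3`. The `orders` table and its `region` and `amount` columns are hypothetical. The idea is to send Claude the small aggregated result instead of the raw rows.

```python
import csv
import io
import sqlite3


def summarize_csv(csv_text: str) -> list[tuple[str, float]]:
    """Load CSV rows into an in-memory database and return totals per region."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(row["region"], float(row["amount"])) for row in rows],
    )
    totals = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return totals


data = "region,amount\nEU,10\nUS,5\nEU,2.5\n"
print(summarize_csv(data))  # [('EU', 12.5), ('US', 5.0)]
```

The query result is a few dozen tokens regardless of how many rows the export contains.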

When NOT to use 1M context

1. You don't actually need it

The most common misuse is sending the full codebase when the task only requires 2-3 files. Use targeted file reads first. Save the full-context approach for tasks where the answer genuinely requires reading everything.

Test: can you find the relevant files with Grep/Glob and read just those? If yes, do that.
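Outside Claude Code, the same test can be approximated with a few lines of standard-library Python. This is a sketch, not a replacement for Grep/Glob: `find_relevant_files` and `build_context` are hypothetical helpers, and a plain substring match stands in for a real search.

```python
import tempfile
from pathlib import Path


def find_relevant_files(root: Path, keyword: str, pattern: str = "*.py") -> list[Path]:
    """Return files under root matching pattern whose text contains keyword."""
    matches = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if keyword in text:
            matches.append(path)
    return matches


def build_context(paths: list[Path]) -> str:
    """Concatenate the selected files, labeling each with its path."""
    return "\n\n".join(
        f"# File: {p}\n{p.read_text(encoding='utf-8', errors='ignore')}" for p in paths
    )


# Demo on a throwaway directory with two small files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "auth.py").write_text("def check_token(t): ...\n", encoding="utf-8")
    (root / "views.py").write_text("def index(): ...\n", encoding="utf-8")
    hits = find_relevant_files(root, "check_token")
    print([p.name for p in hits])  # ['auth.py']
```

If the handful of files returned this way answers the question, the full-context request is unnecessary.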

2. Speed matters

1M token requests have measurably higher latency. Time to first token is longer. If you need a fast response for a user-facing workflow, consider whether you can reduce the context or use a retrieval step.

3. The cost doesn't justify the use case

At $7+ per request, 1M context requests are expensive. For a use case running 1,000 times/month, that is $7,000+ in input alone. The quality premium must be real and measurable.

4. The task is repetitive over sub-documents

If you are summarizing 1,000 individual documents and do not need cross-document reasoning, process them one at a time (or in batches via Batch API). You do not need 1M context to summarize a single 5-page contract.
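For the per-document case, the requests can be prepared independently and submitted in bulk. The sketch below only builds the request payloads; `build_batch_requests` is a hypothetical helper, and the `custom_id`/`params` shape is my understanding of what the Message Batches API expects, so check it against the current API reference before relying on it.

```python
def build_batch_requests(
    documents: dict[str, str], model: str, max_tokens: int = 1024
) -> list[dict]:
    """Build one independent summarization request per document."""
    requests = []
    for doc_id, text in documents.items():
        requests.append(
            {
                "custom_id": doc_id,  # used to match results back to documents
                "params": {
                    "model": model,
                    "max_tokens": max_tokens,
                    "messages": [
                        {"role": "user", "content": f"Summarize this document:\n\n{text}"}
                    ],
                },
            }
        )
    return requests


docs = {"contract-001": "First contract text", "contract-002": "Second contract text"}
requests = build_batch_requests(docs, model="claude-sonnet-4-6")
print(len(requests))  # 2

# Assumed submission step with the Python SDK (not executed here):
#   client.messages.batches.create(requests=requests)
```

Each request carries only one document, so every call stays far below the 200K tier boundary.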

How to use the 1M context window

Via the API

1M context requires requesting access via the Anthropic Console for some accounts. Once enabled, you use it by simply sending a longer prompt in the messages array; no special flag is required.

import anthropic

client = anthropic.Anthropic()

# Read the document to analyze
with open("large_document.txt") as f:
    document = f.read()

response = client.messages.create(
    model="claude-opus-4-7",  # or claude-sonnet-4-6
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": f"Analyze this document and find all clauses that could represent liability:\n\n{document}"
        }
    ]
)
print(response.content[0].text)

Checking your context usage

The response object includes usage.input_tokens. Check this to know exactly what you sent:

response = client.messages.create(...)
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")

Combining with prompt caching

For repeated analysis over the same large document (e.g., answering multiple questions about the same contract), use prompt caching to avoid re-billing the input tokens on each call. See the Claude Prompt Caching Guide for a full breakdown of cache pricing and implementation:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": large_document_text,
            "cache_control": {"type": "ephemeral"}  # Cache the document
        }
    ],
    messages=[{"role": "user", "content": "What are the termination clauses?"}]
)

# Second call reuses cached document — 90% cheaper on the input
response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": large_document_text,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "What are the payment terms?"}]
)

With a 700K-token document on Sonnet 4.6:

  • Without caching: 200K × $3/1M = $0.60, plus 500K × $6/1M = $3.00, for $3.60 per question
  • With caching (after first write): 700K × $0.30/1M on cached tokens = $0.21 per question
  • Savings: about 94% on repeated queries over the same document
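The comparison can be recomputed directly from the stated rates. This is a sketch that assumes the Sonnet 4.6 rates from the pricing table and the $0.30/1M cached-read rate quoted above. It ignores the question tokens and any premium on the first cache write, and the function names are hypothetical.

```python
BASE_RATE = 3.00        # USD per 1M input tokens, first 200K (Sonnet 4.6)
EXTENDED_RATE = 6.00    # USD per 1M input tokens beyond 200K
CACHE_READ_RATE = 0.30  # USD per 1M cached tokens, as quoted above
TIER_BOUNDARY = 200_000


def uncached_input_cost(tokens: int) -> float:
    """Input cost in USD when the full document is billed on every call."""
    base = min(tokens, TIER_BOUNDARY) * BASE_RATE
    extended = max(tokens - TIER_BOUNDARY, 0) * EXTENDED_RATE
    return (base + extended) / 1_000_000


def cached_input_cost(tokens: int) -> float:
    """Input cost in USD when the document is read from the cache."""
    return tokens * CACHE_READ_RATE / 1_000_000


doc_tokens = 700_000
full = uncached_input_cost(doc_tokens)
cached = cached_input_cost(doc_tokens)
savings = 1 - cached / full
print(f"uncached: ${full:.2f}, cached: ${cached:.2f}, savings: {savings:.0%}")
```

Rerunning it with your own document size shows how the savings change as the share of extended-tier tokens grows.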

What Claude actually does with a million tokens

This is the question that matters most for deciding whether to use it.

What works well:

  • Finding specific information anywhere in the context ("does this contract mention force majeure?")
  • Cross-referencing across documents ("does the pricing in the email match the contract?")
  • Summarizing the whole into a structured output
  • Finding patterns that only emerge from seeing many instances

What degrades at very long context:

  • Precise recall of specific facts from the middle of a 1M token context (the "lost in the middle" problem — performance is best at the beginning and end)
  • Maintaining a single coherent thread over very long outputs
  • Complex multi-step reasoning when the relevant context is scattered across the full 1M

Mitigation: structure your context so the most important information appears at the beginning and end of the messages array. If you have critical instructions or key documents, place them first.
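One way to apply this layout is to assemble the prompt programmatically so the instructions lead, the bulk documents sit in the middle, and the task is restated at the end. This is a minimal sketch: `build_prompt` is a hypothetical helper and the `<document>` wrapper is an illustrative convention, not a required format.

```python
def build_prompt(instructions: str, documents: list[tuple[str, str]], question: str) -> str:
    """Place instructions first, documents in the middle, and the task last."""
    parts = [instructions.strip()]
    for title, body in documents:
        parts.append(f'<document title="{title}">\n{body.strip()}\n</document>')
    # Restate the instructions after the long middle section, then ask.
    parts.append(f"Reminder of the task: {instructions.strip()}")
    parts.append(question.strip())
    return "\n\n".join(parts)


prompt = build_prompt(
    "Compare pricing terms across the documents below.",
    [("Contract A", "Price: $10 per unit."), ("Email thread", "We agreed on $12.")],
    "Does the pricing in the email match the contract?",
)
print(prompt.splitlines()[0])   # Compare pricing terms across the documents below.
print(prompt.splitlines()[-1])  # Does the pricing in the email match the contract?
```

The resulting string can be passed as the user message content in the API call shown earlier.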


FAQ

Is 1M context available on Haiku?
No. Haiku 4.5 supports up to 200K tokens. Only Sonnet 4.6 and Opus 4.7 support 1M context.

Does context length affect output quality?
For tasks within the first 200K tokens of context, quality is equivalent to shorter contexts. For very long contexts, attention degrades slightly in the middle. Plan your context layout accordingly.

Can I use 1M context with the Batch API?
Yes. Batch API supports up to 1M context. Pricing is 50% off standard rates, so extended-context input tokens on Sonnet cost $3.00/1M through the Batch API (vs. $6.00/1M standard).

How do I estimate whether I need 1M context?
Count your actual tokens with the token counting endpoint (client.messages.count_tokens in the Python SDK) before building. Many tasks that seem to require full context can be handled with targeted retrieval. Build the retrieval version first; upgrade to full context only if quality is insufficient.
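A sketch of that check, assuming the Python SDK's `client.messages.count_tokens` method, which returns an object with an `input_tokens` field. `needs_extended_context` is a hypothetical helper, and the 200K default is the tier boundary from the pricing section.

```python
def needs_extended_context(
    client, model: str, prompt: str, standard_limit: int = 200_000
) -> bool:
    """Return True if the prompt exceeds the standard context tier."""
    count = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return count.input_tokens > standard_limit


# Assumed usage (requires the anthropic package and an API key):
#   import anthropic
#   client = anthropic.Anthropic()
#   needs_extended_context(client, "claude-sonnet-4-6", document_text)
```

Passing the client in as a parameter keeps the helper easy to test with a stub before wiring it to the real API.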

What is the maximum output token length?
Independent of input context length: 8,192 tokens for most models, 16,000 for Opus 4.7. Input context affects what the model knows, not how much it can generate.

Sources

  1. Anthropic models documentation — April 2026
  2. Claude API pricing — April 2026
  3. Long context best practices — April 2026

Frequently Asked Questions

How much does a 1M token request cost on Claude?

On Claude Opus 4.7, a single 800K-token request costs approximately $7.00 in input: the first 200K tokens at $5/1M = $1.00, and the remaining 600K at $10/1M = $6.00. A 2,000-token response adds about $0.05. On Sonnet 4.6, the same 800K-token input costs about $4.20 ($0.60 + $3.60). Prompt caching on repeated queries over the same document cuts the input cost of follow-up calls by more than 90%.

Which Claude models support the 1M context window?

Only Claude Sonnet 4.6 and Claude Opus 4.7 support 1M token context. Claude Haiku 4.5 is limited to 200K tokens. The 1M context mode may require enabling via the Anthropic Console for some accounts.

What are the best use cases for Claude's 1M context window?

The highest-value use cases are whole-codebase security audits and architecture reviews, multi-document synthesis (e.g., 50 contracts, 100 research papers), long agent conversation histories requiring full-context synthesis, and large structured data reasoning. Avoid using 1M context when targeted file reads via Grep/Glob can answer the question: at the rates above, an 800K-token Opus request costs about $7.00 in input, versus $1.00 or less for a request that stays within 200K tokens.

Does the "lost in the middle" problem affect Claude's 1M context window?

Yes. Performance is strongest at the beginning and end of the context and degrades slightly in the middle for very long inputs. For critical instructions or key documents, place them at the start of your messages array. Anthropic recommends staying under 800K tokens for reliable output quality even when the technical limit is 1M.

