
Skila AI

Posted on • Originally published at news.skila.ai

Anthropic 1M Context Window GA: Prices Drop 50%, xAI Admi...

Two things happened in AI this week that, together, tell you everything about where the competitive landscape is heading. On March 13, Anthropic quietly removed the most expensive pricing quirk in frontier AI: the long-context surcharge. Claude Opus 4.6 and Sonnet 4.6 now support a full 1 million token context window at standard pricing — no premium, no special headers, no waiting list. A 900K-token request now costs the same per token as a 9K one.

The same week, Elon Musk posted on X that xAI "was not built right the first time around, so is being rebuilt from the foundations up." His admission: xAI had failed to compete with Claude Code and OpenAI's Codex. His solution: poach two senior engineers from Cursor, the AI coding tool that just hit $2 billion in annualized revenue.

One company quietly made its best product dramatically more accessible. Another admitted it hadn't built the right thing at all. The contrast couldn't be sharper.

## What Changed: The End of the Long-Context Tax

For the past year, building with long-context Claude came with an unspoken tax. Requests exceeding 200K tokens triggered a surcharge of up to 100%, at least doubling the effective cost per token at long-context scales. The message was implicit but clear: long context is a premium capability, and you'll pay for it accordingly.

That surcharge is gone. As of March 13, 2026, there is no additional charge for long-context requests. The new pricing table is blunt:

| Model | Input (per M tokens) | Output (per M tokens) |
| --- | --- | --- |
| Opus 4.6 | $10 → $5 | $37.50 → $25 |
| Sonnet 4.6 | $6 → $3 | $22.50 → $15 |

For any developer who was regularly running requests above 200K tokens, this is a 50% or greater reduction in effective costs overnight.

The technical changes are minimal. The `anthropic-beta: long-context-2025-01-01` header that developers previously had to include in every long-context request is no longer required — it still works, but you can drop it.
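In practice, dropping the header means a long-context call looks like any other call. A minimal sketch, assuming the `anthropic` Python SDK — the model ID, token counts, and helper function here are illustrative assumptions, not values from the announcement:

```python
# Sketch: assembling a long-context request after the change.
# build_long_context_request is a hypothetical helper for illustration;
# the model ID "claude-opus-4-6" is an assumption based on the article's naming.

def build_long_context_request(document: str, question: str) -> dict:
    """Assemble kwargs for anthropic.Anthropic().messages.create()."""
    return {
        "model": "claude-opus-4-6",   # assumed model ID
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": f"{document}\n\n{question}"}],
        # Pre-GA, long-context callers also had to pass the beta header, e.g.:
        # "extra_headers": {"anthropic-beta": "long-context-2025-01-01"},
        # That line can now simply be deleted.
    }

request = build_long_context_request(
    "<900K tokens of contract text>", "List the termination clauses."
)
print("extra_headers" in request)  # prints False — no special header needed
```

The request dict is then passed straight to `client.messages.create(**request)`; nothing else about the call changes with context length.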
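The "50% or greater" claim is easy to sanity-check from the quoted per-million-token prices. A back-of-the-envelope sketch, using the Opus 4.6 figures above and an assumed 900K-in / 4K-out request shape:

```python
# Prices in dollars per million tokens, as quoted in the article.
OLD = {"input": 10.00, "output": 37.50}  # long-context pricing before March 13
NEW = {"input": 5.00, "output": 25.00}   # uniform pricing after

def request_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the given per-million-token rates."""
    return (input_tokens / 1e6) * prices["input"] + (output_tokens / 1e6) * prices["output"]

# A 900K-token input request with a 4K-token answer (request shape is an assumption):
old = request_cost(OLD, 900_000, 4_000)  # 9.00 + 0.15 = 9.15
new = request_cost(NEW, 900_000, 4_000)  # 4.50 + 0.10 = 4.60
print(f"old ${old:.2f}  new ${new:.2f}  saving {1 - new / old:.0%}")
# prints: old $9.15  new $4.60  saving 50%
```

Because both input and output rates were halved, the saving holds at roughly 50% regardless of the input/output mix.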
Full rate limits now apply at every context length. There is no throttling for long-context requests, no access program, and no waiting list. Availability is broad: the Claude Platform natively, Microsoft Azure Foundry, Google Cloud Vertex AI, and Claude Code on the Max, Team, and Enterprise tiers.

## The Performance Question

A large context window is only valuable if the model can actually use it. This is where Anthropic's announcement spends the most time, and where the numbers get interesting.

Opus 4.6 scores 78.3% on MRCR v2 — the Multi-Round Coreference Resolution benchmark — at the full 1 million token scale. Anthropic claims this is the highest among frontier models. Sonnet 4.6 scores 68.4% on GraphWalks BFS at 1M tokens.

These are recall-focused benchmarks, and recall is the capability that matters most for real-world long-context use cases: legal review, codebase analysis, contract processing, agent trace reconstruction. The question isn't whether the model can generate plausible text at 1M tokens — every frontier model can. The question is


Read the full article on Skila AI
