Claude Sonnet 4.5 — What’s New and What Still Limits It (2025)

#webdev #programming #ai #anthropic

TL;DR

Stronger agentic coding & computer use: Sonnet 4.5 leads on real‑world benchmarks like OSWorld (61.4%) and is state‑of‑the‑art on SWE‑bench Verified. Anthropic observed 30+ hours of sustained autonomous work in internal runs.
Platform upgrades: code checkpoints in Claude Code, a native VS Code extension, context editing + memory in the API, Agent SDK, and code execution + file creation in Claude apps.
Pricing: unchanged vs Sonnet 4 — $3/M input tokens and $15/M output tokens.
Limits: a 5‑hour rolling session limit (resets every 5h); usage is shared across Claude & Claude Code; Anthropic may apply weekly caps. Default context window is 200K across plans.
Gotchas: safety gating under ASL‑3 may sometimes block benign content; community reports note short default timeouts for long‑running terminal commands in Claude Code (workarounds exist, your mileage may vary).

Best‑in‑class for coding & agents: Sonnet 4.5 emphasizes real software work (multi‑file edits, long‑horizon planning, tool use).
Benchmarks:
- OSWorld (computer use): 61.4% (vs ~42% for Sonnet 4 earlier in the year).
- SWE‑bench Verified (software engineering): SOTA; Anthropic reports 77.2% in a 200K‑context configuration, with a 1M‑context configuration also tested.
Long‑run autonomy: Observed maintaining focus for 30+ hours on complex, multi‑step tasks.
Developer‑facing features:
- Claude Code adds checkpoints (jump back to any saved state) and a refreshed terminal UI.
- VS Code extension for a more native coding experience.
- Context editing + memory to let agents run longer with better state management.
- Claude Agent SDK (infrastructure Anthropic uses for Claude Code) for building your own production‑grade agents.
- In the Claude apps: code execution and file creation (spreadsheets, slides, docs) inside chat.

Model ID: claude-sonnet-4-5 (API). Also available via Amazon Bedrock.
Price: $3 per million input tokens and $15 per million output tokens (same as Sonnet 4).

1) Usage limits & resets

Session limit resets every 5 hours (rolling window).
Usage is shared across Claude (chat) and Claude Code on the same account.
Anthropic may apply weekly/monthly caps to ensure fair access (varies by plan and period).

2) Context window

200K tokens across consumer plans (Pro / Max).
Anthropic tested a 1M‑context configuration in research/benchmarks, but the standard user experience remains 200K.

3) Long‑running commands in Claude Code

Community reports note short default timeouts (≈2 minutes) for shell commands in the integrated terminal. Some users report config‑based workarounds; effectiveness varies by setup.

4) Safety & gating under ASL‑3

Sonnet 4.5 ships under AI Safety Level 3. More protective classifiers sometimes over‑flag benign content (Anthropic says this has improved vs prior releases).

Chunk long tasks (builds/tests) into steps that complete quickly; avoid starting dev servers inside Claude Code.
Keep context lean: summarize, prune attachments, and use the new context editing to maintain a small working set.
Use the API for big jobs where you need precise budgeting or batch processing; keep the chat session for guidance & reviews.
Watch your usage meter (session + any weekly caps) and plan sprints around the 5‑hour reset.

Teams that rely on agentic coding (multi‑file edits, refactors, eval‑driven loops).
Workflows that benefit from computer use/navigation (e.g., spreadsheet manipulation, web‑based ops).
Users who hit capability limits with earlier Sonnet models and need longer‑horizon planning with better reliability.