DEV Community

A0mineTV


Claude Sonnet 4.5 — What’s New and What Still Limits It (2025)

TL;DR

  • Stronger agentic coding & computer use: Sonnet 4.5 leads on real‑world benchmarks like OSWorld (61.4%) and is state‑of‑the‑art on SWE‑bench Verified. Anthropic observed 30+ hours of sustained autonomous work in internal runs.
  • Platform upgrades: code checkpoints in Claude Code, a native VS Code extension, context editing + memory in the API, Agent SDK, and code execution + file creation in Claude apps.
  • Pricing: unchanged vs Sonnet 4 — $3/M input tokens and $15/M output tokens.
  • Limits: session usage resets on a rolling 5‑hour window; usage is shared across Claude & Claude Code; Anthropic may apply weekly caps. Default context window is 200K tokens across plans.
  • Gotchas: safety gating under ASL‑3 may sometimes block benign content; community reports note short default timeouts for long‑running terminal commands in Claude Code (workarounds exist, your mileage may vary).

What’s new in Sonnet 4.5

  • Best‑in‑class for coding & agents: Sonnet 4.5 emphasizes real software work (multi‑file edits, long‑horizon planning, tool use).
  • Benchmarks:
    • OSWorld (computer use): 61.4% (vs ~42% for Sonnet 4 earlier in the year).
    • SWE‑bench Verified (software engineering): SOTA; Anthropic reports 77.2% in a 200K‑context configuration, with a 1M‑context configuration also tested.
  • Long‑run autonomy: Anthropic observed the model maintaining focus for 30+ hours on complex, multi‑step tasks.
  • Developer‑facing features:
    • Claude Code adds checkpoints (jump back to any saved state) and a refreshed terminal UI.
    • VS Code extension for a more native coding experience.
    • Context editing + memory to let agents run longer with better state management.
    • Claude Agent SDK (infrastructure Anthropic uses for Claude Code) for building your own production‑grade agents.
    • In the Claude apps: code execution and file creation (spreadsheets, slides, docs) inside chat.

Pricing & availability

  • Model ID: claude-sonnet-4-5 (API). Also available via Amazon Bedrock.
  • Price: $3 per million input tokens and $15 per million output tokens (same as Sonnet 4).
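
The per-token prices above translate into per-request costs with simple arithmetic. A minimal sketch, assuming only the list prices quoted in this article; the function name and the example token counts are hypothetical, and the estimate ignores any discounts or surcharges that may apply to your account:

```python
# Prices quoted in the article (USD per million tokens).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the article's list prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 150K tokens of context in, 4K tokens out.
    print(f"${estimate_cost_usd(150_000, 4_000):.2f}")
```

Because output tokens cost five times as much as input tokens, long generated answers tend to dominate the bill even when the prompt is large.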

Practical limitations (the things you’ll feel)

1) Usage limits & resets

  • Session limit resets every 5 hours (rolling window).
  • Usage is shared across Claude (chat) and Claude Code on the same account.
  • Anthropic may apply weekly caps to ensure fair access (varies by plan and period).
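
To plan work around the reset, it can help to compute when the current window ends. A rough sketch, assuming the window starts with the first message of a session and lasts exactly 5 hours; that start rule is an assumption for illustration, not confirmed behavior, and the function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

SESSION_WINDOW = timedelta(hours=5)  # figure quoted in the article


def next_reset(session_start: datetime) -> datetime:
    """ASSUMPTION: the window opens at the first message and closes 5 hours later."""
    return session_start + SESSION_WINDOW


def time_until_reset(session_start: datetime, now: datetime) -> timedelta:
    """Time left in the current window, never negative."""
    remaining = next_reset(session_start) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    start = datetime(2025, 10, 1, 9, 0, tzinfo=timezone.utc)  # hypothetical session start
    now = datetime(2025, 10, 1, 12, 30, tzinfo=timezone.utc)
    print("resets at:", next_reset(start).isoformat())
    print("time left:", time_until_reset(start, now))
```

Weekly caps, where they apply, are tracked separately and are not modeled here.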

2) Context window

  • 200K tokens across consumer plans (Pro / Max).
  • Anthropic tested a 1M‑context configuration in research/benchmarks, but the standard user experience remains 200K.
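
One way to stay inside a 200K window is to drop the oldest messages once a rough token estimate exceeds the budget. The sketch below is an illustration only: the 4-characters-per-token ratio is a crude assumption, not the model's real tokenizer, and the helper names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # figure quoted in the article
CHARS_PER_TOKEN = 4  # ASSUMPTION: crude heuristic, not the real tokenizer


def rough_token_count(text: str) -> int:
    """Approximate token count by character length, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def trim_to_budget(messages: list[str], budget: int = CONTEXT_WINDOW_TOKENS) -> list[str]:
    """Keep the most recent messages whose estimated total fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = rough_token_count(message)
        if used + cost > budget:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each under the heuristic
    print(len(trim_to_budget(history, budget=20)))
```

In practice you would leave headroom below the full window for the model's reply, and summarize dropped messages rather than discarding them outright.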

3) Long‑running commands in Claude Code

  • Community reports note short default timeouts (≈2 minutes) for shell commands in the integrated terminal. Some users report config‑based workarounds; effectiveness varies by setup.

4) Safety & gating under ASL‑3

  • Sonnet 4.5 ships under AI Safety Level 3 (ASL‑3). Its more protective classifiers sometimes flag benign content (Anthropic says this has improved vs prior releases).

Tips to work within the limits

  • Chunk long tasks (builds/tests) into steps that complete quickly; avoid starting dev servers inside Claude Code, since a process that never exits will run into the command timeout.
  • Keep context lean: summarize, prune attachments, and use the new context editing to maintain a small working set.
  • Use the API for big jobs where you need precise budgeting or batch processing; keep the chat session for guidance & reviews.
  • Watch your usage meter (session + any weekly caps) and plan sprints around the 5‑hour reset.
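
The chunking tip can be sketched as a small runner that gives every step its own time budget and stops at the first failure. This is a generic illustration using Python's standard `subprocess` module, not a Claude Code feature or setting; the 120-second default mirrors the community-reported figure and the step names are hypothetical.

```python
import subprocess
import sys

STEP_TIMEOUT_S = 120  # roughly the default the community reports; adjust to taste


def run_step(cmd: list[str], timeout_s: float = STEP_TIMEOUT_S) -> bool:
    """Run one command; return True on exit code 0, False on failure or timeout."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        return False  # the child process is killed when the timeout expires
    return result.returncode == 0


def run_steps(steps: list[tuple[str, list[str]]], timeout_s: float = STEP_TIMEOUT_S) -> list[str]:
    """Run steps in order, stopping at the first failure; return completed step names."""
    done: list[str] = []
    for name, cmd in steps:
        if not run_step(cmd, timeout_s):
            break
        done.append(name)
    return done


if __name__ == "__main__":
    # Hypothetical pipeline: each step is short enough to finish well within the budget.
    steps = [
        ("lint", [sys.executable, "-c", "print('lint ok')"]),
        ("unit-tests", [sys.executable, "-c", "print('tests ok')"]),
    ]
    print(run_steps(steps))
```

Knowing which step timed out tells you where to split the work further before handing it to the agent.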

Who should upgrade

  • Teams that rely on agentic coding (multi‑file edits, refactors, eval‑driven loops).
  • Workflows that benefit from computer use/navigation (e.g., spreadsheet manipulation, web‑based ops).
  • Users who hit capability limits with earlier Sonnet models and need longer‑horizon planning with better reliability.

Thanks for reading

Thanks for reading! If this breakdown helped, consider sharing it with a teammate.
