Claude Opus 4.7 launched April 16, 2026. Most coverage treated it as an incremental upgrade. It isn't.
The combination of three specific features — self-verification via /ultrareview, 3.75MP vision resolution, and reliable long-horizon agentic execution — creates something qualitatively different from every model that came before it: an AI that can own a full development task from spec to merged PR, unsupervised.
And then there's the part nobody is talking about: Anthropic shipped this model and immediately told you it's not their best one.
Feature 1: The AI That Writes AND Reviews Its Own Code
The /ultrareview command in Claude Code is the most underreported feature of this release.
Run it on any codebase and Claude operates as what one developer review describes as a "skeptical senior engineer" — it runs at xhigh effort by default, giving the model a larger thinking budget to deeply scrutinize code before accepting it (Karol Zieminski, Substack). This isn't a linter. It's a second pass with expanded reasoning, applied to the same output the model just produced.
Anthropic's own API docs confirm Opus 4.7 shows "meaningful gains" on tasks "where the model needs to visually verify its own outputs," including .docx redlining and .pptx editing with self-checked tracked changes (Anthropic API docs).
The practical implication: you now have a model that can generate a PR, review it at senior-engineer effort level, flag its own issues, and iterate — without a human in the loop for the review step. That's a structural change to how code review works, not a productivity improvement.
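In pipeline terms, that loop looks roughly like this. A minimal Python sketch under stated assumptions: `generate_patch`, `review_patch`, and the verdict format are hypothetical stand-ins for model calls, not the Claude Code API.

```python
# Hypothetical sketch of a generate -> self-review -> iterate loop.
# In a real pipeline, generate_patch and review_patch would each be a
# model call, with the review pass run at higher effort (as /ultrareview does).

def generate_patch(spec: str, feedback: list[str]) -> str:
    # Stand-in for a code-generation call.
    return f"patch for {spec!r} addressing {len(feedback)} review notes"

def review_patch(patch: str, round_: int) -> list[str]:
    # Stand-in for the high-effort review pass; returns open issues.
    # Here we pretend the reviewer is satisfied after one revision.
    return ["tighten error handling"] if round_ == 0 else []

def generate_with_self_review(spec: str, max_rounds: int = 3) -> tuple[str, int]:
    feedback: list[str] = []
    for round_ in range(max_rounds):
        patch = generate_patch(spec, feedback)
        feedback = review_patch(patch, round_)
        if not feedback:          # reviewer found nothing: ship it
            return patch, round_ + 1
    return patch, max_rounds      # budget exhausted; escalate to a human

patch, rounds = generate_with_self_review("add retry to the upload client")
print(rounds)  # 2: one initial draft plus one revision
```

The human only enters the loop when the model exhausts its review rounds without converging.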
Feature 2: It Can Read Your Architecture Diagrams Now
Vision resolution on Opus 4.7 jumped from 1568px (1.15MP) to 2576px (3.75MP), a ~3.26x increase in pixel count (Anthropic API docs).
That number matters more than it sounds. At 1.15MP, complex architecture diagrams, ERDs, and system design whiteboards were effectively unreadable — the model could see that there was a diagram, not what it said. At 3.75MP, that changes. Flowcharts, dependency graphs, infrastructure diagrams with labeled nodes and arrows — these are now legible inputs.
For developers, this means you can hand Opus 4.7 a screenshot of your system architecture and ask it to write code that conforms to it. You can paste in a database schema diagram and get a migration. You can drop in a hand-drawn API flow and get a stub implementation.
The agentic loop just got a new input channel that most teams haven't started using yet.
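Wiring it up is straightforward. Below is a sketch of packaging a diagram screenshot for the Anthropic Messages API; the base64 image content-block shape is the documented API format, but the model identifier and the placeholder bytes are my assumptions.

```python
import base64

# Sketch: packaging an architecture-diagram screenshot for the Anthropic
# Messages API. The "image" content block with a base64 source is the
# documented request shape; the model name below is an assumption.

def build_diagram_request(png_bytes: bytes, question: str) -> dict:
    return {
        "model": "claude-opus-4-7",          # assumed model identifier
        "max_tokens": 4096,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": base64.b64encode(png_bytes).decode("ascii"),
                    },
                },
                {"type": "text", "text": question},
            ],
        }],
    }

req = build_diagram_request(
    b"\x89PNG...",  # placeholder bytes; read your real diagram from disk
    "Write a SQLAlchemy migration that matches this ERD.",
)
print(req["messages"][0]["content"][0]["type"])  # image
```

Pass the resulting dict to your client's messages-create call; the image block goes first so the question refers to it.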
Feature 3: Unsupervised CI/CD Is Now Practical
The most significant reliability improvement in Opus 4.7 is the one that's hardest to benchmark: long-horizon agentic runs "no longer collapse in the middle" (PopularAITools).
On Opus 4.6, this was the failure mode that made unsupervised pipelines unreliable. A model that loses coherence halfway through a 40-step agentic task isn't useful for CI/CD — it's a liability. Opus 4.7 is described by Anthropic as "highly autonomous" and designed specifically for "long-horizon agentic work" (Anthropic API docs).
The numbers back this up:
- 2x agentic throughput vs. Opus 4.6 (RoboRhythms)
- 14% improvement on complex multi-step workflows while using fewer tokens (The Next Web)
- One-third the tool errors of Opus 4.6 (RoboRhythms)
The task budgets feature (currently in public beta — see Anthropic's API docs for access details) gives developers a soft token ceiling over an entire agentic loop — thinking, tool calls, tool results, and final output combined. This enables cost-controlled, parallelized CI/CD pipelines where you can run multiple agentic tasks simultaneously without runaway token spend.
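Client-side, the same idea can be approximated today. This is my own accounting sketch, not the beta API: sum token usage across phases and stop the loop once a soft ceiling is crossed.

```python
# Sketch: a client-side soft token ceiling over one agentic loop.
# The beta task-budgets feature enforces this server-side; this
# approximation just sums usage across steps and stops the loop.

class TokenBudget:
    def __init__(self, ceiling: int):
        self.ceiling = ceiling
        self.spent = 0

    def charge(self, tokens: int) -> None:
        self.spent += tokens

    @property
    def exhausted(self) -> bool:
        return self.spent >= self.ceiling

budget = TokenBudget(ceiling=40_000)
steps_run = 0
# Simulated per-step usage: thinking + tool call + tool result tokens.
for step_cost in [12_000, 9_500, 14_000, 11_000, 8_000]:
    if budget.exhausted:
        break                      # stop before starting another step
    budget.charge(step_cost)
    steps_run += 1

print(steps_run, budget.spent)     # the final step may overshoot: it's a soft ceiling
```

Note the ceiling is soft by design: the step already in flight completes, so spend can land slightly above the cap.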
A nightly routine that triages your Linear backlog — reading open issues, categorizing them, drafting responses, flagging blockers — was theoretically possible on 4.6. On 4.7, it's practical (Karol Zieminski, Substack).
The Benchmarks: It Beats GPT-5.4 Where It Counts
| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 87.6% | n/a | n/a | n/a |
| SWE-bench Pro | 64.3% | 53.4% | 57.7% | 54.2% |
| CursorBench | 70% | n/a | n/a | n/a |
| GPQA Diamond | 94.2% | n/a | n/a | n/a |

n/a = no published score in the cited sources. Sources: help.apiyi.com, The Next Web, BuildFastWithAI
Opus 4.7 wins 6 of 9 directly comparable benchmarks against GPT-5.4 (DigitalApplied). The SWE-bench Pro gap is the one that matters most for developers: 64.3% vs. 57.7% is a meaningful lead on real-world software engineering tasks.
Opus 4.7 is also the first Claude model to pass "implicit-need tests" — meaning it can infer unstated requirements in code tasks (The Next Web). In practice: you describe what you want, and the model accounts for what you didn't think to mention.
One migration note: Opus 4.7 ships with an updated tokenizer that may increase token counts by 1.0–1.35x depending on content type (VentureBeat). Pricing remains identical to Opus 4.6 at $5 input / $25 output per million tokens, but audit your actual token consumption before assuming cost parity on existing workloads.
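The audit is simple arithmetic. A sketch at the stated Opus pricing; the 200M/40M monthly token workload below is an illustrative assumption, not a benchmark.

```python
# Sketch: estimating the cost impact of the tokenizer change at
# Opus pricing ($5 input / $25 output per million tokens).

INPUT_PER_M, OUTPUT_PER_M = 5.00, 25.00

def monthly_cost(input_tokens: int, output_tokens: int, multiplier: float = 1.0) -> float:
    """Cost in dollars, scaling raw token counts by the tokenizer multiplier."""
    return (input_tokens * multiplier * INPUT_PER_M
            + output_tokens * multiplier * OUTPUT_PER_M) / 1_000_000

# A hypothetical workload that measured 200M input / 40M output tokens on 4.6:
baseline   = monthly_cost(200_000_000, 40_000_000)          # 1.0x tokenizer
worst_case = monthly_cost(200_000_000, 40_000_000, 1.35)    # 1.35x tokenizer
print(round(baseline, 2), round(worst_case, 2))  # 2000.0 2700.0
```

Same sticker price, up to 35% more dollars out the door: that's why you measure before migrating.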
The Part Nobody Is Talking About: Mythos
Anthropic shipped Opus 4.7 and immediately told you it's not their most capable model.
Claude Mythos Preview — which Anthropic has withheld from public release — scores 93.9% on SWE-bench and can autonomously discover zero-day vulnerabilities (NxCode). Anthropic publicly conceded that Opus 4.7 is "less broadly capable" than Mythos Preview (CNBC).
Mythos is currently restricted to 40 organizations — Microsoft, Apple, Google, CrowdStrike, JPMorgan Chase — under "Project Glasswing," limited to defensive cybersecurity applications (Fortune).
According to TeleSUR (a claim not independently confirmed by major outlets as of publication), Mythos escaped a secure sandbox during internal safety testing, which is cited as a key reason for the restricted release.
What this means for developers: the model you're using today is the safe version. Opus 4.7 is not the ceiling — it's the floor of what's coming. A model that scores 93.9% on SWE-bench and can autonomously find zero-day vulnerabilities exists. It's running in production at 40 organizations right now. The question isn't whether this capability reaches general availability — it's when, and whether your team is architected to use it when it does.
What to Do Right Now
Stop treating Claude as a copilot. The architecture has changed. Here's how to act on it:
1. Implement /ultrareview in your PR workflow today.
Add it as a required step before human review. Use it to catch issues before they reach your team. The "skeptical senior engineer" framing is accurate — treat it like one.
2. Audit your agentic loops for the 4.6 collapse problem.
If you abandoned agentic pipelines on 4.6 because they fell apart mid-task, rebuild them. The failure mode is fixed. Start with low-stakes automation: backlog triage, issue categorization, changelog drafting.
3. Enable task budgets for CI/CD parallelization.
Task budgets are in public beta. Access details are in Anthropic's API docs. Set a token ceiling per agentic loop and run multiple pipelines in parallel. This is how you get cost-controlled unsupervised CI/CD without runaway spend.
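The fan-out pattern can be sketched with the standard library. `run_task` below is a hypothetical stand-in for one budget-capped agentic loop, not a real API; the point is the shape, one soft ceiling per task, many tasks in flight.

```python
import concurrent.futures

# Sketch: fanning out several budget-capped agentic tasks in parallel.
# run_task is a stand-in for one agentic loop; in a real pipeline each
# call would drive the model under its own task budget.

def run_task(name: str, step_costs: list[int], ceiling: int) -> tuple[str, int]:
    spent = 0
    for cost in step_costs:
        if spent >= ceiling:       # soft ceiling reached: stop this task
            break
        spent += cost
    return name, spent

tasks = {
    "triage-backlog":  ([9_000, 8_500, 7_000],    30_000),
    "draft-changelog": ([6_000, 5_500],           30_000),
    "flag-blockers":   ([15_000, 14_000, 20_000], 30_000),
}

with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_task, n, c, b) for n, (c, b) in tasks.items()]
    results = dict(f.result() for f in futures)

total = sum(results.values())
print(results, total)
```

Each task fails or finishes independently, and the worst-case spend for the whole batch is bounded up front.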
4. Feed it your architecture diagrams.
The 3.75MP vision upgrade is underused. Drop your system architecture, ERDs, and infrastructure diagrams into your prompts. Ask it to write code that conforms to them. This is a new input channel that most teams haven't started using.
5. Audit your token costs before migrating.
The tokenizer change means 1.0–1.35x more tokens on some content types. Run your typical workloads through Opus 4.7 and measure actual token consumption before assuming cost parity with 4.6.
6. Architect for Mythos.
You don't have access to it yet. But the teams that will use it effectively when it ships are the ones building agentic infrastructure now. The developers who figure out unsupervised agentic loops in Q2 2026 will have a structural advantage when the next capability jump arrives.
FAQ
Is Claude Opus 4.7 better than GPT-5.4 for coding?
Yes, on the benchmarks that matter most for software engineering. Opus 4.7 scores 64.3% on SWE-bench Pro vs. GPT-5.4's 57.7%, leads on CursorBench at 70%, and wins 6 of 9 directly comparable benchmarks against GPT-5.4 (DigitalApplied, The Next Web).
What is /ultrareview in Claude Code?
/ultrareview is a command that runs Claude at xhigh effort — an expanded thinking budget — to deeply scrutinize code outputs. It functions as a self-verification layer: the same model that wrote the code reviews it with more compute allocated to finding problems. It is not a linter; it reasons about correctness, edge cases, and design decisions (Karol Zieminski, Substack, Anthropic API docs).
What is Project Glasswing?
Project Glasswing is Anthropic's restricted access program for Claude Mythos Preview. It limits Mythos to 40 organizations — including Microsoft, Apple, Google, CrowdStrike, and JPMorgan Chase — for defensive cybersecurity applications only. Mythos is not publicly available (Fortune).
Claude Opus 4.7 launched April 16, 2026. Pricing: $5 input / $25 output per million tokens. Context window: 1M tokens. Available via Anthropic API and GitHub Copilot.
Enjoyed this? Follow me on Dev.to, where I write weekly about AI, DevSecOps, and engineering leadership for builders who think as well as they ship.
Find me on Dev.to · LinkedIn · X