- My Books: The Complete Guide to Go Programming | Hexagonal Architecture in Go
- My Tool: Hermes IDE — free, open-source AI shell wrapper for zsh/bash/fish
- Me: xGabriel.com | GitHub
Anthropic just shipped Claude Opus 4.7. Not a minor patch. Not a "we improved vibes" update. The benchmarks are genuinely wild — 98.5% visual acuity (up from 54.5%), 3x resolution on image inputs, 21% fewer document reasoning errors, and it resolves coding tasks that neither Opus 4.6 nor Sonnet 4.6 could crack. I've been using it since this morning in real projects, and this is the first model update in a while that made me stop and rethink my workflow.
If you're building anything with Claude — agentic systems, coding assistants, document processing — you need to know what actually changed and what it means for your stack.
Table of Contents
- What Actually Shipped
- The Vision Upgrade Is Not Incremental
- Coding: The Numbers vs. Reality
- The New xhigh Effort Level
- Instruction Following Got Aggressive
- What Broke (Or Might Break)
- Claude Code Integration
- Pricing and Availability
- Should You Switch Today?
What Actually Shipped
Let me give you the headline numbers because they tell a story:
| Metric | Opus 4.6 | Opus 4.7 | Change |
|---|---|---|---|
| Visual acuity | 54.5% | 98.5% | +81% |
| Image resolution | ~1.25 MP | ~3.75 MP | 3x |
| Document reasoning errors | baseline | -21% | significant |
| Complex multi-step workflows | baseline | +14% | |
| Tool call accuracy | baseline | +10-15% | |
| Internal coding benchmark (93 tasks) | baseline | +13% | |
Same pricing. $5/M input tokens, $25/M output tokens. Unchanged.
That last part matters. Anthropic could've bumped the price. They didn't. The cost-per-capability ratio just got a lot better.
One caveat: the tokenizer changed, and the same input text now produces 1.0-1.35x as many tokens per request. So your actual bill might tick up slightly even though the per-token price is the same. More on that later.
The Vision Upgrade Is Not Incremental
This is the one that got me. 54.5% to 98.5% visual acuity. That's not an improvement, that's a different capability entirely.
Previous Claude models maxed out at roughly 1.25 megapixels. Opus 4.7 processes images up to 2,576 pixels on the long edge — about 3.75 megapixels. In practice, this means it can actually read dense screenshots, technical diagrams, and chemical structures without hallucinating the small text.
I threw a few things at it:
```python
import base64

import anthropic

client = anthropic.Anthropic()

# Reading a dense terminal screenshot
with open("terminal_screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-opus-4-7-20260416",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Read every line of this terminal output. List any errors or warnings with their exact line numbers.",
                },
            ],
        }
    ],
)
```
With Opus 4.6, I'd get maybe 70% of the lines right on a busy terminal. Small font sizes were a coin flip. Opus 4.7 nailed it — every line, correct exit codes, correct timestamps. Even the dimmed gray text in my zsh prompt.
If you're building computer-use agents or any kind of screen-reading automation, this is the update you've been waiting for. The previous visual limitation was a real blocker for production use cases. It's not anymore.
Coding: The Numbers vs. Reality
13% improvement on their internal 93-task benchmark. Sounds modest until you read the fine print: "resolves tasks that neither Opus 4.6 nor Sonnet 4.6 could solve."
That's the interesting part. It's not just doing the same things faster — it's solving problems the previous models couldn't touch. Anthropic also claims 3x more production task resolution on engineering benchmarks.
What I noticed in practice:
**Cleaner output.** Opus 4.6 had this habit of wrapping everything in helper functions and abstractions you didn't ask for. Opus 4.7 produces noticeably cleaner code. Fewer wrapper functions, less over-engineering. If you ask for a function, you get a function, not a class hierarchy.
**Better error recovery.** When an agentic workflow hits a wall — wrong file path, unexpected API response, missing dependency — Opus 4.7 is better at self-correcting without spiraling. I watched it recover from a wrong assumption about a database schema, backtrack, re-read the migration files, and fix itself. Opus 4.6 would've committed to the wrong path harder.
**More creative problem-solving.** This one's harder to benchmark but I felt it. When I gave it an open-ended refactoring task on a Go codebase (rewrite this HTTP handler layer to use generics properly), the approach it took was genuinely clever. Not just correct — the kind of solution I'd see from a senior engineer who'd thought about the problem for 20 minutes.
```go
// What Opus 4.6 typically generated - works, but verbose
func GetUser(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id := chi.URLParam(r, "id")

		var user User
		err := db.QueryRow("SELECT id, name, email FROM users WHERE id = $1", id).
			Scan(&user.ID, &user.Name, &user.Email)
		if err != nil {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}

		json.NewEncoder(w).Encode(user)
	}
}

// What Opus 4.7 suggested - generic handler with proper error types
func ResourceHandler[T any](fetch func(ctx context.Context, id string) (T, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id := chi.URLParam(r, "id")

		resource, err := fetch(r.Context(), id)
		if err != nil {
			// checks if the error implements our StatusError interface
			// instead of assuming 404 for everything
			writeError(w, err)
			return
		}

		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resource)
	}
}

// Usage becomes dead simple
mux.Get("/users/{id}", ResourceHandler(userStore.GetByID))
mux.Get("/posts/{id}", ResourceHandler(postStore.GetByID))
```
That generic handler pattern is exactly what I'd write by hand. The error handling with interface checking instead of blanket 404s? That's the detail that separates "technically correct" from "production-ready."
The New xhigh Effort Level
There's a new effort parameter value: xhigh. Sits between high and max.
```python
response = client.messages.create(
    model="claude-opus-4-7-20260416",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 8192,
        "effort": "xhigh",  # new level
    },
    messages=[{"role": "user", "content": "..."}],
)
```
Anthropic recommends it as the starting point for coding and agentic use cases. The idea: high sometimes doesn't think hard enough for complex tasks, but max burns tokens on problems that don't need that much reasoning. xhigh is the sweet spot.
In my testing, xhigh versus high made a noticeable difference on multi-file refactoring tasks. On simple "write me a function" prompts, the difference was negligible. So use it where it matters and don't waste tokens where it doesn't.
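If you want to encode that rule in your own dispatch code, here's a toy heuristic (mine, not Anthropic guidance) for choosing between the two levels:

```go
package main

import "fmt"

// pickEffort is a toy routing heuristic: send agentic or multi-file work
// to the new "xhigh" level, and keep simple one-shot prompts on "high"
// where the extra reasoning would just burn tokens.
func pickEffort(filesTouched int, agentic bool) string {
	if agentic || filesTouched > 1 {
		return "xhigh"
	}
	return "high"
}

func main() {
	fmt.Println(pickEffort(5, false)) // multi-file refactor: xhigh
	fmt.Println(pickEffort(1, false)) // "write me a function": high
	fmt.Println(pickEffort(1, true))  // agentic session: xhigh
}
```

The signal you key on (files touched, tool use, task length) matters less than having some explicit routing at all, so the expensive level isn't your default.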
There's also a new task budgets feature in public beta — you can guide how the model allocates tokens across different parts of a complex task. Useful if you've got an agent that burns 80% of its budget on the easy part and runs out of gas on the hard part.
Instruction Following Got Aggressive
This is the one that'll bite you if you're not ready.
Opus 4.7 takes instructions more literally than any previous Claude model. Anthropic's own words: "substantially better adherence" and "takes instructions more literally than predecessors." They even recommend retuning existing prompts.
I'll say it plainly: if your prompts have sloppy instructions that Opus 4.6 gracefully ignored or interpreted charitably, Opus 4.7 will follow them to the letter. And you might not like the result.
Example: I had a system prompt that said "always respond in JSON format." With Opus 4.6, it would still give me a natural language preamble before the JSON when it felt the user needed context. Opus 4.7? Pure JSON. Every time. No exceptions. Even when a clarifying question would've been more helpful.
The fix: Be precise about what you actually want. If you mean "respond in JSON format unless the user's question requires clarification," say that. The model won't guess your intent anymore — it'll do what you told it.
This is actually a good thing for production systems. Predictability over cleverness. But you'll need to audit your prompts.
What Broke (Or Might Break)
Real talk about migration:
**Token counts shifted.** The tokenizer update means the same input text now produces 1.0-1.35x as many tokens. Your rate limits, budget calculations, and context window management all need rechecking. If you're running close to the context limit, you might start getting truncated responses.
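A quick way to sanity-check this: take token counts you measured under the old tokenizer and test them against the worst-case 1.35x expansion. A sketch (the 200,000-token window below is illustrative, not a quoted limit):

```go
package main

import "fmt"

// worstCaseTokens applies the 1.35x upper bound from the release notes
// to a token count measured under the old tokenizer, using integer math
// to ceil without float rounding surprises.
func worstCaseTokens(oldTokens int) int {
	return (oldTokens*135 + 99) / 100
}

// stillFits reports whether a prompt that measured oldTokens before the
// update still fits a context window even under worst-case expansion.
func stillFits(oldTokens, contextWindow int) bool {
	return worstCaseTokens(oldTokens) <= contextWindow
}

func main() {
	fmt.Println(worstCaseTokens(100000))   // 135000
	fmt.Println(stillFits(150000, 200000)) // false: 202500 > 200000
	fmt.Println(stillFits(140000, 200000)) // true: 189000
}
```

Prompts that clear this check are safe regardless of how the tokenizer actually behaves on your text; everything else deserves a re-measurement.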
**Prompt behavior changed.** Because of the stricter instruction following, prompts that relied on Claude "reading between the lines" will behave differently. Not wrong, just different. Test before you deploy.
**Existing eval suites may need updating.** If your evals check for specific output patterns, the cleaner code generation and more literal instruction following will change outputs. Your pass rates might temporarily drop even though the actual quality improved.
None of these are showstoppers. But if you deploy to production without testing, you'll find out the hard way. I've been there — shipped a model update on a Friday once. Don't be me.
Claude Code Integration
If you use Claude Code (the CLI), Opus 4.7 is already live. You're probably reading this post from a session that's running it.
The big new addition: /ultrareview. It's a dedicated code review command that runs a deep analysis of your changes — not just linting, but actual design review. Think "what would a careful senior engineer catch in a PR review?" Anthropic says it identifies bugs and design issues at that level.
Pro and Max subscribers get three free ultrareviews. After that, it burns your regular token budget, but honestly, three deep reviews per billing cycle is pretty generous for catching the kind of bugs that slip past CI.
Auto mode also got expanded to Max users, so you can run longer agentic sessions with fewer permission interruptions.
Pricing and Availability
| | Price |
|---|---|
| Input | $5 / million tokens |
| Output | $25 / million tokens |
No change from Opus 4.6. Available today on:
- Claude.ai (web and desktop)
- Anthropic API (`claude-opus-4-7-20260416`)
- Amazon Bedrock
- Google Cloud Vertex AI
- Microsoft Foundry
The model ID for API calls is claude-opus-4-7-20260416. Update your configs.
Should You Switch Today?
My take: yes, with caveats.
**If you're building agentic systems** — switch immediately. The error recovery improvements alone are worth it. Tool call accuracy up 10-15% means fewer failed loops and less wasted compute.
**If you're doing anything with vision** — switch yesterday. The jump from 54.5% to 98.5% visual acuity transforms what's possible. Screen reading, document extraction, diagram understanding — all of it actually works now.
**If you're running production prompts** — test first, then switch. The stricter instruction following is better long-term but will break prompts that were relying on the model's charitable interpretation. Budget a day for prompt auditing.
**If you're cost-sensitive** — watch your token counts. Same per-token price, but the tokenizer changes mean you might be sending more tokens than before. Monitor your first week of usage.
The safety profile is solid — similar to Opus 4.6 with improvements in honesty and prompt injection resistance. Anthropic is being cautious with cybersecurity capabilities (deliberately reduced from their Mythos Preview model), which is probably the right call.
One thing I appreciate: they shipped a genuinely better model at the same price. No "Opus 4.7 Pro Max Ultra" tier. No paywall on the core improvements. The upgrade path is just... update your model string.
What's your first impression? I'm especially curious if anyone else is seeing the stricter instruction following in their existing prompts. Drop a comment or hit me up.
If you're working with AI in the terminal, check out Hermes IDE — it's the free, open-source shell wrapper I built that layers AI completions, git management, and multi-project sessions on top of your existing shell. Works with Claude, Gemini, Aider, Codex, and Copilot. It already supports Opus 4.7 via the Anthropic API.
For more of my writing, xGabriel.com.
