Opus 4.7 Isn't Slower. Your Prompts Are.
Since Anthropic released Claude Opus 4.7 on April 16, 2026, a consistent complaint has rippled through developer communities: it feels slower. Threads tell roughly the same story — the prompts that worked perfectly on Opus 4.6 now take noticeably longer to return, sometimes pause for what feels like forever before any output appears, and in a few cases produce shorter or more literal answers than expected.
Here's what's actually happening: Opus 4.7 isn't slower. Your prompts are wrong for it.
The headline feature of 4.7 is adaptive thinking — and it changes the contract between you and the model in ways that invalidate most of the prompt scaffolding we built up over the last eighteen months. If you're still writing prompts the way you did for Sonnet 4.5 or Opus 4.6, you're actively working against the model.
Let's break down what actually changed, why it looks like a latency regression, and how to rebuild your prompting skills to turn 4.7 into an advantage rather than a frustration.
What Adaptive Thinking Actually Is
Previous reasoning models gave you a dial: thinking.budget_tokens: 32000, say, and the model would use that budget on every request regardless of whether you asked it to sort a list or design a distributed system. You paid for thinking you didn't need on easy tasks, and you hit the ceiling on hard ones.
Adaptive thinking replaces that dial with a decision. In 4.7, the model reads your prompt, estimates its complexity, and dynamically allocates reasoning tokens — heavy for hard problems, light or none for trivial ones. Per Anthropic's docs: "Claude evaluates the complexity of each request and determines whether and how much to use extended thinking."
Three things about this matter for anyone shipping production code against the API:
First, adaptive is now the only way to turn thinking on in 4.7. The old thinking: {type: "enabled", budget_tokens: N} config now returns a 400 error. If you had fixed budgets wired into your orchestration layer, they're dead. You have to migrate to thinking: {type: "adaptive"} plus an effort parameter (low, medium, high, xhigh, or max).
Second, adaptive thinking is OFF by default. Quoting the release notes directly: "Requests with no thinking field run without thinking." If you just swapped the model ID from claude-opus-4-6 to claude-opus-4-7 and changed nothing else, you downgraded your own pipeline — you're now running a more capable model with its reasoning disabled.
Third, thinking display defaults to "omitted". This is the silent change that's breaking the most UX. On Opus 4.6, display defaulted to "summarized" — you got streaming thinking text that gave users visible progress. On 4.7, display defaults to "omitted" — the thinking tokens are generated and billed, but the content is empty. Anthropic calls this out explicitly: "If your product streams reasoning to users, the new default will appear as a long pause before output begins."
That "long pause" is where 90% of the "4.7 is slower" complaints come from. The model is thinking exactly as fast as before — you just can't see it anymore.
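Putting those three changes together, the migration looks roughly like this. This is a minimal sketch of the before/after config shapes: the field names ("adaptive", "effort", "display") come from the release notes quoted above, but the exact request shape and the helper function are illustrative, not official SDK code.

```python
# Opus 4.6 style: fixed budget on every request -- now rejected with a 400.
legacy_thinking = {"type": "enabled", "budget_tokens": 32000}

# Opus 4.7 style: let the model allocate reasoning, pick an effort ceiling,
# and opt back in to visible streamed reasoning (default display is "omitted").
adaptive_thinking = {
    "type": "adaptive",
    "effort": "high",         # low | medium | high | xhigh | max
    "display": "summarized",  # without this, thinking is billed but invisible
}

def migrate_thinking(config: dict) -> dict:
    """Convert a fixed-budget thinking config to the adaptive form.

    Illustrative helper -- the chosen effort level is a placeholder you
    should tune against your own evals.
    """
    if config.get("type") == "enabled":
        return {"type": "adaptive", "effort": "high", "display": "summarized"}
    return config
```

Note that the migration is lossy on purpose: there is no adaptive equivalent of a per-request token budget, only the effort ceiling.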
Why "Slower" Is Mostly a Perception Problem
Let's put actual numbers on this. Anthropic reports 4.7 achieves a 13% lift over Opus 4.6 on their 93-task coding benchmark. That performance improvement comes primarily from the model reasoning more thoroughly on hard problems. More reasoning means more tokens. More tokens mean more wall-clock time.
But this is the critical distinction: wall-clock time per request is not the same as throughput for your business. If 4.7 takes 40% longer on a coding task but resolves it in one pass instead of three iterations, your engineer's effective throughput just doubled. The bottleneck in most AI-assisted engineering workflows isn't model latency — it's the feedback loop of "model returns something wrong, engineer corrects, model tries again."
Three practical sources contribute to the perceived slowdown:
- Omitted thinking display — no visible progress, users assume it's hung
- A new tokenizer that uses up to 1.35x more tokens on the same content, per the Opus 4.7 release notes
- Adaptive allocation correctly decides to think harder on your harder prompts — the opposite of what the old fixed-budget model did
None of these are performance regressions. They're the visible artifacts of a model doing exactly what you should want it to do.
The Prompting Contract Has Changed
The deeper issue is that adaptive thinking only works well if your prompts let it work well. Most of the prompt-engineering reflexes we developed over the last two years were workarounds for limitations that no longer exist — and those workarounds now actively degrade 4.7.
Here's what changed in the model's behavior, per Anthropic's own migration guide:
- More literal instruction following, particularly at lower effort levels. The model will not silently generalize an instruction from one item to another, and will not infer requests you didn't make.
- Response length calibrates to perceived task complexity rather than defaulting to a fixed verbosity.
- Fewer tool calls by default, using reasoning more. Raising effort increases tool usage.
- Fewer subagents spawned by default.
- More direct, opinionated tone with less validation-forward phrasing.
Read those carefully. Every one of them is a rejection of a specific prompt-engineering hack. Let me translate:
The old Claude would generalize your example into an implied rule. 4.7 won't. If you showed 4.6 one formatted JSON response and asked for five more "like this," it would infer the pattern. 4.7 wants you to state the pattern.
The old Claude padded responses with "Here's a full breakdown…" preambles. 4.7 skips them. If you built downstream parsers expecting that filler, they may now break.
The old Claude eagerly spawned subagents and made tool calls to demonstrate thoroughness. 4.7 prefers to reason it out. Your orchestration that counts tool invocations as a quality signal is now measuring the wrong thing.
Five Prompt Patterns That Now Backfire
From the field, these are the most common habits that worked on 4.6 and earlier but actively hurt on 4.7:
1. "Think carefully and step-by-step before responding."
This phrase was load-bearing on models without native reasoning. On 4.7 with adaptive thinking enabled, it's either redundant (the model already decided to think) or it overrides the adaptive decision on prompts where thinking wasn't warranted. Remove it unless you have measured evidence it helps your specific workload.
2. "Double-check your work before returning." / "Verify each assertion."
4.7 does this natively as part of its reasoning. Anthropic explicitly recommends auditing your prompts for this scaffolding: "If existing prompts have mitigations in these areas (e.g. 'double-check the slide layout before returning'), try removing that scaffolding and re-baselining." The scaffolding doesn't just waste tokens — at lower effort levels, literal instruction-following means the model may spend compute re-verifying things it was already confident about.
3. Leaving the goal implicit.
On 4.6, Summarize this article got you a reasonable summary because the model inferred you probably wanted something useful. On 4.7, you get a literal summary — often competent, often generic. The fix is to lead with outcome: Summarize this article for a CTO deciding whether to adopt this approach. Five bullets, each with one concrete implication for architecture. You're not being verbose — you're giving adaptive thinking enough signal to allocate reasoning correctly.
4. temperature=0 for determinism.
This now returns a 400 error. temperature, top_p, and top_k are all rejected on 4.7. If you had deterministic pipelines relying on temperature zero, they're broken. Migrate to prompt-based steering — strict rubrics and explicit format contracts do more for consistency than sampling parameters ever did, and they survive model upgrades.
5. Streaming thinking to users without opting in.
As covered above: set display: "summarized" explicitly if you want users to see progress. Otherwise, they watch a spinner for 8 seconds and assume your app is broken.
How to Prompt for Advantage
Here's what actually works on 4.7, based on the official Opus 4.7 prompting guidance and the patterns emerging from early adopters:
Define outcome, not just task. State what success looks like before describing what you want done. Adaptive thinking uses this to calibrate how deeply to reason.
Stack your constraints explicitly. Audience, format, length, tone, exclusions — all in the prompt. 4.7 won't guess, so gaps get filled literally or poorly.
Use effort as a deliberate lever. Start with xhigh for coding and agentic tasks. Use high as a minimum for anything intelligence-sensitive. Drop to medium or low only when you have measured evidence the task is simple enough — and accept that at those levels the model will refuse to generalize beyond exactly what you asked for.
Experiment with task budgets for agentic loops. The new task budgets beta (header task-budgets-2026-03-13) gives the model an advisory token budget for a full agentic loop, with a visible countdown. This is genuinely new tooling for anyone building long-running agents — the model self-prioritizes and finishes gracefully rather than running to max_tokens.
Strip your old scaffolding and re-baseline. Take your best-performing 4.6 prompts, remove every "think carefully," "double-check," "be thorough," and "make sure you…" phrase, and run your evals. For most workloads, scores go up.
If you stream, opt back in to visible reasoning:

```python
thinking = {"type": "adaptive", "display": "summarized"}
```
The Strategic Read
The pattern here is consistent with where Anthropic has been steering developers for the last year: the model is smart enough to make its own decisions about effort, tool use, and verification — your job is to give it clear goals, not to micro-manage its process.
Engineers who built their prompting skills around making weaker models behave better are discovering that those skills transfer poorly. The scaffolding that extracted good answers from Sonnet 4.5 actively suppresses the intelligence that Opus 4.7 wants to deploy.
This is the more interesting story about 4.7 — not the benchmark lift, not the high-res image support, not even adaptive thinking itself. It's that the frontier has moved from "how do I coax useful output from the model?" to "how do I get out of the model's way?"
For engineering teams still evaluating whether to upgrade, the real question isn't whether 4.7 is better than 4.6 — the benchmarks answer that. The real question is whether your prompt library is a legacy liability or a current asset. If it was written to work around limitations that no longer exist, upgrading the model without rewriting the prompts will feel exactly like what the complaints describe: slower, less helpful, more literal.
Do the audit. Strip the scaffolding. Opt in to adaptive thinking. Give the model the outcome, not the procedure.
You didn't buy a slower model. You bought one that's smart enough to make the old tricks obsolete.