Phil Rentier Digital

Posted on • Originally published at rentierdigital.xyz

Opus 4.7 Feels Like It Broke Your Prompts. It Didn't. You Just Missed the 2 Rewrites That Matter.

Every time a new LLM drops, I read the changelog. (I'm the kind of guy who reads the manual for a new weedwhacker. 🤓) That's how I caught something off with the Opus 4.7 panic.

TL;DR: if you just upgraded to 4.7, two patterns in your current prompts are actually going to break. I'll show you what I fixed in my prod and how to adapt it to your setup. Everything else you're being told about "auditing all your prompts" is noise pulled from the same four bullet points.

The Anthropic doc says one line nobody is relaying: prompts written for 4.6 run very well on 4.7 out of the box. The same doc also says, explicitly, where they don't. At two specific places. Not sixteen. Not eight. I'm going to name them, explain why 4.7 triggers them, and show you how to patch your CLAUDE.md and system prompts in two lines.

I Read Manuals. That's Why I Knew the Tweet Was Wrong.

April 16th. Opus 4.7 drops. Within two hours, my feed is full of the usual: "Everything changed", "You need to audit all your prompts", "The new model is a different beast". By hour three, the migration guides start landing. A flood of them that week. All copying the same four bullet points from the Anthropic doc, rephrased for the house voice.

While this was happening, I was on the doc site. Not on Twitter. (Admit it, you checked Twitter first too.)

Three pages matter: the migration guide, the "what's new" page, and the prompting best practices page. Read them in that order. The first tells you what changed in behavior. The second gives you the real changelog, tokenizer included. The third hides the line that undoes 90% of the panic. One sentence, buried in prompting best practices, that basically says 4.7 performs well out of the box on existing 4.6 prompts.

This is not "some prompts are fine". This is not "you might not need to rewrite if you're lucky". This is the manufacturer of the model telling you the default assumption is: your old prompts work. The migration guides you're reading are written from the opposite assumption. And the gap between those two positions is where all the bad advice lives.

So ask the right question. Not "what do I rewrite?" but: where exactly does 4.7 stop behaving like 4.6, and why? The doc answers this. In two places.

The Mechanism: 4.6 Was Covering for You

4.6 was not a dumb text generator. It was a cooperative one. When you wrote a prompt that was technically incomplete, 4.6 would silently fill the gaps for you. Two kinds of gaps, specifically.

First kind: implicit tool calls. You wrote "check the project structure before suggesting changes" in your CLAUDE.md. 4.6 would slip a Glob or Read call in before answering, even though your instruction didn't technically demand it. The word "check" could mean "actually invoke a tool" or "consider mentally". 4.6 picked the first reading. Most of the time.

Second kind: implicit scope and exceptions. You wrote "warn the customer about restocking delays once per conversation". 4.6 understood "once, but reopen if a genuinely new order appears in the session". You wrote "never use exclamation marks". 4.6 applied it to generated prose, but not to code blocks or quoted review content, because stripping punctuation from verbatim reviews would be absurd. Who edits a five-star customer review? 4.6 inferred the boundary on its own.

4.7 does neither. It reads your instruction literally. If you didn't say "you MUST call the tool", it reads "check" as "consider". If you wrote "once per conversation" with no exception list, it applies "once" and moves on, new order or not.

Anthropic documents this in the migration guide without sugar-coating: the model won't quietly extend a rule from one case to a similar one, and it won't guess at requests you didn't actually make. That is the change. That is the entire change. The rest (faster reasoning, fewer default tool calls, more direct tone) all flows from this one posture shift.

Which means your prompts didn't break. They were always incomplete. 4.6 was patching the holes on the fly, and 4.7 stops patching.

Other write-ups corroborate the mechanism from different angles. One deep-dive notes that weak prompts papered over by 4.6 now surface as visible gaps. Same conclusion, different framing. The useful part is that there are exactly two places where the gaps matter enough to rewrite.

Rewrite #1: Tool Calls That Were Suggestions

Go open your CLAUDE.md right now. I'll wait.

Scan it for verbs that sound actionable but aren't binding. "Check". "Consider". "Look at". "Review". Every one of those is a suggestion from 4.7's point of view. If the action is load-bearing, meaning the next step actually depends on the tool output, you just introduced a silent failure mode.

Concrete example from my own CLAUDE.md, cleaned up so it reads generically. I had a rule that said "before suggesting code changes, check the current project structure". On 4.6, this produced a Glob call every time. Claude would list files, see what was there, then propose changes grounded in the real state. On 4.7, same rule, same prompt, no Glob. Claude would propose changes from memory. Sometimes right. Sometimes completely out of sync with the repo, because the session context was stale and the verb "check" didn't force a tool call.

Not a bug. Literalism. "Check" can mean "invoke a tool" or "think about it". 4.7 defaults to "think about it". The fix takes two lines.

Replace the soft instruction with a hard one. Instead of:

Before suggesting changes, check the current project structure.

Write:

Before any code suggestion, you MUST call Glob to list the current project structure. 
Do not rely on prior session context for file structure.

Three differences. "MUST" instead of soft verb. Named tool instead of generic action. Explicit negation of the fallback behavior, which closes the loophole.

This applies everywhere a tool call is load-bearing. Your agent hits a partner API before deciding? Name the API call. Your support bot reads the daily distributor CSV feed? Name the file read. Your WooCommerce assistant queries user records before answering? Name the database tool. Every time the data has to come from a live call, the call must be non-optional in the text of the prompt, not in the head of the person who wrote it.

Caveat worth distributing here, not at the end: this fix only matters when the tool call is load-bearing. If you don't actually care whether Claude invokes the tool or guesses from context, leave it alone. Some checks are genuinely optional. Make that a conscious choice, not a blind one.

Broader principle: an explicit tool call beats inferred reasoning. If you want to go deeper on this, I wrote why CLIs beat MCP for load-bearing tool calls a few months back. The argument holds even harder under 4.7.

Two lines. First fix done.

Rewrite #2: The One Everyone Gets Exactly Backwards

Here's a concrete break from my prod, last week.

I run a product description generator. Old rule, shipping clean for months: "never use exclamation marks". On 4.6, this applied to the generated prose, and obviously not to the quoted customer reviews embedded below each product. Because who edits a five-star review? The model knew. The boundary was implicit but real.

Upgrade to 4.7. Same rule. Next batch of product pages, my quoted review blocks start losing exclamation marks. Customer quotes that originally said "Best deal this summer!!!" now read "Best deal this summer". Flattened. Which is a small quality problem for product pages, and a real problem if you're running verbatim reviews for legal or trust reasons. The source no longer matches the quote.

Not the model being "worse". The model being exactly what I asked for. I wrote "never use exclamation marks". 4.7 read that as "never, anywhere, period". Fine in intent, broken in output.

Fix: write the scope.

Never use exclamation marks anywhere in the generated product prose. 
This rule does not apply inside quoted customer reviews pulled 
verbatim from external sources.

Three lines instead of one. Zero ambiguity. The NEVER got stronger, not weaker. What I did was stop asking the model to draw the boundary for me.
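If verbatim quotes matter for legal or trust reasons, a cheap output check catches this kind of regression before it ships. A minimal sketch, assuming quoted reviews land in markdown blockquotes; the function name is mine, not part of any real pipeline:

```python
def prose_has_bare_exclamations(page: str) -> bool:
    """True if any exclamation mark appears outside blockquoted review lines."""
    for line in page.splitlines():
        if line.lstrip().startswith(">"):  # quoted customer review: leave it alone
            continue
        if "!" in line:  # generated prose broke the "never" rule
            return True
    return False
```

Run it over each generated page in CI; a hit means the scope rule leaked, a quiet pass means your verbatim quotes kept their punctuation.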

Same pipeline. Different rule. This one was more expensive before I caught it.

"Confirm the shipping policy once per conversation." On 4.6, the model understood "once under normal flow, but reconfirm if a new order gets introduced, or if the customer shifts from domestic to international shipping mid-session". The implicit exception was always there. Never written. Always inferred.

On 4.7: once. Period. Fifteen messages later, customer brings up a completely new order to a different country. No reconfirmation of the policy. Because the instruction said "once" and the model did "once".

Fix: name the exceptions.

Confirm the shipping policy once per conversation under normal flow. 
Reconfirm whenever: (a) a new order is introduced, (b) the 
destination changes (domestic to international or reverse), 
(c) the customer explicitly asks about shipping terms again.

Now the scope is explicit and the exceptions are listed. No inference required.
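Once the exceptions are written out, there is nothing left to infer. The same rule, expressed as plain logic to make the point (a hypothetical sketch: `Turn` and `needs_reconfirmation` are illustration names, not part of any real system):

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One customer message, reduced to the signals the rule cares about."""
    new_order: bool = False            # (a) a new order is introduced
    destination_changed: bool = False  # (b) domestic <-> international switch
    asks_about_shipping: bool = False  # (c) explicit question about terms

def needs_reconfirmation(already_confirmed: bool, turn: Turn) -> bool:
    """Mirror of the prompt rule: confirm once, then only on listed exceptions."""
    if not already_confirmed:
        return True  # normal flow: the first confirmation
    return turn.new_order or turn.destination_changed or turn.asks_about_shipping
```

Notice there is no "the model will figure it out" branch. That's the shape a 4.7-safe rule has: one default, an enumerated exception list, nothing implicit.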

And this is where the advice you're being fed gets exactly backwards. Every migration guide in your feed is telling you to "soften your NEVER rules now that 4.7 follows instructions better". That is the reverse of what you want to do.

Your NEVER rules are fine. They just got sharper. What broke is the rules that depended on the model inferring the scope or the exceptions you didn't bother to write down. Your absolutes are doing their job. The problem is your "once", your "always", your "usually". Those are the rules that were leaning on 4.6's silent cooperation.

If you want the full framework for writing prompts that don't lean on model cooperation in the first place, I put all of it into the prompt contracts approach I built after six months of these exact disasters. 4.7 makes that framework more relevant, not less.

Your NEVER rules got sharper. Your "once" and "always" rules got stupid. Fix the latter, don't touch the former.

What Stays, and the Rule to Remember

Now the good news. Most of your prompt does not move.

The hard bans improve under 4.7. No em dashes, no tables, no banned openers, no empty formulas: all of it runs tighter, not looser, because the model stops treating them as style suggestions and starts treating them as rules. The direct tone you used to patch in with explicit instructions aligns with 4.7's default voice. Your long system prompts still hold across context. Your markdown renders exactly the same. Your tool definitions, your personas, your output format specs, your content rules: 90% of a well-written 4.6 prompt stays intact.

The rewrite surface is narrow. Two questions, in order.

First, where does my prompt depend on a tool call that I didn't explicitly demand? Find those. Make the call mandatory. Name the tool. Close the fallback.

Second, where does my prompt depend on the model guessing the scope or the exceptions? Find those. Write the scope. List the exceptions.

That's it. Not seventeen checkpoints. Two questions.

Karen from Accounting would love seventeen checkpoints, because she could make a spreadsheet. You don't need a spreadsheet. You need to grep your prompts for soft verbs and for rules with implicit boundaries. Ten minutes of grep. An hour of rewriting. Done.
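The grep pass can be sketched in a few lines. A minimal audit script, assuming your rules live in plain-text prompt files; the verb and quantifier lists are my starting set, not an official one:

```python
import re

# Verbs that 4.7 reads as optional, and quantifiers that lean on implicit scope.
SOFT_VERBS = r"\b(check|consider|look at|review)\b"
SOFT_SCOPE = r"\b(once|always|usually|never)\b"

def audit_prompt(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, line) for every rule worth a second look."""
    hits = []
    for i, line in enumerate(text.splitlines(), start=1):
        if re.search(SOFT_VERBS, line, re.IGNORECASE):
            hits.append((i, "soft verb", line.strip()))
        if re.search(SOFT_SCOPE, line, re.IGNORECASE):
            hits.append((i, "implicit scope", line.strip()))
    return hits
```

A flagged line isn't automatically broken; a NEVER with its scope written out passes the two questions just fine. The script only tells you where to ask them.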

Remember RTFM? 😬 (Relax, I'm here for you.)

