If you upgraded to Claude Opus 4.6 or Sonnet 4.6 and started getting worse results (over-engineered code, unnecessary tool calls, walls of text where a sentence would do), your model isn't broken. Your prompts are too loud.
Claude 4.6 is significantly more responsive to system prompts than previous models. Instructions that needed aggressive emphasis in the 3.5 era now overtrigger. The model follows them too literally, using tools when unnecessary, adding abstractions nobody asked for, and generating verbose responses to simple questions.
This is the most common migration mistake I'm seeing right now. Here's how to fix it.
The problem: prompts designed for a model that didn't listen
Claude 3.5 was good, but it sometimes needed prodding. Teams developed prompting habits to compensate:
- ALL-CAPS emphasis: "CRITICAL:", "MUST:", "ALWAYS:", "NEVER:"
- Aggressive repetition: stating the same constraint three different ways
- Threat-level framing: "This is EXTREMELY IMPORTANT"
- Absolute rules: "ALWAYS use the search tool for EVERY question"
These patterns worked because 3.5 occasionally needed the extra push. They were a rational response to a model that sometimes ignored nuanced instructions.
Claude 4.6 doesn't have that problem. It reads your system prompt carefully and follows it. When it sees "ALWAYS use the search tool for EVERY question," it uses the search tool for every question -- including ones where it already knows the answer, wasting time and tokens.
Dial-back prompting: the fix
Anthropic's prompting best practices documentation recommends this directly: "Where you might have said 'CRITICAL: You MUST use this tool when...', you can use more normal prompting like 'Use this tool when...'." I've been calling it dial-back prompting -- reducing the intensity of your instructions to match a model that actually follows them.
Before and after
Search tool usage:
BEFORE (Claude 3.5 era):
"CRITICAL: You MUST use the search tool for EVERY question. ALWAYS
verify your answers. NEVER respond without checking at least 2 sources.
This is EXTREMELY IMPORTANT."
AFTER (Claude 4.6 era):
"Use the search tool when you're not confident in your answer or when
the question involves recent events. For well-established facts, you
can respond directly. Aim for efficiency -- verify when it adds value,
not on every response."
Error handling in code:
BEFORE:
"You MUST ALWAYS include error handling in every code example. NEVER
skip validation. CRITICAL: Every function needs try/catch blocks."
AFTER:
"The user is a junior developer who will copy-paste your code directly
into production. Include error handling so they don't ship fragile code
that crashes on edge cases. Include comments explaining what each error
handler catches."
Response formatting:
BEFORE:
"ALWAYS use markdown. ALWAYS include headers. NEVER give a response
without proper formatting. This is NON-NEGOTIABLE."
AFTER:
"Format your response for readability. Use headers and lists for
complex topics. For simple answers, a direct response without
formatting is fine."
Same intent in every case. Different volume. Better results.
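If you have a lot of prompts to migrate, you can script a rough first pass. The sketch below is my own heuristic, not an Anthropic tool: a replacement table that lowers the volume of common 3.5-era emphasis markers. The output still needs human review, since mechanical substitution can't judge tone.

```python
import re

# Heuristic substitutions for 3.5-era emphasis. This table is an assumption
# for illustration, not an official migration mapping -- review every result.
EMPHASIS = {
    r"\bCRITICAL:\s*": "",
    r"\bEXTREMELY IMPORTANT\b": "important",
    r"\bYou MUST\b": "You should",
    r"\bMUST\b": "should",
    r"\bALWAYS\b": "generally",
    r"\bNEVER\b": "don't",
    r"\bEVERY\b": "every",
}

def dial_back(prompt: str) -> str:
    """Soften ALL-CAPS directives; a first pass, not a finished prompt."""
    for pattern, repl in EMPHASIS.items():
        prompt = re.sub(pattern, repl, prompt)
    # Collapse doubled spaces left behind by deletions.
    return re.sub(r" {2,}", " ", prompt).strip()

print(dial_back("CRITICAL: You MUST use the search tool for EVERY question."))
# -> You should use the search tool for every question.
```

Treat this as a linter that drafts suggestions, not an automated fix: "don't respond without checking" reads fine, but some substitutions will need rewording by hand.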
The migration checklist
Run through your existing system prompts and apply these changes:
1. Remove ALL-CAPS emphasis
Replace every instance of CRITICAL, MUST, ALWAYS, NEVER, EXTREMELY IMPORTANT with normal-case language. Claude 4.6 doesn't need to be yelled at.
2. Replace absolute rules with contextual guidance
"ALWAYS do X" becomes "Do X when [specific conditions]. Skip it when [other conditions]." Give the model judgment instead of mandates.
3. Add explicit permission to NOT do things
This is the counterintuitive one. Claude 4.6 follows instructions so closely that it needs explicit permission to be efficient:
- "You don't need to use tools for every question."
- "Keep solutions minimal. Don't add abstractions or flexibility that wasn't requested."
- "Match your response length to the complexity of the question."
4. Test for over-engineering
After migrating your prompts, test with simple requests. Ask it to write a function that adds two numbers. If you get a class with dependency injection, your prompts still need dialing back.
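You can make that smoke test repeatable. The sketch below assumes you've captured the model's reply as a string (for example, from the Anthropic SDK); the pattern list is my own heuristic for what "over-engineered" looks like on a trivial request, not a standard metric.

```python
import re

# Constructs that are red flags when the request was "add two numbers".
# This list is an illustrative assumption -- tune it for your own codebase.
OVERENGINEERING_SIGNS = [
    r"\bclass\s+\w+",               # a class for a one-liner
    r"\babstractmethod\b|\bABC\b",  # abstract base classes
    r"\b__init__\b",                # constructor machinery
    r"\bFactory\b|\bStrategy\b",    # design patterns nobody asked for
]

def looks_overengineered(code: str) -> list[str]:
    """Return the patterns that matched; an empty list means the output is lean."""
    return [p for p in OVERENGINEERING_SIGNS if re.search(p, code)]

lean = "def add(a, b):\n    return a + b\n"
bloated = "class AdderFactory:\n    def __init__(self):\n        ...\n"
print(looks_overengineered(lean))  # -> []
print(looks_overengineered(bloated))
```

Run it over a handful of simple requests after each prompt change; if the match list grows, your prompts still need dialing back.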
If you want a systematic approach to auditing your existing prompts, I built a migration template for exactly this: the Claude 3.x to 4.x Migration Prompt walks through each prompt in your codebase and flags the patterns that need updating.
Motivated instructions: explain the WHY
There's a companion technique that works hand-in-hand with dial-back prompting: motivated instructions. Instead of stating bare rules, explain why the rule exists.
Anthropic's documentation highlights this explicitly: Claude 4.x models generalize better from motivated instructions than from bare rules. Their example -- a TTS engine that can't handle ellipses -- demonstrates the pattern well.
Bare rule (unmotivated):
"Never use ellipses in your response."
Motivated instruction:
"Your response will be read aloud by a text-to-speech engine, so never
use ellipses since the TTS engine won't know how to pronounce them.
Also avoid other punctuation that could confuse audio rendering, like
nested parentheses or excessive dashes."
The motivated version does two things the bare rule can't:
- It generalizes. The model understands the underlying principle (TTS compatibility) and avoids other TTS-unfriendly formatting you didn't think to mention.
- It handles edge cases. When the model encounters a situation you didn't explicitly cover, it can reason from the principle instead of defaulting to literal compliance.
Bare rules get followed literally. Motivated rules get followed intelligently.
More motivated instruction examples
For a customer support bot:
Unmotivated: "Keep responses under 100 words."
Motivated: "Customers reaching this bot are frustrated and looking for
quick resolution. Keep responses concise -- under 100 words when possible
-- because long responses increase abandonment. Front-load the answer
before any explanation."
For a code review agent:
Unmotivated: "Flag all functions longer than 50 lines."
Motivated: "This codebase has a history of maintenance problems caused
by oversized functions. Flag functions longer than 50 lines because
they tend to accumulate hidden side effects. Suggest logical split
points rather than just flagging the length."
Three more migration traps
Trap 1: Over-delegation (Opus 4.6 specific)
Opus 4.6 has a strong tendency to spawn subagents -- delegating subtasks to parallel instances. Anthropic's docs confirm this: "Claude Opus 4.6 has a strong predilection for subagents and may spawn them in situations where a simpler, direct approach would suffice."
Add to your system prompt:
"Work directly (no subagents) when:
- The task is a single-file edit or simple lookup
- Steps must be sequential and share context
- The overhead of delegation exceeds the time saved
Delegate to subagents when:
- Tasks are truly independent and can run in parallel
- Each task requires different specialized context"
Trap 2: Over-tooling
Claude 4.6 will use every tool you give it access to if your prompt doesn't constrain usage. If your system prompt says "use tools to verify everything," it will make API calls to verify what 2 + 2 equals.
Add:
"Use tools when they add value, not reflexively. For questions you can
answer confidently from training data, respond directly."
Trap 3: Verbose responses
Without explicit guidance, Claude 4.6 tends toward thorough (read: long) responses. A question that warrants a one-line answer gets five paragraphs.
Add:
"Match your response length to the complexity of the question. Simple
questions get simple answers. Don't pad responses with caveats,
disclaimers, or context the user didn't ask for."
The bigger picture: from prompt engineering to context engineering
The dial-back prompting shift reflects something larger. Anthropic's context engineering blog post frames the evolution: the job is no longer about finding the right words in a single prompt. It's about managing the entire information environment around inference.
In the Claude 3.5 era, prompt engineering was about finding the magic words. In the Claude 4.6 era, it's about designing the right system:
- System prompt structure: XML-tagged sections for role, tools, context, rules
- Tool definitions: What tools are available and when to use them
- Context loading: What information the model has access to and when
- Guardrails: What the model should and shouldn't do -- expressed as principles, not commands
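The structure above can be assembled programmatically. This is a minimal sketch of the idea, assuming the section names from the list (role, tools, context, rules); the tag scheme is a convention from this article, not an official Anthropic schema, and the Acme example content is invented.

```python
def build_system_prompt(sections: dict[str, str]) -> str:
    """Wrap each section in a matching XML tag so the model can tell them apart."""
    return "\n\n".join(
        f"<{name}>\n{body.strip()}\n</{name}>" for name, body in sections.items()
    )

# Hypothetical example content, following the dialed-back style above.
prompt = build_system_prompt({
    "role": "You are a support assistant for Acme's billing product.",
    "tools": "Use the search tool when you're not confident in your answer.",
    "rules": "Match your response length to the complexity of the question.",
})
print(prompt)
```

Keeping sections as data rather than one long string also makes the audit and dial-back passes above easier: you can lint each section independently.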
The models got better at listening. Now we need to get better at talking to them.
Stop yelling. Start designing.
I wrote a full guide covering Claude prompting fundamentals -- XML architecture, adaptive thinking modes, model selection with current pricing, and all 18 templates referenced above: nerdychefs.ai/learn/claude-prompting
Free prompts and AI workflows at NerdyChefs.ai and GitHub.