You've got an agent in production. It works. Then it starts responding differently — not badly, just differently. More cautious in some cases, more direct in others. First thing you check is your code. Then your tools. Then you start wondering if you're losing your mind.
A system prompt is basically a country's constitution. You don't see it day to day, but every decision the state makes has that constitution running in the background. Change three articles and suddenly the judge rules differently — not because there's a new judge, but because the legal framework changed. The judge doesn't know the constitution changed. Neither do you.
That's exactly what happened between Opus 4.6 and 4.7.
Claude system prompt diff: what actually changed between versions
Anthropic publishes model documentation, but they don't ship a behavior changelog the way you'd do with semver in code. There's no CHANGELOG.md. There's no public git diff between system prompts. You have to build it yourself.
I grabbed the publicly documented system prompts — the soul specification of Claude, what they call "Claude's character" — and compared what was available before the 4.7 launch against what got updated after. The diff is real. Let me show you.
The areas that changed
1. Handling epistemic uncertainty
In 4.6, the framing was roughly:
# Behavior before (4.6 — approximate reconstruction)
"When you're uncertain, indicate it clearly but
proceed with the best available estimate."
In 4.7, the emphasis shifted:
# Behavior after (4.7 — approximate reconstruction)
"When you're uncertain, actively explore the
uncertainty before giving an answer. Prefer
clarifying questions over unsupported estimates."
Practical difference: my code analysis agents started asking for more context before responding. I thought there was a bug in my tool call context handling. It was the model being more epistemically honest.
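If you want to quantify that shift in your own logs rather than eyeball it, a crude heuristic works surprisingly well: count how often the assistant's turn is a request for context instead of an answer. This is a rough sketch — the length cutoff is an arbitrary assumption, not a tuned value:

```typescript
// Heuristic: flag assistant turns that are clarifying questions rather than answers.
// Rough log-analysis sketch, not a robust classifier — the 300-char cutoff is arbitrary.
function isClarifyingQuestion(response: string): boolean {
  const trimmed = response.trim();
  // Short responses ending in a question mark are usually requests for context
  return trimmed.endsWith("?") && trimmed.length < 300;
}

// Fraction of logged responses that were clarifying questions
function clarifyingRate(responses: string[]): number {
  if (responses.length === 0) return 0;
  return responses.filter(isClarifyingQuestion).length / responses.length;
}
```

Run it over a window of logs before and after a suspected model change; a jump in the rate is a cheap signal that the uncertainty handling shifted under you.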
2. Agentic behavior and autonomy
This is the biggest change I found, and it makes complete sense given what I've already written about false positives in agents.

4.6 had a bias toward completing tasks. 4.7 adds intentional friction:
# New logic in agentic behavior (4.7)
# The model now prefers:
# 1. Do less if there's ambiguity about scope
# 2. Confirm before irreversible actions
# 3. Report doubts MID-TASK, not just at the end
# Before:
complete_task() -> report_result()
# Now:
verify_scope() -> confirm_if_ambiguous() -> complete_task() -> verify_result()
That explains why my automation workflows suddenly had more interruptions. It wasn't a bug. It was a safety feature I didn't ask for but Anthropic decided everyone needed.
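If you'd rather own that friction yourself than inherit whatever the model defaults to, you can enforce it at the tool layer. A minimal sketch — the tool names and the `confirm` callback are hypothetical, not part of any real agent framework:

```typescript
// Sketch: gate irreversible tool calls behind an explicit confirmation step,
// so the "confirm before irreversible actions" policy lives in YOUR code,
// not in the model's defaults. Tool names here are illustrative.
const irreversibleTools = new Set(["delete_file", "deploy", "send_email"]);

async function guardedToolCall(
  name: string,
  run: () => Promise<unknown>,
  confirm: (toolName: string) => Promise<boolean>,
): Promise<unknown> {
  if (irreversibleTools.has(name)) {
    const approved = await confirm(name);
    if (!approved) {
      // Skip instead of executing; the agent loop can report this mid-task
      return { skipped: true, reason: `declined irreversible tool: ${name}` };
    }
  }
  return run();
}
```

The upside of doing it in your own layer: when the model's built-in caution changes again, your safety guarantees don't move with it.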
3. Tone in technical contexts
This one is subtle, but I measured it. In 4.6, when you gave it a technical system prompt, Claude assumed an expert audience and compressed explanations. In 4.7 there's a recalibration: even with technical system prompts, the model shows a stronger tendency to explain its intermediate reasoning.
What used to be:
Response: "Use composite index on (user_id, created_at)
descending. The query planner will pick index-only scan."
Now tends toward:
Response: "For this query pattern, a composite index on
(user_id, created_at) descending should help because
[two sentences of reasoning]. The query planner should
pick index-only scan in most cases."
More verbose. More readable for someone new to the topic. Less useful when you're the one who wants information density.
The changes I observed in production before knowing about the diff
Here's the part that's been sitting with me. I have logs from my agents. I review them regularly — I've been doing it since I wrote about the semantic cost of compressing prompts. Not out of paranoia, but because numbers lie in interesting ways.
What I observed before knowing the system prompt had changed:
Signal 1: More "verification" tool calls
One of my agents has access to filesystem read tools. Before 4.7, when you said "analyze this directory," it went straight in. After, it started doing an ls first even when you'd already given it the path. Like it was checking the terrain was what it expected.
I noted it as: "weird behavior, seems like it doesn't trust the provided context." It was the model being more careful with filesystem actions — exactly the kind of conservatism I was talking about with trust and environment configuration.
Signal 2: Shorter responses in some contexts, longer in others
My output token metrics were showing a bimodal distribution I hadn't seen before. Simple questions: shorter outputs. Questions with implicit ambiguity: longer outputs, with more embedded questions.
The cost delta was small but the distribution had changed. I noted: "check if there's a problem with the temperature setting." It wasn't the temperature.
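A bimodal shift like that is easy to miss if you only track the average. A cheap way to catch it is to log percentile summaries per day and watch the spread. This is a sketch — the percentile method is the naive nearest-rank kind, and the spread ratio is an illustrative signal, not a statistical test:

```typescript
// Sketch: summarize output-token counts to spot distribution shifts.
// Naive nearest-rank percentile; good enough for drift detection, not for stats.
function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

function distributionSummary(tokenCounts: number[]) {
  const sorted = [...tokenCounts].sort((a, b) => a - b);
  const p10 = percentile(sorted, 10);
  const p90 = percentile(sorted, 90);
  return {
    p10,
    p50: percentile(sorted, 50),
    p90,
    // A sudden jump in p90/p10 is a cheap bimodality signal
    spread: p90 / Math.max(1, p10),
  };
}
```

If the median barely moves but the spread doubles, you're looking at the "short answers got shorter, ambiguous answers got longer" pattern, not a uniform verbosity change.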
Signal 3: The unsolicited addition
One of my agents does legacy code analysis — ugly stuff, real technical debt. In some cases I started getting responses that completed the task but added an unsolicited paragraph about "long-term maintainability considerations."
Not useful in that context. Didn't ask for it. The model did it anyway.
Real frustration: I had to update system prompts explicitly to suppress that behavior. Time spent: two hours debugging something that wasn't a bug in my code but a personality change in the model.
// Before — worked fine without this
const systemPrompt = `Analyze the code and respond directly.`;
// After 4.7 — needed to be more explicit
const systemPrompt = `
Analyze the code and respond directly.
Do not add unsolicited recommendations.
Do not include warnings about technical debt unless
it's the explicit focus of the question.
Format: only what's asked.
`;
Three extra instructions in the system prompt to recover the behavior I had before. Multiplied by N agents. That's the hidden cost of model changes without a changelog.
The gotchas nobody tells you
The canary problem
In distributed systems, a canary deployment warns you when something breaks before it reaches all your users. You can compare metrics between the old version and the new one.
With LLMs you don't have that. There's no "old version" available for parallel comparison once Anthropic changes the endpoint. By the time you notice the change, you're already running 100% on the new version. The canary arrived dead.
What I mentioned about reliable systems design applies here: reliability isn't just uptime. It's predictable behavior. A model that changes its system prompt without a changelog breaks the second half of that equation.
The diff you can't do from reading alone
I can show the textual changes in the documentation. But Claude's system prompt isn't just text — it's text plus the training process. What's written in the soul spec is the intention. What they trained is the implementation. And those two things don't always match perfectly.
Put another way: the text diff tells you what they tried to change. Your production logs tell you what actually changed. You need both.
Agents with tools vs. pure completions
The behavior change is far more noticeable if you use tool calling. A pure completion with a verbosity change is annoying but manageable. An agent that now does extra verification before executing tools can break entire workflows with tight latency budgets.
I had a pipeline that processed files in ~8 seconds per file. After the change: ~14 seconds. Not because of the model itself, but because of the additional verification tool calls the model now makes by default.
// Measure latency per tool call — useful for detecting behavior changes
const toolCallMetrics: Record<string, number[]> = {};
// Wrapper to instrument tool calls
async function instrumentedTool(name: string, fn: () => Promise<unknown>) {
const start = Date.now();
const result = await fn();
const duration = Date.now() - start;
// Log to detect new patterns
if (!toolCallMetrics[name]) toolCallMetrics[name] = [];
toolCallMetrics[name].push(duration);
console.log(`[tool:${name}] ${duration}ms (avg: ${
toolCallMetrics[name].reduce((a, b) => a + b, 0) / toolCallMetrics[name].length
}ms)`);
return result;
}
I have this running across all my agents now. If the average for a tool spikes suddenly without me changing anything, I know the model is calling tools differently.
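To turn "the average spikes" into something a machine can flag, you can compare a recent window against a baseline window per tool. A sketch on top of the metrics above — the window sizes and the 1.5x threshold are arbitrary starting points, not tuned values:

```typescript
// Sketch: flag a tool whose recent average latency diverges from its baseline.
// Window sizes and the 1.5x threshold are arbitrary; tune against your own data.
function latencyDrift(
  samples: number[],
  baselineWindow = 50,
  recentWindow = 10,
): boolean {
  if (samples.length < baselineWindow + recentWindow) return false;
  const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const baseline = avg(samples.slice(0, baselineWindow));
  const recent = avg(samples.slice(-recentWindow));
  return recent > baseline * 1.5;
}
```

Feed it the `toolCallMetrics[name]` arrays periodically; a drift alert with no deploy on your side is a strong hint the model changed under you.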
FAQ: Claude system prompt diff between versions
Where can I see Claude's official system prompt?
Anthropic publishes the soul specification ("soul document" or "model spec") on their official site. It's not the complete production system prompt, but it's the conceptual framework that guides training. Updates to that document are the closest thing to a behavior changelog they have.
Is there a way to automatically diff between model versions?
Not officially. The closest thing is building a test suite of prompts with expected outputs and running it against each version. If outputs diverge beyond a threshold, you've got an indicator of behavior change. It's work, but it's what we've got.
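A minimal version of that suite can be sketched without any external tooling. The similarity measure here is naive word-overlap (Jaccard) — in practice you'd likely want embeddings or an LLM judge, and the 0.6 threshold is an assumption:

```typescript
// Sketch of a golden-output regression check. Naive Jaccard word overlap;
// swap in embeddings or an LLM judge for real use. Threshold is illustrative.
function jaccard(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/));
  const wordsB = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...wordsA].filter((w) => wordsB.has(w)).length;
  const union = new Set([...wordsA, ...wordsB]).size;
  return union === 0 ? 1 : inter / union;
}

interface GoldenCase {
  prompt: string;
  golden: string;
}

// Returns indices of cases whose fresh output drifted below the threshold
function flagDivergent(
  cases: GoldenCase[],
  outputs: string[],
  threshold = 0.6,
): number[] {
  return cases
    .map((c, i) => (jaccard(c.golden, outputs[i]) < threshold ? i : -1))
    .filter((i) => i >= 0);
}
```

Run it on a schedule, store the flagged indices with a date, and you've got the closest thing to a behavior changelog you can build from the outside.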
Why doesn't Anthropic publish a behavior changelog?
Probably because it's genuinely hard to describe precisely. LLM behavior isn't deterministic and training changes have diffuse effects. That said, the production impact is real and the lack of communication is a decision that has concrete costs for developers.
How do I protect my agents from unexpected behavior changes?
Three things: (1) More explicit system prompts that depend less on the model's default behavior. (2) Automated evaluations with golden outputs that you run regularly. (3) Behavior metric instrumentation — output tokens, tool call count, latency — to detect drift without having to manually review logs.
Is the change in uncertainty handling an improvement or a problem?
Depends on the use case. For applications where precision is critical and the user can tolerate more questions, it's an improvement. For automation pipelines where you want direct answers with minimal friction, it's a problem you need to solve in the system prompt. There's no universal answer.
Do these changes apply equally to Haiku and Sonnet?
The model spec is shared across the entire Claude family, but the intensity of behavior changes varies by model. Larger models tend to follow the spec more faithfully. Haiku historically has more variance. My observations are mainly on Opus and Sonnet — I don't have enough data on Haiku to generalize.
What I'm left with after the diff
There's something philosophically uncomfortable about all of this. You build on a foundation that changes without telling you. It's not different from what I described about Brunost and who decides what's readable — someone is making decisions about the framework and you work inside that framework without having a voice in those decisions.
The difference is that with a programming language, the changelog exists. With models, you have to build it yourself.
The diff I did isn't complete. It's what I could reconstruct by combining public documentation with production observations. But it's better than nothing, and it's better than continuing to think the problem is in your code when the problem is in the judge's constitution.
What I changed after this exercise: I have a living document where I record observed behavior changes with a date, the affected agent, and the hypothesis about the cause. When Anthropic updates the model spec again — and they will — I'll have a baseline to compare against.
It's not a canary. But it's as close to one as we can get right now.
Have you observed behavior changes in your agents that didn't match changes in your code? I'd like to know what logs you looked at to catch it.