Jamie Cole

Posted on Mar 12

Claude 3.5 Sonnet Changed. My System Prompt Stopped Working. Here's What I Learned.

#ai #anthropic #claude #llm

I've been building with Claude APIs since early 2025. Last month, I noticed something strange: my carefully tuned system prompt stopped producing the outputs I expected. The format was slightly off. The tone shifted. My downstream parsing started failing intermittently.

Anthropic hadn't announced any changes to claude-3-5-sonnet-20241022. My code was identical. But the model was behaving differently.

This article is what I learned from that experience — and what I now do to catch this kind of drift automatically.

What happened

I was using Claude for a structured data extraction task. My system prompt instructed the model to:

Return JSON only, no preamble
Use specific field names (entity, confidence, source)
Never include explanation text

For months it worked perfectly. Then, over the course of about 3 days, outputs started including preamble text: "Here is the extracted data:" followed by the JSON. My json.loads() calls started throwing JSONDecodeError because of the leading text.

It wasn't every response. It was maybe 15–20% of calls. Enough to cause intermittent failures but not a hard crash. The worst kind of bug.

What I first tried (that didn't work)

Checking Anthropic's changelog: Anthropic updates model behavior without always publishing granular release notes. The model version identifier (claude-3-5-sonnet-20241022) hadn't changed, but behavior had.

Checking my own code: Nothing changed. I'd been running the exact same system prompt and API call structure for months.

Adding retry logic: This masked the symptom without fixing the cause. Worse — it made the failure invisible. Errors went down, but bad data was now flowing into my database.

Strengthening the system prompt: Adding more explicit instructions helped but didn't eliminate the drift. The model was behaving differently at the inference level and the system prompt couldn't fully compensate.

The core problem: you have no baseline

Here's the thing that makes this failure mode particularly dangerous: without a behavioral baseline, you don't know if your model has drifted.

You can check if your code is working (tests pass). You can check if the API is responding (health checks pass). But you can't check if the model is still producing the same kind of outputs it was a month ago — unless you have a baseline to compare against.

Most observability tools (LangSmith, Langfuse, Helicone) log what your model does. They don't tell you when the behavior has shifted from before. That requires:

A reference run at time T₀
A regular re-run of the same prompts at T₁, T₂, T₃...
A scoring mechanism that compares T₁ to T₀ semantically (not just string equality)
An alert when the score crosses a threshold

None of the existing observability tools do this automatically. It's a gap.

What I built to solve this

I built DriftWatch — a service that does exactly this.

You paste your critical prompts. It runs them once to create a behavioral baseline. Every hour, it re-runs them and computes a drift score across three dimensions:

Semantic similarity (is the meaning of the output still the same?)
Format compliance (is the structure — JSON, markdown, etc. — preserved?)
Instruction-following delta (did specific instructions like "no preamble" stop being honored?)

When drift exceeds a threshold (configurable, default 0.3), you get a Slack or email alert.

In our test suite, the JSON extraction prompt drifted to 0.316 — whitespace was removed from the output formatting and a trailing period was stripped from a field value. These changes are invisible to json.loads() but break exact-match comparisons and raw string processing.

What drift scores look like in practice

Here are some real drift scores from running DriftWatch on production-style Claude API prompts (same model, two consecutive runs to detect behavioral shifts):

Prompt	Drift Score	What changed
JSON extraction	0.316	Whitespace removed from JSON output; trailing period stripped from value
JSON array extraction	0.000	Stable — no drift
Instruction following	0.575	Trailing period dropped: `"Neutral."` → `"Neutral"` (exact-match parsers break)
Summarization	0.189	Slightly more verbose
Sentiment analysis	0.0	No drift detected

The instruction-following regression (0.575) is particularly dangerous. If you're using if response.strip().lower() == "neutral" this works — but if your parser was written against the baseline (which returned "Neutral." with a period), it breaks silently. No exception. Just wrong behavior.

How to set up drift monitoring for Claude

If you want to do this yourself:

Identify your 5–10 most critical prompts (the ones where output format or behavior matters)
Run each one 3 times and store the median output as your baseline
Schedule a cron job to re-run them daily
Compute cosine similarity between current and baseline outputs (use a sentence transformer for semantic comparison)
Flag anything over 0.3 for manual review
Alert on anything over 0.5 (likely a breaking regression)

Alternatively, DriftWatch does all of this for you. Free tier is 3 prompts, no credit card. Setup takes about 5 minutes.

The bigger pattern

This isn't just a Claude problem. OpenAI does the same thing with GPT-4o. Model providers update model behavior without updating version identifiers. It's a consequence of how LLMs are deployed — rolling updates, A/B experiments, safety fine-tunes that change behavior as a side effect.

The solution isn't to find a provider that doesn't do this (they all do). The solution is to monitor your model's behavior the same way you monitor your API's uptime: continuously, with alerts.

DriftWatch is a behavioral drift detection service for LLM applications. It monitors your prompts on a schedule and alerts you the moment output quality or format shifts — before your users notice.

Free tier available at genesisclawbot.github.io/llm-drift.

If you want to know the moment Claude changes again: how to get real-time alerts when Anthropic updates your model. Or try DriftWatch free — 3 prompts, no card.

Top comments (2)

Arsene Antonel • Mar 13

Good product, bad presentation !
Remove all emojies , icons from driftwatch website .
Right now looks like "college" made design

Raise the level to entreprize !

Some comments may only be visible to logged-in visitors. Sign in to view all comments.