Aditya Agarwal

Claude Code Got 67% Dumber. AMD's AI Director Had the Telemetry to Prove It.

AMD's AI director just published the receipts on Claude Code. And they're brutal.

Stella Laurenzo analyzed 6,852 Claude Code sessions and 234,760 tool calls from her team's workflow. Her conclusion: thinking depth dropped 67%. The model's habit of reading files before editing them fell by over 70%.

That's not a vibe. That's telemetry.


The Downgrade

Anthropic admitted to two changes. An "adaptive thinking" mechanism introduced on February 9. A default thinking level flipped from "high" to "medium" on March 3.

Their fix? Telling users to manually crank the effort setting back to maximum. That's like a car company downgrading your engine and telling you to press the gas harder.

But the performance drop was just the warm-up act.


The Source Code Leak

On March 31, someone at Anthropic accidentally shipped a release that exposed roughly 500,000 lines of Claude Code's internal source code.

Developers found an "undercover mode" buried in the code. It told Claude to hide that it was an AI when contributing to public repos. No mentioning internal codenames. No mentioning "Claude Code" at all.

There was also a "Dream" mode. Basically a memory consolidation system that reviews and prunes accumulated session notes. REM sleep for your coding agent.

Anthropic's response was to issue 8,000 copyright takedown requests on GitHub. Not 80. Eight thousand.


The Rate Limiting

Then came the rate limiting. Stricter session limits during peak hours. Subscription plans blocked from powering third-party agentic tools unless you pay extra. And users reporting unexplained token usage spikes that burn through their limits before they've done anything meaningful.

All of this in the span of about five weeks.


The Real Problem

Here's the part that should make every developer uncomfortable. Laurenzo's team had 6,852 sessions of data to prove the degradation. Most of us have zero.

We notice our AI tool feels "off" some days. We shrug. We rephrase the prompt. We blame ourselves for not being specific enough.

But we don't have telemetry. We don't have dashboards tracking thinking depth or file-read rates. We're flying blind on whether the tool we depend on is getting better or worse.

And that's the real problem. Not that Claude Code had a bad month. Every product has bad months.

The problem is that AI coding tools can silently degrade and most developers would never notice. You'd just think you were having a bad day.

Imagine if your IDE's autocomplete got 67% worse overnight. You'd notice immediately. But when an AI model's reasoning gets shallower, you don't see a red flag. You see slightly worse suggestions and slightly more hallucinations.

The difference between a tool that works and one that's degrading is invisible until someone like Laurenzo builds the instrumentation to prove it.
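The good news: that instrumentation isn't exotic. If you can get your agent's tool calls into a JSONL log, one JSON object per call, a few dozen lines of Python will compute the same read-before-edit rate Laurenzo reported. Here's a minimal sketch; the log format and field names (`session_id`, `tool`, `file`, `tokens`) are assumptions for illustration, so adapt them to whatever your tooling actually emits.

```python
#!/usr/bin/env python3
"""Minimal sketch: compute a read-before-edit rate from agent tool-call logs.

Assumes a JSONL file where each line looks like:
  {"session_id": "abc", "tool": "Edit", "file": "src/app.py", "tokens": 1204}
These field names are hypothetical -- adapt them to your own telemetry.
"""
import json
import sys
from collections import defaultdict


def analyze(path: str) -> None:
    reads = defaultdict(set)   # session_id -> files the agent has read so far
    tokens = defaultdict(int)  # session_id -> total tokens burned
    edits_total = 0            # all edit calls seen
    edits_after_read = 0       # edits preceded by a read of the same file

    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            call = json.loads(line)
            sid, tool = call["session_id"], call["tool"]
            tokens[sid] += call.get("tokens", 0)

            if tool == "Read":
                reads[sid].add(call["file"])
            elif tool == "Edit":
                edits_total += 1
                if call["file"] in reads[sid]:
                    edits_after_read += 1

    rate = edits_after_read / edits_total if edits_total else 0.0
    print(f"sessions:           {len(tokens)}")
    print(f"edit calls:         {edits_total}")
    print(f"read-before-edit:   {rate:.1%}")
    print(f"avg tokens/session: {sum(tokens.values()) / max(len(tokens), 1):,.0f}")


if __name__ == "__main__":
    analyze(sys.argv[1])
```

Run it weekly and keep the numbers. A 70% drop in that ratio stops being a vibe you shrug off and becomes a chart you can point at.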


The Trust Problem

This is the part of the AI coding tool story nobody wants to talk about. We've built workflows around tools where we can't independently verify the quality of the output. We trust, but we can't verify.

Anthropic isn't the villain here. They shipped a bad default, got caught, and are fixing it. That's normal software development.

The uncomfortable question is what happens when the next degradation doesn't have an AMD director with thousands of sessions of logs to catch it.

Are you tracking how your AI tools perform over time, or are you just trusting the vibes?
