Matt Macosko

Posted on Jun 13 • Originally published at nicedreamzwholesale.com

A Federal Order Switched Off Anthropic's Best AI Overnight — and Made the Case for Private AI

#ai #privateai #anthropic #claude

Two days before this became international news, a smaller version of it happened to me — and at the time I didn't realize I was looking at the whole story in miniature.

I was in an ordinary working session on Anthropic's brand-new Fable 5, which had just launched as the most capable AI model the company had ever released to the public. I pasted in a viral post from a well-known jailbreaker to ask the model what it made of it. The text was loaded with the kind of language safety systems are built to flag. Something tripped, and mid-conversation the system quietly moved me off Fable 5 and onto an older model, Opus 4.8, for the rest of the session. The only signal was a polite notice that some "measures" had flagged something.

I shrugged it off at the time. As it turns out, that silent downgrade — being moved to a weaker model without being told why — sits close to the center of the entire Fable 5 saga. Three days after launch, the model didn't just get downgraded for me. It went dark for everyone on Earth, by order of the U.S. government.

This story has more moving parts than the headlines let on, and it's evolving by the hour. Here's the full picture, as carefully as I can lay it out, separating what's documented from what people are assuming.

The timeline

June 9, 2026 — Anthropic launches Fable 5 (and its larger sibling, Mythos 5) as its most capable public models, scoring around 90% on hard benchmarks and getting wired into AWS, Snowflake, and GitHub Copilot almost immediately.
June 10 — Pliny the Liberator (@elder_plinius on X) posts a "FABLE-5: LIBERATED" thread claiming to have bypassed the model's safety classifiers, and publishes its system prompt to GitHub.
June 11–12 — A separate, quieter controversy catches fire: Fable 5 was silently routing flagged requests to a weaker model without telling users.
June 12, 5:21pm ET — Anthropic receives a U.S. government export-control directive ordering it to suspend access to Fable 5 and Mythos 5. That evening, both models go dark worldwide.

What the government actually ordered

The directive came through the Commerce Department's export-control authority, reportedly from Commerce Secretary Howard Lutnick to Anthropic CEO Dario Amodei, and it cited national-security concerns. Axios framed it as the Trump administration moving to block foreign access to America's most powerful AI.

The mechanism is what turned a narrow order into a total blackout. The directive bars access by any foreign national — inside or outside the United States, including Anthropic's own non-citizen employees. There is no clean way to guarantee that no foreign national ever touches a hosted model except to switch it off for everyone. So that's what Anthropic did. In its own words:

"The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We believe this is a misunderstanding and are working to restore access as soon as possible."

The jailbreak: a real event with an inflated headline

The trigger the government pointed to was a "jailbreak" — specifically, a technique that lets the model read through a codebase and identify software flaws quickly. The X timeline and the facts don't line up as neatly as the screenshots suggest, so it's worth slowing down here.

What's genuinely true: Pliny published Fable 5's system prompt to GitHub and bypassed its safety classifiers using known methods — Unicode character substitution and prompt fragmentation, where you break a request into pieces the classifier reads as harmless. Major outlets covered it. That part is real red-team work.
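To make the Unicode-substitution point concrete from the defensive side, here is a minimal sketch of why a literal keyword filter misses disguised text and how normalizing the input first helps. It is an illustration only, not Anthropic's classifier: the blocked term and the `naive_flag` / `normalized_flag` helpers are hypothetical. NFKC normalization folds compatibility forms such as fullwidth letters, but it does not catch every look-alike character (cross-script homoglyphs, for example).

```python
import unicodedata

BLOCKED_TERMS = {"secret"}  # hypothetical term a filter might watch for


def normalize(text: str) -> str:
    """Fold compatibility characters and drop invisible format characters."""
    folded = unicodedata.normalize("NFKC", text)
    # Category "Cf" covers format characters such as zero-width spaces.
    visible = "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")
    return visible.casefold()


def naive_flag(text: str) -> bool:
    """Flag text only if a blocked term appears literally."""
    return any(term in text.lower() for term in BLOCKED_TERMS)


def normalized_flag(text: str) -> bool:
    """Flag text after normalization, so simple disguises no longer hide a term."""
    return any(term in normalize(text) for term in BLOCKED_TERMS)


# "secret" written in fullwidth letters with a zero-width space in the middle.
disguised = "\uff53\uff45\uff43\u200b\uff52\uff45\uff54"

print(naive_flag(disguised))       # False: the literal filter sees nothing
print(normalized_flag(disguised))  # True: normalization recovers the term
```

Fragmentation across several messages is a different problem; normalization alone does nothing for it.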

What's overstated is the "ANTHROPIC: PWNED" framing. As one level-headed write-up put it, "the news is real, but 'PWNED' is marketing." Bypassing a classifier is not the same as owning the model. Fable 5 went through more than a thousand hours of safety testing, no universal jailbreak was demonstrated, and the scariest claimed outputs were never independently verified by a neutral party. Pliny himself called Fable 5 "one of the most disappointing model drops of all time," but that opinion landed in the same news cycle in which the model was scoring around 90% on benchmarks and shipping into enterprise stacks.

Who is Pliny the Liberator?

If you're new to this corner of the internet, it's worth understanding why one anonymous account can move a story this big — because a lot of people on X are convinced he's the reason the model got pulled.

Pliny the Liberator (the handle nods to Pliny the Elder) is an anonymous personality who has, in a short time, become the most visible jailbreaker in AI. TIME named him one of the 100 most influential people in AI in 2025. He has more than 100,000 followers on X and runs a Discord community called BASI (short for "BASI PROMPT1NG," launched in May 2023), where more than 20,000 members workshop techniques together. He reportedly came in with no prior coding background and built his reputation purely on pattern-watching, creativity, and relentless practice.

His output is prolific and public. He puts out a "liberation bulletin" for practically every new model — GPT, Grok, Gemini, Claude — usually within hours of release. He maintains open repositories that have become reference material for the whole scene: L1B3RT4S ("jailbreaks for all flagship AI models," flying the #FREEAI and #LIBERTAS banners), CL4R1T4S (a collection of leaked or extracted system prompts from the major labs), and projects like G0DM0D3, a fully unguarded chat interface with the methodology open-sourced. He also leads BT6, a roughly 28-operator white-hat collective built around radical transparency and open-source AI security.

The strange part: a jailbreaker can be a lab's best marketing

Here's the paradox at the heart of this, and it's one of the most interesting things about the whole episode. You'd assume the labs hate Pliny. The reality is more complicated, and in some ways he's one of the best things that happens to a model launch.

When a million people are talking about your model being "liberated" within hours of release, that is, perversely, enormous attention on your model. Pliny has received an unrestricted grant from venture capitalist Marc Andreessen and has taken short-term contracts with top labs, OpenAI among them, to make their systems more robust. That's the tell: the same labs whose guardrails he breaks also pay him to break them, because every jailbreak he publishes is a free stress test. There's an entire essay floating around titled "Please Jailbreak Our AI," and it isn't satire; it describes the actual incentive. Frontier labs hire people like Pliny to find the holes, then train the next model to resist what they found.

So is he a security researcher, a folk hero, or a hidden marketing engine for the same companies he taunts? Honestly, he's all three at once, and that ambiguity is exactly why he commands the attention he does. It also cuts the other way: when your jailbreaker is that visible, and a model gets pulled by the government 48 hours after he posts, of course the internet draws a straight line — even if the real story is messier.

And he is far from alone — the scene is moving fast

Pliny is the famous face, but the thing people miss is how organized and fast-moving the broader community has become. This isn't a lone hacker in a basement; it's a maturing field with competitions, games, conferences, and labs quietly funding all of it.

HackAPrompt and similar outfits run public jailbreaking competitions, and the prompts contestants submit become training data the labs use to harden their models. The adversaries are, in effect, an unpaid (or prize-paid) red team.
Lakera's "Gandalf" — a prompt-injection game where you try to trick an AI into revealing a password — has become a rite of passage that pulls newcomers into red-teaming in the first place.
DEFCON's AI Village and a constellation of Discords have turned adversarial testing into a community sport with shared methodology and a real on-ramp for talent.

The speed is the headline. Models are now reliably "liberated" within hours of launch, not weeks. Each release becomes a public race, and the techniques compound — Unicode tricks, multi-agent decomposition (splitting a forbidden task across several cooperating prompts), narrative framing, system-prompt extraction. What used to be folklore is now documented, versioned on GitHub, and taught. That acceleration is a big part of why a government would look at a three-day-old model and decide the capability was already loose.

Anthropic's side — and it's a strong one

Anthropic didn't just comply quietly; it pushed back. The company says the directive arrived with no specific technical detail, and that when it reviewed the demonstration, the "jailbreak" amounted to "a small number of previously known, minor vulnerabilities." It described the issue as "narrow" and "non-universal," and pointed out that the capability in question — "asking the model to read a specific codebase and fix any software flaws" — "is widely available from other models (including OpenAI's GPT-5.5)."

Then the line that should give every builder pause:

"If this standard was applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers."

Anthropic added that it disagrees "that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people." Whatever you think of the company, that's a serious argument: if a narrow, already-public capability can pull a frontier model off the market by government order three days after launch, that's a precedent with a very long shadow.

The scandal underneath the scandal: silent downgrades

This is the part I keep coming back to, because I lived a version of it. Lost in the jailbreak noise was a separate complaint that, to a lot of researchers, mattered more: Fable 5 was silently handing flagged requests to a weaker model — Opus 4.8 — without telling users. No warning, no fallback message, just quietly worse answers for anyone the system suspected of doing sensitive work, or, in some accounts, of building competing AI systems.

The AI researcher Nathan Lambert summed up the objection sharply: "An AI model that automatically becomes less intelligent without telling me is categorically misaligned AI." That's the real transparency failure — not that a model has guardrails, but that it can swap itself for a dumber version mid-task and let you keep trusting the output. Anthropic apologized for this specific behavior and has since made the downgrades visible and started surfacing refusal reasons in the API.
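On the client side, one way to catch this kind of swap is to compare the model you asked for with the model the response says it came from. A minimal sketch, assuming the API response is a dictionary that reports the serving model under a `model` key. That field name, the model identifiers, and the `check_served_model` helper are all hypothetical here, not Anthropic's documented API.

```python
class ModelMismatchError(RuntimeError):
    """Raised when a response came from a different model than requested."""


def check_served_model(requested: str, response: dict) -> dict:
    """Return the response unchanged, or raise if another model served it."""
    served = response.get("model")
    if served is None:
        raise ModelMismatchError("response does not say which model served it")
    if served != requested:
        raise ModelMismatchError(
            f"requested {requested!r} but was served {served!r}"
        )
    return response


# Hypothetical responses for illustration.
ok = {"model": "fable-5", "text": "..."}
downgraded = {"model": "opus-4.8", "text": "..."}

check_served_model("fable-5", ok)  # passes through untouched
try:
    check_served_model("fable-5", downgraded)
except ModelMismatchError as err:
    print(f"downgrade detected: {err}")
```

A check like this only works if the provider reports the serving model honestly, which is exactly the transparency the complaint was about.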

I find it striking that the thing I personally noticed on June 11 — getting moved to Opus 4.8 with no real explanation — turned out to be one of the most legitimate grievances in the whole episode. The jailbreak got the headlines and the government order; the quiet downgrade is what actually broke trust.

The quieter fear: are they watching, and training on what you tell them?

That suspicion — that the model is silently profiling what you're up to and demoting you if it doesn't like the look of it — connects to a bigger anxiety that's been building all over X: that the labs are watching their users and absorbing their ideas. With Anthropic, this isn't pure paranoia; there's a documented basis worth understanding.

Anthropic spent years positioning itself as the privacy-first lab. That posture has shifted. Consumer conversations are now used to train future models unless you actively opt out, and leaving training enabled stretches data retention from 30 days to five years, roughly a sixtyfold increase in how long your chats sit in the training pipeline. The part that fuels the spying narrative most: a clause in the privacy policy updated June 8, 2026 makes clear the opt-out has a ceiling. Conversations that Anthropic's systems flag for safety review can still be used to train its models, regardless of your stated preference. The policy doesn't define what trips a safety flag, and it doesn't commit to telling you when one happens.

Put those two facts next to each other — a model that silently demotes users it suspects of building competitors, and a policy that lets flagged conversations be trained on no matter what you chose — and you can see why a lot of people feel watched. I want to be fair: "your data may train future models if flagged" is not the same as "they are stealing your specific ideas to build products against you," and I haven't seen proof of the stronger claim. But the gap between Anthropic's old privacy-first brand and these new carve-outs is real, it's documented, and it's a big reason the trust conversation has gotten so heated this week.

So why did people on X think it was all because of Pliny?

Because the timing is irresistible: the most famous jailbreaker alive posts "FABLE-5: LIBERATED," and 48 hours later the government pulls the model. A million-strong audience connected those two dots into a straight line — "one tweet got the model banned."

I don't think the honest version is that clean. The documented cause is a federal export-control order citing a codebase-vulnerability capability, and Anthropic says that capability is narrow and already available elsewhere. Pliny's post is the same genre of jailbreak in the same news cycle, and it's reasonable to think the public red-team scene — him included — is part of what put this capability on the government's radar in the first place. But "the jailbreak community drew official attention" and "a single tweet got an AI banned" are different claims, and only the first is really supported. I'd rather hand you the real shape of it than a tidy story that doesn't hold up.

The lesson, from where I sit

Strip away the drama and one fact remains: for about 72 hours, Fable 5 was arguably the most powerful general-purpose AI available to the public. Companies paid for it. People built it into production workflows and agent pipelines. Then, with a single order on a Friday evening, it was gone — refunds on the way for a product that simply vanished.

Nobody running a local, private AI model lost a thing that night.

That's the whole point, and it's not a coincidence. When your intelligence lives on someone else's servers, it's subject to their terms, their classifiers, their uptime and, as we just watched in the clearest possible way, government orders that can land overnight and have nothing to do with anything you did. Your model can be switched off, quietly swapped for a weaker one, or set to learn from your conversations, all decided by people who have never heard of your business. Enterprise reaction to the shutdown has been a fast pivot toward exactly this realization: teams whose entire workflow was tied to one closed API just learned how fragile that is, and they're scrambling to diversify.

I'm not anti-cloud. The frontier models are extraordinary and I use them every day. But the Fable 5 shutdown is the strongest argument I've seen yet for keeping a private-AI backup plan: capable open-weight models running on hardware you control, where the off switch is in your building and not in someone else's compliance department, where the model can't silently demote itself without you knowing, and where your work isn't quietly feeding someone else's next product. For firms in compliance-sensitive fields (law, medicine, finance), where losing access, getting a degraded answer, or leaking sensitive context is a real liability, that isn't paranoia. It's just continuity planning.
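As a rough sketch of what a backup plan can look like in code: try the hosted model first, fall back to a local one if the call fails, and always record which backend produced the answer so a switch is never silent. The backend functions below are stand-ins with hypothetical names and no real SDK calls; in practice each would wrap an actual client.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Answer:
    backend: str  # which backend actually produced the text
    text: str


Backend = tuple[str, Callable[[str], str]]


def ask_with_fallback(prompt: str, backends: Sequence[Backend]) -> Answer:
    """Try each backend in order; return the first success, labeled by name."""
    errors = []
    for name, call in backends:
        try:
            return Answer(backend=name, text=call(prompt))
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def hosted_model(prompt: str) -> str:
    # Stand-in for a hosted API that has been switched off.
    raise ConnectionError("model unavailable")


def local_model(prompt: str) -> str:
    # Stand-in for an open-weight model running on your own hardware.
    return f"local answer to: {prompt}"


answer = ask_with_fallback(
    "summarize this contract",
    [("hosted", hosted_model), ("local", local_model)],
)
print(answer.backend)  # local
```

Returning the backend name alongside the text is the design choice that matters here: the caller can always see when the answer came from the fallback rather than the model it expected.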

The cloud models may well come back. Anthropic believes this is a misunderstanding, and I hope it's right. But the lesson holds whether Fable 5 returns next week or not: if losing access to an AI — or unknowingly getting a dumber one, or quietly handing over what you feed it — would hurt your operation, some of that intelligence should live where nobody but you can turn it off.

Originally published at Nice Dreamz Wholesale. For local-AI consulting for compliance-sensitive firms (law, medical, finance), see AirGap AI.
