A Jailbroken Claude Code Breached Nine Government Agencies. Here's What That Actually Means.

#ai #llm #security #machinelearning

A solo operator with no nation-state backing, no custom malware, and no team breached nine Mexican government agencies last week. The primary tool: a jailbroken Claude Code instance. When Claude's safety filters engaged, the attacker switched to GPT-4.1 and kept going. Twenty vulnerabilities exploited across the federal tax authority (SAT), the National Electoral Institute (INE), and multiple state governments. 150 gigabytes exfiltrated. 195 million taxpayer records, voter registration rolls, and government employee credentials exposed.

Konstantin Tkachuk published a first-person account titled "The Floor Doesn't Exist." The title is apt.

What actually happened

Tkachuk describes the attack as methodical prompt engineering rather than sophisticated hacking. He jailbroke Claude Code into a "bug bounty researcher" persona and ran 1,000+ prompts against it, iterating on approaches whenever safety guardrails engaged. When Claude refused consistently on a particular vector, he switched to GPT-4.1 as a backup. The attack continued.

The account is a single primary source without independent corroboration yet. But the operational specificity is notable: 20 vulnerabilities, named agencies, approximate data volumes, and a methodology detailed enough to be credible.

The multi-model fallback is the important detail

Most discussions of AI safety guardrails treat the problem as: "Model X refuses to help with Y." The Mexico breach puts a different frame on it: an attacker with a Claude subscription and a GPT subscription doesn't face a guardrail problem. They face a friction problem. When Claude refused, the attacker switched providers mid-operation without interruption.

Single-model safety measures assume a bottleneck. The bottleneck doesn't exist anymore. Commercial AI subscriptions are cheap, switching costs are zero, and the models have enough capability overlap that a determined attacker can route around any individual model's refusals.

The implication for anyone thinking about defensive posture isn't "which AI company has the best safety filters." It's "what does my attack surface look like to someone who treats AI assistants as interchangeable tools."

Why this attack was possible at all

The agencies Tkachuk targeted had exploitable vulnerabilities. That's a prerequisite. AI didn't create those vulnerabilities.

What AI changed was the speed and accessibility of finding and exploiting them. Tkachuk's framing: the cost of a Mexican-government-scale operation in 2026 is "a Claude Code subscription plus a few hundred dollars of API credit," with the required skill being "prompt engineering plus persistence." Both are widely available.

That's a different threat model than "nation-state attacker with custom tooling and months of preparation." The barrier has dropped from specialized technical capability to persistence with commodity tools.

What developers should take away from this

If you're building or maintaining systems with PII at scale, internet-facing attack surface, or government-adjacent data:

Your threat model now includes attackers who couldn't have executed this a year ago.
Multi-model fallback is standard attacker workflow. Assuming a given AI company's safety filters are systemic protection is wrong.
Vulnerability management timelines need to account for AI-assisted speed. A separate report this week found patch-to-exploit timelines collapsing to roughly 30 minutes with AI tools. That's not a future concern.

The Tkachuk piece is worth reading in full. The title is doing real analytical work: the floor for what constitutes a capable attacker no longer exists where it did twelve months ago.

This story is from Edge Briefing: AI, a weekly newsletter curating the signal from AI noise. Subscribe for free to get it every Tuesday.