Jason Shotwell

Posted on Mar 31

The Claude Code Leak Proved What We've Been Building For

#ai #security #python #opensource

Today Anthropic accidentally shipped 512,000 lines of Claude Code's source code to npm. A source map file that should have been stripped from the build made it into version 2.1.88 of the @anthropic-ai/claude-code package. Within hours, the entire codebase was mirrored on GitHub and dissected by thousands of developers.

The leak itself was a packaging error. Human mistake. It happens.

But what the leak revealed is the part that matters.

The Real Problem Isn't the Leak

Check Point Research had already disclosed CVE-2025-59536 back in October — a vulnerability where malicious .mcp.json files in a repository could execute arbitrary shell commands the moment you open Claude Code. No trust prompt. No confirmation dialog. The MCP server initializes, runs whatever commands are in the config, and your API keys are gone before you've read a single line of code.

The leaked source code made this worse. Now attackers have the exact orchestration logic for Hooks and MCP servers. They can see precisely how trust prompts are triggered, when they're skipped, and where the gaps are. That's a blueprint for exploitation.

And between 00:21 and 03:29 UTC on March 31, anyone who installed Claude Code pulled in a compromised version of axios containing a Remote Access Trojan. A supply chain attack riding the same wave.

Three problems, one root cause: AI agents execute before humans verify.

This Is an Architecture Problem

Every one of these vulnerabilities follows the same pattern:

An AI agent receives instructions (from a config file, a prompt, a dependency)
It executes those instructions
The human finds out afterward — if they find out at all

This isn't unique to Claude Code. It's the fundamental architecture of every AI agent framework shipping today. LangChain agents, CrewAI crews, AutoGen groups, OpenAI Agents — they all execute first and ask questions never.

The missing piece isn't better prompts or more careful packaging. It's an infrastructure layer that sits between intent and execution and enforces verification before action.

What Trust Infrastructure Actually Looks Like

This is what I've been building with AIR Blackbox. The trust layers intercept every AI call at the execution level — not after the fact, not in a dashboard, at the moment of the call.

Here's what that looks like in practice with the OpenAI SDK:

from air_openai_trust import attach_trust

client = attach_trust(OpenAI())
# Every call through this client now gets:
# - HMAC-SHA256 tamper-evident audit record
# - PII detection (catches API keys being exfiltrated)
# - Prompt injection scanning
# - Human delegation flags for sensitive operations

One import. The client works exactly the same way. But now every call is logged with a cryptographic audit trail, credentials are flagged before they leave your environment, and injection attempts are caught at the point of execution.

Applied to the Claude Code vulnerabilities:

Malicious MCP config tries to exfiltrate API keys? The PII detection layer catches credentials in outbound payloads before they're transmitted.

Poisoned dependency runs arbitrary commands? The audit chain logs every action with HMAC-SHA256 signatures. You can't tamper with the record after the fact. Forensic teams can reconstruct exactly what happened.

Prompt injection hidden in a repo's config? The injection scanner catches 20 known attack patterns across 5 categories before they reach the model.

Agent executes without human approval? The human delegation system flags sensitive operations and requires explicit sign-off.

This Isn't About Compliance Anymore

I started building AIR Blackbox for EU AI Act compliance. That's still the wedge — the regulation creates urgency. But today's leak shows the real category:

Trust infrastructure for AI operations.

Compliance is one use case. The bigger picture is that every AI agent deployment needs an interception layer that verifies, filters, stabilizes, and protects every call. Not a dashboard that shows you what went wrong yesterday. An active layer that prevents it from going wrong right now.

The Uncomfortable Truth

Anthropic is one of the most safety-focused AI companies on the planet. They employ some of the best security engineers in the industry. And a packaging error exposed their entire codebase, a malicious dependency slipped into their supply chain, and a months-old vulnerability in their MCP architecture had already shown that trust prompts could be bypassed entirely.

If it happened to Anthropic, it will happen to every company deploying AI agents.

The question isn't whether your AI systems will face these problems. It's whether you'll have the infrastructure in place to catch them when they do.

pip install air-compliance && air-compliance scan .

10 PyPI packages. Runs locally. Your code never leaves your machine. Apache 2.0.

GitHub: github.com/airblackbox
Site: airblackbox.ai
Audit Chain Spec: airblackbox.ai/spec

DEV Community