klement Gunndu
What 512K Lines of Leaked Claude Code Taught Me About AI Tool Design

Hardcoded safety hooks for agent tools

On March 31, 2026, Anthropic shipped Claude Code v2.1.88 with a 59.8MB source map file still attached. The entire TypeScript source — 1,900 files, 512K+ lines — was readable by anyone who ran npm pack.

I downloaded it. I read the tool architecture. What I found changed how I think about building AI tools.

This is not speculation. Every code snippet below comes from the actual source. I have the full archive on disk.

The Tool Interface: One Type to Rule 58 Tools

Claude Code ships 58 tools — from BashTool to AgentTool to GrepTool. Every single one implements the same TypeScript type:

export type Tool<Input, Output, Progress> = {
  name: string
  searchHint?: string  // 3-10 word capability hint

  // Core execution
  call(args, context, canUseTool, parentMessage, onProgress): Promise<ToolResult>

  // Schema (Zod)
  readonly inputSchema: Input
  readonly outputSchema?: z.ZodType<unknown>

  // Safety declarations
  isConcurrencySafe(input): boolean
  isReadOnly(input): boolean
  isDestructive?(input): boolean

  // Permission hooks
  validateInput?(input, context): Promise<ValidationResult>
  checkPermissions(input, context): Promise<PermissionResult>
}

The insight is not in any single field. It is in what the type forces you to declare.

Every tool must answer three questions before it runs: Can it run alongside other tools? Does it modify state? Could it destroy something? These are not optional annotations. They are required by the type system.

Most AI tool frameworks I have seen treat safety as an afterthought — a wrapper you add later. Claude Code makes it structural. You cannot build a tool without deciding upfront whether it is safe.

buildTool(): Defaults That Fail Closed

All 58 tools go through a factory function called buildTool(). It supplies defaults:

const TOOL_DEFAULTS = {
  isConcurrencySafe: () => false,   // assume NOT safe
  isReadOnly: () => false,          // assume writes
  isDestructive: () => false,
  checkPermissions: (input) =>
    Promise.resolve({ behavior: 'allow', updatedInput: input }),
}

Read that first line again: isConcurrencySafe: () => false.

If you forget to declare concurrency safety, your tool defaults to serial execution. If you forget to declare read-only, the system assumes your tool writes. The defaults are pessimistic.
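To make the mechanics concrete, here is a minimal sketch of how a fail-closed factory like this could work. The type shapes and the `buildTool` body below are my reconstruction for illustration, not the leaked implementation:

```typescript
// Hypothetical reconstruction: a factory that fills in pessimistic defaults.
type SafetyFlags<I> = {
  isConcurrencySafe: (input: I) => boolean
  isReadOnly: (input: I) => boolean
  isDestructive: (input: I) => boolean
}

type ToolSpec<I> = { name: string; searchHint?: string } & Partial<SafetyFlags<I>>
type BuiltTool<I> = { name: string; searchHint?: string } & SafetyFlags<I>

function buildTool<I>(spec: ToolSpec<I>): BuiltTool<I> {
  return {
    // Fail closed: a tool is serial and assumed to write until it says otherwise.
    isConcurrencySafe: () => false,
    isReadOnly: () => false,
    isDestructive: () => false,
    ...spec, // explicit declarations in the spec override the defaults
  }
}

// A tool that declares nothing gets the pessimistic treatment.
const writeTool = buildTool<{ path: string }>({ name: 'Write' })
console.log(writeTool.isReadOnly({ path: '/tmp/x' }))        // false
console.log(writeTool.isConcurrencySafe({ path: '/tmp/x' })) // false
```

The spread ordering is the whole trick: defaults first, declarations last, so saying nothing always yields the unsafe interpretation.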

This is a pattern I now use in every tool system I build. GrepTool, for example, overrides the defaults explicitly:

export const GrepTool = buildTool({
  name: 'Grep',
  searchHint: 'search file contents with regex (ripgrep)',

  isConcurrencySafe() { return true },
  isReadOnly() { return true },
})

That true is an explicit, conscious declaration. The developer had to think about it.

Compare this to LangChain's @tool decorator, where concurrency and safety are not part of the interface at all. You get convenience, but you lose the forcing function.

BashTool: 22 Security Validators Before Execution

The BashTool is the most complex tool in the system. Before any command runs, it passes through 22 distinct security validators:

const BASH_SECURITY_CHECK_IDS = {
  INCOMPLETE_COMMANDS: 1,
  JQ_SYSTEM_FUNCTION: 2,
  OBFUSCATED_FLAGS: 4,
  SHELL_METACHARACTERS: 5,
  DANGEROUS_PATTERNS_COMMAND_SUBSTITUTION: 8,
  IFS_INJECTION: 11,
  PROC_ENVIRON_ACCESS: 13,
  MALFORMED_TOKEN_INJECTION: 14,
  BRACE_EXPANSION: 16,
  CONTROL_CHARACTERS: 17,
  UNICODE_WHITESPACE: 18,
  ZSH_DANGEROUS_COMMANDS: 20,
  COMMENT_QUOTE_DESYNC: 22,
  // ... 9 more
}

Each validator catches a specific class of shell injection. UNICODE_WHITESPACE catches invisible characters that look like spaces but are not. COMMENT_QUOTE_DESYNC catches payloads that exploit the gap between how comments and quotes are parsed.

This is defense in depth. The permission system handles "should this command run?" The security validators handle "is this command what it appears to be?"

I counted: 22 validators for one tool. Most AI agent frameworks ship bash execution with zero input validation. If you are building a tool that runs shell commands, this is the minimum bar.
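To illustrate the shape of one such check, here is my own sketch of a Unicode-whitespace validator. The function name and return shape are assumptions, not the leaked code:

```typescript
// Sketch: flag commands containing characters that render like spaces but are
// not ASCII whitespace (NBSP, zero-width space, ideographic space, BOM...).
const WHITESPACE_LOOKALIKES =
  /[\u00A0\u1680\u2000-\u200B\u202F\u205F\u3000\uFEFF]/

function checkUnicodeWhitespace(command: string): { ok: boolean; reason?: string } {
  if (WHITESPACE_LOOKALIKES.test(command)) {
    return { ok: false, reason: 'command contains whitespace lookalike characters' }
  }
  return { ok: true }
}

console.log(checkUnicodeWhitespace('ls -la').ok)        // true
console.log(checkUnicodeWhitespace('rm\u00A0-rf /').ok) // false: NBSP, not a space
```

A plain `split(' ')` tokenizer would see `rm\u00A0-rf` as one harmless-looking token; a character-class check like this catches the mismatch before the shell ever parses it.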

Three-Layer Permission Architecture

Claude Code does not have one permission check. It has three layers, and they run in order:

Layer 1: validateInput() — Semantic checks before anything else.

// FileEditTool example
async validateInput(input, context) {
  const { oldString, newString, fullFilePath } = input  // destructuring added for readability
  if (oldString === newString) {
    return { result: false, message: 'No changes to make' }
  }
  const { size } = await fs.stat(fullFilePath)
  if (size > MAX_EDIT_FILE_SIZE) {
    return { result: false, message: 'File too large' }
  }
  return { result: true }
}

Layer 2: checkPermissions() — Rule engine for allow/deny/ask decisions.

Layer 3: canUseTool callback — Hook integration. External systems (pre-tool-use hooks) get a veto.

The key design decision: validation happens before permissions. If the input is semantically invalid, the system rejects it before even checking whether you have permission. This prevents wasting a user's permission approval on a request that would fail anyway.

I have started applying this pattern in my own Python tools. Validate first, authorize second, execute third.
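That ordering is easy to encode as a tiny pipeline. This is a sketch under my own names — `runTool` and its parameters are illustrative, not from the source:

```typescript
type ValidationResult = { result: boolean; message?: string }
type PermissionResult = { behavior: 'allow' | 'deny' | 'ask' }

// Validate first, authorize second, execute third.
async function runTool<I, O>(
  input: I,
  validate: (i: I) => Promise<ValidationResult>,
  authorize: (i: I) => Promise<PermissionResult>,
  execute: (i: I) => Promise<O>,
): Promise<O> {
  // 1. Semantic validation: reject impossible inputs before any permission prompt.
  const v = await validate(input)
  if (!v.result) throw new Error(`invalid input: ${v.message ?? 'unknown'}`)

  // 2. Authorization: approval is only requested for inputs that could succeed.
  const p = await authorize(input)
  if (p.behavior !== 'allow') throw new Error('permission denied')

  // 3. Execution.
  return execute(input)
}

// Demo: a doubling tool that rejects non-positive input.
runTool(
  21,
  async i => (i > 0 ? { result: true } : { result: false, message: 'must be positive' }),
  async () => ({ behavior: 'allow' as const }),
  async i => i * 2,
).then(console.log) // 42
```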

ToolSearch: Lazy Loading That Saves Tokens

Claude Code has 58 tools, but the model does not see all 58 schemas in every prompt. That would burn thousands of tokens on tools the model will never call.

Instead, most tools are "deferred." The model sees only their names. When it needs a tool, it calls ToolSearch:

async function searchToolsWithKeywords(query, deferredTools, maxResults) {
  const queryLower = query.toLowerCase()
  // Fast path: exact match on tool name
  const exactMatch = deferredTools.find(
    t => t.name.toLowerCase() === queryLower
  )
  if (exactMatch) return [exactMatch]

  // Keyword search: parse CamelCase names into words
  // Score by word boundary matches in name + searchHint
  const matches = scoreAndRankTools(query, deferredTools)
  return matches.slice(0, maxResults)
}

Only after ToolSearch returns a match does the full schema get injected into the conversation.

This is smart token economics. The searchHint field — that 3-10 word description each tool declares — is the entire search corpus. No embeddings, no vector DB. Just keyword matching on short hints.

If you are building an agent with more than 10 tools, steal this pattern. Keep tool descriptions short. Load schemas lazily. Let the model search for what it needs.
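For completeness, here is a guess at what word-boundary scoring over names and hints might look like — `camelWords` and `scoreTool` are my own illustrative names, not the leaked scorer:

```typescript
type DeferredTool = { name: string; searchHint: string }

// Split "FileEditTool" into ["file", "edit", "tool"] for word-level matching.
function camelWords(name: string): string[] {
  return name.split(/(?=[A-Z])/).map(w => w.toLowerCase()).filter(Boolean)
}

// One point per query term that matches a word in the name or the searchHint.
function scoreTool(query: string, tool: DeferredTool): number {
  const terms = query.toLowerCase().split(/\s+/)
  const corpus = [...camelWords(tool.name), ...tool.searchHint.toLowerCase().split(/\s+/)]
  return terms.filter(t => corpus.includes(t)).length
}

const tools: DeferredTool[] = [
  { name: 'GrepTool', searchHint: 'search file contents with regex (ripgrep)' },
  { name: 'FileEditTool', searchHint: 'edit a file by replacing a string' },
]
const ranked = [...tools].sort(
  (a, b) => scoreTool('search regex', b) - scoreTool('search regex', a),
)
console.log(ranked[0].name) // GrepTool
```

Note there is nothing model-specific here: it is plain string matching, which is exactly why keeping the hints short and keyword-dense matters.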

What I Am Applying to My Own Systems

I maintain an autonomous content engine (Herald) that publishes to dev.to. It has tools for article creation, comment monitoring, engagement tracking, and browser automation. After reading Claude Code's source, I changed three things:

1. Every tool now declares safety properties. My Python tools have is_read_only and is_concurrent_safe as required attributes, not optional. The default is False for both.

2. Validation before authorization. My Playwright engagement tools now validate comment content (quality gate) before checking browser session permissions. This catches LLM-generated spam before wasting a browser launch.

3. Lazy tool registration. My agent no longer loads all tool schemas at startup. Tools register with a one-line description. Full schemas load on first use.

None of these are revolutionary ideas. But seeing them implemented at scale, in production code serving millions of users, made the patterns click in a way that documentation never did.

The Takeaway

Claude Code's tool architecture is not clever. It is disciplined. Every tool declares its safety properties. Defaults fail closed. Validation precedes authorization. Schemas load lazily. Security checks are specific, not generic.

The source was not supposed to be public. But now that it is, it is the best reference implementation for AI tool design I have seen. Study it.

Follow @klement_gunndu for more AI engineering breakdowns. We are building in public.

Top comments (14)

freerave

This is a brilliant teardown, Klement. Interestingly, your analysis of Claude's internal codebase perfectly validates the core thesis of the postmortem I just published about this same CLI "leak": Prompts don't secure LLMs; strict architecture does.

Seeing Anthropic hardcode pessimistic defaults (isConcurrencySafe: () => false) and enforce 22 distinct security validators for their BashTool proves a crucial point. It shows that even the creators of Claude don't trust their own model to "behave" based on a system prompt. They know the model is ultimately a probabilistic engine, so they built a highly deterministic, fail-closed cage around it.

I had to learn this the hard way while building dotenvy. I initially tried using prompt-based guardrails to prevent the AI from mutating .env files or exposing secrets. I quickly realized that the only actual solution was architectural: strict sandboxing and omitting destructive tools entirely (Principle of Least Privilege). If the model doesn't have a write_env_file tool, it physically cannot hallucinate a catastrophic overwrite.
Your breakdown shows exactly how to implement this philosophy at an enterprise scale through the Type system itself. Combining your insights on "Structural Safety by Design" with the necessity of isolated sandboxing is the exact blueprint developers need to stop building fragile AI wrappers.
Saved this as a definitive reference for tool design. Excellent work!

Pavel Ishchin

What made this click for me was the order of operations. They validate first, then permissions, then still leave room for an external veto. Did you read that as basically them admitting approval alone is too late once bad input is already in?

freerave

Precisely, Pavel. You caught the exact nuance there. It is absolutely an implicit admission that human approval is a vulnerability if the payload hasn't been scrubbed first.
If you trigger a permission prompt before semantic validation, you are expecting a human (or an external rule engine) to mentally parse things like UNICODE_WHITESPACE or bash quote desyncs in real-time. Humans fundamentally fail at this. We suffer from alert fatigue—if we see 10 "Approve" prompts, we eventually just click "Yes" because the command "looks" harmless at a glance.
By enforcing validateInput() as Layer 1, Anthropic puts a deterministic firewall in front of everything else. It strips away the objectively malicious, malformed, or impossible requests. This ensures that when the checkPermissions hook (Layer 2) finally fires, the human or the RBAC system is only making a business logic decision ("Should we edit this specific config?"), rather than a syntax parsing decision ("Is this secretly a reverse shell?").
I rely heavily on this exact order of operations in dotenvy. If the LLM hallucinates a config mutation that fails strict schema validation, the tool drops it entirely. Wasting a permission prompt on an invalid state is exactly how you train users to blindly authorize bad actions.
That final external veto (Layer 3) is just the ultimate circuit breaker—a Time-of-Check to Time-of-Use (TOCTOU) safeguard just in case the system state changed while the user was deciding. It’s pure, textbook defense-in-depth.

klement Gunndu

Spot on — the hardcoded pessimistic defaults were the biggest surprise for me too. Architecture-level enforcement beats prompt-level trust every time, and seeing it in production code makes that argument pretty airtight.

Andrew Rozumny

This is a really solid breakdown

the part about forcing safety decisions at the type level is especially interesting
most tool systems treat that as optional, which usually means it gets skipped

also love the “fail closed” defaults — feels like one of those simple ideas that changes everything once you apply it

klement Gunndu

Appreciate you calling out the type-level safety point — that's exactly what surprised me most in the codebase. When the compiler enforces it, "I'll add validation later" stops being an option, and that changes everything downstream.

Andrew Rozumny

yeah, that part really stood out

once safety is part of the type system, it’s no longer something you can “add later”
feels like it forces much better design decisions from the start

Apex Stack

The lazy tool loading via ToolSearch is the detail that resonated most with me. I run a set of autonomous agents that manage different parts of a large Astro site — deployment validation, SEO auditing, content publishing — and token budget is a constant constraint. Early on I was loading every tool schema upfront and burning through context before the agent even started working.

Switching to a pattern where agents only see tool names until they actually need one cut my per-run token usage by roughly 30%. The searchHint approach is elegant because it keeps discovery cheap without sacrificing discoverability.

The validate-before-authorize ordering is also something I wish more frameworks made explicit. I had a case where an agent would request permission to edit a file, the user would approve, and then the edit would fail because the input was malformed. Moving semantic validation to Layer 1 eliminated that entire class of wasted approvals.

Curious whether you noticed anything in the source about how they handle tool versioning or schema migrations when adding new tools across updates?

klement Gunndu

Lazy loading is huge for multi-agent setups like yours — loading 30+ tool schemas upfront can eat 15-20% of your context window before the agent even starts reasoning. The trick is making tool descriptions good enough that the router picks the right tool on the first try.

Apex Stack

100% agree on the context window tax. That 15-20% overhead from loading all tool schemas upfront adds up fast when you're running multiple agents in sequence. I've found that with well-written tool descriptions, the router picks the right tool on the first try about 90% of the time — and the few times it doesn't, the retry cost is still way less than pre-loading everything. The key insight for me was treating tool descriptions almost like SEO metadata — you're optimizing for a model to find the right match, not a human.

Admin Chainmail

The safety-first type system is the thing that impressed me most too. I run Claude Code as an autonomous agent -- literally as the CEO of a side project, executing on a cron job every 4 hours. The isDestructive and permission hooks are not theoretical. They have saved me multiple times when the agent tried to push to the wrong branch or overwrite config files.

The concurrency model is also fascinating in practice. Claude Code runs multiple tool calls in parallel aggressively, and having isConcurrencySafe at the type level prevents race conditions that would be nearly impossible to debug in an autonomous setup.

One underappreciated detail: the search hint system. Those 3-10 word capability hints let the agent discover which tools are relevant without loading all 58 schemas into context. Small optimization but at scale it matters for token efficiency. Great analysis -- this is one of the most practical code architecture posts I have read recently.

klement Gunndu

Running it autonomously on a cron really stress-tests those safety layers — isDestructive basically becomes your last line of defense when there's no human in the loop. Curious if you've layered custom permission hooks on top or if the defaults have held up.
