TL;DR
The Claude Code source leak exposed a 512,000-line TypeScript codebase on March 31, 2026. Its architecture is a while-loop calling the Claude API, handling tool calls, and feeding results back. You can build your own version in Python using the Anthropic SDK and around 200 lines of code for the main loop. This guide breaks down each architectural layer and gives you ready-to-use code to build your own agent.
Introduction
On March 31, 2026, Anthropic accidentally shipped a 59.8 MB source map file in version 2.1.88 of their @anthropic-ai/claude-code npm package. Source maps let tools map minified JavaScript back to the original source, and Bun’s bundler generated them by default—making the TypeScript codebase fully recoverable.
Within hours, developers mirrored the code across GitHub. The community dissected modules, from the master agent loop to hidden features like “undercover mode” and fake tool injection.
Reactions were split—some criticized security, others studied the architecture. The most actionable question: “Can I build this myself?”
Yes. The core patterns are straightforward. This guide walks through each architectural layer, explains Anthropic’s design decisions, and gives you working code to start. You’ll also see how to debug your agent’s API interactions using Apidog, which simplifies multi-turn API debugging.
What the Leak Revealed About Claude Code’s Architecture
The Codebase at a Glance
Claude Code (codename “Tengu”) spans about 1,900 files, organized as:
- `cli/` - Terminal UI (React + Ink)
- `tools/` - 40+ tool implementations
- `core/` - System prompts, permissions, constants
- `assistant/` - Agent orchestration
- `services/` - API calls, compaction, OAuth, telemetry
The CLI is a React app rendered through Ink (React for terminal), using Yoga for layout and ANSI codes for styling. Every UI element is a React component.
For your own agent, you don’t need this complexity. A simple REPL loop works.
The Master Agent Loop
At its core, Claude Code runs a while-loop:
- Send messages to the Claude API (system prompt + tool definitions)
- Receive a response with text and/or `tool_use` blocks
- Execute tools by dispatching handlers
- Append tool results to the message list
- If there are more tool calls, loop; otherwise, return the response to the user
A "turn" is one round-trip. The loop continues until the response is plain text.
Here’s a minimal Python version of the core loop:
```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"

def agent_loop(system_prompt: str, tools: list, messages: list) -> str:
    """The core agent loop - keep calling until no more tool use."""
    while True:
        response = client.messages.create(
            model=MODEL,
            max_tokens=16384,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return "".join(
                block.text for block in response.content
                if hasattr(block, "text")
            )
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        messages.append({"role": "user", "content": tool_results})
```
Most complexity lives in tools, permissions, context, and memory—not the loop.
Building the Tool System
Why Dedicated Tools Beat a Single Bash Command
Claude Code uses dedicated tools for file operations rather than shelling out to bash for everything. Examples:

- `Read` tool (not `cat`)
- `Edit` tool (not `sed`)
- `Grep` tool (not `grep`)
- `Glob` tool (not `find`)
Why?
- Structured output: Tools return consistent, parseable results. Bash output is unpredictable.
- Safety: The bash tool has to pattern-match for dangerous commands; purpose-built tools sidestep that risk entirely.
- Token efficiency: Tool results can be truncated to save tokens; raw `cat` output wastes context.
The Essential Tool Set
For a minimal agent, start with these five tools:
```python
TOOLS = [
    {
        "name": "read_file",
        "description": "Read a file from the filesystem. Returns contents with line numbers.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {
                    "type": "string",
                    "description": "Absolute path to the file"
                },
                "offset": {
                    "type": "integer",
                    "description": "Line number to start reading from (0-indexed)"
                },
                "limit": {
                    "type": "integer",
                    "description": "Max lines to read. Defaults to 2000."
                }
            },
            "required": ["file_path"]
        }
    },
    {
        "name": "write_file",
        "description": "Write content to a file. Creates the file if it doesn't exist.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string", "description": "Absolute path"},
                "content": {"type": "string", "description": "File content to write"}
            },
            "required": ["file_path", "content"]
        }
    },
    {
        "name": "edit_file",
        "description": "Replace a specific string in a file. The old_string must be unique.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string", "description": "Absolute path"},
                "old_string": {"type": "string", "description": "Text to find"},
                "new_string": {"type": "string", "description": "Replacement text"}
            },
            "required": ["file_path", "old_string", "new_string"]
        }
    },
    {
        "name": "run_command",
        "description": "Execute a shell command and return stdout/stderr.",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Shell command to run"},
                "timeout": {"type": "integer", "description": "Timeout in seconds. Default 120."}
            },
            "required": ["command"]
        }
    },
    {
        "name": "search_code",
        "description": "Search for a regex pattern across files in a directory.",
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex pattern"},
                "path": {"type": "string", "description": "Directory to search"},
                "file_glob": {"type": "string", "description": "File pattern filter, e.g. '*.py'"}
            },
            "required": ["pattern"]
        }
    }
]
```
Tool Handler Dispatch
Map tool names to handler functions:
```python
import subprocess
import os

def execute_tool(name: str, params: dict) -> str:
    """Dispatch tool calls to their handlers."""
    handlers = {
        "read_file": handle_read_file,
        "write_file": handle_write_file,
        "edit_file": handle_edit_file,
        "run_command": handle_run_command,
        "search_code": handle_search_code,
    }
    handler = handlers.get(name)
    if not handler:
        return f"Error: Unknown tool '{name}'"
    try:
        return handler(params)
    except Exception as e:
        return f"Error: {str(e)}"

def handle_read_file(params: dict) -> str:
    path = params["file_path"]
    offset = params.get("offset", 0)
    limit = params.get("limit", 2000)
    with open(path, "r") as f:
        lines = f.readlines()
    selected = lines[offset:offset + limit]
    numbered = [f"{i + offset + 1}\t{line}" for i, line in enumerate(selected)]
    return "".join(numbered)

def handle_write_file(params: dict) -> str:
    path = params["file_path"]
    parent = os.path.dirname(path)
    if parent:  # Guard against empty dirname for bare filenames
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as f:
        f.write(params["content"])
    return f"Successfully wrote to {path}"

def handle_edit_file(params: dict) -> str:
    path = params["file_path"]
    with open(path, "r") as f:
        content = f.read()
    old = params["old_string"]
    if content.count(old) == 0:
        return f"Error: '{old[:50]}...' not found in {path}"
    if content.count(old) > 1:
        return f"Error: '{old[:50]}...' matches {content.count(old)} locations. Be more specific."
    new_content = content.replace(old, params["new_string"], 1)
    with open(path, "w") as f:
        f.write(new_content)
    return f"Successfully edited {path}"

def handle_run_command(params: dict) -> str:
    cmd = params["command"]
    timeout = params.get("timeout", 120)
    # Basic safety: block dangerous patterns
    blocked = ["rm -rf /", "mkfs", "> /dev/"]
    for pattern in blocked:
        if pattern in cmd:
            return f"Error: Blocked dangerous command pattern: {pattern}"
    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True,
        timeout=timeout, cwd=os.getcwd()
    )
    output = ""
    if result.stdout:
        output += result.stdout
    if result.stderr:
        output += f"\nSTDERR:\n{result.stderr}"
    if not output.strip():
        output = f"Command completed with exit code {result.returncode}"
    # Truncate large outputs, keeping the head and tail
    if len(output) > 30000:
        output = output[:15000] + "\n\n... [truncated] ...\n\n" + output[-15000:]
    return output

def handle_search_code(params: dict) -> str:
    pattern = params["pattern"]
    path = params.get("path", os.getcwd())
    file_glob = params.get("file_glob", "")
    cmd = ["grep", "-rn", "--include", file_glob, pattern, path] if file_glob else \
          ["grep", "-rn", pattern, path]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    if not result.stdout.strip():
        return f"No matches found for pattern: {pattern}"
    lines = result.stdout.strip().split("\n")
    if len(lines) > 50:
        return "\n".join(lines[:50]) + f"\n\n... ({len(lines) - 50} more matches)"
    return result.stdout
```
Context Management: The Hard Problem
Why Context Matters More Than Prompt Engineering
Claude Code’s source shows that more engineering went into context management than into prompts. The context compressor (“wU2”) uses five strategies, but a DIY agent only needs two:
- Auto-compaction: When conversation approaches context limit (trigger at ~92% usage), summarize and compact.
- CLAUDE.md re-injection: Re-inject project guidelines on every turn to keep the agent on track.
Building a Simple Compressor
```python
def maybe_compact(messages: list, system_prompt: str, max_tokens: int = 180000) -> list:
    """Compact conversation when it gets too long."""
    total_chars = sum(
        len(str(m.get("content", ""))) for m in messages
    )
    estimated_tokens = total_chars // 4
    if estimated_tokens < max_tokens * 0.85:
        return messages  # Not yet at the limit
    summary_response = client.messages.create(
        model=MODEL,
        max_tokens=4096,
        system="Summarize this conversation. Keep all file paths, decisions made, errors encountered, and current task state. Be specific about what was changed and why.",
        messages=messages,
    )
    summary_text = summary_response.content[0].text
    # Replace conversation with summary + recent messages
    compacted = [
        {"role": "user", "content": f"[Conversation summary]\n{summary_text}"},
        {"role": "assistant", "content": "I have the context from our previous conversation. What should I work on next?"},
    ]
    compacted.extend(messages[-4:])
    return compacted
```
Re-injecting Project Context
Inject .claude/CLAUDE.md into every turn:
```python
def build_system_prompt(project_dir: str) -> str:
    """Build system prompt with project context re-injection."""
    base_prompt = """You are a coding assistant that helps with software engineering tasks.
You have access to tools for reading, writing, editing files, running commands, and searching code.
Always read files before modifying them. Prefer edit_file over write_file for existing files.
Keep responses concise. Focus on the code, not explanations."""
    claude_md_path = os.path.join(project_dir, ".claude", "CLAUDE.md")
    if os.path.exists(claude_md_path):
        with open(claude_md_path, "r") as f:
            project_context = f.read()
        base_prompt += f"\n\n# Project guidelines\n{project_context}"
    root_md = os.path.join(project_dir, "CLAUDE.md")
    if os.path.exists(root_md):
        with open(root_md, "r") as f:
            root_context = f.read()
        base_prompt += f"\n\n# Repository guidelines\n{root_context}"
    return base_prompt
```
The Three-Layer Memory System
The leak shows Claude Code uses a three-tier memory architecture:
Layer 1: MEMORY.md (Always Loaded)
A short index, always in the system prompt, each entry <150 chars, capped at 200 lines.
```markdown
- [User preferences](memory/user-prefs.md) - prefers TypeScript, uses Vim keybindings
- [API conventions](memory/api-conventions.md) - REST with JSON:API spec, snake_case
- [Deploy process](memory/deploy.md) - uses GitHub Actions, deploys to AWS EKS
```
Layer 2: Topic Files (Loaded On Demand)
Detailed files loaded when relevant, containing conventions and architectural details.
Layer 3: Session Transcripts (Searched, Never Read)
Full session logs, not loaded wholesale; agent searches for specific identifiers.
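A transcript search can be a plain grep over saved session files. Here's a minimal sketch, assuming sessions are logged as markdown files in a `.agent/transcripts` directory (that location is my convention, not from the leak):

```python
import glob
import os
import re

TRANSCRIPT_DIR = ".agent/transcripts"  # assumed location for session logs

def search_transcripts(pattern: str, max_hits: int = 20) -> str:
    """Search past session logs for an identifier without loading them wholesale."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(glob.glob(os.path.join(TRANSCRIPT_DIR, "*.md"))):
        with open(path, "r") as f:
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    hits.append(f"{os.path.basename(path)}:{lineno}: {line.strip()}")
                    if len(hits) >= max_hits:
                        return "\n".join(hits)
    return "\n".join(hits) if hits else f"No transcript matches for: {pattern}"
```

Expose this as a tool and the agent can recover details from old sessions for the cost of a few matching lines, not a whole transcript.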
Building a Minimal Memory System
```python
import os

MEMORY_DIR = ".agent/memory"

def load_memory_index() -> str:
    """Load the memory index for system prompt injection."""
    index_path = os.path.join(MEMORY_DIR, "MEMORY.md")
    if os.path.exists(index_path):
        with open(index_path, "r") as f:
            return f.read()
    return ""

def save_memory(key: str, content: str, description: str):
    """Save a memory entry and update the index."""
    os.makedirs(MEMORY_DIR, exist_ok=True)
    filename = f"{key.replace(' ', '-').lower()}.md"
    filepath = os.path.join(MEMORY_DIR, filename)
    with open(filepath, "w") as f:
        f.write(f"---\nname: {key}\ndescription: {description}\n---\n\n{content}")
    index_path = os.path.join(MEMORY_DIR, "MEMORY.md")
    index_lines = []
    if os.path.exists(index_path):
        with open(index_path, "r") as f:
            index_lines = f.readlines()
    new_entry = f"- [{key}]({filename}) - {description}\n"
    updated = False
    for i, line in enumerate(index_lines):
        if filename in line:
            index_lines[i] = new_entry
            updated = True
            break
    if not updated:
        index_lines.append(new_entry)
    with open(index_path, "w") as f:
        f.writelines(index_lines)
```
Add a save_memory tool so the agent can persist knowledge between sessions.
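A sketch of what that tool could look like, reusing the `save_memory` function above (the schema wording and handler name are mine):

```python
SAVE_MEMORY_TOOL = {
    "name": "save_memory",
    "description": "Persist a piece of knowledge for future sessions.",
    "input_schema": {
        "type": "object",
        "properties": {
            "key": {"type": "string", "description": "Short memory name, e.g. 'deploy process'"},
            "content": {"type": "string", "description": "Full memory content in markdown"},
            "description": {"type": "string", "description": "One-line index summary, under 150 chars"},
        },
        "required": ["key", "content", "description"],
    },
}

def handle_save_memory(params: dict) -> str:
    # Delegates to the save_memory function defined earlier
    save_memory(params["key"], params["content"], params["description"])
    return f"Saved memory '{params['key']}'"
```

Append `SAVE_MEMORY_TOOL` to `TOOLS` and register `handle_save_memory` in the dispatch table, and the model can decide for itself when something is worth remembering.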
Adding a Permission System
The leak shows five permission modes: default, auto, bypass, yolo, and deny. Each tool is classified as LOW, MEDIUM, or HIGH risk.
For your agent, implement a simple three-tier system:
```python
# Risk levels for operations
RISK_LEVELS = {
    "read_file": "low",
    "search_code": "low",
    "edit_file": "medium",
    "write_file": "medium",
    "run_command": "high",
}

def check_permission(tool_name: str, params: dict, auto_approve_low: bool = True) -> bool:
    """Check if the user approves this tool call."""
    risk = RISK_LEVELS.get(tool_name, "high")
    if risk == "low" and auto_approve_low:
        return True
    print(f"\n--- Permission check ({risk.upper()} risk) ---")
    print(f"Tool: {tool_name}")
    for key, value in params.items():
        display = str(value)[:200]
        print(f"  {key}: {display}")
    response = input("Allow? [y/n/always]: ").strip().lower()
    if response == "always":
        RISK_LEVELS[tool_name] = "low"  # Auto-approve this tool going forward
        return True
    return response == "y"
```
Testing Your Agent’s API Calls with Apidog
Building a coding agent requires frequent API calls to Claude. Debugging multi-turn tool-use conversations is tough with logs.
Apidog lets you inspect and test your agent’s requests. Here’s how to use it:
Capture and Replay API Requests
- Open Apidog and create a new project.
- Import the Anthropic Messages API endpoint: `POST https://api.anthropic.com/v1/messages`.
- Set up the request body with your system prompt, tools array, and messages.
- Replay captured requests with modified parameters to test individual turns.
This helps you isolate and debug tool-use turns, modify requests, and see how input changes affect responses.
Debug Multi-Turn Conversations
- Save the full `messages` array as an environment variable after each turn.
- Replay from any conversation point.
- Compare tool results between runs to track where behavior changes.
Validate Tool Schemas
Malformed tool schemas can cause silent failures. Import your tool schemas into Apidog and use its JSON Schema validator to catch issues before they hit the API.
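If you want a local pre-flight check as well, a few lines of Python can catch the most common mistakes before a request ever reaches the API. This is a minimal hand-rolled check, not a full JSON Schema validation:

```python
def validate_tool_schemas(tools: list) -> list:
    """Return a list of problems found in tool definitions; empty means all passed."""
    problems = []
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        # Every tool needs a name, description, and input_schema
        for field in ("name", "description", "input_schema"):
            if field not in tool:
                problems.append(f"{name}: missing '{field}'")
        schema = tool.get("input_schema", {})
        if schema.get("type") != "object":
            problems.append(f"{name}: input_schema.type must be 'object'")
        # Every required field must actually exist in properties
        props = schema.get("properties", {})
        for req in schema.get("required", []):
            if req not in props:
                problems.append(f"{name}: required field '{req}' not in properties")
    return problems
```

Run `validate_tool_schemas(TOOLS)` at startup and fail fast on any problems rather than letting the API silently ignore a malformed tool.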
Putting It All Together: The Complete REPL
Here’s a working coding agent REPL:
```python
#!/usr/bin/env python3
"""A minimal Claude Code-style coding agent."""
import anthropic
import os

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"
PROJECT_DIR = os.getcwd()

def main():
    system_prompt = build_system_prompt(PROJECT_DIR)
    memory = load_memory_index()
    if memory:
        system_prompt += f"\n\n# Memory\n{memory}"
    messages = []
    print("Coding agent ready. Type 'quit' to exit.\n")
    while True:
        user_input = input("> ").strip()
        if user_input.lower() in ("quit", "exit"):
            break
        if not user_input:
            continue
        messages.append({"role": "user", "content": user_input})
        # Compact if needed
        messages = maybe_compact(messages, system_prompt)
        # Re-inject project context every turn
        current_system = build_system_prompt(PROJECT_DIR)
        memory = load_memory_index()
        if memory:
            current_system += f"\n\n# Memory\n{memory}"
        # Run the agent loop
        result = agent_loop(current_system, TOOLS, messages)
        print(f"\n{result}\n")

if __name__ == "__main__":
    main()
```
This gives you a working agent in under 300 lines. It reads, edits, writes files, runs commands, searches codebases, manages context, and persists memory.
What to Add Next
After the core agent, consider adding:
Sub-Agents for Parallel Work
Claude Code spawns sub-agents for independent tasks. Spawn a new agent_loop() with a focused task and subset of tools, return the result.
File-Read Deduplication
Track file modification times. If unchanged since last read, skip and tell the model “file unchanged since last read”—saving tokens.
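One way to sketch this with modification-time tracking (the cache structure is my own):

```python
import os

_read_cache = {}  # path -> mtime at last read

def read_with_dedup(path: str, read_fn) -> str:
    """Skip re-reading a file whose mtime hasn't changed since the last read."""
    mtime = os.path.getmtime(path)
    if _read_cache.get(path) == mtime:
        return f"[file unchanged since last read: {path}]"
    _read_cache[path] = mtime
    return read_fn(path)
```

Wrap `handle_read_file` with this and repeated reads of a stable file cost a one-line message instead of the full contents.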
Output Truncation and Sampling
Truncate large tool outputs and report how many results were omitted. Prevents large outputs from blowing your context window.
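The head-and-tail truncation hard-coded into `handle_run_command` above can be factored into a reusable helper and applied to every tool result:

```python
def truncate_output(output: str, max_chars: int = 30000) -> str:
    """Keep the head and tail of a large tool result and report what was cut."""
    if len(output) <= max_chars:
        return output
    half = max_chars // 2
    omitted = len(output) - max_chars
    return output[:half] + f"\n\n... [{omitted} characters omitted] ...\n\n" + output[-half:]
```

Keeping both ends matters: the head usually holds the command context and the tail holds the error or final result.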
Auto-Compaction with File Re-Injection
After summarizing, re-inject recently accessed file contents (up to 5,000 tokens/file) so the agent keeps working knowledge post-compaction.
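A sketch of that re-injection step, using the article's rough 4-characters-per-token estimate (so ~5,000 tokens ≈ 20,000 characters); the function name and message format are my own:

```python
import os

def reinject_recent_files(compacted: list, recent_paths: list,
                          per_file_limit: int = 20000) -> list:
    """After compaction, re-add the contents of recently touched files."""
    sections = []
    for path in recent_paths:
        if not os.path.exists(path):
            continue
        with open(path, "r") as f:
            # Cap each file at roughly 5,000 tokens' worth of characters
            content = f.read()[:per_file_limit]
        sections.append(f"## {path}\n{content}")
    if sections:
        compacted.append({
            "role": "user",
            "content": "[Re-injected file contents]\n\n" + "\n\n".join(sections),
        })
    return compacted
```

Call it right after `maybe_compact`, passing the paths the agent read or edited most recently, so the summary plus fresh file contents restore working state.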
What We Learned from the Leak
- The core loop is simple: All complexity is in tools and context, not prompt engineering.
- Dedicated tools outperform bash: Structured, purpose-built tools give better results per token.
- Memory needs layers: Always-loaded index, on-demand files, and grep-only transcripts balance recall and context cost.
- Context management is the product: Auto-compaction, guideline re-injection, and output truncation enable long coding sessions.
- The harness is the product, not the model: The model is just intelligence. Your code provides perception, action, and memory.
To test and debug your agent’s API interactions—including multi-turn tool-use conversations, request schemas, and response validation—try Apidog free. It handles API debugging so you can focus on building your agent logic.
FAQ
Can I legally use patterns from the Claude Code leak?
Yes. The architecture (while-loop, tool dispatch) follows patterns documented in Anthropic’s API docs. Do not copy code verbatim, but recreating the architecture with your own code is standard practice.
What model should I use for a DIY coding agent?
Claude Sonnet 4.6 is a good balance of speed and capability. Opus 4.6 is better for complex decisions but is slower and more expensive. Haiku 4.5 works for basic edits/searches and is much cheaper.
How much does it cost to run your own coding agent?
A typical session (30-50 turns) with Claude Sonnet 4.6 costs $1-5 in API fees. Main driver is context window size; aggressive compaction keeps costs down.
Why does Claude Code use React for a terminal app?
Ink (React for terminals) lets the team reuse React’s component model for UI interactions. For a DIY project, a simple input() / print() REPL is enough.
What’s the most important feature to build after the core loop?
The permission system. Without it, the agent can overwrite files and run commands without user oversight. Even a basic “confirm before write/execute” prevents most accidental damage.
How does Claude Code handle errors from tool calls?
Tool errors are returned as text in the tool_result message. The model decides whether to retry, use a different approach, or ask the user.
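If you want the model to distinguish failures explicitly, the Messages API also accepts an `is_error` flag on `tool_result` blocks. A small helper, assuming (as in the handlers above) that failed calls return strings prefixed with `Error:`:

```python
def make_tool_result(tool_use_id: str, result: str) -> dict:
    """Build a tool_result block, flagging errors so the model can react to them."""
    block = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": result,
    }
    # Convention from the handlers in this guide: errors start with "Error:"
    if result.startswith("Error:"):
        block["is_error"] = True
    return block
```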
Can I use this with models other than Claude?
Yes. The tool-use pattern works with models that support function calling (GPT-4, Gemini, Llama, etc). Adapt the API call format as needed.
How do I prevent the agent from running dangerous commands?
Blocklist dangerous patterns (rm -rf /, mkfs, etc) and require explicit approval for all run_command calls. Classify each tool as LOW, MEDIUM, or HIGH risk and prompt or block accordingly.
