DEV Community

Wanda

Posted on • Originally published at apidog.com

How to Build Your Own Claude Code?

TL;DR

The Claude Code source leak exposed a 512,000-line TypeScript codebase on March 31, 2026. Its architecture is a while-loop calling the Claude API, handling tool calls, and feeding results back. You can build your own version in Python using the Anthropic SDK and around 200 lines of code for the main loop. This guide breaks down each architectural layer and gives you ready-to-use code to build your own agent.


Introduction

On March 31, 2026, Anthropic accidentally shipped a 59.8 MB source map file in version 2.1.88 of their @anthropic-ai/claude-code npm package. Source maps map minified JavaScript back to its original source, and Bun’s bundler emits them by default—making the entire TypeScript codebase recoverable.

Within hours, developers mirrored the code across GitHub. The community dissected modules, from the master agent loop to hidden features like “undercover mode” and fake tool injection.

Reactions were split—some criticized security, others studied the architecture. The most actionable question: “Can I build this myself?”

Yes. The core patterns are straightforward. This guide walks through each architectural layer, explains Anthropic’s design decisions, and gives you working code to start. You’ll also see how to debug your agent’s API interactions using Apidog, which simplifies multi-turn API debugging.

What the Leak Revealed About Claude Code’s Architecture

The Codebase at a Glance

Claude Code (codename “Tengu”) spans about 1,900 files, organized as:

cli/          - Terminal UI (React + Ink)
tools/        - 40+ tool implementations
core/         - System prompts, permissions, constants
assistant/    - Agent orchestration
services/     - API calls, compaction, OAuth, telemetry

The CLI is a React app rendered through Ink (React for terminal), using Yoga for layout and ANSI codes for styling. Every UI element is a React component.

For your own agent, you don’t need this complexity. A simple REPL loop works.

The Master Agent Loop

At its core, Claude Code runs a while-loop:

  1. Send messages to the Claude API (system prompt + tool definitions)
  2. Receive a response with text and/or tool_use blocks
  3. Execute tools by dispatching handlers
  4. Append tool results to the message list
  5. If more tool calls, loop; else, return response to user

A "turn" is one round-trip. The loop continues until the response is plain text.

Here’s a minimal Python version of the core loop:

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"

def agent_loop(system_prompt: str, tools: list, messages: list) -> str:
    """The core agent loop - keep calling until no more tool use."""
    while True:
        response = client.messages.create(
            model=MODEL,
            max_tokens=16384,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )

        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return "".join(
                block.text for block in response.content
                if hasattr(block, "text")
            )

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        messages.append({"role": "user", "content": tool_results})

Most complexity lives in tools, permissions, context, and memory—not the loop.

Building the Tool System

Why Dedicated Tools Beat a Single Bash Command

Claude Code uses dedicated tools for file operations rather than shelling out to bash. Examples:

  • Read tool (not cat)
  • Edit tool (not sed)
  • Grep tool (not grep)
  • Glob tool (not find)

Why?

  • Structured output: Tools return consistent, parseable results. Bash output is unpredictable.
  • Safety: Bash requires blocklisting dangerous patterns; purpose-built tools never touch a shell, so that whole class of risk disappears.
  • Token efficiency: Tool results can be truncated to save tokens; raw cat output wastes context.
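To make the token-efficiency point concrete, here’s a sketch of a Read-style helper that line-numbers and bounds its output where a raw `cat` would dump the whole file. The `offset`/`limit` parameters are illustrative, not Claude Code’s exact implementation:

```python
def read_numbered(path: str, offset: int = 0, limit: int = 2000) -> str:
    """Return line-numbered file contents, bounded to `limit` lines."""
    with open(path) as f:
        lines = f.readlines()
    selected = lines[offset:offset + limit]
    body = "".join(f"{offset + i + 1}\t{line}" for i, line in enumerate(selected))
    # Tell the model what was omitted instead of silently cutting off.
    if offset + limit < len(lines):
        body += f"\n... ({len(lines) - offset - limit} more lines not shown)"
    return body
```

The explicit “more lines not shown” marker matters: the model knows it can request another offset instead of assuming it saw the whole file.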

The Essential Tool Set

For a minimal agent, start with these five tools:

TOOLS = [
    {
        "name": "read_file",
        "description": "Read a file from the filesystem. Returns contents with line numbers.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {
                    "type": "string",
                    "description": "Absolute path to the file"
                },
                "offset": {
                    "type": "integer",
                    "description": "Line number to start reading from (0-indexed)"
                },
                "limit": {
                    "type": "integer",
                    "description": "Max lines to read. Defaults to 2000."
                }
            },
            "required": ["file_path"]
        }
    },
    {
        "name": "write_file",
        "description": "Write content to a file. Creates the file if it doesn't exist.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string", "description": "Absolute path"},
                "content": {"type": "string", "description": "File content to write"}
            },
            "required": ["file_path", "content"]
        }
    },
    {
        "name": "edit_file",
        "description": "Replace a specific string in a file. The old_string must be unique.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string", "description": "Absolute path"},
                "old_string": {"type": "string", "description": "Text to find"},
                "new_string": {"type": "string", "description": "Replacement text"}
            },
            "required": ["file_path", "old_string", "new_string"]
        }
    },
    {
        "name": "run_command",
        "description": "Execute a shell command and return stdout/stderr.",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Shell command to run"},
                "timeout": {"type": "integer", "description": "Timeout in seconds. Default 120."}
            },
            "required": ["command"]
        }
    },
    {
        "name": "search_code",
        "description": "Search for a regex pattern across files in a directory.",
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex pattern"},
                "path": {"type": "string", "description": "Directory to search"},
                "file_glob": {"type": "string", "description": "File pattern filter, e.g. '*.py'"}
            },
            "required": ["pattern"]
        }
    }
]

Tool Handler Dispatch

Map tool names to handler functions:

import subprocess
import os
import re

def execute_tool(name: str, params: dict) -> str:
    """Dispatch tool calls to their handlers."""
    handlers = {
        "read_file": handle_read_file,
        "write_file": handle_write_file,
        "edit_file": handle_edit_file,
        "run_command": handle_run_command,
        "search_code": handle_search_code,
    }

    handler = handlers.get(name)
    if not handler:
        return f"Error: Unknown tool '{name}'"

    try:
        return handler(params)
    except Exception as e:
        return f"Error: {str(e)}"


def handle_read_file(params: dict) -> str:
    path = params["file_path"]
    offset = params.get("offset", 0)
    limit = params.get("limit", 2000)

    with open(path, "r") as f:
        lines = f.readlines()

    selected = lines[offset:offset + limit]
    numbered = [f"{i + offset + 1}\t{line}" for i, line in enumerate(selected)]
    return "".join(numbered)


def handle_write_file(params: dict) -> str:
    path = params["file_path"]
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(params["content"])
    return f"Successfully wrote to {path}"


def handle_edit_file(params: dict) -> str:
    path = params["file_path"]
    with open(path, "r") as f:
        content = f.read()

    old = params["old_string"]
    if content.count(old) == 0:
        return f"Error: '{old[:50]}...' not found in {path}"
    if content.count(old) > 1:
        return f"Error: '{old[:50]}...' matches {content.count(old)} locations. Be more specific."

    new_content = content.replace(old, params["new_string"], 1)
    with open(path, "w") as f:
        f.write(new_content)
    return f"Successfully edited {path}"


def handle_run_command(params: dict) -> str:
    cmd = params["command"]
    timeout = params.get("timeout", 120)

    # Basic safety: block dangerous patterns
    blocked = ["rm -rf /", "mkfs", "> /dev/"]
    for pattern in blocked:
        if pattern in cmd:
            return f"Error: Blocked dangerous command pattern: {pattern}"

    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True,
        timeout=timeout, cwd=os.getcwd()
    )

    output = ""
    if result.stdout:
        output += result.stdout
    if result.stderr:
        output += f"\nSTDERR:\n{result.stderr}"
    if not output.strip():
        output = f"Command completed with exit code {result.returncode}"

    # Truncate large outputs
    if len(output) > 30000:
        output = output[:15000] + "\n\n... [truncated] ...\n\n" + output[-15000:]

    return output


def handle_search_code(params: dict) -> str:
    pattern = params["pattern"]
    path = params.get("path", os.getcwd())
    file_glob = params.get("file_glob", "")

    cmd = ["grep", "-rn", pattern, path]
    if file_glob:
        cmd.insert(2, f"--include={file_glob}")

    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)

    if not result.stdout.strip():
        return f"No matches found for pattern: {pattern}"

    lines = result.stdout.strip().split("\n")
    if len(lines) > 50:
        return "\n".join(lines[:50]) + f"\n\n... ({len(lines) - 50} more matches)"
    return result.stdout

Context Management: The Hard Problem

Why Context Matters More Than Prompt Engineering

Claude Code’s source shows more engineering went into context management than into prompts. The context compressor (minified name “wU2”) uses five strategies, but only two matter for a DIY agent:

  • Auto-compaction: When conversation approaches context limit (trigger at ~92% usage), summarize and compact.
  • CLAUDE.md re-injection: Re-inject project guidelines on every turn to keep the agent on track.

Building a Simple Compressor

def maybe_compact(messages: list, system_prompt: str, max_tokens: int = 180000) -> list:
    """Compact conversation when it gets too long."""
    total_chars = sum(
        len(str(m.get("content", ""))) for m in messages
    )
    estimated_tokens = total_chars // 4

    if estimated_tokens < max_tokens * 0.85:  # conservative vs. Claude Code's ~92% trigger
        return messages  # Not near the limit yet

    summary_response = client.messages.create(
        model=MODEL,
        max_tokens=4096,
        system="Summarize this conversation. Keep all file paths, decisions made, errors encountered, and current task state. Be specific about what was changed and why.",
        messages=messages,
    )

    summary_text = summary_response.content[0].text

    # Replace conversation with summary + recent messages
    compacted = [
        {"role": "user", "content": f"[Conversation summary]\n{summary_text}"},
        {"role": "assistant", "content": "I have the context from our previous conversation. What should I work on next?"},
    ]

    compacted.extend(messages[-4:])

    return compacted

Re-injecting Project Context

Inject .claude/CLAUDE.md into every turn:

def build_system_prompt(project_dir: str) -> str:
    """Build system prompt with project context re-injection."""
    base_prompt = """You are a coding assistant that helps with software engineering tasks.
You have access to tools for reading, writing, editing files, running commands, and searching code.
Always read files before modifying them. Prefer edit_file over write_file for existing files.
Keep responses concise. Focus on the code, not explanations."""

    claude_md_path = os.path.join(project_dir, ".claude", "CLAUDE.md")
    if os.path.exists(claude_md_path):
        with open(claude_md_path, "r") as f:
            project_context = f.read()
        base_prompt += f"\n\n# Project guidelines\n{project_context}"

    root_md = os.path.join(project_dir, "CLAUDE.md")
    if os.path.exists(root_md):
        with open(root_md, "r") as f:
            root_context = f.read()
        base_prompt += f"\n\n# Repository guidelines\n{root_context}"

    return base_prompt

The Three-Layer Memory System

The leak shows Claude Code uses a three-tier memory architecture:

Layer 1: MEMORY.md (Always Loaded)

A short index that is always in the system prompt: each entry under 150 characters, the whole file capped at 200 lines.

- [User preferences](memory/user-prefs.md) - prefers TypeScript, uses Vim keybindings
- [API conventions](memory/api-conventions.md) - REST with JSON:API spec, snake_case
- [Deploy process](memory/deploy.md) - uses GitHub Actions, deploys to AWS EKS

Layer 2: Topic Files (Loaded On Demand)

Detailed files loaded when relevant, containing conventions and architectural details.
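The leak doesn’t spell out the loader, but on-demand loading can be as simple as reading one file from the memory directory when an index entry looks relevant. A sketch, reusing the `MEMORY_DIR` layout from the code below:

```python
import os

MEMORY_DIR = ".agent/memory"

def load_topic(filename: str) -> str:
    """Load one topic file on demand; return empty string if missing."""
    path = os.path.join(MEMORY_DIR, filename)
    if not os.path.exists(path):
        return ""
    with open(path) as f:
        return f.read()
```

The agent only pays the token cost of a topic file on the turns that actually need it.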

Layer 3: Session Transcripts (Searched, Never Read)

Full session logs, not loaded wholesale; agent searches for specific identifiers.
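A sketch of that search-don’t-read pattern, assuming transcripts are plain-text logs under a hypothetical `.agent/transcripts` directory:

```python
import os
import re

TRANSCRIPT_DIR = ".agent/transcripts"  # hypothetical location

def search_transcripts(pattern: str, max_hits: int = 20) -> list[str]:
    """Grep session logs for a pattern without loading them into context."""
    hits: list[str] = []
    rx = re.compile(pattern)
    if not os.path.isdir(TRANSCRIPT_DIR):
        return hits
    for name in sorted(os.listdir(TRANSCRIPT_DIR)):
        path = os.path.join(TRANSCRIPT_DIR, name)
        with open(path) as f:
            for lineno, line in enumerate(f, 1):
                if rx.search(line):
                    hits.append(f"{name}:{lineno}: {line.strip()}")
                    if len(hits) >= max_hits:
                        return hits  # cap results to protect the context window
    return hits
```

Only matching lines ever reach the model, which is why even huge transcript histories stay cheap.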

Building a Minimal Memory System

import json

MEMORY_DIR = ".agent/memory"

def load_memory_index() -> str:
    """Load the memory index for system prompt injection."""
    index_path = os.path.join(MEMORY_DIR, "MEMORY.md")
    if os.path.exists(index_path):
        with open(index_path, "r") as f:
            return f.read()
    return ""


def save_memory(key: str, content: str, description: str):
    """Save a memory entry and update the index."""
    os.makedirs(MEMORY_DIR, exist_ok=True)

    filename = f"{key.replace(' ', '-').lower()}.md"
    filepath = os.path.join(MEMORY_DIR, filename)
    with open(filepath, "w") as f:
        f.write(f"---\nname: {key}\ndescription: {description}\n---\n\n{content}")

    index_path = os.path.join(MEMORY_DIR, "MEMORY.md")
    index_lines = []
    if os.path.exists(index_path):
        with open(index_path, "r") as f:
            index_lines = f.readlines()

    new_entry = f"- [{key}]({filename}) - {description}\n"
    updated = False
    for i, line in enumerate(index_lines):
        if filename in line:
            index_lines[i] = new_entry
            updated = True
            break
    if not updated:
        index_lines.append(new_entry)

    with open(index_path, "w") as f:
        f.writelines(index_lines)

Add a save_memory tool so the agent can persist knowledge between sessions.
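A possible schema for that tool (the parameter names simply mirror `save_memory`’s signature above):

```python
# Tool definition the model sees; parameter names mirror save_memory().
SAVE_MEMORY_TOOL = {
    "name": "save_memory",
    "description": "Persist a piece of knowledge for future sessions.",
    "input_schema": {
        "type": "object",
        "properties": {
            "key": {"type": "string", "description": "Short memory name"},
            "content": {"type": "string", "description": "Full memory body"},
            "description": {"type": "string", "description": "One-line summary for the index"},
        },
        "required": ["key", "content", "description"],
    },
}
```

Append it to `TOOLS` and add a `save_memory` entry to the dispatch table in `execute_tool` that calls `save_memory()` and returns a confirmation string.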

Adding a Permission System

The leak shows five permission modes: default, auto, bypass, yolo, and deny. Each tool is classified as LOW, MEDIUM, or HIGH risk.

For your agent, implement a simple three-tier system:

# Risk levels for operations
RISK_LEVELS = {
    "read_file": "low",
    "search_code": "low",
    "edit_file": "medium",
    "write_file": "medium",
    "run_command": "high",
}

def check_permission(tool_name: str, params: dict, auto_approve_low: bool = True) -> bool:
    """Check if the user approves this tool call."""
    risk = RISK_LEVELS.get(tool_name, "high")

    if risk == "low" and auto_approve_low:
        return True

    print(f"\n--- Permission check ({risk.upper()} risk) ---")
    print(f"Tool: {tool_name}")
    for key, value in params.items():
        display = str(value)[:200]
        print(f"  {key}: {display}")

    response = input("Allow? [y/n/always]: ").strip().lower()
    if response == "always":
        RISK_LEVELS[tool_name] = "low"  # Auto-approve this tool going forward
        return True
    return response == "y"
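To wire this into dispatch, wrap `execute_tool` so every call passes through the check first. A self-contained sketch (both inner functions are stubbed here; in your agent they are the real implementations above):

```python
def check_permission(tool_name: str, params: dict) -> bool:
    # Stub: auto-approve low-risk tools; the real version prompts the user.
    return tool_name in ("read_file", "search_code")

def execute_tool(name: str, params: dict) -> str:
    return f"ran {name}"  # stub for the real dispatcher

def execute_tool_with_permission(name: str, params: dict) -> str:
    """Gate every tool call on the permission check before dispatching."""
    if not check_permission(name, params):
        # Return the denial as a tool result so the model can adjust its plan.
        return "Error: User denied permission for this tool call."
    return execute_tool(name, params)
```

Returning the denial as an error string (rather than raising) keeps the agent loop running: the model sees the refusal and can propose an alternative.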

Testing Your Agent’s API Calls with Apidog

Building a coding agent requires frequent API calls to Claude. Debugging multi-turn tool-use conversations is tough with logs.


Apidog lets you inspect and test your agent’s requests. Here’s how to use it:

Capture and Replay API Requests

  1. Open Apidog and create a new project.
  2. Import the Anthropic Messages API endpoint: POST https://api.anthropic.com/v1/messages.
  3. Set up the request body with your system prompt, tools array, and messages.
  4. Replay captured requests with modified parameters to test individual turns.

This helps you isolate and debug tool-use turns, modify requests, and see how input changes affect responses.

Debug Multi-Turn Conversations

  • Save the full messages array as an environment variable after each turn.
  • Replay from any conversation point.
  • Compare tool results between runs to track where behavior changes.
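One low-tech way to get replayable turns, assuming a hypothetical `.agent/logs` directory: dump the full `messages` array to disk after each turn, then paste any snapshot into Apidog’s request body.

```python
import json
import os

LOG_DIR = ".agent/logs"  # hypothetical location

def log_turn(messages: list, turn: int) -> str:
    """Write the full messages array after each turn for later replay."""
    os.makedirs(LOG_DIR, exist_ok=True)
    path = os.path.join(LOG_DIR, f"turn-{turn:03d}.json")
    with open(path, "w") as f:
        # default=str handles SDK objects that aren't natively JSON-serializable
        json.dump(messages, f, indent=2, default=str)
    return path
```

Numbered snapshots make “replay from any conversation point” a copy-paste operation.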

Validate Tool Schemas

Malformed tool schemas can cause silent failures. Import your tool schemas into Apidog and use its JSON Schema validator to catch issues before they hit the API.
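You can also catch the most common mistakes locally before importing into Apidog. This is a rough stdlib lint, not a full JSON Schema validator:

```python
def lint_tool_schema(tool: dict) -> list[str]:
    """Catch common tool-schema mistakes before they hit the API."""
    problems = []
    for field in ("name", "description", "input_schema"):
        if field not in tool:
            problems.append(f"missing top-level field: {field}")
    schema = tool.get("input_schema", {})
    if schema.get("type") != "object":
        problems.append("input_schema.type must be 'object'")
    props = schema.get("properties", {})
    # A required field that isn't declared in properties is a silent failure.
    for required in schema.get("required", []):
        if required not in props:
            problems.append(f"required field '{required}' not in properties")
    return problems
```

Run it over your `TOOLS` list at startup and fail fast on a non-empty result.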

Putting It All Together: The Complete REPL

Here’s a working coding agent REPL:

#!/usr/bin/env python3
"""A minimal Claude Code-style coding agent."""

import anthropic
import os
import sys

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"
PROJECT_DIR = os.getcwd()


def main():
    system_prompt = build_system_prompt(PROJECT_DIR)
    memory = load_memory_index()
    if memory:
        system_prompt += f"\n\n# Memory\n{memory}"

    messages = []
    print("Coding agent ready. Type 'quit' to exit.\n")

    while True:
        user_input = input("> ").strip()
        if user_input.lower() in ("quit", "exit"):
            break
        if not user_input:
            continue

        messages.append({"role": "user", "content": user_input})

        # Compact if needed
        messages = maybe_compact(messages, system_prompt)

        # Re-inject project context every turn
        current_system = build_system_prompt(PROJECT_DIR)
        memory = load_memory_index()
        if memory:
            current_system += f"\n\n# Memory\n{memory}"

        # Run the agent loop
        result = agent_loop(current_system, TOOLS, messages)
        print(f"\n{result}\n")


if __name__ == "__main__":
    main()

This gives you a working agent in under 300 lines. It reads, edits, writes files, runs commands, searches codebases, manages context, and persists memory.

What to Add Next

After the core agent, consider adding:

Sub-Agents for Parallel Work

Claude Code spawns sub-agents for independent tasks. Spawn a new agent_loop() with a focused task and a subset of tools, then return only its final result to the parent.
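A minimal sketch of that pattern (`agent_loop` is stubbed here so the snippet runs standalone; in practice it’s the loop from earlier):

```python
def agent_loop(system_prompt: str, tools: list, messages: list) -> str:
    return "subtask done"  # stub: the real loop calls the Claude API

def run_subagent(task: str, tools: list) -> str:
    """Run an isolated sub-agent and return only its final answer."""
    sub_messages = [{"role": "user", "content": task}]
    sub_prompt = "You are a focused sub-agent. Complete only this task, then stop."
    # The parent sees one string, not the sub-agent's whole transcript,
    # which keeps the parent's context window small.
    return agent_loop(sub_prompt, tools, sub_messages)
```

The key design choice is isolation: the sub-agent’s message history is thrown away, so its exploration never bloats the parent’s context.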

File-Read Deduplication

Track file modification times. If unchanged since last read, skip and tell the model “file unchanged since last read”—saving tokens.
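A sketch using a module-level mtime cache (the wording of the “unchanged” message is illustrative):

```python
import os

_read_mtimes: dict[str, float] = {}

def read_if_changed(path: str) -> str:
    """Skip re-reading files that haven't changed since the last read."""
    mtime = os.path.getmtime(path)
    if _read_mtimes.get(path) == mtime:
        # Short message instead of the full file saves tokens on re-reads.
        return f"{path}: unchanged since last read"
    _read_mtimes[path] = mtime
    with open(path) as f:
        return f.read()
```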

Output Truncation and Sampling

Truncate large tool outputs and report how many results were omitted. Prevents large outputs from blowing your context window.

Auto-Compaction with File Re-Injection

After summarizing, re-inject recently accessed file contents (up to 5,000 tokens/file) so the agent keeps working knowledge post-compaction.
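A sketch of that re-injection step, appended to the compacted message list. The character cap is my approximation of the leaked 5,000-token figure using the rough 4-chars-per-token heuristic, not the actual constant:

```python
import os

def reinject_files(compacted: list, recent_paths: list, char_cap: int = 20000) -> list:
    """Append recently accessed file contents after compaction (sketch)."""
    blocks = []
    for path in recent_paths:
        if not os.path.exists(path):
            continue  # skip deleted or moved files
        with open(path) as f:
            content = f.read()[:char_cap]  # cap per-file contribution
        blocks.append(f"## {path}\n{content}")
    if blocks:
        compacted.append({
            "role": "user",
            "content": "[Recently accessed files]\n" + "\n\n".join(blocks),
        })
    return compacted
```

Call it right after `maybe_compact()` with the paths the agent touched most recently.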

What We Learned from the Leak

  • The core loop is simple: All complexity is in tools and context, not prompt engineering.
  • Dedicated tools outperform bash: Structured, purpose-built tools give better results per token.
  • Memory needs layers: Always-loaded index, on-demand files, and grep-only transcripts balance recall and context cost.
  • Context management is the product: Auto-compaction, guideline re-injection, and output truncation enable long coding sessions.
  • The harness is the product, not the model: The model is just intelligence. Your code provides perception, action, and memory.

To test and debug your agent’s API interactions—including multi-turn tool-use conversations, request schemas, and response validation—try Apidog free. It handles API debugging so you can focus on building your agent logic.

FAQ

Can I legally use patterns from the Claude Code leak?

Yes. The architecture (while-loop, tool dispatch) follows patterns documented in Anthropic’s API docs. Do not copy code verbatim, but recreating the architecture with your own code is standard practice.

What model should I use for a DIY coding agent?

Claude Sonnet 4.6 is a good balance of speed and capability. Opus 4.6 is better for complex decisions but is slower and more expensive. Haiku 4.5 works for basic edits/searches and is much cheaper.

How much does it cost to run your own coding agent?

A typical session (30-50 turns) with Claude Sonnet 4.6 costs $1-5 in API fees. Main driver is context window size; aggressive compaction keeps costs down.

Why does Claude Code use React for a terminal app?

Ink (React for terminals) lets the team reuse React’s component model for UI interactions. For a DIY project, a simple input() / print() REPL is enough.

What’s the most important feature to build after the core loop?

The permission system. Without it, the agent can overwrite files and run commands without user oversight. Even a basic “confirm before write/execute” prevents most accidental damage.

How does Claude Code handle errors from tool calls?

Tool errors are returned as text in the tool_result message. The model decides whether to retry, use a different approach, or ask the user.

Can I use this with models other than Claude?

Yes. The tool-use pattern works with any model that supports function calling (GPT-4, Gemini, Llama, etc.). Adapt the API call format as needed.

How do I prevent the agent from running dangerous commands?

Blocklist dangerous patterns (rm -rf /, mkfs, etc) and require explicit approval for all run_command calls. Classify each tool as LOW, MEDIUM, or HIGH risk and prompt or block accordingly.
