ONE WALL AI Publishing
I Used 510,000 Lines of Leaked Claude Code to Build a Local AI Agent on a Consumer GPU

Last month, Anthropic accidentally leaked 510,000 lines of Claude Code's TypeScript source via an npm packaging error. Inside was the complete architecture of how Claude Code manages tools, memory, and model behavior.

I took those design principles and applied them to a 6.6GB open-source model (qwen3.5:9b) running on my RTX 5070 Ti. No cloud API. No monthly fees. Everything runs locally.

What I Built

A 605-line Python engine that turns a local 9B model into a working AI agent. It can:

  • Read your code and do professional-level code review (found 8 real bugs in my 800-line production script)
  • Write complete Python projects with tests and run them
  • Debug code — found and fixed 3 intentionally planted bugs in 44 seconds
  • Search the web and compile structured reports
  • Handle errors autonomously — when pip install failed, it found --break-system-packages on its own

18 tests. Zero failures. On a consumer GPU.
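At its core, an engine like this is a loop: send the conversation to the model, execute any tool it requests, feed the result back, and stop when it answers in plain text. Here is a minimal sketch of that loop; the names (`run_agent`, `TOOLS`, the reply shape) are illustrative assumptions, not the actual API of the repo.

```python
# Minimal agent-loop sketch. The reply protocol here is an assumption:
# the model returns either {"tool": name, "args": {...}} or {"text": ...}.

TOOLS = {
    # Stub tool standing in for a real file-listing implementation.
    "list_files": lambda path: f"files in {path}: a.py, b.py",
}

def run_agent(model, task, max_steps=6):
    """Drive `model` (any callable: history -> reply dict) until it
    produces plain text or hits the step cap."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model(history)
        if "tool" in reply:
            # Model asked for a tool call: run it, append the result.
            result = TOOLS[reply["tool"]](**reply["args"])
            history.append({"role": "tool", "content": result})
        else:
            # Plain text means the agent considers the task done.
            return reply["text"]
    return "step limit reached"
```

In the real engine the `model` callable would wrap a local Ollama chat request; here it can be any function, which also makes the loop easy to unit-test with a stub.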

The Key Insight: Model Intelligence ≠ Agent Capability

I tested 4 models. Google's Gemma 4 was faster (144 tok/s vs. 106), more token-efficient (it used 14x fewer tokens), and had better tool-selection accuracy (5/5 vs. 3/5).

But when I plugged it into the full engine? It refused to use tools. Zero tool calls on the first task. Zero output.

qwen3.5:9b won because it's obedient, not because it's smarter. It follows the shell's discipline — hard cutoffs, structured prompts, memory management — while Gemma 4 ignored them.

Choose the most disciplined model, not the smartest one.

13 Optimizations from the Leaked Architecture

Every optimization was A/B tested:

| # | Optimization | Effect |
|---|---|---|
| 1 | Structured prompts | +600% quality |
| 2 | MicroCompact compression | +500% context efficiency |
| 3 | think=false | +800% token efficiency |
| 4 | ToolSearch deferred loading | -60% prompt space |
| 5 | 4-type memory system | Personalized responses |
| 6 | Write-then-verify | Prevents memory pollution |
| 7-13 | Hard cutoff, parallel boot, cache optimization... | Various |
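To make one of these concrete: a MicroCompact-style compressor keeps the most recent turns verbatim and squashes everything older into a single summary message, so the prompt stops growing linearly with conversation length. This is a rough sketch under my own assumptions; the thresholds and message shape are illustrative, not taken from the leaked source.

```python
def microcompact(history, keep_last=4, max_summary_chars=200):
    """Compress old turns into one summary message, keep recent turns intact.

    Illustrative MicroCompact-style sketch: `keep_last` and
    `max_summary_chars` are made-up knobs, not the real defaults.
    """
    if len(history) <= keep_last:
        return history  # nothing worth compacting yet
    old, recent = history[:-keep_last], history[-keep_last:]
    # Naive summary: concatenate and truncate. A real implementation
    # would ask the model itself to summarize the old turns.
    summary = " | ".join(m["content"] for m in old)[:max_summary_chars]
    return [{"role": "system", "content": f"[compacted] {summary}"}] + recent
```

The design choice that matters is the asymmetry: recent context is preserved exactly (the model needs it to act), while distant context only needs to survive as gist.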

The Real Ceiling: Not Intelligence — Self-Discipline

The 9B model can do code review on 800 lines of bash. But it can't follow "stop reading at step 6 and write your report."

The fix came straight from the leaked source: the model thinks, the shell disciplines. Remove all tools at step N+1, force text output. From 0 words to 6,080 bytes with one line of code: tools=None.
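The pattern is simple enough to sketch: run the tool loop up to a hard step cutoff, then make one final call with the tool list stripped, so the only thing the model can emit is text. The function name and reply shape below are assumptions for illustration, not the repo's actual API.

```python
def run_with_cutoff(model, task, tools, cutoff=6):
    """Run the tool loop until `cutoff`, then force a text-only final turn.

    Sketch of the 'model thinks, shell disciplines' pattern; `model` is
    any callable (task, tools) -> reply dict with "text" or "result".
    """
    for _ in range(cutoff):
        reply = model(task, tools=tools)
        if "text" in reply:
            return reply["text"]  # model finished on its own
        # Fold the tool result back into the working context.
        task += f"\n[tool result] {reply.get('result', '')}"
    # Hard cutoff reached: remove every tool so the only legal
    # move left is writing the report.
    return model(task, tools=None)["text"]
```

The shell, not the prompt, enforces the deadline: no amount of "please stop reading" in the system prompt was as reliable as making tool calls impossible.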

Everything is Open Source

  • Book: "Local AI Agent Playbook" — full walkthrough with real test data ($19.99)
  • Code: GitHub repo — 605-line engine, 44 tool definitions, Telegram bot, all test data (FREE, MIT License)
  • 44 tools: Inspired by Claude Code's architecture, covering file ops, git, web, docker, system monitoring, and more

The code is free. The book teaches you why each decision was made and what traps to avoid.

Get Started

```bash
git clone https://github.com/jack19880620/local-agent-playbook
cd local-agent-playbook
ollama pull qwen3.5:9b
python3 my-agent.py "list files in /tmp"
```

If you build something with it, I'd love to see it.


25% affiliate commission available on Gumroad if you want to share the book.
