Last week, a packaging mistake at Anthropic leaked 512,000 lines of Claude Code's source. Within 48 hours, someone rebuilt the entire agent harness from scratch. That project — Claw Code — just crossed 100,000 GitHub stars.
I installed it, pointed it at three of my projects, and ran it alongside Claude Code to see what holds up.
What Claw Code Actually Is
Claw Code is a Python/Rust rewrite of the agent architecture behind Claude Code. It reads your codebase, plans edits, runs terminal commands, and iterates until the task is done. The key difference: it works with any LLM — GPT-4.1, Claude via API, Gemini, even local models through Ollama.
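That read-plan-act-iterate cycle is simpler than it sounds. The sketch below is my own minimal reconstruction of what such a loop looks like, not Claw Code's actual code; the function names and the DONE convention are hypothetical, and the model is a stub rather than a real LLM call:

```python
from typing import Callable

def agent_loop(
    task: str,
    complete: Callable[[str], str],   # any text-in/text-out LLM call
    run_tool: Callable[[str], str],   # e.g. run a shell command, apply an edit
    max_steps: int = 10,
) -> str:
    """Plan -> act -> observe until the model declares it is finished."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        action = complete(transcript)        # model proposes the next action
        if action.startswith("DONE"):
            return action
        observation = run_tool(action)       # execute it, capture the result
        transcript += f"\nAction: {action}\nObservation: {observation}"
    return "GAVE_UP"

# Stub model: proposes one patch, then declares done once it sees an observation.
def fake_model(transcript: str) -> str:
    return "DONE: patched" if "Observation:" in transcript else "apply_patch fix.diff"

result = agent_loop("fix null check", fake_model, lambda cmd: f"ran {cmd}: ok")
```

Because `complete` is just a callable taking and returning text, swapping GPT-4.1 for Claude or a local Ollama model is a one-line change at the call site, which is exactly what makes the harness model-agnostic.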
The project calls itself a "clean-room reimplementation," meaning the developers studied the leaked architecture and rebuilt without copying Anthropic's proprietary code. Whether that distinction matters legally is still an open question.
How It Performed
Simple bug fix (missing null check in a Prisma query): Claw Code found and fixed it in 47 seconds. Claude Code did it in 30. Close enough for a v0.3 project.
Multi-file feature (adding a user preference toggle across 5 files): Claw Code got 4 out of 5 right on the first pass. It missed the test file. Claude Code nailed all 5.
Major refactoring (extracting a payment service from a 2,000-line monolith): This is where things broke down. Claw Code got stuck in a loop, repeatedly trying to import a module it hadn't created yet. After three restarts it succeeded, but took roughly 8x longer than Claude Code overall.
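A cheap guard against that failure mode is to notice when the agent keeps proposing the same action. This is my own sketch, not something Claw Code currently implements:

```python
from collections import deque

def is_stuck(recent_actions: deque, window: int = 3) -> bool:
    """Flag a loop when the same action repeats `window` times in a row,
    e.g. retrying an import of a module that was never created."""
    if len(recent_actions) < window:
        return False
    tail = list(recent_actions)[-window:]
    return len(set(tail)) == 1  # all identical -> we're going in circles

actions = deque(maxlen=10)
for a in ["edit main.py", "import payments", "import payments", "import payments"]:
    actions.append(a)
```

Here `is_stuck(actions)` returns `True` after the third identical attempt, which is the point where the harness should back off and replan instead of making the user restart it manually.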
Where It Falls Short
- Multi-agent orchestration crashes on complex tasks
- No rollback mechanism — bad edits mean manual git revert
- Windows support is flaky (path handling bugs)
- Memory management is primitive — it truncates rather than compresses context, so it "forgets" earlier parts of long sessions
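The truncation problem is easy to illustrate. A drop-the-oldest strategy (names hypothetical, my own minimal sketch of the behavior described above) looks like this:

```python
def truncate_context(messages: list[str], budget: int) -> list[str]:
    """Naive memory strategy: drop whole messages, oldest first, until the
    total character count fits the budget. Early instructions vanish entirely,
    which is why long sessions 'forget' their original constraints."""
    kept = list(messages)
    while kept and sum(len(m) for m in kept) > budget:
        kept.pop(0)  # discard the oldest message wholesale
    return kept

history = [
    "system: always use tabs",   # 23 chars
    "user: add tests",           # 15 chars
    "tool: " + "x" * 100,        # 106 chars of diff output
]
```

With a budget of 110 characters, both the system instruction and the user request get dropped and only the bulky tool output survives. A smarter harness would compress (summarize) old turns instead of deleting them, so the instruction "always use tabs" persists in condensed form.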
The Model-Agnostic Advantage
The one area where Claw Code clearly wins: LLM flexibility. I ran it with GPT-4.1, Claude Sonnet 4.6 (via API), and a local Qwen3 30B through Ollama. Each worked, though quality varied dramatically. For teams locked into a specific LLM provider or running sensitive code that can't leave their network, this matters.
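The plumbing for this flexibility is conceptually thin: a registry mapping provider names to interchangeable text-in/text-out callables. The entries below are stubs standing in for real vendor SDK or Ollama HTTP calls; the registry layout is my assumption, not Claw Code's actual config format:

```python
from typing import Callable, Dict

# Hypothetical registry: in a real harness each entry would wrap a vendor SDK
# (OpenAI, Anthropic, Google) or a local Ollama endpoint. Stubs shown here.
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "openai":    lambda prompt: f"[openai] {prompt}",
    "anthropic": lambda prompt: f"[anthropic] {prompt}",
    "ollama":    lambda prompt: f"[ollama] {prompt}",
}

def complete(provider: str, prompt: str) -> str:
    """Route a prompt to whichever backend the user configured."""
    try:
        return PROVIDERS[provider](prompt)
    except KeyError:
        raise ValueError(f"unknown provider: {provider}")
```

For the sensitive-code case, pointing the registry at `"ollama"` keeps every token on the local network; nothing in the rest of the harness has to change.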
Should You Try It?
If you want to understand how AI coding agents work under the hood — absolutely. The codebase is readable, well-structured, and teaches you more about agent architecture than any blog post.
If you need a reliable daily driver — stick with Claude Code, Cursor, or Codex. Claw Code is a fascinating experiment, not a production tool.
For a detailed comparison with benchmarks and methodology, I wrote a full review here.