Teruo Kunihiro
What I Learned from Reading Claude Code’s Reconstructed Source


Around March 31, 2026, it became widely known that parts of Claude Code CLI’s implementation could be reconstructed from source maps that had remained in the npm package. A public mirror circulated for a while, but it was not an official open-source release by Anthropic, and it has since turned into a different project.

This post is a memo of my own impressions after reading a reconstructed copy of the source that I had saved locally at the time. Rather than discussing the current state of any public mirror, I want to focus on the design characteristics that became visible from actually tracing through the code.

My first impression: this is a much larger product codebase than I expected

The first thing that surprised me was the sheer size of the codebase. In the reconstructed source I had on hand, there were roughly 1,900 files and about 510,000 lines of code. This is not a small single-purpose CLI. It is a fairly large product codebase that bundles terminal UI, tool execution, safety controls, IDE integration, memory, and extension mechanisms into one system.

Technically, the project appears to be centered on TypeScript, with Bun as the runtime and a React/Ink-style stack for the terminal UI. In other words, it felt less like “a small CLI with some AI added on top” and more like “a substantial TypeScript product with an AI experience layered into it.”

The prompts live on the client side more than I expected

One of the easiest things to start tracing in this codebase is prompt construction. At least within the portion that could be reconstructed, a surprisingly large part of the system instruction layer is present in the client-side code, where runtime context is then injected into it.

That runtime context includes things like the current date, Git state, recent commits, Git user information, and the contents of local instruction files. On top of that foundation, additional instructions and memory-related text are composed into something close to the final system prompt.
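To make the pattern concrete, here is a minimal sketch of what that kind of client-side assembly looks like. Every name here is hypothetical; this illustrates the shape of the pattern described above, not the actual code.

```typescript
// Hypothetical sketch of client-side system prompt assembly.
// None of these names are taken from the real codebase.
interface RuntimeContext {
  currentDate: string;        // e.g. an ISO date string
  gitBranch: string;          // current Git branch
  gitUser: string;            // from git config
  recentCommits: string[];    // short log of recent commits
  instructionFiles: string[]; // contents of local instruction files
}

function buildSystemPrompt(base: string, ctx: RuntimeContext): string {
  const sections = [
    base,
    `Today's date: ${ctx.currentDate}`,
    `Git branch: ${ctx.gitBranch} (user: ${ctx.gitUser})`,
    `Recent commits:\n${ctx.recentCommits.map((c) => `- ${c}`).join("\n")}`,
    ...ctx.instructionFiles,
  ];
  // The static instruction layer and the injected runtime context
  // are composed into something close to the final system prompt.
  return sections.filter((s) => s.length > 0).join("\n\n");
}
```

The point is not the specific helper, but that this composition step is visible in client code rather than hidden behind the API.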

What I found especially interesting was that the intuitive assumption that “the real prompt must be assembled as a black box on the server side” did not seem to hold very well here, at least not within the portion of the code I could inspect. That does not prove there is no additional server-side processing, of course. But it does show that a significant amount of the prompt logic also exists on the client side.

In tool design, what matters is not the number of tools but how they are exposed and controlled

Another striking part of the design is the layer that decides which tools are visible to the model and the separate layer that manages execution permissions. The system is clearly feature-rich, but there is a fairly sharp distinction between tools that are exposed routinely and tools that are internal, behind feature flags, or otherwise conditionally enabled.

My impression was fairly simple: this codebase does not look like it was built around the idea that “more tools automatically make the system stronger.” If anything, it seems closer to the opposite view: the surface that is exposed to the model in normal operation should be kept as narrow as possible.

There are also implementation details suggesting that the tool list itself has to stay aligned with prompt caching. That means the number of tools and their schemas are not just implementation details; they appear to be part of stable prompt operation as well.
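The visibility layer and the caching constraint together suggest a pattern roughly like the following. This is my own sketch under those assumptions, with invented names, not the real implementation.

```typescript
// Hypothetical sketch: deciding which tools the model can see.
// All names are illustrative, not taken from the codebase.
interface ToolSpec {
  name: string;
  internal: boolean;    // never exposed to the model directly
  featureFlag?: string; // only exposed when this flag is enabled
}

function visibleTools(all: ToolSpec[], enabledFlags: Set<string>): ToolSpec[] {
  return all
    .filter((t) => !t.internal)
    .filter((t) => t.featureFlag === undefined || enabledFlags.has(t.featureFlag))
    // A stable ordering keeps the serialized tool list identical
    // across turns, which matters if the tool schemas form part of
    // a cached prompt prefix.
    .sort((a, b) => a.name.localeCompare(b.name));
}
```

Under this reading, a tool appearing or disappearing between turns is not just a UX detail; it would invalidate the cached prefix, which is exactly why the exposed surface wants to stay small and stable.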

This lines up quite well with the increasingly common intuition that “fewer tools often lead to more stable behavior.” That said, this is my interpretation of the code, not an explicit principle written down in those exact words.

Bash is not “just a way to run shell commands”

The shell execution layer was one of the most memorable parts of the codebase for me. What is going on there is not simply command execution.

Commands are categorized into groups such as search-oriented commands, read-oriented commands, listing commands, and commands where silence on success is the natural behavior. Exit codes are also normalized in command-specific ways: for example, an exit code of 1 from a grep-like command is not always treated as a plain error; it can be reinterpreted as “no match found.”
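The grep case can be sketched in a few lines. This is a toy illustration of the normalization idea, with made-up names; the convention itself (grep exits with 1 when it finds nothing) is standard shell behavior.

```typescript
// Hypothetical sketch of command-aware exit-code normalization.
type CommandOutcome =
  | { kind: "ok"; stdout: string }
  | { kind: "no-match" }
  | { kind: "error"; code: number };

// For grep-like commands, exit code 1 conventionally means
// "ran fine, found nothing" rather than a failure.
const EXIT_1_MEANS_NO_MATCH = new Set(["grep", "rg", "ag"]);

function normalizeExit(cmd: string, code: number, stdout: string): CommandOutcome {
  if (code === 0) return { kind: "ok", stdout };
  if (code === 1 && EXIT_1_MEANS_NO_MATCH.has(cmd)) return { kind: "no-match" };
  return { kind: "error", code };
}
```

Without this kind of layer, a model would see “grep exited nonzero” and might retry or apologize for a command that actually worked.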

On top of that, commands that are considered read-only are guarded by allowlist-based flag checks, path validation, sed-specific restrictions, sandbox eligibility checks, and even AST-based safety checks. For more complex compound commands, there are also explicit upper bounds on the fan-out of the safety analysis.
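An allowlist-based flag check is the simplest of those guards to illustrate. The sketch below is my own reduction of the idea, with hypothetical commands, flags, and a deliberately crude path check; the real checks described above are far more elaborate.

```typescript
// Hypothetical sketch of an allowlist-based check for commands
// that are supposed to be read-only. Only flags known to be safe
// pass; everything unknown is rejected by default.
const SAFE_FLAGS: Record<string, Set<string>> = {
  ls: new Set(["-l", "-a", "-h"]),
  cat: new Set(["-n"]),
};

function isReadOnlyInvocation(cmd: string, args: string[]): boolean {
  const allowed = SAFE_FLAGS[cmd];
  if (!allowed) return false; // unknown command: never assumed read-only
  return args.every((arg) => {
    if (arg.startsWith("-")) return allowed.has(arg); // flags must be allowlisted
    return !arg.includes(".."); // crude stand-in for real path validation
  });
}
```

The design choice worth noticing is the default: a flag or command that is not explicitly known to be safe fails the check, rather than the other way around.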

So while Bash is clearly a powerful general-purpose tool inside Claude Code, it does not look like something the model is given raw. Instead, it seems to sit on top of a fairly thick deterministic scaffold before the model is allowed to use it.

The comments are unusually good

Another thing that stood out was the quality of the comments. By that, I do not just mean that there are many comments.

In several places, the comments explain not only what the code is doing but why certain decisions were made: why a heavy operation needs to run before imports, why a given validator is necessary, or why a particular flag should not be treated as safe. They carry background reasoning, not just surface-level description.

That makes the code easier for humans to follow, of course, but it also felt like the sort of writing that would remain legible to future code-completion systems or coding agents as well.

People often say these days that comments should be kept to a minimum. But reading code like this is a good reminder that good comments are not clutter. They are part of the design.

Even the startup path shows product-level polish

Looking around the entry path, it becomes clear that this product is not only concerned with adding features. It is also carefully tuned around perceived performance. The code is explicit about which side effects should run before heavier imports and what can be parallelized to reduce startup latency.
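Abstracted away from any real module names, that startup shape looks roughly like this. The function signature and loaders here are entirely hypothetical; the pattern is simply “cheap order-sensitive side effects first, then heavy loads in parallel.”

```typescript
// Hypothetical sketch of the startup pattern: run cheap,
// order-sensitive side effects first, then load heavy modules
// concurrently so neither blocks the other.
type Loader<T> = () => Promise<T>;

async function startup<U, T>(
  installHandlers: () => void, // must run before anything heavy loads
  loadUi: Loader<U>,           // e.g. () => import("./ui.js")
  loadTools: Loader<T>,        // e.g. () => import("./tools.js")
): Promise<[U, T]> {
  installHandlers();
  // Both dynamic imports are kicked off at the same time; total
  // startup latency is the max of the two, not the sum.
  return Promise.all([loadUi(), loadTools()]);
}
```

In a codebase of this size, making those ordering decisions explicit in the entry path, instead of letting static imports decide, is itself a deliberate engineering choice.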

When people talk about AI agents, attention tends to go first to prompts and loops. But in practice, details like startup optimization and other non-AI engineering work are often what determine how polished the product feels.

“Being visible” is not the same thing as “being open source”

Finally, I want to emphasize the most important point.

What became visible in this case was that some source code could be read because of the way published artifacts were left exposed. That is not the same thing as Anthropic officially releasing Claude Code as open source.

Those two things need to be kept clearly separate. Anthropic’s current terms include restrictions aimed at preventing the construction of competing products, service replication, and reverse engineering. So treating this as an interesting code-reading exercise is one thing; assuming that the code can therefore be freely reused or redistributed is something else entirely.

There is value in reading it. But “readable” and “freely usable” are not the same thing, and it is important not to blur that distinction.

Conclusion

What made this source-reading exercise interesting was not a generic takeaway like “Claude Code runs an agentic loop.” The more interesting part was seeing, in concrete form, which parts were made deterministic, which parts were injected as runtime context, and where the safety mechanisms were made deliberately thick.

At least within the portion that could be reconstructed, the prompts were more client-side than I expected, Bash was more heavily guarded than I expected, the tool surface was narrower than I expected, and the comments were more thoughtful than I expected. The overall codebase is well organized, but at the same time it still has a little of the human roughness you would expect in a real product—for example, the way prompt construction seems to be spread across multiple layers.

That mix of order and messiness is part of what makes the codebase interesting to me. In the end, that is what I wanted to capture in this memo.
