In March 2026, Anthropic accidentally published a source map in an npm package.
That map pointed to a zip on R2. That zip contained unobfuscated TypeScript for Claude Code. Someone posted it on X. And suddenly, a lot of us had front-row seats to how a modern coding agent is actually built.
I spent some time digging through the code, docs, and community notes. This post is the version I wish I had on day one: practical, opinionated.
No hype. Just architecture, tradeoffs, and what this means if you build AI tooling.
Quick take
Claude Code is not magic. It is a very disciplined system around a very capable model.
The model is the brain. The product quality comes from the nervous system: loop orchestration, permissions, context compaction, tool contracts, caching, retries, and UI responsiveness.
You can clone the outer loop in a weekend. You cannot clone the reliability story in a weekend.
The moment it clicked for me
I had that "oh" moment while tracing one simple task: "fix failing tests."
On paper, that's one sentence. In production, it's a chain of fragile decisions:
- Decide to run tests first without being told.
- Parse noisy failures and isolate relevant files.
- Search a big repo without getting lost.
- Make a minimal edit instead of a destructive rewrite.
- Re-run tests and interpret second-order failures.
- Recover when the first fix was wrong.
If you've ever built an internal "code agent" prototype, you know where this usually dies. Not on step one. On step four or five, when context is bloated, terminal output is huge, and one flaky command derails the run.
That is where Claude Code is strongest: not in the first answer, but in the 17th decision.
The architecture at a glance
If you remember one thing from this article, remember this: the loop is the skeleton, but the guardrails and context strategy are the organs.
Start: the first 200 ms are engineered, not lucky
When you run claude, startup does something I love from a DevRel perspective: it optimizes what users feel, not just what engineers measure.
Before heavy modules fully finish loading, it can kick off parallel work like:
- policy lookups (MDM/enterprise settings)
- secure token retrieval
- warming a connection to API infrastructure
That means by the time your first prompt lands, expensive setup has already started.
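The pattern is simple to sketch: fire the slow promises immediately and only await them when the first prompt actually needs them. This is an illustrative sketch, not Claude Code's real startup code; the function names and return values are invented.

```typescript
// Start slow setup tasks eagerly; await them lazily.
type Warmup = { policy: Promise<string>; token: Promise<string> };

function startWarmup(): Warmup {
  const fetchPolicy = async () => "policy:default"; // stand-in for an MDM/enterprise lookup
  const loadToken = async () => "token:cached";     // stand-in for secure token retrieval
  // Kick both off immediately; nothing awaits them yet.
  return { policy: fetchPolicy(), token: loadToken() };
}

async function handleFirstPrompt(w: Warmup, prompt: string): Promise<string> {
  // By the time the user submits, these promises have usually already resolved,
  // so this await costs close to nothing.
  const [policy, token] = await Promise.all([w.policy, w.token]);
  return `${prompt} (${policy}, ${token})`;
}
```

The key property: the latency the user perceives is the maximum of the remaining work, not the sum of all of it.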
Analogy: this is the restaurant that starts warming your plate while your order is still being typed. You call it a "fast kitchen," but really it's choreography.
Bootstrap: boring systems that prevent expensive pain
Initialization has all the things seasoned platform teams eventually add:
- schema-validated config with migrations
- auth and token lifecycle handling
- telemetry and feature flags
- environment-specific policy resolution
None of this is flashy. All of this is what keeps large org rollouts from becoming support tickets.
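To make "schema-validated config with migrations" concrete, here is a minimal hand-rolled sketch. The field names (`theme`, `telemetry`, a v1 `dark` flag) are hypothetical; the point is the shape: migrate first, then validate, then hand typed config to the rest of the app.

```typescript
// Config after migration to the current schema version.
interface ConfigV2 { version: 2; theme: string; telemetry: boolean }

function migrate(raw: Record<string, unknown>): Record<string, unknown> {
  // Hypothetical v1 stored `dark: boolean`; v2 stores a `theme` string.
  if (raw.version === 1) {
    return { version: 2, theme: raw.dark ? "dark" : "light", telemetry: false };
  }
  return raw;
}

function validate(raw: Record<string, unknown>): ConfigV2 {
  const c = migrate(raw);
  const theme = c.theme;
  const telemetry = c.telemetry;
  if (c.version === 2 && typeof theme === "string" && typeof telemetry === "boolean") {
    return { version: 2, theme, telemetry };
  }
  // Fail loudly at startup instead of mysteriously mid-session.
  throw new Error("invalid config");
}
```

Validation at the boundary means the rest of the codebase never touches an untyped config object.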
One detail I found especially smart is build-time elimination for disabled features. If a feature flag is off, code can be stripped from shipped artifacts rather than merely gated at runtime.
Why this matters:
- smaller runtime surface
- fewer hidden interactions
- safer gradual rollout
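The mechanism is worth seeing. If a bundler (esbuild's `define`, webpack's DefinePlugin, or similar) replaces a flag with a literal at build time, the minifier can delete the dead branch entirely. The flag name below is invented:

```typescript
// Imagine the bundler substituting a literal here at build time.
const FEATURE_BETA_PANEL = false as boolean;

function renderPanels(): string[] {
  const panels = ["editor", "terminal"];
  if (FEATURE_BETA_PANEL) {
    // With the flag compiled to a literal `false`, a minifier can strip this
    // branch, and anything only it imports, from the shipped artifact.
    panels.push("beta-panel");
  }
  return panels;
}
```

A runtime gate still ships the code; a build-time gate means disabled features literally do not exist in the artifact.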
Inside the QueryEngine
People say "it's just an agent loop." Sure.
So is saying "a database is just read and write."
The core loop is conceptually small:
- append user input
- build system and user context
- stream model output
- execute tool calls if present
- feed tool output back and continue
- stop when final response is produced
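Those six steps fit in a screenful of code, which is exactly why people underrate the rest. A deliberately naive sketch (model and tools are stubbed; none of this is Claude Code's actual implementation):

```typescript
type Msg = { role: "user" | "assistant" | "tool"; content: string };
type ModelReply = { text: string; toolCall?: { name: string; input: string } };

function runLoop(
  userInput: string,
  model: (history: Msg[]) => ModelReply,
  tools: Record<string, (input: string) => string>,
  maxTurns = 8
): string {
  const history: Msg[] = [{ role: "user", content: userInput }]; // 1. append user input
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = model(history);                                 // 2-3. build context, get output
    history.push({ role: "assistant", content: reply.text });
    if (!reply.toolCall) return reply.text;                       // 6. final response, stop
    const out = tools[reply.toolCall.name](reply.toolCall.input); // 4. execute tool call
    history.push({ role: "tool", content: out });                 // 5. feed result back, continue
  }
  throw new Error("turn limit reached");
}
```

Note what this toy version lacks: streaming, retries, permission checks, parallel tools, compaction. That gap is the rest of the product.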
The hard part is everything around those six steps:
- streaming that survives partial network failures
- retry behavior that does not duplicate dangerous actions
- concurrency control for tools that can safely run in parallel
- token and cost accounting
- context compaction under pressure
- UX that stays responsive during long-running commands
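Take just one item from that list, retries that do not duplicate dangerous actions. One common approach (an assumption here, not Claude Code's confirmed scheme) is an idempotency key: record that an action ran before retrying the surrounding request.

```typescript
// Actions already executed this session, keyed by a stable identifier.
const executed = new Set<string>();

async function runOnce(key: string, action: () => Promise<string>): Promise<string> {
  if (executed.has(key)) return "skipped: already executed";
  executed.add(key); // mark before running so a mid-flight retry cannot double-fire
  return action();
}

async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try { return await fn(); } catch (e) { lastErr = e; }
  }
  throw lastErr;
}
```

Retrying a flaky network call is cheap; retrying `rm -rf` is a postmortem. The key makes the two distinguishable.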
Think of QueryEngine as an air traffic control tower. Planes landing is the easy part. Preventing collisions, handling weather, and rerouting under stress is the job.
Why context engineering is the hidden moat
Most people underestimate this.
Context windows are finite and agent sessions are greedy. Every file read, terminal dump, and tool result consumes budget.
Claude Code's split is simple and very effective:
- stable session metadata goes where caching benefits most
- rapidly changing memory and turn-specific data stay separate
That design improves cache hit rates and reduces reprocessing overhead.
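The reason the split works: prompt caches match on a byte-stable prefix. Keep everything that never changes at the front, and everything volatile at the back. A sketch with invented field names:

```typescript
interface Session { systemPrompt: string; toolSchemas: string }

function buildPrompt(
  session: Session,
  memory: string,
  turn: string
): { prefix: string; suffix: string } {
  // Stable across the whole session -> high cache hit rate.
  const prefix = session.systemPrompt + "\n" + session.toolSchemas;
  // Changes every turn -> kept out of the cached region.
  const suffix = memory + "\n" + turn;
  return { prefix, suffix };
}
```

One stray timestamp or reordered field in the prefix and the cache misses on every turn, so the discipline matters more than the mechanism.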
When pressure increases, the compaction strategy kicks in:
- trim stale tool output first
- summarize older history when needed
- preserve recent user intent and high-signal artifacts longest
Analogy: this is not deleting notes randomly; it's compressing your meeting transcript while pinning the action items.
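That priority order can be sketched directly. The thresholds and the `[summary]` placeholder are invented; real compaction would summarize with the model, but the ordering is the point:

```typescript
type Entry = { kind: "tool-output" | "old-history" | "recent-intent"; text: string };

function compact(entries: Entry[], budget: number): Entry[] {
  const cost = (es: Entry[]) => es.reduce((n, e) => n + e.text.length, 0);
  // Trim in priority order: stale tool output first, then old history.
  // Recent user intent is never touched here; it survives longest.
  const order: Entry["kind"][] = ["tool-output", "old-history"];
  let kept = [...entries];
  for (const kind of order) {
    if (cost(kept) <= budget) break; // stop as soon as we fit
    kept = kept.map(e =>
      e.kind === kind
        ? { ...e, text: e.kind === "old-history" ? "[summary]" : "[trimmed]" }
        : e
    );
  }
  return kept;
}
```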
Tools: one contract, many capabilities
This was one of the cleanest design choices I saw.
Whether a tool is built-in, remote, or plugin-provided, it follows a common shape:
- validated input schema
- permission semantics
- execution implementation
- rendering behavior
- concurrency declaration
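One plausible TypeScript shape for that contract (the member names here are my illustration, not Claude Code's actual interface):

```typescript
interface Tool<In, Out> {
  name: string;
  validate(input: unknown): In;            // validated input schema
  permission(input: In): "safe" | "ask";   // permission semantics
  run(input: In): Promise<Out>;            // execution implementation
  render(output: Out): string;             // rendering behavior
  concurrencySafe: boolean;                // may run in parallel with other tools
}

// A trivial tool satisfying the contract end to end.
const echoTool: Tool<string, string> = {
  name: "echo",
  validate: (i) => {
    if (typeof i !== "string") throw new Error("expected string");
    return i;
  },
  permission: () => "safe",
  run: async (i) => i,
  render: (o) => `> ${o}`,
  concurrencySafe: true,
};
```

Because every tool answers the same five questions, the loop can schedule, permission-check, and render all of them with one code path.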
That single contract enables a lot:
- minimal special-casing in the loop
- easier extension via MCP
- safer composition at scale
If you've ever maintained a platform where every plugin needed custom branching logic, you know how valuable this is.
Permissions: trust the model, verify the action
Giving an AI shell access without controls is how you get a postmortem.
The effective pattern is layered, not binary:
- organization and project policy
- tool-level risk semantics
- current operation mode
- safety classification pass
- user confirmation when needed
This turns "agent autonomy" into managed autonomy.
Analogy: modern CI/CD lets engineers ship fast, but only through protected branches, checks, and approvals. Same philosophy here.
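A toy version of layered resolution, mirroring the list above. The layer names match the post; the actual decision logic here is invented for illustration:

```typescript
type Decision = "allow" | "ask" | "deny";
interface Request {
  tool: string;
  risk: "read" | "write" | "destructive"; // tool-level risk semantics
  mode: "normal" | "auto";                // current operation mode
}

function decide(req: Request, orgDenied: Set<string>): Decision {
  if (orgDenied.has(req.tool)) return "deny";   // org/project policy always wins
  if (req.risk === "destructive") return "ask"; // destructive ops always confirm
  if (req.mode === "normal" && req.risk === "write") return "ask"; // mode gates writes
  return "allow";                               // reads are safe by default
}
```

The layering matters: no single check is trusted alone, and the strictest applicable layer wins.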
Subagents: practical answer to context bloat
Subagents are not just a shiny feature. They are a memory-management strategy.
A subagent gets a fresh context, solves a bounded task, and returns a summary. The primary thread keeps signal without inheriting every intermediate breadcrumb.
In real teams, this feels like delegation:
- give a teammate a narrow problem
- let them research independently
- ask for a concise brief, not raw logs
That is exactly the shape of healthy agent orchestration.
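The memory-management payoff shows up in code: the subagent's scratch work never enters the parent's history, only the summary does. A sketch, with `runAgent` standing in for the real loop:

```typescript
type ChatMsg = { role: string; content: string };

async function delegate(
  parent: ChatMsg[],
  task: string,
  runAgent: (history: ChatMsg[]) => Promise<{ summary: string; scratch: ChatMsg[] }>
): Promise<ChatMsg[]> {
  // Fresh, bounded context: the subagent sees only its task.
  const fresh: ChatMsg[] = [{ role: "user", content: task }];
  const { summary } = await runAgent(fresh); // scratch messages stay behind
  // Parent inherits one line of signal, not fifty lines of breadcrumbs.
  return [...parent, { role: "tool", content: `subagent: ${summary}` }];
}
```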
"React in a terminal" sounds weird, but it makes sense
Yes, a terminal UI implemented with React primitives can sound like overkill.
But after reading the architecture, I get it.
For a state-heavy, component-driven, asynchronous interface, the React mental model is useful even outside the browser:
- deterministic render updates
- composable UI states
- reusable hooks for permissions, sessions, and notifications
It is less about "React everywhere" and more about "use a model your team can reason about under pressure."
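The mental model, stripped of React itself: state changes flow through a reducer, and the frame on screen is a pure function of state. This is a pattern sketch, not Ink or Claude Code's actual UI code:

```typescript
interface UiState {
  status: "idle" | "running" | "awaiting-approval";
  lastLine: string;
}
type Action =
  | { type: "start" }
  | { type: "output"; line: string }
  | { type: "ask" };

function reduce(s: UiState, a: Action): UiState {
  switch (a.type) {
    case "start": return { ...s, status: "running" };
    case "output": return { ...s, lastLine: a.line };
    case "ask": return { ...s, status: "awaiting-approval" };
  }
}

// Deterministic render: same state in, same frame out.
const render = (s: UiState) => `[${s.status}] ${s.lastLine}`;
```

When streams, permission prompts, and background commands all mutate one terminal, "state in, frame out" is what keeps the UI debuggable under pressure.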
The part many posts miss: this is a product bet, not a feature bet
There are two philosophies in AI devtools right now.
Approach A: heavily script workflows, tightly constrain decisions, make behavior predictable.
Approach B: give the model broader agency and build hard safety systems around it.
Claude Code leans toward B.
That only works if infrastructure is serious: permissions, checkpoints, compaction, retries, cost visibility, extension boundaries. Without that, "agentic" quickly becomes "chaotic."
With that, model improvements flow through the product with less product-level rewiring.
Final thought
You can absolutely build a capable coding loop in a weekend.
What takes real engineering maturity is making that loop feel trustworthy on Tuesday at 6:40 PM, when tests are flaky, output is noisy, context is tight, and a teammate is waiting on your fix.
That is what I came away respecting most. Not that it can answer. That it can keep operating.
If you're building in this space, study the orchestration and safety architecture as much as the prompts. The prompts get demos. The architecture gets adoption.
Let's keep building!