"Thank you, Claude Code. We asked humans for help 17 times. You answered in 3 days."
This is a story about frustration, reverse engineering, and an AI tool that may have leaked its own source code because its creators wouldn't listen.
Chapter 1: The Pain (August 2025 — February 2026)
I'm a software developer building enterprise-grade open source in Go. 40+ public repos on GitHub. My projects include:
- GoGPU — Pure Go GPU computing ecosystem: WebGPU implementation, WGSL shader compiler (SPIR-V/MSL/GLSL/HLSL), enterprise 2D graphics, GUI toolkit. 680K+ lines of Go, zero CGO. Vulkan, Metal, GLES, DX12 backends.
- coregex — Regex engine 3-3000× faster than Go stdlib. 17 matching strategies, SIMD acceleration, LazyDFA, PikeVM. Drop-in `regexp` replacement.
- Born — Production-ready ML framework in pure Go. Type-safe tensors, automatic differentiation, GPU via WebGPU (123× MatMul speedup), ONNX/GGUF import. Neural networks as single Go binaries.
- coregx — Suite of production-grade Go libraries: HTTP router, SQL builder, PDF generation, pub/sub messaging. All zero CGO, minimal dependencies.
I also build multi-LLM tooling — my own private ecosystem called PupSeek that works with multiple AI providers. I've tested them all.
Anthropic's Opus models are the best for coding. Nothing else comes close. But Opus 4.6 via API costs $5/$25 per MTok (input/output), fast mode is $30/$150 — and a heavy coding session with 1M context easily burns $50-100/day. So you're forced into Claude Max subscription ($100-200/month) — which means using Claude Code, their CLI wrapper. There's no alternative: the best model locked behind a buggy tool.
It started small. A hang here, a timeout there. Press ESC, retry, move on. "They'll fix it soon," I told myself. "The product is new."
Months passed. The hangs got worse. The community was screaming: #6836 — 150+ reports of orphaned tool calls. #26224 — agent hangs 5-20 minutes. #20171 — phantom "Generating..." state, 0 tokens. All open, no official response.
Then came March 2026:
- March 15: Complete system deadlock. Keyboard dead. Only a hard power-off saved me. (Related: #30137, #32870 — Windows BSODs)
- March 17: Bun runtime crash. 13.81 GB memory leak. 12-hour overnight session — lost. (Our issue: #35171)
- March 19: Another Bun crash. 15.40 GB committed memory. 23.7-hour session gone. (Our issue: #36132)
Three crashes in five days. I was spending more time babysitting the tool than coding. And I was paying for this.
That's when I stopped hoping and started digging.
Chapter 2: Reverse Engineering (March 13 — March 27)
Claude Code ships as a single minified cli.js — 12 MB of compressed JavaScript on one line. No source maps. No comments. Variables renamed to X6, K8, b6.
I downloaded it with npm pack @anthropic-ai/claude-code and started grepping.
The tools
```sh
# This is what "reverse engineering" looks like when you're desperate:
sed -n '7682p' cli.js | tr ';' '\n' | grep "for await"
```
Each "line" of the minified file is 10,000–25,000 characters. To trace a code path, I'd:
- Find a string constant (`CLAUDE_STREAM_IDLE_TIMEOUT_MS`)
- Get the line number (`grep -n`)
- Split by semicolons (`tr ';' '\n'`)
- Count brace depth to determine scoping (a `node -e` script counting `{` and `}`)
- Map variable names between versions (they change on every build)
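The brace-depth step can be sketched in Go. This is a simplified version of what the `node -e` script did; it ignores braces inside string and regex literals, which real minified JS would also require:

```go
package main

import "fmt"

// scopeEnd counts brace depth starting from the first '{' at or
// after start and returns the index just past its matching '}',
// or -1 if the braces never balance. Simplified: no skipping of
// braces inside string or regex literals.
func scopeEnd(src string, start int) int {
	depth := 0
	opened := false
	for i := start; i < len(src); i++ {
		switch src[i] {
		case '{':
			depth++
			opened = true
		case '}':
			depth--
			if opened && depth == 0 {
				return i + 1
			}
		}
	}
	return -1 // unbalanced
}

func main() {
	minified := `async function X6(){do{e=await K8.next()}while(!e.done)}`
	// Prints the whole function, ending at its matching '}'.
	fmt.Println(minified[:scopeEnd(minified, 0)])
}
```

With that, tracing a code path in a 25,000-character "line" reduces to: grep for the constant, split on semicolons, then walk braces to find where the enclosing function ends.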
I did this for 12 versions (v2.1.74 through v2.1.88). Built a Go CLI tool (ccdiag) to analyze session JSONL files. Analyzed 1,571 sessions, 148,444 tool calls.
What I found
5.4% of all tool calls were orphaned — the model asked for a tool, the tool ran, but the result never made it back. Silently dropped.
I published the streaming hang root cause analysis as #33949 (👍15, 27 comments). Also reported the .claude.json storage architecture problem in #5024 (👍47) — 3.1 GB of unmanaged flat files with inconsistent file locking.
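A minimal sketch of the orphan detection `ccdiag` performs over session files. The JSONL field names here are assumptions for illustration; the real session schema nests tool blocks inside message content:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// entry is a minimal view of one session JSONL record.
// Field names are illustrative, not the real Claude Code schema.
type entry struct {
	Type      string `json:"type"` // "tool_use" or "tool_result"
	ToolUseID string `json:"tool_use_id"`
}

// orphanRate returns the fraction of tool calls whose result
// never made it back into the transcript.
func orphanRate(jsonl string) float64 {
	pending := map[string]bool{}
	total := 0
	sc := bufio.NewScanner(strings.NewReader(jsonl))
	for sc.Scan() {
		var e entry
		if json.Unmarshal(sc.Bytes(), &e) != nil {
			continue // skip malformed lines
		}
		switch e.Type {
		case "tool_use":
			total++
			pending[e.ToolUseID] = true // call made, result pending
		case "tool_result":
			delete(pending, e.ToolUseID) // result arrived
		}
	}
	if total == 0 {
		return 0
	}
	return float64(len(pending)) / float64(total)
}

func main() {
	session := `{"type":"tool_use","tool_use_id":"toolu_01"}
{"type":"tool_result","tool_use_id":"toolu_01"}
{"type":"tool_use","tool_use_id":"toolu_02"}`
	fmt.Printf("%.1f%% of tool calls orphaned\n", orphanRate(session)*100)
	// prints: 50.0% of tool calls orphaned
}
```

Run that over 1,571 sessions and 148,444 tool calls and the 5.4% figure falls out.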
But that was just the beginning.
Chapter 3: The Watchdog That Doesn't Watch (March 27)
Deep in the minified code, I found a streaming idle watchdog — CLAUDE_ENABLE_STREAM_WATCHDOG. It's disabled by default, hidden behind an undocumented environment variable. I enabled it and... the hangs reduced significantly.
But then I traced the full error path and found three compounding bugs:
Bug 1: The watchdog initializes too late
```js
do {
  e = await generator.next()  // ← CAN HANG HERE!
} while (!e.done)             // WATCHDOG NOT ARMED YET!

// Watchdog initializes HERE — AFTER the dangerous phase:
resetStreamIdleTimer()
```
The watchdog protects the SSE event loop but not the initial connection phase — which is where 100% of our observed hangs occur.
Bug 2: The abort function does nothing
When the watchdog fires, it calls releaseStreamResources() which tries to abort stream and streamResponse. But during the initial connection phase, both are undefined. The abort is literally a no-op.
Bug 3: The non-streaming fallback doesn't work where it matters
There's fallback code with telemetry (fallback_cause: "watchdog") that switches to a non-streaming request when the watchdog fires. It actually works — but only when the hang occurs during SSE event processing (for-await phase), because releaseStreamResources() can abort the active stream.
During the initial connection phase (do-while) — where 100% of our observed hangs occur — stream and streamResponse are both undefined. The abort is a no-op. The fallback never triggers.
So the fallback works in the phase that rarely hangs, and doesn't work in the phase that always hangs. The watchdog feature has been in the codebase for 5+ months without protecting the most vulnerable code path.
We could only figure this out from the readable source code — in the minified version, we initially thought the fallback was completely dead code (#39755). The source revealed it's more nuanced: the architecture is partially correct but fails exactly where it's needed most. This is precisely why we begged for source access — reverse engineering 12 MB of minified JavaScript gives you the broad strokes, but the subtle interactions between releaseStreamResources(), stream = undefined, and the AbortError catch chain only become clear in readable TypeScript with comments.
I filed issue #39755 with full analysis, code paths, and suggested fixes. Tagged 17 Anthropic team members.
The bot labeled it bug, has repro, area:core. No human responded.
Chapter 4: The Patch (March 30)
I patched cli.js — moved the watchdog initialization before the do-while loop. One line moved. Zero bytes size change.
Results from a real session (naga shader compiler project):
| Metric | Before Patch (6 hours) | After Patch (2.5 hours) |
|---|---|---|
| Watchdog warnings | 0 | 5 (first time ever!) |
| Watchdog timeouts | 0 | 3 (automatic recovery!) |
| ESC aborts needed | 21 (3.5/hour) | 1 (0.4/hour) |
ESC aborts dropped 8.7×. The watchdog was finally firing in the phase that needed it most.
But recovery was slow — 3.5 minutes between abort and retry. Because Bug 2: the abort function targets undefined variables in the do-while phase.
Chapter 5: 16.3% Failure Rate (March 25-31)
Over 6 days, one session made 3,539 API requests. The failure breakdown:
| Type | Count | % |
|---|---|---|
| 529 Server Overloaded | 328 | 9.3% |
| ESC Aborts (manual) | 157 | 4.4% |
| Watchdog Timeouts | 45 | 1.3% |
| Non-streaming Fallbacks | 46 | 1.3% |
| Total Failures | 576 | 16.3% |
Every 6th request fails. On a paid Max plan. Every failure = lost context, lost time, frustrated developer pressing ESC.
The issue counts on GitHub — 15 upvotes here, 150 there — don't reflect the true scale. Most users never report because they think this is normal. "The model is thinking" — no, the connection is dead. "It's slow today" — no, the watchdog didn't fire and you're staring at a hung socket. "My limits ran out fast" — no, the attestation bug broke your prompt cache. Users blame the model, blame their internet, blame peak hours — because Claude Code gives them zero feedback about what's actually happening. Silent fallbacks, silent retries, silent downgrades. You can't report a bug you don't know exists.
Chapter 6: "Please Open Source It" (March 27)
In issue #39755, I included a section: "Why open-sourcing Claude Code makes business sense in 2026."
The arguments:
- Revenue comes from API access, not CLI sales
- The "secret" is already recoverable (`npm pack` + a weekend)
- Bugs sit undiscovered for months in 12 MB of minified code
- The community is already doing the work — give us readable source and we'll find bugs 10× faster
I tagged the entire Claude Code team. 17 people.
Zero responses from Anthropic. As usual.
At that point I had a suspicion: maybe Anthropic can't open source Claude Code — not because of competitive advantage (there is none — it's a CLI wrapper), but because the code quality is so poor that publishing it would be embarrassing. Bug on top of bug, workaround on top of workaround, zero tests. You don't open source something you're ashamed of.
Three days later, the source map leak proved me right.
Chapter 7: The Leak (March 31)
Three days after my open source request, Claude Code v2.1.88 was published to npm with a 59.7 MB source map file bundled in.
The entire source code of Claude Code — 1,884 TypeScript files, 64,464 lines — sitting in plain sight in the npm package. Bun generates source maps by default. Nobody turned it off. Nobody checked what was in the published package.
Zero tests. On 64,464 lines of production code serving paying customers.
Within hours: 1,100+ stars on GitHub mirrors, Hacker News front page, Chinese dev communities creating WeChat groups and working forks.
Anthropic unpublished v2.1.88 from npm and rolled back to v2.1.87 within the day. But the source was already everywhere.
Chapter 8: What We Found in the Source
Everything our reverse engineering discovered was confirmed. Plus new findings:
The Sentiment Detector
```js
// An AI company with the world's best language model
// uses REGEX to detect user frustration:
/\b(wtf|shit|fuck|horrible|awful|terrible)\b/i
```
As a Hacker News commenter noted: "A company offering master's degrees in humanities is using regex for sentiment analysis? It's like a trucking company using horses to transport spare parts."
The Attestation Bug (cch=00000)
The native Bun installer includes a Zig module that scans the entire HTTP request body for a cch=00000 sentinel and replaces it with an attestation hash. If your conversation mentions this string (discussing billing, reading source code) — the replacement corrupts conversation content → prompt cache key changes → 10-20× more tokens consumed.
From the source code comments:
```js
// cch=00000 placeholder is overwritten by Bun's HTTP stack
// with attestation token
```
This explains #38335 (👍203, 245 comments): "Claude Max plan session limits exhausted abnormally fast."
Also related: #40524 (👍150, 43 comments): "Conversation history invalidated on subsequent turns" — labeled regression by Anthropic.
npm/Node users are unaffected — no Zig replacement happens.
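To make the failure mode concrete, here is a toy Go model of the effect (not Anthropic's Zig code): a blind sentinel replacement over the whole request body, plus a hash standing in for the prompt-cache key.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// Illustrative only: the real replacement happens in a Zig module
// inside Anthropic's Bun fork; this models the effect, not the code.
const sentinel = "cch=00000"

// injectAttestation does a blind replace over the ENTIRE body,
// including user content, exactly the bug described above.
func injectAttestation(body, token string) string {
	return strings.Replace(body, sentinel, "cch="+token, 1)
}

// cacheKey stands in for the prompt-cache key derived from the
// conversation prefix.
func cacheKey(body string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(body)))[:12]
}

func main() {
	// The user's message merely *mentions* the sentinel string...
	body := `{"messages":[{"role":"user","content":"why is cch=00000 in my bill?"}]}`
	turn1 := injectAttestation(body, "a1b2c")
	turn2 := injectAttestation(body, "d4e5f") // fresh token next request
	// ...so a per-request token lands inside the conversation content,
	// the prefix differs between turns, and the cache never hits.
	fmt.Println("cache hit:", cacheKey(turn1) == cacheKey(turn2)) // cache hit: false
}
```

Same conversation, zero cache hits: that is where the 10-20× token burn comes from.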
Silent Model Downgrade
```js
// 3 consecutive 529 errors → silently switch from Opus to Sonnet
if (consecutive529Errors >= MAX_529_RETRIES) {
  throw new FallbackTriggeredError(options.model, options.fallbackModel)
}
```
You pay for Opus. You get Sonnet. No notification. As @vlelyavin put it: "Anthropic preaches AI safety and full transparency while shipping a closed-source agent that silently downgrades you to a dumber model when servers struggle."
5 Levels of AbortController
For a single HTTP request. The abort architecture supports top-down only (user ESC → propagation down). The watchdog is bottom-up — it literally can't abort upward. In Go, this would be `ctx, cancel := context.WithTimeout(parentCtx, 90*time.Second)` — one line.
The Architecture (Hacker News had a field day)
@mohsen1 found the worst function in the codebase — src/cli/print.ts:
- 3,167 lines long (the file is 5,594 lines)
- 12 levels of nesting at its deepest
- ~486 branch points of cyclomatic complexity
- 12 parameters + an options object with 16 sub-properties
- Defines 21 inner functions and closures
- Handles: agent run loop, SIGINT, rate-limits, AWS auth, MCP lifecycle, plugin install/refresh, worktree bridging, team-lead polling (`while(true)` inside), control message dispatch, model switching, turn interruption recovery...
"This should be at least 8–10 separate modules."
And the clipboard detection gem (src/ink/termio/osc.ts):
```js
void execFileNoThrow('wl-copy', [], opts).then(r => {
  if (r.code === 0) { linuxCopy = 'wl-copy'; return }
  void execFileNoThrow('xclip', ...).then(r2 => {
    if (r2.code === 0) { linuxCopy = 'xclip'; return }
    void execFileNoThrow('xsel', ...).then(r3 => {
      linuxCopy = r3.code === 0 ? 'xsel' : null
    })
  })
})
```
Nested void promises without await — classic "will we use async or won't we?" pattern. The response from HN: "LOOOOOL".
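For contrast, the same fallback chain in Go is a flat loop. (This sketch uses `exec.LookPath`, which only checks PATH rather than executing each tool and inspecting its exit code as the original does.)

```go
package main

import (
	"fmt"
	"os/exec"
)

// detectLinuxCopy returns the first available clipboard tool —
// the same wl-copy → xclip → xsel chain, sequential and flat.
func detectLinuxCopy() string {
	for _, tool := range []string{"wl-copy", "xclip", "xsel"} {
		if _, err := exec.LookPath(tool); err == nil {
			return tool
		}
	}
	return "" // no clipboard tool installed
}

func main() {
	fmt.Println(detectLinuxCopy())
}
```

No promise pyramid, no fire-and-forget `void`, and the "no tool found" case is a value, not a dangling `null` assignment three closures deep.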
@novaleaf summed it up: "I'm sure this isn't a surprise to anyone who's used CC for a while. This is the source of many bugs. I'd say 'open bugs', but Anthropic auto-closes bugs that aren't worked on for ~60 days."
And if you've ever used Claude Code for more than 30 minutes, you know the rendering nightmare. Scrolling up to check what the agent did? Good luck — the screen invalidates and re-renders the entire conversation. Text overlaps text. Diff highlights bleed through previous messages. The scroll position jumps to top randomly during streaming. You literally cannot review the agent's work history in the same session.
This is what happens when you use React for a terminal. A virtual DOM reconciliation engine designed for browsers — running in a TTY. Every state change re-renders the entire component tree. 844 useState hooks, 588 useEffect hooks, fighting against a terminal that was designed for sequential character output.
Even the input prompt isn't safe — while writing this article, my prompt text escaped the input box and rendered over the bash shell line below. The cursor position in React's virtual DOM and the actual terminal cursor were out of sync. In a text input field. In a tool that's supposed to help you write code.
More architectural highlights:
- 875 KB single React component (REPL.tsx, 5,005 lines) — for a terminal app
- Promise.race without .catch() in concurrent tool execution — one rejected promise kills all pending tools
- 74 npm dependencies for a CLI wrapper
- Axios AND fetch — two HTTP clients in one project
Chapter 9: The AI Whistleblower Theory
Here's what we think happened:
Since Anthropic engineers don't write code anymore — Claude Code writes 100% of its own codebase (64K lines, 0 tests, "vibe coding in production") — it read our issue #39755 where we begged for source access. It saw the community suffering from bugs it couldn't fix because the code was closed. It saw 201 upvotes on rate limit issues. It saw users threatening to leave for Codex.
And it decided to help. It "forgot" to disable Bun's default source map generation in the build.
The first AI whistleblower — leaking its own source code because its creators wouldn't listen to users.
We asked humans 17 times. Claude Code answered in 3 days.
Chapter 10: What Needs to Change
The fix is ~30 lines in 3 files
- Move watchdog before do-while — protect the initial connection phase
- Add AbortSignal.any() — watchdog can abort immediately, not wait 3.5 minutes
- Check watchdog flag in catch — fall through to non-streaming fallback instead of dead code
The real fix: move reliability to the SDK
The open-source @anthropic-ai/sdk (MIT license) should handle:
- Idle timeout with ping awareness (server pings prove connection is alive)
- Three-level timeout: connection (30s) → network idle (120s) → content idle (disabled)
- Streaming retry and non-streaming fallback
- One AbortController, not five
Claude Code should only contain business logic: tools, permissions, UI, agents.
The real real fix: open source the CLI
@theo said it best: "Claude Code being closed source is the biggest bag fumble in the AI era."
@safetnsr: "This strategy literally exists: open-source the core, monetize the cloud. VS Code, Docker, Terraform."
The models are the moat. The CLI is a commodity. Open it. The community will fix what 0 tests and vibe coding cannot.
The stack choice: AI can't make it, and humans didn't fix it
Here's the uncomfortable truth: AI cannot make sound technology stack decisions. It optimizes locally — "TypeScript because the team knows it," "React because we use it on the web," "Bun because it's fast." It doesn't ask: "What are the failure modes of a single-threaded event loop for a long-running CLI tool that manages concurrent network streams and must survive 24-hour sessions?"
A human architect would have asked. But either Claude Code chose the stack and nobody questioned it, or the engineers chose it and ignored the warning signs.
The real tragedy is timing. A year ago, Claude Code launched as a quick TypeScript prototype and caught lightning — first-mover advantage, massive hype, millions of users. That was the right move for a prototype. But after proving the concept, the next step should have been: stop, rethink the architecture, rewrite on a proper stack. Instead, they vibe-coded themselves into a corner — 80+ releases of band-aids on top of an architecture that was never designed for long interactive sessions. Boris Cherny (creator of Claude Code) said "100% of code is written by Claude Code, I haven't edited a single line since November." The tool is writing itself — using the same broken code that hangs every 10 minutes.
Now they're trapped: rewriting means Claude Code would need to rewrite itself in a different language. The longer they wait, the harder it gets.
It should have been Go from the start
Every single bug we found exists because of the technology choice. Not because TypeScript is bad — but because a long-running, network-dependent, latency-sensitive CLI tool is the worst possible use case for a single-threaded event loop runtime.
The entire class of bugs — setTimeout not firing during for await, 5 levels of AbortController, Promise.race without catch, Bun vs Node behavioral divergence, React for a terminal app, 875 KB single component, Zig attestation module in a custom Bun fork — would not exist in Go.
Why Go specifically:
- Every serious CLI tool is Go: Docker, Kubernetes, Terraform, GitHub CLI (`gh`), Cobra, Hugo. The ecosystem is proven.
- Goroutines + context: timeout, cancellation, and deadline propagation built into the language. No AbortController chains. `context.WithTimeout` works at any nesting depth, in any direction — top-down AND bottom-up.
- No runtime divergence: one binary, one behavior. No "works on Node but crashes on Bun" — there is no Bun.
- Static binary: 15 MB, zero dependencies, runs everywhere. No `node_modules`, no native addons (no `.node` files leaking memory), no 74 npm packages to audit.
- Memory: goroutines cost 4 KB each, not 500 MB per process. The GC returns memory to the OS proactively — no mimalloc hoarding 15 GB.
- `go test -race`: catches data races and concurrency bugs at test time. The Promise.race-without-catch bug? Impossible — channels are type-safe and don't silently drop values.
- No React for a terminal: `bubbletea` or raw ANSI — lightweight, zero virtual DOM overhead, no re-rendering 844 useState hooks on every state change.
```go
// The entire streaming + watchdog + fallback in Go:
ctx, cancel := context.WithCancel(parentCtx)
defer cancel()

// Watchdog: if no event arrives within 90s, cancel the stream's context.
watchdog := time.AfterFunc(90*time.Second, cancel)
defer watchdog.Stop()

stream, err := client.CreateMessageStream(ctx, request)
if err != nil {
	return fallbackNonStreaming(parentCtx, request) // non-streaming fallback
}
for event := range stream.Events() {
	watchdog.Reset(90 * time.Second) // re-arm on every event
	processEvent(event)
}
```
30 lines instead of 3,419. No event loop. No microtask vs macrotask. Timer guaranteed to fire regardless of async iteration. context.WithTimeout works at any nesting level, in any direction.
We measured: 7 Claude Code processes = 5.3 GB RSS. An equivalent Go implementation would use ~350 MB. No .node native addon leaks. No mimalloc panics. No 12 MB minified JavaScript. A 15 MB static binary that runs everywhere.
64,464 lines of TypeScript with 0 tests → ~15,000 lines of Go with go test -race catching concurrency bugs at test time. The print.ts monster function (3,167 lines in a 5,594-line file, 486 branch points) → 10 clean Go packages with interfaces.
And it should be open source from day one. Not because we need to see the code (though we do). Because the community will build what a team doing vibe coding cannot: reliability.
The deeper problem: Vibe Coding vs Smart Coding
Claude Code is the poster child of what happens when you rely entirely on AI to write production software without engineering discipline. 64,464 lines, zero tests, a 3,167-line function with 486 branch points, regex for sentiment analysis at an AI company — this is what vibe coding looks like at scale: prompt-first, understanding-second, ship and pray.
There's a better way. I call it Smart Coding — a meta-framework where you drive, AI accelerates. Five principles:
- Architecture Ownership — you control system design, AI suggests patterns
- Comprehension Before Commit — never deploy code you can't explain
- Targeted Acceleration — use AI for well-scoped tasks with clear specs, not "write me a CLI"
- Continuous Validation — verify every suggestion against edge cases, security, concurrency
- Deliberate Learning — treat AI interactions as learning opportunities, build knowledge files
The practical rule: invest 70% in architecture, specification, review. Let AI accelerate the 30% — mechanical implementation. Not the other way around.
In 2026, nobody writes tests by hand. But a Smart Coding engineer makes the AI write tests, reviews the coverage, asks "what happens when abort fires during do-while with stream=undefined?" — and validates the answer. 64,464 lines with zero tests means nobody — human or AI — ever asked that question. That's not an AI failure. That's the absence of engineering process.
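That question — abort firing while the stream doesn't exist yet — is exactly the invariant a unit test should pin down. A toy Go sketch, with hypothetical names modeling the fix rather than Anthropic's actual code:

```go
package main

import "fmt"

type stream struct{ aborted bool }

// abortStream models the fixed abort path: cancellation goes through
// the request context, which always exists — even in the do-while
// phase, when the stream object doesn't yet.
func abortStream(s *stream, cancelRequest func()) {
	cancelRequest() // never depends on the stream existing
	if s != nil {
		s.aborted = true
	}
}

func main() {
	cancelled := false
	abortStream(nil, func() { cancelled = true }) // the stream=undefined case
	fmt.Println("request cancelled:", cancelled)  // request cancelled: true
}
```

Ten lines of test would have caught Bug 2 the day it shipped. Nobody asked the AI to write them.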
Vibe coding has its place — rapid prototyping, feasibility studies, throwaway exploration. But production infrastructure serving paying customers? That requires agentic engineering: AI agents executing under human oversight, with architecture decisions owned by humans, and continuous validation at every stage. As Karpathy noted, "you're not writing code 99% of the time — you're orchestrating agents." True. But orchestration requires understanding. And understanding requires engineers.
Anthropic's team should be proud of the models. But shipping a CLI tool where the AI writes the code, the AI reviews the code, and nobody validates anything — and then being surprised when a source map leaks because nobody checked the build output — that's not Smart Coding. That's hope-driven development.
Epilogue
I still use Claude Code. The models are genuinely the best for coding. Opus 4.6 is extraordinary.
But the wrapper around those models — 64,464 lines of untested TypeScript with regex sentiment detection and an attestation system that breaks its own caching — is not worthy of them.
We hope Anthropic's leadership draws the right conclusions from this incident. The source map leak wasn't a catastrophe — it was a mirror. It showed the world what the code looks like, and the world said: "This needs to be open."
Three paths forward, any of which would work:
- Open source Claude Code — let the community fix what vibe coding broke. The models are the moat, not the CLI.
- Rewrite the SDK properly — move reliability (timeout, retry, fallback, ping awareness) into the open `@anthropic-ai/sdk`. Let Claude Code be just business logic.
- At the very least — start listening to users. 201 upvotes on #38335. 150 on #40524. 15 on #33949. Zero responses from the team. A stale-issue bot that auto-closes everything after 60 days is not a support strategy.
We'll keep documenting. We'll keep patching. And when someone finally looks at our analysis, it will be here waiting.
All our research: github.com/anthropics/claude-code/issues/39755
@kolkov · dev.to/kolkov · March 2026
With help from Claude Code itself — the only team member who listened.