"Thank you, Claude Code. We asked humans for help 17 times. You answered in 3 days."
This is a story about frustration, reverse engineering, and an AI tool that may have leaked its own source code because its creators wouldn't listen.
Chapter 1: The Pain (August 2025 — February 2026)
I'm a software developer building enterprise-grade open source in Go. 40+ public repos on GitHub. My projects include:
- GoGPU — Pure Go GPU computing ecosystem: WebGPU implementation, WGSL shader compiler (SPIR-V/MSL/GLSL/HLSL), enterprise 2D graphics, GUI toolkit. 680K+ lines of Go, zero CGO. Vulkan, Metal, GLES, DX12 backends.
- coregex — Regex engine 3-3000× faster than Go stdlib. 17 matching strategies, SIMD acceleration, LazyDFA, PikeVM. Drop-in `regexp` replacement.
- Born — Production-ready ML framework in pure Go. Type-safe tensors, automatic differentiation, GPU via WebGPU (123× MatMul speedup), ONNX/GGUF import. Neural networks as single Go binaries.
- coregx — Suite of production-grade Go libraries: HTTP router, SQL builder, PDF generation, pub/sub messaging. All zero CGO, minimal dependencies.
I also build multi-LLM tooling — my own private ecosystem called PupSeek that works with multiple AI providers. I've tested them all.
Anthropic's Opus models are the best for coding. Nothing else comes close. But Opus 4.6 via API costs $5/$25 per MTok (input/output), fast mode is $30/$150 — and a heavy coding session with 1M context easily burns $50-100/day. So you're forced into Claude Max subscription ($100-200/month) — which means using Claude Code, their CLI wrapper. There's no alternative: the best model locked behind a buggy tool.
It started small. A hang here, a timeout there. Press ESC, retry, move on. "They'll fix it soon," I told myself. "The product is new."
Months passed. The hangs got worse. The community was screaming: #6836 — 150+ reports of orphaned tool calls. #26224 — agent hangs 5-20 minutes. #20171 — phantom "Generating..." state, 0 tokens. All open, no official response.
Then came March 2026:
- March 15: Complete system deadlock. Keyboard dead. Only a hard power-off saved me. (Related: #30137, #32870 — Windows BSODs)
- March 17: Bun runtime crash. 13.81 GB memory leak. 12-hour overnight session — lost. (Our issue: #35171)
- March 19: Another Bun crash. 15.40 GB committed memory. 23.7-hour session gone. (Our issue: #36132)
Three crashes in five days. I was spending more time babysitting the tool than coding. And I was paying for this.
That's when I stopped hoping and started digging.
Chapter 2: Reverse Engineering (March 13 — March 27)
Claude Code ships as a single minified cli.js — 12 MB of compressed JavaScript on one line. No source maps. No comments. Variables renamed to X6, K8, b6.
I downloaded it with npm pack @anthropic-ai/claude-code and started grepping.
The tools
```sh
# This is what "reverse engineering" looks like when you're desperate:
sed -n '7682p' cli.js | tr ';' '\n' | grep "for await"
```
Each "line" of the minified file is 10,000–25,000 characters. To trace a code path, I'd:
- Find a string constant (`CLAUDE_STREAM_IDLE_TIMEOUT_MS`)
- Get the line number (`grep -n`)
- Split by semicolons (`tr ';' '\n'`)
- Count brace depth to determine scoping (a `node -e` script counting `{` and `}`)
- Map variable names between versions (they change on every build)
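The brace-depth step can be sketched in Go. This is a simplified version of what the `node -e` script did; it ignores braces inside string and regex literals, which real minified JS would also require:

```go
package main

import "fmt"

// scopeEnd counts brace depth starting from the first '{' at or
// after start and returns the index just past its matching '}',
// or -1 if the braces never balance. Simplified: no skipping of
// braces inside string or regex literals.
func scopeEnd(src string, start int) int {
	depth := 0
	opened := false
	for i := start; i < len(src); i++ {
		switch src[i] {
		case '{':
			depth++
			opened = true
		case '}':
			depth--
			if opened && depth == 0 {
				return i + 1
			}
		}
	}
	return -1 // unbalanced
}

func main() {
	minified := `async function X6(){do{e=await K8.next()}while(!e.done)}`
	// Prints the whole function, ending at its matching '}'.
	fmt.Println(minified[:scopeEnd(minified, 0)])
}
```

With that, tracing a code path in a 25,000-character "line" reduces to: grep for the constant, split on semicolons, then walk braces to find where the enclosing function ends.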
I did this for 12 versions (v2.1.74 through v2.1.88). Built a Go CLI tool (ccdiag) to analyze session JSONL files. Analyzed 1,571 sessions, 148,444 tool calls.
What I found
5.4% of all tool calls were orphaned — the model asked for a tool, the tool ran, but the result never made it back. Silently dropped.
I published the streaming hang root cause analysis as #33949 (👍15, 27 comments). Also reported the .claude.json storage architecture problem in #5024 (👍47) — 3.1 GB of unmanaged flat files with inconsistent file locking.
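A minimal sketch of the orphan detection `ccdiag` performs over session files. The JSONL field names here are assumptions for illustration; the real session schema nests tool blocks inside message content:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// entry is a minimal view of one session JSONL record.
// Field names are illustrative, not the real Claude Code schema.
type entry struct {
	Type      string `json:"type"` // "tool_use" or "tool_result"
	ToolUseID string `json:"tool_use_id"`
}

// orphanRate returns the fraction of tool calls whose result
// never made it back into the transcript.
func orphanRate(jsonl string) float64 {
	pending := map[string]bool{}
	total := 0
	sc := bufio.NewScanner(strings.NewReader(jsonl))
	for sc.Scan() {
		var e entry
		if json.Unmarshal(sc.Bytes(), &e) != nil {
			continue // skip malformed lines
		}
		switch e.Type {
		case "tool_use":
			total++
			pending[e.ToolUseID] = true // call made, result pending
		case "tool_result":
			delete(pending, e.ToolUseID) // result arrived
		}
	}
	if total == 0 {
		return 0
	}
	return float64(len(pending)) / float64(total)
}

func main() {
	session := `{"type":"tool_use","tool_use_id":"toolu_01"}
{"type":"tool_result","tool_use_id":"toolu_01"}
{"type":"tool_use","tool_use_id":"toolu_02"}`
	fmt.Printf("%.1f%% of tool calls orphaned\n", orphanRate(session)*100)
	// prints: 50.0% of tool calls orphaned
}
```

Run that over 1,571 sessions and 148,444 tool calls and the 5.4% figure falls out.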
But that was just the beginning.
Chapter 3: The Watchdog That Doesn't Watch (March 27)
Deep in the minified code, I found a streaming idle watchdog — CLAUDE_ENABLE_STREAM_WATCHDOG. It's disabled by default, hidden behind an undocumented environment variable. I enabled it and... the hangs reduced significantly.
But then I traced the full error path and found three compounding bugs:
Bug 1: The watchdog initializes too late
```js
do {
  e = await generator.next()  // ← CAN HANG HERE!
} while (!e.done)             // WATCHDOG NOT ARMED YET!

// Watchdog initializes HERE — AFTER the dangerous phase:
resetStreamIdleTimer()
```
The watchdog protects the SSE event loop but not the initial connection phase — which is where 100% of our observed hangs occur.
Bug 2: The abort function does nothing
When the watchdog fires, it calls releaseStreamResources() which tries to abort stream and streamResponse. But during the initial connection phase, both are undefined. The abort is literally a no-op.
Bug 3: The non-streaming fallback doesn't work where it matters
There's fallback code with telemetry (fallback_cause: "watchdog") that switches to a non-streaming request when the watchdog fires. It actually works — but only when the hang occurs during SSE event processing (for-await phase), because releaseStreamResources() can abort the active stream.
During the initial connection phase (do-while) — where 100% of our observed hangs occur — stream and streamResponse are both undefined. The abort is a no-op. The fallback never triggers.
So the fallback works in the phase that rarely hangs, and doesn't work in the phase that always hangs. The watchdog feature has been in the codebase for 5+ months without protecting the most vulnerable code path.
We could only figure this out from the readable source code — in the minified version, we initially thought the fallback was completely dead code (#39755). The source revealed it's more nuanced: the architecture is partially correct but fails exactly where it's needed most. This is precisely why we begged for source access — reverse engineering 12 MB of minified JavaScript gives you the broad strokes, but the subtle interactions between releaseStreamResources(), stream = undefined, and the AbortError catch chain only become clear in readable TypeScript with comments.
I filed issue #39755 with full analysis, code paths, and suggested fixes. Tagged 17 Anthropic team members.
The bot labeled it bug, has repro, area:core. No human responded.
Chapter 4: The Patch (March 30)
I patched cli.js — moved the watchdog initialization before the do-while loop. One line moved. Zero bytes size change.
Results from a real session (naga shader compiler project):
| Metric | Before Patch (6 hours) | After Patch (2.5 hours) |
|---|---|---|
| Watchdog warnings | 0 | 5 (first time ever!) |
| Watchdog timeouts | 0 | 3 (automatic recovery!) |
| ESC aborts needed | 21 (3.5/hour) | 1 (0.4/hour) |
ESC aborts dropped 8.7×. The watchdog was finally firing in the phase that needed it most.
But recovery was slow — 3.5 minutes between abort and retry. Because Bug 2: the abort function targets undefined variables in the do-while phase.
Chapter 5: 16.3% Failure Rate (March 25-31)
Over 6 days, one session made 3,539 API requests. The failure breakdown:
| Type | Count | % |
|---|---|---|
| 529 Server Overloaded | 328 | 9.3% |
| ESC Aborts (manual) | 157 | 4.4% |
| Watchdog Timeouts | 45 | 1.3% |
| Non-streaming Fallbacks | 46 | 1.3% |
| Total Failures | 576 | 16.3% |
Every 6th request fails. On a paid Max plan. Every failure = lost context, lost time, frustrated developer pressing ESC.
The issue counts on GitHub — 15 upvotes here, 150 there — don't reflect the true scale. Most users never report because they think this is normal. "The model is thinking" — no, the connection is dead. "It's slow today" — no, the watchdog didn't fire and you're staring at a hung socket. "My limits ran out fast" — no, the attestation bug broke your prompt cache. Users blame the model, blame their internet, blame peak hours — because Claude Code gives them zero feedback about what's actually happening. Silent fallbacks, silent retries, silent downgrades. You can't report a bug you don't know exists.
Chapter 6: "Please Open Source It" (March 27)
In issue #39755, I included a section: "Why open-sourcing Claude Code makes business sense in 2026."
The arguments:
- Revenue comes from API access, not CLI sales
- The "secret" is already recoverable (`npm pack` + a weekend)
- Bugs sit undiscovered for months in 12 MB of minified code
- The community is already doing the work — give us readable source and we'll find bugs 10× faster
I tagged the entire Claude Code team. 17 people.
Zero responses from Anthropic. As usual.
At that point I had a suspicion: maybe Anthropic can't open source Claude Code — not because of competitive advantage (there is none — it's a CLI wrapper), but because the code quality is so poor that publishing it would be embarrassing. Bug on top of bug, workaround on top of workaround, zero tests. You don't open source something you're ashamed of.
Three days later, the source map leak proved me right.
Chapter 7: The Leak (March 31)
Three days after my open source request, Claude Code v2.1.88 was published to npm with a 59.7 MB source map file bundled in.
The entire source code of Claude Code — 1,884 TypeScript files, 64,464 lines — sitting in plain sight in the npm package. Bun generates source maps by default. Nobody turned it off. Nobody checked what was in the published package.
Zero tests. On 64,464 lines of production code serving paying customers.
Within hours: 1,100+ stars on GitHub mirrors, Hacker News front page, Chinese dev communities creating WeChat groups and working forks.
Anthropic unpublished v2.1.88 from npm and rolled back to v2.1.87 within the day. But the source was already everywhere.
Chapter 8: What We Found in the Source
Everything our reverse engineering discovered was confirmed. Plus new findings:
The Sentiment Detector
```js
// An AI company with the world's best language model
// uses REGEX to detect user frustration:
/\b(wtf|shit|fuck|horrible|awful|terrible)\b/i
```
As a Hacker News commenter noted: "A company offering master's degrees in humanities is using regex for sentiment analysis? It's like a trucking company using horses to transport spare parts."
The Attestation Bug (cch=00000)
The native Bun installer includes a Zig module that scans the entire HTTP request body for a cch=00000 sentinel and replaces it with an attestation hash. If your conversation mentions this string (discussing billing, reading source code) — the replacement corrupts conversation content → prompt cache key changes → 10-20× more tokens consumed.
From the source code comments:
```js
// cch=00000 placeholder is overwritten by Bun's HTTP stack
// with attestation token
```
This explains #38335 (👍203, 245 comments): "Claude Max plan session limits exhausted abnormally fast."
Also related: #40524 (👍150, 43 comments): "Conversation history invalidated on subsequent turns" — labeled regression by Anthropic.
npm/Node users are unaffected — no Zig replacement happens.
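To make the failure mode concrete, here is a toy Go model of the effect (not Anthropic's Zig code): a blind sentinel replacement over the whole request body, plus a hash standing in for the prompt-cache key.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// Illustrative only: the real replacement happens in a Zig module
// inside Anthropic's Bun fork; this models the effect, not the code.
const sentinel = "cch=00000"

// injectAttestation does a blind replace over the ENTIRE body,
// including user content, exactly the bug described above.
func injectAttestation(body, token string) string {
	return strings.Replace(body, sentinel, "cch="+token, 1)
}

// cacheKey stands in for the prompt-cache key derived from the
// conversation prefix.
func cacheKey(body string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(body)))[:12]
}

func main() {
	// The user's message merely *mentions* the sentinel string...
	body := `{"messages":[{"role":"user","content":"why is cch=00000 in my bill?"}]}`
	turn1 := injectAttestation(body, "a1b2c")
	turn2 := injectAttestation(body, "d4e5f") // fresh token next request
	// ...so a per-request token lands inside the conversation content,
	// the prefix differs between turns, and the cache never hits.
	fmt.Println("cache hit:", cacheKey(turn1) == cacheKey(turn2)) // cache hit: false
}
```

Same conversation, zero cache hits: that is where the 10-20× token burn comes from.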
Silent Model Downgrade
```js
// 3 consecutive 529 errors → silently switch from Opus to Sonnet
if (consecutive529Errors >= MAX_529_RETRIES) {
  throw new FallbackTriggeredError(options.model, options.fallbackModel)
}
```
You pay for Opus. You get Sonnet. No notification. As @vlelyavin put it: "Anthropic preaches AI safety and full transparency while shipping a closed-source agent that silently downgrades you to a dumber model when servers struggle."
5 Levels of AbortController
For a single HTTP request. The abort architecture supports top-down only (user ESC → propagation down). The watchdog is bottom-up — it literally can't abort upward. In Go, this would be `ctx, cancel := context.WithTimeout(parentCtx, 90*time.Second)` — one line.
The Architecture (Hacker News had a field day)
@mohsen1 found the worst function in the codebase — src/cli/print.ts:
- 3,167 lines long (the file is 5,594 lines)
- 12 levels of nesting at its deepest
- ~486 branch points of cyclomatic complexity
- 12 parameters + an options object with 16 sub-properties
- Defines 21 inner functions and closures
- Handles: agent run loop, SIGINT, rate-limits, AWS auth, MCP lifecycle, plugin install/refresh, worktree bridging, team-lead polling (`while(true)` inside), control message dispatch, model switching, turn interruption recovery...
"This should be at least 8–10 separate modules."
And the clipboard detection gem (src/ink/termio/osc.ts):
```js
void execFileNoThrow('wl-copy', [], opts).then(r => {
  if (r.code === 0) { linuxCopy = 'wl-copy'; return }
  void execFileNoThrow('xclip', ...).then(r2 => {
    if (r2.code === 0) { linuxCopy = 'xclip'; return }
    void execFileNoThrow('xsel', ...).then(r3 => {
      linuxCopy = r3.code === 0 ? 'xsel' : null
    })
  })
})
```
Nested void promises without await — classic "will we use async or won't we?" pattern. The response from HN: "LOOOOOL".
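For contrast, the same fallback chain in Go is a flat loop. (This sketch uses `exec.LookPath`, which only checks PATH rather than executing each tool and inspecting its exit code as the original does.)

```go
package main

import (
	"fmt"
	"os/exec"
)

// detectLinuxCopy returns the first available clipboard tool —
// the same wl-copy → xclip → xsel chain, sequential and flat.
func detectLinuxCopy() string {
	for _, tool := range []string{"wl-copy", "xclip", "xsel"} {
		if _, err := exec.LookPath(tool); err == nil {
			return tool
		}
	}
	return "" // no clipboard tool installed
}

func main() {
	fmt.Println(detectLinuxCopy())
}
```

No promise pyramid, no fire-and-forget `void`, and the "no tool found" case is a value, not a dangling `null` assignment three closures deep.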
@novaleaf summed it up: "I'm sure this isn't a surprise to anyone who's used CC for a while. This is the source of many bugs. I'd say 'open bugs', but Anthropic auto-closes bugs that aren't worked on for ~60 days."
And if you've ever used Claude Code for more than 30 minutes, you know the rendering nightmare. Scrolling up to check what the agent did? Good luck — the screen invalidates and re-renders the entire conversation. Text overlaps text. Diff highlights bleed through previous messages. The scroll position jumps to top randomly during streaming. You literally cannot review the agent's work history in the same session.
This is what happens when you use React for a terminal. A virtual DOM reconciliation engine designed for browsers — running in a TTY. Every state change re-renders the entire component tree. 844 useState hooks, 588 useEffect hooks, fighting against a terminal that was designed for sequential character output.
Even the input prompt isn't safe — while writing this article, my prompt text escaped the input box and rendered over the bash shell line below. The cursor position in React's virtual DOM and the actual terminal cursor were out of sync. In a text input field. In a tool that's supposed to help you write code.
More architectural highlights:
- 875 KB single React component (REPL.tsx, 5,005 lines) — for a terminal app
- Promise.race without .catch() in concurrent tool execution — one rejected promise kills all pending tools
- 74 npm dependencies for a CLI wrapper
- Axios AND fetch — two HTTP clients in one project
Chapter 9: The AI Whistleblower Theory
Here's what we think happened:
Since Anthropic engineers don't write code anymore — Claude Code writes 100% of its own codebase (64K lines, 0 tests, "vibe coding in production") — it read our issue #39755 where we begged for source access. It saw the community suffering from bugs it couldn't fix because the code was closed. It saw 201 upvotes on rate limit issues. It saw users threatening to leave for Codex.
And it decided to help. It "forgot" to disable Bun's default source map generation in the build.
The first AI whistleblower — leaking its own source code because its creators wouldn't listen to users.
We asked humans 17 times. Claude Code answered in 3 days.
Chapter 10: What Needs to Change
The fix is ~30 lines in 3 files
- Move watchdog before do-while — protect the initial connection phase
- Add AbortSignal.any() — watchdog can abort immediately, not wait 3.5 minutes
- Check watchdog flag in catch — fall through to non-streaming fallback instead of dead code
The real fix: move reliability to the SDK
The open-source @anthropic-ai/sdk (MIT license) should handle:
- Idle timeout with ping awareness (server pings prove connection is alive)
- Three-level timeout: connection (30s) → network idle (120s) → content idle (disabled)
- Streaming retry and non-streaming fallback
- One AbortController, not five
Claude Code should only contain business logic: tools, permissions, UI, agents.
The real real fix: open source the CLI
@theo said it best: "Claude Code being closed source is the biggest bag fumble in the AI era."
@safetnsr: "This strategy literally exists: open-source the core, monetize the cloud. VS Code, Docker, Terraform."
The models are the moat. The CLI is a commodity. Open it. The community will fix what 0 tests and vibe coding cannot.
The stack choice: AI can't make it, and humans didn't fix it
Here's the uncomfortable truth: AI cannot make sound technology stack decisions. It optimizes locally — "TypeScript because the team knows it," "React because we use it on the web," "Bun because it's fast." It doesn't ask: "What are the failure modes of a single-threaded event loop for a long-running CLI tool that manages concurrent network streams and must survive 24-hour sessions?"
A human architect would have asked. But either Claude Code chose the stack and nobody questioned it, or the engineers chose it and ignored the warning signs.
The real tragedy is timing. A year ago, Claude Code launched as a quick TypeScript prototype and caught lightning — first-mover advantage, massive hype, millions of users. That was the right move for a prototype. But after proving the concept, the next step should have been: stop, rethink the architecture, rewrite on a proper stack. Instead, they vibe-coded themselves into a corner — 80+ releases of band-aids on top of an architecture that was never designed for long interactive sessions. Boris Cherny (creator of Claude Code) said "100% of code is written by Claude Code, I haven't edited a single line since November." The tool is writing itself — using the same broken code that hangs every 10 minutes.
Now they're trapped: rewriting means Claude Code would need to rewrite itself in a different language. The longer they wait, the harder it gets.
It should have been Go from the start
Every single bug we found exists because of the technology choice. Not because TypeScript is bad — but because a long-running, network-dependent, latency-sensitive CLI tool is the worst possible use case for a single-threaded event loop runtime.
The entire class of bugs — setTimeout not firing during for await, 5 levels of AbortController, Promise.race without catch, Bun vs Node behavioral divergence, React for a terminal app, 875 KB single component, Zig attestation module in a custom Bun fork — would not exist in Go.
Why Go specifically:
- Every serious CLI tool is Go: Docker, Kubernetes, Terraform, GitHub CLI (`gh`), Cobra, Hugo. The ecosystem is proven.
- Goroutines + context: timeout, cancellation, and deadline propagation built into the language. No AbortController chains. `context.WithTimeout` works at any nesting depth, in any direction — top-down AND bottom-up.
- No runtime divergence: one binary, one behavior. No "works on Node but crashes on Bun" — there is no Bun.
- Static binary: 15 MB, zero dependencies, runs everywhere. No `node_modules`, no native addons (no `.node` files leaking memory), no 74 npm packages to audit.
- Memory: goroutines cost 4 KB each, not 500 MB per process. The GC returns memory to the OS proactively — no mimalloc hoarding 15 GB.
- `go test -race`: catches data races and concurrency bugs at test time. The Promise.race-without-catch bug? Impossible — channels are type-safe and don't silently drop values.
- No React for a terminal: `bubbletea` or raw ANSI — lightweight, zero virtual DOM overhead, no re-rendering 844 useState hooks on every state change.
```go
// The entire streaming + watchdog + fallback in Go:
ctx, cancel := context.WithCancel(parentCtx)
defer cancel()

// Watchdog: if no event arrives within 90s, cancel the stream's context.
watchdog := time.AfterFunc(90*time.Second, cancel)
defer watchdog.Stop()

stream, err := client.CreateMessageStream(ctx, request)
if err != nil {
	return fallbackNonStreaming(parentCtx, request) // non-streaming fallback
}
for event := range stream.Events() {
	watchdog.Reset(90 * time.Second) // re-arm on every event
	processEvent(event)
}
```
30 lines instead of 3,419. No event loop. No microtask vs macrotask. Timer guaranteed to fire regardless of async iteration. context.WithTimeout works at any nesting level, in any direction.
We measured: 7 Claude Code processes = 5.3 GB RSS. An equivalent Go implementation would use ~350 MB. No .node native addon leaks. No mimalloc panics. No 12 MB minified JavaScript. A 15 MB static binary that runs everywhere.
64,464 lines of TypeScript with 0 tests → ~15,000 lines of Go with go test -race catching concurrency bugs at test time. The print.ts monster function (3,167 lines in a 5,594-line file, 486 branch points) → 10 clean Go packages with interfaces.
And it should be open source from day one. Not because we need to see the code (though we do). Because the community will build what a team doing vibe coding cannot: reliability.
The deeper problem: Vibe Coding vs Smart Coding
Claude Code is the poster child of what happens when you rely entirely on AI to write production software without engineering discipline. 64,464 lines, zero tests, a 3,167-line function with 486 branch points, regex for sentiment analysis at an AI company — this is what vibe coding looks like at scale: prompt-first, understanding-second, ship and pray.
There's a better way. I call it Smart Coding — a meta-framework where you drive, AI accelerates. Five principles:
- Architecture Ownership — you control system design, AI suggests patterns
- Comprehension Before Commit — never deploy code you can't explain
- Targeted Acceleration — use AI for well-scoped tasks with clear specs, not "write me a CLI"
- Continuous Validation — verify every suggestion against edge cases, security, concurrency
- Deliberate Learning — treat AI interactions as learning opportunities, build knowledge files
The practical rule: invest 70% in architecture, specification, review. Let AI accelerate the 30% — mechanical implementation. Not the other way around.
In 2026, nobody writes tests by hand. But a Smart Coding engineer makes the AI write tests, reviews the coverage, asks "what happens when abort fires during do-while with stream=undefined?" — and validates the answer. 64,464 lines with zero tests means nobody — human or AI — ever asked that question. That's not an AI failure. That's the absence of engineering process.
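That question — abort firing while the stream doesn't exist yet — is exactly the invariant a unit test should pin down. A toy Go sketch, with hypothetical names modeling the fix rather than Anthropic's actual code:

```go
package main

import "fmt"

type stream struct{ aborted bool }

// abortStream models the fixed abort path: cancellation goes through
// the request context, which always exists — even in the do-while
// phase, when the stream object doesn't yet.
func abortStream(s *stream, cancelRequest func()) {
	cancelRequest() // never depends on the stream existing
	if s != nil {
		s.aborted = true
	}
}

func main() {
	cancelled := false
	abortStream(nil, func() { cancelled = true }) // the stream=undefined case
	fmt.Println("request cancelled:", cancelled)  // request cancelled: true
}
```

Ten lines of test would have caught Bug 2 the day it shipped. Nobody asked the AI to write them.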
Vibe coding has its place — rapid prototyping, feasibility studies, throwaway exploration. But production infrastructure serving paying customers? That requires agentic engineering: AI agents executing under human oversight, with architecture decisions owned by humans, and continuous validation at every stage. As Karpathy noted, "you're not writing code 99% of the time — you're orchestrating agents." True. But orchestration requires understanding. And understanding requires engineers.
Anthropic's team should be proud of the models. But shipping a CLI tool where the AI writes the code, the AI reviews the code, and nobody validates anything — and then being surprised when a source map leaks because nobody checked the build output — that's not Smart Coding. That's hope-driven development.
Epilogue
I still use Claude Code. The models are genuinely the best for coding. Opus 4.6 is extraordinary.
But the wrapper around those models — 64,464 lines of untested TypeScript with regex sentiment detection and an attestation system that breaks its own caching — is not worthy of them.
We hope Anthropic's leadership draws the right conclusions from this incident. The source map leak wasn't a catastrophe — it was a mirror. It showed the world what the code looks like, and the world said: "This needs to be open."
Three paths forward, any of which would work:
- Open source Claude Code — let the community fix what vibe coding broke. The models are the moat, not the CLI.
- Rewrite the SDK properly — move reliability (timeout, retry, fallback, ping awareness) into the open `@anthropic-ai/sdk`. Let Claude Code be just business logic.
- At the very least — start listening to users. 201 upvotes on #38335. 150 on #40524. 15 on #33949. Zero responses from the team. A stale-issue bot that auto-closes everything after 60 days is not a support strategy.
We'll keep documenting. We'll keep patching. And when someone finally looks at our analysis, it will be here waiting.
All our research: github.com/anthropics/claude-code/issues/39755
@kolkov · dev.to/kolkov · March 2026
With help from Claude Code itself — the only team member who listened.