Everyone loves hockey-stick growth charts. Founders put them in pitch decks. VCs tweet them. Product managers frame them and hang them above their standing desks.
Here's mine.
It's the Claude Code GitHub issue queue.
232 issues filed in February 2025. 7,081 in February 2026. That's 30x year-on-year growth. 100 were filed before breakfast today. As I write this, 6,093 sit open. 31,995 total across the repository's lifetime. The March projection, based on the first twelve days of filing velocity, lands around 9,100.
That is not a product adoption curve. That is a maturity crisis hiding behind a success metric.
I run systemprompt.io, a platform that builds on Claude Code's ecosystem daily. Hooks, MCP servers, marketplace plugins, automated workflows. We ship Rust extensions that talk to Claude's toolchain across CLI, Desktop, and Cowork. When something breaks in that ecosystem, we feel it immediately. And things break constantly.
This is not a hit piece. I want to be clear about that upfront. I genuinely believe Claude Code is the most capable AI coding tool available. I use it every day. I build my business on it. But its ecosystem has grown faster than its infrastructure can support, and the gaps are now structural. Not cosmetic. Not edge-case. Structural. If you're building on this platform professionally, you need to understand where the fault lines are. Not the marketing version. The strace-output version.
## The Numbers Tell a Story
| Month | Issues Filed |
|---|---|
| Feb 2025 | 232 |
| Mar 2025 | 415 |
| Apr 2025 | 220 |
| May 2025 | 529 |
| Jun 2025 | 1,220 |
| Jul 2025 | 2,039 |
| Aug 2025 | 1,948 |
| Sep 2025 | 1,363 |
| Oct 2025 | 2,116 |
| Nov 2025 | 1,788 |
| Dec 2025 | 3,087 |
| Jan 2026 | 6,014 |
| Feb 2026 | 7,081 |
| Mar 2026 | ~9,100 (projected) |
June 2025 was the inflection point. That's when Claude Code went from a niche CLI tool beloved by early adopters to a mainstream development platform with millions of users. Desktop launched. Cowork followed shortly after. The marketplace opened. MCP became the de facto integration standard. Every one of those expansions multiplied the surface area for things to go wrong, and the multiplication was not linear. Each new surface interacted with every existing surface in ways that hadn't been tested.
The last fortnight alone has been brutal. March 11 saw over 1,400 reports on Downdetector. March 7 brought elevated error rates on Haiku 4.5. March 2 was a major worldwide outage, Anthropic citing "unprecedented demand", with over 2,000 Downdetector reports. Late February saw usage report API errors across the 26th and 27th. A DST transition bug caused infinite loops in scheduled tasks. API connection timeouts from upstream peering issues compounded everything further.
These are not isolated incidents. They are symptoms of a platform that crossed from "developer tool" to "developer infrastructure" without the corresponding investment in reliability engineering. The difference matters. A tool can have bugs. Infrastructure cannot.
Let me walk you through the five technical fault lines I've mapped over the past year, with the specific workarounds we've deployed at systemprompt.io to keep shipping through all of it.
## 1. Marketplace Schema Divergence
The Claude Code marketplace was supposed to unify plugin distribution. One manifest format, one validation pipeline, one installation flow across every surface. That was the theory. It was a good theory. It lasted about three months.
In practice, plugin manifests that work perfectly in the CLI fail silently in Desktop. Manifests that pass Desktop validation crash Cowork. Anthropic's own official marketplace ships plugins that fail their own validation checks (#26555). Read that again. The vendor's own plugins fail the vendor's own schema validation. This is not a third-party compatibility issue. This is the first-party reference implementation failing against the first-party validator.
The getPlugins interface in Cowork continuously throws schema validation errors (#24328). Not occasionally. Not under edge conditions. Continuously. Every single poll cycle. The error gets logged, ignored, and the poll fires again. Your server logs fill up with identical stack traces.
Auto-updates silently break MCP server configurations (#31864). A plugin that worked yesterday stops working today. No changelog mentions it. No notification appears. Nothing shows up in the user-facing logs. You discover it when your users report that their workflows stopped running, and you spend an hour checking your own infrastructure before realising the marketplace pushed a manifest format change that your plugin validator doesn't recognise.
Marketplace updates sometimes simply don't apply at all (#11856). You push a new version, the marketplace accepts it, users see the new version number, but the actual plugin code running is still the old version. And there are fundamental incompatibilities between MCP server types across surfaces (#3140), meaning certain server configurations that are perfectly valid on one surface are architecturally impossible on another.
The root cause is depressingly straightforward. Three different surfaces (CLI, Desktop, Cowork) each implement their own manifest parser, their own validation logic, and their own installation pipeline. There is no shared schema validation library. There is no contract test suite. Each surface evolved independently under separate teams with separate release cadences, and the divergence compounds with every release. What started as minor differences in how optional fields are handled has grown into fundamentally different interpretations of what a valid manifest looks like.
Our workaround: Server-side dynamic marketplace.json generation. Rather than shipping a static manifest and hoping each surface parses it correctly, we generate surface-specific manifests at request time. The server inspects the incoming request, identifies which client is asking (via User-Agent headers and capability negotiation handshakes), and returns a tailored response with only the fields that surface can handle. We maintain three manifest templates and a compatibility matrix. It's ugly. It works. We covered the full plugin publishing pipeline in depth in our marketplace plugin guide.
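The shape of that server is simple enough to sketch. Everything below is illustrative — the surface-detection patterns, the field allowlists, and the manifest contents are stand-ins for our real compatibility matrix, not Anthropic's actual formats:

```javascript
// Hypothetical sketch of request-time manifest generation: identify the
// asking surface, then serve only the fields that surface can parse.
const baseManifest = {
  name: "example-plugin",
  version: "1.4.2",
  description: "Example plugin",
  hooks: [{ event: "post-save", url: "http://localhost:3000/hook" }],
  experimental: { streaming: true }, // assume only the CLI handles this field
};

// User-Agent sniffing stands in for the full capability negotiation handshake.
function detectSurface(userAgent = "") {
  if (/cowork/i.test(userAgent)) return "cowork";
  if (/desktop/i.test(userAgent)) return "desktop";
  return "cli";
}

// Per-surface allowlists are a placeholder for our real compatibility matrix.
const allowedFields = {
  cli: ["name", "version", "description", "hooks", "experimental"],
  desktop: ["name", "version", "description", "hooks"],
  cowork: ["name", "version", "description"],
};

function manifestFor(userAgent) {
  const surface = detectSurface(userAgent);
  return Object.fromEntries(
    Object.entries(baseManifest).filter(([key]) =>
      allowedFields[surface].includes(key)
    )
  );
}
```

In production the detection leans on capability negotiation rather than User-Agent strings alone, but the filtering step is the core of the pattern.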
## 2. TLS and Certificate Termination
This one cost me three days. Three full days of strace output, Wireshark captures, and increasingly creative profanity.
Bun, the JavaScript runtime that Claude Code bundles as its execution environment, includes BoringSSL as its TLS implementation. BoringSSL is Google's fork of OpenSSL. It is well-engineered, well-maintained, and completely indifferent to your system configuration. It does not read the system CA store. It ignores NODE_EXTRA_CA_CERTS. It ignores SSL_CERT_FILE. It ignores NODE_TLS_REJECT_UNAUTHORIZED=0. If your certificate chain doesn't match what BoringSSL's compiled-in trust store expects, you get a TLS handshake failure with an error message that could charitably be described as "unhelpful".
Let's Encrypt switched to ECDSA certificates with the E7 intermediate earlier this year. Perfectly valid certificates, trusted by every browser on the planet, accepted by every TLS library you've ever used. Rejected by Claude Code's bundled Bun (#31777). If you're running an MCP server behind Let's Encrypt, and you got an automatic certificate renewal after the E7 switchover, your server stopped working with Claude Code. No warning. No deprecation notice. Just a TLS error that looks identical to an expired certificate.
It gets worse. The TLS SNI implementation in Bun appends the port number to the hostname during the handshake (#29963). I found this by running strace on a failing connection and reading the raw bytes:
```
sendto(28, "...\x00\x1d\x00\x1b\x00\x00\x18google.com:443...")
```
That's the SNI extension sending google.com:443 instead of google.com. SNI, Server Name Indication, exists so that a single IP address can serve multiple TLS certificates. The hostname in the SNI extension tells the server which certificate to present. Every TLS library on every server in the world expects a bare hostname. Bun sends hostname-colon-port. RFC 6066 is explicit about this. The hostname must not include a port number. Some servers are lenient and strip the port. Many are not. Many perform an exact match against their certificate's Subject Alternative Names, find no match for google.com:443, and terminate the handshake. You get a cryptic failure and the error message mentions nothing about SNI.
I spent an entire afternoon on that one before I thought to look at the raw bytes. The error message said "certificate verify failed". The certificate was fine. The SNI was wrong. Good luck debugging that without strace.
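If you're building your own TLS frontend or sanity-checking captures like the one above, it helps to make the RFC 6066 rule concrete. This helper is our own debugging utility, not part of any library — it normalises an SNI name the way lenient servers do, and rejects the literal-IP forms the RFC forbids:

```javascript
// RFC 6066: the SNI host_name must be a bare DNS hostname -- no port,
// no literal IP address. This helper (ours, for debugging) strips a
// trailing :port the way tolerant servers do.
function normalizeSniName(name) {
  // "[::1]:443" style means an IPv6 literal; not a valid SNI name at all
  if (name.startsWith("[")) return null;
  const stripped = name.replace(/:\d+$/, ""); // "google.com:443" -> "google.com"
  // literal IPv4 addresses are also not permitted in SNI
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(stripped)) return null;
  return stripped;
}
```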
Self-signed certificates on macOS produce different cryptic errors (#24470). Mutual TLS has been broken since v2.1.23 (#21956), which means any enterprise deployment requiring client certificate authentication simply cannot use Claude Code's native HTTP client. Corporate environments running Zscaler or similar SSL inspection proxies are comprehensively broken (#25977), because Zscaler inserts its own CA into the system trust store, and BoringSSL does not read the system trust store. Desktop doesn't forward NODE_EXTRA_CA_CERTS to spawned processes (#22559), so even if you find a workaround for the parent process, child processes fail differently.
A fix exists upstream. The Bun team merged a comprehensive patch on March 8 (oven-sh/bun#27890) that addresses the CA store reading and SNI formatting issues. But it hasn't been bundled into Claude Code yet. Anthropic ships their own Bun build, and the update cycle is not immediate. So we wait. And we work around.
Our workaround: We terminate TLS at Caddy before anything reaches Bun. All MCP servers bind to localhost on plain HTTP. Caddy sits in front, handles certificate management with ACME, constructs proper certificate chains, and handles SNI correctly because it's written in Go and uses Go's crypto/tls, which actually reads the system CA store. This completely sidesteps Bun's TLS stack. The MCP servers never see a TLS handshake. Caddy handles it all. For the full MCP server setup, see our Rust MCP server guide.
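For reference, the whole pattern fits in a few lines of Caddyfile. The domain and upstream port below are placeholders for your own deployment:

```caddyfile
# Caddyfile sketch -- TLS terminates here; Bun never sees a handshake.
# mcp.example.com and port 3001 are placeholders.
mcp.example.com {
    # ACME certificate management is automatic; Caddy builds the full chain
    # and handles SNI correctly via Go's crypto/tls.
    reverse_proxy 127.0.0.1:3001
}
```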
## 3. The Cloudflare Proxy Catch-22
So Bun rejects your Let's Encrypt certificates. The obvious fix is to put Cloudflare in front of your origin server. Cloudflare terminates TLS with its own certificates. Cloudflare's certificates use RSA with well-known intermediates that BoringSSL's compiled-in trust store recognises. Problem solved.
Except now you have a new problem. Cloudflare's bot detection system flags Claude Code's requests as automated traffic. Because they are automated traffic. Claude Code sends HTTP requests from server IP addresses, with non-browser User-Agent strings, at machine-speed intervals. Cloudflare's heuristics correctly identify this as bot behaviour and serve a challenge page.
OAuth flows from VPS-hosted instances fail with a cf-mitigated: challenge header (#21678). Cloudflare serves a JavaScript challenge page. The challenge requires a browser to execute JavaScript, solve a computational puzzle, and submit the result. Claude Code's HTTP client isn't a browser. It can't execute JavaScript challenges. Authentication fails. The error you see is a 403 with an HTML body containing Cloudflare's challenge page. No one reads the HTML body. Everyone sees "403 Forbidden" and assumes their credentials are wrong.
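Because the failure surfaces as a bare 403, the most useful client-side mitigation is to check for that header explicitly. This is our own helper, not part of any SDK — it just replaces "credentials must be wrong" guesswork with an actionable classification:

```javascript
// The cf-mitigated header is the reliable signal that a 403 is a Cloudflare
// challenge page rather than an auth failure. Helper is our own convention.
function classifyAuthFailure(status, headers) {
  if (status !== 403) return "not-a-403";
  if ((headers["cf-mitigated"] || "") === "challenge") {
    return "cloudflare-challenge"; // bot detection fired; credentials are fine
  }
  return "forbidden"; // plausibly a genuine credential problem
}
```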
Desktop is even more entertaining. It enters an infinite Turnstile redirect loop (#25611). I measured 30 redirect errors per second during one debugging session. The browser component embedded in Desktop tries to complete the Cloudflare challenge. It gets redirected. It tries again. It gets redirected again. The embedded browser doesn't have the same fingerprint as a standalone browser, so Cloudflare keeps challenging it. The CPU fan spins up. Memory consumption climbs. The application becomes unresponsive. You force-quit and try again. Same result. Thirty times per second, indefinitely.
Users can't log in at all when Cloudflare verification is active on Anthropic's own endpoints (#9885). Cloudflare Warp, which is Cloudflare's own VPN product marketed as making the internet faster and more secure, breaks Claude Code connectivity entirely (#10050). MCP OAuth flows silently fail behind Cloudflare proxies (#26917), meaning your MCP server's authentication flow works in testing, works with curl, works with Postman, and fails when Claude Code is the client.
The catch-22 is real and it is not hypothetical. You need Cloudflare to fix the TLS problem because Bun won't accept Let's Encrypt certificates. Cloudflare creates the authentication problem because its bot detection correctly identifies machine traffic. You cannot have both TLS termination and bot protection working simultaneously with the default configuration. Pick one.
Our workaround: Cloudflare WAF custom rules with surgical precision. We whitelist Claude Code's User-Agent patterns and the specific IP ranges used by Anthropic's authentication endpoints. We created a separate origin-pull configuration for MCP endpoints that bypasses the bot detection layer entirely but keeps DDoS protection active. It's a maintenance burden that I would rather not have. Every time Claude Code updates its User-Agent string, which happens without notice in minor version bumps, we have to update the WAF rules. But it keeps both TLS termination and bot protection functional simultaneously, which is what production requires. We wrote about this and related integration patterns in our MCP servers and extensions guide.
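For the curious, the rule shape looks roughly like this. The path and User-Agent pattern are placeholders — as noted, the real string changes across minor versions, so treat the expression as a template to maintain, not a value to copy:

```
# Cloudflare WAF custom rule expression (shape only; values are placeholders)
starts_with(http.request.uri.path, "/mcp/") and http.user_agent contains "claude"
# Action: Skip -> bot protection products, leaving DDoS mitigation active
```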
## 4. Cowork VM Sandbox on Windows
Cowork, Anthropic's collaborative coding environment, runs workspaces inside lightweight VMs for isolation. On macOS, where it uses Apple's Virtualization.framework, this works reasonably well. On Windows, where it uses Hyper-V, it is a disaster. I don't use that word lightly. I've been building software professionally for over fifteen years. This is a disaster.
VMs crash within five minutes of launch (#25206). Not sometimes. Not under unusual conditions. Reliably. Five minutes. I've timed it repeatedly. The sandbox-helper process fails to unmount virtiofs and Plan9 filesystem shares (#25419) when the VM shuts down, whether cleanly or through a crash. The stale mount points persist and prevent subsequent VM launches. The error messages reference internal paths that aren't documented anywhere. Workspaces become permanently bricked (#25663), requiring manual cleanup of VM state files that are scattered across three different directories, none of which are mentioned in the documentation.
Every Cowork launch spawns a 1.8GB Hyper-V VM (#29045). Let me be precise about when this happens. It happens for every session. Every single one. Even for a simple chat session where you ask a question about JavaScript syntax. Even if your project has no files. 1.8 gigabytes of Hyper-V VM, with a full Linux kernel boot, virtio driver initialisation, filesystem mount, and network configuration. To answer a question about syntax. I found 2,689 stale session files from crashed VMs on one of our test machines. That's not a typo. Two thousand, six hundred and eighty-nine session state files, each with associated VM disk images and configuration fragments. Nobody cleans these up automatically. There is no garbage collection. There is no session reaper. They accumulate until your disk fills up.
The virtiofs mount, which is supposed to share the host filesystem with the VM, produces "bad address" errors (#31520). VM downloads fail with EXDEV cross-device link errors (#30584) because the download target and the final destination are on different filesystems inside the VM, and the move operation isn't implemented as copy-and-delete. Users with Hyper-V enabled and correctly configured get told "Virtualization not enabled" (#27420), because the detection logic checks for a different virtualisation feature than the one Hyper-V actually uses on newer Windows builds. And the configuration option sandbox.enabled: false, which the documentation says should disable the VM entirely and run in direct mode, is simply ignored (#28880). You set it. You restart. The VM launches anyway.
Our workaround: We don't use Cowork on Windows for production work. Full stop. There is no configuration that makes it reliable. CLI through WSL2 is the only Windows workflow we trust. We route all Windows-based development through WSL2 with Claude Code CLI installed inside the Linux environment, connecting to MCP servers running on the Linux side. Desktop on Windows is acceptable for interactive use, things like code review and conversation, but only if you avoid Cowork workspaces entirely. The moment you open a Cowork workspace on Windows, you're rolling dice. We documented our daily workflow patterns, including the complete WSL2 setup and the reasoning behind it, in our daily workflows guide.
## 5. Cross-Platform Fragmentation and the MSIX Discovery
This is where it gets properly interesting. The kind of interesting where you stare at a hex dump for two hours and then laugh out loud when you realise what you're looking at.
Claude Code now runs across multiple surfaces (CLI, Desktop, Cowork), multiple platforms (macOS, Windows, Linux), and multiple distribution channels (npm, Homebrew, Microsoft Store MSIX, direct download). The feature matrix across these combinations is not documented anywhere. I've started mapping it, and the matrix is sparse. Features that work in CLI don't work in Desktop. Features that work on macOS don't work on Windows. Features that work when installed via npm don't work when installed via MSIX. The wrong config file gets opened when you click "Settings" in certain configurations (#26073). MITM proxy detection blocks legitimate corporate setups (#18854). Built-in MCP servers, the ones that ship with the product, fail to start on certain platform combinations (#27625).
But the real discovery came during a live debugging session on March 12. Today.
I had 56 hooks registered in Claude Code. This is our standard production configuration for the systemprompt.io development workflow. Pre-commit hooks, post-save hooks, notification hooks, analytics hooks, build triggers. Every hook was firing correctly. I could see the hook runner invoking each one. Every hook's HTTP callback was hitting ECONNREFUSED. On Windows. The identical setup on WSL2, same machine, same hooks, same target URLs, same hook server running on the same port, worked perfectly.
I checked the obvious things first. Firewall rules. Port binding. Process listening. The hook server was running. Netstat confirmed it was listening on the correct port. Curl from a separate terminal connected fine. The hooks spawned correctly. The hook scripts executed. But the HTTP client inside the hook process could not connect to localhost. Connection refused. Every single time.
I spent two hours on this before I found the root cause. The Microsoft Store edition of Claude Desktop is packaged as MSIX. MSIX is Microsoft's modern application packaging format, and it includes a sandboxing mechanism called AppContainer. The AppContainer has network capability declarations in its package manifest. These capabilities define what network operations the application is allowed to perform. Claude Desktop's MSIX manifest declares internetClient but not internetClientServer.
Here is what those capabilities mean in practice. The internetClient capability allows outbound connections to remote hosts. Any host on the internet. Fine. The internetClientServer capability is different. It allows the application to act as a network server and, critically, it allows connections to the localhost loopback address from within the AppContainer sandbox. Without internetClientServer, an AppContainer application cannot connect to 127.0.0.1 or ::1. This is a Windows security feature. It is working as designed.
So Claude Desktop can connect to api.anthropic.com. It can connect to github.com. It can connect to any remote host. But its in-process HTTP client cannot connect to a server running on the same machine at localhost:3000. Spawned child processes (bash, node, python) escape the AppContainer sandbox because they are separate executables not covered by the MSIX manifest. They connect to localhost fine. But the parent process, the one actually running the hooks and making the HTTP callbacks, cannot.
This is a single missing capability declaration in an MSIX manifest file. Four words in an XML file. internetClientServer alongside internetClient. It breaks every hook that calls back to a local server. Every MCP server running on localhost. Every local development workflow that relies on the hook runner's built-in HTTP client. And because spawned child processes work fine, it only manifests when the hook runner itself, rather than a spawned script, makes the HTTP call. That makes it incredibly difficult to diagnose. The behaviour looks like a firewall issue. Or a port binding issue. Or a race condition where the server isn't ready yet. It's none of those things. It's an AppContainer sandbox permission that nobody thinks to check because most developers have never heard of AppContainer capabilities.
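Concretely, the fix on Anthropic's side would be a one-element change to the package manifest. This fragment is illustrative of the MSIX capability schema, not copied from their actual manifest:

```xml
<!-- Illustrative AppxManifest.xml fragment, not Anthropic's actual manifest.
     Declaring internetClientServer alongside internetClient is the missing
     capability that would permit loopback connections from the sandbox. -->
<Capabilities>
  <Capability Name="internetClient" />
  <Capability Name="internetClientServer" />
</Capabilities>
```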
Our workaround: We restructured our hook architecture so that hooks always spawn a child process to make HTTP calls, rather than making calls in-process from the hook runner. The child process, being a separate executable, escapes the AppContainer sandbox and connects to localhost successfully. We use a thin shell script wrapper that receives the URL and payload as arguments, makes the HTTP call with curl, and returns the response. It adds 50-80ms of latency per hook invocation due to the process spawn overhead. It adds complexity to the error handling because process exit codes don't map cleanly to HTTP status codes. But it works across both MSIX and non-MSIX installations, which is what matters.
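A minimal sketch of that wrapper, written here as a POSIX shell function for readability (in production it's a standalone script the hook invokes). The name, flags, and timeout are our choices, not a Claude Code API:

```shell
# hook_callback: make the hook's HTTP call from a spawned child process.
# Child processes are separate executables outside the MSIX manifest, so
# their sockets escape the AppContainer check that blocks in-process
# loopback connections.
#   usage: hook_callback <url> [json-payload]
hook_callback() {
  url="$1"
  payload="$2"
  if [ -n "$payload" ]; then
    # POST the payload; -s silent, -f fail on HTTP errors, bounded timeout
    curl -sf --max-time 10 -H "Content-Type: application/json" \
      -d "$payload" "$url"
  else
    curl -sf --max-time 10 "$url"
  fi
}
```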
For the full hook architecture and patterns we use, including the process-spawn workaround, see our hooks and workflows guide.
## Server-Side Mitigation
Beyond the surface-specific workarounds, we've had to implement server-side mitigations for Claude Code's runtime behaviour. Two patterns have been particularly critical, and I suspect they'll be useful to anyone running hook servers or MCP endpoints.
First, the Bun.serve idle timeout. Bun's HTTP server has a default idle timeout of 10 seconds. If a connection doesn't receive a request within 10 seconds, Bun closes it. This sounds reasonable until you realise that Claude Code's hook runner maintains persistent connections to hook servers and doesn't always send requests within that window. When multiple hooks fire simultaneously, the hook runner processes them sequentially. If your hook is number 47 out of 56, the connection was established when the batch started but your request doesn't arrive until many seconds later. By which time Bun has closed the connection. The hook runner sees a connection reset and reports the hook as failed. The fix is blunt:
```javascript
// Bun.serve mitigation for the hook server. 255 is the maximum idleTimeout
// Bun accepts; the 10-second default kills connections for hooks queued
// late in a batch. handleHook is your existing request handler -- Bun.serve
// throws without a fetch handler.
Bun.serve({ idleTimeout: 255, fetch: handleHook });
```
255 seconds is the maximum value Bun allows for idleTimeout. It means connections sit around for over four minutes, which is wasteful. It also means hooks don't randomly fail because they were 47th in the queue. Crude but effective.
Second, synchronous operations in hook handlers. This one bit us hard. A single execSync("git status") in a hook handler blocks Bun's event loop. Bun is single-threaded for JavaScript execution, just like Node. While that synchronous git command runs, every other HTTP request to the hook server is queued. If you have ten hooks that all need to run git commands, the first one runs in 200ms, the second waits 200ms and then runs in 200ms, and by hook number ten, you're at two seconds of total latency. Claude Code's hook runner has a timeout. If your hook doesn't respond in time, it gets marked as failed and silently disabled. We converted every synchronous operation to async spawning:
```javascript
// Convert sync operations to async to prevent event loop blocking
const proc = Bun.spawn(["git", "status"], { cwd: workdir });
const output = await new Response(proc.stdout).text();
```
The Bun.spawn approach runs the child process without blocking the event loop. Other requests can be served while git runs. The response is awaited asynchronously. Total throughput increases by an order of magnitude.
These are not optional optimisations. They are not "nice to have in production". Without them, hook servers under any reasonable load become unreliable. Requests queue behind synchronous operations, timeouts fire, hooks report failure, and Claude Code silently disables hooks it considers "unhealthy". The word "silently" is doing a lot of work in that sentence. You don't get a notification. You don't get a log entry in any user-visible log. The hooks just stop firing. You discover this when a workflow that depends on hooks stops working and you spend an hour debugging your own code before realising the hook runner decided your server was unhealthy and stopped calling it.
## The Recursive Loop
There's a dark comedy to all of this that I can't quite shake.
We built a marketplace plugin called "Debugging Claude on Windows". It contains three skills: debug-claude-hooks, debug-cowork-vm, and debug-mcp-connections. Each skill runs diagnostic checks against the local Claude Code installation, inspects configuration files, tests network connectivity, and generates a report. The plugin uses Claude Code to diagnose problems with Claude Code. The tool that's broken is the tool we're using to figure out why it's broken.
I've had sessions where Claude Code's MCP connection drops mid-diagnosis of why MCP connections drop. Where the hook server crashes while investigating hook server crashes. Where the VM sandbox kills the workspace while we're debugging VM sandbox kills. Each of these happened more than once. Each time, I had to restart the diagnostic from scratch, which meant restarting the tool that was causing the failure, which meant potentially triggering the failure again.
There's a term for this in reliability engineering. Cascading failure mode. When your diagnostic tooling shares failure modes with the system being diagnosed, you cannot trust the diagnostics. Every negative result might be a genuine negative or it might be the diagnostic tool itself failing. You need external observability. You need a tool outside the blast radius. For us, that means logging everything to an external HTTP endpoint that doesn't go through Claude Code's hook runner, doesn't use Bun's HTTP client, and doesn't run inside an AppContainer sandbox. Plain curl. Plain logs. Plain text files. Old-fashioned, boring, and trustworthy.
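Our version of that is nothing more exotic than the following — a shell function (the names and variables are our own conventions, not Claude Code settings) that writes a plain-text line locally and best-effort ships it to a collector outside Claude Code's process tree:

```shell
# log_external: observability outside the blast radius. Appends a timestamped
# line to a plain text file and best-effort POSTs it to an external collector.
# LOG_FILE and LOG_ENDPOINT are our own conventions, not Claude Code settings.
log_external() {
  line="$(date -u +%Y-%m-%dT%H:%M:%SZ) $1"
  printf '%s\n' "$line" >> "${LOG_FILE:-claude-debug.log}"
  if [ -n "$LOG_ENDPOINT" ]; then
    # plain curl: no hook runner, no Bun HTTP client, no AppContainer sandbox
    curl -sf --max-time 5 -d "$line" "$LOG_ENDPOINT" >/dev/null 2>&1 || true
  fi
}
```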
It would be funny if it weren't also our production infrastructure.
## What This Actually Means
The Claude Code ecosystem is not failing. It's succeeding faster than its engineering can absorb. Anthropic is shipping features at a pace that would make most engineering organisations dizzy. New surfaces. New integration points. New capabilities. The ambition is genuinely impressive. But the integration testing between surfaces, the cross-platform CI matrix, the schema validation unification, the TLS stack modernisation, the MSIX manifest review, the VM lifecycle management, the connection pool tuning... these are the unglamorous infrastructure investments that haven't kept pace with the feature velocity.
The issue queue growth tells the story clearly. 232 issues in February 2025 was a CLI tool used by early adopters who expected rough edges and filed thoughtful bug reports. 7,081 issues in February 2026 is a mainstream development platform used by teams who expect their tools to work the way their other tools work. Reliably. Consistently. Across platforms. The product crossed that threshold somewhere around June 2025, when the issue count jumped from 529 to 1,220 in a single month, and the infrastructure hasn't caught up yet.
I'm not writing this to complain. Complaining doesn't ship code. I'm writing this because the information vacuum around these issues is actively harmful. Developers encounter these problems, assume they're doing something wrong, spend hours debugging their own configurations, and eventually give up or work around it by accident. Every one of the workarounds I've described took days to discover. The MSIX AppContainer issue took a strace session, a Process Monitor capture, and a deep dive into Microsoft's capability documentation from 2019. Nobody should have to do that independently. These solutions should be documented. They should be shared. That's what I'm doing here.
For those of us building on this ecosystem professionally, the choice is clear. You can wait for Anthropic to fix these issues, which they will, eventually. Their engineering team is capable and the upstream fixes are moving in the right direction. Or you can build the mitigation layer now and iterate as the platform matures. We chose the second path. systemprompt.io exists in large part because the gap between Claude Code's capabilities and its reliability created a market for exactly this kind of infrastructure layer. The tools work brilliantly when they work. Our job is making sure they work.
The growth chart nobody shows you isn't the one going up. It's the gap between what the platform can do and what the platform can do reliably. That gap is where the work is. And right now, that gap is growing.
## Further Reading
- Claude Code Hooks and Workflows for the full hook architecture patterns and cross-platform setup
- Building MCP Servers in Rust for server-side TLS termination and MCP transport patterns
- Claude Code MCP Servers and Extensions for the integration layer between Claude Code and external services
Originally published on systemprompt.io.