stockyard-dev

The LLM proxy landscape in 2026: Helicone acquired, LiteLLM compromised, and what's next

The LLM proxy space has changed fast in 2026. If you're picking infrastructure for routing LLM traffic, two significant events from Q1 are worth understanding. I'll cover those first, then give honest takes on the main options.

What happened in Q1 2026

Helicone acquired by Mintlify (March 3)

Helicone, one of the more established LLM observability and proxy tools, was acquired by Mintlify. The Helicone team announced the acquisition and stated the product would enter maintenance mode. No new feature development. Bug fixes only, timeline unclear.

If you're running Helicone in production, this is a risk signal. Maintenance mode means the product isn't being actively developed against a fast-moving ecosystem. Provider API changes won't get fast fixes. New models won't be prioritized.

LiteLLM PyPI supply chain attack (March 24)

A supply chain attack targeting LiteLLM was discovered on March 24: malicious packages impersonating LiteLLM were published to PyPI, aimed at developers installing or updating LiteLLM via pip.

LiteLLM itself (the legitimate package) was not compromised, but the attack surface here is real: pip's ecosystem has a history of typosquatting and dependency confusion attacks. If your LLM proxy is a Python package installed via pip in production, you have supply chain exposure that a compiled binary doesn't have.

This is not a knock on LiteLLM's code quality. It's a structural observation about Python package distribution.
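To make the typosquatting risk concrete, here's a minimal sketch of an audit that flags installed distribution names suspiciously similar to, but not exactly matching, the packages you actually depend on. The `find_suspects` helper and the 0.85 similarity threshold are my own illustration, not part of any real tooling:

```python
# Basic typosquat check: flag installed package names that are close to,
# but not exactly, a trusted dependency name. Threshold is an assumption.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_suspects(installed: list[str], trusted: list[str],
                  threshold: float = 0.85) -> list[str]:
    return [
        name for name in installed
        if name not in trusted
        and any(similarity(name, t) >= threshold for t in trusted)
    ]

# "lite-llm" is one character away from the real package name:
print(find_suspects(["litellm", "lite-llm", "requests"], ["litellm", "requests"]))
# → ['lite-llm']
```

In practice, pip's hash-checking mode (`pip install --require-hashes -r requirements.txt`) is the stronger defense, since it rejects any artifact whose digest doesn't match your pinned hashes regardless of its name.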

The current landscape

Here are the main alternatives and honest assessments of each:

Portkey

Portkey supports 200+ providers, which is impressive. The routing and fallback features are solid. The catch: observability and the more advanced features are cloud-locked. If you need on-premise observability for compliance reasons, Portkey isn't the answer. An open-source version exists, but the features that matter are in the cloud product.

Langfuse

Langfuse is good at what it does: observability. It's not a proxy. You can self-host it, the tracing is detailed, and it integrates with most frameworks. But if you need request routing, caching, or rate limiting at the proxy layer, Langfuse doesn't do that. It's a complementary tool, not a replacement.

OpenRouter

OpenRouter gives you access to a huge model catalog through one API. It's genuinely useful for trying models without managing individual provider credentials. The tradeoffs: it's cloud-only with no self-hosted option, and it takes a 5.5% fee on top of provider costs. For production workloads sending significant volume, that fee adds up. Your traffic also flows through OpenRouter's infrastructure, which is a compliance consideration for some use cases.
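To put the fee in perspective, a quick back-of-envelope calculation (the monthly spend figures here are hypothetical examples, not measurements):

```python
# What a 5.5% pass-through fee costs at different monthly spend levels.
FEE_RATE = 0.055

for monthly_spend in (1_000, 10_000, 50_000):
    fee = monthly_spend * FEE_RATE
    print(f"${monthly_spend:,}/mo provider spend -> ${fee:,.0f}/mo in fees")
# $1,000/mo  -> $55/mo
# $10,000/mo -> $550/mo
# $50,000/mo -> $2,750/mo
```

At hobby volume the fee is noise; at tens of thousands per month it's a line item worth comparing against the cost of running your own routing layer.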

TensorZero

TensorZero is doing something different: it focuses on the optimization loop, using feedback signals to improve prompts and routing over time. Interesting approach. The limitation is that it doesn't do caching or rate limiting today. If those are requirements, TensorZero isn't a complete solution on its own.

Where Stockyard fits

Stockyard is a self-hosted LLM proxy that ships as a single ~25MB Go binary with embedded SQLite. No external dependencies, no cloud component required. It handles routing, caching, rate limiting, and request logging across 40+ providers.

The honest tradeoffs: the embedded SQLite means no horizontal scaling. If you need to run multiple proxy instances sharing state, that's not supported. Single-instance deployments handle most workloads fine given LLM API latency, but it's a real constraint.
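To make the multi-instance constraint concrete, here's a minimal in-process token bucket. This is an illustration of the general problem, not Stockyard's implementation: when two instances each hold their own bucket, they together admit twice the intended rate, which is exactly why rate limiting needs shared state to scale horizontally.

```python
# Why per-instance state breaks global rate limits: two independent
# token buckets each enforce the limit separately.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Two "proxy instances", each configured for a burst of 5 requests:
a, b = TokenBucket(rate=1, capacity=5), TokenBucket(rate=1, capacity=5)
allowed = sum(bucket.allow() for bucket in (a, b) for _ in range(10))
# Each bucket admits 5, so 10 requests pass where a shared limit of 5 was intended.
```

A shared backend (Redis, Postgres, etc.) fixes this for multi-instance deployments, at the cost of exactly the external dependency the single-binary design avoids.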

The supply chain story: it's a single Go binary. There's no pip install, no package manager in the deployment path, and no transitive dependencies resolved at install time, which removes the vector the PyPI attack relied on. The attack surface is different.
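One practical habit with binary distribution is verifying the release artifact against a published checksum before deploying. A generic sketch, not specific to Stockyard's release process (the file path and expected digest would come from the project's release page):

```python
# Verify a downloaded release binary against a published SHA-256 digest.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries don't load fully into memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, expected_digest: str) -> bool:
    return sha256_of(path) == expected_digest.lower()
```

Checksums protect against corrupted or tampered downloads; pairing them with a signature (e.g. a signed checksums file) also protects against a compromised download host.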

Stockyard's proxy core is Apache 2.0. The full platform with dashboard and team features is BSL 1.1. Source is at github.com/stockyard-dev/Stockyard.


Full disclosure: I built Stockyard, so I'm biased. But the comparison data is accurate.
