<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: stockyard-dev</title>
    <description>The latest articles on DEV Community by stockyard-dev (@stockyarddev).</description>
    <link>https://dev.to/stockyarddev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3852782%2F7caea71f-a0cc-479e-8461-7cf9cddbf454.png</url>
      <title>DEV Community: stockyard-dev</title>
      <link>https://dev.to/stockyarddev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stockyarddev"/>
    <language>en</language>
    <item>
      <title>The LLM proxy landscape in 2026: Helicone acquired, LiteLLM compromised, and what's next</title>
      <dc:creator>stockyard-dev</dc:creator>
      <pubDate>Tue, 31 Mar 2026 05:49:04 +0000</pubDate>
      <link>https://dev.to/stockyarddev/the-llm-proxy-landscape-in-2026-helicone-acquired-litellm-compromised-and-whats-next-3oon</link>
      <guid>https://dev.to/stockyarddev/the-llm-proxy-landscape-in-2026-helicone-acquired-litellm-compromised-and-whats-next-3oon</guid>
<description>&lt;p&gt;The LLM proxy space has changed fast in 2026. Two significant events happened in Q1 that are worth understanding if you're picking infrastructure for routing LLM traffic. After walking through them, I'll give honest takes on the main options.&lt;/p&gt;

&lt;h2&gt;What happened in Q1 2026&lt;/h2&gt;

&lt;h3&gt;Helicone acquired by Mintlify (March 3)&lt;/h3&gt;

&lt;p&gt;Helicone, one of the more established LLM observability and proxy tools, was acquired by Mintlify. The Helicone team announced the acquisition and stated the product would enter maintenance mode. No new feature development. Bug fixes only, timeline unclear.&lt;/p&gt;

&lt;p&gt;If you're running Helicone in production, this is a risk signal. Maintenance mode means the product isn't being actively developed against a fast-moving ecosystem. Provider API changes won't get fast fixes. New models won't be prioritized.&lt;/p&gt;

&lt;h3&gt;LiteLLM PyPI supply chain attack (March 24)&lt;/h3&gt;

&lt;p&gt;A supply chain attack targeting LiteLLM was discovered on March 24: malicious packages impersonating LiteLLM were published to PyPI, aimed at developers installing or updating it via pip.&lt;/p&gt;

&lt;p&gt;LiteLLM itself (the legitimate package) was not compromised, but the attack surface here is real: pip's ecosystem has a history of typosquatting and dependency confusion attacks. If your LLM proxy is a Python package installed via pip in production, you have supply chain exposure that a compiled binary doesn't have.&lt;/p&gt;

&lt;p&gt;This is not a knock on LiteLLM's code quality. It's a structural observation about Python package distribution.&lt;/p&gt;

&lt;h2&gt;The current landscape&lt;/h2&gt;

&lt;p&gt;Here are the main alternatives and honest assessments of each:&lt;/p&gt;

&lt;h3&gt;Portkey&lt;/h3&gt;

&lt;p&gt;Portkey supports 200+ providers, which is impressive. The routing and fallback features are solid. The catch: observability and the more advanced features are cloud-locked. If you need on-premise observability for compliance reasons, Portkey isn't the answer. The open-source version exists, but the features that matter are in the cloud product.&lt;/p&gt;

&lt;h3&gt;Langfuse&lt;/h3&gt;

&lt;p&gt;Langfuse is good at what it does, which is observability. It's not a proxy. You can self-host it, the tracing is detailed, and it integrates with most frameworks. But if you need request routing, caching, or rate limiting at the proxy layer, Langfuse doesn't do that. It's a complementary tool, not a replacement.&lt;/p&gt;

&lt;h3&gt;OpenRouter&lt;/h3&gt;

&lt;p&gt;OpenRouter gives you access to a huge model catalog through one API. It's genuinely useful for trying models without managing individual provider credentials. The tradeoffs: it's cloud-only, there's no self-hosted option, and they take a 5.5% fee on top of provider costs. For production workloads where you're sending significant volume, that fee adds up. Also, your traffic goes through their infrastructure, which is a compliance consideration for some use cases.&lt;/p&gt;

&lt;h3&gt;TensorZero&lt;/h3&gt;

&lt;p&gt;TensorZero is doing something different: it focuses on the optimization loop, using feedback signals to improve prompts and routing over time. Interesting approach. The limitation is that it doesn't do caching or rate limiting today. If those are requirements, TensorZero isn't a complete solution on its own.&lt;/p&gt;

&lt;h2&gt;Where Stockyard fits&lt;/h2&gt;

&lt;p&gt;Stockyard is a self-hosted LLM proxy that ships as a single ~25MB Go binary with embedded SQLite. No external dependencies, no cloud component required. It handles routing, caching, rate limiting, and request logging across 40+ providers.&lt;/p&gt;
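
&lt;p&gt;For a sense of how the caching piece typically works (this is an illustration of the usual approach for proxy-level response caching, not Stockyard's exact scheme): responses are keyed by a hash of the model plus the canonical request body, so identical requests can be answered from cache.&lt;/p&gt;

```python
import hashlib
import json

def cache_key(model: str, messages: list) -> str:
    """Deterministic cache key: hash the model plus canonical JSON of the messages."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Identical requests hash to the same key; any change in model or content misses.
k = cache_key("gpt-4o", [{"role": "user", "content": "hi"}])
```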

&lt;p&gt;The honest tradeoffs: the embedded SQLite means no horizontal scaling. If you need to run multiple proxy instances sharing state, that's not supported. Single-instance deployments handle most workloads fine given LLM API latency, but it's a real constraint.&lt;/p&gt;

&lt;p&gt;The supply chain story: it's a Go binary. There's no pip install, no package manager, no transitive dependency injection vector at runtime. The attack surface is different.&lt;/p&gt;

&lt;p&gt;Stockyard's proxy core is Apache 2.0. The full platform with dashboard and team features is BSL 1.1. Source is at &lt;a href="https://github.com/stockyard-dev/Stockyard" rel="noopener noreferrer"&gt;github.com/stockyard-dev/Stockyard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Full disclosure: I built Stockyard, so I'm biased. But the comparison data is accurate.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>I built an open source LLM proxy as a single Go binary — here's why</title>
      <dc:creator>stockyard-dev</dc:creator>
      <pubDate>Tue, 31 Mar 2026 05:43:51 +0000</pubDate>
      <link>https://dev.to/stockyarddev/i-built-an-open-source-llm-proxy-as-a-single-go-binary-heres-why-4pd1</link>
      <guid>https://dev.to/stockyarddev/i-built-an-open-source-llm-proxy-as-a-single-go-binary-heres-why-4pd1</guid>
      <description>&lt;p&gt;About 18 months ago I started building Stockyard. It's an LLM proxy: you point your apps at it instead of directly at OpenAI or Anthropic or Gemini, and it handles routing, caching, rate limiting, logging, and retries. You can self-host it in under a minute.&lt;/p&gt;

&lt;p&gt;The interesting design decision: it ships as a single ~25MB Go binary with embedded SQLite and zero external dependencies.&lt;/p&gt;

&lt;p&gt;That choice drives everything else. Here's why I made it, and where it hurts.&lt;/p&gt;

&lt;h2&gt;Why not Postgres + Redis?&lt;/h2&gt;

&lt;p&gt;Most infrastructure projects reach for Postgres and Redis by default. It's a reasonable stack. But for an LLM proxy, it creates friction exactly where you don't want it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploying means provisioning two additional services&lt;/li&gt;
&lt;li&gt;"Try it out" becomes a multi-step process&lt;/li&gt;
&lt;li&gt;On-call incidents now include "is it the proxy or the DB?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted onboarding to be: download binary, run it, change one URL in your code. That's it. No YAML, no Compose file, no managed database tier.&lt;/p&gt;

&lt;p&gt;SQLite handles this surprisingly well for most workloads. LLM proxy traffic is write-heavy (logging requests) but not write-concurrent in ways that stress SQLite. Reads are fast. The database file lives next to the binary. Backups are a file copy.&lt;/p&gt;
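
&lt;p&gt;A minimal sketch of the pattern, with a hypothetical path and schema (not Stockyard's actual ones): WAL mode so reads don't block the logging writes, and a plain database file on disk.&lt;/p&gt;

```python
import os
import sqlite3
import tempfile

# Hypothetical path and schema, for illustration only.
db_path = os.path.join(tempfile.mkdtemp(), "stockyard.db")
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the single writer
conn.execute("""
    CREATE TABLE IF NOT EXISTS request_log (
        id INTEGER PRIMARY KEY,
        provider TEXT,
        model TEXT,
        latency_ms INTEGER,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO request_log (provider, model, latency_ms) VALUES (?, ?, ?)",
    ("openai", "gpt-4o", 812),
)
conn.commit()
```

&lt;p&gt;Backing up really is a file copy; for a live WAL database, SQLite's &lt;code&gt;VACUUM INTO&lt;/code&gt; gives you a consistent snapshot in one statement.&lt;/p&gt;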

&lt;h2&gt;Why Go?&lt;/h2&gt;

&lt;p&gt;A few reasons, in order of how much they actually mattered:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Static compilation.&lt;/strong&gt; &lt;code&gt;go build&lt;/code&gt; produces a self-contained binary. No runtime, no shared libraries, no "which Python version." The Linux binary runs on Linux. The macOS binary runs on macOS. This isn't magic, it's just how Go works, but it's genuinely useful for distribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goroutines.&lt;/strong&gt; An LLM proxy is fundamentally a concurrent problem: you're waiting on upstream API calls, you're handling multiple requests at once, you might be streaming. Go's goroutine model handles this without the overhead of a thread per request. The event-loop alternative (Node, Python asyncio) works too, but Go's concurrency is easier to reason about when things go wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-compilation.&lt;/strong&gt; &lt;code&gt;GOOS=linux GOARCH=arm64 go build&lt;/code&gt; just works. I release binaries for Linux amd64, Linux arm64, macOS amd64, macOS arm64, and Windows amd64 from a single CI step. That would be much harder in most other languages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Boring is good.&lt;/strong&gt; Go is not exciting. It has no macros, limited generics, verbose error handling. For infrastructure code, that's a feature. The codebase is readable by anyone who knows Go, and most of it is readable by anyone who programs.&lt;/p&gt;

&lt;h2&gt;The "change one URL" onboarding&lt;/h2&gt;

&lt;p&gt;Here's what using Stockyard actually looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download and run&lt;/span&gt;
curl &lt;span class="nt"&gt;-L&lt;/span&gt; https://stockyard.dev/install.sh | sh
stockyard start

&lt;span class="c"&gt;# Your code, before:&lt;/span&gt;
client &lt;span class="o"&gt;=&lt;/span&gt; OpenAI&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-..."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Your code, after:&lt;/span&gt;
client &lt;span class="o"&gt;=&lt;/span&gt; OpenAI&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-..."&lt;/span&gt;, &lt;span class="nv"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080/openai"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire migration for OpenAI. Same pattern for Anthropic, Gemini, Mistral, and 37 other providers. The proxy speaks each provider's native API format, so your existing SDK code keeps working.&lt;/p&gt;

&lt;h2&gt;Lessons from 40+ provider integrations&lt;/h2&gt;

&lt;p&gt;Supporting 40+ LLM providers taught me things I didn't expect:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API inconsistency is the norm.&lt;/strong&gt; Every provider has slightly different authentication (Bearer token, x-api-key header, query param, custom header), different streaming formats, different error codes. There's no standard. OpenAI's format is the closest thing to one, and providers that claim "OpenAI-compatible" mean it to varying degrees.&lt;/p&gt;
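
&lt;p&gt;To make that concrete, here are three real header conventions side by side. This is a sketch for illustration, not Stockyard's code:&lt;/p&gt;

```python
def auth_headers(provider: str, key: str) -> dict:
    """Three real authentication conventions a proxy has to know about."""
    if provider == "openai":
        return {"Authorization": f"Bearer {key}"}
    if provider == "anthropic":
        # Anthropic also requires a version header on every request.
        return {"x-api-key": key, "anthropic-version": "2023-06-01"}
    if provider == "gemini":
        return {"x-goog-api-key": key}
    raise ValueError(f"unknown provider: {provider}")

print(auth_headers("openai", "sk-test"))  # {'Authorization': 'Bearer sk-test'}
```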

&lt;p&gt;&lt;strong&gt;Models come and go fast.&lt;/strong&gt; I've had to update provider configs for model deprecations more times than I can count. Building this as config-driven rather than hard-coded matters a lot.&lt;/p&gt;
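
&lt;p&gt;The config-driven shape looks roughly like this (a hypothetical structure, for illustration): model lists are data, so a deprecation becomes a config edit instead of a code change.&lt;/p&gt;

```python
# Hypothetical registry shape: providers and their models live in config data.
PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "models": ["gpt-4o", "gpt-4o-mini"],
        "deprecated": ["gpt-3.5-turbo-0301"],
    },
}

def resolve(provider: str, model: str) -> str:
    """Validate a model against config before building the upstream URL."""
    cfg = PROVIDERS[provider]
    if model in cfg["deprecated"]:
        raise ValueError(f"{model} is deprecated; update your config")
    if model not in cfg["models"]:
        raise ValueError(f"{model} not configured for {provider}")
    return f'{cfg["base_url"]}/chat/completions'
```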

&lt;p&gt;&lt;strong&gt;Streaming is where bugs hide.&lt;/strong&gt; Non-streaming request/response is straightforward to proxy. Streaming is not. Different providers use different SSE formats, different done signals, different error injection patterns mid-stream. Robust streaming support took probably 40% of the proxy implementation work.&lt;/p&gt;
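
&lt;p&gt;To make the inconsistency concrete: OpenAI-style SSE sends each chunk as a &lt;code&gt;data:&lt;/code&gt; line and ends the stream with a &lt;code&gt;data: [DONE]&lt;/code&gt; sentinel, while Anthropic uses typed events (&lt;code&gt;content_block_delta&lt;/code&gt;, &lt;code&gt;message_stop&lt;/code&gt;). A sketch of just the OpenAI-style parser:&lt;/p&gt;

```python
import json

def parse_openai_sse(raw: str) -> list:
    """Collect content deltas from an OpenAI-style SSE body."""
    chunks = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        data = line[len("data: "):]
        if data == "[DONE]":  # OpenAI's end-of-stream sentinel
            break
        event = json.loads(data)
        delta = event["choices"][0]["delta"].get("content", "")
        chunks.append(delta)
    return chunks

raw = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    "data: [DONE]\n\n"
)
print("".join(parse_openai_sse(raw)))  # Hello
```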

&lt;p&gt;&lt;strong&gt;Documentation is often wrong.&lt;/strong&gt; Provider API docs lag behind actual behavior. The ground truth is the HTTP traffic.&lt;/p&gt;

&lt;h2&gt;The honest tradeoff: embedded SQLite = no horizontal scaling&lt;/h2&gt;

&lt;p&gt;Here's what I won't pretend isn't true: SQLite doesn't scale horizontally.&lt;/p&gt;

&lt;p&gt;If you're running a single instance, this doesn't matter at all. If you need to run two instances behind a load balancer and have them share state (for rate limiting, caching), you can't. You'd need to switch to a different persistence layer.&lt;/p&gt;
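
&lt;p&gt;Rate limiting shows the problem most clearly. A token bucket like the sketch below (an illustration of the general issue, not Stockyard's implementation) lives in one instance's state; put two instances behind a load balancer and each enforces the limit independently, so clients get double the intended budget.&lt;/p&gt;

```python
import time

class TokenBucket:
    """Per-process rate limiter: capacity tokens, refilled at refill_rate/sec."""
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Two "instances" behind a load balancer, each with its own bucket:
# every instance enforces the full limit separately.
a, b = TokenBucket(2, 0.0), TokenBucket(2, 0.0)
allowed = sum(bucket.allow() for bucket in [a, a, a, b, b, b])
print(allowed)  # 4 requests allowed, though the intended global limit was 2
```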

&lt;p&gt;For the majority of Stockyard's users, a single instance handles the load. LLM APIs are slow (200ms-2000ms per request), so you can handle a lot of concurrent requests before you need more than one proxy instance. But if you're at the scale where this matters, you'll need to work around it.&lt;/p&gt;

&lt;p&gt;I'm not planning to swap out SQLite for Postgres. The single-binary property is core to the project's identity. If horizontal scaling is a requirement, Stockyard might not be the right tool.&lt;/p&gt;

&lt;h2&gt;Where it is today&lt;/h2&gt;

&lt;p&gt;Stockyard is live at &lt;a href="https://stockyard.dev" rel="noopener noreferrer"&gt;stockyard.dev&lt;/a&gt;. The proxy core is Apache 2.0. The full platform (dashboard, team features) is BSL 1.1. Source is at &lt;a href="https://github.com/stockyard-dev/Stockyard" rel="noopener noreferrer"&gt;github.com/stockyard-dev/Stockyard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It supports 40+ providers, has a web dashboard for request logging and analytics, handles caching and rate limiting, and ships as that single ~25MB binary.&lt;/p&gt;

&lt;p&gt;If you're routing LLM traffic through multiple providers or want visibility into what your app is actually sending to APIs, it might be useful to you.&lt;/p&gt;

</description>
      <category>go</category>
      <category>opensource</category>
      <category>ai</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
