I didn't set out to build an LLM router. Genuinely didn't.
I was just tired of paying per-token rates every time I tested something. My own projects were burning through API credits doing nothing special, just running experiments, pipelines, small automations. The math didn't make sense anymore.
So I built a proxy layer in front of multiple providers. Kept it private. Used it myself. Didn't think much of it.
Three months later I looked at the logs and saw 7 to 8 billion tokens routed through it. That's when I realized this thing was actually production-grade.
What's running through it
The model mix I use day to day: Llama 70B, DeepSeek, Qwen3, MiniMax M3, StepFun 3.7 Flash, and a few others, plus Cerebras on the provider side. All open source, no GPT-4o, no Claude. If open source models cover your use case, the pricing difference vs per-token billing is significant.
What I actually learned
Provider reliability is not equal. Some providers quietly drop requests under load. You won't notice until you're in a late-night run and half your completions are just gone. Failover saved me more times than I expected.
Cerebras is genuinely fast. I added it half expecting it to be just another provider. Ended up making it the default for anything latency-sensitive. The difference is real.
Routing logic matters more than model selection. Just pointing to "the best model" isn't enough. You need fallback chains, timeout handling, and something tracking which providers are actually responding. That's where most of the actual engineering went.
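To make the three pieces concrete (fallback chain, timeout handling, health tracking), here is a minimal Python sketch. It is not Janux's actual code. `Provider`, `Health`, `Router`, and `AllProvidersFailed` are hypothetical names, and each provider is assumed to be a plain callable that takes `(prompt, timeout)` and raises on timeout or error.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain failed or was skipped."""


@dataclass
class Provider:
    name: str
    # Hypothetical callable: (prompt, timeout_seconds) -> completion text.
    # It is expected to enforce the timeout itself and raise on failure.
    call: Callable[[str, float], str]
    timeout: float = 10.0


@dataclass
class Health:
    consecutive_failures: int = 0
    skip_until: float = 0.0  # provider is skipped while clock() < skip_until


@dataclass
class Router:
    chain: List[Provider]      # tried in order: first entry is the default
    max_failures: int = 3      # failures in a row before a cooldown starts
    cooldown: float = 30.0     # seconds an unhealthy provider is skipped
    clock: Callable[[], float] = time.monotonic
    health: Dict[str, Health] = field(default_factory=dict)

    def complete(self, prompt: str) -> str:
        errors = []
        for provider in self.chain:
            state = self.health.setdefault(provider.name, Health())
            if self.clock() < state.skip_until:
                errors.append(f"{provider.name}: cooling down")
                continue
            try:
                text = provider.call(prompt, provider.timeout)
                if not text:
                    # Treat a silently dropped completion as a failure.
                    raise ValueError("empty completion")
            except Exception as exc:
                state.consecutive_failures += 1
                if state.consecutive_failures >= self.max_failures:
                    # Bench the provider, then give it a fresh count.
                    state.skip_until = self.clock() + self.cooldown
                    state.consecutive_failures = 0
                errors.append(f"{provider.name}: {exc!r}")
                continue
            state.consecutive_failures = 0
            return text
        raise AllProvidersFailed("; ".join(errors))
```

The order of `chain` is the routing policy: put the latency-sensitive default first and the slower but steadier providers after it. Treating an empty completion as a failure is a deliberate choice here, since a quietly dropped request would otherwise look like a success.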
Opening it up now
The product is called Janux. I'm opening it up with a founding price for the first 50 users.
Plans:
$5/month flat
$29/year
BYOK (bring your own key) is completely free
First month free if you pre-register now.
Only open source models. If that works for your stack, the pricing is a lot better than paying per token.
janux.studiotrx.in
Would genuinely love feedback from anyone building with LLMs seriously. What providers are giving you trouble? What's your current setup?