Every developer using LLMs faces the same three problems:
- Cost blindness — you cannot answer "how much did I spend today?"
- No failover — when OpenAI goes down, your app goes down
- Wasted money — identical prompts hit the API over and over instead of being cached
I built llmux to fix all three with zero code changes.
## What is llmux?
A single Rust binary (~5MB) that sits between your code and every LLM API. It handles failover, caching, rate limiting, and cost tracking automatically.
```
        Your code (any language)
                    |
          http://localhost:4000
                    |
                ┌───────┐
                │ llmux │  ← single binary, ~5MB
                └───┬───┘
                    │
           ┌────────┼────────┬────────┐
           ▼        ▼        ▼        ▼
        OpenAI   Claude   Gemini   Ollama
```
## Zero Code Changes
You change one environment variable. That is it.
```bash
# Before
export OPENAI_BASE_URL=https://api.openai.com

# After
export OPENAI_BASE_URL=http://localhost:4000/v1
```
Your existing code — Python, TypeScript, Go, whatever — keeps working exactly the same. llmux intercepts the calls and adds superpowers.
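To make that concrete, here is a minimal sketch in plain Python of the request an OpenAI-compatible client sends once the base URL points at llmux. The helper name `build_chat_request` is illustrative, not part of any library; the endpoint path and payload are the standard OpenAI chat-completions format, unchanged.

```python
import json
import os

# The only change: point the base URL at llmux (default port 4000 assumed).
os.environ["OPENAI_BASE_URL"] = "http://localhost:4000/v1"

def build_chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build the same POST an OpenAI-compatible client library would send.

    Illustrative only -- nothing here is llmux-specific beyond the base URL,
    which is exactly the point: the request shape does not change.
    """
    url = os.environ["OPENAI_BASE_URL"] + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(payload).encode("utf-8")

url, body = build_chat_request("gpt-4", "Hello!")
print(url)  # http://localhost:4000/v1/chat/completions
```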
## Quick Start
```bash
git clone https://github.com/LakshmiSravyaVedantham/llmux.git
cd llmux
cargo build --release
cp config.example.toml config.toml
# Edit config.toml with your API keys
./target/release/llmux start
```
That is it. Your gateway is running.
## What You Get
| Feature | What it does |
|---|---|
| Multi-provider proxy | Routes to OpenAI, Anthropic, Google, Mistral, Ollama |
| Automatic failover | Provider down? Routes to next one automatically |
| Response caching | Identical prompts return cached responses — saves money instantly |
| Token budgets | Set daily spend caps — warn mode or hard block |
| Cost tracking | Real-time cost estimation per model, per provider |
| TUI dashboard | Live terminal dashboard showing spend, cache hits, request log |
| Request logging | Every call logged to embedded SQLite — query anytime |
## The TUI Dashboard
Run `llmux dash` to see live stats:
```
┌─ llmux dashboard (q to quit) ───────────────────────┐
│ Requests today:   142                               │
│ Cache hits:       38 (26.8%)                        │
│ Input tokens:     284,000                           │
│ Output tokens:    71,000                            │
│ Spend today:      $3.4200                           │
└─────────────────────────────────────────────────────┘
┌─ Recent Requests ───────────────────────────────────┐
│ Time      Provider  Model         Status Cost       │
│ 14:23:01  openai    gpt-4         200    $0.0450    │
│ 14:22:58  openai    gpt-4         200    $0.0000    │
│ 14:22:55  anthropic claude-sonnet 200    $0.0120    │
└─────────────────────────────────────────────────────┘
```
The second request shows $0.00 — that is a cache hit. Same prompt, zero cost.
## How Failover Works
Configure providers with priorities:
```toml
[[providers]]
name = "openai"
api_key = "${OPENAI_API_KEY}"
priority = 1
base_url = "https://api.openai.com"

[[providers]]
name = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
priority = 2
base_url = "https://api.anthropic.com"
```
OpenAI returns a 5xx? llmux marks it unhealthy and routes to Anthropic. Connection refused? Same thing. Your app never sees the error.
## The Cache Math
If 25% of your LLM calls are identical prompts (common in dev workflows, CI, repeated queries), and you spend $100/month:
- Without llmux: $100/month
- With llmux: $75/month (25% cache hits = $0)
- With aggressive caching (1hr TTL): $60/month
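The arithmetic above as a quick sanity check. Cached responses cost nothing, so spend scales with the miss rate; I'm assuming the $60/month figure corresponds to roughly a 40% hit rate under the longer TTL.

```python
def monthly_cost(base_spend: float, cache_hit_rate: float) -> float:
    """Cache hits are free, so spend scales by the miss rate."""
    return base_spend * (1.0 - cache_hit_rate)

assert monthly_cost(100.0, 0.25) == 75.0            # 25% hits -> $75/month
assert round(monthly_cost(100.0, 0.40), 2) == 60.0  # ~40% hits -> $60/month
```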
The cache is an in-memory LRU with configurable TTL. Keys are SHA256 hashes of provider + request body. No data leaves your machine.
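A sketch of that keying scheme. The exact canonicalization llmux uses may differ, but hashing the provider name plus a canonical JSON body gives stable, collision-resistant keys, which is why identical prompts hit the cache.

```python
import hashlib
import json

def cache_key(provider: str, request_body: dict) -> str:
    """Derive a cache key from provider + canonicalized JSON request body."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((provider + canonical).encode("utf-8")).hexdigest()

body = {"model": "gpt-4", "messages": [{"role": "user", "content": "hi"}]}

# Identical prompts -> identical key -> cache hit (the $0.0000 row above).
assert cache_key("openai", body) == cache_key("openai", dict(body))
# Same body routed to a different provider -> different key.
assert cache_key("openai", body) != cache_key("anthropic", body)
```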
## Budget Protection
```toml
[budget]
daily_limit_usd = 5.00
action = "block"
```
Set action = "warn" to log warnings, or "block" to return HTTP 429 when the limit is hit. Never wake up to a surprise LLM bill again.
## Why Rust?
- Single binary — no runtime, no dependencies, no Docker required
- ~15MB RAM at steady state
- <2ms proxy overhead per request
- Thread-safe — handles concurrent requests with zero data races
The entire gateway is ~1,600 lines of Rust across 7 modules.
## What's Next
llmux is the first in a trilogy:
- llmux (this project) — gateway, caching, cost tracking
- llm-lens — observability and tracing for AI agents, builds on llmux request capture
- llm-guard — runtime safety monitor, detects loops, hallucinations, budget overruns
Each one is standalone but they compose into a full AI agent infrastructure stack.
## Try It
```bash
git clone https://github.com/LakshmiSravyaVedantham/llmux.git
cd llmux && cargo build --release
```
Star the repo if this is useful: github.com/LakshmiSravyaVedantham/llmux
llmux is MIT licensed and open source. Built with Rust, axum, tokio, ratatui, and rusqlite.