Last week I ran a simple test. I sent "Say hi" to two different AI APIs.
Claude/OpenAI cost: $0.000179
Through my routing proxy: $0.000004
Same response quality. 45x price difference.
## The insight
I analyzed a month of API traffic from a production app. Here's what I found:
- 72% of requests were simple tasks: classification, extraction, summarization, translation
- 18% were medium: multi-step analysis, moderate code generation
- 10% were genuinely hard: complex reasoning, system design, novel code
The simple tasks ran identically on DeepSeek V4 at $0.14/M tokens. We were paying Claude $15/M for the same work. That's a 100x markup on commodity tasks.
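The "100x markup" figure checks out against the listed prices. A quick sanity check, using the per-million-token rates cited above (Claude at $15/M, DeepSeek V4 at $0.14/M):

```python
# Sanity check on the markup claim, using the prices cited in the text.
claude_per_m = 15.00    # Claude, $/M tokens
deepseek_per_m = 0.14   # DeepSeek V4, $/M tokens

markup = claude_per_m / deepseek_per_m
print(f"{markup:.0f}x")  # → "107x", i.e. roughly 100x
```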
## The solution
I built an OpenAI-compatible proxy that classifies each request and routes to the cheapest capable model:
| Complexity | Model | Cost/M tokens |
|---|---|---|
| Simple (~70%) | DeepSeek V4 | $0.14 / $0.28 |
| Medium (~20%) | DeepSeek R1 | $0.55 / $2.19 |
| Hard (~10%) | Claude Sonnet | $3.00 / $15.00 |
The classifier itself runs on DeepSeek (cost: ~$0.001 per classification). If a cheap model fails, the proxy automatically falls back to the next tier.
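The classify-then-escalate loop can be sketched in a few lines. This is a minimal illustration, not the proxy's actual code: the tier lists mirror the table above, while `classify` and `call_model` are hypothetical stand-ins for the real classifier and upstream API calls.

```python
# Minimal sketch of tiered routing with automatic fallback.
# `classify` and `call_model` are hypothetical stand-ins for the
# proxy's real classifier and upstream API calls.
TIERS = {
    "simple": ["deepseek-v4", "deepseek-r1", "claude-sonnet"],
    "medium": ["deepseek-r1", "claude-sonnet"],
    "hard":   ["claude-sonnet"],
}

def route(request, classify, call_model):
    complexity = classify(request)  # e.g. "simple"
    for model in TIERS.get(complexity, TIERS["hard"]):
        try:
            return model, call_model(model, request)
        except Exception:
            continue  # cheap model failed: escalate to the next tier
    raise RuntimeError("all tiers failed")
```

The key design point is that a misclassification is cheap to recover from: an error from a lower tier just escalates the request upward.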
## How to use it
Change one line. Everything else stays the same:
```python
from openai import OpenAI

# Before
client = OpenAI(
    api_key="sk-your-openai-key",
    base_url="https://api.openai.com/v1"
)

# After
client = OpenAI(
    api_key="sk-bridge-your-key",
    base_url="https://ai-bridge-router-et30.onrender.com/v1"
)

# This code doesn't change at all
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Classify this as positive or negative: I love this product"}],
    stream=True
)
```
The response includes routing metadata:
"_bridge": {
"complexity": "simple",
"model": "DeepSeek V4",
"cost_usd": 0.000004,
"benchmark_usd": 0.000179
}
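That metadata makes per-request savings easy to tally. A small example using the field names and values from the snippet above:

```python
# Compute per-request savings from the `_bridge` metadata shown above.
bridge = {
    "complexity": "simple",
    "model": "DeepSeek V4",
    "cost_usd": 0.000004,
    "benchmark_usd": 0.000179,
}

saved = bridge["benchmark_usd"] - bridge["cost_usd"]
ratio = bridge["benchmark_usd"] / bridge["cost_usd"]
print(f"saved ${saved:.6f} per request ({ratio:.0f}x cheaper)")
# → "saved $0.000175 per request (45x cheaper)"
```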
## The math at scale
For a team spending $3,000/month on AI APIs. Each ratio below compares a tier's blended input/output price per million tokens against a ~$5.00/M blended baseline:
- 70% simple tasks: $3K × 0.7 × ($0.21/$5.00) ≈ $88
- 20% medium: $3K × 0.2 × ($1.37/$5.00) ≈ $164
- 10% hard: $3K × 0.1 × ($9.00/$5.00) = $540
- Total: ~$792/month instead of $3,000
That's a ~74% reduction, or $26,496 saved per year.
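The estimate above can be reproduced in a few lines. The traffic shares and blended per-tier prices come from the article; the $5.00/M baseline is the blended price assumed for the incumbent model:

```python
# Reproduce the monthly-savings estimate: each tier's blended $/M
# divided by a $5.00/M baseline, weighted by its share of traffic.
baseline_spend = 3000.0   # current monthly spend, $
baseline_price = 5.00     # blended $/M for the incumbent model

tiers = [  # (share of traffic, blended $/M after routing)
    (0.7, 0.21),   # simple -> DeepSeek V4
    (0.2, 1.37),   # medium -> DeepSeek R1
    (0.1, 9.00),   # hard   -> Claude Sonnet
]

monthly = sum(baseline_spend * share * price / baseline_price
              for share, price in tiers)
print(f"${monthly:.2f}/month, {1 - monthly / baseline_spend:.0%} saved")
```

This prints roughly $792/month and a ~74% reduction, matching the per-tier figures above up to rounding.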
## Try it
Free tier available (50K tokens/day, no credit card required), or self-host from the GitHub repo.
## Technical details
- Stack: Python, FastAPI, httpx async, SQLite
- Streaming: Full SSE support, converts Anthropic format to OpenAI format on the fly
- Classification: LLM-based (DeepSeek classifies requests) with keyword fallback
- Fallback: Auto-escalates if cheap model returns error
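The keyword fallback mentioned above might look something like this. To be clear, this is a hypothetical sketch: the actual keyword lists and tier names in the router may differ.

```python
# Hypothetical keyword fallback, used when the LLM-based classifier
# is unavailable: scan the prompt for hints of each complexity tier.
HARD_HINTS = ("architecture", "design a system", "prove", "refactor")
MEDIUM_HINTS = ("analyze", "compare", "write a function", "debug")

def keyword_classify(prompt: str) -> str:
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return "hard"
    if any(hint in text for hint in MEDIUM_HINTS):
        return "medium"
    return "simple"  # default to the cheapest tier
```

Defaulting to "simple" is safe here precisely because of the auto-escalation: a misrouted hard request fails on the cheap model and gets retried one tier up.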
The entire router is a single Python file. MIT licensed.
If you're spending $500+/month on AI APIs, the routing proxy probably pays for itself in the first week. Try it free or self-host from GitHub.