I have a problem.
My team uses DeepSeek for reasoning tasks, Kimi for long document processing, MiniMax for multimodal stuff, and Qwen for heavy lifting.
That means four accounts, four API keys, four dashboards, four bills.
Every time I switch models, I have to change the base URL and auth header. It's exhausting.
What I built
I built a dead-simple proxy that normalizes everything to an OpenAI-compatible format. But honestly? I realized someone else had already done it better.
I found NovaStack – a unified gateway that takes one API key and one endpoint, then routes to different models based on the model parameter.
Here's what it looks like:
```python
import requests

# One endpoint and one key; only the "model" field changes per request.
response = requests.post(
    "https://api.novapai.ai/v1/chat/completions",
    headers={"Authorization": "Bearer your-single-key"},
    json={
        "model": "deepseek-v4-pro",  # or kimi-2.6, minimax-2.7, qwen3-235b
        "messages": [{"role": "user", "content": "Explain async/await"}],
    },
)
```
That's it. One endpoint. One key. Four models.
What surprised me
The models actually have distinct strengths
I assumed all frontier models were basically the same. They're not.
| Task | Best model |
| --- | --- |
| 100K+ token document QA | Kimi 2.6 |
| Complex math/reasoning | Qwen3 235B |
| Quick chat + code | DeepSeek-V4 Pro |
| Image understanding | MiniMax 2.7 |
Routing is cheaper than picking one
We used to just use DeepSeek for everything. Switching to per-task routing cut our monthly bill by about 35%.
Fallback matters more than I thought
When one model hits rate limits, the gateway can automatically retry with another. Saved us from multiple production incidents.
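To make the idea concrete, here is a minimal client-side sketch of a fallback chain. It is not the gateway's actual implementation, which I haven't seen: `complete_with_fallback`, `AllModelsRateLimited`, and the `send` callable are hypothetical names, and I'm assuming rate limiting is signalled with HTTP 429.

```python
from typing import Callable, Sequence

RATE_LIMITED = 429  # HTTP status code for "Too Many Requests"


class AllModelsRateLimited(Exception):
    """Raised when every model in the fallback chain was rate limited."""


def complete_with_fallback(
    send: Callable[[str], tuple[int, str]],
    models: Sequence[str],
) -> tuple[str, str]:
    """Try each model in order and return (model, reply) from the first success.

    `send(model)` is a hypothetical callable that performs the request and
    returns (http_status, reply_text).
    """
    for model in models:
        status, reply = send(model)
        if status == RATE_LIMITED:
            continue  # this model is throttled, move on to the next one
        if status != 200:
            raise RuntimeError(f"{model} failed with HTTP {status}")
        return model, reply
    raise AllModelsRateLimited(f"all models rate limited: {list(models)}")


# Usage with a fake transport in which the first model is throttled.
def fake_send(model: str) -> tuple[int, str]:
    if model == "deepseek-v4-pro":
        return 429, ""
    return 200, f"reply from {model}"


print(complete_with_fallback(fake_send, ["deepseek-v4-pro", "qwen3-235b"]))
```

Only rate limits trigger a fallback here; any other error is raised right away so real failures don't get hidden behind a silent model switch.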
What broke
Not all models support streaming the same way
Some send different SSE formats. The gateway normalizes this, but I had to disable experimental features on one of our clients:
```bash
export DISABLE_STREAMING_BETA=1
```
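For reference, this is roughly what a client has to parse once the stream is normalized. The sketch assumes the gateway emits OpenAI-style chunks (`data: {json}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`); `iter_stream_text` is a hypothetical helper and the sample lines are made up.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Made-up sample stream: a role-only chunk, a blank line, two text chunks.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```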
Cost tracking gets messy
The gateway provides a dashboard at novapai.ai/en-US/, but I still export logs to our own analytics for fine-grained per-task cost monitoring.
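The per-task aggregation itself is simple. Here is a minimal sketch, assuming each exported log record carries `task_type`, `model`, `input_tokens`, and `output_tokens` fields (hypothetical field names). The prices are placeholders for illustration, not real rates.

```python
from collections import defaultdict

# PLACEHOLDER prices in USD per million tokens; not real rates.
PRICE_PER_MILLION = {
    "deepseek-v4-pro": {"input": 1.00, "output": 2.00},
    "kimi-2.6": {"input": 2.00, "output": 4.00},
}


def cost_by_task(records):
    """Sum estimated spend per task_type from exported log records."""
    totals = defaultdict(float)
    for rec in records:
        price = PRICE_PER_MILLION[rec["model"]]
        cost = (
            rec["input_tokens"] * price["input"]
            + rec["output_tokens"] * price["output"]
        ) / 1_000_000
        totals[rec["task_type"]] += cost
    return dict(totals)


# Made-up log records for illustration.
logs = [
    {"task_type": "chat", "model": "deepseek-v4-pro", "input_tokens": 1000, "output_tokens": 500},
    {"task_type": "doc_qa", "model": "kimi-2.6", "input_tokens": 100000, "output_tokens": 1000},
    {"task_type": "chat", "model": "deepseek-v4-pro", "input_tokens": 2000, "output_tokens": 1000},
]
print(cost_by_task(logs))
```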
Model names aren't standardized
What NovaStack calls qwen3-235b might be different from what another provider calls it. Stick with one provider's naming convention.
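One way to contain this is to keep provider-specific model IDs in a single lookup table and use stable internal names everywhere else. A minimal sketch; `MODEL_ALIASES`, `resolve_model`, and the alias names on the left are my own hypothetical choices:

```python
# Internal alias -> model ID as named by the one provider we standardize on.
MODEL_ALIASES = {
    "long_context": "kimi-2.6",
    "reasoning": "qwen3-235b",
    "chat": "deepseek-v4-pro",
    "vision": "minimax-2.7",
}


def resolve_model(alias: str) -> str:
    """Translate an internal alias into the provider's model ID."""
    try:
        return MODEL_ALIASES[alias]
    except KeyError:
        raise ValueError(f"unknown model alias: {alias!r}") from None


print(resolve_model("reasoning"))
```

If we ever switch providers, only the right-hand side of the table has to change.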
My current setup
A simple YAML config that defines routing rules:
```yaml
routes:
  - match: context_length > 80000
    model: kimi-2.6
  - match: task_type == "reasoning"
    model: qwen3-235b
  - default: deepseek-v4-pro
```

Then my app just calls NovaStack with whatever model the router picks.
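In Python terms, those rules amount to roughly the following. This is a minimal sketch that hard-codes the three rules instead of parsing the YAML; `pick_model` is a hypothetical name, and first-match-wins ordering is an assumption.

```python
from typing import Optional

LONG_CONTEXT_THRESHOLD = 80_000  # tokens, taken from the first rule


def pick_model(context_length: int, task_type: Optional[str] = None) -> str:
    """Apply the routing rules top to bottom; the first match wins."""
    if context_length > LONG_CONTEXT_THRESHOLD:
        return "kimi-2.6"
    if task_type == "reasoning":
        return "qwen3-235b"
    return "deepseek-v4-pro"  # default route


print(pick_model(120_000, "reasoning"))  # kimi-2.6 (long context is checked first)
print(pick_model(2_000, "reasoning"))    # qwen3-235b
print(pick_model(2_000))                 # deepseek-v4-pro
```

With this ordering, a long reasoning prompt goes to the long-context model rather than the reasoning model. Swap the first two checks if the opposite priority fits better.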
Questions for the community
How many different models are you actively using in production? Are you managing multiple keys or using a gateway?
What's your strategy for cost optimization? Do you manually pick models or use dynamic routing?
Has anyone tried building their own router vs using a hosted solution? Curious about the tradeoffs.
I'm still early in this journey. Would love to hear what's working for others.
Happy to share my routing config and cost tracking script if there's interest.