Every AI app has the same problem: you hardcode model: "gpt-4o" and pay frontier prices for "what's the weather?" questions.
I built Styx to fix this. It's an open-source AI gateway where you send "model": "styx:auto" and it picks the right model automatically.
How it works
When your app sends a request to Styx with model: "styx:auto", a 9-signal classifier scores the prompt in real time:
The 9 signals:
- Token count — Short vs long prompts
- Code presence — Code blocks, function/class/def keywords
- Reasoning patterns — "step by step", "analyze", "compare"
- Math markers — "prove", "equation", "calculate"
- Technical depth — "refactor", "architecture", "optimize"
- Creative scope — "write a story", "design a system"
- Conversation depth — Multi-turn conversations
- Multimodal hints — References to images, documents
- Language detection — Non-English content
Score 0-29 → cheap model (gpt-4o-mini, $0.15/1M)
Score 30-59 → balanced model (gpt-4o, $2.50/1M)
Score 60+ → frontier model (gpt-5.4)
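The signals and thresholds above can be sketched as a scoring function. This is a hypothetical illustration, not Styx's actual classifier: the keyword lists come from the post, but the per-signal weights and the word-count proxy for token count are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// score is a toy version of the 9-signal classifier. Only three of the
// nine signals are shown; the weights (10/15 points) are invented.
func score(prompt string) int {
	s := 0
	lower := strings.ToLower(prompt)

	// Token count: long prompts suggest harder tasks (word count as a proxy).
	if len(strings.Fields(prompt)) > 200 {
		s += 15
	}
	// Code presence: code fences or function/class/def keywords.
	for _, kw := range []string{"func ", "class ", "def ", "```"} {
		if strings.Contains(prompt, kw) {
			s += 10
			break
		}
	}
	// Reasoning patterns.
	for _, kw := range []string{"step by step", "analyze", "compare"} {
		if strings.Contains(lower, kw) {
			s += 15
			break
		}
	}
	// ...the remaining signals (math markers, technical depth, creative
	// scope, conversation depth, multimodal hints, language) score similarly.
	return s
}

// tier maps a score onto the bands from the post.
func tier(score int) string {
	switch {
	case score < 30:
		return "light" // cheap model
	case score < 60:
		return "balanced"
	default:
		return "frontier"
	}
}

func main() {
	fmt.Println(tier(score("what's the weather?"))) // → light
	fmt.Println(tier(65))                           // → frontier
}
```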
The whole thing runs in Go, adds <1ms latency, and the response includes headers telling you exactly what happened:
X-Styx-Auto-Tier: light
X-Styx-Auto-Score: 8
X-Styx-Auto-Model: gpt-4o-mini
Quick start
git clone https://github.com/timmx7/styx && cd styx
./setup.sh # interactive wizard, no .env editing
docker compose up -d
Then:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"styx:auto","messages":[{"role":"user","content":"Hello"}]}'
# → Routes to gpt-4o-mini (cheap, fast)
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"styx:auto","messages":[{"role":"user","content":"Refactor this codebase to use async/await and add comprehensive error handling step by step"}]}'
# → Routes to gpt-5.4 (frontier)
What else it does
- 65+ models from OpenAI, Anthropic, Google, Mistral
- Auto-failover: OpenAI down? Routes to Anthropic automatically
- Dashboard: track every request, cost, latency
- BYOK: your keys, your data, self-hosted
- MCP-native: connect Claude Code or Cursor in one command
- Prices auto-refresh daily from OpenRouter's public API
The real savings
If 80% of your requests are simple (and they usually are), routing them to gpt-4o-mini instead of gpt-4o cuts their cost by ~94% ($0.15 vs $2.50 per 1M tokens). Only the 20% complex requests go to frontier. For a SaaS doing 100k requests/month, that's thousands of dollars saved.
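The arithmetic checks out with the prices quoted above. A quick back-of-envelope calculation, using the 80/20 split from the post:

```go
package main

import "fmt"

// Prices quoted above, in $ per 1M tokens.
const mini, frontier = 0.15, 2.50

// perRequestSaving is the fraction saved on each simple request that is
// downgraded from gpt-4o to gpt-4o-mini.
func perRequestSaving() float64 { return 1 - mini/frontier }

// blendedBill is the routed bill as a fraction of an all-gpt-4o bill,
// assuming 80% of traffic goes to the cheap tier.
func blendedBill() float64 { return 0.8*mini/frontier + 0.2 }

func main() {
	fmt.Printf("per-request saving: %.0f%%\n", perRequestSaving()*100)       // 94%
	fmt.Printf("blended bill: %.0f%% of all-frontier\n", blendedBill()*100) // 25%
}
```

So even before counting the frontier tier's own pricing, the blended bill drops to roughly a quarter of what sending everything to gpt-4o would cost.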
GitHub: github.com/timmx7/styx
Would love feedback on the classifier design — especially edge cases you'd want handled differently.