DEV Community

Styx 7

I built an AI gateway that picks the right model for every request

Every AI app has the same problem: you hardcode model: "gpt-4o" and pay frontier prices for "what's the weather?" questions.
I built Styx to fix this. It's an open-source AI gateway where you send "model": "styx:auto" and it picks the right model automatically.

How it works

When your app sends a request to Styx with model: "styx:auto", a 9-signal classifier scores the prompt in real time:
The 9 signals:

  • Token count — Short vs long prompts
  • Code presence — Code blocks, function/class/def keywords
  • Reasoning patterns — "step by step", "analyze", "compare"
  • Math markers — "prove", "equation", "calculate"
  • Technical depth — "refactor", "architecture", "optimize"
  • Creative scope — "write a story", "design a system"
  • Conversation depth — Multi-turn conversations
  • Multimodal hints — References to images, documents
  • Language detection — Non-English content

Score 0-29 → cheap model (gpt-4o-mini, $0.15/1M)
Score 30-59 → balanced model (gpt-4o, $2.50/1M)
Score 60+ → frontier model (gpt-5.4, $2.50/1M)

The whole thing runs in Go, adds <1ms latency, and the response includes headers telling you exactly what happened:

X-Styx-Auto-Tier: light
X-Styx-Auto-Score: 8
X-Styx-Auto-Model: gpt-4o-mini

Quick start

git clone https://github.com/timmx7/styx && cd styx
./setup.sh          # interactive wizard, no .env editing
docker compose up -d

Then:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"styx:auto","messages":[{"role":"user","content":"Hello"}]}'
# → Routes to gpt-4o-mini (cheap, fast)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"styx:auto","messages":[{"role":"user","content":"Refactor this codebase to use async/await and add comprehensive error handling step by step"}]}'
# → Routes to gpt-5.4 (frontier)

What else it does

  • 65+ models from OpenAI, Anthropic, Google, Mistral
  • Auto-failover: OpenAI down? Routes to Anthropic automatically
  • Dashboard: track every request, cost, latency
  • BYOK: your keys, your data, self-hosted
  • MCP-native: connect Claude Code or Cursor in one command
  • Prices auto-refresh daily from OpenRouter's public API

The real savings

If 80% of your requests are simple (and they usually are), routing them to cheap models cuts their cost by 90%+. Only the complex 20% go to frontier models. For a SaaS doing 100k requests/month, that adds up to thousands of dollars saved.

GitHub: github.com/timmx7/styx

Would love feedback on the classifier design — especially edge cases you'd want handled differently.
