DEV Community

Styx 7

I built an AI gateway that picks the right model for every request

Every AI app has the same problem: you hardcode model: "gpt-4o" and pay frontier prices for "what's the weather?" questions.
I built Styx to fix this. It's an open-source AI gateway where you send "model": "styx:auto" and it picks the right model automatically.

How it works

When your app sends a request to Styx with model: "styx:auto", a 9-signal classifier scores the prompt in real time:
The 9 signals:

  • Token count — Short vs long prompts
  • Code presence — Code blocks, function/class/def keywords
  • Reasoning patterns — "step by step", "analyze", "compare"
  • Math markers — "prove", "equation", "calculate"
  • Technical depth — "refactor", "architecture", "optimize"
  • Creative scope — "write a story", "design a system"
  • Conversation depth — Multi-turn conversations
  • Multimodal hints — References to images, documents
  • Language detection — Non-English content

Score 0-29 → cheap model (gpt-4o-mini, $0.15/1M)
Score 30-59 → balanced model (gpt-4o, $2.50/1M)
Score 60+ → frontier model (gpt-5.4, $2.50/1M)

The whole thing runs in Go, adds <1ms latency, and the response includes headers telling you exactly what happened:

X-Styx-Auto-Tier: light
X-Styx-Auto-Score: 8
X-Styx-Auto-Model: gpt-4o-mini

Quick start

git clone https://github.com/timmx7/styx && cd styx
./setup.sh          # interactive wizard, no .env editing
docker compose up -d

Then:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"styx:auto","messages":[{"role":"user","content":"Hello"}]}'
# → Routes to gpt-4o-mini (cheap, fast)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"styx:auto","messages":[{"role":"user","content":"Refactor this codebase to use async/await and add comprehensive error handling step by step"}]}'
# → Routes to gpt-5.4 (frontier)

What else it does

  • 65+ models from OpenAI, Anthropic, Google, Mistral
  • Auto-failover: OpenAI down? Routes to Anthropic automatically
  • Dashboard: track every request, cost, latency
  • BYOK: your keys, your data, self-hosted
  • MCP-native: connect Claude Code or Cursor in one command
  • Prices auto-refresh daily from OpenRouter's public API

The real savings

If 80% of your requests are simple (and they usually are), routing them to cheap models cuts their cost by 90%+. Only the complex 20% go to frontier models. For a SaaS doing 100k requests/month, that adds up to thousands of dollars saved.

GitHub: github.com/timmx7/styx

Would love feedback on the classifier design — especially edge cases you'd want handled differently.
