DEV Community

sophiaashi

I Let a Router Pick My LLM for Every Task. Here Is What Went Wrong (and Right)

After manually switching between 4 different LLMs for a month, I automated the selection. A routing gateway now classifies each task and picks the model. Here is an honest report after 3 weeks.

What Went Right

Cost dropped 40%. The router is more disciplined than I am. I default to Sonnet out of habit; the router sends 60% of tasks to DeepSeek where they belong.
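To sanity-check that number: if 60% of tasks move to a model that costs roughly a third as much per task, the blend works out to 40% savings. A minimal back-of-envelope sketch; the per-task prices below are illustrative assumptions, not my actual bills.

```python
# Back-of-envelope for the savings claim. All prices are illustrative
# assumptions, not real provider rates.
SONNET_COST_PER_TASK = 0.05       # assumed average $/task on the big model
CHEAP_COST_PER_TASK = 0.05 / 3    # assumed average $/task after routing down

def monthly_cost(tasks: int, cheap_fraction: float) -> float:
    """Blended cost when `cheap_fraction` of tasks go to the cheaper model."""
    cheap = tasks * cheap_fraction * CHEAP_COST_PER_TASK
    big = tasks * (1 - cheap_fraction) * SONNET_COST_PER_TASK
    return cheap + big

baseline = monthly_cost(1500, 0.0)   # everything on Sonnet (my old habit)
routed = monthly_cost(1500, 0.6)     # router sends 60% to the cheap model
savings = 1 - routed / baseline
print(f"savings: {savings:.0%}")     # → savings: 40%
```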

Rate limits disappeared. Requests spread across 4 providers. No single one sees enough traffic to throttle me.

Mental load gone. I stopped spending 2 seconds per prompt thinking "is this worth Sonnet or should I use DeepSeek?" Multiply by 50+ tasks per day and that adds up.

What Went Wrong

5% misroutes. About 1 in 20 tasks gets sent to a cheaper model when I would have preferred Sonnet. The misroutes cluster around edge cases the classifier does not handle well, like a "simple" refactor that actually touches 6 files.

Context does not carry across providers. If DeepSeek handles task A and then Sonnet handles task B, Sonnet does not see what DeepSeek did. I work around this by starting fresh sessions per task, but it is a real limitation.
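My workaround boils down to a shared notes file that gets prepended to every fresh session, so whichever provider picks up the next task sees a summary of what came before. A rough sketch, where `ask_model()` stands in for whatever call your gateway exposes; it is an assumption, not a real API.

```python
# Workaround sketch for cross-provider amnesia: keep a plain-text task
# log that is prepended to every new session, regardless of which model
# handles it. `ask_model` is a hypothetical gateway call.
from pathlib import Path

NOTES = Path("session_notes.md")

def run_task(prompt: str, ask_model) -> str:
    history = NOTES.read_text() if NOTES.exists() else ""
    answer = ask_model(f"Prior tasks:\n{history}\n\nCurrent task:\n{prompt}")
    # Append a one-line summary so the next model (whichever provider)
    # sees what happened, without dragging full transcripts around.
    with NOTES.open("a") as f:
        f.write(f"- {prompt[:60]} -> done\n")
    return answer
```

Summaries stay short on purpose: the point is continuity, not replaying full context into every provider.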

Debugging the router itself is hard. When output quality drops, is it the model or the routing? It took me a while to work out a debugging workflow that separates the two.
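The workflow I ended up with amounts to logging every routing decision, so that quality complaints can be grouped by model instead of guessed at. A minimal sketch; `route()`, the log format, and the model names are my placeholders, not TeamoRouter's actual API.

```python
import json
import time

# Hypothetical sketch of the debugging workflow: log every routing
# decision so a quality drop can be traced to a model, not a hunch.
LOG_PATH = "routing_log.jsonl"

def logged_route(task: str, route) -> str:
    """Wrap the gateway's route() call and record where each task went."""
    decision = route(task)               # e.g. {"model": "deepseek", ...}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps({"ts": time.time(),
                            "task": task[:80],
                            "model": decision["model"]}) + "\n")
    return decision["model"]

def misroute_rate(log_lines, flagged_tasks):
    """Fraction of flagged (bad-output) tasks that went to a cheap model."""
    entries = [json.loads(line) for line in log_lines]
    flagged = [e for e in entries if e["task"] in flagged_tasks]
    if not flagged:
        return 0.0
    return sum(e["model"] != "sonnet" for e in flagged) / len(flagged)
```

If the flagged tasks skew heavily toward one model, it is a routing problem; if they spread evenly, the prompt or task is the problem.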

The Setup

I use TeamoRouter — one API key for Claude, GPT-4o, Gemini, DeepSeek, Kimi, MiniMax. The teamo-balanced mode handles the auto-selection. Installs in OpenClaw in 2 seconds.

Routing modes:

  • teamo-best: always quality-first
  • teamo-balanced: auto-select (my default)
  • teamo-eco: always cheapest
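To make the three modes concrete, here is a toy dispatcher with the same shape. The model names and the `classify()` heuristic are illustrative assumptions, not TeamoRouter's internals; a keyword heuristic like this is exactly the kind of thing that misroutes 1 in 20 tasks.

```python
# Toy dispatcher illustrating the three routing modes. Model names and
# the classify() heuristic are assumptions for illustration only.
MODELS = {"quality": "claude-sonnet", "cheap": "deepseek"}

def classify(task: str) -> str:
    """Crude complexity heuristic, the kind that mislabels edge cases."""
    hard_markers = ("architecture", "refactor across", "debug", "design")
    return "quality" if any(m in task.lower() for m in hard_markers) else "cheap"

def pick_model(task: str, mode: str = "teamo-balanced") -> str:
    if mode == "teamo-best":
        return MODELS["quality"]      # always quality-first
    if mode == "teamo-eco":
        return MODELS["cheap"]        # always cheapest
    return MODELS[classify(task)]     # teamo-balanced: auto-select
```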

Would I Go Back to Manual Selection?

No. The 5% misroute rate is annoying but the 40% cost savings and zero rate limits make it worth it. I just override to teamo-best when I know the task is complex.


We have a Discord where we share routing configs and debug routing issues together.
