Not every coding task needs Opus. Bernstein's contextual bandit router learns which model handles each task type best, then routes accordingly. In our own runs, the bandit router cut spend roughly in half compared to uniform model selection. Measure yours with `bernstein cost`.
## The uniform selection problem
Most multi-agent setups use the same model for everything. Every task — whether it's renaming a variable or designing an authentication system — gets routed to the same model at the same effort level. This is wasteful. A docs task that writes a docstring doesn't need the same model as a security task that implements credential scoping.
The cost difference is real. At current API pricing, routing a simple task to Haiku instead of Opus costs roughly 1/30th as much. Over a session with 40-60 tasks, that adds up fast.
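To make the arithmetic concrete, here's a back-of-the-envelope estimate. The per-task costs below are illustrative assumptions (in cents, preserving the ~30x gap), not actual API prices:

```python
# Hypothetical per-task cost in cents -- illustrative only, not real API rates.
# The ~30x Haiku/Opus gap mirrors the ratio described above.
COST_PER_TASK_CENTS = {"haiku": 1, "opus": 30}

def session_cost_cents(task_counts: dict[str, int]) -> int:
    """Total session cost given how many tasks were routed to each model."""
    return sum(COST_PER_TASK_CENTS[model] * n for model, n in task_counts.items())

# 50 tasks, uniform Opus vs. 40 simple tasks downgraded to Haiku
uniform = session_cost_cents({"opus": 50})            # 1500 cents = $15.00
routed = session_cost_cents({"haiku": 40, "opus": 10})  # 340 cents = $3.40
```

Even with only the simplest tasks downgraded, the routed session costs a fraction of the uniform one.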
## How the router works
Bernstein's routing pipeline has three layers:
**Layer 1: Heuristic classification.** Every task has a `complexity` field (low, medium, high) and a `role` (backend, frontend, qa, security, etc.). The router uses a rule-based classifier to make an initial model/effort assignment. Low-complexity tasks default to Haiku or Sonnet with standard effort. High-complexity tasks get Opus with max effort.
**Layer 2: Epsilon-greedy bandit.** This is where it gets interesting. The bandit maintains per-role reward estimates for each model. When a task arrives, it exploits the best-known model 80% of the time and explores alternatives 20% of the time. Rewards come from task outcomes: did the agent complete the task? Did tests pass? How many retries were needed?
```python
# Simplified selection logic
candidates = ["sonnet", "opus"] if task.complexity == "high" else CASCADE
selected = bandit.select(role=task.role, candidate_models=candidates)
```
The `CASCADE` list includes all available models from cheapest to most capable. For high-complexity tasks, the bandit only considers Sonnet and Opus — sending a hard architecture task to Haiku would waste the agent's time even if it's cheap.
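The epsilon-greedy selection above can be sketched as a small self-contained class. This is an illustrative sketch of the technique, not Bernstein's actual implementation; the class and method names are hypothetical:

```python
import random

class EpsilonGreedyRouter:
    """Minimal epsilon-greedy bandit keyed on (role, model) pairs."""

    def __init__(self, epsilon: float = 0.2):
        self.epsilon = epsilon
        # Running mean reward and observation count per (role, model) arm
        self.estimates: dict[tuple[str, str], float] = {}
        self.counts: dict[tuple[str, str], int] = {}

    def select(self, role: str, candidate_models: list[str]) -> str:
        """Explore a random candidate with probability epsilon; else exploit."""
        if random.random() < self.epsilon:
            return random.choice(candidate_models)
        return max(candidate_models,
                   key=lambda m: self.estimates.get((role, m), 0.0))

    def update(self, role: str, model: str, reward: float) -> None:
        """Incrementally update the running mean after a task outcome."""
        key = (role, model)
        n = self.counts.get(key, 0) + 1
        mean = self.estimates.get(key, 0.0)
        self.counts[key] = n
        self.estimates[key] = mean + (reward - mean) / n
```

The incremental-mean update avoids storing per-task history: each arm carries only a count and a running average.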
**Layer 3: Effectiveness seeding.** The bandit warms up using historical effectiveness data from the `.sdd/metrics/` directory. If a previous run showed that backend tasks succeed 95% of the time with Sonnet but only 70% with Haiku, the bandit starts with that prior. No cold-start problem after the first session.
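Seeding from historical data might look like the sketch below. The file name and JSON schema here are hypothetical illustrations, not Bernstein's actual metrics format:

```python
import json
from pathlib import Path

def seed_priors(metrics_dir: str = ".sdd/metrics"):
    """Return (estimates, counts) dicts seeded from historical success rates.

    Assumes a hypothetical effectiveness.json shaped like:
    {"backend": {"sonnet": 0.95, "haiku": 0.70}, ...}
    """
    estimates: dict[tuple[str, str], float] = {}
    counts: dict[tuple[str, str], int] = {}
    path = Path(metrics_dir) / "effectiveness.json"
    if not path.exists():
        return estimates, counts  # genuine cold start: no prior data
    history = json.loads(path.read_text())
    for role, models in history.items():
        for model, success_rate in models.items():
            # Treat the historical rate as if a handful of samples backed it,
            # so early live observations can still shift the estimate.
            estimates[(role, model)] = success_rate
            counts[(role, model)] = 5
    return estimates, counts
```

Giving each seeded arm a modest pseudo-count keeps the prior informative without making it immovable.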
## What the router learns
After a few sessions, clear patterns emerge:
| Task type | Typical model | Why |
|---|---|---|
| Docs, docstrings | Haiku | Templated output, low reasoning |
| Test writing | Sonnet | Needs code understanding, not creativity |
| Bug fixes | Sonnet | Pattern matching on error traces |
| Refactoring | Sonnet/Opus | Depends on scope |
| Architecture, security | Opus | Requires deep reasoning |
These aren't hardcoded rules — they're learned from outcomes. If your codebase has unusually complex tests, the bandit will learn to route test tasks to a stronger model.
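For learning to work, the outcomes mentioned earlier (completion, test results, retries) have to collapse into a scalar reward. The weighting below is an illustrative assumption, not Bernstein's actual formula:

```python
def task_reward(completed: bool, tests_passed: bool, retries: int) -> float:
    """Collapse a task outcome into a reward in [0, 1].

    Weights are illustrative assumptions: completion earns most of the
    reward, passing tests tops it up, and each retry costs a penalty
    (the agent burned extra tokens getting there).
    """
    if not completed:
        return 0.0
    reward = 0.6 + (0.4 if tests_passed else 0.0)
    return max(0.0, reward - 0.1 * retries)
```

A signal like this is what lets the bandit notice that, say, Haiku completes docs tasks on the first try but needs multiple retries on test-writing tasks.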
## Configuration
The bandit is enabled by default when a metrics directory exists. You can tune exploration rate and model cascade in your config:
```yaml
# .sdd/config.yaml
routing:
  bandit_epsilon: 0.2      # 20% exploration
  cascade: [haiku, sonnet, opus]
  min_samples_per_arm: 5   # explore each option at least 5 times
```
To disable bandit routing and use pure heuristics:
```yaml
routing:
  bandit_enabled: false
```
## The numbers
Across our internal runs (self-development sessions where Bernstein improves its own codebase), the bandit router cut per-session spend roughly in half compared to the baseline of Sonnet-for-everything. Task completion rates stayed within a couple of percentage points, so cheaper models handle their assigned tasks fine. Measure your own runs with `bernstein cost`.
The savings compound. A 10-agent session running 50 tasks might cost $15-20 with uniform Sonnet. With bandit routing, the same session runs $7-10. Over weeks of iterative development, that's the difference between a side project budget and a real expense.
## Further reading
- Architecture overview for how routing fits into the orchestration pipeline
- Getting started to try it yourself
- Source code for the full router implementation