Most LLM routing is still traffic control. Here's how to build routers that reason, search, and learn—without touching model weights.
Most LLM routing code you'll find today is traffic control: pick a model, retry on failure, track cost. But the systems behind the router have changed—they reason, act, make mistakes, and adapt. If you're still routing like it's 2022, you're governing an agent with proxy logic.
This post walks through code-shaped patterns for what I call second-half routing: treating the router as a reasoning component that searches, evaluates, and learns—all without touching model weights.
First-Half vs. Second-Half Routing
Most "LLM routing" examples today boil down to:
```typescript
// classic first-half routing
const model = pickModelBasedOn({
  providerHealth,
  latency,
  costTier,
  tenantPolicy,
});

const result = await callModel(model, prompt);
```
That's first-half routing: traffic control.
Second-half routing treats routing as decision-time compute and collective intelligence orchestration. The router reasons, acts, searches, and learns—without touching model weights.
We'll cover:
- Semantic routing (think-then-decide)
- Strategy trees (search + backtracking)
- Reflexive routing (feedback → memory)
- Cross-LLM collaboration (collective intelligence)
All in a way you can actually implement.
1. Semantic Routing: Think-Then-Decide
Instead of routing purely on system state, second-half routing uses semantic signals:
- Task type (chat, code, retrieval, tools, etc.)
- Domain (legal, medical, casual, internal)
- Difficulty / reasoning depth
- Risk level
- User intent
- Adversarial indicators
You typically introduce a cheap classifier step before picking a path.
1.1 Basic Shape
```typescript
type TaskAnalysis = {
  intent: "chat" | "code" | "summarize" | "search" | "tooling" | "unknown";
  domain: "general" | "legal" | "medical" | "financial" | "internal";
  difficulty: "low" | "medium" | "high";
  risk: "low" | "medium" | "high";
  needsRetrieval: boolean;
  needsTools: boolean;
};

async function analyzeTask(prompt: string): Promise<TaskAnalysis> {
  const analysis = await callSLM("router-analyzer", {
    system: "Classify the task and estimate difficulty, risk, and needs.",
    user: prompt,
  });
  return parseAnalysis(analysis);
}
```
Then use this analysis to choose a strategy, not just a model:
```typescript
type Strategy =
  | "SLM_DIRECT"
  | "MID_RAG"
  | "LLM_REASONING"
  | "LLM_REASONING_WITH_VERIFIER"
  | "CLARIFY_THEN_DECIDE";

function pickStrategy(analysis: TaskAnalysis): Strategy {
  if (analysis.risk === "high") {
    return "LLM_REASONING_WITH_VERIFIER";
  }
  if (analysis.difficulty === "low" && !analysis.needsRetrieval) {
    return "SLM_DIRECT";
  }
  if (analysis.needsRetrieval) {
    return "MID_RAG";
  }
  if (analysis.difficulty === "high") {
    return "LLM_REASONING";
  }
  return "CLARIFY_THEN_DECIDE";
}
```
Key point: the router is now doing decision-time compute via an SLM, not just reading system metrics.
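A `Strategy` value only helps if something dispatches on it. Here's a minimal sketch of that dispatch; the handler bodies are placeholders, not real model calls, and an exhaustive `Record` keeps the table in sync with the union:

```typescript
// Hypothetical dispatch from strategy labels to execution paths. The handler
// bodies are stand-ins; a real router would call its model stack in each.
type Strategy =
  | "SLM_DIRECT"
  | "MID_RAG"
  | "LLM_REASONING"
  | "LLM_REASONING_WITH_VERIFIER"
  | "CLARIFY_THEN_DECIDE";

type Handler = (prompt: string) => Promise<string>;

// Record<Strategy, Handler> makes the compiler reject a missing branch
// whenever a new strategy is added to the union.
const handlers: Record<Strategy, Handler> = {
  SLM_DIRECT: async (p) => `slm:${p}`,
  MID_RAG: async (p) => `rag:${p}`,
  LLM_REASONING: async (p) => `llm:${p}`,
  LLM_REASONING_WITH_VERIFIER: async (p) => `llm+verify:${p}`,
  CLARIFY_THEN_DECIDE: async (p) => `clarify:${p}`,
};

async function executeStrategy(strategy: Strategy, prompt: string): Promise<string> {
  return handlers[strategy](prompt);
}
```

The exhaustive map is a small thing, but it means adding a strategy without wiring up its execution path becomes a compile error instead of a runtime surprise.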
2. Strategy Trees: Search + Backtracking
First-half routing:
```typescript
const model = pickModel(...);
return callModel(model, prompt);
```
Second-half routing:
- Expands multiple candidate paths
- Evaluates partial results
- Backtracks when needed
2.1 Representing a Strategy Tree
```typescript
type RouteNode = {
  id: string;
  description: string;
  execute: () => Promise<RouteResult>;
  children?: RouteNode[];
};

type RouteResult = {
  status: "success" | "uncertain" | "fail";
  answer?: string;
  cost: number;
  qualityEstimate?: number;
  trace: any; // logs, intermediate reasoning, tool calls, etc.
};
```
2.2 Example Strategy Tree
```typescript
function buildStrategyTree(prompt: string): RouteNode {
  return {
    id: "root",
    description: "Second-half routing root",
    execute: async () => ({
      status: "uncertain",
      cost: 0,
      trace: [],
    }),
    children: [
      {
        id: "slm-direct",
        description: "Cheap SLM direct answer",
        execute: () => slmDirectAnswer(prompt),
      },
      {
        id: "mid-rag",
        description: "Mid-tier model with retrieval",
        execute: () => midTierRAG(prompt),
      },
      {
        id: "llm-verify",
        description: "Strong reasoning + verifier",
        execute: () => llmWithVerifier(prompt),
      },
    ],
  };
}
```
2.3 Tree Search with Self-Evaluation
```typescript
const COST_BUDGET = 1.0; // illustrative per-request ceiling; tune per deployment

async function searchStrategyTree(root: RouteNode): Promise<RouteResult> {
  const queue: RouteNode[] = [...(root.children ?? [])];
  const evaluated: RouteResult[] = [];

  while (queue.length) {
    const node = queue.shift()!;
    const result = await node.execute();
    evaluated.push(result);

    if (isGoodEnough(result)) {
      return result;
    }

    // backtracking / expansion logic
    if (result.status === "uncertain" && node.children) {
      queue.push(...node.children);
    }
  }

  return pickBestUnderConstraints(evaluated);
}

function isGoodEnough(result: RouteResult): boolean {
  return (
    result.status === "success" &&
    (result.qualityEstimate ?? 0) > 0.8 &&
    result.cost < COST_BUDGET
  );
}
```
This is Tree of Thoughts applied to routing: explore, evaluate, backtrack.
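The search loop leans on `pickBestUnderConstraints` without defining it. One possible sketch, assuming at least one result was evaluated; `RouteResult` is trimmed to the fields the helper needs, and `COST_BUDGET` is an illustrative value:

```typescript
// A sketch of pickBestUnderConstraints: when no path met the quality bar,
// fall back to the best affordable attempt. RouteResult is trimmed here so
// the snippet is self-contained; COST_BUDGET is illustrative.
type RouteResult = {
  status: "success" | "uncertain" | "fail";
  cost: number;
  qualityEstimate?: number;
};

const COST_BUDGET = 1.0;

function pickBestUnderConstraints(evaluated: RouteResult[]): RouteResult {
  // Prefer non-failed results within budget; if nothing qualifies,
  // fall back to everything we evaluated rather than returning nothing.
  const affordable = evaluated.filter(
    (r) => r.status !== "fail" && r.cost <= COST_BUDGET,
  );
  const pool = affordable.length > 0 ? affordable : evaluated;

  // Highest estimated quality wins; missing estimates rank lowest.
  return pool.reduce((best, r) =>
    (r.qualityEstimate ?? 0) > (best.qualityEstimate ?? 0) ? r : best,
  );
}
```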
3. Reflexive Routing: Feedback → Memory → Policy
Reflexion at the routing layer means:
- Capture feedback & signals
- Write language-level reflections
- Feed them into future decisions
3.1 Capturing Feedback
```typescript
type Feedback = {
  routeId: string;
  success: boolean;
  userRating?: number;
  factualityScore?: number;
  escalationOccurred?: boolean;
  notes?: string;
};

async function recordFeedback(feedback: Feedback) {
  await db.insert("routing_feedback", feedback);
}
```
3.2 Generating a Reflection
Use an SLM or LLM to summarize patterns periodically:
```typescript
async function reflectOnRoutingHistory(routeId: string) {
  const history = await db.query("routing_feedback", { routeId });

  const reflection = await callLLM("router-reflector", {
    system: `
      You are a routing coach.
      Look at the failures and successes for this route.
      Propose adjustments to strategy selection or thresholds.
    `,
    user: JSON.stringify(history),
  });

  return parseReflection(reflection);
}
```
parseReflection might output:
```typescript
type Reflection = {
  routeId: string;
  suggestedChanges: {
    newThresholds?: any;
    avoidPatterns?: string[];
    preferPatterns?: string[];
  };
  naturalLanguageSummary: string;
};
```
3.3 Updating Routing Policy
```typescript
async function applyReflection(reflection: Reflection) {
  const policy = await getRoutingPolicy(reflection.routeId);

  const updatedPolicy = {
    ...policy,
    thresholds: {
      ...policy.thresholds,
      ...reflection.suggestedChanges.newThresholds,
    },
    avoidPatterns: [
      ...new Set([
        ...(policy.avoidPatterns ?? []),
        ...(reflection.suggestedChanges.avoidPatterns ?? []),
      ]),
    ],
  };

  await saveRoutingPolicy(reflection.routeId, updatedPolicy);
}
```
This is online learning at the routing layer—without retraining any model.
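To make the loop concrete: a periodic job walks each route, generates a reflection, and applies it. In this sketch the reflection and policy steps are stubbed so only the loop shape is shown; the real versions are the functions above:

```typescript
// Sketch of the feedback → reflection → policy loop run as a periodic pass.
// The two stubs stand in for the reflectOnRoutingHistory / applyReflection
// functions shown earlier, so the loop shape is self-contained here.
type Reflection = { routeId: string; naturalLanguageSummary: string };

const appliedRoutes: string[] = [];

async function reflectOnRoutingHistory(routeId: string): Promise<Reflection> {
  return { routeId, naturalLanguageSummary: "stub reflection" }; // real: LLM call
}

async function applyReflection(reflection: Reflection): Promise<void> {
  appliedRoutes.push(reflection.routeId); // real: persist updated policy
}

// Run this from a cron job or queue worker, not on the request path:
// reflection is batch work and should never add latency to user requests.
async function reflectionPass(routeIds: string[]): Promise<void> {
  for (const routeId of routeIds) {
    const reflection = await reflectOnRoutingHistory(routeId);
    await applyReflection(reflection);
  }
}
```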
4. Cross-LLM Collaboration: Collective Intelligence
Instead of "pick one best model," the router orchestrates multiple experts:
- Cheap classifier
- Reasoning model
- Retrieval model
- Verifier model
- Tool-executor
4.1 Defining Experts
```typescript
type ExpertResult = {
  role: string;
  output: string;
  confidence?: number;
};

async function slmClassifier(prompt: string): Promise<ExpertResult> {
  // classify task
}

async function reasoningLLM(prompt: string): Promise<ExpertResult> {
  // deep reasoning
}

async function retriever(prompt: string): Promise<ExpertResult> {
  // search + retrieve context
}

async function verifier(answer: string): Promise<ExpertResult> {
  // verify factuality / consistency
}
```
4.2 Orchestrating Collaboration
```typescript
async function orchestrateExperts(prompt: string) {
  const [analysis, retrieval] = await Promise.all([
    slmClassifier(prompt),
    retriever(prompt),
  ]);

  const reasoning = await reasoningLLM(
    buildReasoningPrompt(prompt, retrieval.output, analysis.output),
  );

  const verification = await verifier(reasoning.output);

  return aggregateOutputs({
    analysis,
    retrieval,
    reasoning,
    verification,
  });
}
```
4.3 Aggregation Logic
```typescript
type ExpertBundle = {
  analysis: ExpertResult;
  retrieval: ExpertResult;
  reasoning: ExpertResult;
  verification: ExpertResult;
};

function aggregateOutputs(bundle: ExpertBundle): RouteResult {
  const quality = estimateQuality(bundle);
  const cost = estimateCost(bundle);

  // Treat a missing confidence as "no objection", but avoid a bare
  // truthiness check so a confidence of 0 still counts as uncertain.
  const status =
    (bundle.verification.confidence ?? 1) < 0.5 ? "uncertain" : "success";

  return {
    status,
    answer: bundle.reasoning.output,
    qualityEstimate: quality,
    cost,
    trace: bundle,
  };
}
```
The router is explicitly acting as coordination logic for a multi-expert system.
5. Putting It Together: A Second-Half Router Skeleton
Here's how the pieces compose:
```typescript
export async function routeLLMRequest(prompt: string): Promise<RouteResult> {
  // 1) Semantic analysis
  const analysis = await analyzeTask(prompt);
  const strategy = pickStrategy(analysis);

  // 2) Build strategy tree
  const root = buildStrategyTreeForStrategy(prompt, strategy);

  // 3) Search tree (search + backtracking)
  const result = await searchStrategyTree(root);

  // 4) Collect feedback signals
  const feedback: Feedback = {
    routeId: strategy,
    success: result.status === "success",
    userRating: undefined, // fill from external feedback later
    factualityScore: result.qualityEstimate,
    notes: "",
  };
  await recordFeedback(feedback);

  return result;
}
```
buildStrategyTreeForStrategy can embed specific orchestration logic (e.g., cross-LLM, verification, retrieval).
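One hypothetical shape for it: each strategy maps to an ordered candidate list, cheapest path first, so the tree search escalates only when needed. The executors below are placeholders for real model-calling functions, and the types are repeated so the snippet is self-contained:

```typescript
// Sketch of buildStrategyTreeForStrategy. High-risk strategies skip straight
// to the verified path; everything else gets a cheap-first candidate list.
type RouteResult = {
  status: "success" | "uncertain" | "fail";
  cost: number;
  trace: unknown;
};

type RouteNode = {
  id: string;
  description: string;
  execute: () => Promise<RouteResult>;
  children?: RouteNode[];
};

function buildStrategyTreeForStrategy(prompt: string, strategy: string): RouteNode {
  // Placeholder leaf factory; swap the executor for a real model call.
  const leaf = (id: string, description: string): RouteNode => ({
    id,
    description,
    execute: async () => ({ status: "uncertain", cost: 0, trace: [id, prompt] }),
  });

  const children =
    strategy === "LLM_REASONING_WITH_VERIFIER"
      ? [leaf("llm-verify", "Strong reasoning + verifier")]
      : [
          leaf("slm-direct", "Cheap SLM direct answer"),
          leaf("mid-rag", "Mid-tier model with retrieval"),
          leaf("llm-verify", "Strong reasoning + verifier"),
        ];

  return {
    id: "root",
    description: `Strategy tree for ${strategy}`,
    execute: async () => ({ status: "uncertain", cost: 0, trace: [] }),
    children,
  };
}
```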
6. Design Principles
Routing is now a first-class agent.
Treat the router as a reasoning component, not a pure proxy.
Keep decision-time compute bounded.
You are trading better behavior against latency and cost. Make the budgets explicit.
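Making the budget explicit can be as simple as a guard the tree search consults before expanding another path. A sketch with assumed field names:

```typescript
// Illustrative explicit budget, checked before each expansion step in the
// tree search. Field names are assumptions, not a standard schema.
type RoutingBudget = {
  maxCostUsd: number;
  maxLatencyMs: number;
  maxPathsExplored: number;
};

type Spend = {
  costUsd: number;
  elapsedMs: number;
  pathsExplored: number;
};

// Returns true only while every dimension is strictly under its ceiling.
function withinBudget(budget: RoutingBudget, spend: Spend): boolean {
  return (
    spend.costUsd < budget.maxCostUsd &&
    spend.elapsedMs < budget.maxLatencyMs &&
    spend.pathsExplored < budget.maxPathsExplored
  );
}
```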
Separate policy from mechanism.
Store routing policies in a config/policy store; keep orchestration logic flexible.
Make evaluation multi-dimensional.
Log not just latency/cost, but task success, user feedback, escalation rate, and drift.
Log traces as first-class artifacts.
You need rich traces (strategy, paths explored, expert outputs, verifications) for debugging and reflexion.
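What such a trace might carry, sketched as a record type; the field names are illustrative, not a standard schema:

```typescript
// One illustrative trace record per routed request. Adapt the fields to
// your own logging pipeline; nothing here is prescribed.
type RoutingTrace = {
  requestId: string;
  strategy: string;
  pathsExplored: string[]; // node ids visited by the tree search
  expertOutputs: Record<string, string>;
  verification?: { passed: boolean; confidence?: number };
  metrics: { latencyMs: number; costUsd: number; success: boolean };
};

// Start an empty trace at request time; fill it in as the search runs.
function newTrace(requestId: string, strategy: string): RoutingTrace {
  return {
    requestId,
    strategy,
    pathsExplored: [],
    expertOutputs: {},
    metrics: { latencyMs: 0, costUsd: 0, success: false },
  };
}
```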
7. Where to Go From Here
If you're building any of the following:
- Multi-model "Auto" experiences
- Agent frameworks that call multiple tools/models
- Cost-optimized inference stacks
- Safety-critical LLM apps
…it's time to stop thinking of routing as traffic control and start treating it as an intelligence control plane.
For what this means outside infrastructure—including a family-level application of these patterns—see my Substack post: From One Big Brain to the Family Brain.
*Next up: RouterEval: An Evaluation Harness for LLM Routing Policies*