DEV Community

Narnaiezzsshaa Truong


Second-Half Routing: From Traffic Control to Collective Intelligence

Most LLM routing is still traffic control. Here's how to build routers that reason, search, and learn—without touching model weights.


Most LLM routing code you'll find today is traffic control: pick a model, retry on failure, track cost. But the systems behind the router have changed—they reason, act, make mistakes, and adapt. If you're still routing like it's 2022, you're governing an agent with proxy logic.

This post walks through code-shaped patterns for what I call second-half routing: treating the router as a reasoning component that searches, evaluates, and learns—all without touching model weights.


First-Half vs. Second-Half Routing

Most "LLM routing" examples today boil down to:

// classic first-half routing
const model = pickModelBasedOn({
  providerHealth,
  latency,
  costTier,
  tenantPolicy,
});

const result = await callModel(model, prompt);

That's first-half routing: traffic control.

Second-half routing treats routing as decision-time compute and collective intelligence orchestration. The router reasons, acts, searches, and learns—without touching model weights.

We'll cover:

  1. Semantic routing (think-then-decide)
  2. Strategy trees (search + backtracking)
  3. Reflexive routing (feedback → memory)
  4. Cross-LLM collaboration (collective intelligence)

All in a way you can actually implement.


1. Semantic Routing: Think-Then-Decide

Instead of routing purely on system state, second-half routing uses semantic signals:

  • Task type (chat, code, retrieval, tools, etc.)
  • Domain (legal, medical, casual, internal)
  • Difficulty / reasoning depth
  • Risk level
  • User intent
  • Adversarial indicators

You typically introduce a cheap classifier step before picking a path.

1.1 Basic Shape

type TaskAnalysis = {
  intent: "chat" | "code" | "summarize" | "search" | "tooling" | "unknown";
  domain: "general" | "legal" | "medical" | "financial" | "internal";
  difficulty: "low" | "medium" | "high";
  risk: "low" | "medium" | "high";
  needsRetrieval: boolean;
  needsTools: boolean;
};
async function analyzeTask(prompt: string): Promise<TaskAnalysis> {
  const analysis = await callSLM("router-analyzer", {
    system: "Classify the task and estimate difficulty, risk, and needs.",
    user: prompt,
  });

  return parseAnalysis(analysis);
}

Then use this analysis to choose a strategy, not just a model:

type Strategy =
  | "SLM_DIRECT"
  | "MID_RAG"
  | "LLM_REASONING"
  | "LLM_REASONING_WITH_VERIFIER"
  | "CLARIFY_THEN_DECIDE";

function pickStrategy(analysis: TaskAnalysis): Strategy {
  if (analysis.risk === "high") {
    return "LLM_REASONING_WITH_VERIFIER";
  }

  if (analysis.difficulty === "low" && !analysis.needsRetrieval) {
    return "SLM_DIRECT";
  }

  if (analysis.needsRetrieval) {
    return "MID_RAG";
  }

  if (analysis.difficulty === "high") {
    return "LLM_REASONING";
  }

  return "CLARIFY_THEN_DECIDE";
}

Key point: the router is now doing decision-time compute via an SLM, not just reading system metrics.
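The `parseAnalysis` helper above is left undefined. One hedged sketch — assuming the SLM is prompted to reply with JSON — is a tolerant parser that falls back to conservative defaults, so a malformed classification degrades toward the stronger, safer strategies rather than crashing the router:

```typescript
// Sketch of a tolerant parseAnalysis, assuming the SLM replies with JSON.
// Any missing or invalid field falls back to a conservative default.
type TaskAnalysis = {
  intent: "chat" | "code" | "summarize" | "search" | "tooling" | "unknown";
  domain: "general" | "legal" | "medical" | "financial" | "internal";
  difficulty: "low" | "medium" | "high";
  risk: "low" | "medium" | "high";
  needsRetrieval: boolean;
  needsTools: boolean;
};

const INTENTS = ["chat", "code", "summarize", "search", "tooling", "unknown"] as const;
const DOMAINS = ["general", "legal", "medical", "financial", "internal"] as const;
const LEVELS = ["low", "medium", "high"] as const;

function oneOf<T extends string>(value: unknown, allowed: readonly T[], fallback: T): T {
  return typeof value === "string" && (allowed as readonly string[]).indexOf(value) >= 0
    ? (value as T)
    : fallback;
}

function parseAnalysis(raw: string): TaskAnalysis {
  let data: Record<string, unknown> = {};
  try {
    data = JSON.parse(raw);
  } catch {
    // Unparseable output: fall through to the conservative defaults below.
  }
  return {
    intent: oneOf(data.intent, INTENTS, "unknown"),
    domain: oneOf(data.domain, DOMAINS, "general"),
    // Defaulting difficulty/risk to "high" routes uncertain cases to stronger paths.
    difficulty: oneOf(data.difficulty, LEVELS, "high"),
    risk: oneOf(data.risk, LEVELS, "high"),
    needsRetrieval: data.needsRetrieval === true,
    needsTools: data.needsTools === true,
  };
}
```

The design choice worth copying is the direction of the defaults: when the analyzer fails, you want the router to escalate, not to cheap out.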


2. Strategy Trees: Search + Backtracking

First-half routing:

const model = pickModel(...);
return callModel(model, prompt);

Second-half routing:

  • Expands multiple candidate paths
  • Evaluates partial results
  • Backtracks when needed

2.1 Representing a Strategy Tree

type RouteNode = {
  id: string;
  description: string;
  execute: () => Promise<RouteResult>;
  children?: RouteNode[];
};

type RouteResult = {
  status: "success" | "uncertain" | "fail";
  answer?: string;
  cost: number;
  qualityEstimate?: number;
  trace: any; // logs, intermediate reasoning, tool calls, etc.
};

2.2 Example Strategy Tree

function buildStrategyTree(prompt: string): RouteNode {
  return {
    id: "root",
    description: "Second-half routing root",
    execute: async () => ({
      status: "uncertain",
      cost: 0,
      trace: [],
    }),
    children: [
      {
        id: "slm-direct",
        description: "Cheap SLM direct answer",
        execute: () => slmDirectAnswer(prompt),
      },
      {
        id: "mid-rag",
        description: "Mid-tier model with retrieval",
        execute: () => midTierRAG(prompt),
      },
      {
        id: "llm-verify",
        description: "Strong reasoning + verifier",
        execute: () => llmWithVerifier(prompt),
      },
    ],
  };
}

2.3 Tree Search with Self-Evaluation

async function searchStrategyTree(root: RouteNode): Promise<RouteResult> {
  const queue: RouteNode[] = [...(root.children ?? [])];
  const evaluated: RouteResult[] = [];

  while (queue.length) {
    const node = queue.shift()!;
    const result = await node.execute();
    evaluated.push(result);

    if (isGoodEnough(result)) {
      return result;
    }

    // backtracking / expansion logic
    if (result.status === "uncertain" && node.children) {
      queue.push(...node.children);
    }
  }

  return pickBestUnderConstraints(evaluated);
}

const COST_BUDGET = 1.0; // assumed per-request budget; make this explicit in your config

function isGoodEnough(result: RouteResult): boolean {
  return (
    result.status === "success" &&
    (result.qualityEstimate ?? 0) > 0.8 &&
    result.cost < COST_BUDGET
  );
}

This is Tree of Thoughts applied to routing: explore, evaluate, backtrack.
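`pickBestUnderConstraints` is left open above. One possible sketch — an assumption, not the post's canonical implementation — prefers successful, in-budget results, then falls back to the highest quality estimate among whatever the search produced (`COST_BUDGET` is an assumed constant):

```typescript
// Sketch of pickBestUnderConstraints: score each result by status, budget
// fit, and quality estimate, then keep the best. Assumes a non-empty input.
type RouteResult = {
  status: "success" | "uncertain" | "fail";
  answer?: string;
  cost: number;
  qualityEstimate?: number;
  trace: any;
};

const COST_BUDGET = 1.0; // assumed budget; tune per deployment

function pickBestUnderConstraints(results: RouteResult[]): RouteResult {
  const score = (r: RouteResult) =>
    (r.status === "success" ? 2 : r.status === "uncertain" ? 1 : 0) +
    (r.cost <= COST_BUDGET ? 1 : 0) +
    (r.qualityEstimate ?? 0);

  // reduce without an initial value uses the first result as the seed,
  // so this throws on an empty array — guard upstream if that can happen.
  return results.reduce((best, r) => (score(r) > score(best) ? r : best));
}
```

The additive scoring is deliberately crude; the point is that "best under constraints" is a policy decision you can make explicit and test, not something buried in the search loop.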


3. Reflexive Routing: Feedback → Memory → Policy

Reflexion at the routing layer means:

  1. Capture feedback & signals
  2. Write language-level reflections
  3. Feed them into future decisions

3.1 Capturing Feedback

type Feedback = {
  routeId: string;
  success: boolean;
  userRating?: number;
  factualityScore?: number;
  escalationOccurred?: boolean;
  notes?: string;
};
async function recordFeedback(feedback: Feedback) {
  await db.insert("routing_feedback", feedback);
}

3.2 Generating a Reflection

Use an SLM or LLM to summarize patterns periodically:

async function reflectOnRoutingHistory(routeId: string) {
  const history = await db.query("routing_feedback", { routeId });

  const reflection = await callLLM("router-reflector", {
    system: `
      You are a routing coach.
      Look at the failures and successes for this route.
      Propose adjustments to strategy selection or thresholds.
    `,
    user: JSON.stringify(history),
  });

  return parseReflection(reflection);
}

parseReflection might output:

type Reflection = {
  routeId: string;
  suggestedChanges: {
    newThresholds?: any;
    avoidPatterns?: string[];
    preferPatterns?: string[];
  };
  naturalLanguageSummary: string;
};

3.3 Updating Routing Policy

async function applyReflection(reflection: Reflection) {
  const policy = await getRoutingPolicy(reflection.routeId);

  const updatedPolicy = {
    ...policy,
    thresholds: {
      ...policy.thresholds,
      ...reflection.suggestedChanges.newThresholds,
    },
    avoidPatterns: [
      ...new Set([
        ...(policy.avoidPatterns ?? []),
        ...(reflection.suggestedChanges.avoidPatterns ?? []),
      ]),
    ],
  };

  await saveRoutingPolicy(reflection.routeId, updatedPolicy);
}

This is online learning at the routing layer—without retraining any model.
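One way to wire the reflect/apply steps together is a periodic cycle. This is a sketch, not the post's implementation: the two steps are injected as parameters so the loop itself depends on no specific database or LLM client, and a failed reflection is skipped rather than allowed to break routing:

```typescript
// Sketch of a periodic reflection cycle. `reflect` and `apply` stand in for
// reflectOnRoutingHistory and applyReflection; injecting them keeps this testable.
type Reflection = {
  routeId: string;
  suggestedChanges: {
    newThresholds?: any;
    avoidPatterns?: string[];
    preferPatterns?: string[];
  };
  naturalLanguageSummary: string;
};

async function runReflectionCycle(
  routeIds: string[],
  reflect: (routeId: string) => Promise<Reflection>,
  apply: (reflection: Reflection) => Promise<void>,
): Promise<number> {
  let applied = 0;
  for (const routeId of routeIds) {
    try {
      const reflection = await reflect(routeId);
      await apply(reflection);
      applied++;
    } catch {
      // A failed reflection should never take down routing; skip and move on.
    }
  }
  return applied;
}

// e.g. run hourly via setInterval, a cron job, or your task scheduler:
// setInterval(() => runReflectionCycle(activeRoutes, reflectOnRoutingHistory, applyReflection), 3600_000);
```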


4. Cross-LLM Collaboration: Collective Intelligence

Instead of "pick one best model," the router orchestrates multiple experts:

  • Cheap classifier
  • Reasoning model
  • Retrieval model
  • Verifier model
  • Tool-executor

4.1 Defining Experts

type ExpertResult = {
  role: string;
  output: string;
  confidence?: number;
};

async function slmClassifier(prompt: string): Promise<ExpertResult> {
  // classify task
}

async function reasoningLLM(prompt: string): Promise<ExpertResult> {
  // deep reasoning
}

async function retriever(prompt: string): Promise<ExpertResult> {
  // search + retrieve context
}

async function verifier(answer: string): Promise<ExpertResult> {
  // verify factuality / consistency
}

4.2 Orchestrating Collaboration

async function orchestrateExperts(prompt: string) {
  const [analysis, retrieval] = await Promise.all([
    slmClassifier(prompt),
    retriever(prompt),
  ]);

  const reasoning = await reasoningLLM(
    buildReasoningPrompt(prompt, retrieval.output, analysis.output),
  );

  const verification = await verifier(reasoning.output);

  return aggregateOutputs({
    analysis,
    retrieval,
    reasoning,
    verification,
  });
}

4.3 Aggregation Logic

type ExpertBundle = {
  analysis: ExpertResult;
  retrieval: ExpertResult;
  reasoning: ExpertResult;
  verification: ExpertResult;
};

function aggregateOutputs(bundle: ExpertBundle): RouteResult {
  const quality = estimateQuality(bundle);
  const cost = estimateCost(bundle);

  // Treat a missing confidence as passing, but don't let a confidence of 0
  // slip through a truthiness check as "success" — use ?? instead of &&.
  const status =
    (bundle.verification.confidence ?? 1) < 0.5 ? "uncertain" : "success";

  return {
    status,
    answer: bundle.reasoning.output,
    qualityEstimate: quality,
    cost,
    trace: bundle,
  };
}

The router is explicitly acting as coordination logic for a multi-expert system.
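`estimateQuality` is left abstract above. One hedged heuristic — an assumption for illustration, not the post's definition — leans mostly on the verifier's confidence and lightly rewards retrieval grounding and a non-trivial answer:

```typescript
// Heuristic quality estimate for an expert bundle: mostly verifier confidence,
// with small bonuses for retrieval grounding and answer substance.
type ExpertResult = {
  role: string;
  output: string;
  confidence?: number;
};

type ExpertBundle = {
  analysis: ExpertResult;
  retrieval: ExpertResult;
  reasoning: ExpertResult;
  verification: ExpertResult;
};

function estimateQuality(bundle: ExpertBundle): number {
  const verifierScore = bundle.verification.confidence ?? 0.5; // unknown = neutral
  const groundingBonus = bundle.retrieval.output.length > 0 ? 0.1 : 0;
  const substanceBonus = bundle.reasoning.output.length > 50 ? 0.1 : 0;
  return Math.min(1, 0.8 * verifierScore + groundingBonus + substanceBonus);
}
```

The exact weights are arbitrary; what matters is that quality is computed from the bundle's signals rather than assumed from the model tier.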


5. Putting It Together: A Second-Half Router Skeleton

Here's how the pieces compose:

export async function routeLLMRequest(prompt: string): Promise<RouteResult> {
  // 1) Semantic analysis
  const analysis = await analyzeTask(prompt);
  const strategy = pickStrategy(analysis);

  // 2) Build strategy tree
  const root = buildStrategyTreeForStrategy(prompt, strategy);

  // 3) Search tree (search + backtracking)
  const result = await searchStrategyTree(root);

  // 4) Collect feedback signals
  const feedback: Feedback = {
    routeId: strategy,
    success: result.status === "success",
    userRating: undefined, // fill from external feedback later
    factualityScore: result.qualityEstimate,
    notes: "",
  };
  await recordFeedback(feedback);

  return result;
}

buildStrategyTreeForStrategy can embed specific orchestration logic (e.g., cross-LLM, verification, retrieval).
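One shape that mapping could take — a sketch under the assumption that each strategy expands to an ordered list of fallback paths, with the actual path executor injected so the mapping stays testable:

```typescript
// Sketch of buildStrategyTreeForStrategy: each strategy maps to an ordered
// list of fallback paths; the search loop escalates on "uncertain" results.
type RouteResult = {
  status: "success" | "uncertain" | "fail";
  answer?: string;
  cost: number;
  qualityEstimate?: number;
  trace: any;
};

type RouteNode = {
  id: string;
  description: string;
  execute: () => Promise<RouteResult>;
  children?: RouteNode[];
};

type Strategy =
  | "SLM_DIRECT"
  | "MID_RAG"
  | "LLM_REASONING"
  | "LLM_REASONING_WITH_VERIFIER"
  | "CLARIFY_THEN_DECIDE";

function buildStrategyTreeForStrategy(
  prompt: string,
  strategy: Strategy,
  run: (pathId: string, prompt: string) => Promise<RouteResult>, // injected executor
): RouteNode {
  // Assumed path ordering per strategy: cheap first, escalate on uncertainty.
  const paths: Record<Strategy, string[]> = {
    SLM_DIRECT: ["slm-direct", "mid-rag"],
    MID_RAG: ["mid-rag", "llm-verify"],
    LLM_REASONING: ["llm-reasoning", "llm-verify"],
    LLM_REASONING_WITH_VERIFIER: ["llm-verify"],
    CLARIFY_THEN_DECIDE: ["clarify", "mid-rag"],
  };

  return {
    id: "root",
    description: `Root for strategy ${strategy}`,
    execute: async () => ({ status: "uncertain", cost: 0, trace: [] }),
    children: paths[strategy].map((id) => ({
      id,
      description: `Path ${id}`,
      execute: () => run(id, prompt),
    })),
  };
}
```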


6. Design Principles

Routing is now a first-class agent.

Treat the router as a reasoning component, not a pure proxy.

Keep decision-time compute bounded.

You are trading off better behavior vs latency/cost. Make budgets explicit.

Separate policy from mechanism.

Store routing policies in a config/policy store; keep orchestration logic flexible.
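A concrete shape for such a policy entry — an assumed schema, consistent with the `applyReflection` code earlier, not a prescribed one — keeps thresholds and patterns in data while the orchestration code that interprets them lives elsewhere:

```typescript
// Assumed policy schema: everything a reflection is allowed to adjust lives
// here as data, so updates never require a code deploy.
type RoutingPolicy = {
  routeId: string;
  thresholds: {
    qualityFloor: number; // minimum acceptable qualityEstimate
    costBudget: number;   // max spend per request
  };
  avoidPatterns?: string[];
  preferPatterns?: string[];
};

const defaultPolicy: RoutingPolicy = {
  routeId: "MID_RAG",
  thresholds: { qualityFloor: 0.8, costBudget: 1.0 },
  avoidPatterns: [],
};
```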

Make evaluation multi-dimensional.

Log not just latency/cost, but task success, user feedback, escalation rate, and drift.

Log traces as first-class artifacts.

You need rich traces (strategy, paths explored, expert outputs, verifications) for debugging and reflexion.


7. Where to Go From Here

If you're building any of the following:

  • Multi-model "Auto" experiences
  • Agent frameworks that call multiple tools/models
  • Cost-optimized inference stacks
  • Safety-critical LLM apps

…it's time to stop thinking of routing as traffic control and start treating it as an intelligence control plane.


For what this means outside infrastructure—including a family-level application of these patterns—see my Substack post: From One Big Brain to the Family Brain.

Next up: RouterEval: An Evaluation Harness for LLM Routing Policies
