Most LLM routing is still traffic control. Here's how to build routers that reason, search, and learn—without touching model weights.
Most LLM routing code you'll find today is traffic control: pick a model, retry on failure, track cost. But the systems behind the router have changed—they reason, act, make mistakes, and adapt. If you're still routing like it's 2022, you're governing an agent with proxy logic.
This post walks through code-shaped patterns for what I call second-half routing: treating the router as a reasoning component that searches, evaluates, and learns—all without touching model weights.
First-Half vs. Second-Half Routing
Most "LLM routing" examples today boil down to:
```typescript
// classic first-half routing
const model = pickModelBasedOn({
  providerHealth,
  latency,
  costTier,
  tenantPolicy,
});

const result = await callModel(model, prompt);
```
That's first-half routing: traffic control.
Second-half routing treats routing as decision-time compute and collective intelligence orchestration. The router reasons, acts, searches, and learns—without touching model weights.
We'll cover:
- Semantic routing (think-then-decide)
- Strategy trees (search + backtracking)
- Reflexive routing (feedback → memory)
- Cross-LLM collaboration (collective intelligence)
All in a way you can actually implement.
1. Semantic Routing: Think-Then-Decide
Instead of routing purely on system state, second-half routing uses semantic signals:
- Task type (chat, code, retrieval, tools, etc.)
- Domain (legal, medical, casual, internal)
- Difficulty / reasoning depth
- Risk level
- User intent
- Adversarial indicators
You typically introduce a cheap classifier step before picking a path.
1.1 Basic Shape
```typescript
type TaskAnalysis = {
  intent: "chat" | "code" | "summarize" | "search" | "tooling" | "unknown";
  domain: "general" | "legal" | "medical" | "financial" | "internal";
  difficulty: "low" | "medium" | "high";
  risk: "low" | "medium" | "high";
  needsRetrieval: boolean;
  needsTools: boolean;
};

async function analyzeTask(prompt: string): Promise<TaskAnalysis> {
  const analysis = await callSLM("router-analyzer", {
    system: "Classify the task and estimate difficulty, risk, and needs.",
    user: prompt,
  });
  return parseAnalysis(analysis);
}
```
Then use this analysis to choose a strategy, not just a model:
```typescript
type Strategy =
  | "SLM_DIRECT"
  | "MID_RAG"
  | "LLM_REASONING"
  | "LLM_REASONING_WITH_VERIFIER"
  | "CLARIFY_THEN_DECIDE";

function pickStrategy(analysis: TaskAnalysis): Strategy {
  if (analysis.risk === "high") {
    return "LLM_REASONING_WITH_VERIFIER";
  }
  if (analysis.difficulty === "low" && !analysis.needsRetrieval) {
    return "SLM_DIRECT";
  }
  if (analysis.needsRetrieval) {
    return "MID_RAG";
  }
  if (analysis.difficulty === "high") {
    return "LLM_REASONING";
  }
  return "CLARIFY_THEN_DECIDE";
}
```
Key point: the router is now doing decision-time compute via an SLM, not just reading system metrics.
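A `Strategy` value only helps if something dispatches on it. Here's a minimal sketch of that dispatch; the handler bodies are placeholders, not real model calls, and an exhaustive `Record` keeps the table in sync with the union:

```typescript
// Hypothetical dispatch from strategy labels to execution paths. The handler
// bodies are stand-ins; a real router would call its model stack in each.
type Strategy =
  | "SLM_DIRECT"
  | "MID_RAG"
  | "LLM_REASONING"
  | "LLM_REASONING_WITH_VERIFIER"
  | "CLARIFY_THEN_DECIDE";

type Handler = (prompt: string) => Promise<string>;

// Record<Strategy, Handler> makes the compiler reject a missing branch
// whenever a new strategy is added to the union.
const handlers: Record<Strategy, Handler> = {
  SLM_DIRECT: async (p) => `slm:${p}`,
  MID_RAG: async (p) => `rag:${p}`,
  LLM_REASONING: async (p) => `llm:${p}`,
  LLM_REASONING_WITH_VERIFIER: async (p) => `llm+verify:${p}`,
  CLARIFY_THEN_DECIDE: async (p) => `clarify:${p}`,
};

async function executeStrategy(strategy: Strategy, prompt: string): Promise<string> {
  return handlers[strategy](prompt);
}
```

The exhaustive map is a small thing, but it means adding a strategy without wiring up its execution path becomes a compile error instead of a runtime surprise.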
2. Strategy Trees: Search + Backtracking
First-half routing:
```typescript
const model = pickModel(...);
return callModel(model, prompt);
```
Second-half routing:
- Expands multiple candidate paths
- Evaluates partial results
- Backtracks when needed
2.1 Representing a Strategy Tree
```typescript
type RouteNode = {
  id: string;
  description: string;
  execute: () => Promise<RouteResult>;
  children?: RouteNode[];
};

type RouteResult = {
  status: "success" | "uncertain" | "fail";
  answer?: string;
  cost: number;
  qualityEstimate?: number;
  trace: any; // logs, intermediate reasoning, tool calls, etc.
};
```
2.2 Example Strategy Tree
```typescript
function buildStrategyTree(prompt: string): RouteNode {
  return {
    id: "root",
    description: "Second-half routing root",
    execute: async () => ({
      status: "uncertain",
      cost: 0,
      trace: [],
    }),
    children: [
      {
        id: "slm-direct",
        description: "Cheap SLM direct answer",
        execute: () => slmDirectAnswer(prompt),
      },
      {
        id: "mid-rag",
        description: "Mid-tier model with retrieval",
        execute: () => midTierRAG(prompt),
      },
      {
        id: "llm-verify",
        description: "Strong reasoning + verifier",
        execute: () => llmWithVerifier(prompt),
      },
    ],
  };
}
```
2.3 Tree Search with Self-Evaluation
```typescript
const COST_BUDGET = 1.0; // illustrative per-request ceiling; tune per deployment

async function searchStrategyTree(root: RouteNode): Promise<RouteResult> {
  const queue: RouteNode[] = [...(root.children ?? [])];
  const evaluated: RouteResult[] = [];

  while (queue.length) {
    const node = queue.shift()!;
    const result = await node.execute();
    evaluated.push(result);

    if (isGoodEnough(result)) {
      return result;
    }

    // backtracking / expansion logic
    if (result.status === "uncertain" && node.children) {
      queue.push(...node.children);
    }
  }

  return pickBestUnderConstraints(evaluated);
}

function isGoodEnough(result: RouteResult): boolean {
  return (
    result.status === "success" &&
    (result.qualityEstimate ?? 0) > 0.8 &&
    result.cost < COST_BUDGET
  );
}
```
This is Tree of Thoughts applied to routing: explore, evaluate, backtrack.
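The search loop leans on `pickBestUnderConstraints` without defining it. One possible sketch, assuming at least one result was evaluated; `RouteResult` is trimmed to the fields the helper needs, and `COST_BUDGET` is an illustrative value:

```typescript
// A sketch of pickBestUnderConstraints: when no path met the quality bar,
// fall back to the best affordable attempt. RouteResult is trimmed here so
// the snippet is self-contained; COST_BUDGET is illustrative.
type RouteResult = {
  status: "success" | "uncertain" | "fail";
  cost: number;
  qualityEstimate?: number;
};

const COST_BUDGET = 1.0;

function pickBestUnderConstraints(evaluated: RouteResult[]): RouteResult {
  // Prefer non-failed results within budget; if nothing qualifies,
  // fall back to everything we evaluated rather than returning nothing.
  const affordable = evaluated.filter(
    (r) => r.status !== "fail" && r.cost <= COST_BUDGET,
  );
  const pool = affordable.length > 0 ? affordable : evaluated;

  // Highest estimated quality wins; missing estimates rank lowest.
  return pool.reduce((best, r) =>
    (r.qualityEstimate ?? 0) > (best.qualityEstimate ?? 0) ? r : best,
  );
}
```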
3. Reflexive Routing: Feedback → Memory → Policy
Reflexion at the routing layer means:
- Capture feedback & signals
- Write language-level reflections
- Feed them into future decisions
3.1 Capturing Feedback
```typescript
type Feedback = {
  routeId: string;
  success: boolean;
  userRating?: number;
  factualityScore?: number;
  escalationOccurred?: boolean;
  notes?: string;
};

async function recordFeedback(feedback: Feedback) {
  await db.insert("routing_feedback", feedback);
}
```
3.2 Generating a Reflection
Use an SLM or LLM to summarize patterns periodically:
```typescript
async function reflectOnRoutingHistory(routeId: string) {
  const history = await db.query("routing_feedback", { routeId });

  const reflection = await callLLM("router-reflector", {
    system: `
      You are a routing coach.
      Look at the failures and successes for this route.
      Propose adjustments to strategy selection or thresholds.
    `,
    user: JSON.stringify(history),
  });

  return parseReflection(reflection);
}
```
parseReflection might output:
```typescript
type Reflection = {
  routeId: string;
  suggestedChanges: {
    newThresholds?: any;
    avoidPatterns?: string[];
    preferPatterns?: string[];
  };
  naturalLanguageSummary: string;
};
```
3.3 Updating Routing Policy
```typescript
async function applyReflection(reflection: Reflection) {
  const policy = await getRoutingPolicy(reflection.routeId);

  const updatedPolicy = {
    ...policy,
    thresholds: {
      ...policy.thresholds,
      ...reflection.suggestedChanges.newThresholds,
    },
    avoidPatterns: [
      ...new Set([
        ...(policy.avoidPatterns ?? []),
        ...(reflection.suggestedChanges.avoidPatterns ?? []),
      ]),
    ],
  };

  await saveRoutingPolicy(reflection.routeId, updatedPolicy);
}
```
This is online learning at the routing layer—without retraining any model.
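To make the loop concrete: a periodic job walks each route, generates a reflection, and applies it. In this sketch the reflection and policy steps are stubbed so only the loop shape is shown; the real versions are the functions above:

```typescript
// Sketch of the feedback → reflection → policy loop run as a periodic pass.
// The two stubs stand in for the reflectOnRoutingHistory / applyReflection
// functions shown earlier, so the loop shape is self-contained here.
type Reflection = { routeId: string; naturalLanguageSummary: string };

const appliedRoutes: string[] = [];

async function reflectOnRoutingHistory(routeId: string): Promise<Reflection> {
  return { routeId, naturalLanguageSummary: "stub reflection" }; // real: LLM call
}

async function applyReflection(reflection: Reflection): Promise<void> {
  appliedRoutes.push(reflection.routeId); // real: persist updated policy
}

// Run this from a cron job or queue worker, not on the request path:
// reflection is batch work and should never add latency to user requests.
async function reflectionPass(routeIds: string[]): Promise<void> {
  for (const routeId of routeIds) {
    const reflection = await reflectOnRoutingHistory(routeId);
    await applyReflection(reflection);
  }
}
```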
4. Cross-LLM Collaboration: Collective Intelligence
Instead of "pick one best model," the router orchestrates multiple experts:
- Cheap classifier
- Reasoning model
- Retrieval model
- Verifier model
- Tool-executor
4.1 Defining Experts
```typescript
type ExpertResult = {
  role: string;
  output: string;
  confidence?: number;
};

async function slmClassifier(prompt: string): Promise<ExpertResult> {
  // classify task
}

async function reasoningLLM(prompt: string): Promise<ExpertResult> {
  // deep reasoning
}

async function retriever(prompt: string): Promise<ExpertResult> {
  // search + retrieve context
}

async function verifier(answer: string): Promise<ExpertResult> {
  // verify factuality / consistency
}
```
4.2 Orchestrating Collaboration
```typescript
async function orchestrateExperts(prompt: string) {
  const [analysis, retrieval] = await Promise.all([
    slmClassifier(prompt),
    retriever(prompt),
  ]);

  const reasoning = await reasoningLLM(
    buildReasoningPrompt(prompt, retrieval.output, analysis.output),
  );

  const verification = await verifier(reasoning.output);

  return aggregateOutputs({
    analysis,
    retrieval,
    reasoning,
    verification,
  });
}
```
4.3 Aggregation Logic
```typescript
type ExpertBundle = {
  analysis: ExpertResult;
  retrieval: ExpertResult;
  reasoning: ExpertResult;
  verification: ExpertResult;
};

function aggregateOutputs(bundle: ExpertBundle): RouteResult {
  const quality = estimateQuality(bundle);
  const cost = estimateCost(bundle);

  // Treat a missing confidence as "no objection", but avoid a bare
  // truthiness check so a confidence of 0 still counts as uncertain.
  const status =
    (bundle.verification.confidence ?? 1) < 0.5 ? "uncertain" : "success";

  return {
    status,
    answer: bundle.reasoning.output,
    qualityEstimate: quality,
    cost,
    trace: bundle,
  };
}
```
The router is explicitly acting as coordination logic for a multi-expert system.
5. Putting It Together: A Second-Half Router Skeleton
Here's how the pieces compose:
```typescript
export async function routeLLMRequest(prompt: string): Promise<RouteResult> {
  // 1) Semantic analysis
  const analysis = await analyzeTask(prompt);
  const strategy = pickStrategy(analysis);

  // 2) Build strategy tree
  const root = buildStrategyTreeForStrategy(prompt, strategy);

  // 3) Search tree (search + backtracking)
  const result = await searchStrategyTree(root);

  // 4) Collect feedback signals
  const feedback: Feedback = {
    routeId: strategy,
    success: result.status === "success",
    userRating: undefined, // fill from external feedback later
    factualityScore: result.qualityEstimate,
    notes: "",
  };
  await recordFeedback(feedback);

  return result;
}
```
buildStrategyTreeForStrategy can embed specific orchestration logic (e.g., cross-LLM, verification, retrieval).
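One hypothetical shape for it: each strategy maps to an ordered candidate list, cheapest path first, so the tree search escalates only when needed. The executors below are placeholders for real model-calling functions, and the types are repeated so the snippet is self-contained:

```typescript
// Sketch of buildStrategyTreeForStrategy. High-risk strategies skip straight
// to the verified path; everything else gets a cheap-first candidate list.
type RouteResult = {
  status: "success" | "uncertain" | "fail";
  cost: number;
  trace: unknown;
};

type RouteNode = {
  id: string;
  description: string;
  execute: () => Promise<RouteResult>;
  children?: RouteNode[];
};

function buildStrategyTreeForStrategy(prompt: string, strategy: string): RouteNode {
  // Placeholder leaf factory; swap the executor for a real model call.
  const leaf = (id: string, description: string): RouteNode => ({
    id,
    description,
    execute: async () => ({ status: "uncertain", cost: 0, trace: [id, prompt] }),
  });

  const children =
    strategy === "LLM_REASONING_WITH_VERIFIER"
      ? [leaf("llm-verify", "Strong reasoning + verifier")]
      : [
          leaf("slm-direct", "Cheap SLM direct answer"),
          leaf("mid-rag", "Mid-tier model with retrieval"),
          leaf("llm-verify", "Strong reasoning + verifier"),
        ];

  return {
    id: "root",
    description: `Strategy tree for ${strategy}`,
    execute: async () => ({ status: "uncertain", cost: 0, trace: [] }),
    children,
  };
}
```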
6. Design Principles
Routing is now a first-class agent.
Treat the router as a reasoning component, not a pure proxy.
Keep decision-time compute bounded.
You are trading better behavior against latency and cost. Make the budgets explicit.
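Making the budget explicit can be as simple as a guard the tree search consults before expanding another path. A sketch with assumed field names:

```typescript
// Illustrative explicit budget, checked before each expansion step in the
// tree search. Field names are assumptions, not a standard schema.
type RoutingBudget = {
  maxCostUsd: number;
  maxLatencyMs: number;
  maxPathsExplored: number;
};

type Spend = {
  costUsd: number;
  elapsedMs: number;
  pathsExplored: number;
};

// Returns true only while every dimension is strictly under its ceiling.
function withinBudget(budget: RoutingBudget, spend: Spend): boolean {
  return (
    spend.costUsd < budget.maxCostUsd &&
    spend.elapsedMs < budget.maxLatencyMs &&
    spend.pathsExplored < budget.maxPathsExplored
  );
}
```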
Separate policy from mechanism.
Store routing policies in a config/policy store; keep orchestration logic flexible.
Make evaluation multi-dimensional.
Log not just latency/cost, but task success, user feedback, escalation rate, and drift.
Log traces as first-class artifacts.
You need rich traces (strategy, paths explored, expert outputs, verifications) for debugging and reflexion.
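What such a trace might carry, sketched as a record type; the field names are illustrative, not a standard schema:

```typescript
// One illustrative trace record per routed request. Adapt the fields to
// your own logging pipeline; nothing here is prescribed.
type RoutingTrace = {
  requestId: string;
  strategy: string;
  pathsExplored: string[]; // node ids visited by the tree search
  expertOutputs: Record<string, string>;
  verification?: { passed: boolean; confidence?: number };
  metrics: { latencyMs: number; costUsd: number; success: boolean };
};

// Start an empty trace at request time; fill it in as the search runs.
function newTrace(requestId: string, strategy: string): RoutingTrace {
  return {
    requestId,
    strategy,
    pathsExplored: [],
    expertOutputs: {},
    metrics: { latencyMs: 0, costUsd: 0, success: false },
  };
}
```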
7. Where to Go From Here
If you're building any of the following:
- Multi-model "Auto" experiences
- Agent frameworks that call multiple tools/models
- Cost-optimized inference stacks
- Safety-critical LLM apps
…it's time to stop thinking of routing as traffic control and start treating it as an intelligence control plane.
For what this means outside infrastructure—including a family-level application of these patterns—see my Substack post: From One Big Brain to the Family Brain.
*Next up: RouterEval: An Evaluation Harness for LLM Routing Policies*