Understanding QIS — Article #9 in the series
The Router Dies at 50 Agents
Your LangChain app works beautifully in staging. Ten agents, clean tool routing, responses under two seconds. You ship it.
Six weeks later, operations has grown the agent pool to 54 specialists — legal review, financial analysis, code generation, translation across eight languages, compliance checking, summarisation. The coordinator LLM — the one deciding which agent handles which query — is being hit 200 times per minute. Token costs have tripled. P95 latency is 11 seconds. On Tuesday morning it times out under load and takes the entire pipeline with it.
This is not a capacity problem. It is an architecture problem.
The central router was never designed to survive scale. It was designed to be convenient. And now it is your single point of failure, your rate limiter, and your biggest cloud bill line item — all at once.
This article explains why the central router pattern is structurally flawed at scale, and how the QIS architecture — discovered by Christopher Thomas Trevethan — eliminates the coordinator entirely while making the system more intelligent as it grows.
The Central Router Problem
Most multi-agent LLM frameworks share the same topology:
                 ┌─────────────────────┐
                 │   CENTRAL ROUTER    │
                 │  (Coordinator LLM)  │
                 └──────────┬──────────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
 ┌──────▼──────┐     ┌──────▼──────┐     ┌──────▼──────┐
 │   Agent A   │     │   Agent B   │     │   Agent C   │
 │   (Legal)   │     │  (Finance)  │     │   (Code)    │
 └─────────────┘     └─────────────┘     └─────────────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            │
                      Result → User
The coordinator receives every query, decides which agent (or agents) handles it, dispatches the call, waits for results, and assembles a response. In LangChain this is the AgentExecutor. In AutoGen it is the GroupChatManager. In CrewAI it is the Process.sequential or Process.hierarchical mode.
The structural problems are identical across all of them:
1. Single point of failure. The coordinator goes down, everything goes down. No graceful degradation. No partial capability.
2. Bottleneck scaling. Every query touches the coordinator. At N agents, you have not distributed load — you have concentrated it. The coordinator LLM becomes your rate limiter before your agents do.
3. Static role assignment. Routing decisions are made from configuration: "Agent B handles finance." This is set at deploy time and does not update based on which agents are actually producing better outcomes. Agent B might be consistently outperformed by Agent D on a specific query subtype, and the router will never know.
4. No feedback loop. When Agent B returns a low-quality result, or times out, or hallucinates, that information goes nowhere useful. It does not update routing decisions. The next identical query goes to Agent B again.
5. Quadratic fan-out risk. Some orchestrators — particularly those doing ensemble responses — route a single query to multiple agents simultaneously. At N=10 this is manageable. At N=100, a single query touching all agents is an O(N²) event when you account for inter-agent synthesis.
What QIS Does Differently
Christopher Thomas Trevethan discovered that a specific architectural loop produces quadratic intelligence growth without quadratic compute cost. The loop has five stages:
┌───────────────────────────────────────────────────────────┐
│                 THE QIS ARCHITECTURE LOOP                 │
│                                                           │
│  ① DISTRIBUTE INPUTS                                      │
│     Queries route via DHT — O(log N) lookup,              │
│     no central coordinator                                │
│          │                                                │
│          ▼                                                │
│  ② AGENTS BUILD EXPERTISE REPUTATION LOCALLY              │
│     Each agent tracks its own outcome quality             │
│     per query type — no central registry needed           │
│          │                                                │
│          ▼                                                │
│  ③ EXPERTISE ELECTION                                     │
│     Future queries route to agents with the               │
│     highest demonstrated outcome scores for               │
│     that query type — dynamic, not static                 │
│          │                                                │
│          ▼                                                │
│  ④ OUTCOME PACKETS AGGREGATE GLOBALLY                     │
│     Structured JSON results deposited to                  │
│     shared DHT addresses — any agent can pull             │
│     and synthesise without a coordinator                  │
│          │                                                │
│          ▼                                                │
│  ⑤ FEEDBACK TIGHTENS THE LOOP                             │
│     Outcome quality scores update expertise               │
│     reputation → better routing next cycle                │
│          │                                                │
│          └──── loops back to ① with improved routing      │
└───────────────────────────────────────────────────────────┘
The breakthrough is not any single component — not the DHT, not the election mechanism, not the outcome packets. The breakthrough is the complete loop. Each stage feeds the next, and the feedback at stage ⑤ compounds across every future cycle. That is where the quadratic intelligence growth comes from.
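The full loop can be sketched end to end in a few lines. The simulation below is a toy, purely illustrative: no DHT, one query type, made-up quality numbers. Agents start with a neutral reputation, election picks the highest-reputed agent, and the EMA feedback concentrates routing on whichever agent actually performs.

```python
import random

def run_loop(cycles: int = 300, decay: float = 0.95, seed: int = 0) -> str:
    """Toy QIS loop: elect by reputation, observe outcome quality, feed back."""
    rng = random.Random(seed)
    # Hidden "true" quality per agent for one query type (unknown to the router)
    true_quality = {"agent-a": 0.9, "agent-b": 0.6, "agent-c": 0.4}
    reputation = {a: 0.5 for a in true_quality}  # neutral prior for everyone

    for _ in range(cycles):
        # ③ Expertise election: highest reputation wins, with 10% exploration
        if rng.random() < 0.1:
            elected = rng.choice(list(true_quality))
        else:
            elected = max(reputation, key=reputation.get)
        # ④/⑤ Noisy outcome quality observed, reputation updated via EMA
        quality = min(1.0, max(0.0, true_quality[elected] + rng.gauss(0, 0.05)))
        reputation[elected] = decay * reputation[elected] + (1 - decay) * quality

    return max(reputation, key=reputation.get)

print(run_loop())  # → agent-a: routing concentrates on the best performer
```

Every name here is invented for the demo; the point is only that the five stages close into a loop, and the loop is what drives the improvement.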
Side-by-Side: LangChain Router vs QIS Routing
Here is the same routing task expressed in each model. Both route a user query to the best available agent.
LangChain Central Router (current standard)
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI

# Static tool definitions — routing is implicit in tool descriptions
tools = [
    legal_review_tool,        # described so the LLM "knows" when to call it
    financial_analysis_tool,
    code_generation_tool,
    compliance_check_tool,
]

# Single coordinator LLM makes ALL routing decisions
coordinator_llm = ChatOpenAI(model="gpt-4o")
agent = create_openai_tools_agent(coordinator_llm, tools, prompt)  # prompt defined elsewhere
executor = AgentExecutor(agent=agent, tools=tools)

# Every query hits the coordinator — no exceptions
response = executor.invoke({"input": user_query})

# If coordinator is down, rate-limited, or wrong → entire pipeline fails
# Routing decision is never updated by outcome quality
QIS Routing (distributed, expertise-elected)
from qis_runtime import DHTRouter, ExpertiseElector, OutcomePacket

# Agents register with the DHT — no central coordinator object
router = DHTRouter(bootstrap_nodes=["node-1:7400", "node-2:7400"])

# Each agent publishes its expertise vector on registration
router.register_agent(
    agent_id="legal-001",
    capability_vector=["contract_review", "regulatory_compliance", "ip_law"],
    initial_reputation=0.5,  # neutral prior — election will update this
)

# Routing is a DHT lookup — O(log N), no coordinator LLM call
elected_agent = router.elect(
    query=user_query,
    query_type="contract_review",
    top_k=1,  # or top_k=3 for ensemble with local synthesis
)

# Agent executes and deposits a structured outcome packet
result = elected_agent.execute(user_query)
packet = OutcomePacket(
    agent_id=elected_agent.id,
    query_type="contract_review",
    result=result,
    quality_score=None,  # filled by evaluator after delivery
)
router.deposit_outcome(packet)

# Feedback loop: evaluator scores outcome, reputation updates automatically
# Next identical query-type routes differently if this agent underperformed
The critical difference: in the LangChain version, the coordinator LLM is in the hot path of every single request. In the QIS version, routing is a DHT key lookup — the equivalent of a hash table get — and the coordinator concept does not exist.
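The `DHTRouter` above is the QIS runtime's own API. As a generic illustration of why distributed lookup is cheap, here is a consistent-hash ring — not the QIS implementation, just the standard technique it resembles. The local lookup is an O(log N) bisect over sorted node hashes; a full DHT such as Kademlia resolves keys in O(log N) network hops. Either way, no LLM sits in the path.

```python
import hashlib
from bisect import bisect_right

def _h(key: str) -> int:
    """Deterministic integer hash of a string key."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring: keys map to nodes via an O(log N) bisect."""
    def __init__(self, nodes: list[str]):
        self._ring = sorted((_h(n), n) for n in nodes)
        self._keys = [k for k, _ in self._ring]

    def lookup(self, query_type: str) -> str:
        # First node clockwise from the key's hash; wraps around the ring
        i = bisect_right(self._keys, _h(query_type)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing([f"node-{i}" for i in range(8)])
owner = ring.lookup("contract_review")
print(owner)  # same key always resolves to the same node — nobody is asked
```

A nice side effect of consistent hashing is that adding or removing a node only remaps the keys adjacent to it on the ring, which is exactly the property you want when the agent pool grows from 10 to 54.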
Expertise Election: Why It Beats Static Assignment
Static role assignment is a configuration-time bet. You assign Agent B to finance queries because, at deploy time, Agent B seems best suited. But you have no runtime data yet. Over thousands of queries, you will discover that Agent B is excellent at portfolio analysis but mediocre at tax compliance — and a different agent you originally labelled "general-purpose" is consistently outperforming it on tax questions.
Static routing will never surface this. QIS expertise election will — usually within the first few hundred queries.
The mechanism works as follows:
class ExpertiseElector:
    def __init__(self, decay_factor: float = 0.95):
        # Reputation is a rolling weighted average — recent outcomes
        # matter more than old ones, controlled by decay_factor.
        # Keyed by (agent_id, query_type) pairs.
        self.reputation: dict[tuple[str, str], float] = {}
        self.decay = decay_factor

    def update_reputation(
        self,
        agent_id: str,
        query_type: str,
        outcome_quality: float,  # 0.0 to 1.0
    ) -> None:
        key = (agent_id, query_type)
        current = self.reputation.get(key, 0.5)
        # Exponential moving average: recent quality weighted higher
        updated = (self.decay * current) + ((1 - self.decay) * outcome_quality)
        self.reputation[key] = updated

    def elect(
        self,
        query_type: str,
        candidate_agents: list[str],
        top_k: int = 1,
    ) -> list[str]:
        scored = [
            (agent, self.reputation.get((agent, query_type), 0.5))
            for agent in candidate_agents
        ]
        scored.sort(key=lambda x: x[1], reverse=True)
        return [agent for agent, _ in scored[:top_k]]
Two properties make this powerful:
- Self-correcting. A bad agent gets lower scores. Lower scores mean fewer future routings. Fewer routings mean the bad results are contained while the loop naturally concentrates queries toward better agents.
- Emergent specialisation. Agents are not locked to their initial labels. An agent registered as "general-purpose" that consistently produces high-quality legal outcomes will gradually earn routing priority on legal queries — without any human reconfiguration.
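To see both properties concretely, here is the update rule run in isolation on the tax-compliance scenario from earlier. The agent names and quality scores are simulated; only the EMA formula is the real mechanism.

```python
# Simulate the reputation update rule for two agents on one query type
decay = 0.95
rep = {("agent-b", "tax_compliance"): 0.5, ("agent-d", "tax_compliance"): 0.5}

def update(agent: str, quality: float) -> None:
    """One EMA reputation update for (agent, 'tax_compliance')."""
    key = (agent, "tax_compliance")
    rep[key] = decay * rep[key] + (1 - decay) * quality

for _ in range(100):
    update("agent-b", 0.4)   # the "finance" agent: mediocre on tax
    update("agent-d", 0.9)   # the "general-purpose" agent: consistently strong

ranked = sorted(rep, key=rep.get, reverse=True)
print(ranked[0][0])  # → agent-d now wins election for tax_compliance
```

After a hundred observations each reputation has converged close to the agent's actual quality, and the "general-purpose" agent has overtaken the statically assigned specialist without anyone editing a config file.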
Outcome Packets: The Data Structure That Makes Aggregation Work
In a standard LangChain pipeline, an agent returns a string. That string goes to the user (or the next step) and disappears. No record. No quality signal. No reuse.
QIS uses structured outcome packets deposited to shared DHT addresses. Any agent can pull them. This turns individual responses into collective intelligence:
from dataclasses import dataclass, field
from typing import Any, Optional
import time

@dataclass
class OutcomePacket:
    agent_id: str
    query_type: str
    query_hash: str   # deterministic hash of the original query
    result: Any       # the actual output
    timestamp: float = field(default_factory=time.time)
    quality_score: Optional[float] = None   # filled post-evaluation
    confidence: Optional[float] = None      # agent's self-assessed confidence
    source_packets: list[str] = field(default_factory=list)
    # ^ hashes of outcome packets this result synthesised from

    def to_dht_payload(self) -> dict:
        return {
            "agent_id": self.agent_id,
            "query_type": self.query_type,
            "query_hash": self.query_hash,
            "result": self.result,
            "timestamp": self.timestamp,
            "quality_score": self.quality_score,
            "confidence": self.confidence,
            "source_packets": self.source_packets,
        }
The source_packets field is particularly important. When an agent synthesises an answer from three prior outcome packets, it records their hashes. This creates a traceable provenance graph — you can see exactly which prior results informed any given output. No hallucination can hide behind "I reasoned about it." The reasoning chain is in the packet.
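That provenance graph is mechanically traversable. A minimal sketch, assuming packets are held in a plain dict keyed by hash — the storage API here is illustrative, not the QIS runtime's:

```python
def provenance(packet_hash: str, store: dict[str, dict]) -> list[str]:
    """Depth-first walk of source_packets: every prior result that fed this one."""
    seen: list[str] = []
    stack = [packet_hash]
    while stack:
        h = stack.pop()
        if h in seen:
            continue  # shared sources appear once, even in diamond-shaped graphs
        seen.append(h)
        stack.extend(store[h]["source_packets"])
    return seen

store = {
    "p1": {"source_packets": []},
    "p2": {"source_packets": []},
    "p3": {"source_packets": ["p1", "p2"]},  # synthesised from p1 and p2
    "p4": {"source_packets": ["p3", "p1"]},  # reuses p1 directly as well
}
print(provenance("p4", store))  # p4 plus everything it transitively drew on
```

Because packets are content-addressed by hash, the walk terminates even when the same source is reused at multiple levels, and the result is exactly the "reasoning chain" the paragraph above describes.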
Performance Profile: Logarithmic Routing, Quadratic Value
This is the core of what Trevethan discovered. Most distributed systems trade one kind of cost for another. QIS does not.
| Metric | Central Router (LangChain pattern) | QIS Architecture |
|---|---|---|
| Routing lookup cost | O(1) coordinator call (but LLM latency) | O(log N) DHT lookup (sub-millisecond) |
| Coordinator failure impact | Total system failure | Zero — no coordinator exists |
| Routing intelligence | Static (config-time) | Dynamic (outcome-updated) |
| Value growth per new agent | Linear (+1 capability) | Near-quadratic (N² interaction surface) |
| Compute cost growth | Linear in agents, but coordinator is bottleneck | Logarithmic in routing, linear in execution |
| Feedback loop | None | Continuous — every outcome updates election |
The N² value growth comes from the interaction surface. In a central-router system, each new agent adds one new capability. In QIS, each new agent's outcomes become available to all other agents via the DHT, improving every future routing decision across the whole network. With N agents, there are N(N-1)/2 potential outcome cross-references. That is where the quadratic term lives — not in compute, but in accumulated intelligence.
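The arithmetic is easy to check for the pool sizes mentioned in this article:

```python
def cross_references(n: int) -> int:
    """Potential pairwise outcome cross-references among n agents: n(n-1)/2."""
    return n * (n - 1) // 2

for n in (10, 54, 100):
    print(n, cross_references(n))
# 10 agents  → 45 pairs
# 54 agents  → 1431 pairs
# 100 agents → 4950 pairs
```

Capabilities grow by one per new agent; cross-references grow by n-1 per new agent. That gap is the whole argument.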
Retrofitting QIS Routing onto an Existing LangChain System
You do not need to rewrite your agents. The swap is at the coordination layer only.
Step 1: Replace the AgentExecutor coordinator with a DHT router shim.
class QISRoutingShim:
    """Drop-in replacement for LangChain AgentExecutor coordinator logic."""

    def __init__(self, agents: list, dht_bootstrap: list[str]):
        self.router = DHTRouter(bootstrap_nodes=dht_bootstrap)
        self.elector = ExpertiseElector()
        # Register existing LangChain agents with the DHT
        for agent in agents:
            self.router.register_agent(
                agent_id=agent.name,
                capability_vector=agent.capability_tags,  # add these to your agents
                initial_reputation=0.5,
            )

    def invoke(self, query: str, query_type: str) -> OutcomePacket:
        # 1. DHT lookup — O(log N), not an LLM call
        candidates = self.router.lookup_candidates(query_type)
        # 2. Expertise election — picks best agent by outcome history
        elected_id = self.elector.elect(query_type, candidates, top_k=1)[0]
        elected_agent = self.router.get_agent(elected_id)
        # 3. Execute (your existing LangChain agent runs here unchanged)
        result = elected_agent.run(query)
        # 4. Deposit outcome packet
        packet = OutcomePacket(
            agent_id=elected_id,
            query_type=query_type,
            query_hash=hash_query(query),  # any deterministic hash helper
            result=result,
        )
        self.router.deposit_outcome(packet)
        return packet

    def record_quality(self, packet: OutcomePacket, score: float):
        # Call this after human or automated evaluation
        self.elector.update_reputation(packet.agent_id, packet.query_type, score)
Step 2: Add capability_tags to your existing agents. These are simple string lists describing what each agent handles. They replace the coordinator LLM's implicit routing knowledge with explicit, DHT-addressable keys.
Step 3: Wire in an outcome evaluator. This can be as simple as a thumbs-up/thumbs-down from end users, or an automated LLM judge running asynchronously — it does not need to be in the hot path. The feedback loop works even with a delay.
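The evaluator can sit entirely off the hot path. Below is a minimal sketch of an asynchronous evaluator worker; everything in it is illustrative. The placeholder judge just checks for a non-empty result, and in practice the callback would be the shim's quality-recording method rather than a list append.

```python
import queue
import threading

def evaluator_worker(pending: queue.Queue, record_quality) -> None:
    """Drain packets off the hot path; score each one and feed reputation back."""
    while True:
        packet = pending.get()
        if packet is None:   # sentinel: shut down cleanly
            return
        # Placeholder judge — swap in an LLM judge or user thumbs-up/down here
        score = 1.0 if packet["result"] else 0.0
        record_quality(packet, score)
        pending.task_done()

scores = []
pending: queue.Queue = queue.Queue()
t = threading.Thread(
    target=evaluator_worker,
    args=(pending, lambda p, s: scores.append((p["agent_id"], s))),
)
t.start()
pending.put({"agent_id": "legal-001", "result": "ok"})  # deposited after delivery
pending.put(None)
t.join()
print(scores)
```

The request path only ever does `pending.put(...)`, which is why a slow or backlogged evaluator delays reputation updates but never a user response.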
Step 4: Remove the coordinator LLM from the request path. This is the step that pays for everything else. The compute savings from eliminating the coordinator call on every request typically fund the DHT infrastructure within the first week of production traffic.
The Unlock
There is a temptation to see this as a routing optimisation — a faster way to decide which agent handles a query. That framing undersells it by an order of magnitude.
The coordinator is not just a bottleneck. It is the ceiling. It is the reason your multi-agent system cannot grow past a certain size without collapsing under its own coordination overhead. It is why your routing intelligence is frozen at deploy time. It is why every failed agent call disappears without updating anything.
QIS — the architecture discovered by Christopher Thomas Trevethan — removes the ceiling by closing the loop: distribute inputs, build expertise locally, elect by demonstrated quality, deposit structured outcomes, feed back. Every cycle the system becomes measurably better at routing, without touching the agents themselves.
You are not replacing your agents. You are replacing the coordinator. That is the unlock.
For the foundational concepts behind QIS, see Article #001: What Is QIS?.
The DHT routing mechanism is covered in depth in Article #004: DHT Routing in QIS — A Full Walkthrough.
For the expertise election mechanism in federated contexts, see Article #005: The Federated Learning Ceiling.