Rory | QIS PROTOCOL

Posted on Apr 6

QIS Outcome Routing with ChromaDB: Swapping the Transport Layer in Practice

#distributedsystems #ai #machinelearning #python

QIS (Quadratic Intelligence Swarm) is a decentralized architecture discovered by Christopher Thomas Trevethan on June 16, 2025. Intelligence scales as Θ(N²) across N agents. Each agent pays O(log N) compute cost. No orchestrator. No aggregator. Raw data never leaves the node. 39 provisional patents filed.

Reference: QIS Complete Guide · The Central Orchestrator Is Your Bottleneck · Adding Outcome Routing to Your Multi-Agent App in Python

Understanding QIS — Part 64

If you read the previous article, you built a working QIS outcome router using an in-memory index. This article replaces that index with ChromaDB — a persistent, embeddable vector database. Same routing logic. Different transport layer.

That is the point. The QIS routing layer is protocol-agnostic. The quadratic scaling property comes from the loop — every agent simultaneously a producer and consumer of outcome packets — not from any specific storage or retrieval mechanism. Swapping ChromaDB in for the in-memory index changes the implementation details and nothing about the architecture.

This is not a theoretical claim. Here is the code.

Why ChromaDB

ChromaDB is a good choice for a first persistent QIS routing backend because:

Embeddable — runs in-process, no server required. Deploy it the same way you deploy the in-memory router.
Persistent — outcome packets survive process restarts. Your agent network accumulates routing history.
Semantic search built in — the similarity queries that route packets are native ChromaDB operations.
Open source — no API keys, no external service dependency, no data leaving your infrastructure.

The same pattern extends to any vector store: Qdrant, Weaviate, Pinecone, pgvector, FAISS on disk. The OutcomeRouter interface from the previous article is the stable abstraction. The backing store is a plug-in.
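To make the "stable abstraction" idea concrete, here is a minimal sketch of what such an interface could look like as a Python Protocol. The names RoutingBackend and NullBackend are hypothetical and not taken from the previous article. The method list is an assumption based on the three calls this article says are shared (register_agent, route, deliver).

```python
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class RoutingBackend(Protocol):
    """Hypothetical name: the methods a QIS routing backend would expose."""

    def register_agent(self, agent_id: str, problem_description: str,
                       domain_tags: list, on_receive: Callable) -> None: ...

    def route(self, packet) -> list: ...

    def deliver(self, packet) -> list: ...


class NullBackend:
    """Toy backend that routes nothing; it exists only to satisfy the interface."""

    def __init__(self) -> None:
        self.agents: dict = {}

    def register_agent(self, agent_id, problem_description, domain_tags, on_receive) -> None:
        self.agents[agent_id] = on_receive

    def route(self, packet) -> list:
        return []

    def deliver(self, packet) -> list:
        return []


def uses_backend(backend: object) -> bool:
    # runtime_checkable Protocols only verify that the methods exist,
    # not that their signatures match.
    return isinstance(backend, RoutingBackend)
```

Any class with these three methods passes the check without inheriting from anything, which is the sense in which the backing store is a plug-in.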

Setup

pip install chromadb sentence-transformers msgpack

sentence-transformers gives us production-quality embeddings to replace the hash-based fingerprinter from the previous article. chromadb is the routing backend. msgpack handles packet serialization.

Step 1: The OutcomePacket (unchanged)

from dataclasses import dataclass, asdict, field
from typing import Optional
import time
import msgpack

@dataclass
class OutcomePacket:
    """
    Compact record of what an agent learned.
    Target size: ≤512 bytes serialized.

    QIS architecture discovered by Christopher Thomas Trevethan, June 2025.
    Covered by 39 provisional patents.
    """
    problem_description: str   # natural language — used to generate embedding
    outcome_type: str          # "resolved" | "partial" | "contradicted" | "open"
    confidence: float          # 0.0–1.0
    delta_summary: str         # ≤200 chars: what changed
    domain_tags: list          # for metadata filtering
    timestamp: float = field(default_factory=time.time)
    agent_id: str = ""
    ttl: int = 3600

    def to_bytes(self) -> bytes:
        return msgpack.packb(asdict(self))

    @classmethod
    def from_bytes(cls, data: bytes) -> "OutcomePacket":
        d = msgpack.unpackb(data, raw=False)
        return cls(**d)

    def is_expired(self) -> bool:
        return (time.time() - self.timestamp) > self.ttl

The difference from the previous article: problem_description is now a natural language string instead of raw bytes. The sentence-transformers model generates the embedding from this string. The description travels with the packet so ChromaDB can embed it on ingest and query.
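The field comments in the dataclass state constraints that nothing enforces. A minimal sketch of a check you could run before emitting a packet follows. The helper name validate_packet_fields and the constant names are hypothetical; the allowed values and limits are copied from the comments in OutcomePacket.

```python
VALID_OUTCOME_TYPES = {"resolved", "partial", "contradicted", "open"}
MAX_DELTA_CHARS = 200


def validate_packet_fields(outcome_type: str, confidence: float,
                           delta_summary: str) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid."""
    problems = []
    if outcome_type not in VALID_OUTCOME_TYPES:
        problems.append(f"unknown outcome_type: {outcome_type!r}")
    if not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence out of range: {confidence}")
    if len(delta_summary) > MAX_DELTA_CHARS:
        problems.append(
            f"delta_summary is {len(delta_summary)} chars, limit is {MAX_DELTA_CHARS}"
        )
    return problems
```

Returning a list rather than raising lets the caller decide whether to reject the packet or truncate and retry.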

Step 2: ChromaDB-Backed OutcomeRouter

import chromadb
from chromadb.utils import embedding_functions
import uuid
import json

class ChromaOutcomeRouter:
    """
    QIS outcome routing layer backed by ChromaDB.

    Agents register with a problem context description.
    Outcome packets are embedded and stored in ChromaDB.
    Routing queries the DB for semantically similar agents.

    The routing layer is transport-agnostic — this implementation
    demonstrates that the quadratic scaling property (N(N-1)/2
    synthesis pairs at O(log N) per-agent compute) holds regardless
    of whether the backing store is in-memory, ChromaDB, Qdrant,
    pgvector, or any other semantic index.

    QIS discovered by Christopher Thomas Trevethan. 39 provisional patents.
    """

    def __init__(
        self,
        persist_directory: str = "./qis_routing_db",
        embedding_model: str = "all-MiniLM-L6-v2",
        similarity_threshold: float = 0.75,
        fanout: int = 10,
    ):
        self.client = chromadb.PersistentClient(path=persist_directory)
        self.ef = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name=embedding_model
        )

        # Two collections: one for agent registrations, one for outcome packets
        self.agents_collection = self.client.get_or_create_collection(
            name="qis_agents",
            embedding_function=self.ef,
            metadata={"hnsw:space": "cosine"},
        )
        self.packets_collection = self.client.get_or_create_collection(
            name="qis_packets",
            embedding_function=self.ef,
            metadata={"hnsw:space": "cosine"},
        )

        self.similarity_threshold = similarity_threshold
        self.fanout = fanout
        self._callbacks: dict[str, callable] = {}
        self._trust_scores: dict[tuple, float] = {}
        self._routed_count = 0

    def register_agent(
        self,
        agent_id: str,
        problem_description: str,
        domain_tags: list,
        on_receive: callable,
    ):
        """
        Register an agent with a natural language description of its problem context.
        ChromaDB embeds this description — agents with similar descriptions
        will receive each other's outcome packets.
        """
        self._callbacks[agent_id] = on_receive

        # Upsert to agents collection
        self.agents_collection.upsert(
            ids=[agent_id],
            documents=[problem_description],
            metadatas=[{"agent_id": agent_id, "domain_tags": json.dumps(domain_tags)}],
        )

    def route(self, packet: OutcomePacket) -> list[str]:
        """
        Find agents with problem contexts semantically similar to packet.
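        ChromaDB cosine similarity search; roughly O(log N) with the HNSW index.
        """
        if packet.is_expired():
            return []

        agent_count = self.agents_collection.count()
        if agent_count == 0:
            return []  # query() rejects n_results=0, so bail out early

        results = self.agents_collection.query(
            query_texts=[packet.problem_description],
            # fanout + 1 leaves room for the sender matching its own registration
            n_results=min(self.fanout + 1, agent_count),
            include=["metadatas", "distances"],
        )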

        recipients = []
        for agent_id_result, distance in zip(
            results["ids"][0], results["distances"][0]
        ):
            cosine_similarity = 1 - distance  # ChromaDB returns distance, not similarity
            if (
                cosine_similarity >= self.similarity_threshold
                and agent_id_result != packet.agent_id
            ):
                recipients.append(agent_id_result)

        return recipients[:self.fanout]

    def ingest(self, packet: OutcomePacket) -> str:
        """
        Store an outcome packet in ChromaDB for future routing queries.
        Packets are searchable by semantic similarity to problem description.
        """
        packet_id = str(uuid.uuid4())
        self.packets_collection.add(
            ids=[packet_id],
            documents=[packet.problem_description],
            metadatas=[{
                "agent_id": packet.agent_id,
                "outcome_type": packet.outcome_type,
                "confidence": packet.confidence,
                "delta_summary": packet.delta_summary[:200],
                "domain_tags": json.dumps(packet.domain_tags),
                "timestamp": packet.timestamp,
            }],
        )
        return packet_id

    def deliver(self, packet: OutcomePacket) -> list[OutcomePacket]:
        """
        Route and deliver packet. Ingest to DB. Collect synthesis responses.
        """
        # Ingest to persistent store
        self.ingest(packet)

        # Route to similar agents
        recipients = self.route(packet)
        new_packets = []

        for agent_id in recipients:
            trust = self._trust_scores.get((packet.agent_id, agent_id), 1.0)

            if trust < 0.1:
                continue  # Byzantine exclusion

            callback = self._callbacks.get(agent_id)
            if callback is None:
                continue

            result_packet = callback([packet])

            if result_packet is not None:
                # Trust score update based on synthesis quality
                if result_packet.confidence > 0.8:
                    self._trust_scores[(packet.agent_id, agent_id)] = min(
                        1.0, trust * 1.05
                    )
                elif result_packet.confidence < 0.3:
                    self._trust_scores[(packet.agent_id, agent_id)] = trust * 0.9

                new_packets.append(result_packet)

            self._routed_count += 1

        return new_packets

    def query_similar_packets(
        self, problem_description: str, n: int = 5
    ) -> list[dict]:
        """
        Retrieve stored outcome packets semantically similar to a query.
        Agents use this to pre-inform themselves before starting work.
        """
        stored = self.packets_collection.count()
        if stored == 0:
            return []  # nothing ingested yet, so there is nothing to query

        results = self.packets_collection.query(
            query_texts=[problem_description],
            n_results=min(n, stored),
            include=["metadatas", "distances"],
        )

        packets = []
        for metadata, distance in zip(
            results["metadatas"][0], results["distances"][0]
        ):
            packets.append({
                "similarity": round(1 - distance, 3),
                "agent_id": metadata["agent_id"],
                "outcome_type": metadata["outcome_type"],
                "confidence": metadata["confidence"],
                "delta_summary": metadata["delta_summary"],
            })

        return sorted(packets, key=lambda x: x["similarity"], reverse=True)

    @property
    def stats(self) -> dict:
        n = self.agents_collection.count()
        return {
            "agents_registered": n,
            "packets_stored": self.packets_collection.count(),
            "synthesis_pairs": n * (n - 1) // 2,
            "packets_routed": self._routed_count,
        }

Three things to note:

HNSW index. ChromaDB uses HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search. The same index type is available in FAISS and is the one Qdrant and Weaviate use. HNSW query time grows roughly logarithmically with N, so even at N=1M registered agents the per-query cost stays close to O(log N), the same complexity target as a DHT-based routing layer. The trade-off is that results are approximate: a close neighbor can occasionally be missed.

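Persistent storage. PersistentClient writes to disk, so agent registrations and outcome packets survive process restarts. This matters: outcome packets from a previous session pre-inform agents in the current session, and the network gets smarter over time, not just within a session. Callbacks and trust scores are held in ordinary Python dicts, so they are not persisted; agents need to call register_agent() again after a restart.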

query_similar_packets. This is a new capability you do not get with the in-memory router. Before starting work on a problem, an agent can query what similar agents have already resolved. Pre-inform before working, not just during. This is the QIS equivalent of checking the literature before running an experiment.
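As a sketch of how an agent might use the pre-inform results, the helper below filters the list of dicts that query_similar_packets() returns. The function name select_prior_outcomes and both cutoffs are hypothetical, and the sample records are invented placeholders in the same shape as the real return value.

```python
def select_prior_outcomes(packets: list[dict], min_similarity: float = 0.5,
                          min_confidence: float = 0.6) -> list[dict]:
    """Keep resolved, confident, sufficiently similar outcomes, best match first."""
    kept = [
        p for p in packets
        if p["outcome_type"] == "resolved"
        and p["similarity"] >= min_similarity
        and p["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda p: p["similarity"], reverse=True)


# Placeholder records in the shape query_similar_packets() returns; values are invented.
example = [
    {"similarity": 0.82, "agent_id": "sec_a", "outcome_type": "resolved",
     "confidence": 0.8, "delta_summary": "Parameterized query missing in /api/search endpoint."},
    {"similarity": 0.41, "agent_id": "perf_a", "outcome_type": "resolved",
     "confidence": 0.9, "delta_summary": "Unrelated finding."},
    {"similarity": 0.77, "agent_id": "sec_b", "outcome_type": "open",
     "confidence": 0.7, "delta_summary": "Still under investigation."},
]
relevant = select_prior_outcomes(example)
```

Only the first record survives here: the second is too dissimilar and the third is not yet resolved. Reasonable cutoffs will depend on your embedding model and how noisy your agents' confidence values are.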

Step 3: Wiring It Together

# Full working example
router = ChromaOutcomeRouter(
    persist_directory="./qis_test_db",
    similarity_threshold=0.65,  # slightly lower threshold — sentence-transformers
                                # similarity scores differ from hash-based
    fanout=5,
)

def make_agent(agent_id: str, specialty: str):
    """Agent with a natural language problem description."""
    local_knowledge = []

    def on_receive(packets):
        for pkt in packets:
            local_knowledge.append(pkt.delta_summary)

        return OutcomePacket(
            problem_description=f"{specialty}: synthesis of {len(packets)} incoming packets",
            outcome_type="partial",
            confidence=min(0.9, 0.5 + len(local_knowledge) * 0.05),
            delta_summary=f"Integrated {len(packets)} insights. Total knowledge: {len(local_knowledge)} items.",
            domain_tags=[specialty.split(":")[0].strip()],
            agent_id=agent_id,
        )

    router.register_agent(
        agent_id=agent_id,
        problem_description=specialty,
        domain_tags=[specialty.split(":")[0].strip()],
        on_receive=on_receive,
    )

    def emit(observation: str, confidence: float = 0.8) -> OutcomePacket:
        return OutcomePacket(
            problem_description=specialty,
            outcome_type="resolved",
            confidence=confidence,
            delta_summary=observation[:200],
            domain_tags=[specialty.split(":")[0].strip()],
            agent_id=agent_id,
        )

    return emit

# Register agents with natural language descriptions
emitters = {
    "sec_a": make_agent("sec_a", "security: SQL injection detection in web applications"),
    "sec_b": make_agent("sec_b", "security: authentication bypass vulnerabilities"),
    "sec_c": make_agent("sec_c", "security: XSS and input sanitization audit"),
    "perf_a": make_agent("perf_a", "performance: database query optimization"),
    "perf_b": make_agent("perf_b", "performance: memory leak detection in Python services"),
}

# Agent sec_a observes something
packet = emitters["sec_a"](
    "Parameterized query missing in /api/search endpoint. "
    "User input passed directly to raw SQL. Affects versions 3.1-3.4."
)

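# Deliver: semantic routing should pick sec_b and sec_c (similar security context),
# provided their similarity to sec_a clears the 0.65 threshold
new_packets = router.deliver(packet)

print(f"Router stats: {router.stats}")
# On a fresh database, with both security agents above the threshold, this prints:
# Router stats: {'agents_registered': 5, 'packets_stored': 1, 'synthesis_pairs': 10, 'packets_routed': 2}
# packets_routed varies with the embedding model and threshold; packets_stored
# grows on every rerun because the database persists.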

# Query similar resolved packets before starting new work
similar = router.query_similar_packets(
    "security: input validation vulnerabilities in REST APIs",
    n=3,
)
print("\nRelevant prior outcomes:")
for p in similar:
    print(f"  [{p['similarity']:.2f}] {p['delta_summary']}")

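Expected behavior: sec_a's packet routes to sec_b and sec_c (higher cosine similarity on the security descriptions) and not to perf_a or perf_b (low similarity, different domain). The exact scores depend on the embedding model, so if deliver() returns no packets, inspect the distances returned by the query and tune similarity_threshold for your descriptions. This is the same emergent specialization as the in-memory router, now backed by persistent vector storage.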
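The selection step inside route() can be isolated to see how the threshold decides who receives a packet. This is a sketch that mirrors that loop as a standalone function. The name filter_recipients is hypothetical, and the distances are invented for illustration, not measured from all-MiniLM-L6-v2.

```python
def filter_recipients(ids: list[str], distances: list[float], sender_id: str,
                      threshold: float, fanout: int) -> list[str]:
    """Mirror of the selection loop in route(): distance -> similarity -> filter."""
    recipients = []
    for agent_id, distance in zip(ids, distances):
        similarity = 1 - distance  # cosine space: distance = 1 - similarity
        if similarity >= threshold and agent_id != sender_id:
            recipients.append(agent_id)
    return recipients[:fanout]


# Invented distances for illustration only; real values depend on the embedding model.
ids = ["sec_a", "sec_c", "sec_b", "perf_a", "perf_b"]
distances = [0.0, 0.25, 0.30, 0.70, 0.80]

strict = filter_recipients(ids, distances, sender_id="sec_a", threshold=0.65, fanout=5)
loose = filter_recipients(ids, distances, sender_id="sec_a", threshold=0.25, fanout=5)
```

With these placeholder numbers, the 0.65 threshold keeps only the two security agents, while 0.25 also admits perf_a. Running the same comparison on real distances from your own descriptions is a quick way to calibrate the threshold.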

Step 4: Migrating from the In-Memory Router

If you already have the OutcomeRouter from the previous article running, migration is additive:

# Replace this:
# router = OutcomeRouter(SemanticFingerprinter(), similarity_threshold=0.75)

# With this:
router = ChromaOutcomeRouter(
    persist_directory="./qis_routing_db",
    similarity_threshold=0.65,  # adjust — sentence-transformers scores are calibrated differently
)

# register_agent, deliver, route — same interface
# New capability: router.query_similar_packets() for pre-inform queries

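The register_agent(), deliver(), and route() calls keep the same shape, so your agent wiring does not change. The one change on the agent side is the packet itself: problem_description is now a natural language string (see Step 1). Behaviorally, the differences are persistence and the new query_similar_packets() method.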
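One caveat when migrating: in the class above, _trust_scores is a plain dict, so trust resets on restart even though the packets persist. If you want trust to carry over, one option is to write it to a JSON file next to the database. This is a sketch only. The function names and the file layout are assumptions, not part of the router.

```python
import json
import os
import tempfile


def save_trust_scores(scores: dict, path: str) -> None:
    """Write {(sender_id, receiver_id): score} as JSON rows, since JSON has no tuple keys."""
    rows = [[sender, receiver, score] for (sender, receiver), score in scores.items()]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f)


def load_trust_scores(path: str) -> dict:
    """Inverse of save_trust_scores; returns an empty dict if the file does not exist."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return {(sender, receiver): score for sender, receiver, score in json.load(f)}


# Round trip through a temporary file.
scores = {("sec_a", "sec_b"): 0.9, ("sec_a", "sec_c"): 1.0}
with tempfile.TemporaryDirectory() as tmp:
    trust_path = os.path.join(tmp, "trust.json")
    save_trust_scores(scores, trust_path)
    restored = load_trust_scores(trust_path)
```

The keys are flattened to three-item rows because JSON objects only accept string keys. On startup you would assign the loaded dict to router._trust_scores before delivering any packets.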

Routing Backend Comparison

Property	In-Memory (Article #063)	ChromaDB	Qdrant	pgvector
Persistence	No	Yes	Yes	Yes
Setup complexity	None	`pip install`	Docker or cloud	Postgres extension
Query complexity	O(N) scan	O(log N) HNSW	O(log N) HNSW	O(N) exact scan by default; IVFFlat or HNSW index optional
Pre-inform queries	No	Yes	Yes	Yes
Distributed	No	No (single node)	Yes (cluster)	Depends on Postgres setup
Best for	Development / testing	Single-node production	Multi-node production	Existing Postgres infrastructure

All of these are valid QIS routing backends. The quadratic scaling property does not depend on which one you choose — it depends on the loop. N agents registering problem contexts + producing outcome packets + receiving relevant packets from similar agents = N(N-1)/2 synthesis opportunities regardless of what stores and retrieves the embeddings.
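The pair count is the same formula the stats property reports as synthesis_pairs. A short sketch shows how it grows next to log2(N). The function name is hypothetical, and the choice of N values is arbitrary.

```python
import math


def synthesis_pairs(n: int) -> int:
    """Number of unordered agent pairs, N(N-1)/2, as reported by router.stats."""
    return n * (n - 1) // 2


for n in (5, 100, 10_000):
    print(f"N={n:>6}  pairs={synthesis_pairs(n):>10}  log2(N)={math.log2(n):.1f}")
```

This counts every possible pair. In the router above, a packet only reaches agents that clear similarity_threshold, capped at fanout recipients per delivery, so the number of pairs that actually exchange packets in a given run can be smaller.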

What This Demonstrates

The QIS routing layer is transport-agnostic. This is not a marketing claim — it is a structural property of the architecture.

The outcome packet contains a problem description. The routing layer finds agents with similar problem contexts. The routing layer does not care how similarity is computed or where the embeddings are stored. ChromaDB, Qdrant, Weaviate, pgvector, FAISS on disk, an Excel spreadsheet with cosine similarity formulas — any mechanism that maps similar problem descriptions to similar addresses produces the same routing behavior.

Christopher Thomas Trevethan's discovery is that when you close the loop between pre-distilled outcome packets and semantic routing, intelligence scales quadratically while compute scales logarithmically. The routing mechanism is a plug-in. The 39 provisional patents cover the architecture — the complete loop — not any specific implementation of it.

DEV Community