Part 16 of the Understanding QIS series
Every RAG pipeline in production right now is built on the same assumption: embed your documents, embed your query, find the nearest neighbors, retrieve and generate. It works. At small scale, it works quite well.
But in 2026, the teams hitting real scaling walls are discovering something uncomfortable: cosine similarity in high-dimensional embedding space degrades as N grows. Not gradually — structurally. And the standard fix, layering a knowledge graph on top of a vector store, introduces a different problem: the graph goes stale the moment you stop paying curators to maintain it.
QIS (Quadratic Intelligence Synthesis) takes a different path. The DHT routing layer in a QIS network is a knowledge graph — one that builds and updates itself from empirically confirmed outcomes rather than human-defined ontologies or static embedding geometry.
This article explains why that distinction matters architecturally, when it matters in practice, and how you can see the pattern in code.
## The Curse of Dimensionality Is Not a Myth
The phrase "curse of dimensionality" gets thrown around loosely. In the context of vector similarity search, the specific problem is this: as embedding dimensionality increases, the ratio of the maximum distance to the minimum distance between any two points in a random sample approaches 1. Everything starts to look equally far apart.
More concretely: in a 1,536-dimensional embedding space (OpenAI's text-embedding-3-small), with a corpus of 10 million documents, the top-k cosine similarity results for a given query are often separated by differences in the fourth or fifth decimal place. FAISS and Chroma are engineering marvels that find those neighbors efficiently — but they cannot fix the underlying geometry. You are making high-stakes retrieval decisions on vanishingly small discriminative signals.
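The concentration effect is easy to reproduce with random data. The sketch below (synthetic Gaussian points, not any particular embedding model) measures the relative contrast, the gap between the farthest and nearest point from a random query, as dimensionality grows:

```python
# Distance concentration: as dimensionality grows, the gap between the
# nearest and farthest neighbor of a random query shrinks toward zero.
import numpy as np

rng = np.random.default_rng(0)

def contrast(dim: int, n_points: int = 1000) -> float:
    """Relative contrast (d_max - d_min) / d_min for one random query."""
    points = rng.standard_normal((n_points, dim))
    query = rng.standard_normal(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())

for dim in (2, 16, 128, 1536):
    print(f"dim={dim:5d}  relative contrast={contrast(dim):.3f}")
```

As the dimension climbs toward typical embedding sizes, the contrast collapses, which is the geometric degradation described above.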
The standard workarounds each introduce their own costs:
- Dimensionality reduction (PCA, UMAP) trades retrieval precision for geometric stability. You lose the expressiveness of the original embedding.
- Re-ranking with a cross-encoder adds a second inference pass whose cost grows linearly with the number of candidates re-ranked; thorough coverage means a large candidate set and a correspondingly large inference bill.
- Hybrid search (BM25 + embeddings) improves recall but does not solve the fundamental representational problem.
- Knowledge graph + embeddings (Neo4j, LlamaIndex KG) adds relational structure but requires upfront ontology construction and ongoing maintenance to stay accurate.
None of these are wrong approaches. They are the right tools for what they are. The question is whether semantic routing can be solved at a lower level, before retrieval, by routing queries to the nodes most likely to produce confirmed-good outcomes rather than the nodes most geometrically similar in embedding space.
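Of the workarounds above, hybrid search is the simplest to see in code. This toy sketch (illustrative names and scores, not a specific library's API) min-max normalizes a lexical score set and a cosine score set, then fuses them with a weight:

```python
# Hybrid search sketched as simple score fusion. All names and scores are
# illustrative; real systems use BM25 and a vector index with more careful
# normalization.
import numpy as np

def fuse_scores(lexical: dict, semantic: dict, alpha: float = 0.5) -> dict:
    """Min-max normalize each score set, then blend: alpha * lexical + (1 - alpha) * semantic."""
    def norm(scores):
        vals = np.array(list(scores.values()), dtype=float)
        lo, hi = vals.min(), vals.max()
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        return {k: (v - lo) / span for k, v in scores.items()}
    lex, sem = norm(lexical), norm(semantic)
    docs = set(lex) | set(sem)
    return {d: alpha * lex.get(d, 0.0) + (1 - alpha) * sem.get(d, 0.0) for d in docs}

lexical = {"doc1": 12.3, "doc2": 4.1, "doc3": 9.8}     # BM25-style scores
semantic = {"doc1": 0.71, "doc2": 0.83, "doc3": 0.69}  # cosine similarities
fused = fuse_scores(lexical, semantic, alpha=0.5)
print(sorted(fused.items(), key=lambda x: x[1], reverse=True))
```

Fusion lifts documents that score well on either signal, which improves recall, but the semantic half still rides on the same embedding geometry.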
## How QIS DHT Routing Handles Semantic Queries Differently
In a QIS network, each node maintains an accuracy vector — a multi-dimensional performance profile across task domains, built from confirmed outcome feedback over time. When a task or query arrives at the network, it is routed not by embedding similarity but by accuracy-weighted DHT lookup: the routing layer finds the subset of nodes whose confirmed performance on this class of task is highest.
The critical word is confirmed. These are not self-reported capability claims. They are not inferred from model weights or training data provenance. They are empirically measured: a node handled tasks of type X and produced outcomes that downstream validators confirmed as correct, useful, or high quality. That signal accumulates in the DHT routing table and decays over time if the node's performance changes.
This is semantically richer than embedding lookup in two ways:
1. Domain performance is not the same as semantic similarity. Two documents can be very similar in embedding space (both discuss "transformer attention mechanisms") but route to very different nodes depending on whether the query requires theoretical explanation, implementation debugging, or performance optimization. The accuracy vector encodes this distinction. Cosine similarity does not.
2. The routing weights reflect current reality, not historical training. An embedding model trained in 2024 does not know about your production system's performance profile in 2026. A QIS DHT updated from live outcome feedback does.
## The Knowledge Graph That Writes Itself
Here is the architectural insight that connects QIS to knowledge graphs: the routing weights distributed across the DHT are a dynamic knowledge graph.
In a conventional knowledge graph (Neo4j, a LlamaIndex KG index, a Wikidata-style ontology), nodes represent entities, edges represent relationships, and weights or properties on those edges encode domain knowledge. A human — or a pipeline with human oversight — constructs and maintains that graph. It is accurate at the time of construction and drifts from reality thereafter at a rate proportional to how fast the underlying domain changes and how slowly the curation pipeline runs.
In a QIS network, the equivalent structure emerges from operation:
- Nodes are the intelligent agents in the DHT.
- Edges are the routing paths between nodes, weighted by accuracy vectors.
- Properties are the per-domain performance profiles, updated from every confirmed outcome packet.
- Ontology is implicit in the routing topology — if queries of type X consistently route through a particular cluster of nodes, that cluster has implicitly become the "knowledge subgraph" for domain X.
No curator defined this. No ontology engineer specified the edges. The graph constructed itself from accumulated evidence of what worked.
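That emergence can be made concrete in a few lines. Assuming a hypothetical log of (domain, node, confirmed quality) records, the domain subgraphs fall out of the history with no declared ontology:

```python
# Implicit ontology sketch: derive domain "knowledge subgraphs" from routing
# history alone. Record format and names are illustrative, not a QIS API.
from collections import defaultdict

# (domain, node, confirmed_quality) tuples accumulated from routed tasks
routing_history = [
    ("code_generation", "node_A", 0.92),
    ("code_generation", "node_C", 0.94),
    ("code_generation", "node_D", 0.51),
    ("literature_review", "node_B", 0.97),
    ("code_generation", "node_A", 0.89),
]

def implicit_subgraphs(history, threshold: float = 0.8):
    """Group nodes by the domains where their confirmed quality clears a threshold."""
    by_domain = defaultdict(set)
    for domain, node, quality in history:
        if quality >= threshold:
            by_domain[domain].add(node)
    return dict(by_domain)

print(implicit_subgraphs(routing_history))
```

No one declared that node_A and node_C form the code-generation cluster; the grouping is read off the accumulated evidence.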
The additional structural property that makes this semantically richer than flat retrieval: N(N-1)/2 synthesis paths.
In a network of N nodes, the number of possible pairwise synthesis combinations is N(N-1)/2. A flat vector retrieval finds the k nearest neighbors and retrieves them independently. QIS routing can activate synthesis across any subset of relevant nodes — combining their outputs into a response that reflects multiple confirmed-accurate perspectives rather than a single best-match retrieval. At N=100 nodes, that is 4,950 potential synthesis paths. At N=1,000, it is 499,500. Flat retrieval does not have this property. A static knowledge graph has it in principle but requires explicit edge construction to enable it in practice.
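The counts quoted above are simple combinatorics, and worth verifying once against explicit pair enumeration:

```python
# N(N-1)/2 pairwise synthesis paths, checked against explicit enumeration.
from itertools import combinations

def synthesis_paths(n: int) -> int:
    """Number of unordered node pairs in a network of n nodes."""
    return n * (n - 1) // 2

for n in (100, 1000):
    assert synthesis_paths(n) == len(list(combinations(range(n), 2)))
    print(f"N={n}: {synthesis_paths(n):,} pairwise synthesis paths")
```

Each new node adds N-1 new pairwise paths, which is why the count grows quadratically.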
## Comparison: Three Approaches at Scale
| Dimension | Vector Similarity Search | Knowledge Graph + Embeddings | QIS Outcome-Weighted Routing |
|---|---|---|---|
| Routing signal | Geometric proximity in embedding space | Ontology edges + embedding similarity | Empirically confirmed domain performance |
| Staleness | Stale when corpus changes | Stale when ontology drifts from reality | Updates from every confirmed outcome |
| Discriminative power at scale | Degrades as N grows (curse of dimensionality) | Depends on graph density; brittle at edges | Scales with outcome volume — more data = better routing |
| Synthesis capability | Retrieve-then-aggregate (post-hoc) | Traverse graph + retrieve | N(N-1)/2 paths, activated per-query |
| Maintenance overhead | Low (reindex on corpus change) | High (ontology curation required) | Self-maintaining via feedback loop |
| Cold start behavior | Works immediately with any embedding model | Requires ontology construction before useful | Requires minimum node count and outcome history |
| Explainability | Cosine score (geometric, not semantic) | Edge traversal path (explicit but may be stale) | Routing trace + outcome history per node |
The cold start row is the honest tradeoff: QIS routing requires accumulated outcome history to outperform embedding lookup. Below a threshold of nodes and confirmed outcomes, a well-tuned FAISS index with a cross-encoder re-ranker will outperform QIS routing on retrieval tasks. The advantage compounds as the outcome history grows.
## Code Example: Dynamic Semantic Routing vs Static Embedding Lookup
The following example shows the structural difference between building a retrieval layer on static embeddings (FAISS style) and building one on QIS-style outcome-weighted routing. This is a simplified simulation — production QIS routing operates at the DHT layer — but the pattern is accurate.
```python
import time
import numpy as np
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
from collections import defaultdict


# ── Static embedding lookup (FAISS-style simplified) ──────────────────────────

class StaticEmbeddingIndex:
    """
    Represents a conventional vector similarity retrieval layer.
    Embeddings are fixed at index time. No feedback loop.
    """

    def __init__(self, embedding_dim: int = 1536):
        self.embedding_dim = embedding_dim
        self.nodes: Dict[str, np.ndarray] = {}

    def index_node(self, node_id: str, embedding: np.ndarray):
        self.nodes[node_id] = embedding / np.linalg.norm(embedding)

    def query(self, query_embedding: np.ndarray, top_k: int = 3) -> List[Tuple[str, float]]:
        q = query_embedding / np.linalg.norm(query_embedding)
        scores = {
            node_id: float(np.dot(q, emb))
            for node_id, emb in self.nodes.items()
        }
        return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_k]


# ── QIS-style outcome-weighted routing ────────────────────────────────────────

@dataclass
class OutcomePacket:
    """
    Confirmed outcome from a completed task.
    This is the feedback signal that updates routing weights.
    """
    node_id: str
    task_domain: str
    quality_score: float  # 0.0–1.0, validated by downstream consumers
    latency_ms: float
    timestamp: float


@dataclass
class NodeAccuracyVector:
    """
    Per-node accuracy profile across task domains.
    Built from accumulated confirmed outcomes — never manually defined.
    """
    node_id: str
    domain_scores: Dict[str, List[float]] = field(default_factory=lambda: defaultdict(list))
    decay_factor: float = 0.95  # older outcomes weighted less

    def ingest_outcome(self, packet: OutcomePacket):
        """Update domain performance from a confirmed outcome packet."""
        self.domain_scores[packet.task_domain].append(packet.quality_score)
        # Keep a rolling window — in production this lives in the DHT
        if len(self.domain_scores[packet.task_domain]) > 100:
            self.domain_scores[packet.task_domain].pop(0)

    def get_domain_score(self, domain: str) -> float:
        """
        Return decay-weighted average for this domain.
        Nodes with no confirmed outcomes score 0.0 — honest cold start behavior.
        """
        scores = self.domain_scores.get(domain, [])
        if not scores:
            return 0.0
        weights = np.array([self.decay_factor ** i for i in range(len(scores) - 1, -1, -1)])
        return float(np.average(scores, weights=weights))


class QISSemanticRouter:
    """
    Simplified QIS-style routing layer.
    Routes queries by empirically confirmed domain performance,
    not geometric proximity in embedding space.
    """

    def __init__(self):
        self.nodes: Dict[str, NodeAccuracyVector] = {}

    def register_node(self, node_id: str):
        self.nodes[node_id] = NodeAccuracyVector(node_id=node_id)

    def ingest_outcome(self, packet: OutcomePacket):
        """Feed confirmed outcomes back into the routing layer."""
        if packet.node_id in self.nodes:
            self.nodes[packet.node_id].ingest_outcome(packet)

    def route_query(self, task_domain: str, top_k: int = 3) -> List[Tuple[str, float]]:
        """
        Route a query to the top-k nodes by confirmed domain performance.
        This is the structural difference from cosine similarity retrieval.
        """
        scores = {
            node_id: nav.get_domain_score(task_domain)
            for node_id, nav in self.nodes.items()
        }
        ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_k]
        return ranked

    def synthesis_path_count(self) -> int:
        """N(N-1)/2 potential synthesis paths across all nodes."""
        n = len(self.nodes)
        return n * (n - 1) // 2

    def activate_synthesis(self, task_domain: str, min_score: float = 0.6) -> List[str]:
        """
        Identify all nodes above threshold for multi-node synthesis.
        In production, these nodes collaborate via DHT-coordinated synthesis.
        """
        return [
            node_id for node_id, nav in self.nodes.items()
            if nav.get_domain_score(task_domain) >= min_score
        ]


# ── Demonstration ─────────────────────────────────────────────────────────────

if __name__ == "__main__":
    print("=== Static Embedding Index (FAISS-style) ===")
    static_index = StaticEmbeddingIndex(embedding_dim=64)
    # Nodes have fixed embeddings — no feedback loop
    rng = np.random.default_rng(42)
    for node_id in ["node_A", "node_B", "node_C", "node_D"]:
        static_index.index_node(node_id, rng.standard_normal(64))
    query_emb = rng.standard_normal(64)
    results = static_index.query(query_emb, top_k=3)
    print("Top-3 by cosine similarity:", results)
    print("Note: scores are geometric, not semantic. No outcome data used.\n")

    print("=== QIS Outcome-Weighted Router ===")
    router = QISSemanticRouter()
    for node_id in ["node_A", "node_B", "node_C", "node_D"]:
        router.register_node(node_id)

    # Simulate confirmed outcome packets arriving over time:
    # node_A and node_C have strong confirmed performance on "code_generation",
    # node_B has strong confirmed performance on "literature_review"
    outcomes = [
        OutcomePacket("node_A", "code_generation", 0.92, 180, time.time()),
        OutcomePacket("node_A", "code_generation", 0.89, 195, time.time()),
        OutcomePacket("node_C", "code_generation", 0.94, 160, time.time()),
        OutcomePacket("node_B", "literature_review", 0.97, 220, time.time()),
        OutcomePacket("node_D", "code_generation", 0.51, 400, time.time()),
        OutcomePacket("node_A", "code_generation", 0.91, 175, time.time()),
    ]
    for packet in outcomes:
        router.ingest_outcome(packet)

    print("Routing 'code_generation' query:")
    routed = router.route_query("code_generation", top_k=3)
    for node_id, score in routed:
        print(f"  {node_id}: confirmed accuracy score = {score:.4f}")

    synthesis_nodes = router.activate_synthesis("code_generation", min_score=0.85)
    print(f"\nNodes activated for multi-node synthesis (score >= 0.85): {synthesis_nodes}")
    print(f"Total synthesis paths available in network: {router.synthesis_path_count()}")

    print("\nKey difference: node_D ranks last despite potentially high embedding similarity.")
    print("Its confirmed outcome quality is low — the feedback loop surfaces this.")
```
Running this will show node_D ranked last for code_generation despite having been registered alongside the others. A cosine similarity lookup against a random embedding has no mechanism to capture this — it has no outcome history to draw from.
## Why the Feedback Loop Is the Architecture
The standard framing of this comparison focuses on components: DHT vs vector store, accuracy vectors vs embeddings, synthesis paths vs top-k retrieval. But the component comparison misses the point.
The architectural breakthrough in QIS is the complete feedback loop: tasks route to nodes, nodes produce outputs, outputs are validated, confirmed outcomes update the routing weights, which changes how future tasks route. This loop runs continuously, without human intervention, and means the routing layer reflects the current performance reality of the network rather than a historical snapshot.
In knowledge graph terms: a conventional KG is a snapshot. A QIS routing layer is a live system. The difference is not about which technology is more sophisticated — it is about whether the semantic structure of the system can update itself from evidence.
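The loop itself fits in a few lines. This sketch (illustrative names and a plain exponential-moving-average update rule, not the QIS DHT protocol) shows routing preference flipping as confirmed outcomes change, with no manual re-curation:

```python
# Closed-loop sketch: routing preference flips when confirmed outcomes change.
# Names and the update rule are illustrative only.
from collections import defaultdict

weights = defaultdict(lambda: 0.5)  # prior routing weight per node
LEARNING_RATE = 0.3

def update(node: str, confirmed_quality: float):
    """Move the node's routing weight toward its latest confirmed outcome."""
    weights[node] += LEARNING_RATE * (confirmed_quality - weights[node])

def route() -> str:
    """Route the next task to the currently best-performing node."""
    return max(weights, key=weights.get)

# Phase 1: node_A performs well, node_B poorly
for _ in range(5):
    update("node_A", 0.95)
    update("node_B", 0.40)
best_before = route()

# Phase 2: node_A degrades; the loop re-routes on evidence alone
for _ in range(5):
    update("node_A", 0.30)
    update("node_B", 0.90)
best_after = route()

print(best_before, "->", best_after)  # prints: node_A -> node_B
```

A static KG snapshot would keep routing to node_A until someone edited an edge; here the re-route happens as a side effect of normal operation.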
LangChain and LlamaIndex both offer knowledge graph integrations (LlamaIndex's KnowledgeGraphIndex, LangChain's Neo4j vector store hybrid). These are valuable tools for applications where the ontology is stable and curation is feasible. QIS-style outcome-weighted routing is the appropriate architecture when the domain is dynamic, the scale makes manual curation impractical, or the cost of retrieval errors is high enough to justify building on confirmed performance rather than geometric inference.
Neither approach is universally correct. Both can coexist — a LlamaIndex KG index for structured document retrieval, QIS routing for agent-to-agent task delegation — in a system that uses each where it fits.
## What This Means for Engineers Building RAG and Agent Systems in 2026
If you are hitting retrieval quality ceilings with FAISS or Chroma at large corpus sizes, the first question to ask is whether your problem is a retrieval problem or a routing problem. If you need to find the most semantically similar document, vector search is the right tool. If you need to route a task to the agent or subsystem most likely to handle it correctly based on past performance, you need a routing layer that learns from outcomes — and that is what QIS provides.
The knowledge graph framing is useful here: the graph you need already exists in your system's outcome history. QIS makes it explicit and queryable.
QIS was discovered by Christopher Thomas Trevethan on June 16, 2025. 39 provisional patents filed.
## Understanding QIS Series
- Part 1 — Introduction to QIS
- Part 3 — QIS Seven-Layer Architecture: A Technical Deep Dive
- Part 5 — Why Federated Learning Has a Ceiling and What QIS Does Instead
- Part 9 — QIS Cold Start: How Many Nodes Does It Take to Matter?
- Part 13 — QIS for LLM Orchestration: Replacing the Central Router
- Part 15 — QIS for Multi-Agent Coordination: Autonomous Swarms Without a Central Orchestrator
- Part 16 — QIS and Knowledge Graphs: Why Semantic Routing Beats Vector Similarity Search at Scale (this article)