Sitanshu Kumar

SynaptoRoute v0.3.0: Matching Semantic Router While Scaling to 50,000 Routes

This is a follow-up to SynaptoRoute: A Study in Local Semantic Routing. If you haven't read it, the short version is: SynaptoRoute is a zero-token semantic routing engine that classifies user queries into intents using local embeddings instead of LLM API calls.


What Changed Since v0.2.0

When I published the first post, SynaptoRoute had just shipped dynamic batching and O(1) hot-reload. The throughput numbers were promising, but the accuracy story was incomplete. I had internal benchmarks but no comparison against a widely adopted baseline under identical, reproducible conditions.

That gap is now closed.

v0.3.0 is live on PyPI:

pip install synaptoroute==0.3.0

The Benchmarking Journey

Getting to these numbers took multiple benchmark revisions.

Early synthetic datasets produced catastrophic accuracy collapse and initially suggested that both SynaptoRoute and Semantic Router were performing poorly. After deeper investigation, the root cause turned out to be flaws in the dataset generation pipeline rather than limitations of the routing engines themselves.

Several rounds of validation, failure analysis, threshold tuning, adversarial testing, and external benchmarking followed. All final results presented in this article come from independent public datasets with strict train/test separation, eliminating dataset leakage and benchmark inflation.

That process was valuable because it forced the project to validate assumptions against real-world data instead of relying on synthetic benchmarks.


The Benchmark That Actually Matters

I evaluated SynaptoRoute against Semantic Router on two standard NLU datasets. Same embedding model (BAAI/bge-small-en-v1.5). Same hardware. Same evaluation script. Same train/test splits loaded from HuggingFace.
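
For readers who want to sanity-check the tables below, here is a minimal sketch of how these four metrics can be computed from gold and predicted intent labels. It assumes macro averaging across intents, which may not match the averaging used in the repository's evaluation script, and the function name `classification_report` is purely illustrative.

```python
from collections import Counter


def classification_report(gold, pred):
    """Top-1 accuracy plus macro-averaged precision, recall, and F1.

    Assumption: macro averaging (every intent weighs the same).
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative

    def safe_div(a, b):
        return a / b if b else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = safe_div(tp[label], tp[label] + fp[label])
        rec = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(gold)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


report = classification_report(
    gold=["billing", "billing", "support", "support"],
    pred=["billing", "support", "support", "support"],
)
```

Running the same function over both routers' predictions on the same test split is what makes the comparison like-for-like.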

CLINC150

150 intents spanning 10 domains, plus an out-of-scope (OOS) class. This is the standard stress test for intent routers.

Metric         | SynaptoRoute | Semantic Router
Top-1 Accuracy | 74.20%       | 73.35%
Precision      | 78.53%       | 74.68%
Recall         | 86.91%       | 88.46%
F1             | 81.34%       | 80.45%

Banking77

77 highly overlapping intents in a single domain. This dataset punishes routers that cannot distinguish between semantically adjacent queries like "card not working" and "card payment declined."

Metric         | SynaptoRoute | Semantic Router
Top-1 Accuracy | 91.81%       | 91.29%
Precision      | 91.29%       | 91.41%
Recall         | 91.80%       | 91.28%
F1             | 91.40%       | 91.28%

I want to be explicit about what this does and does not prove.

It proves that SynaptoRoute's architecture (Faiss-backed index, SQLite persistence, adaptive threshold fitting) produces classification accuracy that is competitive with the most widely adopted open-source semantic router.

It does not prove that one system is categorically better than the other. The Top-1 gaps (0.85 points on CLINC150, 0.52 points on Banking77) come from a single run and are within normal benchmark variance. What it does establish is benchmark parity.

In other words, the current results show no evidence that SynaptoRoute's architectural advantages come at a meaningful cost in accuracy.


Scale Numbers

These are not accuracy benchmarks. These are infrastructure stress tests.

Metric                          | Result
Max Routes Tested               | 50,000
P99 Latency at 50k Routes       | <50ms
Index Backend                   | Faiss FlatIP (L2-normalized)
Cold Boot (Prebuilt Index Load) | 0.45s
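
"FlatIP (L2-normalized)" means an exhaustive inner-product search over unit-length vectors, which ranks results exactly as cosine similarity would. The pure-Python sketch below illustrates the idea only; it is not Faiss and not SynaptoRoute's code, and the function names and toy 2-D vectors are made up for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def flat_ip_search(index, query, k=1):
    """Exhaustive ("flat") search: score every stored vector, return top-k.

    Because both sides are unit length, the inner product equals the
    cosine similarity between the query and each stored vector.
    """
    q = l2_normalize(query)
    scored = [(inner_product(q, vec), route_id) for route_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings", normalized once at insert time.
index = {
    "billing": l2_normalize([3.0, 1.0]),
    "support": l2_normalize([0.5, 4.0]),
}
top = flat_ip_search(index, [6.0, 2.5], k=1)
best_score, best_route = top[0]
```

A flat index scores every stored vector on each query, so search cost grows linearly with the number of routes. That is why the 50,000-route latency figure is worth reporting.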

At 50,000 routes, the system sustains approximately 302 queries per second on consumer hardware (Ryzen 7, 16GB RAM, no GPU).
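
To reproduce this kind of measurement on your own hardware, a harness along the following lines is a reasonable starting point. It is a sketch, not the project's benchmark script: `measure` and `percentile` are illustrative names, the percentile uses the nearest-rank method, and queries are issued sequentially, so it will not exercise dynamic batching the way a concurrent load test would.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure(route_fn, queries):
    """Call route_fn on each query; report p99 latency (ms) and queries/second."""
    latencies_ms = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        route_fn(query)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    return {
        "p99_ms": percentile(latencies_ms, 99),
        "qps": len(queries) / elapsed if elapsed > 0 else float("inf"),
    }


# Any callable works; a trivial stand-in is used here instead of a real router.
stats = measure(lambda q: q.lower(), ["What's my balance?"] * 200)
```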

The significance of these numbers is not raw accuracy. They demonstrate that routing quality can remain competitive while scaling to route counts that are rarely evaluated in semantic routing systems.


What's Architecturally New

Pluggable Encoders

v0.2.0 was hardcoded to FastEmbed. v0.3.0 introduces a BaseEncoder interface. You can now route through remote embedding endpoints without modifying the core:

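from synaptoroute import AdaptiveRouter
from synaptoroute.encoder import OpenAIEncoder

# `storage` is a storage backend, e.g. the SQLiteStorage shown under "Try It"
encoder = OpenAIEncoder(model="text-embedding-3-small", dim=1536)
router = AdaptiveRouter(encoder, storage)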

The OpenAIEncoder wraps the synchronous OpenAI client in asyncio.to_thread internally, so it does not block the batch worker's event loop.
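
The pattern itself is easy to see in isolation. The sketch below uses a sleeping function as a stand-in for a blocking HTTP client; `blocking_embed` and `embed_async` are hypothetical names, not part of SynaptoRoute, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def blocking_embed(text):
    """Stand-in for a synchronous call to a remote embedding endpoint."""
    time.sleep(0.05)  # simulated network round trip
    return [float(len(text)), 1.0]


async def embed_async(text):
    # The blocking call runs in a worker thread, so the event loop
    # stays free to service other coroutines in the meantime.
    return await asyncio.to_thread(blocking_embed, text)


async def main():
    texts = ["refund", "balance", "login"]
    # The three calls overlap instead of running back to back.
    return await asyncio.gather(*(embed_async(t) for t in texts))


vectors = asyncio.run(main())
```

Calling `blocking_embed` directly inside a coroutine would stall every other task on the loop for the duration of each request.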

Distributed State Sync

The biggest limitation I called out in the first post was this:

"The router is intentionally stateful. Different pods may have different local routing matrices."

That's no longer true. v0.3.0 ships a RedisSyncManager that broadcasts route mutations over Redis pub/sub. When one replica adds, updates, or deletes a route, all peers invalidate their local cache and rebuild.

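from synaptoroute import AdaptiveRouter
from synaptoroute.sync import RedisSyncManager

# `encoder` and `storage` are constructed as in the other examples
sync = RedisSyncManager(redis_url="redis://localhost:6379")
router = AdaptiveRouter(encoder, storage, sync_manager=sync)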

This is not a distributed consensus protocol. It is cache invalidation. The source of truth remains SQLite on each node. Redis is the notification bus.
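
The notify-then-rebuild pattern can be sketched without Redis at all. In the example below, `InMemoryBus` is a hypothetical stand-in for a pub/sub channel and `Replica` for a router process with its own local store. The message format, and the assumption that each message carries the mutation itself, are mine and not taken from RedisSyncManager.

```python
class InMemoryBus:
    """Hypothetical stand-in for a Redis pub/sub channel."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in list(self._subscribers):
            callback(message)


class Replica:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.store = {}   # stands in for this node's local SQLite file
        self.index = {}   # in-memory routing cache built from the store
        self.stale = False
        self.rebuilds = 0
        bus.subscribe(self._on_message)

    def add_route(self, route, utterances):
        self._mutate({"op": "upsert", "route": route, "utterances": utterances})

    def delete_route(self, route):
        self._mutate({"op": "delete", "route": route})

    def _mutate(self, change):
        self._apply(change)                                   # write locally first
        self.bus.publish({**change, "origin": self.name})     # then notify peers

    def _on_message(self, message):
        if message["origin"] != self.name:  # ignore our own broadcasts
            self._apply(message)

    def _apply(self, change):
        if change["op"] == "upsert":
            self.store[change["route"]] = list(change["utterances"])
        elif change["op"] == "delete":
            self.store.pop(change["route"], None)
        self.stale = True  # invalidate the cache; rebuild lazily

    def routes(self):
        if self.stale:
            self.index = dict(self.store)
            self.rebuilds += 1
            self.stale = False
        return sorted(self.index)


bus = InMemoryBus()
pod_a, pod_b = Replica("pod-a", bus), Replica("pod-b", bus)
pod_a.add_route("billing", ["check my balance"])
routes_seen_by_b = pod_b.routes()
```

Redis pub/sub is fire-and-forget, so a replica that is disconnected when a message is published never sees it. That is the practical difference between cache invalidation and consensus.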

Optimization Profiles

Rather than exposing raw batch sizes and timeout parameters, v0.3.0 introduces named profiles:

from synaptoroute.router import AdaptiveRouter, OptimizationProfile

router = AdaptiveRouter(
    encoder,
    storage,
    profile=OptimizationProfile.THROUGHPUT
)

THROUGHPUT configures larger batch sizes and longer queue drain intervals.

router = AdaptiveRouter(
    encoder,
    storage,
    profile=OptimizationProfile.LATENCY
)

LATENCY bypasses the queue entirely and encodes synchronously for single-query workloads.
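
Conceptually, a named profile is just a lookup from a name to a bundle of batching parameters. The sketch below shows that shape with hypothetical names (`Profile`, `BatchSettings`) and made-up numbers; the values SynaptoRoute actually uses for each profile are not stated in this post.

```python
from dataclasses import dataclass
from enum import Enum


class Profile(Enum):
    THROUGHPUT = "throughput"
    LATENCY = "latency"


@dataclass(frozen=True)
class BatchSettings:
    use_queue: bool           # False means encode inline, no batching queue
    max_batch_size: int
    drain_interval_ms: float  # how long the worker waits to fill a batch


# Illustrative numbers only, chosen to show the direction of each trade-off.
_SETTINGS = {
    Profile.THROUGHPUT: BatchSettings(use_queue=True, max_batch_size=64, drain_interval_ms=10.0),
    Profile.LATENCY: BatchSettings(use_queue=False, max_batch_size=1, drain_interval_ms=0.0),
}


def resolve(profile):
    """Translate a named profile into concrete batching parameters."""
    return _SETTINGS[profile]


throughput = resolve(Profile.THROUGHPUT)
latency = resolve(Profile.LATENCY)
```

Hiding the raw knobs behind two names keeps users from combining settings that work against each other, such as a large batch size with a near-zero drain interval.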

Framework Integrations

SynaptoRoute can now be injected directly into LangChain and LlamaIndex pipelines:

from synaptoroute.integrations.langchain import SynaptoRouteTool

tool = SynaptoRouteTool(router=router)

What's Still Missing

I committed in the first post to being direct about limitations. That hasn't changed.

  • Cross-Encoder Reranking: Experimental prototypes have been evaluated and benchmarked but are not yet included in the production package. The current release continues to use a single-pass cosine similarity architecture. Production-grade reranking remains a v0.4.0 objective.
  • GPU Acceleration: The ONNX runtime falls back to CPU on all tested configurations. FastEmbed's CUDA provider requires specific cuDNN versions that are not trivially installable.
  • Multilingual Routing: Not validated. The benchmark model (bge-small-en-v1.5) is English-only. Multilingual routing requires a different embedding model and a separate evaluation.

What We Learned

A few conclusions became clear during benchmarking:

  • Semantic routing remains highly effective on real-world intent classification datasets.
  • Larger embedding models do not automatically produce better routing accuracy.
  • Both SynaptoRoute and Semantic Router struggle with logical reasoning tasks such as negation, double negation, and mixed-intent queries.
  • Most routing failures occur at semantic boundaries where multiple routes are genuinely plausible.
  • Architectural improvements such as batching, indexing, persistence, and state synchronization can significantly improve scalability without sacrificing benchmark accuracy.

The most important takeaway is that scaling semantic routing is primarily an infrastructure problem rather than an LLM problem.


What's Next

The next milestone is independent reproducibility.

The benchmarking work completed for v0.3.0 was performed on local hardware using publicly available datasets and documented evaluation scripts. The next release cycle will focus on building a dedicated benchmarking package that allows anyone to install SynaptoRoute, execute the same evaluations, and generate reproducible benchmark manifests containing:

  • Accuracy metrics
  • Latency metrics
  • Throughput metrics
  • Resource utilization
  • Hardware specifications
  • Software versions
  • Dataset metadata

The goal is simple: make every published benchmark independently verifiable.
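
As a rough idea of what such a manifest could look like, here is a sketch built only on the Python standard library. The field names and the `build_manifest` helper are placeholders of mine, not the format the benchmarking package will ship.

```python
import json
import os
import platform
from datetime import datetime, timezone


def build_manifest(dataset, metrics, package_versions):
    """Bundle results with enough environment detail to compare runs."""
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "dataset": dataset,
        "metrics": metrics,
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
            "cpu_count": os.cpu_count(),
        },
        "software": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "packages": package_versions,
        },
    }


manifest = build_manifest(
    dataset={"name": "banking77", "split": "test"},
    metrics={"top1_accuracy": 0.9181},  # Banking77 figure reported above
    package_versions={"synaptoroute": "0.3.0"},
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
```

Serializing with sorted keys keeps manifests diff-friendly, which makes it easier to compare two runs line by line.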


Try It

pip install synaptoroute==0.3.0

A minimal end-to-end example:

from synaptoroute import AdaptiveRouter, Route
from synaptoroute.encoder import FastEmbedEncoder
from synaptoroute.storage import SQLiteStorage

encoder = FastEmbedEncoder()
storage = SQLiteStorage("routes.db")
router = AdaptiveRouter(encoder, storage)

router.add_route(Route(
    name="billing",
    utterances=[
        "check my balance",
        "payment history",
        "how much do I owe"
    ],
    threshold=0.5
))

result = router("what's my current balance?")
print(result.name)  # billing

The full benchmark methodology, raw numbers, and reproducibility instructions are documented in docs/BENCHMARKS.md and docs/COMPARISON.md in the repository.

Repository: https://github.com/sitanshukr08/SynaptoRoute

PyPI: https://pypi.org/project/synaptoroute/


If you run the benchmarks on your own hardware, I'd genuinely like to see the results. Open an issue, submit a benchmark manifest, or leave a comment.