I finally have benchmark results worth sharing.
TL;DR
- ~0.6ms p50 — vector search
- ~1.6ms p50 — vector + 1-hop graph traversal
- ~6k–15k req/s locally
When deployed remotely:
- ~110ms p50, which tracks the ~110ms network RTT almost exactly
→ The database is fast enough that the network dominates total latency
What was tested
Two query types:
- Vector only (embedding similarity, top-k)
- Vector + one-hop graph traversal (expand into knowledge graph)
Each run:
- 800 requests
- noisy / real-ish text inputs
- concurrent execution
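A minimal harness of this shape (a sketch, not the actual benchmark script, which isn't shown here) fires requests concurrently and reports nearest-rank p50/p95:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(query_fn, n_requests=800, concurrency=32):
    """Fire n_requests calls to query_fn concurrently; return (p50, p95) in ms."""
    def timed():
        t0 = time.perf_counter()
        query_fn()
        return (time.perf_counter() - t0) * 1000.0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed) for _ in range(n_requests)]
        latencies = sorted(f.result() for f in futures)
    # nearest-rank percentiles (good enough at n=800)
    p50 = latencies[len(latencies) // 2]
    p95 = latencies[int(len(latencies) * 0.95) - 1]
    return p50, p95

# stand-in workload; swap in an HTTP call to the real endpoint to reproduce
p50, p95 = run_benchmark(lambda: sum(range(1000)))
print(f"p50={p50:.3f}ms p95={p95:.3f}ms")
```

Throughput is then n_requests divided by wall-clock time for the run.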
Local (M3 Max, 64GB, native macOS installer)
Vector only
- p50: ~0.58ms
- p95: ~0.80ms
- ~15.7k req/s
Vector + graph
- p50: ~1.6ms
- p95: ~2.3ms
- ~6k req/s
Remote (GCP, 8 cores, 32GB RAM)
Client → server latency: ~110ms
Vector only
- p50: ~110.7ms
Vector + graph
- p50: ~112.9ms
The delta between local and remote ≈ network RTT.
What’s interesting
- Adding graph traversal costs ~1ms
- Latency distribution is tight (low variance)
- Hybrid queries behave almost like constant-time at small depth
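One way to read those numbers is as a rough per-hop cost model; only depths 0 and 1 were actually measured, so anything deeper is pure extrapolation:

```python
# Rough linear cost model fitted to the two measured p50 points.
# Depths >1 are extrapolation only (deeper benchmarks are still future work).
vector_ms = 0.58           # measured p50, vector only
one_hop_ms = 1.6           # measured p50, vector + 1 hop
per_hop_ms = one_hop_ms - vector_ms

def predicted_p50(hops):
    return vector_ms + hops * per_hop_ms

for hops in (0, 1, 2, 3):
    print(f"{hops} hops: ~{predicted_p50(hops):.2f}ms (predicted)")
```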
Most systems treat this as:
vector DB + graph DB + glue code
This is:
one execution engine
How this compares (public numbers)
Vector DBs (Pinecone / Weaviate / Qdrant)
- Typically 5–50ms p50 depending on index + scale
- Often network + ANN dominates
Neo4j (graph + vector)
- Graph queries: typically 10–100ms+ depending on traversal
- Vector is a newer add-on layer
TigerGraph
- Strong traversal performance (parallelized)
- Still generally multi-ms to 10s of ms for real queries
Important caveats
- These are single-node, in-memory-ish conditions
- Dataset is not at billion-scale (yet)
- Remote throughput is latency-bound, not compute-bound
- Found a response consistency bug (fixed next)
What this suggests
If hybrid queries are:
- ~1–2ms compute
- +100ms network
Then optimizing the DB further doesn’t matter unless:
- you colocate compute
- or batch / pipeline queries
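The arithmetic, using the post's own numbers (~1.6ms compute per hybrid query, ~110ms RTT), shows why batching dominates any further DB tuning:

```python
# Post's own numbers: ~1.6ms compute per hybrid query, ~110ms network RTT.
rtt_ms, compute_ms = 110.0, 1.6

def total_ms(n_queries, batch_size):
    """End-to-end time when batch_size queries share one round trip."""
    round_trips = -(-n_queries // batch_size)  # ceiling division
    return round_trips * rtt_ms + n_queries * compute_ms

print(round(total_ms(100, 1)))    # one query per round trip: ~11s
print(round(total_ms(100, 100)))  # all 100 batched into one round trip: ~270ms
```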
Takeaway
We’re hitting a point where:
hybrid retrieval is cheaper than the network it rides on
Looking for feedback on:
- deeper traversal benchmarks (2–3 hops)
- scaling behavior (dataset + concurrency)
- fair comparisons vs existing systems
- real-world workloads (RAG, entity resolution, etc.)
If this resonates (or sounds wrong), I’d love to hear why.
Addendum: test setup + external verification
For anyone who wants to reproduce or challenge these numbers: the benchmark used a single-node dataset with 67,280 nodes, 40,921 edges, and 67,298 embeddings indexed with HNSW (CPU-only). Workload was 800 requests per query type, noisy natural-language prompts, concurrent clients, and two query shapes: (1) vector top-k, (2) vector top-k + 1-hop graph expansion over returned entities. Local runs used an M3 Max with the native macOS installer; remote runs were on GCP (8 vCPU, 32GB RAM).
The key observation is straightforward: local compute stayed in low-ms, while remote p50 tracked client↔server RTT (~110ms), so end-to-end latency was network-bound. If you run this yourself, please share p50/p95, dataset size, and hop depth so results are directly comparable.
| Item | Value |
|---|---|
| Nodes | 67,280 |
| Edges | 40,921 |
| Embeddings | 67,298 |
| Vector index | HNSW, CPU-only |
| Request count | 800 per query type |
| Query types | Vector top-k; Vector top-k + 1-hop traversal |
Verification queries (same shape)
# Vector-only (same query shape as benchmark)
curl -s -u "$NORNIC_USERNAME:$NORNIC_PASSWORD" "$ENDPOINT" \
-H "Content-Type: application/json" -H "Accept: application/json" \
-d '{
"statements":[
{
"statement":"CALL db.index.vector.queryNodes('\''idx_original_text'\'', $topK, $text) YIELD node, score RETURN node.originalText AS originalText, score ORDER BY score DESC LIMIT $topK",
"parameters":{"text":"get it delivered","topK":5},
"resultDataContents":["row"]
}
]
}'
# Vector + one-hop graph (same query shape as benchmark)
curl -s -u "$NORNIC_USERNAME:$NORNIC_PASSWORD" "$ENDPOINT" \
-H "Content-Type: application/json" -H "Accept: application/json" \
-d '{
"statements":[
{
"statement":"CALL db.index.vector.queryNodes('\''idx_original_text'\'', $topK, $text) YIELD node, score MATCH (node:OriginalText)-[:TRANSLATES_TO]->(t:TranslatedText) WHERE t.language = $targetLang RETURN node.originalText AS originalText, score, t.language AS language, coalesce(t.auditedText, t.translatedText) AS translatedText ORDER BY score DESC, language LIMIT $topK",
"parameters":{"text":"get it delivered","topK":5,"targetLang":"es"},
"resultDataContents":["row"]
}
]
}'
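The same statement can also be posted from Python's stdlib, which makes it easier to script the 800-request runs. This is a sketch: `build_request` is a hypothetical helper, the env vars mirror the curl examples above, and the default endpoint is a placeholder.

```python
import base64
import json
import os
import urllib.request

def build_request(endpoint, username, password, statement, parameters):
    """Build the same authenticated POST the curl examples send."""
    body = json.dumps({"statements": [{
        "statement": statement,
        "parameters": parameters,
        "resultDataContents": ["row"],
    }]}).encode()
    req = urllib.request.Request(endpoint, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Accept", "application/json")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

req = build_request(
    os.environ.get("ENDPOINT", "http://example.invalid/query"),  # placeholder
    os.environ.get("NORNIC_USERNAME", "user"),
    os.environ.get("NORNIC_PASSWORD", "pass"),
    "CALL db.index.vector.queryNodes('idx_original_text', $topK, $text) "
    "YIELD node, score RETURN node.originalText AS originalText, score "
    "ORDER BY score DESC LIMIT $topK",
    {"text": "get it delivered", "topK": 5},
)
# urllib.request.urlopen(req)  # uncomment to actually send
```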