DEV Community

TJ Sweet


~1ms hybrid graph + vector queries (network is now the bottleneck)

I finally have benchmark results worth sharing.


TL;DR

  • ~0.6ms p50 — vector search
  • ~1.6ms p50 — vector + 1-hop graph traversal
  • ~6k–15k req/s locally

When deployed remotely:

  • ~110ms p50, which closely matches the measured network latency

→ The database is fast enough that the network dominates total latency


What was tested

Two query types:

  1. Vector only (embedding similarity, top-k)
  2. Vector + one-hop graph traversal (expand into knowledge graph)

Each run:

  • 800 requests
  • noisy / real-ish text inputs
  • concurrent execution
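The harness above can be sketched in a few lines of Python. This is an illustrative sketch, not the actual benchmark code; `send_query` is a stand-in for whatever client issues the real request, and the percentile is nearest-rank:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    idx = max(0, min(len(ranked) - 1, round(pct / 100 * len(ranked)) - 1))
    return ranked[idx]

def run_benchmark(send_query, n_requests=800, concurrency=32):
    """Issue n_requests concurrently; report p50/p95 latency and throughput."""
    def timed(_):
        start = time.perf_counter()
        send_query()  # stand-in for the real client call
        return (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, range(n_requests)))
    wall = time.perf_counter() - wall_start
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "req_per_s": n_requests / wall,
    }
```

Swap `send_query` for a closure over either query shape to reproduce both rows below.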

Local (M3 Max, 64GB, native macOS installer)

Vector only

  • p50: ~0.58ms
  • p95: ~0.80ms
  • ~15.7k req/s

Vector + graph

  • p50: ~1.6ms
  • p95: ~2.3ms
  • ~6k req/s

Remote (GCP, 8 cores, 32GB RAM)

Client → server latency: ~110ms

Vector only

  • p50: ~110.7ms

Vector + graph

  • p50: ~112.9ms

The delta between local and remote ≈ network RTT.
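As a quick arithmetic check on that claim, subtracting the local compute p50 from the remote p50 recovers the measured RTT to within a couple of milliseconds:

```python
# Numbers from the runs above (ms)
local_p50 = {"vector": 0.58, "vector_graph": 1.6}
remote_p50 = {"vector": 110.7, "vector_graph": 112.9}
rtt = 110.0  # measured client -> server latency

for shape in local_p50:
    delta = remote_p50[shape] - local_p50[shape]
    print(f"{shape}: remote - local = {delta:.1f}ms vs RTT ~{rtt:.0f}ms")
```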


What’s interesting

  • Adding graph traversal costs ~1ms
  • Latency distribution is tight (low variance)
  • Hybrid queries behave almost like constant-time at small depth

Most systems treat this as:

vector DB + graph DB + glue code

This is:

one execution engine


How this compares (public numbers)

Vector DBs (Pinecone / Weaviate / Qdrant)

  • Typically 5–50ms p50 depending on index + scale
  • Often network + ANN dominates

Neo4j (graph + vector)

  • Graph queries: typically 10–100ms+ depending on traversal
  • Vector is a newer add-on layer

TigerGraph

  • Strong traversal performance (parallelized)
  • Still generally multi-ms to 10s of ms for real queries

Important caveats

  • These are single-node, in-memory-ish conditions
  • Dataset is not at billion-scale (yet)
  • Remote throughput is latency-bound, not compute-bound
  • Found a response consistency bug (fixed next)

What this suggests

If hybrid queries are:

  • ~1–2ms compute
  • +100ms network

Then optimizing the DB further doesn’t matter unless:

  • you colocate compute
  • or batch / pipeline queries
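Batching is the obvious lever here: the HTTP API in the examples below already accepts a list of statements, so packing n queries into one round trip seems plausible, and per-query latency then amortizes toward compute cost. A back-of-envelope model (assuming, worst case, the server executes batched queries sequentially):

```python
def effective_latency_ms(rtt_ms, compute_ms, batch_size):
    """Per-query latency when batch_size queries share one round trip.
    Assumes sequential server-side execution (pessimistic)."""
    return (rtt_ms + compute_ms * batch_size) / batch_size

# ~110ms RTT, ~1.6ms hybrid compute per query
for n in (1, 10, 100):
    print(f"batch={n}: {effective_latency_ms(110, 1.6, n):.1f}ms/query")
```

With a batch of 100, the model puts per-query latency back in the low single-digit milliseconds, i.e. close to the local compute cost.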

Takeaway

We’re hitting a point where:

hybrid retrieval is cheaper than the network it rides on


Looking for feedback on:

  • deeper traversal benchmarks (2–3 hops)
  • scaling behavior (dataset + concurrency)
  • fair comparisons vs existing systems
  • real-world workloads (RAG, entity resolution, etc.)

If this resonates (or sounds wrong), I’d love to hear why.

Addendum: test setup + external verification

For anyone who wants to reproduce or challenge these numbers: the benchmark used a single-node dataset with 67,280 nodes, 40,921 edges, and 67,298 embeddings indexed with HNSW (CPU-only). The workload was 800 requests per query type, noisy natural-language prompts, concurrent clients, and two query shapes: (1) vector top-k, (2) vector top-k + 1-hop graph expansion over returned entities. Local runs were on an M3 Max with the native macOS installer; remote runs were on GCP (8 vCPU, 32GB RAM).

The key observation is straightforward: local compute stayed in low-ms, while remote p50 tracked client↔server RTT (~110ms), so end-to-end latency was network-bound. If you run this yourself, please share p50/p95, dataset size, and hop depth so results are directly comparable.

  • Nodes: 67,280
  • Edges: 40,921
  • Embeddings: 67,298
  • Vector index: HNSW, CPU-only
  • Request count: 800 per query type
  • Query types: Vector top-k; Vector top-k + 1-hop traversal

Verification queries (same shape)

# Vector-only (same query shape as benchmark)
curl -s -u "$NORNIC_USERNAME:$NORNIC_PASSWORD" "$ENDPOINT" \
  -H "Content-Type: application/json" -H "Accept: application/json" \
  -d '{
    "statements":[
      {
        "statement":"CALL db.index.vector.queryNodes('\''idx_original_text'\'', $topK, $text) YIELD node, score RETURN node.originalText AS originalText, score ORDER BY score DESC LIMIT $topK",
        "parameters":{"text":"get it delivered","topK":5},
        "resultDataContents":["row"]
      }
    ]
  }'
# Vector + one-hop graph (same query shape as benchmark)
curl -s -u "$NORNIC_USERNAME:$NORNIC_PASSWORD" "$ENDPOINT" \
  -H "Content-Type: application/json" -H "Accept: application/json" \
  -d '{
    "statements":[
      {
        "statement":"CALL db.index.vector.queryNodes('\''idx_original_text'\'', $topK, $text) YIELD node, score MATCH (node:OriginalText)-[:TRANSLATES_TO]->(t:TranslatedText) WHERE t.language = $targetLang RETURN node.originalText AS originalText, score, t.language AS language, coalesce(t.auditedText, t.translatedText) AS translatedText ORDER BY score DESC, language LIMIT $topK",
        "parameters":{"text":"get it delivered","topK":5,"targetLang":"es"},
        "resultDataContents":["row"]
      }
    ]
  }'
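For anyone scripting the verification instead of using curl, the same vector-only request is straightforward with the Python standard library. The endpoint, credentials, and `idx_original_text` index name are the same assumptions as in the curl examples above:

```python
import base64
import json
import urllib.request

def vector_query_payload(text, top_k=5):
    """Build the same vector-only statement payload as the curl example."""
    statement = (
        "CALL db.index.vector.queryNodes('idx_original_text', $topK, $text) "
        "YIELD node, score "
        "RETURN node.originalText AS originalText, score "
        "ORDER BY score DESC LIMIT $topK"
    )
    return {"statements": [{
        "statement": statement,
        "parameters": {"text": text, "topK": top_k},
        "resultDataContents": ["row"],
    }]}

def post_query(endpoint, username, password, payload):
    """POST the payload with basic auth and return the parsed JSON response."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running server and the env vars from the curl examples):
# import os
# result = post_query(os.environ["ENDPOINT"],
#                     os.environ["NORNIC_USERNAME"],
#                     os.environ["NORNIC_PASSWORD"],
#                     vector_query_payload("get it delivered"))
```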

https://github.com/orneryd/NornicDB/releases/tag/v1.0.33
