I finally have benchmark results worth sharing.
TL;DR
- ~0.6ms p50 — vector search
- ~1.6ms p50 — vector + 1-hop graph traversal
- ~6k–15k req/s locally
When deployed remotely:
- ~110ms p50, which tracks the ~110ms network RTT almost exactly
→ The database is fast enough that the network dominates total latency
What was tested
Two query types:
- Vector only (embedding similarity, top-k)
- Vector + one-hop graph traversal (expand into knowledge graph)
Each run:
- 800 requests
- noisy / real-ish text inputs
- concurrent execution
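A minimal harness of this shape (a sketch, not the actual benchmark script, which isn't shown here) fires requests concurrently and reports nearest-rank p50/p95:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(query_fn, n_requests=800, concurrency=32):
    """Fire n_requests calls to query_fn concurrently; return (p50, p95) in ms."""
    def timed():
        t0 = time.perf_counter()
        query_fn()
        return (time.perf_counter() - t0) * 1000.0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed) for _ in range(n_requests)]
        latencies = sorted(f.result() for f in futures)
    # nearest-rank percentiles (good enough at n=800)
    p50 = latencies[len(latencies) // 2]
    p95 = latencies[int(len(latencies) * 0.95) - 1]
    return p50, p95

# stand-in workload; swap in an HTTP call to the real endpoint to reproduce
p50, p95 = run_benchmark(lambda: sum(range(1000)))
print(f"p50={p50:.3f}ms p95={p95:.3f}ms")
```

Throughput is then n_requests divided by wall-clock time for the run.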
Local (M3 Max, 64GB, native macOS installer)
Vector only
- p50: ~0.58ms
- p95: ~0.80ms
- ~15.7k req/s
Vector + graph
- p50: ~1.6ms
- p95: ~2.3ms
- ~6k req/s
Remote (GCP, 8 cores, 32GB RAM)
Client → server latency: ~110ms
Vector only
- p50: ~110.7ms
Vector + graph
- p50: ~112.9ms
The delta between local and remote ≈ network RTT.
What’s interesting
- Adding graph traversal costs ~1ms
- Latency distribution is tight (low variance)
- Hybrid queries behave almost like constant-time at small depth
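One way to read those numbers is as a rough per-hop cost model; only depths 0 and 1 were actually measured, so anything deeper is pure extrapolation:

```python
# Rough linear cost model fitted to the two measured p50 points.
# Depths >1 are extrapolation only (deeper benchmarks are still future work).
vector_ms = 0.58           # measured p50, vector only
one_hop_ms = 1.6           # measured p50, vector + 1 hop
per_hop_ms = one_hop_ms - vector_ms

def predicted_p50(hops):
    return vector_ms + hops * per_hop_ms

for hops in (0, 1, 2, 3):
    print(f"{hops} hops: ~{predicted_p50(hops):.2f}ms (predicted)")
```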
Most systems treat this as:
vector DB + graph DB + glue code
This is:
one execution engine
How this compares (public numbers)
Vector DBs (Pinecone / Weaviate / Qdrant)
- Typically 5–50ms p50 depending on index + scale
- Often network + ANN dominates
Neo4j (graph + vector)
- Graph queries: typically 10–100ms+ depending on traversal
- Vector is a newer add-on layer
TigerGraph
- Strong traversal performance (parallelized)
- Still generally multi-ms to 10s of ms for real queries
Important caveats
- These are single-node, in-memory-ish conditions
- Dataset is not at billion-scale (yet)
- Remote throughput is latency-bound, not compute-bound
- Found a response consistency bug (fixed next)
What this suggests
If hybrid queries are:
- ~1–2ms compute
- +100ms network
Then optimizing the DB further doesn’t matter unless:
- you colocate compute
- or batch / pipeline queries
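The arithmetic, using the post's own numbers (~1.6ms compute per hybrid query, ~110ms RTT), shows why batching dominates any further DB tuning:

```python
# Post's own numbers: ~1.6ms compute per hybrid query, ~110ms network RTT.
rtt_ms, compute_ms = 110.0, 1.6

def total_ms(n_queries, batch_size):
    """End-to-end time when batch_size queries share one round trip."""
    round_trips = -(-n_queries // batch_size)  # ceiling division
    return round_trips * rtt_ms + n_queries * compute_ms

print(round(total_ms(100, 1)))    # one query per round trip: ~11s
print(round(total_ms(100, 100)))  # all 100 batched into one round trip: ~270ms
```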
Takeaway
We’re hitting a point where:
hybrid retrieval is cheaper than the network it rides on
Looking for feedback on:
- deeper traversal benchmarks (2–3 hops)
- scaling behavior (dataset + concurrency)
- fair comparisons vs existing systems
- real-world workloads (RAG, entity resolution, etc.)
If this resonates (or sounds wrong), I’d love to hear why.
Addendum: test setup + external verification
For anyone who wants to reproduce or challenge these numbers: the benchmark used a single-node dataset with 67,280 nodes, 40,921 edges, and 67,298 embeddings indexed with HNSW (CPU-only). Workload was 800 requests per query type, noisy natural-language prompts, concurrent clients, and two query shapes: (1) vector top-k, (2) vector top-k + 1-hop graph expansion over returned entities. Local runs used an M3 Max with the native macOS installer; remote runs were on GCP (8 vCPU, 32GB RAM).
The key observation is straightforward: local compute stayed in low-ms, while remote p50 tracked client↔server RTT (~110ms), so end-to-end latency was network-bound. If you run this yourself, please share p50/p95, dataset size, and hop depth so results are directly comparable.
| Item | Value |
|---|---|
| Nodes | 67,280 |
| Edges | 40,921 |
| Embeddings | 67,298 |
| Vector index | HNSW, CPU-only |
| Request count | 800 per query type |
| Query types | Vector top-k; Vector top-k + 1-hop traversal |
Verification queries (same shape)
# Vector-only (same query shape as benchmark)
curl -s -u "$NORNIC_USERNAME:$NORNIC_PASSWORD" "$ENDPOINT" \
-H "Content-Type: application/json" -H "Accept: application/json" \
-d '{
"statements":[
{
"statement":"CALL db.index.vector.queryNodes('\''idx_original_text'\'', $topK, $text) YIELD node, score RETURN node.originalText AS originalText, score ORDER BY score DESC LIMIT $topK",
"parameters":{"text":"get it delivered","topK":5},
"resultDataContents":["row"]
}
]
}'
# Vector + one-hop graph (same query shape as benchmark)
curl -s -u "$NORNIC_USERNAME:$NORNIC_PASSWORD" "$ENDPOINT" \
-H "Content-Type: application/json" -H "Accept: application/json" \
-d '{
"statements":[
{
"statement":"CALL db.index.vector.queryNodes('\''idx_original_text'\'', $topK, $text) YIELD node, score MATCH (node:OriginalText)-[:TRANSLATES_TO]->(t:TranslatedText) WHERE t.language = $targetLang RETURN node.originalText AS originalText, score, t.language AS language, coalesce(t.auditedText, t.translatedText) AS translatedText ORDER BY score DESC, language LIMIT $topK",
"parameters":{"text":"get it delivered","topK":5,"targetLang":"es"},
"resultDataContents":["row"]
}
]
}'
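The same statement can also be posted from Python's stdlib, which makes it easier to script the 800-request runs. This is a sketch: `build_request` is a hypothetical helper, the env vars mirror the curl examples above, and the default endpoint is a placeholder.

```python
import base64
import json
import os
import urllib.request

def build_request(endpoint, username, password, statement, parameters):
    """Build the same authenticated POST the curl examples send."""
    body = json.dumps({"statements": [{
        "statement": statement,
        "parameters": parameters,
        "resultDataContents": ["row"],
    }]}).encode()
    req = urllib.request.Request(endpoint, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Accept", "application/json")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

req = build_request(
    os.environ.get("ENDPOINT", "http://example.invalid/query"),  # placeholder
    os.environ.get("NORNIC_USERNAME", "user"),
    os.environ.get("NORNIC_PASSWORD", "pass"),
    "CALL db.index.vector.queryNodes('idx_original_text', $topK, $text) "
    "YIELD node, score RETURN node.originalText AS originalText, score "
    "ORDER BY score DESC LIMIT $topK",
    {"text": "get it delivered", "topK": 5},
)
# urllib.request.urlopen(req)  # uncomment to actually send
```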