Jordi Mon Companys

Posted on • Originally published at elastic.co

SIMDVec Supercharges Vector Hunt

simdvec is the engine behind every vector distance computation in Elasticsearch. It provides hand-tuned AVX-512 and NEON kernels for every vector type Elasticsearch supports.
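
To make "hand-tuned kernel" concrete, here is a minimal sketch of a NEON float32 dot-product loop in C. It is illustrative only, not the actual simdvec code, and it assumes the vector dimension is a multiple of 4.

```c
#include <arm_neon.h>

/* Illustrative sketch of a SIMD dot-product kernel, not the simdvec
 * implementation. Assumes dims is a multiple of 4. */
float dot_f32_neon(const float *a, const float *b, int dims) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (int i = 0; i < dims; i += 4) {
        float32x4_t va = vld1q_f32(a + i);   /* load 4 floats from each vector */
        float32x4_t vb = vld1q_f32(b + i);
        acc = vfmaq_f32(acc, va, vb);        /* fused multiply-add across 4 lanes */
    }
    return vaddvq_f32(acc);                  /* horizontal sum of the 4 lanes */
}
```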

In bulk scoring (where production actually lives), simdvec processes float32 vectors at 95 ns/vector. FAISS comes in at 165 ns and jVector at 412 ns on x86, roughly a 4x gap between simdvec and jVector. Put differently, at 95 ns/vector a single core covers about 10.5 million distance computations per second, versus about 2.4 million at 412 ns.

The gap widens with scale. Once you exceed CPU cache capacity, simdvec's prefetching drops L1 cache misses from 139K to 19K, doubling instructions per cycle. On ARM (Graviton), interleaved loading keeps the pipeline busy during memory fetches: backend stalls drop 40%, and the gains hold regardless of dataset size.
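
As a rough illustration of those two ideas, the sketch below (again hypothetical, not the simdvec source) prefetches upcoming cache lines and keeps two independent accumulator chains in flight, so the multiply-adds can proceed while loads are still completing. The prefetch distance here is an arbitrary example, not a tuned value.

```c
#include <arm_neon.h>

/* Illustrative only: explicit prefetching plus interleaved accumulators.
 * __builtin_prefetch is a GCC/Clang builtin; the 64-float lookahead is an
 * example distance, not a tuned one. Assumes dims is a multiple of 8. */
float dot_f32_neon_interleaved(const float *a, const float *b, int dims) {
    float32x4_t acc0 = vdupq_n_f32(0.0f);
    float32x4_t acc1 = vdupq_n_f32(0.0f);
    for (int i = 0; i < dims; i += 8) {
        __builtin_prefetch(a + i + 64);  /* pull future cache lines early */
        __builtin_prefetch(b + i + 64);
        /* two independent FMA chains hide load latency */
        acc0 = vfmaq_f32(acc0, vld1q_f32(a + i),     vld1q_f32(b + i));
        acc1 = vfmaq_f32(acc1, vld1q_f32(a + i + 4), vld1q_f32(b + i + 4));
    }
    return vaddvq_f32(vaddq_f32(acc0, acc1));
}
```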

This matters for more than benchmarks. AI agents in 2026 are retrieval-bottlenecked, not model-bottlenecked. We ran this to ground with BrowseComp-Plus: same model, same prompts, and accuracy jumped from 15% to 93% when retrieval got better. simdvec is what makes that retrieval fast enough to actually ship.

Chris Hegarty, Lorenzo Dematté, and Simon Cooper wrote the full breakdown, benchmarks and methodology included, not just the headline number.

What retrieval bottleneck are you actually hitting?
