Part 3 of my JVector discoveries!
In this post, I analyze the performance benchmarks and design choices that make JVector a faster, more memory-efficient alternative to HNSW!
Introduction — What is HNSW?
Hierarchical Navigable Small World (HNSW) is a state-of-the-art graph-based algorithm designed for efficient Approximate Nearest Neighbor (ANN) search in high-dimensional spaces. It works by organizing data points into a multi-layered structure of “navigable small world” graphs, inspired by the concept of skip lists. The top layers are sparse and contain long-range links that allow the search to “hop” across large sections of the data space very quickly. As the search descends into lower, denser layers, the links become shorter and more localized, eventually narrowing down to the exact or near-exact neighbors in the bottom layer. This hierarchical approach allows HNSW to achieve logarithmic search complexity, making it significantly faster than a brute-force linear scan while maintaining very high accuracy (recall).
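To make that descent concrete, here is a toy sketch of the layered greedy search. The class, graph, and helper names are my own illustrations, not HNSW library code: the sparse top layer takes long hops, and the dense bottom layer refines locally.

```java
import java.util.*;

// Minimal sketch of HNSW's layered greedy descent (illustrative toy, not a real library).
public class HnswSketch {
    static double dist(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s; // squared Euclidean distance is enough for comparisons
    }

    // Greedy walk within one layer: keep moving to a closer neighbor until stuck.
    static int greedyStep(Map<Integer, int[]> layer, float[][] vectors, float[] q, int entry) {
        int cur = entry;
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int nb : layer.getOrDefault(cur, new int[0])) {
                if (dist(vectors[nb], q) < dist(vectors[cur], q)) { cur = nb; improved = true; }
            }
        }
        return cur;
    }

    // Descend from the sparse top layer to the dense bottom layer,
    // using each layer's result as the entry point for the next.
    static int search(List<Map<Integer, int[]>> layers, float[][] vectors, float[] q, int entry) {
        int cur = entry;
        for (Map<Integer, int[]> layer : layers) cur = greedyStep(layer, vectors, q, cur);
        return cur;
    }

    public static void main(String[] args) {
        float[][] vectors = { {0,0}, {1,0}, {2,0}, {3,0}, {4,0} };
        // Top layer: long-range links only; bottom layer: dense local links.
        Map<Integer, int[]> top = Map.of(0, new int[]{4}, 4, new int[]{0});
        Map<Integer, int[]> bottom = Map.of(
            0, new int[]{1}, 1, new int[]{0,2}, 2, new int[]{1,3},
            3, new int[]{2,4}, 4, new int[]{3});
        int result = search(List.of(top, bottom), vectors, new float[]{2.2f, 0}, 0);
        System.out.println("nearest = " + result); // prints "nearest = 2"
    }
}
```

Real HNSW keeps a beam of candidates (the `efSearch` parameter) rather than a single current node; the single-node greedy walk above is the simplest possible version of the idea.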
Where is HNSW used?
Today, HNSW is the industry standard for vector indexing and is widely implemented in nearly every major vector database, including Pinecone, Milvus, Weaviate, and Qdrant, as well as popular extensions like pgvector for PostgreSQL. It is used extensively in Generative AI (GenAI) and Retrieval-Augmented Generation (RAG) pipelines to quickly find relevant document chunks that provide context to Large Language Models (LLMs). Beyond AI, it powers real-time recommendation engines (like suggesting products or music), image and video retrieval systems, and anomaly detection in cybersecurity. While it is primarily an in-memory index that requires significant RAM for large datasets, its balance of speed and precision has made it the "go-to" choice for enterprise-scale similarity search applications.
Reminder: What is JVector?
Excerpt from the JVector documentation:
Exact nearest neighbor search (k-nearest-neighbor or KNN) is prohibitively expensive at higher dimensions, because approaches to segment the search space that work in 2D or 3D like quadtree or k-d tree devolve to linear scans at higher dimensions. This is one aspect of what is called “the curse of dimensionality.”
With larger datasets, it is almost always more useful to get an approximate answer in logarithmic time, than the exact answer in linear time. This is abbreviated as ANN (approximate nearest neighbor) search.
There are two broad categories of ANN index:
- Partition-based indexes, like LSH or IVF or SCANN
- Graph indexes, like HNSW or DiskANN

Graph-based indexes tend to be simpler to implement and faster, but more importantly they can be constructed and updated incrementally. This makes them a much better fit for a general-purpose index than partitioning approaches that only work on static datasets that are completely specified up front. That is why all the major commercial vector indexes use graph approaches.

JVector is a graph index that merges the DiskANN and HNSW family trees, extending both designs with composable extensions: it borrows the hierarchical structure from HNSW and uses Vamana (the algorithm behind DiskANN) within each layer. JVector implements a multi-layer graph with nonblocking concurrency control, allowing construction to scale linearly with the number of cores.
JVector Advantages
JVector vs. HNSW: The Architecture Shift
| Feature | **HNSW** | **JVector** |
| ------------------- | ------------------------------------------------------------ | ---------------------------------------------------------- |
| **Algorithm Base** | Hierarchical Small World Graphs | **Vamana** (DiskANN) + Hierarchical Layers |
| **Memory Strategy** | In-Memory (Entire index must fit in RAM) | **Disk-Aware** (Compressed index in RAM, full data on SSD) |
| **Parallelism** | Often single-threaded construction (implementation dependent) | **Concurrent, lock-free** construction (linear scaling) |
| **Hardware Use** | Standard CPU | Optimized for **SIMD (Java Vector API)** |
| **Scaling** | Vertical (Requires more RAM as data grows) | Hybrid (Scales with disk throughput/IOPS) |
Vamana vs. HNSW Layers
HNSW works like a “skip list” for graphs — it has sparse top layers for fast navigation and a dense bottom layer for accuracy. JVector takes this further by using the Vamana algorithm (the engine behind Microsoft’s DiskANN). Vamana graphs are more “robust,” meaning they provide better connectivity with fewer edges. This allows JVector to find results with fewer “hops,” reducing the CPU work required for every search.
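The pruning rule that makes Vamana graphs "robust" can be sketched as follows. This is a simplified, hypothetical rendering of DiskANN's RobustPrune idea (the class and method names are mine): a candidate is dropped when an already-selected neighbor "covers" it within a factor of alpha, and alpha > 1 deliberately keeps some longer edges for connectivity.

```java
import java.util.*;

// Sketch of Vamana-style "robust prune" neighbor selection (simplified illustration).
public class RobustPrune {
    static double dist(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return Math.sqrt(s);
    }

    // Select up to maxDegree diverse neighbors for point p from a candidate set.
    // alpha > 1 relaxes the diversity rule, keeping some longer edges.
    static List<Integer> prune(float[] p, float[][] vectors, List<Integer> candidates,
                               int maxDegree, float alpha) {
        List<Integer> remaining = new ArrayList<>(candidates);
        List<Integer> selected = new ArrayList<>();
        while (!remaining.isEmpty() && selected.size() < maxDegree) {
            // Take the candidate closest to p.
            int best = remaining.get(0);
            for (int c : remaining) if (dist(vectors[c], p) < dist(vectors[best], p)) best = c;
            selected.add(best);
            final int b = best;
            // Drop every candidate that `best` already covers within factor alpha.
            remaining.removeIf(c -> alpha * dist(vectors[b], vectors[c]) <= dist(p, vectors[c]));
        }
        return selected;
    }

    public static void main(String[] args) {
        float[][] v = { {0,0}, {1,0}, {1.1f,0}, {0,5} };
        // Node 2 sits right behind node 1, so it is pruned; node 3 is distant and kept.
        System.out.println(prune(v[0], v, List.of(1, 2, 3), 2, 1.2f)); // prints "[1, 3]"
    }
}
```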
Breaking the “RAM Prison”
The biggest limitation of HNSW is that if your index doesn’t fit in RAM, your performance falls off a cliff.
- HNSW: Needs roughly 1.3× the raw vector size in RAM to stay fast. For 100 million vectors, this could easily cost thousands of dollars in monthly cloud RAM costs.
- JVector: Keeps only a Product Quantized (PQ) version of the graph in memory. When it finds the top candidates, it performs a “re-ranking” by pulling the full-precision vectors from disk. This allows you to host massive datasets on a machine with 1/10th the RAM.
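The two-phase flow above can be sketched like this. It is hypothetical toy code: real JVector uses Product Quantization and reads full-precision vectors from disk, while here "compression" is faked by rounding coordinates, just to show the control flow of approximate scoring plus re-ranking.

```java
import java.util.*;

// Two-phase search sketch: cheap scoring on compressed vectors, then full-precision
// re-ranking of a small shortlist. Rounding stands in for real PQ compression.
public class TwoPhaseSearch {
    static double dist(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    static float[] compress(float[] v) {
        float[] c = new float[v.length];
        for (int i = 0; i < v.length; i++) c[i] = Math.round(v[i]); // lossy stand-in for PQ
        return c;
    }

    static int search(float[][] full, float[] q, int shortlist) {
        // Phase 1: rank everything with cheap, compressed distances (stays in RAM).
        float[][] compressed = new float[full.length][];
        for (int i = 0; i < full.length; i++) compressed[i] = compress(full[i]);
        Integer[] ids = new Integer[full.length];
        for (int i = 0; i < ids.length; i++) ids[i] = i;
        Arrays.sort(ids, Comparator.comparingDouble(i -> dist(compressed[i], q)));
        // Phase 2: re-rank only the shortlist with full-precision vectors
        // (in a disk-aware index, this is where the SSD reads happen).
        int best = ids[0];
        for (int i = 0; i < Math.min(shortlist, ids.length); i++)
            if (dist(full[ids[i]], q) < dist(full[best], q)) best = ids[i];
        return best;
    }

    public static void main(String[] args) {
        float[][] data = { {0.9f}, {1.1f}, {3f} };
        // Compressed, vectors 0 and 1 both round to {1} and tie;
        // full-precision re-ranking correctly picks vector 1.
        System.out.println(search(data, new float[]{1.05f}, 2)); // prints "1"
    }
}
```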
True Multithreading
Most traditional HNSW implementations (like hnswlib) struggle with parallel index building because of the complexity of maintaining graph integrity while multiple threads add nodes. JVector features a lock-free concurrent construction mechanism. In benchmarks, increasing your CPU core count leads to a nearly linear increase in indexing speed. For a billion-vector dataset, this turns an indexing process that would take days into one that takes hours.
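Minus all of jVector's real machinery, the construction pattern looks roughly like this hypothetical sketch using plain `java.util.concurrent`: every thread inserts nodes through concurrent data structures instead of serializing on a global lock, so adding cores adds throughput.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of lock-free-style concurrent graph construction: threads publish neighbor
// lists via concurrent collections rather than a global lock. jVector's real builder
// is far more sophisticated; this only shows the pattern.
public class ConcurrentBuild {
    static final ConcurrentHashMap<Integer, Set<Integer>> graph = new ConcurrentHashMap<>();

    static void addNode(int node, int[] neighbors) {
        graph.computeIfAbsent(node, k -> ConcurrentHashMap.newKeySet());
        for (int nb : neighbors) {
            graph.computeIfAbsent(nb, k -> ConcurrentHashMap.newKeySet());
            graph.get(node).add(nb);   // forward edge
            graph.get(nb).add(node);   // back edge; both sets tolerate concurrent writers
        }
    }

    static void build(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < n; i++) {
            final int node = i;
            // Each node links to its successor in a ring; real indexes pick neighbors by distance.
            pool.submit(() -> addNode(node, new int[]{ (node + 1) % n }));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        build(10_000, 8);
        System.out.println("nodes = " + graph.size()); // prints "nodes = 10000"
    }
}
```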
SIMD Acceleration (Project Panama)
While many C++ HNSW libraries use SIMD, Java libraries have historically been slow at math-heavy vector operations. JVector is one of the first major libraries to use Java’s new Vector API, allowing it to talk directly to your CPU’s AVX-512 or ARM Neon instructions. This levels the playing field, giving you C++ performance inside a safe Java environment.
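Here is what a SIMD dot product looks like with the Vector API. This is my own sketch of the mechanism, not jVector's internal code; it needs JDK 17+ and must be run with `--add-modules jdk.incubator.vector`.

```java
// Run with: --add-modules jdk.incubator.vector
import jdk.incubator.vector.*;

// Dot product via the Java Vector API (Project Panama): the JIT maps these lane-wise
// operations onto AVX/NEON instructions, processing several floats per cycle.
public class SimdDot {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc); // fused multiply-add across all lanes at once
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) sum += a[i] * b[i]; // scalar tail for leftover elements
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        float[] b = {1, 1, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(dot(a, b)); // prints "45.0"
    }
}
```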

Image from Wikipedia: https://en.wikipedia.org/wiki/Single_instruction,_multiple_data
The Configuration Philosophy
HNSW (including Lucene's implementation) is primarily tuned for memory and the number of connections. JVector is tuned for the balance between Product Quantization (PQ) compression and the Vamana graph structure.
| Feature | Lucene/HNSW Config | JVector Config |
| ---------------- | ------------------------------ | ----------------------------- |
| **Connectivity** | `maxConn` (M) | `maxDegree` |
| **Search Depth** | `beamWidth` | `constructionSearchDepth` |
| **Algorithm** | Rigid HNSW hierarchy | **Vamana** (supports DiskANN) |
| **Compression** | Scalar Quantization (optional) | **Native PQ** (integrated) |
Code Comparison
Building an Index: Lucene HNSW (Traditional)
- HNSW/Lucene’s configuration focuses on the hierarchical layers and how many neighbors to keep per level.

```java
// Lucene HNSW configuration (Lucene 9.x: Lucene99HnswVectorsFormat)
var format = new Lucene99HnswVectorsFormat(
    16,  // maxConn (M): number of connections per node
    100  // beamWidth: search depth during indexing
);
```
JVector (Modern / Disk-Aware)
- JVector allows for a “Hierarchical” mode similar to HNSW but adds parameters for degree overflow and diversity (alpha), which are key to the Vamana algorithm’s efficiency.

```java
// JVector Vamana/DiskANN configuration
var builder = new GraphIndexBuilder(
    buildScoreProvider,
    dimension,
    16,    // maxDegree: connections per node
    100,   // constructionSearchDepth: how hard to look for neighbors
    1.5f,  // degreeOverflow: allow extra edges during build, pruned later
    1.2f   // alpha: neighbor diversity relaxation (Vamana specific)
);

// Optional: enable hierarchy (HNSW-style layers)
builder.setAddHierarchy(true);
```
The “Secret Sauce”: Quantization Setup
The biggest difference is how JVector handles memory. While HNSW typically requires the full vectors to be present for scoring, JVector uses a compressed representation for the graph search and only uses the “real” data for the final ranking.
- How JVector configures compression:

```java
// Compute PQ codebooks from a sample of your data
var pq = ProductQuantization.compute(vectorSource, subspaces, clusters, true);

// Encode all vectors so they can stay in memory, compressed
var compressedVectors = pq.encodeAll(vectorSource);

// Search using the compressed vectors first (high speed, low RAM)
var searcher = new GraphSearcher.Builder(index.getView()).build();
var results = searcher.search(
    queryVector,
    10,
    compressedVectors.getApproximateScoreFunction(queryVector)
);
```
Summary: Which approach wins?
- HNSW is simpler to configure if you have plenty of RAM and your dataset is relatively small (under 1M vectors).
- JVector is superior when you need to handle larger-than-memory datasets or want to leverage modern CPU instructions (SIMD) without leaving the Java ecosystem.
Conclusion
Here are a few “rules of thumb” to keep in mind so the system stays fast and reliable as the quantity of data grows.
- Optimize for the “90/10” Rule: the beauty of JVector’s DiskANN-style design is that you don’t need to keep 100% of your data in RAM. Aim to keep your Graph Index and Compressed Vectors (PQ) in memory, while leaving the high-precision “Full Vectors” on an NVMe SSD. This setup typically gives you sub-10ms latency at a fraction of the cost of a pure in-memory solution.
- Normalize Your Vectors: JVector supports several similarity metrics, but for most applications, using Dot Product on normalized vectors is the “sweet spot”: it is mathematically equivalent to Cosine Similarity but runs up to 50% faster because it requires fewer CPU instructions.
- Leverage “Linear Scaling” for Updates: JVector is designed to handle concurrent updates. Unlike many other engines that lock up or slow down when you add new data, JVector scales nearly linearly with core count. If you have a massive batch of vectors to index, don’t be afraid to throw 32 or 64 threads at the GraphIndexBuilder; it will use every bit of your CPU to finish the job faster.
- Monitor Your “Recall”: in the world of Approximate Nearest Neighbor (ANN) search, speed is a trade-off with accuracy. Always benchmark your recall (the percentage of true nearest neighbors found). If recall drops below your requirements (e.g., 95%), consider increasing the constructionSearchDepth or the number of subspaces in the Product Quantization.
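On the normalization point above, the equivalence is easy to verify numerically. This is a hypothetical helper class of my own, assuming plain Euclidean (L2) normalization: after dividing each vector by its length, a single dot product gives exactly the cosine similarity, without the division and square roots per query.

```java
// cos(a,b) = (a·b) / (|a||b|); after normalizing, |a| = |b| = 1, so cos(a,b) = a·b.
public class NormalizedDot {
    // Scale a vector to unit length (do this once, at indexing time).
    static float[] normalize(float[] v) {
        double norm = 0;
        for (float x : v) norm += x * x;
        norm = Math.sqrt(norm);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) out[i] = (float) (v[i] / norm);
        return out;
    }

    static double dot(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // The "expensive" form: a dot product plus two norms and a division per query.
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    public static void main(String[] args) {
        float[] a = {3, 4}, b = {5, 12};
        double viaCosine = cosine(a, b);                  // computed fresh per query
        double viaDot = dot(normalize(a), normalize(b));  // normalization paid once up front
        System.out.println(viaCosine + " vs " + viaDot);  // both are 63/65 ≈ 0.9692
    }
}
```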
That’s a wrap, thanks for reading! 🫠
Links
- Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs: https://arxiv.org/abs/1603.09320
- HNSW Official Repository: https://github.com/nmslib/hnswlib
- Faiss: https://github.com/facebookresearch/faiss/wiki/The-index-factory
- Usearch: https://github.com/unum-cloud/usearch
- Vector Search — HNSW: https://www.pinecone.io/learn/series/faiss/hnsw/
- Zilliz’s page on HNSW: https://zilliz.com/learn/hierarchical-navigable-small-worlds-HNSW
- Pgvector: https://github.com/pgvector/pgvector
- Exploring Hierarchical Navigable Small World: https://blog.vespa.ai/exploring-hierarchical-navigable-small-world/ (Blog post by Marianne Haugvaldstad and Brage Vik)
- DiskANN and the Vamana Algorithm: https://zilliz.com/learn/DiskANN-and-the-Vamana-Algorithm
- Vamana Repository: https://github.com/sushrut141/vamana
- Vamana vs. HNSW — Exploring ANN algorithms Part 1: https://weaviate.io/blog/ann-algorithms-vamana-vs-hnsw (by Abdel Rodriguez)
- Vamana Graph Index: https://intel.github.io/ScalableVectorSearch/python/vamana.html
- DiskANN Explained: https://milvus.io/blog/diskann-explained.md
- SIMD: https://en.wikipedia.org/wiki/Single_instruction,_multiple_data
- Jvector: https://github.com/datastax/jvector
