IVFFlat vs HNSW in pgvector: Which Index Should You Use?
Every pgvector deployment eventually hits the same question: should you use IVFFlat or HNSW for your vector index? The pgvector docs describe both, Stack Overflow has opinions, and blog posts tend to default to "just use HNSW." But the real answer depends on your dataset size, recall tolerance, write patterns, and memory budget. Let's look at the actual tradeoffs with concrete SQL so you can make an informed decision.
How IVFFlat Works
IVFFlat (Inverted File with Flat storage -- vectors are kept uncompressed) partitions your vectors into a configurable number of Voronoi cells -- called "lists." At query time, it identifies the cells whose centroids are nearest to your query vector and searches only the vectors inside those cells. The number of cells searched is controlled by the probes setting.
The critical property of IVFFlat: index quality is determined at build time. The clustering step runs k-means on whatever data exists in the table when you run CREATE INDEX. If you build on an empty table, a partially loaded table, or data that doesn't represent your final distribution, the cell boundaries will be wrong and recall will suffer until you rebuild.
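In practice that means rebuilding after any major load. A sketch, assuming PostgreSQL 12+ and a hypothetical index name:

```sql
-- Re-run the k-means clustering over the data now in the table,
-- without blocking reads and writes (REINDEX ... CONCURRENTLY needs PG 12+)
REINDEX INDEX CONCURRENTLY idx_docs_ivfflat;
```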
The upside: IVFFlat builds are fast and the resulting index is compact. For very large datasets (50M+ vectors), the memory and build-time savings over HNSW can be substantial.
How HNSW Works
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph structure. Each vector is a node connected to its approximate neighbors. The top layers are sparse (long-range connections for fast navigation), and the bottom layers are dense (short-range connections for precision). Queries enter at the top and descend through layers, narrowing in on the nearest neighbors.
The key advantage: HNSW quality does not depend on build order or data distribution. New vectors are inserted into the graph incrementally, and the graph structure adapts. You don't need to rebuild after data changes. Recall is typically 95%+ with default parameters.
The tradeoff: HNSW indexes consume 2-5x more memory than IVFFlat because the graph stores neighbor connections at every layer. Build time is also longer, scaling with the m (connections per node) and ef_construction (search width during build) parameters.
Detecting Which Index You Have
-- List all vector indexes with their type and size
SELECT
  indexname,
  indexdef,
  pg_size_pretty(
    pg_relation_size((quote_ident(schemaname) || '.' || quote_ident(indexname))::regclass)
  ) AS index_size
FROM pg_indexes
WHERE indexdef LIKE '%ivfflat%' OR indexdef LIKE '%hnsw%'
ORDER BY indexname;
-- Check whether the index is actually being used
SELECT
  indexrelname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%ivfflat%' OR indexrelname LIKE '%hnsw%'
ORDER BY idx_scan DESC;
An index with idx_scan = 0 is being ignored by the planner. Either the table is small enough for sequential scans to win, or the query is not using the operator class the index was built with (for example, ordering by <-> when the index uses vector_cosine_ops).
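If idx_scan stays at zero, run a representative query under EXPLAIN and check the plan node. A sketch -- the index and table names are illustrative, and $1 stands in for a bound query vector as elsewhere in this post:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
-- Expect a node like "Index Scan using idx_docs_hnsw on documents";
-- a "Seq Scan" node means the planner is bypassing the index
```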
Testing IVFFlat Recall
If you suspect your IVFFlat index has degraded recall, compare against exact results:
-- Exact results (sequential scan)
SET enable_indexscan = off;
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
-- Index results
SET enable_indexscan = on;
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
If the results differ significantly, your IVFFlat index needs more probes or a full rebuild with better list parameters.
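To put a number on the gap, one sketch is to capture both result sets and measure their overlap. This assumes the same documents table; $1 stands in for your query vector, though CREATE TABLE AS cannot be parameterized directly, so inline a literal vector when running it by hand:

```sql
-- Recall@10 for one query vector: store exact and approximate top-10,
-- then compute the fraction of exact results the index also returned
SET enable_indexscan = off;
CREATE TEMP TABLE exact_top10 AS
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;

SET enable_indexscan = on;
CREATE TEMP TABLE approx_top10 AS
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;

SELECT count(*) / 10.0 AS recall_at_10
FROM exact_top10
WHERE id IN (SELECT id FROM approx_top10);
```

Averaging this over a sample of real query vectors gives a far more trustworthy recall estimate than a single probe.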
Choosing the Right Index
Use HNSW When:
- Your dataset is under 50 million vectors
- You need 95%+ recall without extensive tuning
- Your application inserts vectors continuously (not just bulk loads)
- You can afford the higher memory footprint (2-5x over IVFFlat)
- You want minimal ongoing maintenance
Use IVFFlat When:
- Your dataset is very large (50M+ vectors) and HNSW build time is prohibitive
- You can tolerate 90% recall with tuning
- Your write pattern is bulk-load-then-query (not continuous inserts)
- Memory is constrained
- You are willing to rebuild the index after significant data distribution changes
Creating the Right Index
-- HNSW: higher recall, handles incremental inserts
CREATE INDEX CONCURRENTLY idx_docs_hnsw
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- IVFFlat: faster build, lower memory
-- CRITICAL: Load ALL data BEFORE creating the index
CREATE INDEX CONCURRENTLY idx_docs_ivfflat
ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);
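Build speed for either index type is sensitive to memory settings. A hedged starting point -- the values are illustrative, and parallel HNSW builds require pgvector 0.6.0 or later:

```sql
-- Give the build more memory and workers before running CREATE INDEX
SET maintenance_work_mem = '2GB';
SET max_parallel_maintenance_workers = 4;  -- parallel HNSW builds: pgvector >= 0.6.0
```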
HNSW Tuning Parameters
- m (default 16): connections per node. Higher values improve recall but increase memory and build time. 16-64 is the practical range.
- ef_construction (default 64): search width during build. Higher values produce a better graph. 200 is a good starting point for production.
- hnsw.ef_search (runtime, default 40): search width during queries. Increase for better recall at the cost of latency.
SET hnsw.ef_search = 100;
EXPLAIN ANALYZE SELECT * FROM documents
ORDER BY embedding <=> $1 LIMIT 10;
IVFFlat Tuning Parameters
- lists: number of Voronoi cells. Heuristic: sqrt(row_count) for under 1M rows, row_count / 1000 for larger tables.
- ivfflat.probes (runtime, default 1): cells to search. Higher values improve recall. Start with sqrt(lists).
SET ivfflat.probes = 32;
EXPLAIN ANALYZE SELECT * FROM documents
ORDER BY embedding <=> $1 LIMIT 10;
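The two heuristics above can be computed directly from the table. A sketch, using the same documents table as the earlier examples:

```sql
-- Suggested starting values for lists and probes from the row count:
-- lists = sqrt(n) under 1M rows, n/1000 above; probes = sqrt(lists)
SELECT count(*) AS row_count,
       CASE WHEN count(*) < 1000000
            THEN ceil(sqrt(count(*)))
            ELSE ceil(count(*) / 1000.0)
       END AS suggested_lists,
       ceil(sqrt(
         CASE WHEN count(*) < 1000000
              THEN sqrt(count(*))
              ELSE count(*) / 1000.0
         END
       )) AS suggested_probes
FROM documents;
```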
The Decision Matrix
| Factor | HNSW | IVFFlat |
|---|---|---|
| Build time (1M vectors) | Minutes | Seconds |
| Build time (100M vectors) | Hours | Minutes |
| Index size | 2-5x larger | Compact |
| Default recall | ~95%+ | ~70-80% (needs tuning) |
| Tuned recall | 99%+ | 95%+ |
| Incremental inserts | Handled well | Degrades quality |
| Maintenance | Minimal | Periodic rebuild |
Common Mistakes
Building IVFFlat on a partial dataset. The clustering quality depends entirely on the data present at build time. Always load all data first.
Using IVFFlat defaults for probes. The default probes = 1 searches only the single nearest cell. Recall will be terrible. Set probes = 10-50 depending on your lists count.
Over-provisioning HNSW parameters. Setting m = 64 and ef_construction = 500 produces a high-quality graph but may consume more memory than your server can afford. Start with m = 16, ef_construction = 200 and increase only if recall is insufficient.
Never benchmarking. Always test both index types on your actual queries with your actual data. Synthetic benchmarks don't capture your data distribution or query patterns.
For most applications using pgvector today -- RAG pipelines, semantic search, recommendation engines -- HNSW is the safer default. It requires less tuning, handles real-world write patterns, and delivers high recall without surprises. Reserve IVFFlat for the specific case where you have a very large, mostly-static dataset and need to optimize for build time or memory.
Originally published at mydba.dev/blog/pgvector-ivfflat-vs-hnsw