HNSW Build Time at 1M Embeddings: 27 Minutes to 10 Minutes by Fixing Insertion Order
For a 1M-embedding corpus, we reduced HNSW construction time from about 27 minutes to about 10 minutes (2.7x) without changing recall or graph quality.
This post explains:
- the problem (where traversal work is wasted during construction),
- the solution (BM25-seeded insertion order),
- and the math behind the observed speedup.
All numbers in this writeup use the validated parameters from the current implementation:
- `M=16`
- `ef_construction=100`
- seed set size = `256 * 8 = 2,048` nodes
Problem: Random insertion order creates traversal waste
HNSW build quality and build cost both depend on insertion order. With random insertion:
- early nodes form accidental local hubs,
- new nodes frequently enter a poor region first,
- greedy search spends extra distance evaluations before finding useful neighbors.
That wasted traversal work compounds over time.
Visual A: Where random-order traversal waste comes from
In practice, this increases construction cost by a multiplicative overhead factor. I will call that factor beta:
build_time = ideal_time * beta
Where:
- `ideal_time` is the minimum cost if each insert reaches good neighbors with minimal detours,
- `beta > 1` captures wasted traversal and repair work.
Baseline mechanics: why layer-0 dominates at 1M scale
Using M=16, the level distribution gives:
- `P(node at layer >= 1) = 1/M = 1/16 = 6.25%`
- so `93.75%` of inserted nodes effectively do all meaningful work in layer 0.
For ef_construction=100, expected distance computations per insertion are:
Upper layers: 0.067 * 100 * 16 = 107
Layer 0: 1.000 * 100 * 32 = 3,200
-----
Total per insert = 3,307
(32 above is 2*M, the layer-0 connection bound; 0.067 is the expected number of upper layers per insert, 1/(M-1) = 1/15.)
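A quick way to sanity-check this arithmetic: with the standard per-level probability `p = 1/M`, the expected number of upper layers per insert is `1/(M-1) ~= 0.067`, which is where the `0.067` factor comes from. A small sketch (the helper name `perInsertCost` is mine, not from the implementation):

```go
package main

import "fmt"

// perInsertCost returns the expected distance computations per insert in the
// upper layers and in layer 0, assuming the standard HNSW level distribution
// p = 1/M and ef_construction candidates per visited layer.
func perInsertCost(M, efConstruction int) (upper, layer0 float64) {
	// Expected upper layers per insert: sum over l >= 1 of (1/M)^l = 1/(M-1).
	expUpper := 1.0 / float64(M-1) // ~0.067 for M=16
	upper = expUpper * float64(efConstruction) * float64(M)
	layer0 = float64(efConstruction) * float64(2*M) // layer-0 bound is 2*M links
	return upper, layer0
}

func main() {
	up, l0 := perInsertCost(16, 100)
	fmt.Printf("P(level>=1) = %.2f%%\n", 100.0/16)                       // 6.25%
	fmt.Printf("upper: %.0f, layer 0: %.0f, total: %.0f\n", up, l0, up+l0) // 107, 3200, 3307
}
```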
So the primary optimization target is not exotic upper-layer behavior; it is reducing layer-0 traversal waste during insertion.
Solution: BM25-seeded insertion creates a backbone first
Instead of random insertion order, we pick a lexically diverse seed set from BM25 and insert those vectors first.
Seed extraction:
- take high-IDF terms,
- for each term, take top docs by term frequency,
- defaults: `NORNICDB_HNSW_LEXICAL_SEED_MAX_TERMS=256`, `NORNICDB_HNSW_LEXICAL_SEED_PER_TERM=8`,
- maximum seed set: `256 * 8 = 2,048` nodes.
Build order:
- insert seed nodes first,
- insert the remaining `N - seed_count` nodes.
This gives the graph a broad early backbone, so later inserts find useful neighbors quickly instead of wandering.
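The extraction steps above can be sketched as follows. This is a minimal in-memory illustration of the stated rules (high-IDF terms, per-term top docs by term frequency, a `maxTerms * perTerm` cap); the `selectSeeds` name, whitespace tokenizer, and toy corpus are assumptions, not NornicDB code:

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// selectSeeds picks a lexically diverse seed set: rank terms by IDF, and for
// each of the top maxTerms terms take the perTerm docs with the highest term
// frequency. Doc IDs are deduplicated; the result is capped at maxTerms*perTerm.
func selectSeeds(docs []string, maxTerms, perTerm int) []int {
	df := map[string]int{}   // document frequency per term
	var tf []map[string]int  // term frequency per doc
	for _, d := range docs {
		counts := map[string]int{}
		for _, tok := range strings.Fields(strings.ToLower(d)) {
			counts[tok]++
		}
		for t := range counts {
			df[t]++
		}
		tf = append(tf, counts)
	}

	// Rank terms by IDF, rarer terms first.
	n := float64(len(docs))
	terms := make([]string, 0, len(df))
	for t := range df {
		terms = append(terms, t)
	}
	idf := func(t string) float64 { return math.Log(n / float64(df[t])) }
	sort.Slice(terms, func(i, j int) bool { return idf(terms[i]) > idf(terms[j]) })
	if len(terms) > maxTerms {
		terms = terms[:maxTerms]
	}

	seen := map[int]bool{}
	var seeds []int
	for _, t := range terms {
		// Top perTerm docs by term frequency for this term.
		var ids []int
		for id := range tf {
			if tf[id][t] > 0 {
				ids = append(ids, id)
			}
		}
		sort.Slice(ids, func(i, j int) bool { return tf[ids[i]][t] > tf[ids[j]][t] })
		if len(ids) > perTerm {
			ids = ids[:perTerm]
		}
		for _, id := range ids {
			if !seen[id] {
				seen[id] = true
				seeds = append(seeds, id)
			}
		}
	}
	return seeds
}

func main() {
	docs := []string{
		"hnsw graph build order",
		"bm25 seed seed seed extraction",
		"random insertion order waste",
		"graph build build cost",
	}
	fmt.Println("seed count:", len(selectSeeds(docs, 256, 8)))
}
```

In the build, the returned IDs are inserted first, then the remaining `N - seed_count` nodes follow in whatever order they arrive.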
Visual B: Seed-first construction reduces detours
The math check on the 27 -> 10 minute result
Use a conservative distance-op estimate for 1024-dim float32 vectors in Go:
- compute plus memory effects: about `160 ns` per distance operation in this workload class.
Then ideal floor for 1M insertions:
ideal_time
= 1,000,000 * 3,307 * 160 ns
= 529 s
= 8.8 min
Now map measured times to beta:
beta_random = 27 / 8.8 = 3.07
beta_seeded = 10 / 8.8 = 1.13
speedup = beta_random / beta_seeded
= 3.07 / 1.13
= 2.72x ~= 2.7x
This is the key point: the reported speedup is what you would expect if seeded order mostly removes traversal waste.
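The beta bookkeeping above can be reproduced directly from the three inputs (1M inserts, 3,307 distance ops per insert, 160 ns per op); `idealMinutes` is just a name for the arithmetic:

```go
package main

import "fmt"

// idealMinutes computes the algorithmic floor for n insertions at opsPerInsert
// distance computations each, with nsPerOp nanoseconds per distance operation.
func idealMinutes(n, opsPerInsert int, nsPerOp float64) float64 {
	return float64(n) * float64(opsPerInsert) * nsPerOp / 1e9 / 60
}

func main() {
	floor := idealMinutes(1_000_000, 3307, 160) // ~8.8 min
	betaRandom := 27.0 / floor
	betaSeeded := 10.0 / floor
	fmt.Printf("floor = %.1f min\n", floor)
	fmt.Printf("beta_random = %.2f, beta_seeded = %.2f\n", betaRandom, betaSeeded)
	fmt.Printf("speedup = %.2fx\n", betaRandom/betaSeeded)
}
```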
Visual C: Overhead factor (beta) before vs after
beta (lower is better)
Random order | ############################### 3.07
Seeded order | ########### 1.13
Interpretation:
- random build spends about `3.07x` the ideal work,
- seeded build is close to the floor at about `1.13x`.
Time decomposition for the 1M run
Total build time decomposition (minutes)
| Case | Ideal floor (min) | Overhead (min) | Total (min) |
|---|---|---|---|
| Random order | 8.8 | 18.2 | 27.0 |
| Seeded order | 8.8 | 1.2 | 10.0 |
Visual D: Same floor, different overhead
Random order | [#########.................][##################] 27.0
Seeded order | [#########.................][#] 10.0
ideal floor (8.8 min) overhead
Both runs share the same algorithmic floor; the difference is how much overhead is paid while traversing and wiring the graph.
Why this does not require a recall tradeoff
This change does not reduce ef_construction, M, or search-time quality knobs. It changes insertion order so the builder spends less effort reaching good neighborhoods.
That is why a large build-time gain can occur without reducing recall or graph quality: the graph is built with the same target connectivity constraints, but with less wasted traversal on the way there.
How to reproduce in your environment
- Keep HNSW params fixed (`M`, `ef_construction` unchanged).
- Build once with seeding disabled (or an effectively zero seed set).
- Build once with defaults:
  - `NORNICDB_HNSW_LEXICAL_SEED_MAX_TERMS=256`
  - `NORNICDB_HNSW_LEXICAL_SEED_PER_TERM=8`
- Log and compare:
- total build time,
- insertion throughput,
- recall on a fixed validation query set.
Use the build-time ratio as the primary cross-machine signal. Absolute minutes depend on CPU, memory bandwidth, cache behavior, and runtime effects.
Secondary effect: same seed mechanism helps k-means init
The same BM25-derived seed mechanism is also used by bm25+kmeans++ seed mode for centroid initialization. That improves initial centroid spread and typically reduces convergence iterations in the k-means phase.
The important architectural detail is reuse: one seed extraction pass supports both HNSW construction order and k-means initialization.
Closing
The 27-to-10 minute result is not a tuning artifact. It is a direct consequence of reducing traversal waste during graph construction:
- keep the same quality parameters,
- improve insertion geometry,
- move `beta` from about `3.07` to about `1.13`.
At 1M scale, this is enough to produce a repeatable 2.7x build-time improvement while preserving result quality.

