This is an excerpt. The full article includes a live 2D vector space sandbox — drag a query vector across a semantic coordinate field and watch cosine similarity scores update in real time across 9 framework nodes, with a toggle between Brute Force KNN and HNSW skip-graph traversal modes. Read the full interactive version →
The Geometry of Semantics
In the era of large language models, unstructured text is treated not as raw characters, but as a vector coordinate in a high-dimensional mathematical space.
When you pass text into an embedding model (like Google's text-embedding-004), it maps the semantic meaning into an array of floating-point numbers:
"Redis In-Memory Cache" → [ 0.01249, -0.04581, 0.08914, ..., -0.00312 ] // 3072 dimensions
"PostgreSQL Database" → [ 0.01183, -0.04491, 0.08877, ..., -0.00298 ] // 3072 dimensions
"PyTorch Model" → [-0.09214, 0.07321, -0.03812, ..., 0.06441 ] // 3072 dimensions
Within this latent space, conceptually similar texts are geometrically close. "Redis" and "PostgreSQL" cluster near each other (both databases). "PyTorch" is far away (AI/ML domain).
Semantic search = finding the K-nearest neighbor coordinates to a query vector.
The Mathematical Distance Metrics
Three metrics dominate vector similarity calculations:
1. Cosine Similarity (most common for NLP)
cos(θ) = (A · B) / (|A| × |B|)
Measures the angular projection between two vectors. Range: -1 to 1. Completely ignores magnitude — two vectors pointing in the same direction score 1.0 even if one is 100× longer. This is ideal for text embeddings where magnitude carries no semantic information.
2. Dot Product (fastest for normalized vectors)
A · B = Σ(Aᵢ × Bᵢ)
When vectors are L2-normalized (unit length = 1), dot product yields identical rankings to cosine similarity with significantly less computation. Normalize your embeddings on ingestion, then use dot product at query time.
3. L2 Euclidean Distance (for spatial data)
d(A, B) = √(Σ(Aᵢ - Bᵢ)²)
Measures raw physical distance between vector tips. Highly sensitive to magnitude variations. Preferred for coordinate-based spatial systems, not raw text embeddings.
The Scaling Bottleneck: Why Brute Force Fails
In naive Brute Force KNN, you compare the query vector against every document in your database:
Complexity: O(D × N) — where D = dimensions, N = document count
At scale, brute force saturates CPU and destroys real-time response requirements. You need Approximate Nearest Neighbor (ANN) indexing.
HNSW: Hierarchical Navigable Small World
HNSW translates the Skip List data structure into a multi-layered graph:
LAYER 2 (Sparse Highway) ─── node_A ──────────────────── node_B ───
LAYER 1 (Intermediate) ─── node_A ─── node_C ─── node_B ─── node_D
LAYER 0 (Dense Ground) ─ ALL nodes fully connected to neighbors ──
This reduces search complexity from:
Brute Force: O(N) → Linear — fails at scale
HNSW: O(log N) → Logarithmic — sub-5ms at 10M+ docs
TypeScript HNSW Implementation
class HNSWIndex {
private layers: Map<string, string[]>[] = [];
private vectors: Map<string, number[]> = new Map();
private readonly maxLayers: number;
private readonly maxConnections: number;
constructor(maxLayers = 4, maxConnections = 16) {
this.maxLayers = maxLayers;
this.maxConnections = maxConnections;
this.layers = Array.from({ length: maxLayers }, () => new Map());
}
private cosineSimilarity(a: number[], b: number[]): number {
const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
const magA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
const magB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
return magA === 0 || magB === 0 ? 0 : dot / (magA * magB);
}
private getRandomLayer(): number {
let layer = 0;
while (layer < this.maxLayers - 1 && Math.random() < 0.5) {
layer++;
}
return layer;
}
insert(id: string, vector: number[]): void {
this.vectors.set(id, vector);
const assignedLayer = this.getRandomLayer();
for (let level = 0; level <= assignedLayer; level++) {
if (!this.layers[level].has(id)) {
const candidates = this.searchLayer(vector, level, this.maxConnections);
this.layers[level].set(id, candidates.map(c => c.id));
}
}
}
search(queryVector: number[], k: number): { id: string; score: number }[] {
let candidates = this.getEntryPoints(queryVector);
for (let level = this.maxLayers - 1; level >= 0; level--) {
candidates = this.searchLayer(queryVector, level, k, candidates);
}
return candidates
.sort((a, b) => b.score - a.score)
.slice(0, k);
}
private searchLayer(
query: number[],
level: number,
k: number,
entryPoints: { id: string; score: number }[] = []
): { id: string; score: number }[] {
const visited = new Set<string>();
const results: { id: string; score: number }[] = [...entryPoints];
const queue = [...entryPoints];
while (queue.length > 0) {
const current = queue.shift()!;
if (visited.has(current.id)) continue;
visited.add(current.id);
const neighbors = this.layers[level].get(current.id) ?? [];
for (const neighborId of neighbors) {
if (!visited.has(neighborId)) {
const vec = this.vectors.get(neighborId)!;
const score = this.cosineSimilarity(query, vec);
queue.push({ id: neighborId, score });
results.push({ id: neighborId, score });
}
}
}
return results
.sort((a, b) => b.score - a.score)
.slice(0, k);
}
private getEntryPoints(query: number[]): { id: string; score: number }[] {
const entries: { id: string; score: number }[] = [];
for (const [id, vec] of this.vectors) {
entries.push({ id, score: this.cosineSimilarity(query, vec) });
if (entries.length >= 3) break;
}
return entries;
}
}
Production Scaling
Memory-Mapped Files (mmap): Keeping large indices in RAM causes OOM crashes. Bind binary index files directly to virtual address space — the OS swaps vector blocks in/out of disk cache automatically based on query patterns.
Product Quantization (PQ): Divide each high-dimensional vector into sub-vectors and map them to centroids. This reduces memory footprint by up to 95% with minimal recall degradation.
Engineering Takeaways
- Always L2-normalize embeddings on ingestion to enable fast dot product scoring instead of full cosine calculation.
- HNSW reduces O(N) to O(log N) — the single most impactful optimization for vector search at scale.
-
Use
mmap+ Product Quantization in production to handle tens of millions of vectors without OOM.
The full article includes a drag-and-drop 2D semantic space sandbox — 9 framework nodes plotted across a latent coordinate field, real-time cosine similarity leaderboard, toggle between Brute Force and HNSW skip-graph modes, and live layer traversal visualization.
Written by Ebenezer Akinseinde — Software Developer & AI Automations Engineer.
Top comments (0)