How I built browser-native semantic vector search using WASM — no server, no API keys, no per-query cost. Full code walkthrough with React.
My documentation site had search that sent every user query to Algolia. Fine for open source (free tier), annoying for a paid product where $1/1K searches compounds faster than you expect.
Fuse.js was the obvious alternative — runs in the browser, zero config. But Fuse.js is fuzzy text matching. A user typing "cancel my plan" will never find a doc titled "end your subscription." That's the gap I wanted to close.
What I wanted: search that understands meaning, runs entirely in the browser, and has zero per-query cost.
So I built altor-vec — HNSW vector search compiled to 54KB of WASM. No server. No API keys. The index is a static file on your CDN.
Fuse.js vs semantic search — the actual difference
Fuse.js asks: "does this string look like that string?" (Bitap / Levenshtein distance)
altor-vec asks: "does this meaning resemble that meaning?" (HNSW + embeddings)
```
Query: "how do I cancel"
Fuse.js matches:   docs containing the word "cancel"
altor-vec matches: docs about cancellation, ending service,
                   unsubscribing, account closure, billing stop
```
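To make the contrast concrete, here's a minimal sketch of the two underlying measures in plain JavaScript. It is not Fuse.js's or altor-vec's code. Fuse.js uses a Bitap-based approximate matcher and HNSW only approximates nearest-neighbor search. The core idea still holds: edit distance compares characters, while embedding search compares vectors.

```javascript
// Classic dynamic-programming edit distance (insertions, deletions, substitutions).
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (free if the characters match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// For L2-normalized embeddings (the build script uses normalize: true),
// cosine similarity reduces to a plain dot product.
function cosineOfNormalized(u, v) {
  let dot = 0;
  for (let i = 0; i < u.length; i++) dot += u[i] * v[i];
  return dot;
}
```

Edit distance only rewards shared characters, so paraphrases with different wording get no credit for meaning. Embedding models are trained so that related phrases land close together, which is what the dot product measures.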
The cost of semantic search: you need embeddings, which means a build step to embed your docs plus a model in the browser to embed each query. If you want typo tolerance over a short autocomplete list, Fuse.js is the right call. If you want understanding, keep reading.
Step 1: Generate the index at build time
```javascript
// scripts/build-search-index.mjs
import { pipeline } from '@huggingface/transformers';
import { WasmSearchEngine } from 'altor-vec/node';
import fs from 'fs';

// Free embedding model, runs in Node.js, no API call needed
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const docs = [
  { id: 0, text: 'How to cancel your subscription and manage billing' },
  { id: 1, text: 'Account settings and profile preferences' },
  { id: 2, text: 'Getting started with the API and authentication' },
  { id: 3, text: 'Troubleshooting login and password reset' },
  // ...your actual content
];

// One flat array: doc i's 384 values are appended in docs order
const vectors = [];
for (const doc of docs) {
  const out = await embed(doc.text, { pooling: 'mean', normalize: true });
  vectors.push(...Array.from(out.data));
}

// Build HNSW index
const engine = WasmSearchEngine.from_vectors(
  new Float32Array(vectors),
  384, // dimensions (all-MiniLM-L6-v2 output size)
  16,  // M: connections per node
  200, // ef_construction: build quality
  50   // ef_search: query recall
);

// Serialize to a binary file
fs.writeFileSync('./public/search-index.bin', Buffer.from(engine.serialize()));
console.log(`Built index: ${docs.length} docs`);
```
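`from_vectors` takes one flat `Float32Array` with all vectors laid end to end, and HNSW is approximate, so `ef_search` trades speed for recall. Below is a minimal sketch of two build-time helpers in plain JavaScript. The names are hypothetical and not part of altor-vec. One packs vectors into that flat layout with a dimension check. The other computes exact top-k by brute force, so you can measure how often HNSW finds the true nearest neighbors.

```javascript
// Hypothetical helper: pack per-document embeddings into one row-major Float32Array.
// Document i occupies indices [i * dim, (i + 1) * dim).
function packVectors(rows, dim) {
  const flat = new Float32Array(rows.length * dim);
  rows.forEach((row, i) => {
    if (row.length !== dim) {
      throw new Error(`doc ${i}: expected ${dim} dims, got ${row.length}`);
    }
    flat.set(row, i * dim); // copies a plain array or typed array into place
  });
  return flat;
}

// Hypothetical helper: exact top-k ids by dot product (cosine for normalized vectors).
// O(n * dim) per query: too slow to replace HNSW at scale, fine as ground truth.
function exactTopK(flat, dim, query, k) {
  const n = flat.length / dim;
  const scored = [];
  for (let i = 0; i < n; i++) {
    let dot = 0;
    for (let d = 0; d < dim; d++) dot += flat[i * dim + d] * query[d];
    scored.push([i, dot]);
  }
  scored.sort((a, b) => b[1] - a[1]); // highest similarity first
  return scored.slice(0, k).map(([id]) => id);
}

// recall@k: fraction of the exact top-k ids that the approximate search also returned.
function recallAtK(approxIds, exactIds) {
  if (exactIds.length === 0) return 1;
  const exact = new Set(exactIds);
  const found = approxIds.filter((id) => exact.has(id)).length;
  return found / exactIds.length;
}
```

This assumes the Node build's `engine.search()` returns the same JSON `[id, score]` pairs that the React component parses. Under that assumption, you could embed a few sample queries and compare `hits.map(([id]) => id)` against `exactTopK(...)` with `recallAtK`. If recall looks low, rebuild with a higher `ef_search`.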
Wire it into your build:
```json
{
  "scripts": {
    "prebuild": "node scripts/build-search-index.mjs",
    "build": "vite build"
  }
}
```
Because npm runs `prebuild` automatically before `build`, every deploy now regenerates the index. For a docs site with a few hundred pages, this takes a few seconds.
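The build script's `docs` array only carries `text`, while the React component reads `title` and `url` from `docs[id]`. One way to keep both in sync is to generate a single array of records from your pages. This is a minimal sketch, assuming your content lives in markdown files with a leading `# Title` line. `buildDocRecords` and its path-to-URL rule are hypothetical, so adapt them to your site's routing.

```javascript
// Hypothetical helper (not part of altor-vec): turns markdown pages into search records.
// `text` is what gets embedded at build time; `id`, `title`, and `url` ship to the browser.
// Each id is the record's array position, matching the order vectors are added to the index.
function buildDocRecords(pages) {
  return pages.map(({ path, markdown }, id) => {
    // First level-1 heading ("# Title") becomes the title; fall back to the file path
    const heading = markdown.match(/^#[ \t]+(.+)$/m);
    const title = heading ? heading[1].trim() : path;

    const text = markdown
      .replace(/^#{1,6}[ \t]+/gm, '') // drop heading markers, keep heading text
      .replace(/[*_`>]/g, '')          // drop common inline markdown characters
      .replace(/\s+/g, ' ')            // collapse newlines and runs of spaces
      .trim();

    // Assumed routing: "guide/setup.md" -> "/guide/setup", "guide/index.md" -> "/guide"
    const slug = path.replace(/\.md$/, '').replace(/(^|\/)index$/, '');
    return { id, title, url: '/' + slug, text };
  });
}
```

In the build script, you would embed `record.text` for each record in order. Then write the records without `text` to a file such as `public/search-docs.json` (a hypothetical path) and pass that array to `SearchWidget` as `docs`. Long pages may be worth splitting into sections, because embedding models only read a bounded number of tokens.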
Step 2: The React component
```jsx
import { useState, useEffect, useRef, useCallback } from 'react';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@huggingface/transformers';

// docs: [{ id, title, url }], in the same order the build script embedded them,
// so a hit's id can be used directly as an index into this array.
export function SearchWidget({ docs }) {
  const engineRef = useRef(null);
  const embedRef = useRef(null);
  const [results, setResults] = useState([]);
  const [loading, setLoading] = useState(true);
  const timerRef = useRef(null);

  useEffect(() => {
    async function setup() {
      await init(); // loads the 54KB WASM module
      const res = await fetch('/search-index.bin');
      engineRef.current = WasmSearchEngine.from_bytes(
        new Uint8Array(await res.arrayBuffer())
      );
      // Embedding model runs in-browser, cached after first load (~23MB)
      embedRef.current = await pipeline(
        'feature-extraction',
        'Xenova/all-MiniLM-L6-v2'
      );
      setLoading(false);
    }
    setup();
    // Cancel any pending debounced search when the component unmounts
    return () => clearTimeout(timerRef.current);
  }, []);

  const handleSearch = useCallback((query) => {
    // Debounce: embedding takes ~50ms, so don't fire on every keystroke
    clearTimeout(timerRef.current);
    timerRef.current = setTimeout(async () => {
      if (!query.trim() || !engineRef.current || !embedRef.current) {
        setResults([]);
        return;
      }
      // Same model and options as the build script, so query and doc vectors are comparable
      const out = await embedRef.current(query, { pooling: 'mean', normalize: true });
      // search() returns a JSON string of [id, score] pairs (top 5 here)
      const hits = JSON.parse(
        engineRef.current.search(new Float32Array(out.data), 5)
      );
      setResults(hits.map(([id, score]) => ({ ...docs[id], score })));
    }, 200);
  }, [docs]);

  if (loading) return <p>Loading search…</p>;

  return (
    <div>
      <input
        type="search"
        placeholder="Search docs…"
        onChange={e => handleSearch(e.target.value)}
      />
      <ul>
        {results.map(r => (
          <li key={r.id}>
            <a href={r.url}>{r.title}</a>
          </li>
        ))}
      </ul>
    </div>
  );
}
```
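Debouncing limits how often searches start, but two searches can still overlap. If an older query's embedding resolves after a newer one, its results overwrite the newer ones. A small "latest request wins" gate avoids that. This is a hypothetical helper, not part of altor-vec or React:

```javascript
// Hypothetical helper: only the most recently started request may apply its results.
function createRequestGate() {
  let latest = 0;
  return {
    // Call when a search starts; keep the returned ticket.
    begin() {
      latest += 1;
      return latest;
    },
    // Call after any await, before applying results.
    // false means a newer search has started since this ticket was issued.
    isCurrent(ticket) {
      return ticket === latest;
    },
  };
}
```

In `handleSearch`, you would keep one gate in a ref (`useRef(createRequestGate())`) and call `begin()` at the top of the timeout callback, including the empty-query path. After the `await`, only call `setResults` if `isCurrent(ticket)` is still true.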
No API routes. No environment variables. No billing dashboard.
Production tip: move the engine + embedding model into a Web Worker so search never blocks the main thread. See the web worker guide.
The numbers
- Query time: <1ms p95 for 10K vectors (384 dimensions) in Chrome
- WASM size: 54KB gzipped — loads in ~100ms on a 4G connection
- Index size: ~17MB for 10K documents (served from CDN, cached after first load)
- Embedding model: ~23MB first load, then cached in browser storage
- Per-query cost: $0
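Where does ~17MB come from? Here is a back-of-envelope estimate. It assumes the serialized index stores raw float32 vectors plus up to 2 × M neighbor ids per node on HNSW's base layer, as hnswlib does. That is a common HNSW layout, but altor-vec's actual format may differ.

```javascript
// Rough serialized-index size under stated assumptions:
// - vectors stored as raw float32 (4 bytes per dimension)
// - base layer keeps up to 2 * M neighbor ids per node, as 4-byte integers
// - upper layers, headers, and any compression are ignored
function estimateIndexBytes(numDocs, dim, M) {
  const vectorBytes = numDocs * dim * 4;
  const linkBytes = numDocs * 2 * M * 4;
  return vectorBytes + linkBytes;
}

const mb = (bytes) => (bytes / 1e6).toFixed(1) + ' MB';
console.log(mb(estimateIndexBytes(10_000, 384, 16)));    // 16.6 MB
console.log(mb(estimateIndexBytes(1_000_000, 384, 16))); // 1664.0 MB
```

The 10K case lands near the ~17MB figure. The same arithmetic puts a million documents at well over a gigabyte, which is the concrete reason the "millions of documents" case needs a server.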
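The first load is heavier than Fuse.js because you're downloading a model. Keep in mind that the docs are already embedded at build time; the in-browser model exists only to embed the user's query. If the ~23MB download is a dealbreaker, you need another source of query vectors. One option is a small embedding endpoint, which brings a server back. Another is a fixed set of suggested queries embedded at build time. Once you have the query vector, query time is literally just the WASM search.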
When NOT to use this
I want to be honest about the tradeoffs:
- Millions of documents → the index file gets too big to serve efficiently. Use a server.
- Real-time index updates → the index is rebuilt at deploy time. Not suitable for user-generated content that changes constantly.
- Private/sensitive content → if documents are secret, you can't ship the index to every user's browser.
- Need Algolia-style faceting and merchandising → altor-vec doesn't have that. Algolia is genuinely better for search-as-a-product.
For documentation sites, internal tools, marketing sites, personal projects, and anywhere the content is public and updates on deploys: this works very well.
Get started
```bash
npm install altor-vec @huggingface/transformers
```
- Getting started (5-minute guide): altorlab.dev/getting-started
- API reference: altorlab.dev/api
- React full guide (with Web Worker, debounce, error handling): altorlab.dev/guides/react/document-search
- vs Fuse.js (detailed comparison): altorlab.dev/vs/fuse-js
- Migrating from Algolia: altorlab.dev/migrate-from/algolia
- GitHub: github.com/altor-lab/altor-vec
I've been running this on my own docs for a few months. First load is heavier than Fuse.js. Once the model is cached, though, the vector search itself is genuinely sub-millisecond, and even with ~50ms to embed the query, you feel the difference.
If you try it and hit something broken, open a GitHub issue or drop a comment here. Happy to help debug.