Anshul Basia

Posted on • Originally published at altorlab.dev

I added semantic search to my React app without a backend (and it's under 1ms)

How I built browser-native semantic vector search using WASM — no server, no API keys, no per-query cost. Full code walkthrough with React.

My documentation site had search that sent every user query to Algolia. Fine for open source (free tier), annoying for a paid product where $1/1K searches compounds faster than you expect.

Fuse.js was the obvious alternative — runs in the browser, zero config. But Fuse.js is fuzzy text matching. A user typing "cancel my plan" will never find a doc titled "end your subscription." That's the gap I wanted to close.

What I wanted: search that understands meaning, runs entirely in the browser, and has zero per-query cost.

So I built altor-vec — HNSW vector search compiled to 54KB of WASM. No server. No API keys. The index is a static file on your CDN.

Fuse.js vs semantic search — the actual difference

Fuse.js asks: "does this string look like that string?" (Bitap / Levenshtein distance)

altor-vec asks: "does this meaning resemble that meaning?" (HNSW + embeddings)

`Query: "how do I cancel"

Fuse.js matches: docs containing the word "cancel"
altor-vec matches: docs about cancellation, ending service,
unsubscribing, account closure, billing stop`
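
To make the two questions concrete, here is a minimal sketch of each kind of comparison. It uses plain Levenshtein distance as a stand-in for string matching (Fuse.js's actual scoring is more involved) and cosine similarity for vector matching. The three-dimensional "embeddings" are made-up numbers for illustration, not output from a real model.

```javascript
// Illustration only: neither function is taken from Fuse.js or altor-vec.

// Number of single-character insertions, deletions, or substitutions
// needed to turn string `a` into string `b`.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Cosine of the angle between two equal-length vectors.
// For unit-length vectors (normalize: true) this equals the dot product.
function cosineSimilarity(u, v) {
  let dot = 0, nu = 0, nv = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    nu += u[i] * u[i];
    nv += v[i] * v[i];
  }
  return dot / Math.sqrt(nu * nv);
}

// String distance sees two unrelated strings: at least 7 edits,
// because the lengths alone differ by 7 characters.
console.log(editDistance('cancel my plan', 'end your subscription'));

// Hypothetical 3-d "embeddings" (made-up numbers, not model output).
const cancelPlan = [0.9, 0.1, 0.4];
const endSubscription = [0.8, 0.2, 0.5];
const resetPassword = [0.1, 0.9, 0.2];
console.log(cosineSimilarity(cancelPlan, endSubscription)); // close to 1
console.log(cosineSimilarity(cancelPlan, resetPassword));   // much lower
```

A real embedding model does the hard part, which is placing related phrases near each other; the comparison itself stays this simple.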

The cost of semantic search: you need embeddings — a one-time build step. If you want typo tolerance over a short autocomplete list, Fuse.js is the right call. If you want understanding, keep reading.

Step 1: Generate the index at build time

// scripts/build-search-index.mjs
import { pipeline } from '@huggingface/transformers';
import { WasmSearchEngine } from 'altor-vec/node';
import fs from 'fs';

// Free embedding model, runs in Node.js — no API call needed
const embed = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const docs = [
  { id: 0, text: 'How to cancel your subscription and manage billing' },
  { id: 1, text: 'Account settings and profile preferences' },
  { id: 2, text: 'Getting started with the API and authentication' },
  { id: 3, text: 'Troubleshooting login and password reset' },
  // ...your actual content
];

const vectors = [];
for (const doc of docs) {
  const out = await embed(doc.text, { pooling: 'mean', normalize: true });
  vectors.push(...Array.from(out.data));
}

// Build HNSW index
const engine = WasmSearchEngine.from_vectors(
  new Float32Array(vectors),
  384,   // dimensions (all-MiniLM-L6-v2 output size)
  16,    // M — connections per node
  200,   // ef_construction — build quality
  50     // ef_search — query recall
);

// Serialize to a binary file
fs.writeFileSync('./public/search-index.bin', Buffer.from(engine.serialize()));
console.log(`Built index: ${docs.length} docs`);

Wire it into your build:

{
  "scripts": {
    "prebuild": "node scripts/build-search-index.mjs",
    "build": "vite build"
  }
}

Now every deploy regenerates the index. For a docs site with a few hundred pages, this takes a few seconds.
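
The build script hands `from_vectors` one flat `Float32Array` in which document `i` occupies positions `i * 384` through `i * 384 + 383`. A minimal sketch of that packing step with a length check is below; `packVectors` is a hypothetical helper name, not part of altor-vec. It may be worth adding because a single embedding with the wrong length would silently shift every vector after it.

```javascript
// Hypothetical helper: pack per-document vectors into one flat Float32Array.
// `rows` can hold plain arrays or typed arrays (e.g. `out.data` from the pipeline).
function packVectors(rows, dim) {
  const flat = new Float32Array(rows.length * dim);
  rows.forEach((row, i) => {
    if (row.length !== dim) {
      throw new Error(`doc ${i}: expected ${dim} values, got ${row.length}`);
    }
    flat.set(row, i * dim); // document i starts at offset i * dim
  });
  return flat;
}

const flat = packVectors([[0.1, 0.2], [0.3, 0.4]], 2);
console.log(flat.length); // 4
```

In the build script this would replace the `vectors.push(...)` loop: collect each `out.data` into an array of rows, then pass `packVectors(rows, 384)` as the first argument to `from_vectors`.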

Step 2: The React component

import { useState, useEffect, useRef, useCallback } from 'react';
import init, { WasmSearchEngine } from 'altor-vec';
import { pipeline } from '@huggingface/transformers';

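// `docs` must be an array where docs[id] is the document indexed under that id
// at build time; the list below also expects each entry to carry `url` and `title`.
export function SearchWidget({ docs }) {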
  const engineRef = useRef(null);
  const embedRef = useRef(null);
  const [results, setResults] = useState([]);
  const [loading, setLoading] = useState(true);
  const timerRef = useRef(null);

  useEffect(() => {
    async function setup() {
      await init(); // loads the 54KB WASM module
      const res = await fetch('/search-index.bin');
      engineRef.current = WasmSearchEngine.from_bytes(
        new Uint8Array(await res.arrayBuffer())
      );
      // Embedding model runs in-browser, cached after first load (~23MB)
      embedRef.current = await pipeline(
        'feature-extraction',
        'Xenova/all-MiniLM-L6-v2'
      );
      setLoading(false);
    }
    setup();
  }, []);

  const handleSearch = useCallback((query) => {
    // Debounce — embedding takes ~50ms, don't fire on every keystroke
    clearTimeout(timerRef.current);
    timerRef.current = setTimeout(async () => {
      if (!query.trim() || !engineRef.current) return setResults([]);
      const out = await embedRef.current(query, { pooling: 'mean', normalize: true });
      const hits = JSON.parse(
        engineRef.current.search(new Float32Array(out.data), 5)
      );
      setResults(hits.map(([id, score]) => ({ ...docs[id], score })));
    }, 200);
  }, [docs]);

  if (loading) return <p>Loading search…</p>;

  return (
    <div>
      <input
        type="search"
        placeholder="Search docs…"
        onChange={e => handleSearch(e.target.value)}
      />
      <ul>
        {results.map(r => (
          <li key={r.id}>
            <a href={r.url}>{r.title}</a>
          </li>
        ))}
      </ul>
    </div>
  );
}

No API routes. No environment variables. No billing dashboard.
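
The `clearTimeout` / `setTimeout` pair inside `handleSearch` is a debounce. If you want it reusable, it could be pulled into a small helper along these lines. This is a sketch, not something altor-vec or React ships; `debounce`, `realTimers`, and the injectable `timers` argument are names chosen here so the behaviour can be checked without waiting on real delays.

```javascript
// Hypothetical helper, not part of altor-vec or React.
const realTimers = {
  set: (cb, ms) => setTimeout(cb, ms),
  clear: (handle) => clearTimeout(handle),
};

// Returns a wrapper that runs `fn` only after `waitMs` have passed
// without another call; earlier pending calls are discarded.
function debounce(fn, waitMs, timers = realTimers) {
  let handle = null;
  function debounced(...args) {
    if (handle !== null) timers.clear(handle);
    handle = timers.set(() => {
      handle = null;
      fn(...args);
    }, waitMs);
  }
  // Drop any pending call, e.g. when the component unmounts.
  debounced.cancel = () => {
    if (handle !== null) timers.clear(handle);
    handle = null;
  };
  return debounced;
}

const logQuery = debounce((q) => console.log('searching for', q), 200);
logQuery('c');
logQuery('ca');
logQuery('cancel'); // only this call runs, roughly 200ms later
```

Inside a component you would create the debounced function once (in a ref or `useMemo`) and call `.cancel()` from the effect cleanup so a pending search cannot fire after unmount.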

Production tip: move the engine + embedding model into a Web Worker so search never blocks the main thread. See the web worker guide.
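
As a rough idea of what the main-thread side of that split could look like, here is a sketch under an assumed message protocol: the page posts `{ id, query }` and the worker answers with `{ id, hits }`. The protocol, `createSearchClient`, and the worker file name are all hypothetical; the linked guide may structure this differently. Tagging each query with an id lets the page drop replies to queries the user has already typed past.

```javascript
// Main-thread side of a hypothetical worker protocol:
//   page   -> worker: { id, query }
//   worker -> page:   { id, hits }
// The worker itself would do what setup() and handleSearch() do in the
// component above (init, from_bytes, embed, search), then post the hits back.
function createSearchClient(postToWorker, onResults) {
  let latestId = 0;
  return {
    // Send a query and remember that it is now the newest one.
    search(query) {
      latestId += 1;
      postToWorker({ id: latestId, query });
      return latestId;
    },
    // Deliver hits only if they answer the newest query; drop stale replies.
    handleMessage({ id, hits }) {
      if (id !== latestId) return false;
      onResults(hits);
      return true;
    },
  };
}

// Wiring sketch (browser only; './search.worker.js' is a made-up file name):
// const worker = new Worker(new URL('./search.worker.js', import.meta.url), { type: 'module' });
// const client = createSearchClient((msg) => worker.postMessage(msg), setResults);
// worker.onmessage = (event) => client.handleMessage(event.data);
```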

The numbers

  • Query time: <1ms p95 for 10K vectors (384 dimensions) in Chrome
  • WASM size: 54KB gzipped — loads in ~100ms on a 4G connection
  • Index size: ~17MB for 10K documents (served from CDN, cached after first load)
  • Embedding model: ~23MB first load, then cached in browser storage
  • Per-query cost: $0
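
The index-size figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes float32 vectors, 4-byte neighbour ids, and up to 2 × M links per node on the base HNSW layer, and it ignores upper layers and file headers. altor-vec's actual on-disk layout is not described in this post, so treat the result as an estimate, not a spec.

```javascript
// Rough index-size estimate under the assumptions stated above.
function estimateIndexBytes(numVectors, dim, m) {
  const vectorBytes = numVectors * dim * 4;  // float32 = 4 bytes per value
  const linkBytes = numVectors * 2 * m * 4;  // up to 2*M base-layer links, 4-byte ids
  return { vectorBytes, linkBytes, totalBytes: vectorBytes + linkBytes };
}

const { totalBytes } = estimateIndexBytes(10000, 384, 16);
console.log((totalBytes / 1e6).toFixed(2) + ' MB'); // 16.64 MB
```

Almost all of that is the raw vectors (15.36 MB), so the document count and embedding dimension drive the file size far more than the HNSW parameters do.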

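The first load is heavier than Fuse.js because you're downloading a model. If that's a dealbreaker, and your queries are known ahead of time (a fixed set of suggested searches, say, or "related pages" lookups that reuse a document's own vector), precompute those embeddings at build time and skip the in-browser model entirely. Query time is then just the WASM search.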

When NOT to use this

I want to be honest about the tradeoffs:

  • Millions of documents → the index file gets too big to serve efficiently. Use a server.
  • Real-time index updates → the index is rebuilt at deploy time. Not suitable for user-generated content that changes constantly.
  • Private/sensitive content → if documents are secret, you can't ship the index to every user's browser.
  • Need Algolia-style faceting and merchandising → altor-vec doesn't have that. Algolia is genuinely better for search-as-a-product.

For documentation sites, internal tools, marketing sites, personal projects, and anywhere the content is public and updates on deploys: this works very well.

Get started

npm install altor-vec

I've been running this on my own docs for a few months. First-load is heavier than Fuse.js, but after the model caches, search latency is genuinely sub-millisecond — you feel the difference.

If you try it and hit something broken, open a GitHub issue or drop a comment here. Happy to help debug.
