Matteo Panzeri

Building Production-Ready Vector Search for the Browser with Rust and WebAssembly

The Problem: Vector Search at the Edge

If you've worked with AI applications, you've probably used vector databases like Pinecone, Weaviate, or Qdrant. They're excellent for server-side deployments, but what happens when you need vector search:

  • In a browser — without sending user data to external servers?
  • Offline — where network connectivity isn't guaranteed?
  • At the edge — where latency to cloud services is unacceptable?

This was the problem I set out to solve with EdgeVec.

What is EdgeVec?

EdgeVec is an embedded vector database built in Rust that compiles to WebAssembly. It runs sub-millisecond nearest neighbor search directly in browsers, Node.js, and edge devices.

Full disclosure: I built this as a solo developer with LLM assistance (Claude). The AI helped with architecture review, boilerplate code, test generation, and documentation. I handled core algorithm decisions, performance optimization, and quality verification. This collaboration model accelerated development significantly.

Key numbers:

  • 329µs search latency at 100k vectors (768 dimensions)
  • 227 KB gzipped bundle size
  • 3.6x memory compression via scalar quantization
  • 24x faster than voy (the fastest pure-WASM alternative)

Here's the JavaScript API in a nutshell:

import init, { EdgeVec, EdgeVecConfig } from 'edgevec';

await init();
const config = new EdgeVecConfig(128);  // 128 dimensions
const index = new EdgeVec(config);

// Insert vectors
const embedding = new Float32Array(128).fill(0.1);
const id = index.insert(embedding);

// Search
const results = index.search(embedding, 10);
console.log(results);  // [{ id: 0, score: 0.0 }, ...]

// Persist to IndexedDB
await index.save("my-vectors");

Why WebAssembly?

I chose WebAssembly (WASM) for three reasons:

1. Performance

JavaScript is fast for most tasks, but vector similarity search involves millions of floating-point operations. WASM runs at near-native speed, and Rust's zero-cost abstractions mean no runtime overhead.

2. Portability

The same compiled WASM module runs in:

  • Chrome, Firefox, Safari, Edge
  • Node.js
  • Deno
  • Cloudflare Workers
  • Any WASM runtime

One codebase, deploy anywhere.

3. Security

WASM runs in a sandboxed environment with no access to the file system or network by default. User embeddings never leave the device unless you explicitly send them.

The Architecture

EdgeVec uses three core components:

HNSW Index (Hierarchical Navigable Small World)

HNSW is a graph-based algorithm for approximate nearest neighbor search with O(log n) query complexity. Each vector becomes a node in a multi-layer graph:

  • Layer 0: Contains all vectors, dense connections
  • Higher layers: Contain fewer vectors, act as "express lanes"

Search starts at the top layer and descends, following the greedy path to the nearest neighbor.
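
To make that descent concrete, here is a minimal greedy-walk sketch. It is illustrative only, not EdgeVec's actual internals: the Graph struct and flat Vec layout are simplifying assumptions, and a real HNSW implementation also keeps a beam of candidates (the ef parameter) rather than a single best node.

// Simplified greedy descent over a multi-layer graph (illustrative sketch).
struct Graph {
    vectors: Vec<Vec<f32>>,
    neighbors: Vec<Vec<Vec<usize>>>, // neighbors[layer][node] -> node ids
    entry_point: usize,
}

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn greedy_search(g: &Graph, query: &[f32]) -> usize {
    let mut current = g.entry_point;
    // Walk from the sparsest layer (the "express lanes") down to layer 0.
    for layer in (0..g.neighbors.len()).rev() {
        loop {
            let mut best = current;
            let mut best_dist = l2(&g.vectors[current], query);
            for &n in &g.neighbors[layer][current] {
                let d = l2(&g.vectors[n], query);
                if d < best_dist {
                    best = n;
                    best_dist = d;
                }
            }
            if best == current {
                break; // local minimum on this layer: drop down one layer
            }
            current = best;
        }
    }
    current // approximate nearest neighbor found by the greedy walk
}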

SQ8 Quantization

Full-precision vectors use 4 bytes per dimension. For 768-dimensional embeddings (like OpenAI's), that's ~3 KB per vector.

EdgeVec's scalar quantization (SQ8) compresses each dimension to 1 byte:

  • 3.6x memory reduction
  • Minimal recall loss (~1-2% at k=10)
  • SIMD-accelerated distance calculations
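
As a rough illustration of the idea (a minimal sketch assuming per-vector min/scale parameters; EdgeVec's actual encoding may differ):

// Sketch of SQ8: map each f32 dimension onto a u8 in [0, 255]
// using a per-vector (min, scale) pair, then reverse it on read.

fn sq8_encode(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v.iter().map(|&x| ((x - min) / scale).round() as u8).collect();
    (codes, min, scale)
}

fn sq8_decode(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| min + c as f32 * scale).collect()
}

At 768 dimensions this shrinks the raw vector from 3072 bytes to 768 bytes plus two f32 parameters; the headline 3.6x figure presumably folds in graph and metadata overhead.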

IndexedDB Persistence

In browsers, EdgeVec stores data in IndexedDB — the same storage API used by offline-first web apps. Your vectors survive page refreshes and browser restarts.

// Save the index to IndexedDB under a named key
await index.save("my-index");

// Load it back after a page refresh or browser restart
const loaded = await EdgeVec.load("my-index");

What's New in v0.4.0

Version 0.4.0 focuses on production readiness:

Documentation Suite

  • Tutorial: Step-by-step getting started guide
  • Performance Tuning: HNSW parameter optimization
  • Troubleshooting: Top 10 common errors and solutions
  • Integration Guide: Works with transformers.js, TensorFlow.js, OpenAI API

Migration Guides

Detailed instructions for migrating from:

  • hnswlib (Python/C++)
  • FAISS (Python)
  • Pinecone (Cloud)

Quality Infrastructure

  • 15 chaos tests: Empty index, max dimensions (4096), all deleted, etc.
  • Load tests: 100k vector stress tests
  • P99 latency tracking: Percentile benchmarks in CI
  • Regression detection: 10% threshold enforcement (see the sketch below)
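
As an illustration of what such a gate can look like, here is a hypothetical harness (not EdgeVec's actual CI code; the sample data and baseline value are made up):

// Hypothetical CI gate: compute nearest-rank P99 over latency samples
// and fail if it regresses more than 10% past a stored baseline.

fn p99_micros(mut samples: Vec<u64>) -> u64 {
    samples.sort_unstable();
    let rank = ((samples.len() as f64) * 0.99).ceil() as usize;
    samples[rank.saturating_sub(1)]
}

fn main() {
    let samples: Vec<u64> = (1..=1000).collect(); // stand-in latencies in µs
    let baseline = 990u64;                        // made-up recorded baseline
    let p99 = p99_micros(samples);
    assert!(
        (p99 as f64) <= (baseline as f64) * 1.10,
        "P99 regression: {}µs vs baseline {}µs", p99, baseline
    );
    println!("P99 = {}µs (within 10% of baseline)", p99);
}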

Performance Comparison

Benchmarked on AMD Ryzen 7 5700U, 16GB RAM:

Library        Search P50   Type         Notes
EdgeVec        0.20ms       WASM         Fastest pure-WASM
hnswlib-node   0.05ms       Native C++   Requires compilation
voy            4.78ms       WASM         KD-tree algorithm

EdgeVec is 24x faster than voy while both run as pure WASM. Native bindings (hnswlib-node) are faster but don't work in browsers and require a C++ toolchain.

Development Philosophy

LLMs are tools, like IDEs and linters. The key is understanding what you're building and verifying the output. Every piece of generated code went through manual review, unit tests, integration tests, and benchmark validation.

Use Cases

Offline-First Semantic Search

Build search features that work without internet:

// User searches their local notes
// (`embed` is a placeholder for your embedding function,
// e.g. a transformers.js model running locally)
const queryEmbedding = await embed("meeting notes from last week");
const results = index.search(queryEmbedding, 5);

Privacy-Preserving RAG

Run retrieval-augmented generation without sending documents to external APIs:

// All embeddings stay on device
const context = index.search(questionEmbedding, 3);
// `localLLM` stands in for any on-device model runtime
const answer = await localLLM.generate(question, context);

Browser-Based AI Assistants

Add semantic memory to chatbots running entirely client-side:

// Store conversation embeddings
index.insert(await embed(userMessage));

// Retrieve relevant context
const relevant = index.search(await embed(newMessage), 5);

Getting Started

Installation

npm install edgevec

Rust Usage

use edgevec::{HnswConfig, HnswIndex, VectorStorage};

// Runs inside a function that returns a Result, so `?` can propagate errors.
let config = HnswConfig::new(128);
let mut storage = VectorStorage::new(&config, None);
let mut index = HnswIndex::new(config, &storage)?;

let id = index.insert(&vec![1.0; 128], &mut storage)?;
let results = index.search(&vec![1.0; 128], 10, &storage)?;

With OpenAI Embeddings

import init, { EdgeVec, EdgeVecConfig } from 'edgevec';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function getEmbedding(text) {
    const response = await openai.embeddings.create({
        model: "text-embedding-3-small",
        input: text,
    });
    return new Float32Array(response.data[0].embedding);
}

await init();
const index = new EdgeVec(new EdgeVecConfig(1536));

// Index your documents
const docs = ["Hello world", "Vector search is cool"];
for (const doc of docs) {
    index.insert(await getEmbedding(doc));
}

// Search
const query = await getEmbedding("greeting");
const results = index.search(query, 5);

Future Roadmap (v0.5.0+)

  • ARM/NEON optimization: Verified cross-platform SIMD
  • Mobile support: Formalized iOS Safari and Android Chrome testing
  • Enhanced metadata: Native metadata storage alongside vectors
  • CLI tools: Optional developer command-line interface

Conclusion

EdgeVec proves that high-performance vector search doesn't require cloud infrastructure. With Rust and WebAssembly, we can bring AI capabilities directly to users' devices — faster, more private, and more reliable.

If you're building offline-first AI applications, privacy-preserving search, or browser-based ML tools, give EdgeVec a try. It's MIT/Apache-2.0 dual-licensed, and I welcome contributions and feedback.


Questions? Drop a comment below or open an issue on GitHub. I'm happy to discuss the implementation details or help troubleshoot integration issues.

