I Built an Offline-First Semantic Search Engine in JavaScript
Search is deceptively hard.
Most JavaScript search libraries stop at keywords or fuzzy matching, and most semantic search solutions assume external APIs, vector databases, or hosted services.
I wanted something different:
- runs fully locally
- works in Node.js or the browser
- understands meaning, not just text
- doesn’t require standing up new infrastructure
That led me to build Simile Search — an offline-first semantic + fuzzy search engine in JavaScript.
What Simile Does Differently
Simile combines multiple techniques instead of relying on a single scoring method:
🧠 Semantic Search (Local Embeddings)
Uses transformer-based embeddings (via transformers.js) to capture meaning, so queries like:
“phone charger” → “USB-C cable”
work even when there’s no keyword overlap.
No APIs. No Python. No server calls.
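To make the idea concrete, here is a minimal sketch of how embedding-based matching ranks results. It is not Simile's internal code: the three-dimensional vectors are hand-written stand-ins for real model output, and `cosineSimilarity` and `rankBySimilarity` are hypothetical helper names chosen for this example.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; higher means "pointing the same way".
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// ASSUMPTION: toy 3-dimensional "embeddings", invented for illustration.
// Real embeddings come from a model and have hundreds of dimensions.
const toyVectors = {
  "phone charger": [0.9, 0.1, 0.2],
  "USB-C cable": [0.8, 0.2, 0.3],
  "garden hose": [0.1, 0.9, 0.1],
};

// Rank every other entry by similarity to the query's vector.
function rankBySimilarity(query, vectors) {
  const q = vectors[query];
  return Object.keys(vectors)
    .filter((text) => text !== query)
    .map((text) => ({ text, score: cosineSimilarity(q, vectors[text]) }))
    .sort((x, y) => y.score - x.score);
}

const ranked = rankBySimilarity("phone charger", toyVectors);
console.log(ranked.map((r) => r.text));
```

With these toy vectors, "USB-C cable" ranks above "garden hose" even though it shares no words with the query, which is the behaviour semantic search is after.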
⚡ Fast Vector Search with HNSW
To keep semantic search fast, Simile uses HNSW (Hierarchical Navigable Small World) indexing for approximate nearest-neighbor search.
This gives:
- sub-linear search time
- predictable performance as the catalog grows
- practical latency for interactive search
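The core idea behind graph-based search can be sketched in a few lines. The example below is a deliberately simplified, single-layer greedy walk over a neighbour graph, not a full HNSW implementation and not Simile's code. Real HNSW adds multiple layers, incremental insertion, and a candidate list so the walk is less likely to stall in a local minimum. The names `buildGraph` and `greedySearch` are hypothetical.

```javascript
// Squared Euclidean distance (ordering is the same as true distance).
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Link each point to its k closest points.
// Built by brute force here for clarity; HNSW builds its graph incrementally.
function buildGraph(points, k) {
  return points.map((p, i) =>
    points
      .map((q, j) => ({ j, d: squaredDistance(p, q) }))
      .filter((e) => e.j !== i)
      .sort((x, y) => x.d - y.d || x.j - y.j) // ties broken by index
      .slice(0, k)
      .map((e) => e.j)
  );
}

// Greedy walk: from the entry point, keep moving to whichever neighbour
// is closest to the query, and stop when no neighbour improves on the
// current node. Only the neighbours along the path are compared.
function greedySearch(points, graph, query, entry = 0) {
  let current = entry;
  let best = squaredDistance(points[current], query);
  while (true) {
    let next = current;
    for (const n of graph[current]) {
      const d = squaredDistance(points[n], query);
      if (d < best) {
        best = d;
        next = n;
      }
    }
    if (next === current) return current; // no closer neighbour found
    current = next;
  }
}

// Ten one-dimensional points: [0], [1], ..., [9]
const points = Array.from({ length: 10 }, (_, i) => [i]);
const graph = buildGraph(points, 2);
const nearest = greedySearch(points, graph, [7.2]);
console.log(nearest); // 7
```

A single-layer walk like this can stop early on less regular data. That weakness is what the extra layers and wider candidate list in HNSW are designed to reduce.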
🗜 Vector Quantization
Raw float vectors are memory-heavy. Simile applies vector quantization to reduce memory usage while keeping similarity quality high.
This matters when:
- running inside Node.js
- embedding catalogs that aren’t tiny
- keeping everything in memory
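As a rough illustration of the memory saving, here is a sketch of scalar quantization from 32-bit floats to 8-bit integers. This is one common quantization scheme, shown under the assumption of a single scale factor per vector; the scheme Simile actually uses may differ. `quantize` and `dequantize` are hypothetical names.

```javascript
// Map each float to an integer in [-127, 127] using one scale per vector.
function quantize(vector) {
  let maxAbs = 0;
  for (const v of vector) maxAbs = Math.max(maxAbs, Math.abs(v));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const codes = new Int8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    codes[i] = Math.round(vector[i] / scale);
  }
  return { codes, scale };
}

// Approximate reconstruction: each value is off by at most about scale / 2.
function dequantize({ codes, scale }) {
  const out = new Float32Array(codes.length);
  for (let i = 0; i < codes.length; i++) out[i] = codes[i] * scale;
  return out;
}

const original = new Float32Array([0.12, -0.5, 0.33, 0.9]);
const packed = quantize(original);
const approx = dequantize(packed);

// 4 bytes per value before, 1 byte per value after (plus one scale number).
console.log(original.byteLength, packed.codes.byteLength); // 16 4
console.log(Array.from(approx));
```

The trade-off is a small rounding error per dimension in exchange for roughly a 4x reduction in vector storage.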
💾 Vector Caching & Persistence
Computing embeddings is usually the slowest part of semantic search.
Simile avoids repeating work by:
- caching vectors for previously seen text
- allowing full snapshot save/load
- restoring from a snapshot without re-embedding
This makes it viable for real backend services.
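The caching and snapshot idea can be sketched as a text-keyed map that serializes to JSON. This is an illustration under assumptions, not Simile's API: `VectorCache`, `toSnapshot`, `fromSnapshot`, and `fakeEmbed` are hypothetical names, and the embedder is a synchronous stand-in (real model calls are typically asynchronous).

```javascript
class VectorCache {
  constructor(embed) {
    this.embed = embed; // function: text -> array of numbers
    this.vectors = new Map(); // text -> vector
  }

  // Return the cached vector, computing it only on first sight of the text.
  get(text) {
    if (!this.vectors.has(text)) this.vectors.set(text, this.embed(text));
    return this.vectors.get(text);
  }

  // Serialize all cached vectors to a JSON string.
  toSnapshot() {
    return JSON.stringify([...this.vectors]);
  }

  // Rebuild a cache from a snapshot without calling the embedder.
  static fromSnapshot(json, embed) {
    const cache = new VectorCache(embed);
    cache.vectors = new Map(JSON.parse(json));
    return cache;
  }
}

// ASSUMPTION: a fake embedder that counts how often it is called.
let embedCalls = 0;
function fakeEmbed(text) {
  embedCalls++;
  return [text.length, text.charCodeAt(0) || 0];
}

const cache = new VectorCache(fakeEmbed);
cache.get("usb-c cable"); // computed
cache.get("usb-c cable"); // served from the cache

const snapshot = cache.toSnapshot();
const restoredCache = VectorCache.fromSnapshot(snapshot, fakeEmbed);
restoredCache.get("usb-c cable"); // served from the snapshot

console.log(embedCalls); // 1
```

The snapshot is a plain string, so it could be written to disk in Node.js or kept in browser storage. Where to put it is a deployment choice rather than part of the technique.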
🔤 Fuzzy Matching + 🎯 Keyword Boosting
Semantic similarity alone isn’t enough.
Simile blends:
- fuzzy matching (typos, partial input)
- exact keyword boosting (precision)
- normalized scoring so no method dominates unfairly
You can tune the weights depending on your domain.
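One way such a blend might look, as a sketch only: Levenshtein distance for the fuzzy part, exact token overlap for the keyword part, and a weighted average in which the weights are normalized to sum to 1. The function names, the weight values, and the semantic score of 0.9 are all assumptions made for this example. Simile's actual scoring formula is not shown here.

```javascript
// Classic Levenshtein edit distance using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        above + 1, // deletion
        row[j - 1] + 1, // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
      diagonal = above;
    }
  }
  return row[b.length];
}

// Fuzzy similarity in [0, 1]: 1 means identical strings.
function fuzzySimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Fraction of query terms that appear as exact words in the text.
function keywordScore(query, text) {
  const words = new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  return terms.filter((t) => words.has(t)).length / terms.length;
}

// Weighted average of the three scores; each score is clamped to [0, 1]
// and the weights are normalized so the result also stays in [0, 1].
function blendScores(scores, weights) {
  const total = weights.semantic + weights.fuzzy + weights.keyword;
  if (total <= 0) throw new Error("Weights must sum to a positive number");
  const clamp = (x) => Math.min(1, Math.max(0, x));
  return (
    (weights.semantic * clamp(scores.semantic) +
      weights.fuzzy * clamp(scores.fuzzy) +
      weights.keyword * clamp(scores.keyword)) /
    total
  );
}

const query = "phone chager"; // note the typo
const text = "phone charger";
const blended = blendScores(
  {
    semantic: 0.9, // ASSUMPTION: would come from embedding similarity
    fuzzy: fuzzySimilarity(query, text),
    keyword: keywordScore(query, text),
  },
  { semantic: 0.6, fuzzy: 0.2, keyword: 0.2 }
);
console.log(blended);
```

Here the typo costs the query half of its keyword score, but the fuzzy and semantic parts keep the overall score high. Raising the keyword weight would favour precision for domains such as SKUs or part numbers.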
🔗 Nested Object Search
Instead of flattening data manually, Simile can search directly across nested paths:
metadata.author.firstName
metadata.tags
items[0].name
This makes it practical for real product catalogs and structured data.
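Path lookups like these can be resolved with a small helper. The sketch below shows the general technique (split the path into keys and indexes, then walk the object). It is not taken from Simile's source, and `parsePath`, `getByPath`, and `extractText` are hypothetical names.

```javascript
// Split "items[0].name" into ["items", 0, "name"].
function parsePath(path) {
  const segments = [];
  const pattern = /([^.\[\]]+)|\[(\d+)\]/g;
  let match;
  while ((match = pattern.exec(path)) !== null) {
    segments.push(match[2] !== undefined ? Number(match[2]) : match[1]);
  }
  return segments;
}

// Walk the object one segment at a time; return undefined if the path breaks.
function getByPath(obj, path) {
  let current = obj;
  for (const key of parsePath(path)) {
    if (current === null || current === undefined) return undefined;
    current = current[key];
  }
  return current;
}

// Gather the values at several paths into one searchable string.
// Array values (such as tags) are flattened one level.
function extractText(doc, paths) {
  return paths
    .map((p) => getByPath(doc, p))
    .flat()
    .filter((v) => v !== undefined && v !== null)
    .map(String)
    .join(" ");
}

const product = {
  metadata: { author: { firstName: "Ada" }, tags: ["cable", "usb-c"] },
  items: [{ name: "Charger" }],
};

const searchable = extractText(product, [
  "metadata.author.firstName",
  "metadata.tags",
  "items[0].name",
]);
console.log(searchable); // "Ada cable usb-c Charger"
```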
Where This Is Actually Useful
Simile works best for:
- product & inventory catalogs
- internal tools and dashboards
- knowledge bases
- autocomplete / typeahead search
- privacy-first or offline-capable apps
- NestJS backends without extra search infrastructure
It’s not trying to replace Meilisearch, Elasticsearch, or large vector databases.
It’s meant for small-to-medium datasets where meaning matters and infra should stay simple.
Why I Built This
I kept seeing projects where:
- a full search engine was overkill
- a database existed just to store an index
- fuzzy search wasn’t good enough
- semantic search required too much setup
Simile is an attempt to close that gap.
Links
I’m sharing this to get feedback from people building search, developer tooling, or AI-powered UX.