Aavash Baral

Posted on Dec 29, 2025

I Built an Offline-First Semantic Search Engine in JavaScript

#machinelearning #javascript #showdev #ai

Search is deceptively hard.

Most JavaScript search libraries stop at keywords or fuzzy matching, and most semantic search solutions assume external APIs, vector databases, or hosted services.

I wanted something different:

runs fully locally
works in Node.js or the browser
understands meaning, not just text
doesn’t require standing up new infrastructure

That led me to build Simile Search — an offline-first semantic + fuzzy search engine in JavaScript.

What Simile Does Differently

Simile combines multiple techniques instead of relying on a single scoring method:

🧠 Semantic Search (Local Embeddings)

Uses transformer-based embeddings (via transformers.js) to capture meaning, so queries like:

“phone charger” → “USB-C cable”

work even when there’s no keyword overlap.

No APIs. No Python. No server calls.
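To make "capturing meaning" concrete: once text is turned into vectors, ranking usually comes down to a similarity measure such as cosine similarity. The sketch below is a minimal illustration, not Simile's actual API. The three-dimensional vectors are made up by hand to stand in for real embeddings (a real model would produce hundreds of dimensions), and the names `cosineSimilarity`, `rankBySimilarity`, and `toyEmbeddings` are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// ASSUMPTION: hand-written toy vectors, not output from a real embedding model.
const toyEmbeddings = {
  "phone charger": [0.9, 0.1, 0.2],
  "USB-C cable": [0.8, 0.2, 0.3],
  "garden hose": [0.1, 0.9, 0.1],
};

// Rank every other entry by similarity to the query's vector, best first.
function rankBySimilarity(query, embeddings) {
  const q = embeddings[query];
  return Object.keys(embeddings)
    .filter((text) => text !== query)
    .map((text) => ({ text, score: cosineSimilarity(q, embeddings[text]) }))
    .sort((x, y) => y.score - x.score);
}

const ranked = rankBySimilarity("phone charger", toyEmbeddings);
console.log(ranked[0].text); // "USB-C cable", despite sharing no keywords
```

With real embeddings the vectors come from the model rather than a lookup table, but the ranking step has the same shape.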

⚡ Fast Vector Search with HNSW

To keep semantic search fast, Simile uses HNSW (Hierarchical Navigable Small World) indexing for approximate nearest-neighbor search.

This gives:

sub-linear search time
predictable performance as the catalog grows
practical latency for interactive search
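The core idea behind HNSW can be sketched in a few lines: walk a neighbor graph greedily toward the query, starting on a sparse upper layer with long hops and finishing on the dense bottom layer. The code below is a deliberately simplified illustration and not Simile's implementation. Real HNSW assigns layers randomly, builds the graph incrementally, and keeps a candidate list rather than a single current node. Here the layers are built by brute force, and "every 10th point goes in the upper layer" is an assumption chosen only to keep the example small. All function names are hypothetical.

```javascript
// Squared Euclidean distance between two vectors.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Build one layer: each node links to its m closest nodes within that layer.
// (Brute force for clarity; a real index would not build layers this way.)
function buildLayer(points, nodeIds, m) {
  const layer = new Map();
  for (const i of nodeIds) {
    const neighbors = nodeIds
      .filter((j) => j !== i)
      .map((j) => ({ j, d: squaredDistance(points[i], points[j]) }))
      .sort((x, y) => x.d - y.d)
      .slice(0, m)
      .map(({ j }) => j);
    layer.set(i, neighbors);
  }
  return layer;
}

// Greedy walk on one layer: hop to the closest neighbor until none improves.
function greedySearch(points, layer, query, entry) {
  let current = entry;
  let currentDist = squaredDistance(points[current], query);
  let evaluated = 1;
  while (true) {
    let next = current;
    let nextDist = currentDist;
    for (const n of layer.get(current)) {
      evaluated++;
      const d = squaredDistance(points[n], query);
      if (d < nextDist) {
        next = n;
        nextDist = d;
      }
    }
    if (next === current) return { index: current, evaluated };
    current = next;
    currentDist = nextDist;
  }
}

// Search from the sparsest layer down, reusing each result as the next entry.
function layeredSearch(points, layers, query, entry) {
  let current = entry;
  let evaluated = 0;
  for (const layer of layers) {
    const result = greedySearch(points, layer, query, current);
    current = result.index;
    evaluated += result.evaluated;
  }
  return { index: current, evaluated };
}

// Toy data: 100 points on a line. ASSUMPTION: every 10th point is "promoted".
const points = Array.from({ length: 100 }, (_, i) => [i]);
const allIds = points.map((_, i) => i);
const sparseIds = allIds.filter((i) => i % 10 === 0);
const layers = [buildLayer(points, sparseIds, 2), buildLayer(points, allIds, 2)];

const result = layeredSearch(points, layers, [42.2], 0);
console.log(result.index, result.evaluated);
// index 42, using far fewer distance evaluations than a scan of all 100 points
```

The upper layer covers most of the distance in a handful of hops, which is where the sub-linear behavior comes from. Greedy search can still stop at a local minimum on less regular data, which is why the results are approximate.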

🗜 Vector Quantization

Raw float vectors are memory-heavy. Simile applies vector quantization to reduce memory usage while keeping similarity quality high.

This matters when:

running inside Node.js
embedding catalogs that aren’t tiny
keeping everything in memory
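As a rough picture of the memory trade-off, here is a minimal sketch of one simple scheme: int8 scalar quantization with a per-vector scale. This is an assumption chosen for illustration. The post does not say which quantization scheme Simile uses, and the `quantize` and `dequantize` names are hypothetical.

```javascript
// Quantize a Float32Array into signed 8-bit codes plus one scale factor.
function quantize(vector) {
  let maxAbs = 0;
  for (const v of vector) maxAbs = Math.max(maxAbs, Math.abs(v));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127; // map [-maxAbs, maxAbs] to [-127, 127]
  const codes = new Int8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    codes[i] = Math.round(vector[i] / scale);
  }
  return { codes, scale };
}

// Approximate reconstruction: each value is off by at most about scale / 2.
function dequantize({ codes, scale }) {
  const out = new Float32Array(codes.length);
  for (let i = 0; i < codes.length; i++) out[i] = codes[i] * scale;
  return out;
}

const original = new Float32Array([0.12, -0.5, 0.33, 0.9, -0.07]);
const packed = quantize(original);
const restored = dequantize(packed);

console.log(original.byteLength, packed.codes.byteLength); // 20 5
```

Each component shrinks from four bytes to one, at the cost of a small, bounded rounding error per component.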

💾 Vector Caching & Persistence

Embedding is the slowest part of semantic search.

Simile avoids repeating work by:

caching vectors for previously seen text
allowing full snapshot save/load
restoring from a snapshot without re-embedding

This makes it viable for real backend services.
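A minimal sketch of that pattern, under stated assumptions: the `VectorCache` class, its methods, and the JSON snapshot format are all hypothetical and not Simile's API, and `fakeEmbed` is a synchronous stand-in for a real (normally asynchronous) embedding model.

```javascript
// Cache keyed by the exact input text; vectors are plain number arrays.
class VectorCache {
  constructor(embedFn) {
    this.embedFn = embedFn; // (text) => number[]
    this.vectors = new Map();
  }

  // Return the cached vector, embedding only on the first request.
  get(text) {
    if (!this.vectors.has(text)) {
      this.vectors.set(text, this.embedFn(text));
    }
    return this.vectors.get(text);
  }

  // Serialize all cached vectors so they can be written to disk.
  toSnapshot() {
    return JSON.stringify([...this.vectors.entries()]);
  }

  // Rebuild a cache from a snapshot without calling the embedder.
  static fromSnapshot(json, embedFn) {
    const cache = new VectorCache(embedFn);
    for (const [text, vector] of JSON.parse(json)) {
      cache.vectors.set(text, vector);
    }
    return cache;
  }
}

// ASSUMPTION: a fake embedder that only counts how often it is called.
let embedCalls = 0;
function fakeEmbed(text) {
  embedCalls++;
  return Array.from(text).slice(0, 4).map((ch) => ch.charCodeAt(0) / 255);
}

const cache = new VectorCache(fakeEmbed);
cache.get("usb-c cable");
cache.get("usb-c cable"); // served from the cache

const snapshot = cache.toSnapshot();
const restoredCache = VectorCache.fromSnapshot(snapshot, fakeEmbed);
restoredCache.get("usb-c cable"); // served from the snapshot

console.log(embedCalls); // 1
```

In a backend service the snapshot string would typically be written to a file or object store at shutdown and read back at startup.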

🔤 Fuzzy Matching + 🎯 Keyword Boosting

Semantic similarity alone isn’t enough.

Simile blends:

fuzzy matching (typos, partial input)
exact keyword boosting (precision)
normalized scoring so no method dominates unfairly

You can tune the weights depending on your domain.
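One way to picture the blend is a weighted average of scores that each live in the range 0 to 1. The sketch below is an assumption about how such a blend could work, not Simile's actual scoring code. The function names, the Levenshtein-based fuzzy score, the whitespace tokenizer, and the example weights are all hypothetical.

```javascript
// Classic Levenshtein edit distance using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        above + 1, // deletion
        row[j - 1] + 1, // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution or match
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Fuzzy score in [0, 1]: 1 means identical, 0 means nothing in common.
function fuzzyScore(a, b) {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

// Keyword score in [0, 1]: fraction of query tokens that appear exactly.
function keywordScore(query, text) {
  const tokenize = (s) => s.toLowerCase().split(/\s+/).filter(Boolean);
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return 0;
  const textTokens = new Set(tokenize(text));
  const hits = queryTokens.filter((t) => textTokens.has(t)).length;
  return hits / queryTokens.length;
}

// Weighted average of scores, clamped to [0, 1] and normalized by total weight.
function blendScores(scores, weights) {
  let total = 0;
  let weightSum = 0;
  for (const key of Object.keys(weights)) {
    const s = Math.min(1, Math.max(0, scores[key] ?? 0));
    total += weights[key] * s;
    weightSum += weights[key];
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// ASSUMPTION: illustrative weights; the semantic score would come from embeddings.
const weights = { semantic: 0.6, fuzzy: 0.3, keyword: 0.1 };
const blended = blendScores(
  {
    semantic: 0.8,
    fuzzy: fuzzyScore("chargr", "charger"),
    keyword: keywordScore("usb cable", "USB-C cable 2m"),
  },
  weights
);
console.log(blended.toFixed(3));
```

Because every input is normalized to the same range and the result is divided by the total weight, changing one weight shifts the balance without changing the overall scale of the final score.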

🔗 Nested Object Search

Instead of flattening data manually, Simile can search directly across nested paths:

metadata.author.firstName
metadata.tags
items[0].name

This makes it practical for real product catalogs and structured data.
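Resolving paths like these takes only a small helper. The following is a minimal sketch of the general technique, not Simile's implementation. `getByPath` and `extractSearchText` are hypothetical names, and the sample document is made up.

```javascript
// Resolve a dotted path with optional [index] segments against an object.
function getByPath(obj, path) {
  // "items[0].name" becomes ["items", "0", "name"]
  const keys = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
  let current = obj;
  for (const key of keys) {
    if (current == null) return undefined; // missing intermediate value
    current = current[key];
  }
  return current;
}

// Collect the values at each path into one searchable string.
// Arrays (such as tag lists) are flattened one level; missing paths are skipped.
function extractSearchText(doc, paths) {
  return paths
    .map((p) => getByPath(doc, p))
    .flat()
    .filter((v) => typeof v === "string" || typeof v === "number")
    .join(" ");
}

// ASSUMPTION: a made-up document shaped like the paths listed above.
const doc = {
  metadata: {
    author: { firstName: "Ada" },
    tags: ["cables", "usb"],
  },
  items: [{ name: "USB-C cable" }],
};

const text = extractSearchText(doc, [
  "metadata.author.firstName",
  "metadata.tags",
  "items[0].name",
  "metadata.missing.field",
]);
console.log(text); // "Ada cables usb USB-C cable"
```

The resulting string can then be fed to whatever embedding or fuzzy-matching step indexes the document.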

Where This Is Actually Useful

Simile works best for:

product & inventory catalogs
internal tools and dashboards
knowledge bases
autocomplete / typeahead search
privacy-first or offline-capable apps
NestJS backends without extra search infrastructure

It’s not trying to replace Meilisearch, Elasticsearch, or large vector databases.
It’s meant for small-to-medium datasets where meaning matters and infra should stay simple.

Why I Built This

I kept seeing projects where:

a full search engine was overkill
a database existed just to store an index
fuzzy search wasn’t good enough
semantic search required too much setup

Simile is an attempt to close that gap.
