Aavash Baral

I Built an Offline-First Semantic Search Engine in JavaScript

Search is deceptively hard.

Most JavaScript search libraries stop at keywords or fuzzy matching, and most semantic search solutions assume external APIs, vector databases, or hosted services.

I wanted something different:

  • runs fully locally
  • works in Node.js or the browser
  • understands meaning, not just text
  • doesn’t require standing up new infrastructure

That led me to build Simile Search — an offline-first semantic + fuzzy search engine in JavaScript.


What Simile Does Differently

Simile combines multiple techniques instead of relying on a single scoring method:

🧠 Semantic Search (Local Embeddings)

Uses transformer-based embeddings (via transformers.js) to capture meaning, so queries like:

“phone charger” → “USB-C cable”

work even when there’s no keyword overlap.

No APIs. No Python. No server calls.
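
To make the idea concrete, here is a minimal sketch of how embedding-based ranking works once the vectors exist. It is not Simile's internal code: the three-dimensional vectors are made up for illustration (a real model would produce hundreds of dimensions), and `cosineSimilarity` and `catalog` are names I'm assuming here.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings", invented for this example only.
const catalog = [
  { text: "USB-C cable", vector: [0.9, 0.1, 0.2] },
  { text: "Garden hose", vector: [0.1, 0.9, 0.3] },
];
const queryVector = [0.8, 0.2, 0.1]; // stand-in for the embedding of "phone charger"

// Score every item against the query and sort best-first.
const ranked = catalog
  .map((item) => ({ text: item.text, score: cosineSimilarity(queryVector, item.vector) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked[0].text);
```

The query and the top result share no words; the ranking comes entirely from the direction of the vectors.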


⚡ Fast Vector Search with HNSW

To keep semantic search fast, Simile uses HNSW (Hierarchical Navigable Small World) indexing for approximate nearest-neighbor search.

This gives:

  • sub-linear search time
  • predictable performance as the catalog grows
  • practical latency for interactive search
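
A full HNSW index is too long to show here, but the core move it repeats on every layer is a greedy walk over a neighbor graph. The sketch below shows that walk on a single layer only. It is a simplification and not Simile's implementation: the graph is built by brute force (real HNSW inserts points incrementally across several layers), and `buildNeighborGraph` and `greedySearch` are hypothetical names.

```javascript
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Link every point to its m nearest points (brute force, for illustration only).
function buildNeighborGraph(points, m) {
  const graph = points.map(() => new Set());
  points.forEach((p, i) => {
    const nearest = points
      .map((q, j) => ({ j, dist: squaredDistance(p, q) }))
      .filter((entry) => entry.j !== i)
      .sort((x, y) => x.dist - y.dist)
      .slice(0, m);
    for (const { j } of nearest) {
      graph[i].add(j);
      graph[j].add(i); // keep links bidirectional
    }
  });
  return graph;
}

// Greedy walk: move to the neighbor closest to the query,
// and stop when no neighbor is closer than the current node.
function greedySearch(points, graph, query, entryPoint) {
  let current = entryPoint;
  let currentDist = squaredDistance(points[current], query);
  while (true) {
    let best = current;
    let bestDist = currentDist;
    for (const neighbor of graph[current]) {
      const dist = squaredDistance(points[neighbor], query);
      if (dist < bestDist) {
        best = neighbor;
        bestDist = dist;
      }
    }
    if (best === current) return current; // local minimum reached
    current = best;
    currentDist = bestDist;
  }
}

// Ten points on a line; the walk starts at index 0 and heads toward the query.
const points = Array.from({ length: 10 }, (_, i) => [i, 0]);
const graph = buildNeighborGraph(points, 2);
const nearestIndex = greedySearch(points, graph, [7.2, 0], 0);
console.log(nearestIndex);
```

Each step only looks at the current node's neighbors instead of the whole catalog, which is where the speedup comes from. The walk can stop at a local minimum, which is why the result is approximate.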

🗜 Vector Quantization

Raw float vectors are memory-heavy. Simile applies vector quantization to reduce memory usage while keeping similarity quality high.

This matters when:

  • running inside Node.js
  • embedding catalogs that aren’t tiny
  • keeping everything in memory
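
As a rough illustration of the memory trade-off, here is a sketch of 8-bit scalar quantization with one scale factor per vector. I'm assuming this scheme purely for the example; the post doesn't say which quantization method Simile uses, and `quantize` and `dequantize` are hypothetical names.

```javascript
// Quantize a float vector to signed 8-bit integers with a per-vector scale.
function quantize(vector) {
  let maxAbs = 0;
  for (const v of vector) maxAbs = Math.max(maxAbs, Math.abs(v));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127; // map the largest value to +/-127
  const values = new Int8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    values[i] = Math.round(vector[i] / scale);
  }
  return { values, scale };
}

// Recover approximate floats from the quantized form.
function dequantize({ values, scale }) {
  return Float32Array.from(values, (v) => v * scale);
}

const original = new Float32Array([0.12, -0.5, 0.33, 0.9]);
const quantized = quantize(original);
const restored = dequantize(quantized);

console.log(original.byteLength, quantized.values.byteLength);
```

Each component drops from 4 bytes to 1, at the cost of a rounding error of at most about half the scale per component.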

💾 Vector Caching & Persistence

Embedding is the slowest part of semantic search.

Simile avoids repeating work by:

  • caching vectors for previously seen text
  • allowing full snapshot save/load
  • restoring from a snapshot without re-embedding

This makes it viable for real backend services.
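
The caching idea can be sketched in a few lines. This is an assumption-laden example, not Simile's API: `VectorCache`, `toSnapshot`, and `fromSnapshot` are hypothetical names, and the embedder is a fake function that just counts how often it is called.

```javascript
// Cache that embeds each distinct text at most once.
class VectorCache {
  constructor(embed) {
    this.embed = embed; // async (text) => number[]
    this.vectors = new Map(); // text -> vector
  }

  async get(text) {
    if (!this.vectors.has(text)) {
      this.vectors.set(text, await this.embed(text));
    }
    return this.vectors.get(text);
  }

  // Serialize every cached vector to a JSON string.
  toSnapshot() {
    return JSON.stringify([...this.vectors.entries()]);
  }

  // Rebuild a cache from a snapshot without calling the embedder.
  static fromSnapshot(json, embed) {
    const cache = new VectorCache(embed);
    cache.vectors = new Map(JSON.parse(json));
    return cache;
  }
}

async function demo() {
  let embedCalls = 0;
  // Fake embedder for illustration; a real one would run a model.
  const fakeEmbed = async (text) => {
    embedCalls += 1;
    return [text.length, text.charCodeAt(0)];
  };

  const cache = new VectorCache(fakeEmbed);
  await cache.get("usb-c cable");
  await cache.get("usb-c cable"); // served from the cache

  const restored = VectorCache.fromSnapshot(cache.toSnapshot(), fakeEmbed);
  await restored.get("usb-c cable"); // served from the snapshot

  return embedCalls;
}

demo().then((calls) => console.log("embed calls:", calls));
```

The snapshot here is a JSON string, so it can be written to a file in Node.js or to browser storage; a binary format would be more compact for large catalogs.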


🔤 Fuzzy Matching + 🎯 Keyword Boosting

Semantic similarity alone isn’t enough.

Simile blends:

  • fuzzy matching (typos, partial input)
  • exact keyword boosting (precision)
  • normalized scoring so no method dominates unfairly

You can tune the weights depending on your domain.
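
Here is one way such a blend could look, as a sketch rather than Simile's actual scoring code. I'm assuming Levenshtein distance for the fuzzy part, exact token overlap for the keyword part, and a weighted average over scores that all live in the range 0 to 1; the function names and the weights are hypothetical.

```javascript
// Levenshtein edit distance using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prevDiagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const temp = row[j];
      row[j] = Math.min(
        row[j] + 1, // deletion
        row[j - 1] + 1, // insertion
        prevDiagonal + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
      prevDiagonal = temp;
    }
  }
  return row[b.length];
}

// Fuzzy similarity in [0, 1]: 1 means identical strings.
function fuzzyScore(query, text) {
  const longest = Math.max(query.length, text.length);
  return longest === 0 ? 1 : 1 - levenshtein(query, text) / longest;
}

function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Keyword score in [0, 1]: fraction of query tokens found exactly in the text.
function keywordScore(query, text) {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return 0;
  const textTokens = new Set(tokenize(text));
  return queryTokens.filter((t) => textTokens.has(t)).length / queryTokens.length;
}

// Weighted average of per-method scores; semantic similarity is clamped to [0, 1].
function blendScores(scores, weights) {
  const semantic = Math.min(1, Math.max(0, scores.semantic));
  const total = weights.semantic + weights.fuzzy + weights.keyword;
  return (
    (weights.semantic * semantic + weights.fuzzy * scores.fuzzy + weights.keyword * scores.keyword) /
    total
  );
}

const weights = { semantic: 0.6, fuzzy: 0.2, keyword: 0.2 }; // illustrative values only
const finalScore = blendScores(
  {
    semantic: 0.8, // pretend cosine similarity from the embedding step
    fuzzy: fuzzyScore("cabel", "cable"),
    keyword: keywordScore("usb cable", "USB-C cable 2m"),
  },
  weights
);
console.log(finalScore.toFixed(2));
```

Because every component is already on the same 0 to 1 scale before weighting, changing a weight shifts the balance without letting one method swamp the others.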


🔗 Nested Object Search

Instead of flattening data manually, Simile can search directly across nested paths:

metadata.author.firstName
metadata.tags
items[0].name

This makes it practical for real product catalogs and structured data.
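
For a sense of what resolving those paths involves, here is a small sketch. It is not Simile's code: `getByPath` and `extractText` are hypothetical helpers, and the sample document is invented to match the paths above.

```javascript
// Resolve a path such as "items[0].name" against a nested object.
function getByPath(obj, path) {
  // Turn "items[0].name" into ["items", "0", "name"].
  const keys = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
  let current = obj;
  for (const key of keys) {
    if (current === null || current === undefined) return undefined;
    current = current[key];
  }
  return current;
}

// Collect searchable text from several paths; array values are flattened one level.
function extractText(doc, paths) {
  return paths
    .map((p) => getByPath(doc, p))
    .flat()
    .filter((v) => typeof v === "string" || typeof v === "number")
    .join(" ");
}

const product = {
  metadata: {
    author: { firstName: "Ada" },
    tags: ["cables", "usb"],
  },
  items: [{ name: "USB-C cable" }],
};

const searchable = extractText(product, [
  "metadata.author.firstName",
  "metadata.tags",
  "items[0].name",
]);
console.log(searchable);
```

Missing paths resolve to `undefined` and are dropped, so documents with uneven shapes can share one path list.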


Where This Is Actually Useful

Simile works best for:

  • product & inventory catalogs
  • internal tools and dashboards
  • knowledge bases
  • autocomplete / typeahead search
  • privacy-first or offline-capable apps
  • NestJS backends without extra search infrastructure

It’s not trying to replace Meilisearch, Elasticsearch, or large vector databases.
It’s meant for small-to-medium datasets where meaning matters and infra should stay simple.


Why I Built This

I kept seeing projects where:

  • a full search engine was overkill
  • a database existed just to store an index
  • fuzzy search wasn’t good enough
  • semantic search required too much setup

Simile is an attempt to close that gap.


Links

I’m sharing this to get feedback from people building search, developer tooling, or AI-powered UX.
