<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aavash Baral</title>
    <description>The latest articles on DEV Community by Aavash Baral (@iaavas).</description>
    <link>https://dev.to/iaavas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3683667%2Fa1d274a3-d66f-4e56-9b6e-09ff6d1e2004.jpeg</url>
      <title>DEV Community: Aavash Baral</title>
      <link>https://dev.to/iaavas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/iaavas"/>
    <language>en</language>
    <item>
      <title>I Built an Offline-First Semantic Search Engine in JavaScript</title>
      <dc:creator>Aavash Baral</dc:creator>
      <pubDate>Mon, 29 Dec 2025 06:57:52 +0000</pubDate>
      <link>https://dev.to/iaavas/i-built-an-offline-first-semantic-search-engine-in-javascript-345b</link>
      <guid>https://dev.to/iaavas/i-built-an-offline-first-semantic-search-engine-in-javascript-345b</guid>
      <description>&lt;h1&gt;
  
  
  I Built an Offline-First Semantic Search Engine in JavaScript
&lt;/h1&gt;

&lt;p&gt;Search is deceptively hard.&lt;/p&gt;

&lt;p&gt;Most JavaScript search libraries stop at &lt;strong&gt;keywords or fuzzy matching&lt;/strong&gt;, and most semantic search solutions assume &lt;strong&gt;external APIs, vector databases, or hosted services&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I wanted something different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;runs &lt;strong&gt;fully locally&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;works in &lt;strong&gt;Node.js or the browser&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;understands &lt;strong&gt;meaning&lt;/strong&gt;, not just text&lt;/li&gt;
&lt;li&gt;doesn’t require standing up new infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That led me to build &lt;strong&gt;Simile Search&lt;/strong&gt; — an offline-first &lt;strong&gt;semantic + fuzzy search engine&lt;/strong&gt; in JavaScript.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Simile Does Differently
&lt;/h2&gt;

&lt;p&gt;Simile combines multiple techniques instead of relying on a single scoring method:&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 Semantic Search (Local Embeddings)
&lt;/h3&gt;

&lt;p&gt;Simile uses transformer-based embeddings (via &lt;code&gt;transformers.js&lt;/code&gt;) to capture &lt;strong&gt;meaning&lt;/strong&gt;, so queries like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“phone charger” → “USB-C cable”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;work even when there’s no keyword overlap.&lt;/p&gt;

&lt;p&gt;No APIs. No Python. No server calls.&lt;/p&gt;
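
&lt;p&gt;As a rough illustration of how embedding-based matching ranks results, the sketch below compares vectors by cosine similarity. The three-dimensional vectors are made-up stand-ins for real model output, and &lt;code&gt;cosineSimilarity&lt;/code&gt; and &lt;code&gt;rankBySimilarity&lt;/code&gt; are hypothetical helpers, not part of Simile’s API.&lt;/p&gt;

```typescript
// Hypothetical helper: cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    dot += x * b[i];
    normA += x * x;
    normB += b[i] * b[i];
  });
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

type EmbeddedItem = { text: string; vector: number[] };

// Score every item against the query vector and sort best-first.
function rankBySimilarity(query: number[], items: EmbeddedItem[]) {
  return items
    .map((item) => ({ text: item.text, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors invented for illustration; a real model emits hundreds of dimensions.
const catalog: EmbeddedItem[] = [
  { text: "USB-C cable", vector: [0.9, 0.1, 0.2] },
  { text: "Garden hose", vector: [0.1, 0.9, 0.3] },
];
const phoneCharger = [0.8, 0.2, 0.1]; // pretend embedding of the query "phone charger"
const ranked = rankBySimilarity(phoneCharger, catalog);
console.log(ranked[0].text); // USB-C cable
```

&lt;p&gt;The query and the top result share no words; they rank together only because their vectors point in nearly the same direction.&lt;/p&gt;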




&lt;h3&gt;
  
  
  ⚡ Fast Vector Search with HNSW
&lt;/h3&gt;

&lt;p&gt;To keep semantic search fast, Simile uses &lt;strong&gt;HNSW (Hierarchical Navigable Small World)&lt;/strong&gt; indexing for approximate nearest-neighbor search.&lt;/p&gt;

&lt;p&gt;This gives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sub-linear search time&lt;/li&gt;
&lt;li&gt;predictable performance as the catalog grows&lt;/li&gt;
&lt;li&gt;practical latency for interactive search&lt;/li&gt;
&lt;/ul&gt;
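
&lt;p&gt;The core move inside an HNSW layer is a greedy walk over a neighbor graph. The sketch below shows only that walk on a single hand-built layer; a full HNSW index stacks several layers and tracks a list of candidates rather than one node. The graph, the one-dimensional points, and the helper names are invented for illustration and do not reflect Simile’s internals.&lt;/p&gt;

```typescript
// One layer of a proximity graph: node id -> ids of its neighbors.
type Graph = { [id: number]: number[] };

function squaredDistance(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => {
    const diff = x - b[i];
    return sum + diff * diff;
  }, 0);
}

// Greedy walk: scan the current node's neighbors, hop to the one closest to
// the query, and stop once no neighbor improves on the current node.
function greedySearch(vectors: number[][], graph: Graph, entry: number, query: number[]) {
  let current = entry;
  let best = squaredDistance(vectors[current], query);
  let improved = true;
  while (improved) {
    improved = false;
    for (const neighbor of graph[current]) {
      const d = squaredDistance(vectors[neighbor], query);
      if (best > d) {
        best = d;
        current = neighbor;
        improved = true;
      }
    }
  }
  return { id: current, distance: best };
}

// Six points on a line, with one long-range link (0 to 3) acting as a shortcut.
const points = [[0], [1], [2], [3], [4], [5]];
const graph: Graph = {
  0: [1, 3],
  1: [0, 2],
  2: [1, 3],
  3: [0, 2, 4],
  4: [3, 5],
  5: [4],
};
const result = greedySearch(points, graph, 0, [4.2]);
console.log(result.id); // 4
```

&lt;p&gt;The shortcut lets the walk jump from node 0 to node 3 in one step instead of visiting every node in between, which is the intuition behind the sub-linear search time.&lt;/p&gt;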




&lt;h3&gt;
  
  
  🗜 Vector Quantization
&lt;/h3&gt;

&lt;p&gt;Raw float vectors are memory-heavy: stored as 32-bit floats, every dimension of every embedding costs 4 bytes. Simile applies &lt;strong&gt;vector quantization&lt;/strong&gt; to reduce memory usage while keeping similarity quality high.&lt;/p&gt;

&lt;p&gt;This matters when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;running inside Node.js&lt;/li&gt;
&lt;li&gt;embedding catalogs that aren’t tiny&lt;/li&gt;
&lt;li&gt;keeping everything in memory&lt;/li&gt;
&lt;/ul&gt;
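
&lt;p&gt;The post does not say which quantization scheme Simile uses, so the sketch below assumes one common option: scalar quantization to signed 8-bit integers with a single scale factor per vector. &lt;code&gt;quantize&lt;/code&gt; and &lt;code&gt;dequantize&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
// Assumed scheme: one scale factor per vector, one signed byte per dimension.
type QuantizedVector = { values: Int8Array; scale: number };

function quantize(vector: number[]): QuantizedVector {
  const maxAbs = vector.reduce((m, x) => Math.max(m, Math.abs(x)), 0);
  // Map the largest magnitude to 127; fall back to 1 for an all-zero vector.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = new Int8Array(vector.map((x) => Math.round(x / scale)));
  return { values, scale };
}

function dequantize(q: QuantizedVector): number[] {
  const out: number[] = [];
  q.values.forEach((v) => out.push(v * q.scale));
  return out;
}

const original = [0.12, -0.5, 0.33, 1.0];
const quantized = quantize(original);
const restoredVector = dequantize(quantized);

console.log(quantized.values.byteLength); // 4
console.log(new Float32Array(original).byteLength); // 16
```

&lt;p&gt;Each restored component differs from the original by at most half the scale factor, which is the price paid for storing one byte per dimension instead of four.&lt;/p&gt;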




&lt;h3&gt;
  
  
  💾 Vector Caching &amp;amp; Persistence
&lt;/h3&gt;

&lt;p&gt;Computing embeddings is usually the slowest part of semantic search.&lt;/p&gt;

&lt;p&gt;Simile avoids repeating work by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;caching vectors for previously seen text&lt;/li&gt;
&lt;li&gt;allowing full snapshot save/load&lt;/li&gt;
&lt;li&gt;restoring from a snapshot without re-embedding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it viable for real backend services.&lt;/p&gt;
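
&lt;p&gt;A minimal sketch of the caching idea, assuming vectors are keyed by their exact input text and snapshots are plain JSON. &lt;code&gt;createVectorCache&lt;/code&gt;, &lt;code&gt;loadVectorCache&lt;/code&gt;, and the fake embedder are all hypothetical; Simile’s real save/load API may look different.&lt;/p&gt;

```typescript
// Hypothetical embedder signature; a real one would run a model.
type Embedder = (text: string) => number[];
type Snapshot = { [text: string]: number[] };

function createVectorCache(embed: Embedder, initial: Snapshot = {}) {
  // Null-prototype object so keys like "constructor" behave as ordinary text.
  const store: Snapshot = Object.create(null);
  Object.keys(initial).forEach((key) => {
    store[key] = initial[key];
  });
  return {
    get(text: string): number[] {
      if (!(text in store)) {
        store[text] = embed(text); // only runs on a cache miss
      }
      return store[text];
    },
    save(): string {
      return JSON.stringify(store);
    },
  };
}

function loadVectorCache(embed: Embedder, json: string) {
  return createVectorCache(embed, JSON.parse(json));
}

let embedCalls = 0;
// Stand-in embedder made up for the demo: counts calls and returns a fake vector.
const fakeEmbed: Embedder = (text) => {
  embedCalls += 1;
  return [text.length, text.charCodeAt(0) || 0];
};

const cache = createVectorCache(fakeEmbed);
cache.get("usb-c cable");
cache.get("usb-c cable"); // served from the cache
const restored = loadVectorCache(fakeEmbed, cache.save());
restored.get("usb-c cable"); // served from the snapshot
console.log(embedCalls); // 1
```

&lt;p&gt;Three lookups trigger a single embedding call, including the one made after a simulated restart from the snapshot.&lt;/p&gt;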




&lt;h3&gt;
  
  
  🔤 Fuzzy Matching + 🎯 Keyword Boosting
&lt;/h3&gt;

&lt;p&gt;Semantic similarity alone isn’t enough.&lt;/p&gt;

&lt;p&gt;Simile blends:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;fuzzy matching&lt;/strong&gt; (typos, partial input)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;exact keyword boosting&lt;/strong&gt; (precision)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;normalized scoring&lt;/strong&gt; so that each method’s scores share a common scale and no single method dominates the final ranking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can tune the weights depending on your domain.&lt;/p&gt;
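
&lt;p&gt;One way such blending could work is sketched below: each score list is min-max normalized to the 0..1 range, then combined as a weighted average. The weight names, the sample scores, and the formula itself are assumptions for illustration; the post does not spell out Simile’s actual scoring.&lt;/p&gt;

```typescript
type ScoreSet = { semantic: number[]; fuzzy: number[]; keyword: number[] };
type Weights = { semantic: number; fuzzy: number; keyword: number };

// Min-max normalization: rescale a list of scores into the range 0..1.
function normalize(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0);
  return scores.map((s) => (s - min) / (max - min));
}

// Weighted average of the three normalized score lists, one value per document.
function blend(scores: ScoreSet, weights: Weights): number[] {
  const semantic = normalize(scores.semantic);
  const fuzzy = normalize(scores.fuzzy);
  const keyword = normalize(scores.keyword);
  const total = weights.semantic + weights.fuzzy + weights.keyword;
  return semantic.map(
    (s, i) =>
      (weights.semantic * s + weights.fuzzy * fuzzy[i] + weights.keyword * keyword[i]) / total
  );
}

// Made-up raw scores for three documents; note the keyword list is a hit count,
// so it lives on a different scale until it is normalized.
const rawScores: ScoreSet = {
  semantic: [0.82, 0.78, 0.4],
  fuzzy: [0.2, 0.9, 0.1],
  keyword: [0, 2, 1],
};
const blended = blend(rawScores, { semantic: 0.6, fuzzy: 0.25, keyword: 0.15 });
const bestIndex = blended.indexOf(Math.max(...blended));
console.log(bestIndex); // 1
```

&lt;p&gt;The second document wins even though the first has the highest semantic score, because its fuzzy and keyword matches are much stronger once all three lists sit on the same scale.&lt;/p&gt;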




&lt;h3&gt;
  
  
  🔗 Nested Object Search
&lt;/h3&gt;

&lt;p&gt;Instead of flattening data manually, Simile can search directly across nested paths:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;firstName&lt;/span&gt;
&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;
&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes it practical for real product catalogs and structured data.&lt;/p&gt;
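
&lt;p&gt;Resolving paths like these against a document can be sketched in a few lines. &lt;code&gt;getByPath&lt;/code&gt; and &lt;code&gt;textAtPath&lt;/code&gt; are hypothetical helpers and the sample document is invented; this is not how Simile is known to implement it.&lt;/p&gt;

```typescript
// Resolve a path such as "items[0].name" against a nested object.
function getByPath(source: unknown, path: string): unknown {
  // Turn "items[0].name" into ["items", "0", "name"].
  const keys = path
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .filter((k) => k !== "");
  let current: unknown = source;
  for (const key of keys) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as { [k: string]: unknown })[key];
  }
  return current;
}

// Flatten whatever sits at the path into a single searchable string.
function textAtPath(source: unknown, path: string): string {
  const value = getByPath(source, path);
  if (Array.isArray(value)) return value.join(" ");
  return value === undefined || value === null ? "" : String(value);
}

// Sample document made up for the demo.
const doc = {
  metadata: { author: { firstName: "Ada" }, tags: ["search", "offline"] },
  items: [{ name: "USB-C cable" }],
};
console.log(textAtPath(doc, "metadata.author.firstName")); // Ada
console.log(textAtPath(doc, "metadata.tags")); // search offline
console.log(textAtPath(doc, "items[0].name")); // USB-C cable
```

&lt;p&gt;Missing paths resolve to an empty string rather than throwing, so partially filled records can still be indexed.&lt;/p&gt;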




&lt;h2&gt;
  
  
  Where This Is Actually Useful
&lt;/h2&gt;

&lt;p&gt;Simile works best for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;product &amp;amp; inventory catalogs&lt;/li&gt;
&lt;li&gt;internal tools and dashboards&lt;/li&gt;
&lt;li&gt;knowledge bases&lt;/li&gt;
&lt;li&gt;autocomplete / typeahead search&lt;/li&gt;
&lt;li&gt;privacy-first or offline-capable apps&lt;/li&gt;
&lt;li&gt;NestJS backends without extra search infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s &lt;strong&gt;not&lt;/strong&gt; trying to replace Meilisearch, Elasticsearch, or large vector databases.&lt;br&gt;
It’s meant for &lt;strong&gt;small-to-medium datasets where meaning matters and infra should stay simple&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;I kept seeing projects where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a full search engine was overkill&lt;/li&gt;
&lt;li&gt;a database existed just to store an index&lt;/li&gt;
&lt;li&gt;fuzzy search wasn’t good enough&lt;/li&gt;
&lt;li&gt;semantic search required too much setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simile is an attempt to close that gap.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;npm: &lt;a href="https://www.npmjs.com/package/simile-search" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/simile-search&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/iaavas/simile-search" rel="noopener noreferrer"&gt;https://github.com/iaavas/simile-search&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m sharing this to get feedback from people building search, developer tooling, or AI-powered UX.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>javascript</category>
      <category>showdev</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
