<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aakash T M</title>
    <description>The latest articles on DEV Community by Aakash T M (@thealpha93).</description>
    <link>https://dev.to/thealpha93</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F355447%2Fcfe3705a-0167-4e41-87cd-9087a4e11397.png</url>
      <title>DEV Community: Aakash T M</title>
      <link>https://dev.to/thealpha93</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thealpha93"/>
    <language>en</language>
    <item>
      <title>I built a vector search library in Rust/WASM. Here's what I learned about performance, browser limits, and building in public with AI</title>
      <dc:creator>Aakash T M</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:51:29 +0000</pubDate>
      <link>https://dev.to/thealpha93/i-built-a-vector-search-library-in-rustwasm-heres-what-i-learned-about-performance-browser-172c</link>
      <guid>https://dev.to/thealpha93/i-built-a-vector-search-library-in-rustwasm-heres-what-i-learned-about-performance-browser-172c</guid>
      <description>&lt;p&gt;I wanted to build a privacy-first RAG app. The kind where your documents never leave the browser. It means no API keys, no server, no third-party vector database watching what you search for.&lt;/p&gt;

&lt;p&gt;The architecture was obvious: embed documents client-side with something like &lt;code&gt;Transformers.js&lt;/code&gt;, store the vectors locally, and search them with cosine similarity. Simple enough. Except the "search them" part fell apart at about 5,000 vectors.&lt;/p&gt;

&lt;p&gt;Pure JavaScript vector search has a ceiling, and it's lower than you'd think.&lt;/p&gt;

&lt;p&gt;The math itself isn't that complicated: cosine similarity is just a dot product divided by the product of two norms. But when you're doing it across 10,000 vectors, each with 1,536 dimensions (standard for OpenAI embeddings), you're running roughly 15 million floating-point multiplications per query. JavaScript's garbage collector doesn't care that you're in a hot loop. It will pause when it wants to.&lt;/p&gt;
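To make that concrete, here's the scalar version of the hot loop in TypeScript (a minimal sketch for illustration, not VecLite's actual implementation):

```typescript
// Scalar cosine similarity over two Float32Array vectors.
// For a 1,536-dim embedding this loop body runs 1,536 times per pair,
// so one query against 10,000 vectors costs ~15M multiply-adds.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i !== a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Every one of those additions and multiplications happens one at a time in JavaScript; SIMD is what lets WASM do four of them per instruction.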

&lt;p&gt;I benchmarked every existing client-side library I could find. The results were consistent:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;client-vector-search&lt;/td&gt;
&lt;td&gt;Pure JS&lt;/td&gt;
&lt;td&gt;Brute-force&lt;/td&gt;
&lt;td&gt;~37ms at 10k&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MeMemo&lt;/td&gt;
&lt;td&gt;Pure JS&lt;/td&gt;
&lt;td&gt;HNSW&lt;/td&gt;
&lt;td&gt;~17ms at 10k, but minutes to build the index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vectra&lt;/td&gt;
&lt;td&gt;Pure JS&lt;/td&gt;
&lt;td&gt;Brute-force&lt;/td&gt;
&lt;td&gt;Node.js only, requires OpenAI API key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VecLite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Rust/WASM + SIMD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Flat index&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~8ms at 10k&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At 10,000 vectors with 1,536 dimensions, client-vector-search took 37ms per query. MeMemo's HNSW was faster at 17ms, but building its index took minutes, and it was still twice as slow as VecLite's 8ms. Beyond 10k, pure JS starts to hurt: at 100k vectors, you're looking at 1.5+ seconds per search.&lt;/p&gt;

&lt;p&gt;The gap in the ecosystem was clear: nothing existed between "toy library that caps at 1k vectors" and "production vector database that requires a server." I needed something that worked at 100k vectors, entirely in the browser, with sub-second latency.&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/thealpha93/VecLite" rel="noopener noreferrer"&gt;VecLite&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The architecture decision
&lt;/h2&gt;

&lt;p&gt;The first question was whether to stay in JavaScript or reach for something lower-level. I chose Rust compiled to WebAssembly, and every subsequent decision cascaded from that one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Rust/WASM
&lt;/h3&gt;

&lt;p&gt;Rust/WASM gives you three things JavaScript can't:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No garbage collection.&lt;/strong&gt; WASM linear memory is a flat buffer. No GC pauses, no surprise latency spikes during search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SIMD instructions.&lt;/strong&gt; WebAssembly SIMD (&lt;code&gt;simd128&lt;/code&gt;) lets you process four &lt;code&gt;f32&lt;/code&gt; values per instruction. At 1,536 dimensions, that's 384 SIMD iterations instead of 1,536 scalar ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable memory layout.&lt;/strong&gt; Contiguous &lt;code&gt;f32&lt;/code&gt; arrays mean cache-friendly access patterns, which matters enormously when you're iterating over 100k vectors.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This isn't speculative. PGlite, DuckDB-WASM, and Figma all ship performance-critical native code to the browser as WASM for the same reasons. The toolchain is mature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqdhcnob6n9i273g0hx1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqdhcnob6n9i273g0hx1.png" alt="VecLite three-layer architecture: TypeScript API layer handles validation and persistence, WASM boundary enforces batching and Float32Array transfer, Rust core does pure computation with SIMD" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The WASM boundary rules
&lt;/h3&gt;

&lt;p&gt;The architecture has three strict layers: TypeScript, the WASM boundary, and Rust. The boundary rules matter more than the algorithm:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Always batch.&lt;/strong&gt; Never call WASM in a loop. One crossing per operation, no matter how many vectors you're upserting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pass vectors as flat Float32Array.&lt;/strong&gt; Not nested JS arrays. This enables zero-copy transfer into WASM linear memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serialize metadata to JSON before crossing.&lt;/strong&gt; Rust parses it on the other side.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate everything in TypeScript.&lt;/strong&gt; Rust should never receive malformed input: no NaN, no Infinity, no wrong dimensions.&lt;/li&gt;
&lt;/ol&gt;
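A sketch of what the validation-and-flattening step might look like on the TypeScript side (a hypothetical helper following the rules above, not VecLite's exact code):

```typescript
// Validate in TypeScript so Rust never sees malformed input, then
// flatten nested arrays into one contiguous Float32Array so the
// crossing into WASM linear memory can be zero-copy.
function flattenVectors(vectors: number[][], dim: number): Float32Array {
  const out = new Float32Array(vectors.length * dim);
  for (let i = 0; i !== vectors.length; i += 1) {
    const v = vectors[i];
    if (v.length !== dim) {
      throw new Error(`vector ${i}: expected ${dim} dims, got ${v.length}`);
    }
    for (let j = 0; j !== dim; j += 1) {
      // Rejects NaN and Infinity before they reach the boundary.
      if (!Number.isFinite(v[j])) {
        throw new Error(`vector ${i}, component ${j}: not a finite number`);
      }
      out[i * dim + j] = v[j];
    }
  }
  return out;
}
```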

&lt;p&gt;In practice, the &lt;code&gt;upsert&lt;/code&gt; path looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// TypeScript validates, flattens, serialises and then ONE crossing&lt;/span&gt;
&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nf"&gt;flattenVectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;   &lt;span class="c1"&gt;// → contiguous Float32Array&lt;/span&gt;
  &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;metas&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every WASM crossing has overhead. The difference between calling WASM once with 10,000 vectors and calling it 10,000 times with one vector is the difference between 40ms and 4 seconds.&lt;/p&gt;
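The arithmetic behind that claim, with an illustrative (made-up, not measured) fixed cost per crossing:

```typescript
// Illustrative cost model for the WASM boundary. The overhead number
// is hypothetical; the point is the shape of the curve, not the values.
function totalMs(calls: number, overheadMsPerCall: number, computeMs: number): number {
  return calls * overheadMsPerCall + computeMs;
}

const batched = totalMs(1, 0.4, 36);     // one crossing for 10k vectors
const looped = totalMs(10_000, 0.4, 36); // one crossing per vector
// batched stays in the tens of milliseconds; looped climbs into seconds.
```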

&lt;h3&gt;
  
  
  f32 vs f64: the decision that halved memory
&lt;/h3&gt;

&lt;p&gt;All vectors are stored and computed as &lt;code&gt;f32&lt;/code&gt;, never &lt;code&gt;f64&lt;/code&gt;. This was a deliberate choice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI, Cohere, and most embedding models output &lt;code&gt;f32&lt;/code&gt; precision. There's no meaningful accuracy gain from &lt;code&gt;f64&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;At 100k vectors × 1,536 dimensions, &lt;code&gt;f32&lt;/code&gt; uses ~600MB. &lt;code&gt;f64&lt;/code&gt; would be 1.2GB. At that scale, your browser crashes.&lt;/li&gt;
&lt;li&gt;WASM SIMD intrinsics process four &lt;code&gt;f32&lt;/code&gt; values per instruction. With &lt;code&gt;f64&lt;/code&gt; you'd get two.&lt;/li&gt;
&lt;/ul&gt;
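The memory math is straightforward arithmetic, nothing more than bytes-per-element times element count:

```typescript
// Memory footprint of 100k vectors at 1,536 dimensions.
const vectors = 100_000;
const dims = 1_536;

const f32Bytes = vectors * dims * 4; // 614,400,000 bytes, ~600MB
const f64Bytes = vectors * dims * 8; // 1,228,800,000 bytes, ~1.2GB
```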

&lt;h3&gt;
  
  
  The storage adapter pattern: keeping it dumb
&lt;/h3&gt;

&lt;p&gt;Storage adapters in VecLite are deliberately dumb: just a four-method key/value interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;StorageAdapter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The adapter has zero knowledge of vectors, metadata, or search. VecLite owns all serialisation. This means community adapters for localStorage, React Native AsyncStorage, or SQLite are 10–20 lines of code.&lt;/p&gt;
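For example, a complete in-memory adapter fits in about a dozen lines (a hypothetical sketch of a community adapter; real ones would wrap localStorage or IndexedDB instead of a Map):

```typescript
// A Map-backed adapter implementing the four-method key/value
// interface. It knows nothing about vectors, metadata, or search;
// it only stores and retrieves strings.
class MemoryAdapter {
  private store = new Map();

  async get(key: string) {
    const value = this.store.get(key);
    return value === undefined ? null : value;
  }
  async set(key: string, value: string) {
    this.store.set(key, value);
  }
  async delete(key: string) {
    this.store.delete(key);
  }
  async clear() {
    this.store.clear();
  }
}
```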

&lt;p&gt;The temptation was to make the storage layer "smart" by letting it do partial loads or native querying. I resisted. A thin interface is easy to implement, easy to test, and easy to swap. The complexity belongs in one place (VecLite), not spread across every adapter.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest benchmarks
&lt;/h2&gt;

&lt;p&gt;I ran benchmarks with Vitest against a pure-JS &lt;code&gt;Float32Array&lt;/code&gt; implementation, the fairest comparison possible: both use &lt;code&gt;Float32Array&lt;/code&gt;, both use the same brute-force cosine-similarity algorithm, and the only difference is JS versus WASM+SIMD.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I expected vs what actually happened
&lt;/h3&gt;

&lt;p&gt;Based on published Rust/WASM benchmarks, I expected a 10–20× speedup. I got ~4×.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;VecLite (Rust/WASM)&lt;/th&gt;
&lt;th&gt;Pure JS&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10k vectors, dim=1536&lt;/td&gt;
&lt;td&gt;40ms&lt;/td&gt;
&lt;td&gt;152ms&lt;/td&gt;
&lt;td&gt;3.8×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50k vectors, dim=1536&lt;/td&gt;
&lt;td&gt;200ms&lt;/td&gt;
&lt;td&gt;778ms&lt;/td&gt;
&lt;td&gt;3.9×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100k vectors, dim=1536&lt;/td&gt;
&lt;td&gt;400ms&lt;/td&gt;
&lt;td&gt;1,576ms&lt;/td&gt;
&lt;td&gt;3.9×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Why not 20×? Because V8 is genuinely good at optimising tight &lt;code&gt;Float32Array&lt;/code&gt; loops. The JS baseline isn't naive; it's a well-optimised hot loop. The WASM advantage comes from SIMD parallelism and the absence of GC pauses, not from some fundamental inefficiency in JavaScript. 4× is real, repeatable, and honest.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why HNSW lost to flat index at dim=1536
&lt;/h3&gt;

&lt;p&gt;This was the most counterintuitive result. HNSW (Hierarchical Navigable Small World) is the gold standard for approximate nearest neighbour search. Every production vector database uses it. So I implemented it in v0.3.&lt;/p&gt;

&lt;p&gt;It was slower than brute force. At every scale.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scale&lt;/th&gt;
&lt;th&gt;Flat (exact)&lt;/th&gt;
&lt;th&gt;HNSW ef=200&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1k vectors&lt;/td&gt;
&lt;td&gt;0.83ms&lt;/td&gt;
&lt;td&gt;0.95ms&lt;/td&gt;
&lt;td&gt;Flat, 1.14× faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5k vectors&lt;/td&gt;
&lt;td&gt;4.1ms&lt;/td&gt;
&lt;td&gt;4.4ms&lt;/td&gt;
&lt;td&gt;Flat, 1.08× faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10k vectors&lt;/td&gt;
&lt;td&gt;8.2ms&lt;/td&gt;
&lt;td&gt;8.8ms&lt;/td&gt;
&lt;td&gt;Flat, 1.07× faster&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The reason: at 1,536 dimensions, each hop in the HNSW graph traverses a massive vector space. The "neighbourhood structure" that makes HNSW efficient at low dimensions (where nearby points are geometrically clustered) becomes noise at high dimensions. Graph traversal overhead exceeds the savings from reducing the candidate set.&lt;/p&gt;

&lt;p&gt;HNSW would probably win at dim=128 with 500k+ vectors, though I haven't tested that. But at the dimensions real embedding models use (512–3,072), brute-force SIMD is faster at every practical browser scale.&lt;/p&gt;

&lt;p&gt;I kept HNSW in the library for users with specific use cases, but the flat index is the default and recommended path.&lt;/p&gt;

&lt;h3&gt;
  
  
  The filter pre-computation result that surprised me
&lt;/h3&gt;

&lt;p&gt;Pre-filtering (applying metadata filters &lt;em&gt;before&lt;/em&gt; scoring) produced dramatic speedups:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Filter&lt;/th&gt;
&lt;th&gt;Mean latency&lt;/th&gt;
&lt;th&gt;vs unfiltered&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;$gte&lt;/code&gt; (~50% selectivity)&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;td&gt;3.9× faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;$in&lt;/code&gt; (~25% selectivity)&lt;/td&gt;
&lt;td&gt;3ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12× faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At 25% selectivity, you're only computing cosine similarity on a quarter of the vectors. The filter itself is cheap (string/number comparison), so selective filters genuinely eliminate compute rather than just hiding it.&lt;/p&gt;
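The idea in miniature (a hypothetical sketch; VecLite's actual filter syntax is the Mongo-style &lt;code&gt;$gte&lt;/code&gt;/&lt;code&gt;$in&lt;/code&gt; shown above, but the principle is just this):

```typescript
// Pre-filtering: run the cheap metadata predicate first, then score
// only the survivors. At 25% selectivity, 75% of the similarity
// computations never happen at all.
interface Doc {
  vector: Float32Array;
  category: string;
}

function searchFiltered(
  docs: Doc[],
  predicate: (d: Doc) => boolean,
  score: (v: Float32Array) => number,
) {
  const results: { doc: Doc; score: number }[] = [];
  for (const d of docs) {
    if (!predicate(d)) continue; // cheap comparison, skips the math
    results.push({ doc: d, score: score(d.vector) });
  }
  return results.sort((a, b) => b.score - a.score);
}
```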

&lt;h3&gt;
  
  
  P99 spikes
&lt;/h3&gt;

&lt;p&gt;The honest part: p99 latency spikes exist. WASM initialisation has a cold-start cost (mitigated by caching). And at 100k vectors, serialising the full index to JSON for persistence is the real bottleneck — not search.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building with AI
&lt;/h2&gt;

&lt;p&gt;VecLite was built with Claude Code, and one file changed everything: &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How CLAUDE.md changed my workflow
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; is a 257-line file at the root of the repo. It contains the architecture, every locked decision, the API surface, WASM boundary rules, security constraints, and what's deliberately deferred.&lt;/p&gt;

&lt;p&gt;The effect was immediate. With &lt;code&gt;CLAUDE.md&lt;/code&gt; in context, Claude Code stopped suggesting things I'd already rejected: no proposals to inline WASM as base64, no attempts to break the three-layer architecture, no re-inventing HNSW from scratch when the flat index was the right choice.&lt;/p&gt;

&lt;p&gt;It turned the AI from a "clever junior who hasn't read the docs" into a collaborator who understood the project's constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code is good at vs where it needs guidance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Where it excelled:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boilerplate with consistent patterns like adapter implementations, test scaffolding, TypeScript types&lt;/li&gt;
&lt;li&gt;&lt;code&gt;wasm-bindgen&lt;/code&gt; ceremony, i.e. the glue code between JS and Rust&lt;/li&gt;
&lt;li&gt;Test generation: given a function and edge cases, it writes thorough test suites&lt;/li&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Where it needs guidance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rust SIMD intrinsics: the &lt;code&gt;core::arch::wasm32&lt;/code&gt; f32x4 path took iteration. AI-generated SIMD code often compiles but has subtle correctness issues (wrong horizontal sum, missing scalar tail)&lt;/li&gt;
&lt;li&gt;Crate selection for WASM targets: It suggested crates that depend on &lt;code&gt;rayon&lt;/code&gt;, which is incompatible with &lt;code&gt;wasm32-unknown-unknown&lt;/code&gt;. I had to manually verify WASM compatibility for every Rust dependency&lt;/li&gt;
&lt;li&gt;Performance-critical architecture decisions: The &lt;code&gt;u32&lt;/code&gt; bit-pattern trick for HNSW distance metrics (reinterpreting f32 as u32 for the graph's comparison function) required domain expertise&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;veclite/rag:&lt;/strong&gt; A batteries-included RAG pipeline as a sub-path export. Bring a document, get semantic search. Chunking, local embeddings via &lt;code&gt;Transformers.js&lt;/code&gt;, and VecLite search under the hood. Zero config. No API keys. No data leaves the device.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunked persistence:&lt;/strong&gt; The current JSON blob works at 50k vectors but won't scale forever&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Worker support:&lt;/strong&gt; Embedding already runs off-thread, but the core search path could benefit too&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;React Native / Node.js:&lt;/strong&gt; The architecture supports it, the plumbing doesn't exist yet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community adapters:&lt;/strong&gt; SQLite, AsyncStorage, localStorage adapters following the 4-method interface&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Try it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;veclite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/thealpha93/VecLite" rel="noopener noreferrer"&gt;github.com/thealpha93/VecLite&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building something with client-side vector search, I'd love to hear about it. Open an issue, submit a PR, or just star the repo so I know people care. This is being built in public; your feedback shapes the roadmap.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>webassembly</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
