<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yichuan Wang</title>
    <description>The latest articles on DEV Community by Yichuan Wang (@yichuan_wang_fcf06c22a529).</description>
    <link>https://dev.to/yichuan_wang_fcf06c22a529</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3439701%2Fd7916cf2-c54f-45a7-99a7-770c443f1ec3.jpg</url>
      <title>DEV Community: Yichuan Wang</title>
      <link>https://dev.to/yichuan_wang_fcf06c22a529</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yichuan_wang_fcf06c22a529"/>
    <language>en</language>
    <item>
      <title>LEANN: The World's Most Lightweight Semantic Search Backend for RAG Everything 🎉</title>
      <dc:creator>Yichuan Wang</dc:creator>
      <pubDate>Sun, 17 Aug 2025 00:47:15 +0000</pubDate>
      <link>https://dev.to/yichuan_wang_fcf06c22a529/leann-the-worlds-most-lightweight-semantic-search-backend-for-rag-everything-57l9</link>
      <guid>https://dev.to/yichuan_wang_fcf06c22a529/leann-the-worlds-most-lightweight-semantic-search-backend-for-rag-everything-57l9</guid>
      <description>&lt;p&gt;&lt;em&gt;Introducing our team's latest creation - a revolutionary approach to local RAG applications&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: We built LEANN, the world's most "lightweight" semantic search backend that achieves &lt;strong&gt;97% storage savings&lt;/strong&gt; compared to traditional solutions while maintaining high accuracy and performance. Perfect for privacy-focused RAG applications on your local machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Quick Start
&lt;/h2&gt;

&lt;p&gt;Want to try it right now? Run this single command on your MacBook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv pip &lt;span class="nb"&gt;install &lt;/span&gt;leann
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  📚 Repository &amp;amp; Paper
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/yichuan-w/LEANN" rel="noopener noreferrer"&gt;https://github.com/yichuan-w/LEANN&lt;/a&gt; ⭐ (Star us!)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paper&lt;/strong&gt;: Available on arXiv&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0szyfj3ha75e0qf8gpu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0szyfj3ha75e0qf8gpu.png" alt=" " width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RAG Everything?
&lt;/h2&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) has become the first true "killer application" of the LLM era. It seamlessly integrates private data that wasn't part of the training set into large model inference pipelines.&lt;/p&gt;

&lt;p&gt;Privacy-sensitive scenarios are the most important deployment direction - especially for your personal data and in highly sensitive domains like healthcare and finance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG Everything&lt;/strong&gt; starts from the most essential needs of personal laptops. We natively support a bunch of out-of-the-box scenarios (currently supporting macOS and Linux, Windows users need WSL):&lt;/p&gt;

&lt;h3&gt;
  
  
  🔍 Supported Applications
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;File System RAG&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Replace Spotlight search entirely. Spotlight not only consumes disk space, it is also limited to keyword matching. We turn file search into a semantic search powerhouse.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Apple Mail RAG&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Easily find answers to personal questions (like "How many courses should Berkeley EECS freshmen take in their first semester?").&lt;/p&gt;

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Google Browser History RAG&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Track down pages you half-remember visiting - the ones you have only a fuzzy impression of.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. &lt;strong&gt;WeChat Chat History RAG&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;This is what I use most!&lt;/em&gt; I've used LEANN to summarize conversations with friends and extract research ideas + slides. We implemented a small hack to bypass WeChat's encrypted database and extract chat records - don't worry, everything stays local with zero leakage.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. &lt;strong&gt;Claude Code Semantic Search Enhancement&lt;/strong&gt; 🔥
&lt;/h4&gt;

&lt;p&gt;One of Claude Code's biggest pain points is that it's always grepping and finding nothing. LEANN is one of the first open-source projects to bring true semantic search to Claude Code through an MCP server - enabling it with just one line of code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03zqn7k7cxfv7tylgo1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03zqn7k7cxfv7tylgo1r.png" alt=" " width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;These are just the scenarios we think have the most potential - we'll keep integrating features based on user feedback until LEANN becomes a personalized local agent that maintains your LLM's long-term memory and has full command of your private data.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why LEANN? The Technical Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem with Current Vector Databases
&lt;/h3&gt;

&lt;p&gt;Current mainstream vector databases excel at &lt;strong&gt;latency&lt;/strong&gt; - most queries complete within 10-100 ms even with millions of data points. In RAG's search + generation pipeline, search time is far below generation time, especially with reasoning models and long chain-of-thought outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency isn't the bottleneck in RAG - storage is.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most important RAG deployment scenario is &lt;strong&gt;privacy&lt;/strong&gt;, especially on personal computers where resources are naturally scarce. Consider this reality check:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For high recall in text RAG, you need fine chunk sizes → embedding storage becomes &lt;strong&gt;3-10x the original text size&lt;/strong&gt; → Real example: 70GB raw data → 220GB+ index storage&lt;/p&gt;
&lt;/blockquote&gt;
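&lt;p&gt;&lt;em&gt;A back-of-envelope sketch of where the blow-up comes from. The chunk size and embedding dimension below are illustrative assumptions, not the numbers behind the 70GB example above:&lt;/em&gt;&lt;/p&gt;

```python
# Rough estimate of the embedding-to-text storage ratio for fine-grained chunking.
# Assumed (illustrative) parameters: 384-dim float32 embeddings and
# 256-character chunks, counting roughly 1 byte of raw text per character.

def embedding_blowup(chunk_chars=256, dim=384, bytes_per_float=4):
    """Ratio of embedding bytes to raw text bytes for one chunk."""
    embedding_bytes = dim * bytes_per_float  # 384 * 4 = 1536 bytes per chunk
    return embedding_bytes / chunk_chars

print(f"embeddings are {embedding_blowup():.0f}x the raw text")  # 6x here
```

&lt;p&gt;&lt;em&gt;Halve the chunk size or double the embedding dimension and the ratio doubles - which is how fine chunking pushes a 70GB corpus past 200GB of index.&lt;/em&gt;&lt;/p&gt;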

&lt;h3&gt;
  
  
  Our Solution: Trade Storage for Compute
&lt;/h3&gt;

&lt;p&gt;LEANN makes a bold design choice: &lt;strong&gt;replace storage with recomputation&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Innovation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Key Observation&lt;/strong&gt;: In graph-based indices, a query actually accesses very few nodes → Why store all embeddings?&lt;/p&gt;

&lt;p&gt;Our pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build&lt;/strong&gt; a normal vector store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delete&lt;/strong&gt; all embeddings, keeping only the Proximity Graph to record relationships between data chunks
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert&lt;/strong&gt; memory loading to recomputation during inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverage&lt;/strong&gt; lightweight embedding models for efficient graph-based recomputation&lt;/li&gt;
&lt;/ol&gt;
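&lt;p&gt;&lt;em&gt;The pipeline above can be sketched in a few lines: a toy best-first search over a proximity graph that recomputes each node's embedding on demand instead of loading it from storage. Here &lt;code&gt;embed()&lt;/code&gt; is a stand-in for a real lightweight embedding model, and the graph and chunk texts are hypothetical:&lt;/em&gt;&lt;/p&gt;

```python
# Toy best-first graph search with on-demand embedding recomputation.
# After building the index, only `graph` (node adjacency) and `texts`
# (the raw chunks) are kept; no embeddings are stored.
import heapq

def embed(text):
    # Toy deterministic "embedding": a normalized 4-bucket character histogram.
    # A real system would call a lightweight embedding model here.
    vec = [0.0] * 4
    for ch in text:
        vec[ord(ch) % 4] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(graph, texts, query, entry, k=2):
    """graph: node -&gt; neighbor list; texts: node -&gt; chunk text (embeddings deleted)."""
    q_vec = embed(query)
    visited = {entry}
    # Recompute the entry node's embedding instead of reading it from disk.
    heap = [(dist(q_vec, embed(texts[entry])), entry)]
    best = []
    while heap:
        d, node = heapq.heappop(heap)
        best.append((d, node))
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                # Each neighbor's embedding is recomputed only when visited.
                heapq.heappush(heap, (dist(q_vec, embed(texts[nb])), nb))
    return [n for _, n in sorted(best)[:k]]
```

&lt;p&gt;&lt;em&gt;The key point is that only the handful of nodes the search actually touches ever get embedded - which is why deleting all stored embeddings is affordable.&lt;/em&gt;&lt;/p&gt;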

&lt;h3&gt;
  
  
  Graph Structure Pruning
&lt;/h3&gt;

&lt;p&gt;We observed that node visits in post-RNG graphs are heavily skewed toward a small set of nodes. Our strategy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keep high-degree nodes&lt;/strong&gt; to ensure connectivity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limit out-edges&lt;/strong&gt; for low-degree nodes while allowing unlimited in-edges&lt;/li&gt;
&lt;li&gt;Use heuristics to preserve only essential high-degree nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Results That Matter
&lt;/h3&gt;

&lt;p&gt;✅ &lt;strong&gt;97%+ reduction&lt;/strong&gt; in index size&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;&amp;lt;2 seconds&lt;/strong&gt; retrieval time on 3090-level hardware&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;90%+ Top-3 recall&lt;/strong&gt; on real RAG benchmarks&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Zero embedding storage&lt;/strong&gt; - no more 200GB+ embedding files&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: At this compression rate, PQ, OPQ, and even the state-of-the-art RaBitQ cannot maintain high accuracy, as we show in our paper.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Performance Optimizations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive pipeline&lt;/strong&gt; combining coarse-grained and accurate search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient GPU batching&lt;/strong&gt; for better utilization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ZMQ communication&lt;/strong&gt; using distances instead of embeddings&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CPU/GPU overlapping&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selective caching&lt;/strong&gt; of high-degree nodes&lt;/li&gt;
&lt;/ul&gt;
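&lt;p&gt;&lt;em&gt;To make the batching point concrete, here is a hypothetical per-hop sketch: rather than embedding one neighbor at a time, collect every unvisited neighbor of the current frontier and recompute their embeddings in a single call, which keeps a GPU busy. &lt;code&gt;embed_batch()&lt;/code&gt; is a placeholder for one batched model forward pass, not a LEANN API:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch of per-hop batched recomputation for graph search.

def embed_batch(texts):
    # Placeholder for a real batched embedding call (one model forward pass).
    return [[float(len(t))] for t in texts]

def expand_frontier(graph, texts, frontier, visited):
    """Gather all unvisited neighbors of the frontier, then embed them together."""
    candidates = []
    for node in frontier:
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                candidates.append(nb)
    vectors = embed_batch([texts[n] for n in candidates])  # one batched call
    return dict(zip(candidates, vectors))
```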
&lt;h2&gt;
  
  
  The Vision: RAG Everything
&lt;/h2&gt;

&lt;p&gt;We're continuously maintaining this open-source project at &lt;strong&gt;Berkeley SkyLab&lt;/strong&gt; with full-stack optimization across algorithms, applications, system design, vector databases, and kernel acceleration.&lt;/p&gt;
&lt;h3&gt;
  
  
  Our Goals
&lt;/h3&gt;

&lt;p&gt;🎯 &lt;strong&gt;Seamlessly connect&lt;/strong&gt; all your private data&lt;br&gt;&lt;br&gt;
🧠 &lt;strong&gt;Build long-term&lt;/strong&gt; local AI memory and agents&lt;br&gt;&lt;br&gt;
💻 &lt;strong&gt;Zero cloud dependency&lt;/strong&gt;, low-cost operation  &lt;/p&gt;
&lt;h2&gt;
  
  
  Technical Details &amp;amp; Future Work
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;If you want to dive deeper into implementation details, check our arXiv paper and repository. I can write a follow-up post covering all implementation specifics if there's interest.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We hope LEANN inspires more vector search researchers to think about vector databases from a different angle, especially in popular RAG settings. We were fortunate to discuss our work at SIGMOD/ICML vector search workshops this year and received great recognition from the community.&lt;/p&gt;


&lt;h2&gt;
  
  
  Get Involved
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Star&lt;/strong&gt; our repository&lt;/li&gt;
&lt;li&gt;🤝 &lt;strong&gt;Contribute&lt;/strong&gt; to the project
&lt;/li&gt;
&lt;li&gt;🔗 &lt;strong&gt;Join&lt;/strong&gt; our Berkeley SkyLab team&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ready to transform your local machine into a RAG powerhouse?&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv pip &lt;span class="nb"&gt;install &lt;/span&gt;leann
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;em&gt;What private data would you want to RAG first? Drop a comment below! 👇&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Tags
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;#rag&lt;/code&gt; &lt;code&gt;#vectordatabase&lt;/code&gt; &lt;code&gt;#semanticsearch&lt;/code&gt; &lt;code&gt;#privacy&lt;/code&gt; &lt;code&gt;#opensource&lt;/code&gt; &lt;code&gt;#machinelearning&lt;/code&gt; &lt;code&gt;#ai&lt;/code&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
