<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Siraj Lakhani</title>
    <description>The latest articles on DEV Community by Siraj Lakhani (@siraj_lakhani).</description>
    <link>https://dev.to/siraj_lakhani</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3935426%2F6e03e568-c677-4394-959a-7b310e98b2c3.jpg</url>
      <title>DEV Community: Siraj Lakhani</title>
      <link>https://dev.to/siraj_lakhani</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/siraj_lakhani"/>
    <language>en</language>
    <item>
      <title>Search in the LLM Era: Vector RAG vs Vectorless Search vs LLM-Wiki</title>
      <dc:creator>Siraj Lakhani</dc:creator>
      <pubDate>Fri, 15 May 2026 15:03:47 +0000</pubDate>
      <link>https://dev.to/siraj_lakhani/search-in-the-llm-era-vector-rag-vs-vectorless-search-vs-llm-wiki-34ho</link>
      <guid>https://dev.to/siraj_lakhani/search-in-the-llm-era-vector-rag-vs-vectorless-search-vs-llm-wiki-34ho</guid>
      <description>&lt;p&gt;&lt;em&gt;Why the future of AI retrieval is moving beyond vector databases.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A lot of people think the hardest part of building AI systems is the model. In reality, the hardest part is usually search. Once you connect an LLM to real-world data (engineering docs, APIs, support tickets, Terraform modules, codebases, Slack conversations), the biggest challenge becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How does the AI find the right information at the right time?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the real bottleneck. Over the last year, I’ve noticed something interesting:&lt;/p&gt;

&lt;p&gt;The industry is slowly moving beyond the classic “just use vector databases” approach.&lt;/p&gt;

&lt;p&gt;Now we’re seeing three different retrieval styles emerge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vector RAG&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vectorless retrieval&lt;/strong&gt; (PageIndex + FastMemory)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLM-Wiki systems&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each solves a different problem, and understanding the difference changes how you design AI systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Simplest Way to Understand the Difference
&lt;/h3&gt;

&lt;p&gt;Imagine you ask an AI assistant:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Where is the payment retry logic implemented?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each system searches differently.&lt;/p&gt;

&lt;h4&gt;
  
  
  Vector RAG
&lt;/h4&gt;

&lt;p&gt;Searches by &lt;strong&gt;meaning&lt;/strong&gt;. It tries to find text that &lt;em&gt;sounds semantically similar&lt;/em&gt; to your question. Like asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Find me anything related to payment retries.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  PageIndex
&lt;/h4&gt;

&lt;p&gt;Searches by &lt;strong&gt;structure&lt;/strong&gt;. It navigates documents like folders, sections, and hierarchies. Like using a table of contents.&lt;/p&gt;

&lt;h4&gt;
  
  
  FastMemory
&lt;/h4&gt;

&lt;p&gt;Searches by &lt;strong&gt;relationships and memory&lt;/strong&gt;. It remembers how systems connect. Like asking a senior engineer who already knows the architecture.&lt;/p&gt;

&lt;h4&gt;
  
  
  LLM-Wiki
&lt;/h4&gt;

&lt;p&gt;Searches through &lt;strong&gt;organized knowledge&lt;/strong&gt;. Instead of searching raw documents repeatedly, the AI builds its own evolving wiki. Like having an internal Wikipedia for your company.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Vector RAG — Search by Meaning
&lt;/h3&gt;

&lt;p&gt;Vector RAG is still the most common retrieval architecture today. The idea is straightforward: convert documents into embeddings (vectors), then search for similar vectors. The flow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Documents
   ↓
Chunking
   ↓
Embeddings
   ↓
Vector Database
   ↓
Similarity Search
   ↓
LLM Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
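
&lt;p&gt;The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the &lt;code&gt;embed&lt;/code&gt; function is a bag-of-words counter standing in for a real embedding model, the "vector database" is a plain list, and every function name here is an assumption made for the example. Because the toy embedding only matches shared words, it will not bridge different wording the way a real embedding model can.&lt;/p&gt;

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Split text into chunks of `size` words (a naive fixed-size chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(documents):
    """The 'vector database': a list of (chunk_text, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def search(index, query, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

docs = [
    "The retry service re-queues failed payment calls with exponential backoff.",
    "Terraform modules define the production VPC and its subnets.",
]
index = build_index(docs)
print(search(index, "How does payment retry work?"))
```

&lt;p&gt;Swapping &lt;code&gt;embed&lt;/code&gt; for a real embedding model and the list for a vector store keeps the same shape: chunk, embed, store, rank by similarity.&lt;/p&gt;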



&lt;h4&gt;
  
  
  Why People Love It
&lt;/h4&gt;

&lt;p&gt;Vector search is incredibly good at understanding intent. For example, if someone asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“How does payment retry work?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It may still retrieve a document saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Transaction recovery mechanism handles failed gateway calls.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Even though the wording is different. That semantic flexibility is what made RAG explode in popularity.&lt;/p&gt;

&lt;h4&gt;
  
  
  But There’s a Catch
&lt;/h4&gt;

&lt;p&gt;As systems grow, vector search starts creating operational pain, especially in enterprise environments.&lt;/p&gt;

&lt;p&gt;Common issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embedding pipelines become expensive&lt;/li&gt;
&lt;li&gt;Re-indexing takes time&lt;/li&gt;
&lt;li&gt;Large vector databases cost a lot&lt;/li&gt;
&lt;li&gt;Retrieval becomes harder to debug&lt;/li&gt;
&lt;li&gt;Results can be “similar but wrong”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last issue matters more than people realize, because in production systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Almost correct” can still break things.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  2. Vectorless Retrieval — Search by Structure and Memory
&lt;/h3&gt;

&lt;p&gt;This is where things get interesting. A newer category of systems is emerging that avoids depending heavily on embeddings.&lt;/p&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“What sounds similar?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These systems ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“How is the information organized?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“How are concepts connected?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two approaches I’ve been experimenting with recently are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PageIndex&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FastMemory&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  PageIndex — Search Like a File System
&lt;/h3&gt;

&lt;p&gt;PageIndex focuses on structure. Instead of embedding everything, it indexes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sections&lt;/li&gt;
&lt;li&gt;pages&lt;/li&gt;
&lt;li&gt;headings&lt;/li&gt;
&lt;li&gt;file hierarchy&lt;/li&gt;
&lt;li&gt;relationships&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like navigating documentation manually.&lt;/p&gt;

&lt;p&gt;Example&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Repository
 ├── Payments
 │   ├── RetryService
 │   ├── QueueWorker
 │   └── GatewayClient
 │
 ├── Terraform
 └── Runbooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when someone asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Which Terraform file creates the production VPC?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The system walks the hierarchy straight to the Terraform section and looks only there. No embedding similarity is needed.&lt;/p&gt;
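
&lt;p&gt;To make the idea of structural lookup concrete, here is a small Python sketch. It is not the PageIndex API; the nested-dictionary index, the &lt;code&gt;navigate&lt;/code&gt; and &lt;code&gt;find&lt;/code&gt; helpers, and the file names under Terraform and Runbooks are all hypothetical, chosen only to mirror the tree above.&lt;/p&gt;

```python
# Hypothetical structural index: nested dicts for sections, lists for leaf files.
INDEX = {
    "Repository": {
        "Payments": ["RetryService", "QueueWorker", "GatewayClient"],
        "Terraform": ["vpc_production.tf", "iam_roles.tf"],  # invented file names
        "Runbooks": ["payment_outage.md"],                   # invented file name
    }
}

def navigate(index, path):
    """Follow an explicit path of section names and return the node found there."""
    node = index
    for name in path:
        if not isinstance(node, dict) or name not in node:
            raise KeyError("No section named " + repr(name) + " on path " + "/".join(path))
        node = node[name]
    return node

def find(index, keyword, trail=()):
    """Walk the tree and return every full path whose leaf name contains the keyword."""
    hits = []
    for name, child in index.items():
        here = trail + (name,)
        if isinstance(child, dict):
            hits.extend(find(child, keyword, here))
        else:
            hits.extend(here + (leaf,) for leaf in child if keyword.lower() in leaf.lower())
    return hits

print(navigate(INDEX, ["Repository", "Terraform"]))
print(find(INDEX, "vpc"))
```

&lt;p&gt;Each hit comes back as a full path, so the reason a result appeared is visible in the result itself.&lt;/p&gt;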

&lt;h4&gt;
  
  
  Why This Feels Different
&lt;/h4&gt;

&lt;p&gt;Vector search feels probabilistic. PageIndex feels deterministic. You can trace exactly &lt;em&gt;why&lt;/em&gt; the result appeared. That’s incredibly valuable for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;infrastructure systems&lt;/li&gt;
&lt;li&gt;compliance&lt;/li&gt;
&lt;li&gt;security operations&lt;/li&gt;
&lt;li&gt;engineering copilots&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The Tradeoff
&lt;/h4&gt;

&lt;p&gt;The downside is simple:&lt;/p&gt;

&lt;p&gt;PageIndex works best when your data is already well organized. Messy, noisy, unstructured content is harder to navigate this way.&lt;/p&gt;

&lt;h3&gt;
  
  
  FastMemory — Search Like Human Memory
&lt;/h3&gt;

&lt;p&gt;This was probably the most interesting shift for me.&lt;/p&gt;

&lt;p&gt;Instead of searching documents repeatedly, FastMemory tries to preserve relationships between concepts.&lt;/p&gt;

&lt;p&gt;The mental model is closer to human memory than database search.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Payment Service
   ├── Retry Logic
   ├── Gateway Failure
   ├── Fraud Check
   └── Queue System
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of retrieving isolated chunks, the system remembers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what connects together&lt;/li&gt;
&lt;li&gt;what was recently used&lt;/li&gt;
&lt;li&gt;which concepts are related&lt;/li&gt;
&lt;/ul&gt;
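
&lt;p&gt;The three points in the list above (connections, recency, related concepts) can be illustrated with a tiny graph plus a recency buffer. This is a sketch under my own assumptions, not the API of any real FastMemory library; the &lt;code&gt;ConceptMemory&lt;/code&gt; class and its methods are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict, deque

class ConceptMemory:
    """Hypothetical relationship memory; not the API of any real FastMemory library."""

    def __init__(self, recent_size=3):
        self.links = defaultdict(set)            # concept: set of connected concepts
        self.recent = deque(maxlen=recent_size)  # most recently used concepts

    def connect(self, a, b):
        """Record an undirected relationship between two concepts."""
        self.links[a].add(b)
        self.links[b].add(a)

    def recall(self, concept):
        """Return related concepts, recently used ones first, and mark this one as used."""
        related = sorted(self.links.get(concept, set()))
        related.sort(key=lambda c: c not in self.recent)  # stable sort: recent first
        if concept in self.recent:
            self.recent.remove(concept)
        self.recent.append(concept)
        return related

memory = ConceptMemory()
for part in ["Retry Logic", "Gateway Failure", "Fraud Check", "Queue System"]:
    memory.connect("Payment Service", part)
memory.connect("Retry Logic", "Queue System")

memory.recall("Queue System")            # the agent touches the queue first
print(memory.recall("Payment Service"))  # Queue System is now ranked first
```

&lt;p&gt;Nothing is re-searched here: a recall is a lookup in the graph, reordered by what was used most recently.&lt;/p&gt;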

&lt;h4&gt;
  
  
  Why This Matters for AI Agents
&lt;/h4&gt;

&lt;p&gt;Traditional RAG works well for single questions. But AI agents are different. Agents need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;continuity&lt;/li&gt;
&lt;li&gt;memory&lt;/li&gt;
&lt;li&gt;workflow context&lt;/li&gt;
&lt;li&gt;long-running state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where FastMemory becomes powerful. It feels less like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Search everything again.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And more like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Remember how this system works.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  3. LLM-Wiki — Search Through Compiled Knowledge
&lt;/h3&gt;

&lt;p&gt;This approach gained popularity after Andrej Karpathy shared ideas about building AI-generated knowledge systems.&lt;/p&gt;

&lt;p&gt;The core idea is simple: instead of repeatedly retrieving raw documents, the AI continuously builds a structured wiki.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Flow
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Raw Documents
      ↓
LLM Processes Information
      ↓
Creates Wiki Pages
      ↓
Links Related Concepts
      ↓
Search the Wiki
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
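
&lt;p&gt;A rough Python sketch of that flow follows. It rests on several assumptions: &lt;code&gt;summarize&lt;/code&gt; is a stand-in that keeps the first sentence where a real system would call an LLM, links are detected by simple title matching, and the &lt;code&gt;Wiki&lt;/code&gt; class and sample texts are hypothetical.&lt;/p&gt;

```python
import re

def summarize(text):
    """Stand-in for an LLM call: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."

class Wiki:
    """Hypothetical compiled knowledge base: one page per topic, with cross-links."""

    def __init__(self):
        self.pages = {}  # title: {"summary": str, "links": set of titles}

    def ingest(self, title, raw_text):
        """Compile a raw document into a page, then refresh links across all pages."""
        self.pages[title] = {"summary": summarize(raw_text), "links": set()}
        for name, page in self.pages.items():
            page["links"] = {
                other for other in self.pages
                if other != name and other.lower() in page["summary"].lower()
            }

    def search(self, query):
        """Look up pages by title words instead of scanning raw documents."""
        words = set(re.findall(r"[a-z]+", query.lower()))
        return [t for t in self.pages if words.intersection(t.lower().split())]

wiki = Wiki()
wiki.ingest("Retry Logic", "Failed payments are retried with backoff. See logs for details.")
wiki.ingest("Payment System", "The payment system delegates failures to retry logic. It also runs fraud checks.")
print(wiki.search("How do retries use retry logic?"))
print(wiki.pages["Payment System"]["links"])
```

&lt;p&gt;The expensive step (here, &lt;code&gt;summarize&lt;/code&gt;) runs at ingestion time, so a query only touches the compiled pages.&lt;/p&gt;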



&lt;h4&gt;
  
  
  Why This Is Different
&lt;/h4&gt;

&lt;p&gt;This shifts retrieval from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Search at query time”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Organize knowledge ahead of time”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a very different philosophy.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example
&lt;/h4&gt;

&lt;p&gt;Instead of repeatedly searching payment logs, the AI creates knowledge pages like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Payment System&lt;/li&gt;
&lt;li&gt;Retry Logic&lt;/li&gt;
&lt;li&gt;Failure Handling&lt;/li&gt;
&lt;li&gt;Gateway Recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Over time, the knowledge base becomes richer and more connected. Almost like a living internal encyclopedia.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Big Advantage
&lt;/h4&gt;

&lt;p&gt;LLM-Wiki systems become smarter over time because the information becomes increasingly organized, not just retrieved. That distinction is subtle, but important.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Challenge
&lt;/h4&gt;

&lt;p&gt;The downside is maintenance. Knowledge systems can drift. Pages become stale. Bad ingestion can propagate mistakes. And synchronization becomes hard at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iesjkyzpr3cunxsri2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iesjkyzpr3cunxsri2r.png" width="800" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  So… Which One Wins?
&lt;/h3&gt;

&lt;p&gt;Honestly? None of them. Because they solve different problems.&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Vector RAG when:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;your data is messy&lt;/li&gt;
&lt;li&gt;wording varies heavily&lt;/li&gt;
&lt;li&gt;semantic search matters most&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Use PageIndex when:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;your systems are structured&lt;/li&gt;
&lt;li&gt;exact answers matter&lt;/li&gt;
&lt;li&gt;traceability is important&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Use FastMemory when:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;building AI agents&lt;/li&gt;
&lt;li&gt;maintaining long workflows&lt;/li&gt;
&lt;li&gt;preserving contextual memory&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Use LLM-Wiki when:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;building research systems&lt;/li&gt;
&lt;li&gt;organizing long-term knowledge&lt;/li&gt;
&lt;li&gt;creating internal AI knowledge platforms&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Real Future Is Hybrid
&lt;/h3&gt;

&lt;p&gt;The most interesting systems I’m seeing now combine all of them.&lt;/p&gt;

&lt;p&gt;Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Semantic Search (Vector RAG)
        +
Structure (PageIndex)
        +
Memory (FastMemory)
        +
Knowledge (LLM-Wiki)
        =
Modern AI Retrieval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And honestly, that hybrid approach makes sense, because real-world systems don’t operate in a single mode. Sometimes you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic understanding&lt;/li&gt;
&lt;li&gt;exact technical retrieval&lt;/li&gt;
&lt;li&gt;memory continuity&lt;/li&gt;
&lt;li&gt;long-term knowledge organization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All at once.&lt;/p&gt;
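
&lt;p&gt;One simple way to picture a hybrid setup is a function that asks every retriever and merges what comes back. The sketch below is only an illustration: the four lambdas are hard-coded stand-ins for real retrievers, the result strings are invented, and ranking by "how many sources agree" is one possible merge policy among many.&lt;/p&gt;

```python
def hybrid_retrieve(query, retrievers):
    """Ask every retriever, then merge results while tracking which source found what."""
    merged = {}
    for source, retrieve in retrievers.items():
        for result in retrieve(query):
            merged.setdefault(result, []).append(source)
    # Results confirmed by more sources come first; ties keep insertion order.
    return sorted(merged.items(), key=lambda item: len(item[1]), reverse=True)

# Hypothetical stand-ins for the four retrieval styles; each returns a list of hits.
retrievers = {
    "vector": lambda q: ["payments/RetryService", "runbooks/payment_outage"],
    "structure": lambda q: ["payments/RetryService"],
    "memory": lambda q: ["payments/QueueWorker", "payments/RetryService"],
    "wiki": lambda q: ["wiki/Retry Logic"],
}

for hit, sources in hybrid_retrieve("Where is the payment retry logic implemented?", retrievers):
    print(hit, sources)
```

&lt;p&gt;Keeping the source names next to each hit preserves some of the traceability that structural retrieval offers, even when a vector search contributed the result.&lt;/p&gt;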

&lt;h3&gt;
  
  
  Final Thought
&lt;/h3&gt;

&lt;p&gt;We’re entering a new phase of AI infrastructure. The conversation is shifting from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Which model should we use?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“How should the system think about information?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that shift is much bigger than most people realize.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpp0pz2c4axxg2tycmrf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpp0pz2c4axxg2tycmrf.png" alt="Vector vs Vectorless RAG" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>database</category>
      <category>llm</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
