Siraj Lakhani

Posted on May 16 • Originally published at Medium on May 15

Search in the LLM Era: Vector RAG vs Vectorless Search vs LLM-Wiki

#ai #database #llm #rag

Why the future of AI retrieval is moving beyond vector databases.

A lot of people think the hardest part of building AI systems is the model. In reality, the hardest part is usually search. Once you connect an LLM to real-world data (engineering docs, APIs, support tickets, Terraform modules, codebases, Slack conversations), the biggest challenge becomes:

How does the AI find the right information at the right time?

That’s the real bottleneck. And over the last year, I’ve noticed something interesting:

The industry is slowly moving beyond the classic “just use vector databases” approach.

Now we’re seeing three different retrieval styles emerge:

Vector RAG
Vectorless retrieval (PageIndex + FastMemory)
LLM-Wiki systems

Each solves a different problem, and understanding the difference completely changes how you design AI systems.

The Simplest Way to Understand the Difference

Imagine you ask an AI assistant:

“Where is the payment retry logic implemented?”

Each system searches differently.

Vector RAG

Searches by meaning. It tries to find text that sounds semantically similar to your question. Like asking:

“Find me anything related to payment retries.”

PageIndex

Searches by structure. It navigates documents like folders, sections, and hierarchies. Like using a table of contents.

FastMemory

Searches by relationships and memory. It remembers how systems connect together. Like asking a senior engineer who already knows the architecture.

LLM-Wiki

Searches through organized knowledge. Instead of searching raw documents repeatedly, the AI builds its own evolving wiki. Like having an internal Wikipedia for your company.

1. Vector RAG — Search by Meaning

Vector RAG is still the most common retrieval architecture today. The idea is straightforward: convert documents into embeddings (vectors), then search for similar vectors.

The Flow

Documents
   ↓
Chunking
   ↓
Embeddings
   ↓
Vector Database
   ↓
Similarity Search
   ↓
LLM Response
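To make the flow above concrete, here is a minimal, self-contained Python sketch of the chunk → embed → search steps. It is an illustration under stated assumptions, not a production pipeline: `embed` is a toy bag-of-words counter standing in for a learned embedding model, a plain list stands in for the vector database, and the document names (`retry.md`, `vpc.md`) are hypothetical. The final "LLM Response" step (handing the retrieved chunks to a model) is left out.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts.
    Assumption: a real pipeline would call a learned embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorIndex:
    """In-memory stand-in for a vector database: one row per chunk."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str, Counter]] = []

    def add(self, doc_id: str, text: str) -> None:
        for piece in chunk(text):
            self.rows.append((doc_id, piece, embed(piece)))

    def search(self, query: str, k: int = 2) -> list[tuple[float, str, str]]:
        """Return the k chunks most similar to the query as (score, doc_id, chunk)."""
        q = embed(query)
        scored = [(cosine(q, vec), doc_id, piece) for doc_id, piece, vec in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


index = VectorIndex()
index.add("retry.md", "Payment retry logic lives in RetryService, which re-sends "
                      "failed requests with exponential backoff.")
index.add("vpc.md", "The Terraform network module creates the production VPC and its subnets.")
for score, doc_id, piece in index.search("Where is the payment retry logic implemented?"):
    print(f"{score:.2f}  {doc_id}  {piece}")
```

Because this toy embedding only counts shared words, it would never match "payment retry" to "transaction recovery"; that kind of semantic match is what a learned embedding model adds. It also hints at the "similar but wrong" problem: `vpc.md` still earns a nonzero score purely from the shared word "the".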

Why People Love It

Vector search is incredibly good at understanding intent. For example, if someone asks:

“How does payment retry work?”

It may still retrieve a document saying:

“Transaction recovery mechanism handles failed gateway calls.”

Even though the wording is different. That semantic flexibility is what made RAG explode in popularity.

But There’s a Catch

As systems grow, vector search starts creating operational pain, especially in enterprise environments.

Common issues:

Embedding pipelines become expensive
Re-indexing takes time
Large vector databases cost a lot
Retrieval becomes harder to debug
Results can be “similar but wrong”

That last issue matters more than people realize. Because in production systems:

“Almost correct” can still break things.

2. Vectorless Retrieval — Search by Structure and Memory

This is where things get interesting. A newer category of systems is emerging that avoids depending heavily on embeddings.

Instead of asking:

“What sounds similar?”

These systems ask:

“How is the information organized?”

“How are concepts connected?”

Two approaches I’ve been experimenting with recently are:

PageIndex
FastMemory

PageIndex — Search Like a File System

PageIndex focuses on structure. Instead of embedding everything, it indexes:

sections
pages
headings
file hierarchy
relationships

Think of it like navigating documentation manually.
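The post doesn't describe PageIndex's internal format, so the following is only a hedged sketch of the general idea of indexing headings into a hierarchy, assuming Markdown-style `#` headings as input. `Section`, `build_toc`, and `outline` are hypothetical names, not PageIndex's API.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    body: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def build_toc(markdown: str) -> Section:
    """Turn '#'-style headings into a tree; non-heading lines become the body
    of the most recent section."""
    root = Section(title="(root)", level=0)
    stack = [root]
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            node = Section(title=stripped[level:].strip(), level=level)
            while stack[-1].level >= level:  # climb back up to the parent level
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        elif line.strip():
            stack[-1].body.append(line.strip())
    return root


def outline(node: Section, depth: int = 0) -> list[str]:
    """Render the tree as an indented table of contents (root omitted)."""
    lines = [] if node.level == 0 else ["  " * (depth - 1) + node.title]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


doc = """# Payments
## RetryService
Re-sends failed gateway calls with backoff.
## GatewayClient
Wraps the external payment gateway.
# Terraform
## Production VPC
Creates the production VPC and subnets.
"""
toc = build_toc(doc)
print("\n".join(outline(toc)))
```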

Example

Repository
 ├── Payments
 │ ├── RetryService
 │ ├── QueueWorker
 │ └── GatewayClient
 │
 ├── Terraform
 └── Runbooks

Now when someone asks:

“Which Terraform file creates the production VPC?”

The system jumps directly to the correct section. No embedding similarity needed.
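Navigation over such a tree can be sketched as a top-down walk that opens one branch per level and records the path it took, which is where the traceability comes from. This is an illustration under assumptions, not PageIndex's algorithm: a simple keyword-overlap score stands in for the branch-selection step (a real system might ask an LLM which section to open), and the two `.tf` file names under Terraform are hypothetical additions to the tree above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    summary: str = ""
    children: list["Node"] = field(default_factory=list)


def terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def navigate(root: Node, query: str) -> list[str]:
    """Walk down the tree, at each level opening the child whose title and
    summary share the most terms with the query. The returned path doubles
    as an explanation of why the result was chosen."""
    wanted = terms(query)
    path, node = [root.title], root
    while node.children:
        scored = [(len(wanted & terms(c.title + " " + c.summary)), c) for c in node.children]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score == 0:
            break  # nothing below this level matches; stop here
        path.append(best.title)
        node = best
    return path


repo = Node("Repository", children=[
    Node("Payments", "payment retry queue gateway", [
        Node("RetryService", "retry failed payment calls"),
        Node("QueueWorker", "queue jobs"),
        Node("GatewayClient", "gateway client"),
    ]),
    Node("Terraform", "infrastructure modules", [
        Node("vpc_production.tf", "production VPC network"),  # hypothetical file
        Node("vpc_staging.tf", "staging VPC network"),        # hypothetical file
    ]),
    Node("Runbooks", "operational procedures"),
])
print(navigate(repo, "Which Terraform file creates the production VPC?"))
# ['Repository', 'Terraform', 'vpc_production.tf']
```

The returned path (Repository → Terraform → vpc_production.tf) is the audit trail: you can see exactly which branch was opened at each level.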

Why This Feels Different

Vector search feels probabilistic. PageIndex feels deterministic. You can trace exactly why the result appeared. That’s incredibly valuable for:

infrastructure systems
compliance
security operations
engineering copilots

The Tradeoff

The downside is simple:

PageIndex works best when your data is already well organized. Messy, noisy, unstructured content is harder.

FastMemory — Search Like Human Memory

This was probably the most interesting shift for me.

Instead of searching documents repeatedly, FastMemory tries to preserve relationships between concepts.

The mental model is closer to human memory than database search.

Example

Payment Service
   ├── Retry Logic
   ├── Gateway Failure
   ├── Fraud Check
   └── Queue System

Instead of retrieving isolated chunks, the system remembers:

what connects together
what was recently used
which concepts are related
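The post doesn't show FastMemory's API, so the sketch below only illustrates the mental model described above, using a hypothetical `ConceptMemory` class: concepts are nodes, relationships are edges, and each use bumps a recency stamp so related concepts come back most-recently-used first.

```python
from collections import defaultdict
from itertools import count


class ConceptMemory:
    """Hypothetical associative memory: concepts are nodes, relationships are
    edges, and every access stamps a concept with an increasing tick."""

    def __init__(self) -> None:
        self.links: dict[str, set[str]] = defaultdict(set)
        self.last_used: dict[str, int] = {}
        self._clock = count()

    def relate(self, a: str, b: str) -> None:
        """Record that two concepts are connected (in both directions)."""
        self.links[a].add(b)
        self.links[b].add(a)

    def touch(self, concept: str) -> None:
        """Mark a concept as just used."""
        self.last_used[concept] = next(self._clock)

    def recall(self, concept: str, k: int = 3) -> list[str]:
        """Return up to k related concepts, most recently used first.
        Never-used concepts sort last, alphabetically for stable output."""
        self.touch(concept)
        neighbours = self.links.get(concept, set())
        return sorted(neighbours, key=lambda c: (-self.last_used.get(c, -1), c))[:k]


memory = ConceptMemory()
for part in ["Retry Logic", "Gateway Failure", "Fraud Check", "Queue System"]:
    memory.relate("Payment Service", part)
memory.touch("Queue System")  # an agent looked at the queue earlier...
memory.touch("Retry Logic")   # ...and at retry logic most recently
print(memory.recall("Payment Service"))
# ['Retry Logic', 'Queue System', 'Fraud Check']
```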

Why This Matters for AI Agents

Traditional RAG works well for single questions. But AI agents are different. Agents need:

continuity
memory
workflow context
long-running state

That’s where FastMemory becomes powerful. It feels less like:

“Search everything again.”

And more like:

“Remember how this system works.”

3. LLM-Wiki — Search Through Compiled Knowledge

This approach gained popularity after Andrej Karpathy shared ideas about building AI-generated knowledge systems.

The core idea is simple: Instead of repeatedly retrieving raw documents…

The AI continuously builds a structured wiki.

The Flow

Raw Documents
      ↓
LLM Processes Information
      ↓
Creates Wiki Pages
      ↓
Links Related Concepts
      ↓
Search the Wiki
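The flow above can be sketched in Python as two separate phases: an ahead-of-time compile step that turns raw documents into linked topic pages, and a query-time step that searches the pages instead of the raw text. The LLM step is stubbed out with a hypothetical keyword table (`TOPIC_KEYWORDS`, `extract_topics`) so the example runs offline; a real system would prompt a model to pick topics and write the page text. In this sketch, topics mentioned in the same document get linked to each other.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for "LLM Processes Information": a fixed keyword table.
TOPIC_KEYWORDS = {
    "Payment System": ["payment", "charge"],
    "Retry Logic": ["retry", "backoff"],
    "Failure Handling": ["failed", "failure", "error"],
    "Gateway Recovery": ["gateway"],
}


def extract_topics(text: str) -> list[str]:
    lowered = text.lower()
    return [t for t, words in TOPIC_KEYWORDS.items() if any(w in lowered for w in words)]


@dataclass
class WikiPage:
    title: str
    notes: list[str] = field(default_factory=list)
    sources: set[str] = field(default_factory=set)
    links: set[str] = field(default_factory=set)


def compile_wiki(docs: dict[str, str]) -> dict[str, WikiPage]:
    """Ahead-of-time step: turn raw documents into linked topic pages."""
    wiki: dict[str, WikiPage] = {}
    for doc_id, text in docs.items():
        topics = extract_topics(text)
        for topic in topics:
            page = wiki.setdefault(topic, WikiPage(topic))
            page.notes.append(text)
            page.sources.add(doc_id)
            page.links.update(t for t in topics if t != topic)  # co-mentioned topics
    return wiki


def search_wiki(wiki: dict[str, WikiPage], query: str) -> list[WikiPage]:
    """Query-time step: match query words against page titles, not raw docs."""
    q_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    return [page for title, page in wiki.items() if q_words & set(title.lower().split())]


docs = {
    "incident-42.txt": "Gateway timeouts caused failed charges; the retry job used exponential backoff.",
    "retry.md": "RetryService retries failed requests with exponential backoff.",
}
wiki = compile_wiki(docs)  # done once, ahead of time
for page in search_wiki(wiki, "How does retry logic work?"):  # done per query
    print(page.title, sorted(page.sources), sorted(page.links))
```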

Why This Is Different

This shifts retrieval from:

“Search at query time”

to:

“Organize knowledge ahead of time”

That’s a very different philosophy.

Example

Instead of repeatedly searching payment logs, the AI creates knowledge pages like:

Payment System
Retry Logic
Failure Handling
Gateway Recovery

Over time, the knowledge base becomes richer and more connected. Almost like a living internal encyclopedia.

The Big Advantage

LLM-Wiki systems become smarter over time because the information becomes increasingly organized. Not just retrieved. That distinction is subtle, but important.

The Challenge

The downside is maintenance. Knowledge systems can drift. Pages become stale. Bad ingestion can propagate mistakes. And synchronization becomes hard at scale.
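One common way to catch stale pages (an assumption here, not something the post prescribes) is to record a content hash of every source document when a page is written, then flag the page for regeneration when a source's hash changes or the source disappears. `fingerprint` and `stale_pages` are hypothetical helpers.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash of a source document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_pages(page_sources: dict[str, dict[str, str]],
                current_docs: dict[str, str]) -> list[str]:
    """page_sources maps each wiki page title to {source doc id: hash recorded
    when the page was last written}. A page is stale if any of its sources
    changed or no longer exists."""
    stale = []
    for title, recorded in page_sources.items():
        for doc_id, old_hash in recorded.items():
            text = current_docs.get(doc_id)
            if text is None or fingerprint(text) != old_hash:
                stale.append(title)
                break  # one changed source is enough to flag the page
    return stale


docs_v1 = {"retry.md": "Retries use exponential backoff.",
           "vpc.md": "Creates the production VPC."}
pages = {
    "Retry Logic": {"retry.md": fingerprint(docs_v1["retry.md"])},
    "Networking": {"vpc.md": fingerprint(docs_v1["vpc.md"])},
}
docs_v2 = {**docs_v1, "retry.md": "Retries use a fixed 5-second delay."}
print(stale_pages(pages, docs_v2))
```

This only detects drift; deciding how to regenerate a flagged page (and whether to trust the new version) is still the hard part.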

Quick Comparison

So… Which One Wins?

Honestly? None of them. Because they solve different problems.

Use Vector RAG when:

your data is messy
wording varies heavily
semantic search matters most

Use PageIndex when:

your systems are structured
exact answers matter
traceability is important

Use FastMemory when:

building AI agents
maintaining long workflows
preserving contextual memory

Use LLM-Wiki when:

building research systems
organizing long-term knowledge
creating internal AI knowledge platforms

The Real Future Is Hybrid

The most interesting systems I’m seeing now combine all four.

Something like:

Semantic Search (Vector RAG)
        +
Structure (PageIndex)
        +
Memory (FastMemory)
        +
Knowledge (LLM-Wiki)
        =
Modern AI Retrieval
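The post doesn't specify how the four signals are merged. One well-known option, swapped in here as an assumption, is reciprocal rank fusion (RRF): each retriever returns a ranked list, and a document's fused score is the sum of 1 / (k + rank) over every list it appears in. The retriever names and document IDs below are hypothetical, and k = 60 is simply a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each doc earns 1 / (k + rank) from every list it
    appears in (ranks start at 1). Ties are broken alphabetically."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top results from each retrieval mode.
rankings = {
    "vector":    ["retry.md", "queue.md", "incident-42.txt"],
    "structure": ["retry.md", "gateway.md"],
    "memory":    ["queue.md", "retry.md"],
    "wiki":      ["retry.md"],
}
for doc_id, score in reciprocal_rank_fusion(rankings):
    print(f"{score:.4f}  {doc_id}")
```

RRF only needs ranks, not comparable scores, which is convenient when one retriever returns cosine similarities and another returns a tree path with no score at all.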

And honestly, that hybrid approach makes sense. Because real-world systems don’t operate in a single mode. Sometimes you need:

semantic understanding
exact technical retrieval
memory continuity
long-term knowledge organization

All at once.

Final Thought

We’re entering a new phase of AI infrastructure. The conversation is shifting from:

“Which model should we use?”

to:

“How should the system think about information?”

And that shift is much bigger than most people realize.

DEV Community
