
Siraj Lakhani

Posted on • Originally published at Medium on

Search in the LLM Era: Vector RAG vs Vectorless Search vs LLM-Wiki

Why the future of AI retrieval is moving beyond vector databases.

A lot of people think the hardest part of building AI systems is the model. In reality, the hardest part is usually search. Once you connect an LLM to real-world data (engineering docs, APIs, support tickets, Terraform modules, codebases, Slack conversations), the biggest challenge becomes:

How does the AI find the right information at the right time?

That’s the real bottleneck. And over the last year, I’ve noticed something interesting:

The industry is slowly moving beyond the classic “just use vector databases” approach.

Now we’re seeing three different retrieval styles emerge:

  • Vector RAG
  • Vectorless retrieval (PageIndex + FastMemory)
  • LLM-Wiki systems

Each solves a different problem, and understanding the difference completely changes how you design AI systems.

The Simplest Way to Understand the Difference

Imagine you ask an AI assistant:

“Where is the payment retry logic implemented?”

Each system searches differently.

Vector RAG

Searches by meaning. It tries to find text that is semantically similar to your question. Like asking:

“Find me anything related to payment retries.”

PageIndex

Searches by structure. It navigates documents like folders, sections, and hierarchies. Like using a table of contents.

FastMemory

Searches by relationships and memory. It remembers how systems connect together. Like asking a senior engineer who already knows the architecture.

LLM-Wiki

Searches through organized knowledge. Instead of searching raw documents repeatedly, the AI builds its own evolving wiki. Like having an internal Wikipedia for your company.

1. Vector RAG — Search by Meaning

Vector RAG is still the most common retrieval architecture today. The idea is straightforward: convert documents into embeddings (vectors), then search for similar vectors.

The Flow

Documents
   ↓
Chunking
   ↓
Embeddings
   ↓
Vector Database
   ↓
Similarity Search
   ↓
LLM Response
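To make the flow concrete, here is a minimal sketch of the chunk, embed, and similarity-search steps. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words counter standing in for a real embedding model (so it only matches shared words, not true meaning), the "vector database" is a plain Python list, and the function names are made up.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=12):
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(documents):
    """Hypothetical 'vector database': a list of (chunk_text, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def search(index, query, k=2):
    """Return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

docs = [
    "Payment retry logic resends failed charges with exponential backoff.",
    "Terraform modules create the production VPC and subnets.",
]
index = build_index(docs)
top = search(index, "How does payment retry work?", k=1)
print(top)
```

In a real pipeline the retrieved chunks would then be passed to the LLM as context. Swapping the toy `embed` for a learned embedding model is what gives the semantic matching described in this section.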

Why People Love It

Vector search is incredibly good at understanding intent. For example, if someone asks:

“How does payment retry work?”

It may still retrieve a document saying:

“Transaction recovery mechanism handles failed gateway calls.”

Even though the wording is different. That semantic flexibility is what made RAG explode in popularity.

But There’s a Catch

As systems grow, vector search starts creating operational pain. Especially in enterprise environments.

Common issues:

  • Embedding pipelines become expensive
  • Re-indexing takes time
  • Large vector databases cost a lot
  • Retrieval becomes harder to debug
  • Results can be “similar but wrong”

That last issue matters more than people realize. Because in production systems:

“Almost correct” can still break things.

2. Vectorless Retrieval — Search by Structure and Memory

This is where things get interesting. A newer category of systems is emerging that avoids depending heavily on embeddings.

Instead of asking:

“What sounds similar?”

These systems ask:

“How is the information organized?”

or

“How are concepts connected?”

Two approaches I’ve been experimenting with recently are:

  • PageIndex
  • FastMemory

PageIndex — Search Like a File System

PageIndex focuses on structure. Instead of embedding everything, it indexes:

  • sections
  • pages
  • headings
  • file hierarchy
  • relationships

Think of it like navigating documentation manually.

Example

Repository
 ├── Payments
 │ ├── RetryService
 │ ├── QueueWorker
 │ └── GatewayClient
 │
 ├── Terraform
 └── Runbooks

Now when someone asks:

“Which Terraform file creates the production VPC?”

The system jumps directly to the correct section. No embedding similarity needed.
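A rough sketch of structure-based lookup, assuming the index is just a nested dictionary that mirrors the tree above. This is not the PageIndex API. The `ProductionVPC` entry and all file paths are hypothetical, added only so the example has something to resolve.

```python
# Hypothetical structure index: nested dicts mirror the repository tree.
# Leaf values are file paths invented for illustration.
TREE = {
    "Payments": {
        "RetryService": "payments/retry_service.py",
        "QueueWorker": "payments/queue_worker.py",
        "GatewayClient": "payments/gateway_client.py",
    },
    "Terraform": {
        "ProductionVPC": "terraform/prod_vpc.tf",
    },
    "Runbooks": {},
}

def resolve(tree, path):
    """Walk the tree one section at a time and record every step taken."""
    node, trace = tree, []
    for name in path:
        if not isinstance(node, dict) or name not in node:
            where = " > ".join(trace) or "root"
            raise KeyError(f"no section {name!r} under {where}")
        node = node[name]
        trace.append(name)
    return node, trace

target, trace = resolve(TREE, ["Terraform", "ProductionVPC"])
print(" > ".join(trace), "->", target)
```

The returned `trace` is the point of the exercise: every hop is an explicit key lookup, so you can show exactly which sections led to the answer.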

Why This Feels Different

Vector search feels probabilistic. PageIndex feels deterministic. You can trace exactly why the result appeared. That’s incredibly valuable for:

  • infrastructure systems
  • compliance
  • security operations
  • engineering copilots

The Tradeoff

The downside is simple:

PageIndex works best when your data is already well organized. Messy, noisy, unstructured content is much harder to navigate this way.

FastMemory — Search Like Human Memory

This was probably the most interesting shift for me.

Instead of searching documents repeatedly, FastMemory tries to preserve relationships between concepts.

The mental model is closer to human memory than database search.

Example

Payment Service
   ├── Retry Logic
   ├── Gateway Failure
   ├── Fraud Check
   └── Queue System

Instead of retrieving isolated chunks, the system remembers:

  • what connects together
  • what was recently used
  • which concepts are related
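One way to picture those three properties is a small concept graph with a recency counter. This is a toy sketch under my own assumptions, not FastMemory's actual design or API. The `ConceptMemory` class and its method names are hypothetical.

```python
from collections import defaultdict

class ConceptMemory:
    """Toy relationship memory: an undirected concept graph plus a recency clock."""

    def __init__(self):
        self.links = defaultdict(set)   # what connects together
        self.last_used = {}             # what was recently used
        self.clock = 0

    def relate(self, a, b):
        """Record that two concepts are related (in both directions)."""
        self.links[a].add(b)
        self.links[b].add(a)

    def touch(self, concept):
        """Mark a concept as just used."""
        self.clock += 1
        self.last_used[concept] = self.clock

    def recall(self, concept):
        """Return related concepts, most recently used first, then alphabetical."""
        self.touch(concept)
        related = self.links.get(concept, set())
        return sorted(related, key=lambda c: (-self.last_used.get(c, 0), c))

memory = ConceptMemory()
for part in ["Retry Logic", "Gateway Failure", "Fraud Check", "Queue System"]:
    memory.relate("Payment Service", part)

memory.touch("Gateway Failure")  # the agent just worked on this
related = memory.recall("Payment Service")
print(related)
```

Because `Gateway Failure` was touched most recently, it comes back first. Nothing is re-searched; the answer comes from stored relationships and usage history.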

Why This Matters for AI Agents

Traditional RAG works well for single questions. But AI agents are different. Agents need:

  • continuity
  • memory
  • workflow context
  • long-running state

That’s where FastMemory becomes powerful. It feels less like:

“Search everything again.”

And more like:

“Remember how this system works.”

3. LLM-Wiki — Search Through Compiled Knowledge

This approach became more popular after ideas shared by Andrej Karpathy around building AI-generated knowledge systems.

The core idea is simple: Instead of repeatedly retrieving raw documents…

The AI continuously builds a structured wiki.

The Flow

Raw Documents
      ↓
LLM Processes Information
      ↓
Creates Wiki Pages
      ↓
Links Related Concepts
      ↓
Search the Wiki

Why This Is Different

This shifts retrieval from:

“Search at query time”

to:

“Organize knowledge ahead of time”

That’s a very different philosophy.

Example

Instead of repeatedly searching payment logs, the AI creates knowledge pages like:

  • Payment System
  • Retry Logic
  • Failure Handling
  • Gateway Recovery

Over time, the knowledge base becomes richer and more connected. Almost like a living internal encyclopedia.
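Here is a minimal sketch of that compile step, assuming one page per topic and links created whenever a page's summary mentions another page's title. The `summarize` function is a placeholder for an LLM call (it just keeps the first sentence), and the sample documents are invented.

```python
import re

def summarize(text):
    """Placeholder for an LLM call; here it simply keeps the first sentence."""
    return text.split(".")[0].strip() + "."

def compile_wiki(raw_docs):
    """Build one page per topic, then link pages whose titles appear in other summaries."""
    pages = {
        title: {"summary": summarize(body), "links": []}
        for title, body in raw_docs.items()
    }
    for title, page in pages.items():
        for other in pages:
            mentioned = re.search(re.escape(other), page["summary"], re.IGNORECASE)
            if other != title and mentioned:
                page["links"].append(other)
    return pages

raw_docs = {
    "Retry Logic": "Retry logic resends failed charges after a gateway recovery. Backoff is exponential.",
    "Gateway Recovery": "Gateway recovery restores the connection to the payment provider. It runs on a timer.",
}
wiki = compile_wiki(raw_docs)
print(wiki["Retry Logic"])
```

At query time the system searches `wiki` instead of `raw_docs`, so the expensive organizing work has already been done.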

The Big Advantage

LLM-Wiki systems become smarter over time because the information becomes increasingly organized. Not just retrieved. That distinction is subtle, but important.

The Challenge

The downside is maintenance. Knowledge systems can drift. Pages become stale. Bad ingestion can propagate mistakes. And synchronization becomes hard at scale.
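One possible guard against stale pages, sketched under the assumption that each wiki page stores a hash of the source text it was built from. The `source_hash` field and both function names are hypothetical, and the sample content is invented.

```python
import hashlib

def fingerprint(text):
    """Stable content hash of a source document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def stale_pages(wiki, sources):
    """Return titles whose recorded source hash no longer matches the current source."""
    return sorted(
        title
        for title, page in wiki.items()
        if page["source_hash"] != fingerprint(sources.get(title, ""))
    )

sources = {
    "Retry Logic": "Retries use exponential backoff.",
    "Failure Handling": "Failures go to a dead-letter queue.",
}
# Build pages and remember which version of the source each one came from.
wiki = {t: {"summary": s, "source_hash": fingerprint(s)} for t, s in sources.items()}

# The source changes later, but the wiki page is not regenerated.
sources["Retry Logic"] = "Retries use a fixed 30 second delay."
stale = stale_pages(wiki, sources)
print(stale)
```

A check like this only detects drift. Deciding when and how to regenerate a flagged page is still the hard part at scale.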

Quick Comparison

So… Which One Wins?

Honestly? None of them. Because they solve different problems.

Use Vector RAG when:

  • your data is messy
  • wording varies heavily
  • semantic search matters most

Use PageIndex when:

  • your systems are structured
  • exact answers matter
  • traceability is important

Use FastMemory when:

  • building AI agents
  • maintaining long workflows
  • preserving contextual memory

Use LLM-Wiki when:

  • building research systems
  • organizing long-term knowledge
  • creating internal AI knowledge platforms

The Real Future Is Hybrid

The most interesting systems I’m seeing now combine all of them.

Something like:

Semantic Search (Vector RAG)
        +
Structure (PageIndex)
        +
Memory (FastMemory)
        +
Knowledge (LLM-Wiki)
        =
Modern AI Retrieval

And honestly, that hybrid approach makes sense. Because real-world systems don’t operate in a single mode. Sometimes you need:

  • semantic understanding
  • exact technical retrieval
  • memory continuity
  • long-term knowledge organization

All at once.
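As a sketch of what "all at once" could look like, the snippet below merges ranked results from four retrievers using reciprocal rank fusion. That merging technique is my choice for illustration, not something the systems above prescribe. The four result lists are invented stand-ins for real retriever output.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists: each item scores 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs from each retrieval style for the same query.
semantic = ["retry_service.py", "gateway_client.py", "runbook.md"]
structure = ["retry_service.py", "queue_worker.py"]
memory = ["gateway_client.py", "retry_service.py"]
wiki = ["wiki/retry-logic", "retry_service.py"]

fused = reciprocal_rank_fusion([semantic, structure, memory, wiki])
print(fused)
```

Items that several retrievers agree on rise to the top, which is one simple way to let the four modes reinforce each other instead of competing.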

Final Thought

We’re entering a new phase of AI infrastructure. The conversation is shifting from:

“Which model should we use?”

to:

“How should the system think about information?”

And that shift is much bigger than most people realize.

