Why the future of AI retrieval is moving beyond vector databases.
A lot of people think the hardest part of building AI systems is the model. In reality, the hardest part is usually search. Once you connect an LLM to real-world data (engineering docs, APIs, support tickets, Terraform modules, codebases, Slack conversations), the biggest challenge becomes:
How does the AI find the right information at the right time?
That’s the real bottleneck. And over the last year, I’ve noticed something interesting:
The industry is slowly moving beyond the classic “just use vector databases” approach.
Now we’re seeing three different retrieval styles emerge:
- Vector RAG
- Vectorless retrieval (PageIndex + FastMemory)
- LLM-Wiki systems
Each solves a different problem, and understanding the difference changes how you design AI systems.
The Simplest Way to Understand the Difference
Imagine you ask an AI assistant:
“Where is the payment retry logic implemented?”
Each system searches differently.
Vector RAG
Searches by meaning. It tries to find text that sounds semantically similar to your question. Like asking:
“Find me anything related to payment retries.”
PageIndex
Searches by structure. It navigates documents like folders, sections, and hierarchies. Like using a table of contents.
FastMemory
Searches by relationships and memory. It remembers how systems connect together. Like asking a senior engineer who already knows the architecture.
LLM-Wiki
Searches through organized knowledge. Instead of searching raw documents repeatedly, the AI builds its own evolving wiki. Like having an internal Wikipedia for your company.
1. Vector RAG — Search by Meaning
Vector RAG is still the most common retrieval architecture today. The idea is straightforward: convert documents into embeddings (vectors), then search for similar vectors.
The Flow
Documents
↓
Chunking
↓
Embeddings
↓
Vector Database
↓
Similarity Search
↓
LLM Response
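To make the flow concrete, here is a minimal sketch of the chunk, embed, and search steps. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the two sample documents are made up for illustration. Because it only counts shared words, it would not match "transaction recovery" to "payment retry" the way a real embedding model can.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, index, top_k=1):
    """Rank stored chunks by similarity to the query and return the best ones."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Hypothetical sample documents.
documents = [
    "Payment retry logic resends failed gateway calls with backoff.",
    "Terraform modules create the production VPC and subnets.",
]

# The "vector database" here is just a list of (chunk, vector) pairs.
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]
top = search("How does payment retry work?", index)
print(top)
```

In a real system the list would be replaced by a vector store and the retrieved chunks would be passed to the LLM as context.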
Why People Love It
Vector search is incredibly good at understanding intent. For example, if someone asks:
“How does payment retry work?”
It may still retrieve a document saying:
“Transaction recovery mechanism handles failed gateway calls.”
Even though the wording is different. That semantic flexibility is what made RAG explode in popularity.
But There’s a Catch
As systems grow, vector search starts creating operational pain. Especially in enterprise environments.
Common issues:
- Embedding pipelines become expensive
- Re-indexing takes time
- Large vector databases cost a lot
- Retrieval becomes harder to debug
- Results can be “similar but wrong”
That last issue matters more than people realize. Because in production systems:
“Almost correct” can still break things.
2. Vectorless Retrieval — Search by Structure and Memory
This is where things get interesting. A newer category of systems is emerging that avoids depending heavily on embeddings.
Instead of asking:
“What sounds similar?”
These systems ask:
“How is the information organized?”
or
“How are concepts connected?”
Two approaches I’ve been experimenting with recently are:
- PageIndex
- FastMemory
PageIndex — Search Like a File System
PageIndex focuses on structure. Instead of embedding everything, it indexes:
- sections
- pages
- headings
- file hierarchy
- relationships
Think of it like navigating documentation manually.
Example
Repository
├── Payments
│ ├── RetryService
│ ├── QueueWorker
│ └── GatewayClient
│
├── Terraform
└── Runbooks
Now when someone asks:
“Which Terraform file creates the production VPC?”
The system jumps directly to the correct section. No embedding similarity needed.
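A rough sketch of that kind of structural lookup, using the repository tree from the example. This is not the PageIndex API: `Node` and `find` are hypothetical names, and matching on titles with a plain keyword test is an assumption made to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in the structural index (a folder, section, or heading)."""
    title: str
    children: list = field(default_factory=list)


def find(node, keyword, trail=()):
    """Depth-first walk; returns (node, path taken) for the first title match."""
    trail = trail + (node.title,)
    if keyword.lower() in node.title.lower():
        return node, trail
    for child in node.children:
        hit = find(child, keyword, trail)
        if hit:
            return hit
    return None


repo = Node("Repository", [
    Node("Payments", [Node("RetryService"), Node("QueueWorker"), Node("GatewayClient")]),
    Node("Terraform"),
    Node("Runbooks"),
])

node, path = find(repo, "retry")
print(" > ".join(path))
```

The returned path is the point of the exercise: it records exactly which branches were followed to reach the result.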
Why This Feels Different
Vector search feels probabilistic. PageIndex feels deterministic. You can trace exactly why the result appeared. That’s incredibly valuable for:
- infrastructure systems
- compliance
- security operations
- engineering copilots
The Tradeoff
The downside is simple:
PageIndex works best when your data is already well organized. Messy, noisy, unstructured content is harder.
FastMemory — Search Like Human Memory
This was probably the most interesting shift for me.
Instead of searching documents repeatedly, FastMemory tries to preserve relationships between concepts.
The mental model is closer to human memory than database search.
Example
Payment Service
├── Retry Logic
├── Gateway Failure
├── Fraud Check
└── Queue System
Instead of retrieving isolated chunks, the system remembers:
- what connects together
- what was recently used
- which concepts are related
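One way to picture those three properties is a small concept graph with a recency counter. This is an illustrative sketch only, not FastMemory's actual interface: the `ConceptMemory` class and its methods are hypothetical.

```python
from collections import defaultdict
from itertools import count


class ConceptMemory:
    """Tracks which concepts are linked and which were used most recently."""

    def __init__(self):
        self.links = defaultdict(set)  # concept -> directly connected concepts
        self.last_used = {}            # concept -> logical time of last access
        self._clock = count(1)

    def connect(self, a, b):
        """Record an undirected relationship between two concepts."""
        self.links[a].add(b)
        self.links[b].add(a)

    def touch(self, concept):
        """Mark a concept as just used."""
        self.last_used[concept] = next(self._clock)

    def recall(self, concept):
        """Return neighbours of a concept, most recently used first."""
        self.touch(concept)
        return sorted(self.links[concept], key=lambda c: (-self.last_used.get(c, 0), c))


memory = ConceptMemory()
for part in ["Retry Logic", "Gateway Failure", "Fraud Check", "Queue System"]:
    memory.connect("Payment Service", part)

memory.touch("Gateway Failure")  # e.g. the agent just investigated a gateway outage
related = memory.recall("Payment Service")
print(related)
```

Because "Gateway Failure" was touched most recently, it is returned first; concepts that were never used fall back to alphabetical order.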
Why This Matters for AI Agents
Traditional RAG works well for single questions. But AI agents are different. Agents need:
- continuity
- memory
- workflow context
- long-running state
That’s where FastMemory becomes powerful. It feels less like:
“Search everything again.”
And more like:
“Remember how this system works.”
3. LLM-Wiki — Search Through Compiled Knowledge
This approach gained attention after Andrej Karpathy shared ideas about building AI-generated knowledge systems.
The core idea is simple: instead of repeatedly retrieving raw documents, the AI continuously builds a structured wiki.
The Flow
Raw Documents
↓
LLM Processes Information
↓
Creates Wiki Pages
↓
Links Related Concepts
↓
Search the Wiki
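The compile-ahead step can be sketched in a few lines. Everything here is an assumption for illustration: `summarize` is a placeholder where an LLM call would go, the sample passages are invented, and linking pages by title mention is just one simple way to connect related concepts.

```python
def summarize(topic, passages):
    """Placeholder for an LLM call; here it simply joins the source passages."""
    return " ".join(passages)


def build_wiki(raw_docs):
    """Turn (topic, passage) pairs into pages with summaries and cross-links."""
    grouped = {}
    for topic, passage in raw_docs:
        grouped.setdefault(topic, []).append(passage)

    wiki = {t: {"summary": summarize(t, ps), "links": []} for t, ps in grouped.items()}

    # Link a page to every other page whose title it mentions.
    for topic, page in wiki.items():
        page["links"] = sorted(
            other for other in wiki
            if other != topic and other.lower() in page["summary"].lower()
        )
    return wiki


# Hypothetical raw material.
raw_docs = [
    ("Retry Logic", "Retries failed charges; see failure handling for error classes."),
    ("Failure Handling", "Classifies gateway errors before retry logic runs."),
    ("Retry Logic", "Backs off between attempts."),
]

wiki = build_wiki(raw_docs)
print(wiki["Retry Logic"])
```

At query time the system reads `wiki` instead of the raw passages, which is the shift from searching to organizing described in this section.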
Why This Is Different
This shifts retrieval from:
“Search at query time”
to:
“Organize knowledge ahead of time”
That’s a very different philosophy.
Example
Instead of repeatedly searching payment logs, the AI creates knowledge pages like:
- Payment System
- Retry Logic
- Failure Handling
- Gateway Recovery
Over time, the knowledge base becomes richer and more connected. Almost like a living internal encyclopedia.
The Big Advantage
LLM-Wiki systems can become more useful over time because the information is increasingly organized, not just retrieved. That distinction is subtle, but important.
The Challenge
The downside is maintenance. Knowledge systems can drift. Pages become stale. Bad ingestion can propagate mistakes. And synchronization becomes hard at scale.
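Quick Comparison
- Vector RAG: searches by meaning; fits messy data with varied wording; risks “similar but wrong” results and costly re-indexing.
- PageIndex: searches by structure; fits well-organized data where traceability matters; struggles with messy, unstructured content.
- FastMemory: searches by relationships and memory; fits AI agents and long-running workflows.
- LLM-Wiki: searches compiled knowledge; fits long-term knowledge organization; needs maintenance to keep pages from going stale.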
So… Which One Wins?
Honestly? None of them. Because they solve different problems.
Use Vector RAG when:
- your data is messy
- wording varies heavily
- semantic search matters most
Use PageIndex when:
- your systems are structured
- exact answers matter
- traceability is important
Use FastMemory when:
- building AI agents
- maintaining long workflows
- preserving contextual memory
Use LLM-Wiki when:
- building research systems
- organizing long-term knowledge
- creating internal AI knowledge platforms
The Real Future Is Hybrid
The most interesting systems I’m seeing now combine all of them.
Something like:
Semantic Search (Vector RAG)
+
Structure (PageIndex)
+
Memory (FastMemory)
+
Knowledge (LLM-Wiki)
=
Modern AI Retrieval
And honestly, that hybrid approach makes sense. Because real-world systems don’t operate in a single mode. Sometimes you need:
- semantic understanding
- exact technical retrieval
- memory continuity
- long-term knowledge organization
All at once.
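One common way to combine several retrievers is to merge their ranked lists. The sketch below uses reciprocal rank fusion, which is my choice for illustration and not something the systems above prescribe. The retriever names and document IDs are hypothetical, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each hit contributes 1 / (k + rank) to its document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked results from three retrieval styles for the same question.
rankings = {
    "vector": ["retry_service", "gateway_client", "queue_worker"],
    "structure": ["retry_service", "queue_worker"],
    "memory": ["gateway_client", "retry_service"],
}

fused = reciprocal_rank_fusion(rankings.values())
print(fused)
```

A document that several retrievers agree on rises to the top, even if no single retriever is fully trusted.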
Final Thought
We’re entering a new phase of AI infrastructure. The conversation is shifting from:
“Which model should we use?”
to:
“How should the system think about information?”
And that shift is much bigger than most people realize.

