RAG Without Vectors: How LLMs Are Learning to Navigate Documents Like Humans

#ai #rag #llm #architecture

Retrieval-Augmented Generation (RAG) powers a large number of question-answering chatbots that cite their source documents, combining the generative ability of LLMs with data fetched from external sources.

The mental model behind traditional RAG is:

Get the data extracted from relevant sources.
Chunk them suitably with a definite chunk size.
Generate embeddings and store them in the Vector DBs.
Get the query, convert it to an embedding.
Run a similarity search and fetch the vectors most relevant to the query.
Generate an answer to the question from the retrieved context.
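
The steps above can be sketched in a few lines. This is only a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `store` list stands in for a vector DB, and every name here is made up for the example.

```python
import math
import re
from collections import Counter

def chunk_text(text: str, chunk_size: int) -> list:
    """Split text into fixed-size chunks of `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def embed(text: str) -> Counter:
    """Toy embedding: word counts (an assumption, not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, store: list, k: int = 2) -> list:
    """Embed the query and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

document = (
    "Invoices are due within thirty days. "
    "Refunds are processed within five business days. "
    "Support is available by email on weekdays."
)

# Steps 2-3: chunk the document and store (chunk, embedding) pairs.
store = [(c, embed(c)) for c in chunk_text(document, chunk_size=6)]

# Steps 4-5: embed the query and run the similarity search.
top = retrieve("how long do refunds take", store, k=1)
print(top)
```

Notice that the best-matching chunk comes back as "Refunds are processed within five business", with the word "days." pushed into the next chunk. Fixed-size chunking cut straight through the sentence.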

This works well as long as the vector DB is inexpensive and the volume of embeddings is small enough to be worth storing in it.

One of the most beautiful aspects of data is that it always scales up. Anybody working in the data domain knows that no matter how much cleansing, reduction, and relevance filtering is applied to the source data, the relevant data grows along with the source.

That said, the number of embeddings grows in direct proportion to the relevant data. As data grows, so do embedding costs, storage bills, and the maintenance burden of keeping vectors in sync. These challenges compound quickly in production, unless we choose the Vectorless RAG approach.
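
To make the proportionality concrete, here is a rough back-of-the-envelope estimate. The figures are assumptions chosen for illustration: non-overlapping chunks, 1536-dimensional vectors stored as 4-byte floats, and no index overhead. Real systems will differ.

```python
import math

def embedding_footprint(total_tokens: int, chunk_size: int, dims: int,
                        bytes_per_value: int = 4) -> tuple:
    """Return (number of chunks, raw vector storage in bytes)."""
    n_chunks = math.ceil(total_tokens / chunk_size)
    return n_chunks, n_chunks * dims * bytes_per_value

small = embedding_footprint(total_tokens=1_000_000, chunk_size=500, dims=1536)
large = embedding_footprint(total_tokens=10_000_000, chunk_size=500, dims=1536)

print(small)  # (2000, 12288000)
print(large)  # (20000, 122880000)
```

Ten times the data means ten times the vectors to compute, store, and keep in sync.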

How does Vectorless RAG work?

Let’s first understand how a human searches for relevant data when faced with a question. We either:

Try to recall the answer from memory, without referring to anything external (comparable to vectors already stored in a vector DB), or
Go through external sources (books, PDFs, and so on).

Let’s go with the second method. When we consult a book, we first go through the index page, find the relevant topic, note its page number, and jump directly to that page, skipping the rest of the book.

That’s exactly what Vectorless RAG does. The framework used here is called “PageIndex”. PageIndex is an open-source document indexing system that organises documents into a hierarchical tree structure and lets LLMs perform reasoning-based retrieval over that structure.

Why shift the Approach?

The flow of a Vectorless RAG pipeline is:

User Query → Document Tree Structure → LLM Reasoning → Relevant Nodes Retrieved → LLM Generates Answer

The entire approach is reasoning-based rather than similarity-based. With the Vectorless approach, the whole preprocessing and retrieval stage of traditional RAG can be eliminated: splitting the document into chunks, vectorising the chunks, storing the embeddings in a vector DB, querying by semantic match, retrieving the top-k similar chunks, and assembling them into the model’s input context.
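
The flow can be sketched as follows. This is a minimal illustration under stated assumptions, not PageIndex’s actual implementation: the tree layout, the prompt wording, and the `fake_llm` stand-in (used so the example runs offline) are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical document tree: titles and summaries act as the index,
# while `text` holds the raw section content.
TREE = [
    {"node_id": "0001", "title": "Revenue", "summary": "Quarterly revenue figures.",
     "text": "Revenue grew 12% year over year."},
    {"node_id": "0002", "title": "Risk Factors", "summary": "Market and legal risks.",
     "text": "Currency fluctuations may affect margins."},
]

def answer_query(query: str, tree: list, llm: Callable[[str], str]) -> str:
    # 1. Show the LLM the tree structure only (no raw text, no embeddings).
    outline = [{k: n[k] for k in ("node_id", "title", "summary")} for n in tree]
    selection_prompt = (
        f"Question: {query}\nDocument tree: {json.dumps(outline)}\n"
        "Reply with a JSON list of relevant node_id values."
    )
    # 2. The LLM reasons over the outline and names the nodes to open.
    node_ids = json.loads(llm(selection_prompt))
    # 3. Fetch raw content for exactly those nodes.
    context = "\n".join(n["text"] for n in tree if n["node_id"] in node_ids)
    # 4. Generate the answer from the retrieved context.
    return llm(f"Context:\n{context}\n\nAnswer the question: {query}")

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns canned responses."""
    if prompt.startswith("Question:"):
        return '["0001"]'
    return "ANSWER BASED ON: " + prompt.split("\n")[1]

result = answer_query("How did revenue change?", TREE, fake_llm)
print(result)
```

There is no chunking, embedding, or similarity search anywhere in this path. The only "index" is the outline the model reads.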

There is nothing wrong with those stages, and they work perfectly well in most cases. But we do come across scenarios where a semantic match does not mean the result is relevant.

For example, imagine asking a librarian for “this year’s election results” and being handed three books with the explanation, “all three say ‘election results’ on the cover”!

These issues can be handled with metadata filtering, re-ranking, and fine-grained chunking strategies, or with Vectorless RAG :)

Vectorless RAG also avoids the hard-chunking problem, where fixed-size chunks cut through sentences and fragment their meaning and context.

Then there are strings such as “refer Page 15, Table 3.2” in documents. They do not score as semantically similar to anything useful, even though following them would have provided helpful context.

LLMs can solve that problem by tracing such references while drawing conclusions, and all they need is a reasoning-based RAG framework.
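
As a rough illustration of what "tracing a reference" means mechanically, the sketch below pulls page and table references out of a passage and looks up the pages they point to. In a reasoning-based system the LLM does this step itself; the regular expression and the `PAGES` dictionary here are simplified assumptions for the example.

```python
import re

# Matches "Page 15" or "page 22", optionally followed by ", Table 3.2".
REF_PATTERN = re.compile(r"[Pp]age\s+(\d+)(?:,\s*Table\s+(\d+(?:\.\d+)*))?")

def extract_references(text: str) -> list:
    """Return the page (and optional table) references found in the text."""
    return [
        {"page": int(m.group(1)), "table": m.group(2)}
        for m in REF_PATTERN.finditer(text)
    ]

def follow_references(text: str, pages: dict) -> list:
    """Fetch the content of every referenced page that exists."""
    return [pages[r["page"]] for r in extract_references(text) if r["page"] in pages]

# Hypothetical page store keyed by page number.
PAGES = {
    15: "Table 3.2: Staff 1.1M, Cloud 0.4M.",
    22: "Appendix: methodology notes.",
}

passage = "Costs rose sharply (refer Page 15, Table 3.2). See also page 22."
refs = extract_references(passage)
linked = follow_references(passage, PAGES)
print(refs)
print(linked)
```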

PageIndex is one such framework: a reasoning-based RAG framework that overcomes the constraints of vector-based systems and brings the power of agentic retrieval to long-form, structured documents.

PageIndex Retrieval:

As the name suggests, this approach works the way we would use the index page of a book: understand how the book is structured, find the relevant section, and search exhaustively within that section to reach a conclusion.

If the information is sufficient, the LLM proceeds to answer the question; otherwise, it checks the index again to find a more relevant section than the one selected in the first iteration.
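
That check-and-retry loop might look like the sketch below. The `pick_section` and `is_sufficient` callables represent LLM calls in a real system; here they are replaced by deliberately naive stand-ins (pick the first unvisited title, look for a percentage sign) so the loop can run on its own. All names are hypothetical.

```python
from typing import Callable, Optional

def iterative_retrieve(
    question: str,
    sections: dict,
    pick_section: Callable[[str, list], str],
    is_sufficient: Callable[[str, str], bool],
    max_rounds: int = 3,
) -> tuple:
    """Return (text of the section that answers the question or None, titles visited)."""
    visited: list = []
    for _ in range(max_rounds):
        remaining = [title for title in sections if title not in visited]
        if not remaining:
            break
        title = pick_section(question, remaining)      # consult the index
        visited.append(title)
        if is_sufficient(question, sections[title]):   # read the chosen section
            return sections[title], visited
    return None, visited

SECTIONS = {
    "Election Overview": "This chapter explains how elections are organised.",
    "Election Results": "Party A won 52% of the vote.",
}

def pick_first(question: str, titles: list) -> str:
    return titles[0]

def has_percentage(question: str, text: str) -> bool:
    return "%" in text

found, visited = iterative_retrieve("Who won the election?", SECTIONS, pick_first, has_percentage)
print(found)
print(visited)
```

The first section turns out not to contain the answer, so the loop goes back to the index and tries the next one.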

The Table of Contents (ToC) is a JSON-based hierarchical structure built over unstructured data.

The ToC is a hierarchy of nodes, where each node can carry metadata, a description, or a reference to further relevant context.

This way, traversal is traced by node_id, and the LLM can traverse the tree recursively and associate contextual metadata using semantic tags.

Each node points directly to raw content, giving the LLM context-aware access to the data. The ToC index lives inside the LLM’s active reasoning context as an “in-context index”, not as a static embedding index.
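
A minimal sketch of such a tree is shown below. Only node_id comes from the description above; the other field names (`title`, `summary`, `text`, `nodes`) are assumptions for illustration and may not match PageIndex’s actual schema. The sketch shows the two operations that matter: producing a text-free index to place in the prompt, and resolving a node_id back to raw content.

```python
import json

# Hypothetical ToC tree; field names other than node_id are assumed.
TREE = {
    "node_id": "0000",
    "title": "Annual Report",
    "summary": "Company performance for the year.",
    "text": "",
    "nodes": [
        {
            "node_id": "0001",
            "title": "Financials",
            "summary": "Revenue and costs.",
            "text": "Revenue was 4.2M.",
            "nodes": [
                {
                    "node_id": "0002",
                    "title": "Table 3.2",
                    "summary": "Cost breakdown.",
                    "text": "Staff: 1.1M; Cloud: 0.4M.",
                    "nodes": [],
                },
            ],
        },
    ],
}

def to_index(node: dict) -> dict:
    """Copy the tree without raw text, suitable for placing in the prompt."""
    return {
        "node_id": node["node_id"],
        "title": node["title"],
        "summary": node["summary"],
        "nodes": [to_index(child) for child in node["nodes"]],
    }

def find_node(node: dict, node_id: str):
    """Depth-first lookup of a node by its node_id; returns None if absent."""
    if node["node_id"] == node_id:
        return node
    for child in node["nodes"]:
        found = find_node(child, node_id)
        if found is not None:
            return found
    return None

index_json = json.dumps(to_index(TREE), indent=2)  # goes into the LLM context
raw = find_node(TREE, "0002")["text"]              # fetched only when selected
print(index_json)
print(raw)
```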

Because the search is iterative and not confined to a fixed set of retrieved chunks, logical continuity is preserved and hallucinations can be reduced.

Searches can span multiple documents, using metadata, semantics, and descriptions.

Integrations:

PageIndex connects to agent frameworks or LLMs through MCP, using the PageIndex API under the hood. If you already use a PageIndex API key to connect your LLM to PageIndex, the same key works for the MCP integration as well.

After you create the API key, all you need to add is the following MCP configuration:

    {
      "mcpServers": {
        "pageindex": {
          "type": "http",
          "url": "https://api.pageindex.ai/mcp",
          "headers": {
            "Authorization": "Bearer your_api_key"
          }
        }
      }
    }

It also integrates with the Python SDK: you submit the PDF document and receive a doc_id in return, which identifies the PageIndex hierarchical tree created for it. This tree is then included in the prompt sent to the LLM to generate meaningful results.

The LLM’s output can then be parsed back into JSON to give visibility into the reasoning process, the retrieved nodes, the referenced text, and the conclusions reached.
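
Parsing that output could look like the sketch below. The field names `thinking` and `node_list` are assumptions about how the prompt asks the model to respond, not a documented PageIndex format, and the sample response is invented for the example.

```python
import json
from dataclasses import dataclass

@dataclass
class RetrievalTrace:
    thinking: str    # the model's stated reasoning
    node_ids: list   # node_id values the model chose to open

def parse_trace(raw: str) -> RetrievalTrace:
    """Extract the JSON object from a model response and validate its fields."""
    # Models sometimes add text around the JSON, so slice from the
    # first "{" to the last "}" before parsing.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    return RetrievalTrace(thinking=str(data["thinking"]), node_ids=list(data["node_list"]))

sample = (
    'Here is my answer: {"thinking": "The question is about costs, '
    'which section 3 covers.", "node_list": ["0007", "0012"]}'
)
trace = parse_trace(sample)
print(trace.thinking)
print(trace.node_ids)
```

Keeping the reasoning next to the chosen node IDs is what makes the retrieval step auditable: you can see which sections were opened and why.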

The same process can be extended to incorporate multi-node reasoning and content extraction, multi-document search, tree search, and expert knowledge integration without requiring fine-tuning.

With the rise of Vision-Language Models, question answering over long documents can also be handled without OCR (Optical Character Recognition), using PageIndex as the reasoning-based retrieval layer.

When to Use Vector DB-based RAGs vs Vectorless RAG:

No single solution fits every case. Sometimes a hybrid approach is the best option. Otherwise:

If the search involves large-scale, real-time retrieval across many documents, and semantic similarity goes hand in hand with relevance, vector RAG is the better fit. Vector DBs are well suited to unstructured data and large corpora.

But if the document hierarchy is clear, or the documents contain references to sections scattered across the corpus, and high retrieval accuracy is critical, Vectorless RAG is the better choice. Vector RAG also carries the cost and complexity of maintaining embeddings and vector DBs, so Vectorless RAG can be the clear choice when cost reduction matters alongside these other factors.

A common production pattern is a hybrid pipeline. This is like first going to the right shelf in a library (PageIndex), then asking the librarian which book on that shelf is most relevant (vector search):

PageIndex first: narrow down to the right document or section using metadata and keywords.
Vector search second: find the most relevant chunk within that section semantically.
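
The two stages can be sketched as below. This is a toy under stated assumptions: a title match stands in for PageIndex’s tree-based narrowing, word-count vectors stand in for real embeddings, and all names and data are invented for the example.

```python
import math
import re
from collections import Counter

SECTIONS = {
    "Billing": ["Invoices are issued monthly.", "Late payments incur a fee."],
    "Security": ["Passwords are rotated monthly.", "Access logs are kept for 90 days."],
}

def tokens(text: str) -> Counter:
    """Toy embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def pick_section(query: str, sections: dict):
    """Stage 1: narrow to a section (here, by matching the section title)."""
    q = tokens(query)
    for title in sections:
        if title.lower() in q:
            return title
    return None

def hybrid_search(query: str, sections: dict) -> tuple:
    """Stage 2: similarity search restricted to the chosen section's chunks."""
    title = pick_section(query, sections)
    # Fall back to every chunk if no section could be identified.
    candidates = sections[title] if title else [c for cs in sections.values() for c in cs]
    q = tokens(query)
    return title, max(candidates, key=lambda c: cosine(q, tokens(c)))

result = hybrid_search("What happens monthly in security?", SECTIONS)
print(result)
```

The word "monthly" appears in both sections, so a similarity search over all chunks could just as easily surface the billing sentence. Narrowing to the right section first removes that ambiguity.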

Traditional RAG optimises for “what sounds similar?” while reasoning-based RAG asks “where should I look, and why?” As documents grow longer, more cross-referenced, and more structured, that distinction will only matter more.

PageIndex is still early, and hybrid pipelines that combine both approaches will likely dominate production systems for the foreseeable future. But the direction is clear: the next generation of RAG isn’t about better similarity search, it’s about giving LLMs the ability to navigate knowledge the way an expert would, not just pattern-match against it.

If you’re building document-heavy AI pipelines today, Vectorless RAG is worth a serious look, not as a replacement for everything you know, but as another powerful tool that knows when to reach past the vector index entirely.