Gao Dalie (Ilyass)

RAG is Not Dead! No Chunking, No Vectors, Just Vectorless Retrieval for Higher Accuracy

Over the past two years, I have written numerous articles on how Retrieval-Augmented Generation has become a standard feature in nearly all AI applications.

Whether it's intelligent customer service, enterprise knowledge bases, financial analysis, or legal document Q&A, they all use the same logic: document segmentation, vectorisation, matching using cosine similarity, and then feeding the retrieved content into a large model for answering.

This solution is simple and effective, but its weakness is just as obvious: when a question becomes complex, spans multiple pages, or even involves multiple layers of logic, vector-similarity retrieval often heads in the wrong direction. For example:

  • You ask, "What was the year-over-year change in the company's cash flow from operating activities in 2023?"

  • A traditional RAG pipeline might retrieve a pile of paragraphs containing "cash flow" but miss the key context: operating activities vs. investing activities, 2023 vs. 2022.

  • The result: high similarity, but poor relevance.

Most RAG systems rely on a vector database, which converts text into numerical vectors in order to search for "similar text." The problem is that "similar" does not necessarily mean "the information you want."

For example, a common problem occurs when the search hits a similar-sounding paragraph in a manual but overlooks important conditions or exceptions.

That's where PageIndex comes in. It is a new RAG mechanism devised by Vectify AI, and the idea behind it is very simple.

When people read a book, they first look at the table of contents, open the chapter they are looking for, and then follow the subheadings to get to the desired location.

PageIndex lets AI do exactly this, allowing it to find "truly relevant parts" rather than "similar sentences."

So, let me give you a quick demo of a live chatbot to show you what I mean.


I will ask the chatbot a question: "What is DeepSeek-R1-Zero?" Feel free to ask any questions you want.

If you look at how the chatbot generates its output, you will see that when I input a query, the agent first fetches the PDF, downloading it locally with Python's requests module and saving it to a structured folder. It then submits the PDF to PageIndex, which builds a hierarchical tree structure of the document, organising it into natural sections and generating summaries for each node.

PageIndex is a new reasoning-based, vectorless RAG framework that performs retrieval in two steps: first, it generates a tree structure index of documents, and second, it performs reasoning-based retrieval through tree search. Unlike traditional vector-based RAG systems, PageIndex does not require vectors or artificial chunking, simulates human-like navigation through the document, and provides a transparent, reasoning-based process instead of approximate semantic search.

Next, I prepare a carefully crafted prompt for the LLM that includes my question and the simplified tree (with text removed to reduce size) and ask the model to identify the nodes most likely to contain the answer, returning both its reasoning and a list of node IDs in structured JSON. 

I then create a mapping of all nodes in the document tree, parse the LLM response, and print the model's reasoning for why it selected certain nodes. After that, I loop through the identified node IDs, retrieve their titles, page numbers, and text, and compile this content into a readable context.

What is PageIndex?

PageIndex is a new method for improving RAG accuracy. In normal RAG, sentences are vectorised, and then highly similar sentences are searched for and referenced. However, this method retrieves information that is "similar in meaning but different in context," which reduces the accuracy of the answer.

Therefore, PageIndex proposes a RAG that does not use a vector database.

Specifically, PageIndex converts a document into a hierarchical tree structure (similar to a table of contents), and an LLM searches through that structure. This makes it possible to understand the context and find the information you need, just as a human would when reading a document.

How does it work?

PageIndex works in three major steps.

OCR (clear document reading)

While ordinary OCR processes each page independently, which can lead to disorganised headings and lists, PageIndex's OCR understands the entire document as a single structure and digitises it neatly while preserving headings and tables.

Tree Generation (Create a table of contents tree)

PageIndex converts the document directly into a hierarchical structure, like a table of contents. A tree with chapters, sections, and subsections is created, making it easy to navigate even long reports without getting lost.
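
To make this concrete, here is a simplified sketch of what a single node in such a tree might look like. The fields node_id, title, page_index, and summary match the node fields used later in this tutorial; the child key "nodes" is an illustrative assumption about the exact shape.

# A simplified example node from a PageIndex-style tree (illustrative only;
# the child key "nodes" is an assumption, the other fields appear later in this post)
example_node = {
    "node_id": "0006",
    "title": "2. Approach",
    "page_index": 3,
    "summary": "Describes the training pipeline ...",
    "nodes": [
        {
            "node_id": "0007",
            "title": "2.1 Overview",
            "page_index": 3,
            "summary": "...",
            "nodes": []
        }
    ]
}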

Retrieval (searching by tracing the tree)

The AI searches the tree based on the question and picks up all the relevant parts. It also tracks which pages and chapters have been visited, so the search results are well-founded.

PageIndex vs. Conventional RAG

Conventional RAG vectorises entire documents and stores them in a vector database. It then searches for relevant documents based on the similarity between the user's question and the content of the documents.

However, this method relies only on the statistical similarity of words and sentences, so it may not always capture true relevance. 

Long documents are also broken into chunks, which disrupts context and hides important connections. PageIndex solves these problems by using the inherent hierarchical structure of documents without breaking them down into small pieces.

This allows LLMs to retrieve information based on contextual semantic relevance rather than simple word similarity.
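
To make the contrast concrete, here is a minimal sketch of the embed-and-rank step that conventional RAG relies on and that PageIndex removes entirely. The embed() argument is a hypothetical placeholder for any embedding model; this is not PageIndex code.

import numpy as np

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_by_similarity(query, chunks, embed, top_k=3):
    # Conventional vector RAG in miniature: embed every chunk, embed the
    # query, then rank chunks by cosine similarity. The top-scoring chunks
    # win on surface similarity, even when the real answer lives in a
    # differently worded section.
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]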

Let's start coding 

Let us now explore, step by step, how to build a vectorless RAG pipeline. First, we install the library that powers it with a pip install.

%pip install -q --upgrade pageindex

0.2 Setup PageIndex

First, I import the PageIndexClient class from the pageindex package and also bring in some helper functions from pageindex. Then, I generate my own API key, which I copy and paste into the variable PAGEINDEX_API_KEY.

After that, I create a client instance called pi_client by passing my API key into PageIndexClient, and now I'm ready to use this client to interact with the PageIndex API for searching, indexing, or managing documents.

from pageindex import PageIndexClient
import pageindex.utils as utils

# Get your PageIndex API key from https://dash.pageindex.ai/api-keys
PAGEINDEX_API_KEY = "YOUR_PAGEINDEX_API_KEY"
pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY)


0.3 Setup LLM

Next, I import the openai package and set OPENAI_API_KEY to the value from the OpenAI dashboard. I then define an asynchronous function called call_llm that takes a prompt, with optional parameters model (defaulting to "gpt-4.1") and temperature (defaulting to 0 for deterministic answers). Inside the function, I create a new AsyncOpenAI client using my API key.

Next, I call client.chat.completions.create(...), passing in the model name, the conversation messages (here, just a single user message containing my prompt), and the temperature. Once the response comes back, I take the first choice's message content, strip extra whitespace, and return it.

import openai
OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"

async def call_llm(prompt, model="gpt-4.1", temperature=0):
    client = openai.AsyncOpenAI(api_key=OPENAI_API_KEY)
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature
    )
    return response.choices[0].message.content.strip()

Step 1: PageIndex Tree Generation

I import the os and requests modules, since I need them to handle file paths and download files. Then I set pdf_url to the link of the paper I want to fetch, a PDF from arXiv, and build a local path called pdf_path from the filename extracted from the URL, making sure the target directory exists.

After that, I send a GET request to download the PDF with requests.get, open a new file in write-binary mode, and save the content locally.

Once the file is saved, I call pi_client.submit_document(pdf_path) to upload the saved PDF into PageIndex, take the returned "doc_id", and print it out to confirm the document has been successfully submitted for indexing.

import os, requests

# You can also use our GitHub repo to generate PageIndex tree
# https://github.com/VectifyAI/PageIndex

pdf_url = "https://arxiv.org/pdf/2501.12948.pdf"
pdf_path = os.path.join("../data", pdf_url.split('/')[-1])
os.makedirs(os.path.dirname(pdf_path), exist_ok=True)

response = requests.get(pdf_url)
with open(pdf_path, "wb") as f:
    f.write(response.content)
print(f"Downloaded {pdf_url}")

doc_id = pi_client.submit_document(pdf_path)["doc_id"]
print('Document Submitted:', doc_id)

1.2 Get the generated PageIndex tree structure

After I've submitted the document and received a doc_id, I check whether the document is ready for retrieval by calling pi_client.is_retrieval_ready(doc_id). If it is, I call pi_client.get_tree(doc_id, node_summary=True), which gives me the structured outline of the document, and I extract the ['result'] part of the response. Indexing can take a little while, so if the document isn't ready yet, you can simply poll; a sketch follows the code below.

if pi_client.is_retrieval_ready(doc_id):
    tree = pi_client.get_tree(doc_id, node_summary=True)['result']
    print('Simplified Tree Structure of the Document:')
    utils.print_tree(tree)
else:
    print("Processing document, please try again later...")
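
Indexing a long PDF is not instant, so instead of re-running the cell by hand, you can poll until the document is ready. Here is a minimal sketch using only the client methods shown above (the 10-second interval is an arbitrary choice):

import time

# Poll until PageIndex finishes processing the document,
# then fetch and print the tree as above
while not pi_client.is_retrieval_ready(doc_id):
    print("Processing document, waiting 10 seconds...")
    time.sleep(10)

tree = pi_client.get_tree(doc_id, node_summary=True)['result']
utils.print_tree(tree)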

Step 2: Reasoning-Based Retrieval with Tree Search

Next, I import the json module to handle and format the document tree. I set up my query, here asking, "What are the conclusions in this document?" To shrink the tree, I remove the full text and keep only titles and summaries, using utils.remove_fields.

I create a search_prompt that tells the LLM to identify relevant nodes and return a JSON object with "thinking" and "node_list" fields, embedding the question and the simplified tree into the prompt. Finally, I call call_llm(search_prompt) to obtain structured JSON that points to the most relevant parts of the document.

import json

query = "What are the conclusions in this document?"

tree_without_text = utils.remove_fields(tree.copy(), fields=['text'])

search_prompt = f"""
You are given a question and a tree structure of a document.
Each node contains a node ID, a node title, and a corresponding summary.
Your task is to find all nodes that are likely to contain the answer to the question.

Question: {query}

Document tree structure:
{json.dumps(tree_without_text, indent=2)}

Please reply in the following JSON format:
{{
    "thinking": "<Your thinking process on which nodes are relevant to the question>",
    "node_list": ["node_id_1", "node_id_2", ..., "node_id_n"]
}}
Directly return the final JSON structure. Do not output anything else.
"""

tree_search_result = await call_llm(search_prompt)

2.2 Print retrieved nodes and reasoning process

Later on, I create a lookup table for the tree by calling utils.create_node_mapping(tree), which gives me a dictionary where each key is a node_id and the value is the corresponding node's details. Then, since my tree_search_result from the LLM is just a JSON string, I parse it into a Python dictionary with json.loads.

Next, I print out the model's reasoning process by passing tree_search_result_json['thinking'] into utils.print_wrapped, which formats long text nicely so it's easier to read; after that, I loop through each node_id in tree_search_result_json["node_list"], take the matching node from my node_map, and print its ID, page number, and title, so I can clearly see which parts of the document the LLM thought were relevant to my query.

node_map = utils.create_node_mapping(tree)
tree_search_result_json = json.loads(tree_search_result)

print('Reasoning Process:')
utils.print_wrapped(tree_search_result_json['thinking'])

print('\nRetrieved Nodes:')
for node_id in tree_search_result_json["node_list"]:
    node = node_map[node_id]
    print(f"Node ID: {node['node_id']}\t Page: {node['page_index']}\t Title: {node['title']}")


Step 3: Answer Generation

3.1 Extract relevant context from retrieved nodes

Finally, I take the JSON string from tree_search_result and parse it again with json.loads, then take just the "node_list", which contains the IDs of the relevant nodes. After that, I build one big string called relevant_content by joining the "text" fields of the nodes in that list, separated by two newlines, so it reads cleanly.

node_list = json.loads(tree_search_result)["node_list"]
relevant_content = "\n\n".join(node_map[node_id]["text"] for node_id in node_list)

print('Retrieved Context:\n')
utils.print_wrapped(relevant_content[:1000] + '...')
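
3.2 Generate the final answer

With the relevant context assembled, the natural last step is to pass it back to the LLM together with the original query. Here is a minimal sketch that reuses the call_llm helper defined earlier; the prompt wording is my own and is not prescribed by PageIndex:

# Build the final prompt from the original query and the retrieved context
answer_prompt = f"""
Answer the question based only on the provided context.

Question: {query}

Context:
{relevant_content}
"""

answer = await call_llm(answer_prompt)
print('Answer:\n')
utils.print_wrapped(answer)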

Conclusion

PageIndex is a new RAG mechanism. The AI explores documents as a table-of-contents tree, ensuring high relevance and providing clear evidence. It does not require a vector database, making it suitable for on-premise environments and for searching confidential documents. PageIndex is especially effective in fields where accuracy is critical, such as contracts, technology, and finance.

🧙‍♂️ I am a Generative AI expert! If you want to collaborate on a project, drop an inquiry here or book a 1-on-1 consulting call with me.

I would highly appreciate it if you:

❣ Join my Patreon: https://www.patreon.com/GaoDalie_AI

Book an appointment with me: https://topmate.io/gaodalie_ai

Support the content (every dollar goes back into the video): https://buymeacoffee.com/gaodalie98d

Subscribe to the newsletter for free: https://substack.com/@gaodalie
