
Vectorless RAG with AWS Bedrock and PageIndex

PageIndex caught my eye when it hit GitHub's trending page. It's a RAG framework that ditches vectors entirely in favor of document structure and LLM reasoning. I had to try it.

The open-source version turned out to be pretty bare-bones though. It handles tree generation fine, but if you want to actually query your documents locally, you're on your own. So I forked it, added the missing retrieval pieces, and threw in AWS Bedrock support while I was at it.

This post walks through how to run the full PageIndex pipeline locally with Bedrock as your LLM provider.

Full code: github.com/b-d055/PageIndex
(Clone the repo and run local_rag.py with the provided examples to get started right away)

What is PageIndex?

Traditional vector-based RAG relies heavily on semantic similarity as a proxy for relevance. While this works well for many use cases, it often breaks down with long, structured, or highly technical documents that require domain knowledge and multi-step reasoning. In those cases, retrieving text that merely “sounds similar” to a query isn’t enough. PageIndex takes a different approach by using LLM reasoning to navigate a document’s structure, prioritizing sections based on how a human expert would actually search for an answer.

For a deeper dive into the motivation and design, check out PageIndex's introductory blog post.

PageIndex mimics how a human expert navigates documents:

  1. Read the structure - Parse the document's hierarchy (like a table of contents) to understand what's where
  2. Reason over sections - Use LLM reasoning to identify which sections likely contain relevant information
  3. Extract and evaluate - Pull content from selected sections and assess if it's sufficient to answer the query
  4. Iterate or answer - If more context is needed, revisit the structure and select additional sections. Otherwise, generate the response.

The output is a hierarchical tree that mirrors how a human would navigate a document.
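
A rough sketch of that query-time loop in Python (placeholder names throughout, not the actual PageIndex implementation):

def answer_with_pageindex(query, tree, llm):
    """Hypothetical sketch of steps 1-4: reason over structure, extract, iterate."""
    selected_ids, context = [], ""
    for _ in range(3):  # cap the number of navigation rounds
        # Steps 1-2: reason over the tree to pick promising sections
        node_ids = llm.select_nodes(query, tree, exclude=selected_ids)
        selected_ids += node_ids
        # Step 3: pull the text of those sections into the working context
        context += extract_text(tree, node_ids)
        # Step 4: stop if the context is sufficient, otherwise iterate
        if llm.is_sufficient(query, context):
            break
    return llm.generate_answer(query, context)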

The Problem: Open Source vs API

The PageIndex GitHub repo provides tree generation, but the cookbooks all use the hosted API (PageIndexClient) for RAG queries. The API is free to start, but may incur costs depending on your features and usage. If you want to run everything locally or use your own LLM provider (like Bedrock), you need to bridge that gap.

What the open-source repo includes:

  • run_pageindex.py - generates tree structures from PDFs
  • md_to_tree() - generates trees from Markdown
  • Utilities for PDF parsing, token counting, etc.

What's missing:

  • Query/retrieval functionality
  • Helper functions like create_node_mapping(), print_tree()
  • Support for non-OpenAI providers

Step 1: Generate a Tree Structure

First, generate a tree from your document. This step currently requires OpenAI. I may add alternative provider support in my fork later.

Add your OpenAI API key to a .env file:

OPENAI_API_KEY=your-key

Then run:

python run_pageindex.py --pdf_path document.pdf

Note: The upstream repo uses CHATGPT_API_KEY internally, but my fork accepts OPENAI_API_KEY and sets it automatically.

This creates a JSON file in results/ with:

  • Hierarchical sections extracted from the document
  • Page ranges for each section
  • AI-generated summaries
  • Full text content (requires --if-add-node-text yes)

Example tree structure:

{
  "doc_name": "quarterly-report.pdf",
  "structure": [
    {
      "title": "Financial Results",
      "start_index": 1,
      "end_index": 5,
      "node_id": "0001",
      "summary": "Overview of Q1 financial performance...",
      "text": "Full text content...",
      "nodes": [...]
    }
  ]
}

Important: The tree must include the text field for retrieval to work. Use --if-add-node-text yes during generation. It's off by default.
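
For example, to regenerate the tree with node text included:

python run_pageindex.py --pdf_path document.pdf --if-add-node-text yes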

Step 2: Set Up AWS Bedrock

Generate a Bedrock API Key

AWS Bedrock now supports API key authentication. This simplifies setup significantly.

  1. Go to the AWS Bedrock Console
  2. Navigate to Model access and ensure you have access to Claude models (edit: model access page has been retired and is no longer required)
  3. Go to API keys in the sidebar (you may need to scroll down)
  4. Create a new API key (these can be short-term or long-term)
  5. Copy the key. It's only shown once.

For more details, see the AWS documentation on Bedrock API keys.

Configure Environment

# .env
OPENAI_API_KEY=sk-...               # Required for tree generation
AWS_BEARER_TOKEN_BEDROCK=your-key   # For Bedrock queries
AWS_REGION=us-east-1                # Or your preferred region

How Authentication Works

boto3 automatically picks up the AWS_BEARER_TOKEN_BEDROCK environment variable:

import boto3
import os

# Setting the key here is equivalent to exporting it in your shell or .env
os.environ['AWS_BEARER_TOKEN_BEDROCK'] = "your-api-key"

client = boto3.client('bedrock-runtime', region_name='us-east-1')

model_id = "us.anthropic.claude-haiku-4-5-20251001-v1:0"
messages = [{"role": "user", "content": [{"text": "Hello"}]}]
response = client.converse(modelId=model_id, messages=messages)

No IAM roles or AWS CLI configuration needed when using API key auth.

Step 3: Query with Bedrock

Now you can query your document using Bedrock:

python local_rag.py --provider bedrock \
    --model us.anthropic.claude-haiku-4-5-20251001-v1:0 \
    --tree results/document_structure.json \
    --query "What are the main conclusions?"

Or use interactive mode for multiple questions:

python local_rag.py --provider bedrock \
    --model us.anthropic.claude-haiku-4-5-20251001-v1:0 \
    --tree results/document_structure.json \
    -i

Some Bedrock Models to Try

Use the us. prefix for cross-region inference:

| Model | Model ID |
| --- | --- |
| Claude Sonnet 4.5 | us.anthropic.claude-sonnet-4-5-20250929-v1:0 |
| Claude Haiku 4.5 | us.anthropic.claude-haiku-4-5-20251001-v1:0 |
| Amazon Nova Pro | us.amazon.nova-pro-v1:0 |
| Amazon Nova Lite | us.amazon.nova-lite-v1:0 |

Tip: Claude Haiku 4.5 offers a good balance of speed and cost for RAG queries.

How the RAG Pipeline Works

The local RAG script implements a three-step pipeline:

1. Tree Search

Send the tree structure (without text) to the LLM and ask it to identify relevant nodes:

prompt = f"""
You are given a question and a tree structure of a document.
Find all nodes that are likely to contain the answer.

Question: {query}
Document tree structure: {tree_json}

Reply with: {{"thinking": "...", "node_list": ["0001", "0002"]}}
"""
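
The reply is then parsed as JSON so the selected node IDs can feed the next step (a minimal sketch; the exact parsing in local_rag.py may differ):

import json

reply = provider.call(prompt)       # provider is an LLM wrapper like the one shown below
search_result = json.loads(reply)   # e.g. {"thinking": "...", "node_list": ["0001"]}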

2. Content Extraction

Retrieve the full text from the identified nodes:

context = ""
for node_id in search_result['node_list']:
    # Concatenate the stored full text of each selected node
    context += node_map[node_id]['text']

3. Answer Generation

Send the extracted content to the LLM to generate an answer:

prompt = f"""
Answer the question based on the context:
Question: {query}
Context: {context}
"""
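
Generating the final answer is then just one more model call (sketch):

answer = provider.call(prompt)
print(answer)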

Key Implementation Details

Helper Functions

The open-source repo doesn't include these, so we implement them:

def create_node_mapping(tree_structure):
    """Create a flat mapping of node_id -> node for easy lookup."""
    node_map = {}

    def traverse(nodes):
        for node in nodes:
            if 'node_id' in node:
                node_map[node['node_id']] = node
            if 'nodes' in node:
                traverse(node['nodes'])

    traverse(tree_structure)
    return node_map
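
A print_tree() helper, also mentioned above as missing upstream, can be sketched the same way (the output format here is my own choice):

def print_tree(nodes, depth=0):
    """Print node IDs and titles, indented to show the hierarchy."""
    for node in nodes:
        print(f"{'  ' * depth}[{node.get('node_id', '----')}] {node.get('title', 'Untitled')}")
        if 'nodes' in node:
            print_tree(node['nodes'], depth + 1)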

Bedrock Provider

class BedrockProvider:
    def __init__(self, model, region):
        self.client = boto3.client('bedrock-runtime', region_name=region)
        self.model = model

    def call(self, prompt):
        response = self.client.converse(
            modelId=self.model,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"temperature": 0, "maxTokens": 4096}
        )
        return response['output']['message']['content'][0]['text']
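
Using it directly looks like this (in practice local_rag.py picks the provider based on the --provider flag):

provider = BedrockProvider(
    model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
    region="us-east-1",
)
print(provider.call("Summarize the Financial Results section."))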

Two-Phase Workflow

The key insight is separating tree generation from querying:

| Phase | Provider | What Happens |
| --- | --- | --- |
| Generation | OpenAI (required, for now) | Parse PDF, extract structure, generate summaries |
| Querying | Any (OpenAI/Bedrock) | Tree search, content extraction, answer generation |

This means you can:

  • Generate the tree once
  • Query many times with any provider (use Haiku or Nova for speed)
  • Share tree files across team members
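
A minimal sketch of the generate-once, query-many pattern; answer_query here is a hypothetical stand-in for the three-step pipeline above, not a function in the repo:

import json

# Load the tree generated in Phase 1; no PDF parsing needed from here on
with open('results/document_structure.json') as f:
    tree = json.load(f)

node_map = create_node_mapping(tree['structure'])

for question in ["What are the main conclusions?", "What risks are highlighted?"]:
    print(answer_query(question, tree, node_map, provider))  # hypothetical helper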

Files Reference

Notable files in my fork.

| File | Purpose |
| --- | --- |
| local_rag.py | Main script with OpenAI + Bedrock support |
| run_pageindex.py | Tree generation from PDFs |
| .env | API keys (copy from .env.example) |
| results/*.json | Generated tree structures |
| requirements.txt | Dependencies including boto3 |

Conclusion

PageIndex is a refreshing take on RAG. Using document structure and reasoning instead of vector similarity can yield smarter retrieval. This is especially true for complex documents.

This implementation is intentionally simple. It's a starting point, not a production-ready system. The two-phase workflow (generate once, query many) keeps things practical. The tree structures are just human-readable JSON, so it's easy to inspect what's happening and build on top of it.

If you're tired of fighting with chunking strategies and embedding quality, give it a shot.


You can find me on LinkedIn | CTO & Partner @ EES.
