Kashif Eqbal
Vectorless RAG Meets Agent Memory: Running Hindsight + PageIndex Fully Local

Most RAG systems work the same way: chunk documents, embed them into vectors, run similarity search, and surface the closest match. It works — until it doesn't. Similarity is not relevance. On complex professional documents, that gap shows up quickly.

A Different Retrieval Model

PageIndex from VectifyAI skips chunking and embedding entirely. It builds a hierarchical tree index from the document structure — effectively an auto-generated table of contents — then uses LLM reasoning to navigate that structure. No vector database. No chunking pipeline. Reported accuracy: 98.7% on FinanceBench.
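To make the idea concrete, here is a toy sketch of a structure-first index built straight from markdown ATX headings. This is illustrative only, not PageIndex's actual code:

```javascript
// Hypothetical sketch: build a table-of-contents tree from markdown headings.
// Each heading becomes a node; body text attaches to the nearest heading above it.
function buildTree(markdown) {
  const root = { title: "(root)", level: 0, text: "", children: [] };
  const stack = [root];
  for (const line of markdown.split("\n")) {
    const m = line.match(/^(#{1,6})\s+(.*)$/);
    if (m) {
      const node = { title: m[2], level: m[1].length, text: "", children: [] };
      // Pop back up to this heading's parent level before attaching.
      while (stack[stack.length - 1].level >= node.level) stack.pop();
      stack[stack.length - 1].children.push(node);
      stack.push(node);
    } else {
      stack[stack.length - 1].text += line + "\n";
    }
  }
  return root;
}

const tree = buildTree("# Doc\n## Intro\nHello.\n## Results\nNumbers.");
console.log(tree.children[0].children.map((n) => n.title)); // → [ 'Intro', 'Results' ]
```

The point is that the index mirrors how a human skims a table of contents, so a reasoning step can descend the tree instead of scanning flat chunks.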

Memory, Not Just Retrieval

Hindsight by Vectorize.io handles long-term agent memory. It organises memory into three types:

  • World facts
  • Experiences
  • Mental models

These memory types are accessed through a retain → recall → reflect API. Hindsight leads the LongMemEval benchmark for agent memory accuracy.
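Hindsight's real API is richer than this, but the lifecycle can be sketched in a few lines. Everything below (the class name, method signatures, and the reflection heuristic) is a hypothetical illustration, not Hindsight's implementation:

```javascript
// Hypothetical in-memory sketch of a retain → recall → reflect lifecycle.
// Memory types mirror the three categories above; this is NOT Hindsight's API.
class MemoryStore {
  constructor() {
    this.memories = []; // { type: "world" | "experience" | "model", text }
  }
  retain(type, text) {
    this.memories.push({ type, text });
  }
  recall(query) {
    // Naive lexical recall: return memories sharing a term with the query.
    const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
    return this.memories.filter((m) =>
      terms.some((t) => m.text.toLowerCase().includes(t))
    );
  }
  reflect() {
    // Promote repeated experiences into a mental model (simplified heuristic).
    const counts = {};
    for (const m of this.memories.filter((m) => m.type === "experience")) {
      counts[m.text] = (counts[m.text] || 0) + 1;
    }
    for (const [text, n] of Object.entries(counts)) {
      if (n >= 2) this.retain("model", `Pattern: ${text}`);
    }
  }
}

const store = new MemoryStore();
store.retain("world", "User is based in Gurgaon");
store.retain("experience", "User asked for shorter answers");
store.retain("experience", "User asked for shorter answers");
store.reflect();
console.log(store.recall("shorter").map((m) => m.type)); // → [ 'experience', 'experience', 'model' ]
```

reflect is where the interesting part lives: it folds repeated experiences into durable mental models, which is what separates memory from plain retrieval.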

The Problem

Both systems are capable — but both depend on external APIs. I wanted the same functionality running fully local, offline, and deterministic. So I built hindsight-pageindex.

What It Is

A local runtime scaffold that vendors PageIndex and exposes a Hindsight-compatible REST interface:

POST /index   → ingest .md or .pdf
POST /query   → retrieve top-K relevant sections
GET  /docs    → list indexed documents

Retrieval uses PageIndex's lexical + document-structure scoring. Markdown hierarchy is preserved, so queries resolve against document meaning rather than raw keyword matches. Fast. Deterministic. No external API calls.
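As a rough illustration of what "lexical + document-structure" scoring can mean (the weights and field names here are assumptions, not the repo's actual scorer): a query term that hits a heading counts for more than one that only hits body text.

```javascript
// Hypothetical sketch of lexical + structure scoring. Sections whose headings
// match query terms outrank sections that match only in the body.
function scoreSection(section, query) {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  let score = 0;
  for (const t of terms) {
    if (section.heading.toLowerCase().includes(t)) score += 3; // structural match
    if (section.body.toLowerCase().includes(t)) score += 1;    // lexical match
  }
  return score;
}

const sections = [
  { heading: "Location", body: "Based in Gurgaon, India." },
  { heading: "Preference", body: "Prefers concise, direct answers." },
];
const ranked = sections
  .map((s) => ({ ...s, score: scoreSection(s, "user location") }))
  .sort((a, b) => b.score - a.score);
console.log(ranked[0].heading); // → Location
```

Because scoring like this is deterministic, the same query always returns the same ranking, which is exactly the property you lose with approximate nearest-neighbour search.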

Setup

git clone https://github.com/kashifeqbal/hindsight-pageindex
cd hindsight-pageindex
npm run setup:local
cp .env.example .env
# Set CHATGPT_API_KEY and API_TOKEN
npm run start
# → Listening on 127.0.0.1:8787

Try It in Under a Minute

cat >/tmp/hindsight-sample.md <<'MD'
# User Profile
## Location
Based in Gurgaon, India.
## Preference
Prefers concise, direct answers.
MD

export API_TOKEN='your-token-here'
node scripts/test-index.mjs
node scripts/test-query.mjs

Ranked sections return instantly — no embedding service, no network round-trip.
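For reference, a ranked /query response might look roughly like this. The field names below are an illustrative guess at a sensible shape, not the repo's documented schema:

```json
{
  "query": "Where is the user based?",
  "results": [
    {
      "path": ["User Profile", "Location"],
      "heading": "Location",
      "text": "Based in Gurgaon, India.",
      "score": 3
    }
  ]
}
```

Note that each result carries its full heading path, so you can trace exactly which branch of the document tree answered the query.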

Why This Pairing Works

Hindsight manages memory lifecycle. PageIndex handles document reasoning retrieval. Together they cover the full local memory stack:

Layer               Tool
Memory lifecycle    Hindsight
Document retrieval  PageIndex
Infrastructure      Your machine

When to Use This

  • Air-gapped or privacy-sensitive environments — memory stays on device
  • Personal AI assistants — profiles and preferences that shouldn't reach a cloud API
  • Prototyping before committing to the full Hindsight hosted stack
  • Where explainability matters — tree traversal is traceable; cosine similarity isn't

What's Next

  • [ ] LLM-guided tree search in /query — the full PageIndex reasoning pass, locally
  • [ ] Multi-doc cross-query support
  • [ ] Optional embedding scorer as a drop-in upgrade path

Repo: github.com/kashifeqbal/hindsight-pageindex

If you're building local-first agent memory or have used PageIndex or Hindsight in a different setup, happy to compare notes.
