Most RAG systems work the same way: chunk documents, embed them into vectors, run similarity search, and surface the closest match. It works — until it doesn't. Similarity is not relevance. On complex professional documents, that gap shows up quickly.
## A Different Retrieval Model
PageIndex from VectifyAI skips chunking and embedding entirely. It builds a hierarchical tree index from the document structure — effectively an auto-generated table of contents — then uses LLM reasoning to navigate that structure. No vector database. No chunking pipeline. Reported accuracy: 98.7% on FinanceBench.
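The core idea is easy to picture: parse heading levels into a tree, then let a model walk that tree instead of scanning flat chunks. Here is a minimal sketch of the tree-building half in JavaScript — my own illustration of the concept, not PageIndex's actual code, which is considerably richer:

```javascript
// Minimal sketch: build a table-of-contents tree from markdown headings.
// Illustrative only; PageIndex's real index carries far more structure.
function buildTree(markdown) {
  const root = { title: "(root)", level: 0, children: [], body: [] };
  const stack = [root];
  for (const line of markdown.split("\n")) {
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    if (m) {
      const node = { title: m[2], level: m[1].length, children: [], body: [] };
      // Pop back up to this heading's parent, then attach.
      while (stack[stack.length - 1].level >= node.level) stack.pop();
      stack[stack.length - 1].children.push(node);
      stack.push(node);
    } else {
      // Non-heading lines belong to the nearest open section.
      stack[stack.length - 1].body.push(line);
    }
  }
  return root;
}

const tree = buildTree(
  ["# User Profile", "## Location", "Based in Gurgaon.", "## Preference"].join("\n")
);
// tree.children[0].children[0].title === "Location"
```

A retriever can then descend this tree section by section, which is what makes the traversal traceable in a way a flat vector search is not.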
## Memory, Not Just Retrieval
Hindsight by Vectorize.io handles long-term agent memory. It organises memory into three types:
- World facts
- Experiences
- Mental models
All three are accessed through a retain → recall → reflect API. It leads the LongMemEval benchmark for agent memory accuracy.
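The three memory types and the retain → recall → reflect loop can be sketched as a toy in-memory store. Everything below — the class name, the fields, the naive word-overlap recall — is my illustration of the shape, not Hindsight's actual API:

```javascript
// Toy sketch of a retain → recall → reflect memory loop.
// Names and fields are hypothetical, not Hindsight's real interface.
class MemoryStore {
  constructor() {
    // type is one of: "world" | "experience" | "mental_model"
    this.memories = [];
  }
  retain(type, text) {
    this.memories.push({ type, text, at: this.memories.length });
  }
  recall(query, topK = 3) {
    // Naive lexical recall: rank memories by word overlap with the query.
    const words = new Set(query.toLowerCase().split(/\W+/));
    return this.memories
      .map((m) => ({
        m,
        score: m.text.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length,
      }))
      .filter((r) => r.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
      .map((r) => r.m);
  }
  reflect() {
    // Consolidation stub: distill experiences into a mental model.
    const exp = this.memories.filter((m) => m.type === "experience");
    if (exp.length) this.retain("mental_model", `observed ${exp.length} experiences`);
  }
}
```

The point of the split is that facts, episodes, and distilled beliefs age differently; a reflect pass is what turns raw episodes into something reusable.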
## The Problem
Both systems are capable — but both depend on external APIs. I wanted the same functionality running fully local, offline, and deterministic. So I built hindsight-pageindex.
## What It Is
A local runtime scaffold that vendors PageIndex and exposes a Hindsight-compatible REST interface:
- `POST /index` → ingest `.md` or `.pdf`
- `POST /query` → retrieve top-K relevant sections
- `GET /docs` → list indexed documents
Retrieval uses PageIndex's lexical + document-structure scoring. Markdown hierarchy is preserved, so queries resolve against document meaning rather than raw keyword matches. Fast. Deterministic. No external API calls.
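To make "lexical + document-structure scoring" concrete, here is a rough approximation: score a section by query-term overlap, weighting matches in the section's heading path above matches in its body. This is my sketch of the general shape, not the vendored scorer:

```javascript
// Illustrative scorer: plain term overlap, boosted when a term hits the
// section's heading path. The vendored PageIndex scorer is more involved.
function scoreSection(query, section) {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const body = section.body.toLowerCase();
  const path = section.headingPath.join(" ").toLowerCase();
  let score = 0;
  for (const t of terms) {
    if (body.includes(t)) score += 1; // lexical hit in the body
    if (path.includes(t)) score += 2; // structural hit weighs more
  }
  return score;
}

const section = {
  headingPath: ["User Profile", "Location"],
  body: "Based in Gurgaon, India.",
};
const s = scoreSection("user location", section); // both terms hit the path → 4
```

Because the score is a deterministic function of the text and the heading hierarchy, the same query against the same index always ranks sections identically.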
## Setup
```shell
git clone https://github.com/kashifeqbal/hindsight-pageindex
cd hindsight-pageindex
npm run setup:local
cp .env.example .env
# Set CHATGPT_API_KEY and API_TOKEN
npm run start
# → Listening on 127.0.0.1:8787
```
## Try It in Under a Minute
```shell
cat >/tmp/hindsight-sample.md <<'MD'
# User Profile
## Location
Based in Gurgaon, India.
## Preference
Prefers concise, direct answers.
MD

export API_TOKEN='your-token-here'
node scripts/test-index.mjs
node scripts/test-query.mjs
```
Ranked sections return instantly — no embedding service, no network round-trip.
## Why This Pairing Works
Hindsight manages memory lifecycle. PageIndex handles document reasoning retrieval. Together they cover the full local memory stack:
| Layer | Tool |
|---|---|
| Memory lifecycle | Hindsight |
| Document retrieval | PageIndex |
| Infrastructure | Your machine |
## When to Use This
- Air-gapped or privacy-sensitive environments — memory stays on device
- Personal AI assistants — profiles and preferences that shouldn't reach a cloud API
- Prototyping before committing to the full Hindsight hosted stack
- Where explainability matters — tree traversal is traceable; cosine similarity isn't
## What's Next
- [ ] LLM-guided tree search in `/query`: the full PageIndex reasoning pass, run locally
- [ ] Multi-doc cross-query support
- [ ] Optional embedding scorer as a drop-in upgrade path
Repo: github.com/kashifeqbal/hindsight-pageindex
If you're building local-first agent memory or have used PageIndex or Hindsight in a different setup, happy to compare notes.