Ajain Vivek

Why Don't Databases Understand Documents?

I've spent the last three years building a knowledge intelligence layer for customers at Brainfish, the company I co-founded. Our job was straightforward on paper: help businesses turn their documents into AI-powered answers. Support docs, knowledge bases, product manuals — ingest them, make them searchable, let AI respond to customer questions.

Simple, right?

It nearly broke me.

The Frankenstein stack

Here's what our architecture looked like after three years of iteration:

A vector database for embeddings. A graph database for relationships. A custom RAG pipeline stitching them together. Chunking strategies we'd rewrite every few months. Embedding models we'd swap when accuracy tanked. Re-ranking layers. Hybrid search. Post-processing filters. Guardrails on top of guardrails.

We weren't building a product anymore. We were maintaining a Rube Goldberg machine where every piece existed to compensate for the failures of another piece.

And here's the thing nobody talks about at AI conferences: when search fails, your AI fails. It doesn't fail gracefully. It fails confidently. It hallucinates with authority, citing information that doesn't exist, mashing together fragments from unrelated sections, giving your customers answers that sound perfect and are completely wrong.

We spent more engineering hours debugging retrieval quality than building actual product features.

The realization that wouldn't go away

There wasn't a single eureka moment. It was death by a thousand paper cuts.

Every week, the same pattern. A customer reports that the AI gave a wrong answer. We dig in. The answer existed in their documents — clearly written, well-organized, exactly where you'd expect it. But our search didn't return it.

So we'd debug. Was the chunking strategy wrong? Were the chunks too big, too small, overlapping too much, not enough? Was the embedding model the problem? Should we switch from Embedding Large to a fine-tuned model? Maybe we need re-ranking. Maybe hybrid search. Maybe a knowledge graph on top. Maybe a different vector database entirely.

We'd fix one case and break three others. We'd tune the pipeline for legal documents and watch it degrade on support docs. We'd add a re-ranking layer that improved accuracy by 8% on benchmarks and made zero difference to the customer who triggered the investigation in the first place.

The hardest part wasn't that search failed. It was understanding why it failed. You'd stare at embeddings in a 1536-dimensional space and try to reason about why "termination conditions" wasn't close enough to "Either party may terminate this Agreement upon 90 days written notice." There's no intuition there. It's a black box all the way down.
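To make the mismatch concrete, here's a toy illustration using bag-of-words cosine similarity. This is deliberately not a real embedding model (a real one would score these two strings somewhere above zero), but it shows why surface similarity is such a poor proxy for "this is the answer" — and real embedding distances are even harder to reason about by inspection:

```python
import math
import re

def bow(text):
    # Lowercased bag-of-words vector as a {token: count} dict
    vec = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        vec[token] = vec.get(token, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = "termination conditions"
clause = "Either party may terminate this Agreement upon 90 days written notice"

# "termination" and "terminate" don't match as surface tokens,
# so lexical similarity is zero even though the clause is the answer.
print(cosine(bow(query), bow(clause)))  # → 0.0
```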

Meanwhile, every single time, I could find the answer myself in seconds. Open the document. Scan the headings. Navigate to the right section. Read. Done. The information was right there, perfectly organized by the person who wrote it.

That's when it started to gnaw at me. The AI models were plenty smart — GPT-5, Claude, Gemini, GLM5 can reason circles around most humans. The problem was that we'd destroyed the very structure they needed to reason with. We'd taken documents with clear headings, sections, and subsections, shredded them into 512-token chunks, embedded those chunks into a flat vector space, and then wondered why the AI couldn't find anything.

We weren't giving AI documents. We were giving it confetti.

The question that started ReasonDB

I wrote a question in my notebook that wouldn't leave me alone:

Why don't databases natively solve this? Why do we keep building complex pipelines to compensate for dumb storage?

Think about it. Every AI application today follows the same pattern:

  1. Store documents in a database that doesn't understand them
  2. Build an elaborate pipeline to make up for it (chunking, embedding, re-ranking, hybrid search, guardrails)
  3. Pray that the right chunks surface
  4. Feed whatever you got to the LLM and hope for the best
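The four steps above can be sketched end to end. This is a deliberately minimal stand-in — word-overlap scoring instead of a real embedding model, and a stubbed LLM call — just to make the moving parts (and the information loss in step 1) concrete:

```python
def chunk(text, size=30):
    # Step 1: shred the document into fixed-size word chunks,
    # discarding headings, sections, and hierarchy
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, chunk_text):
    # Step 2 stand-in: word overlap instead of embedding similarity
    q, c = set(query.lower().split()), set(chunk_text.lower().split())
    return len(q & c) / len(q) if q else 0.0

def retrieve(query, chunks, k=3):
    # Step 3: hope the right chunks surface
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def answer(query, chunks):
    # Step 4: feed whatever surfaced to the LLM (stubbed here)
    context = "\n---\n".join(retrieve(query, chunks))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return prompt  # a real pipeline would call an LLM with this prompt

doc = ("Section 4 Termination. Either party may terminate this Agreement "
       "upon 90 days written notice. ") * 3
print(answer("How do we terminate the agreement?", chunk(doc)))
```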

We've built an entire industry around compensating for databases that treat documents as opaque blobs. Vector databases were a step forward — at least they understood similarity. But similarity isn't understanding. Finding chunks that look like your question isn't the same as finding chunks that answer your question.

A contract's termination clause isn't "similar" to your question about exit conditions. But it's the answer.

What if the database could reason?

I started experimenting. Nights and weekends at first, then full-time. The core insight was deceptively simple:

Documents have structure. That structure has meaning. A database for AI should preserve and leverage that structure, not destroy it.

When you read a legal contract, you don't scan every paragraph hoping to stumble on the termination clause. You look at the table of contents, navigate to the relevant section, read the subsection headings, and drill into the specific clause. You reason through the document's hierarchy.

What if a database let AI do the same thing?

That's what became ReasonDB.

How it actually works

When you ingest a document into ReasonDB, it doesn't shred it into flat chunks. It builds a hierarchical tree that preserves the document's natural structure:

```
Master Services Agreement
├── Section 1: Definitions
├── Section 2: Scope of Services
├── Section 3: Payment Terms
│   ├── 3.1 Fees
│   ├── 3.2 Payment Schedule
│   └── 3.3 Late Penalties
├── Section 4: Termination
│   ├── 4.1 Termination for Cause
│   ├── 4.2 Termination for Convenience
│   └── 4.3 Effect of Termination
└── ...
```

Then an LLM generates a summary for every node, bottom-up. Leaves get summarized first. Parents get summaries synthesized from their children. The root gets a summary of the whole document.
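Here's a sketch of that bottom-up pass. The `summarize` function is a stand-in for an LLM call (here it just truncates); everything else — the node shape, the tree itself — is hypothetical illustration, not ReasonDB's actual internals:

```python
class Node:
    def __init__(self, title, content="", children=None):
        self.title = title
        self.content = content
        self.children = children or []
        self.summary = ""

def summarize(text):
    # Stand-in for an LLM summarization call: truncate to a few words
    return " ".join(text.split()[:12])

def build_summaries(node):
    # Bottom-up: leaves first, then parents from their children's summaries
    if not node.children:
        node.summary = summarize(node.content)
    else:
        for child in node.children:
            build_summaries(child)
        combined = "; ".join(f"{c.title}: {c.summary}" for c in node.children)
        node.summary = summarize(combined)
    return node

contract = Node("Master Services Agreement", children=[
    Node("Section 3: Payment Terms", children=[
        Node("3.1 Fees", "Fees are due monthly in arrears."),
        Node("3.3 Late Penalties", "Late payments accrue 1.5% monthly interest."),
    ]),
    Node("Section 4: Termination", children=[
        Node("4.2 Termination for Convenience",
             "Either party may terminate this Agreement upon 90 days written notice."),
    ]),
])
build_summaries(contract)
print(contract.summary)
```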

Now here's where it gets interesting. When you ask a question, the LLM doesn't search — it navigates. I call this Hierarchical Reasoning Retrieval (HRR):

  1. The LLM reads the root summaries and picks the most promising branches ("Section 4: Termination looks relevant, confidence 0.92")
  2. It drops into that branch, reads the children's summaries ("4.2: Termination for Convenience is what we want")
  3. It reaches the leaf node, reads the actual content, and extracts a precise answer
  4. It returns the answer, a confidence score, and the full reasoning path it took

The AI navigates your documents the way a domain expert would. It doesn't hope to find the right chunk. It reasons its way to it.
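The navigation loop itself is simple to sketch. Here the LLM's branch selection is replaced by a keyword-overlap scorer — a real system asks the model to pick and score branches — and the tree is a made-up miniature, but the descend-pick-descend shape is the point:

```python
import re

def score(query, text):
    # Stand-in for the LLM's relevance judgment: fraction of query words present
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0

def navigate(query, node, path=None):
    # Descend the hierarchy, picking the most promising child at each level;
    # return (answer_text, confidence, reasoning_path)
    path = (path or []) + [node["title"]]
    if not node.get("children"):
        return node["content"], score(query, node["content"]), path
    best = max(node["children"],
               key=lambda c: score(query, c["title"] + " " + c.get("summary", "")))
    return navigate(query, best, path)

tree = {
    "title": "Master Services Agreement",
    "children": [
        {"title": "Section 3: Payment Terms", "summary": "fees, schedule, penalties",
         "children": [{"title": "3.3 Late Penalties",
                       "content": "Late payments accrue 1.5% monthly interest."}]},
        {"title": "Section 4: Termination",
         "summary": "termination for cause and convenience",
         "children": [{"title": "4.2 Termination for Convenience",
                       "content": "Either party may terminate this agreement "
                                  "upon 90 days written notice."}]},
    ],
}
answer, confidence, path = navigate("termination for convenience notice period", tree)
print(path, confidence)
```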

For the same termination question that stumped our entire Brainfish pipeline, ReasonDB visits about 8 nodes out of hundreds, makes 4 LLM calls, and returns the complete answer with the exact clause cited — in under two seconds.

But what about scale?

This is the question every database person asks, and the answer is what convinced me this approach has legs.

Trees are natural indexes. At every level, the LLM prunes irrelevant branches. In a knowledge base with a million documents and 50 million nodes, ReasonDB might visit 25-50 nodes total. Each level is an exponential filter.
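Back-of-the-envelope arithmetic, assuming a roughly balanced tree with a branching factor of about 10 (an assumption for illustration, not a measured ReasonDB figure): depth grows logarithmically with corpus size, so the number of nodes the LLM must read stays tiny even at 50 million nodes.

```python
import math

total_nodes = 50_000_000   # nodes across a million-document knowledge base
branching = 10             # assumed average children per node

# Depth of a roughly balanced tree with that branching factor
depth = math.ceil(math.log(total_nodes, branching))

# At each level the LLM reads the current node's children and keeps one branch,
# so visited nodes grow with depth, not with corpus size
visited = depth * branching
print(depth, visited)  # → 8 80
```

Even this naive bound is within shouting distance of the 25-50 nodes quoted above; pruning more than one level per LLM call tightens it further.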

But we're not naive about it. A pure LLM-guided traversal from a million documents would be too slow. So ReasonDB uses a 4-phase pipeline:

  • Phase 1: BM25 keyword search narrows millions of documents to ~100 candidates. Zero LLM calls. Milliseconds.
  • Phase 2: Recursive tree-grep walks each candidate's node hierarchy, matching query terms against titles and summaries. Still zero LLM calls. Microseconds.
  • Phase 3: The LLM reads summaries of the remaining candidates and ranks them. One LLM call.
  • Phase 4: Deep tree traversal on the top results, in parallel. This is where the magic happens.

BM25 for breadth. Structure for precision. LLM for intelligence. Trees for depth. Each phase narrows the funnel.
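A toy version of that funnel, with both LLM phases stubbed out by lexical heuristics (all function names and data shapes here are hypothetical; the real phases 3 and 4 make LLM calls):

```python
def phase1_keyword(query, docs, limit=100):
    # Phase 1: cheap lexical scoring over whole documents (BM25 stand-in)
    q = set(query.lower().split())
    scored = [(len(q & set(d["text"].lower().split())), d) for d in docs]
    return [d for s, d in sorted(scored, key=lambda x: -x[0]) if s > 0][:limit]

def phase2_tree_grep(query, doc):
    # Phase 2: walk the node hierarchy, keeping nodes whose titles match query terms
    q = set(query.lower().split())
    return [n for n in doc["nodes"] if q & set(n["title"].lower().split())]

def phase3_llm_rank(query, candidates, top=3):
    # Phase 3 stand-in: one LLM call would rank candidate summaries;
    # here we just keep the candidates with the most matching nodes
    return sorted(candidates, key=lambda d: -len(phase2_tree_grep(query, d)))[:top]

def phase4_traverse(query, doc):
    # Phase 4 stand-in: deep LLM-guided traversal returns the best-matching node
    hits = phase2_tree_grep(query, doc)
    return hits[0] if hits else None

docs = [
    {"text": "termination clause and notice periods",
     "nodes": [{"title": "Section 4 Termination"},
               {"title": "Section 1 Definitions"}]},
    {"text": "payment schedule and fees",
     "nodes": [{"title": "Section 3 Payment Terms"}]},
]
query = "termination notice"
funnel = phase3_llm_rank(query, phase1_keyword(query, docs))
print(phase4_traverse(query, funnel[0]))
```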

The research backs this up

When I started building ReasonDB, I was working from intuition and pain. Since then, the academic community has started validating the same core idea.

LATTICE (2025) — an LLM-guided hierarchical retrieval framework strikingly similar to HRR — achieved up to 9% improvement in Recall@100 and 5% improvement in nDCG@10 over flat dense retrieval baselines on the BRIGHT benchmark. Zero-shot, no fine-tuning. Comparable to specialized fine-tuned methods.

Semantic Pyramid Indexing showed 5.7x retrieval speedup and 2.5-point QA F1 improvement by indexing at multiple resolutions instead of a single flat layer.

Meanwhile, research on embedding-free retrieval found it outperforms embedding-based methods on long-context QA benchmarks while reducing storage and runtime by over an order of magnitude. There's even a documented phenomenon called "lost-in-the-long-distance" where embedding models degrade for documents that are structurally distant in hierarchies — exactly the failure mode that hierarchical retrieval sidesteps.

The pattern is clear: flat vector search has a ceiling. Structure-aware, LLM-guided retrieval is what breaks through it. ReasonDB bakes this into the database itself so you don't have to build it as a pipeline.

RQL: Because SQL developers shouldn't need a PhD in embeddings

I also built a query language, because I was tired of context-switching between six different APIs to query documents. RQL looks like SQL with two new clauses — SEARCH and REASON:

```sql
SELECT * FROM contracts
WHERE tags CONTAINS ANY ('vendor')
  AND metadata.value_usd > 50000
SEARCH 'payment penalties'
REASON 'What happens if we miss a payment deadline?'
LIMIT 5;
```

One query. Metadata filters, keyword search, and LLM-guided reasoning — composed together. No pipeline. No glue code. No praying.

SEARCH gives you fast BM25 keyword matching (~50ms). REASON triggers the full 4-phase HRR pipeline. Use one or both.

What I believe about the future of databases for AI

After three years of building knowledge infrastructure and watching every team I know struggle with the same problems, I've come to believe a few things:

1. The "dumb database + smart pipeline" era is ending. We've been asking databases to do one thing (store and retrieve bytes) and building increasingly complex pipelines to compensate. That's the wrong abstraction boundary. The database should understand content natively.

2. RAG isn't the answer — it's a workaround. RAG was a brilliant hack: take a dumb database, bolt on embeddings, and feed the results to an LLM. But it's fundamentally limited by the quality of retrieval, and retrieval over flat chunk pools has a ceiling. The next generation of AI data infrastructure needs to reason, not just retrieve.

3. Structure is the missing primitive. Vector databases threw away document structure in favor of semantic similarity. Knowledge graphs tried to impose structure but required manual extraction. The answer is preserving the structure that already exists in documents and letting AI leverage it.

4. Databases should be AI-native, not AI-compatible. Adding a vector column to Postgres doesn't make it an AI database. An AI-native database is designed from the ground up for how AI actually works — reasoning over content, not just pattern-matching over numbers.

ReasonDB today

ReasonDB is written in Rust and ships as a single binary. It's ACID-compliant, with API key auth, rate limiting, and async parallel traversal. It supports Anthropic, OpenAI, Gemini, and Cohere out of the box, and a plugin system handles custom extractors for PDF, Word, Excel, images, audio, and URLs. It runs in Docker with one command.

It is the database I wished existed three years ago.

```shell
docker run --rm -p 4444:4444 \
  -e REASONDB_LLM_PROVIDER=openai \
  -e REASONDB_LLM_API_KEY=sk-... \
  ajainvivek/reasondb:latest serve
```

Ingest a document. Ask a question. Watch the AI actually find the right answer.

Try it

GitHub: reasondb / reasondb

The first database built to let AI agents think their way to the right answer using structural reasoning, rather than guessing based on vector similarity.

If you've ever stared at your RAG pipeline wondering why the right chunk didn't surface, if you've ever explained to a stakeholder that "the AI sometimes gets confused," if you've ever wished the database just understood — I built this for you.

I'd love your feedback. Star the repo, try it out, break it, tell me what's wrong. The best tools are forged by the community that uses them.

Stop searching. Start reasoning.
