Search with no AI in the answer, and why I chose plain chunks over tree-RAG

#ai #machinelearning #webdev #wordpress

> I develop a privacy-first RAG chatbot for WordPress. This combines two writeups from pagecoder.ai - on search and on chunking - and continues my earlier post on what AI chat plugins leak per question.

Last cycle I added two things to my WordPress RAG plugin that people kept asking for: visitor-facing search, and indexing big PDFs. Each one came down to a retrieval decision worth writing down.

1. Search: retrieve, but don't generate

My chatbot answers in two stages - retrieve the most relevant content, then hand it to a model that writes a reply. Search is just the first stage, stopped before the second.

When you type into the search box, I run the same retrieval the chatbot uses and then stop. No model is asked to compose an answer. You get the raw matches back: ranked results with your terms highlighted.

Stopping early is the point. It buys three things:

No hallucination. Nothing is generated, so nothing can be invented. Every result links to a real page.
You land on the source. A ranked link, not a paraphrase that may or may not match the page.
It's lighter. The slow, expensive part of a chat reply is the model writing a few hundred words. Skip it and the same lookup gets cheaper and faster.

One honest caveat, because the whole product is about not overclaiming: search is not zero AI. To match on meaning and not just keywords, the query is still turned into an embedding (by the same backend the chat uses, and the embedding is discarded afterwards). What it does not do is the thing people actually worry about - no model writes text about your content.
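To make "retrieve, then stop" concrete, here is a minimal Python sketch. It is not the plugin's code. The `embed` function is a toy bag-of-words stand-in that only matches on shared words, where a real embedding model would match on meaning. The page data and every function name are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def highlight(text: str, query: str) -> str:
    """Wrap each query term found in the text in ** markers."""
    terms = sorted(set(re.findall(r"[a-z0-9]+", query.lower())), key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(r"**\1**", text)

def search(query: str, pages: list[dict], top_k: int = 3) -> list[dict]:
    """Retrieve and rank, then stop: no model is asked to write anything."""
    query_vec = embed(query)  # used for matching only, never stored
    scored = [(cosine(query_vec, embed(p["text"])), p) for p in pages]
    scored = [(s, p) for s, p in scored if s > 0]  # drop pages with no match at all
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [
        {"url": p["url"], "score": round(s, 3), "snippet": highlight(p["text"], query)}
        for s, p in scored[:top_k]
    ]

# Hypothetical pages, for illustration only.
PAGES = [
    {"url": "/shipping", "text": "Shipping takes three days within the EU."},
    {"url": "/returns", "text": "Returns are accepted within thirty days."},
    {"url": "/about", "text": "We are a small team based in Lisbon."},
]

results = search("shipping days", PAGES)
```

Every item in `results` is a link to an existing page plus a highlighted snippet of that page's own text. There is no step where new text could be written.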

2. Chunking: the boring method beat the clever one

To index a big PDF you first have to cut it into pieces. There's a boring way and a clever way.

Boring: fixed-size chunks. Walk the document, cut it into roughly equal pieces of a few paragraphs each, with a little overlap so a sentence on a boundary isn't lost. No idea what a heading is. Just consistent slices.
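A minimal Python sketch of the boring method. It counts words instead of paragraphs to stay short, and the `size` and `overlap` defaults are placeholders, not the values the plugin uses.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Cut text into pieces of up to `size` words; each piece repeats the
    last `overlap` words of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the document
    return chunks

sample = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"
pieces = chunk_words(sample, size=4, overlap=1)
```

With ten words, a size of 4 and an overlap of 1, `pieces` comes out as `"w0 w1 w2 w3"`, `"w3 w4 w5 w6"`, `"w6 w7 w8 w9"`. The repeated word at each boundary is the overlap doing its job.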

Clever: tree-RAG (e.g. PageIndex). Build a tree from the table of contents, with sections as nodes. At query time you walk down to the most relevant branch and pull that whole section. On paper it's obviously better for long structured documents.
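A rough Python sketch of the tree idea, to show its shape. This is not PageIndex's API. The `Section` class, the word-overlap score, and the sample manual are all assumptions made for illustration; a real tree-RAG would use a much smarter relevance step when choosing a branch.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Section:
    title: str
    text: str = ""
    children: list["Section"] = field(default_factory=list)

    def full_text(self) -> str:
        """The section's own text plus everything beneath it."""
        parts = [self.title, self.text] + [c.full_text() for c in self.children]
        return "\n".join(p for p in parts if p)

def words(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def overlap_score(query: str, section: Section) -> int:
    """Stand-in relevance score: words shared by the query and the section."""
    return len(words(query) & words(section.full_text()))

def pick_branch(query: str, root: Section) -> Section:
    """Walk down from the root, following the best-scoring child at each level."""
    node = root
    while node.children:
        best = max(node.children, key=lambda c: overlap_score(query, c))
        if overlap_score(query, best) == 0:
            break  # nothing below matches, so stay at the current node
        node = best
    return node

# Hypothetical document structure, as if read from a table of contents.
manual = Section("Manual", children=[
    Section("Installation", "Unzip the plugin and activate it.", [
        Section("Requirements", "PHP 8 and a MySQL database."),
    ]),
    Section("Billing", "Invoices are sent monthly."),
])

branch = pick_branch("which database is required", manual)
context = branch.full_text()  # the whole section goes into the prompt
```

Note what `context` is: the entire selected section, however long it happens to be. In this toy tree that's two lines. In a real PDF it can be pages.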

I wanted the clever one - it's the more impressive thing to say you built. So I tested it properly instead of guessing: a graded eval (every answer scored correct / partial / wrong), run on ordinary pages and on the long, table-of-contents-heavy PDFs that are the tree's home turf.

It didn't win. The results below are directional, not precise figures - I'd rather you run your own eval than trust mine:

What I measured	Fixed-size chunks	Tree-RAG
Accuracy on ordinary pages	Higher	Lower
Accuracy on long structured PDFs	More reliable	Mixed
Cost per question	Cheaper	Noticeably more
Tokens fed to the model	Lean	Much heavier
The biggest PDF I threw at it	Indexed fine	Failed to build its tree at all
Where it actually won	Most question types	Broad "summarize the whole thing" questions

Why the boring method won: tokens and cost

It comes down to how much you hand the model. When the tree pulls "the most relevant section," that section can be pages long and all of it goes into the prompt. Fixed-size chunks hand over a few tight pieces and nothing else. That shows up on the bill (you pay for what the model reads) and in quality (burying the answer in surrounding text gives the model room to wander). Lean retrieval is often more accurate, not just cheaper.
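The arithmetic behind "you pay for what the model reads" fits in a few lines of Python. Every number here is hypothetical and chosen only to show the calculation. None of them are measurements from my eval, and the price is not any provider's real rate.

```python
def prompt_cost(context_tokens: int, questions: int, price_per_million_tokens: float) -> float:
    """Input-side cost of sending `context_tokens` of retrieved text
    with each of `questions` prompts."""
    return context_tokens * questions * price_per_million_tokens / 1_000_000

# Hypothetical figures, for illustration only.
PRICE = 1.00                 # currency units per million input tokens
QUESTIONS = 1_000
chunk_context = 4 * 300      # four chunks of about 300 tokens each
section_context = 6_000      # one multi-page section pulled whole

chunk_cost = prompt_cost(chunk_context, QUESTIONS, PRICE)
section_cost = prompt_cost(section_context, QUESTIONS, PRICE)
ratio = section_cost / chunk_cost  # how many times more the whole section costs
```

The cost scales linearly with the context you send, so a section five times longer than your chunks costs five times as much on the input side for every question. Plug in your own token counts and your provider's price to see what it means for you.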

The honest caveat

The tree wasn't useless. For "what is this entire document about?" it was genuinely better - it can feed the model a whole structured section at once. If your use case is almost entirely whole-document summarization, it might be worth the cost. For the specific "where does it say X?" questions real visitors ask, the boring slices won. So I shipped chunks and parked the tree.

What this means for PDFs (and your data)

I index big PDFs with the method that proved itself. Text is extracted, sliced, and searched - and on the privacy side it works like the rest of the product: extraction happens in memory, the file itself is never stored on my side, and the searchable pieces live in your own database.
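In outline, that flow looks something like the Python sketch below. It is an illustration under assumptions, not the plugin's code: `extract_text` is a hypothetical callable standing in for whatever PDF library you use, SQLite stands in for your site's database, and the table layout is made up.

```python
import sqlite3
from typing import Callable

def index_pdf(pdf_bytes: bytes, extract_text: Callable[[bytes], str],
              db: sqlite3.Connection, doc_id: str, chunk_chars: int = 500) -> int:
    """Extract text from in-memory bytes, slice it, and store only the slices."""
    text = extract_text(pdf_bytes)  # works on bytes in memory; no file is written
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    db.execute("CREATE TABLE IF NOT EXISTS chunks (doc_id TEXT, position INTEGER, body TEXT)")
    db.executemany(
        "INSERT INTO chunks (doc_id, position, body) VALUES (?, ?, ?)",
        [(doc_id, n, chunk) for n, chunk in enumerate(chunks)],
    )
    db.commit()
    return len(chunks)  # only the text slices were persisted, not the PDF

def fake_extractor(data: bytes) -> str:
    """Placeholder extractor so the sketch runs without a PDF library."""
    return data.decode("utf-8")

site_db = sqlite3.connect(":memory:")  # stand-in for the site owner's database
stored = index_pdf(b"A" * 1200, fake_extractor, site_db, doc_id="handbook.pdf")
```

The point of the shape: the function receives bytes, returns a count, and the only thing that outlives the call is rows of text in a database the site owner controls.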

Take it as a nudge, not gospel

The reason I tested instead of reading opinions is that this stuff is easy to assert and easy to check. If you're choosing a retrieval or chunking strategy, build a small graded eval on your documents before adopting the fancy thing. It's the most useful afternoon you'll spend on RAG.
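If you want a starting point, a graded eval can be as small as the Python sketch below. It is an assumption-laden outline, not my actual harness: the half credit for "partial" is my own choice for this example, and the toy grader just compares strings where in practice you'd grade by hand or with a rubric. The questions and answers are invented.

```python
from collections import Counter
from typing import Callable

GRADES = ("correct", "partial", "wrong")

def run_eval(questions: list[str], answer: Callable[[str], str],
             grade: Callable[[str, str], str]) -> Counter:
    """Ask every question, grade each reply, and tally the grades."""
    tally = Counter({g: 0 for g in GRADES})
    for q in questions:
        g = grade(q, answer(q))
        if g not in GRADES:
            raise ValueError(f"unknown grade: {g!r}")
        tally[g] += 1
    return tally

def score(tally: Counter) -> float:
    """Full credit for correct, half for partial, none for wrong."""
    total = sum(tally.values())
    return (tally["correct"] + 0.5 * tally["partial"]) / total if total else 0.0

# Hypothetical expected answers and canned replies, for illustration only.
EXPECTED = {"refund window?": "30 days", "support email?": "help@example.com", "opening hours?": "9 to 5"}
CANNED = {"refund window?": "Refunds within 30 days.", "support email?": "Email us.", "opening hours?": "We open 9 to 12."}

def toy_answer(question: str) -> str:
    return CANNED[question]  # swap in a call to the pipeline under test

def toy_grade(question: str, reply: str) -> str:
    expected = EXPECTED[question]
    if expected in reply:
        return "correct"
    if any(word in reply.split() for word in expected.split()):
        return "partial"
    return "wrong"

tally = run_eval(list(EXPECTED), toy_answer, toy_grade)
overall = score(tally)
```

Run the same question set through each retrieval strategy by swapping `toy_answer`, then compare the tallies side by side. Twenty or thirty real visitor questions are enough to see a pattern.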

I build RAG Chat - a privacy-first AI chatbot + search for WordPress. 7-day free trial, no card. Need something custom built? Tell me what you need. No tracking pixels were used in this post.