
chi lan

Benchmark: I replaced Gemini CLI's Vector RAG with Context Trees to stop the hallucinations (99% Token Reduction)

If you’ve been following the recent debates here about Gemini CLI (Gemini 3/Flash) versus Claude Code, you probably know the trade-off:

  • Gemini CLI: Incredible value (free tier/high limits) but feels chaotic on large repos. It often hallucinates imports or gets lazy (refuses to code) when you load too many files.
  • Claude Code: Smarter reasoning, but you hit the 5-hour Usage Limit extremely fast if you work with large contexts.

I spent the last week testing a theory: the problem isn't that Gemini is dumb, it's that Context Dumping makes it dumb.

Most users (and tools like Cursor's Index) either:

  1. Context Dump: Stuff 50 files into the window (the Gemini CLI default). This causes Context Dilution: the model gets overwhelmed by noise and hallucinates.
  2. Vector RAG: Use embeddings to find similar code. This fails because similarity search tends to return 10 look-alike files such as Auth_v1.ts instead of the one User.ts that you actually need.
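To make the second failure mode concrete, here is a toy sketch. It is not real Vector RAG: a bag-of-words cosine score stands in for embeddings, and the file names, contents, and helper functions are all made up for illustration. It only shows the shape of the problem: ranking by similarity rewards files that repeat the query's words, while the file the controller actually imports scores zero.

```python
import math
import re
from collections import Counter

# Hypothetical toy repo (file name -> contents); not taken from the benchmark.
FILES = {
    "AuthController.ts": "import { User } from './User'\nexport function login(user: User) { return auth(user) }",
    "Auth_v1.ts": "export function login(user) { return auth(user) } // legacy auth login",
    "Auth_v2.ts": "export function login(user) { return auth(user) } // newer auth login",
    "User.ts": "export interface User { id: string; email: string }",
}


def tokens(text):
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_top_k(query, k):
    """Rank files by similarity to the query and return the k best names."""
    q = tokens(query)
    ranked = sorted(FILES, key=lambda name: cosine(q, tokens(FILES[name])), reverse=True)
    return ranked[:k]


def direct_imports(name):
    """Follow relative imports (from './X') instead of guessing by similarity."""
    return [m + ".ts" for m in re.findall(r"from '\./(\w+)'", FILES[name])]


print(similarity_top_k("refactor the auth login controller", 2))
print(direct_imports("AuthController.ts"))
```

In this toy setup the similarity ranking returns the two near-duplicate Auth files, while following the import statement lands on User.ts directly.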

The Benchmark Results
(Tested on a ~1,300-file codebase with the prompt: "Refactor the User Auth Controller")

Why this matters
We often blame the model for being lazy, but if you look at the logs, it's usually because the context window is polluted with 50 irrelevant files.
By moving the retrieval logic client-side (building a dependency tree locally and pruning it before sending to the API), you achieve two things:

  1. Unlimited Workflow: You stop burning 10k tokens per query, so you stay well under the usage limits.
  2. Smarter Code: In my testing, Gemini 3 actually performs better than Claude 3.5 Sonnet when it isn't distracted by noise.
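As a rough illustration of the client-side idea, the sketch below builds an import graph locally, keeps only the files within a few import hops of the entry file, and assembles the prompt from those. This is a minimal sketch under stated assumptions, not the implementation from the full article: the repo contents, the regex-based import parsing, the `max_depth` cutoff, and every function name are hypothetical.

```python
import re
from collections import deque

# Hypothetical repo snapshot (file name -> contents); not the benchmark codebase.
REPO = {
    "AuthController.ts": "import { User } from './User'\nimport { hash } from './crypto'\nexport function login() {}",
    "User.ts": "import { Db } from './Db'\nexport interface User {}",
    "crypto.ts": "export function hash() {}",
    "Db.ts": "import { Config } from './Config'\nexport class Db {}",
    "Config.ts": "export const Config = {}",
    "Auth_v1.ts": "export function legacyLogin() {}",
}

# Assumption: only relative imports of the form from './Name' are tracked.
IMPORT_RE = re.compile(r"from '\./(\w+)'")


def build_graph(repo):
    """Map each file to the repo files it imports directly."""
    return {
        name: [m + ".ts" for m in IMPORT_RE.findall(src) if m + ".ts" in repo]
        for name, src in repo.items()
    }


def prune(graph, entry, max_depth):
    """Breadth-first walk from entry; keep files within max_depth import hops."""
    kept = [entry]
    seen = {entry}
    queue = deque([(entry, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for dep in graph[name]:
            if dep not in seen:
                seen.add(dep)
                kept.append(dep)
                queue.append((dep, depth + 1))
    return kept


def build_prompt(repo, files, task):
    """Concatenate the task and only the kept files into one prompt string."""
    parts = [f"// {name}\n{repo[name]}" for name in files]
    return task + "\n\n" + "\n\n".join(parts)


graph = build_graph(REPO)
kept = prune(graph, "AuthController.ts", max_depth=1)
prompt = build_prompt(REPO, kept, "Refactor the User Auth Controller")
print(kept)
```

The depth limit is the pruning knob here: a smaller value sends fewer files, and unrelated files such as the hypothetical Auth_v1.ts never enter the prompt because nothing reachable from the entry file imports them.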

The Code / Reproduction

If you’re curious how this played out step by step, check out the full article, which includes the detailed results, analysis, and a reproduction guide.
