Nimesh Kulkarni

Posted on May 26

RAG Is Not Always the Answer Anymore: How AI Agents Search Code in 2026

#ai #rag #programming #machinelearning

For the last couple of years, "add RAG" has been the default answer to almost every AI product question.

Need the model to understand docs? Add RAG.
Need it to answer questions over a repo? Add RAG.
Need it to stop hallucinating? Add RAG and pray a little.

RAG is still useful. I am not here to bury it. But for codebases, the default is changing. Modern AI coding agents do not always need a vector database to find the right context. A lot of the time, they need the same things a good developer uses every day: file names, grep, symbols, imports, tests, and exact source reads.

That shift matters because code is not just text. Code has names, paths, references, call graphs, package boundaries, tests, config files, and error strings. Treating it like a pile of semantically similar paragraphs can work, but it can also lose the structure that makes code understandable.

The old RAG reflex

Classic RAG usually looks like this:

Repository
  -> split files into chunks
  -> create embeddings for each chunk
  -> store vectors in a vector database
  -> retrieve similar chunks for a query
  -> send those chunks to the model

That flow is solid for many kinds of unstructured knowledge: support docs, PDFs, internal wiki pages, research notes, policies, transcripts. If the user asks a conceptual question and the answer may be hidden across lots of prose, semantic search helps.
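To make those five steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `embed` is a toy word counter standing in for a real embedding model, the in-memory list stands in for a vector database, and the sample document is invented.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 3) -> list[str]:
    """Split text into chunks of `size` lines (naive fixed-size chunking)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Invented sample "knowledge base" (hypothetical content).
DOC = """Refunds are processed within five business days.
Contact support to request a refund.
Refunds go back to the original payment method.
Shipping takes three to seven days.
Express shipping is available at checkout.
Tracking numbers are emailed after dispatch."""

# "Store vectors": an in-memory list instead of a vector database.
index = [(c, embed(c)) for c in chunk(DOC)]

# "Retrieve similar chunks": these would then be sent to the model.
top = retrieve("how do I get a refund", index, k=1)
print(top[0])
```

For prose like this, the nearest chunk is usually the right one, which is why the pattern spread so quickly.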

But code retrieval has different pressure points.

If I ask an agent, "Why is checkout failing when the coupon is expired?", I need more than something semantically close to "checkout" and "coupon". I may need:

the route that handles checkout
the exact error string from the UI
the coupon validation function
the test that describes expired coupons
the feature flag that changes behavior in production
the migration that added a nullable column

A vector search might find some of that. A good coding agent will usually search more like a developer.

How agents actually search code now

A practical code-search loop often looks closer to this:

User asks about a bug
  -> glob for likely files
  -> grep exact names, strings, routes, errors, flags
  -> read promising files
  -> follow imports and references
  -> inspect tests
  -> run the code or test suite
  -> refine the search

That is not anti-RAG. It is agentic retrieval. The model does not receive one static bundle of chunks at the start. It keeps asking for better evidence as it learns more.

Example:

rg "expired coupon|coupon expired|CouponExpired" .
rg "validateCoupon|applyCoupon|coupon" src tests
rg "checkout" src/routes src/app tests

Then the agent reads the actual files instead of guessing from snippets:

Read src/services/coupons.ts
Read src/routes/checkout.ts
Read tests/checkout/coupon-expiry.test.ts

This is boring. That is the point. Boring retrieval is often better than clever retrieval when the answer depends on exact symbols.
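The loop above (search, read the hits, follow imports, search again) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular agent is implemented: the three-file demo repo, the `investigate` function, and the regex-based import parsing are all hypothetical, and a real agent would let the model choose the next search term.

```python
import re
import tempfile
from pathlib import Path

def grep(root: Path, pattern: str) -> list[tuple[Path, int, str]]:
    """Return (path, line_number, line) for every regex match under root."""
    rx = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*.py")):
        for n, line in enumerate(path.read_text().splitlines(), start=1):
            if rx.search(line):
                hits.append((path, n, line.strip()))
    return hits

def imported_names(source: str) -> set[str]:
    """Names pulled in by `from x import a, b` lines: the next clues to follow."""
    names = set()
    for match in re.finditer(r"^from\s+\S+\s+import\s+(.+)$", source, re.MULTILINE):
        names.update(n.strip() for n in match.group(1).split(","))
    return names

def investigate(root: Path, seed: str, max_searches: int = 5) -> list[str]:
    """Search, read each new hit in full, queue its imports, and repeat."""
    queue, searched, read_order = [seed], set(), []
    while queue and len(searched) < max_searches:
        term = queue.pop(0)
        if term in searched:
            continue
        searched.add(term)
        for path, _, _ in grep(root, re.escape(term)):
            if path.name in read_order:
                continue
            read_order.append(path.name)  # "read" the whole file
            queue.extend(sorted(imported_names(path.read_text())))
    return read_order

# Hypothetical demo repo, written to a temp directory.
DEMO_REPO = {
    "checkout.py": (
        "from coupons import validate_coupon\n"
        "\n"
        "def checkout(cart, code):\n"
        "    if not validate_coupon(code):\n"
        "        raise ValueError('Coupon expired')\n"
    ),
    "coupons.py": (
        "from flags import strict_expiry\n"
        "\n"
        "def validate_coupon(code):\n"
        "    return not strict_expiry()\n"
    ),
    "flags.py": "def strict_expiry():\n    return True\n",
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, text in DEMO_REPO.items():
        (root / name).write_text(text)
    trail = investigate(root, "Coupon expired")

print(trail)  # ['checkout.py', 'coupons.py', 'flags.py']
```

Starting from nothing but the error string, the trail ends at the feature flag two imports away. No index was built, and every step is an exact match you can check by hand.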

Why grep can beat embeddings for code

Embeddings are great when words are fuzzy. Code often is not fuzzy.

If the bug mentions STRIPE_WEBHOOK_SECRET, the agent should search for that exact string. If the stack trace says calculateFinalPrice, the agent should jump to the function. If the failing test is should_reject_expired_coupon, the agent should read that test.

Semantic similarity can miss these because it is trying to answer a softer question: "What chunk is conceptually close to this query?"

Code search often asks a harder, more literal question: "Where is this symbol defined, used, mutated, mocked, or tested?"

That is why tools like grep, glob, file reads, and language-server navigation are so useful. They preserve evidence. They give paths and line numbers. They let the agent verify what it found.
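Here is a small sketch of what "defined, used, or tested" looks like as evidence. It uses Python's standard `ast` module as a lightweight stand-in for language-server navigation. The `locate` helper, the `pricing.py` filename, and the sample source are assumptions made up for the example.

```python
import ast

# Hypothetical source file contents.
SOURCE = '''
def calculate_final_price(subtotal, discount):
    return subtotal - discount

def checkout(cart):
    return calculate_final_price(cart["subtotal"], cart["discount"])

def test_final_price():
    assert calculate_final_price(100, 20) == 80
'''

def locate(symbol: str, source: str, filename: str = "pricing.py") -> dict[str, list[str]]:
    """Return 'file:line' evidence for where a function is defined and called."""
    evidence: dict[str, list[str]] = {"defined": [], "called": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == symbol:
            evidence["defined"].append(f"{filename}:{node.lineno}")
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == symbol
        ):
            evidence["called"].append(f"{filename}:{node.lineno}")
    # ast.walk does not guarantee source order, so sort by line number.
    evidence["called"].sort(key=lambda ref: int(ref.rsplit(":", 1)[1]))
    return evidence

found = locate("calculate_final_price", SOURCE)
print(found)
# {'defined': ['pricing.py:2'], 'called': ['pricing.py:6', 'pricing.py:9']}
```

Each result is a path and a line number, so the agent (or you) can open the file and verify it. A similarity score cannot be verified that way.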

The chunking problem

Chunking is one of the weirdest parts of RAG for code.

A function may start in one chunk and end in another. A class may depend on imports that got chopped off. A route handler may look harmless until you read the middleware above it. A test may only make sense with the fixture defined 80 lines earlier.

When chunks break structure, retrieval can return technically relevant but practically incomplete context.

This is why repository-level code retrieval research is moving toward more structure-aware methods. Some approaches combine lexical search with post-processing. Others use dependency-aware retrieval or repository graphs. The common theme is simple: code needs structure, not just similarity.
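A quick way to see the problem is to chunk the same file two ways. The sketch below compares naive fixed-size chunks with one chunk per top-level definition, using Python's `ast` module. It is only an illustration: the sample module and both helper functions are invented here, and it is not the method of any of the papers mentioned above.

```python
import ast

# Hypothetical module to be chunked.
SOURCE = '''import math

def area(radius):
    """Area of a circle."""
    squared = radius ** 2
    return math.pi * squared

def perimeter(radius):
    return 2 * math.pi * radius
'''

def fixed_chunks(source: str, size: int = 4) -> list[str]:
    """Naive chunking: every `size` lines, regardless of structure."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

def definition_chunks(source: str) -> list[str]:
    """Structure-aware chunking: one chunk per top-level function or class."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno and end_lineno are 1-indexed and inclusive (Python 3.8+).
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks

naive = fixed_chunks(SOURCE)
structured = definition_chunks(SOURCE)

print(naive[0])       # starts `area` but stops before its return statement
print("---")
print(structured[0])  # the whole `area` function
```

The first naive chunk contains the `def area` line but not its `return`, which lands in the next chunk next to the start of `perimeter`. The structure-aware version keeps each function whole, although it still drops `import math`. That is the same "imports got chopped off" problem, and it is why chunking alone is not enough.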

Long context changed the tradeoff too

Another reason the RAG reflex is weakening: context windows got bigger.

If the useful part of a repo fits in context, the best retrieval system might be no retrieval system. Just read the files. If the agent can inspect the relevant source directly, a vector database may add more moving parts than value.

This does not mean "throw the whole repo into the prompt." That is lazy and expensive. But it does mean the agent can use a different strategy:

Search narrowly -> read complete files -> keep only what matters -> continue

That is closer to how developers work. We do not embed the repo in our brain before debugging. We search, open files, follow clues, and build a mental model as we go.
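As a rough sketch, that strategy is a budgeted read loop: search for a term, rank the matching files, and read whole files until the context budget runs out. The names below (`Context`, `gather`, `estimate_tokens`) are hypothetical, and the four-characters-per-token estimate is a crude assumption, not a real tokenizer.

```python
from dataclasses import dataclass, field

def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly four characters per token."""
    return max(1, len(text) // 4)

@dataclass
class Context:
    budget_tokens: int
    files: dict[str, str] = field(default_factory=dict)
    used: int = 0

    def add_file(self, path: str, text: str) -> bool:
        """Add a complete file if it fits in the remaining budget."""
        cost = estimate_tokens(text)
        if self.used + cost > self.budget_tokens:
            return False
        self.files[path] = text
        self.used += cost
        return True

def gather(repo: dict[str, str], term: str, budget_tokens: int) -> Context:
    """Search narrowly, then read whole matching files while the budget allows."""
    ctx = Context(budget_tokens)
    matching = [path for path in repo if term in repo[path]]
    # Files with the most hits are read first.
    for path in sorted(matching, key=lambda p: -repo[p].count(term)):
        ctx.add_file(path, repo[path])
    return ctx

# Hypothetical repo held in memory as {path: contents}.
REPO = {
    "coupons.py": "def validate_coupon(code):\n    return code != 'EXPIRED'\n",
    "checkout.py": (
        "from coupons import validate_coupon\n"
        "\n"
        "def checkout(code):\n"
        "    return validate_coupon(code)\n"
    ),
    "README.md": "Project overview. " * 50,
}

ctx = gather(REPO, "validate_coupon", budget_tokens=100)
print(list(ctx.files))  # ['checkout.py', 'coupons.py']
```

The README never enters the context because it never matched the search. With a tighter budget, only the file with the most hits is read, which is the "keep only what matters" step.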

When RAG still makes sense

RAG is not dead. It is just not the automatic first move for every code problem.

Use vector RAG when:

the corpus is mostly unstructured prose
users ask conceptual questions with many possible phrasings
you need semantic recall across docs, tickets, comments, or design notes
the source content changes slowly enough that indexing is manageable
exact names are unknown or unreliable

For code agents, RAG can still help with:

architecture decision records
long product specs
old issue discussions
design docs
support tickets connected to bugs
natural-language explanations around the code

The better pattern is usually hybrid. Let lexical search and symbol navigation handle source code. Let semantic retrieval handle messy human text. Let the agent decide which tool fits the question.

A better mental model

Instead of asking, "Should I use RAG?", ask this:

What kind of evidence does the agent need?

If the answer is exact evidence, use exact tools:

file paths
symbols
imports
tests
error strings
config keys
logs

If the answer is semantic evidence, use semantic tools:

docs
notes
tickets
research
policies
discussion threads

If the answer needs both, combine them.
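One way to picture that decision is a tiny router that looks at the question and picks tools. In a real agent the model makes this call itself. The regex heuristics, the word-count threshold, and the tool names `grep` and `semantic_search` below are all assumptions for the sake of a runnable sketch.

```python
import re

# Heuristic signals that a query names something exact.
EXACT_PATTERNS = [
    r"\b[a-z]+(?:[A-Z][a-z0-9]+)+\b",      # camelCase, e.g. calculateFinalPrice
    r"\b[a-z0-9]+(?:_[a-z0-9]+)+\b",       # snake_case, e.g. should_reject_expired_coupon
    r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b",  # constants, e.g. STRIPE_WEBHOOK_SECRET
    r"[\w./-]+\.[a-z]{1,4}\b",             # file paths, e.g. src/routes/checkout.ts
    r"\"[^\"]+\"",                         # quoted error strings
]

def exact_terms(query: str) -> list[str]:
    """Collect identifiers, paths, and quoted strings found in the query."""
    found: list[str] = []
    for pattern in EXACT_PATTERNS:
        for match in re.findall(pattern, query):
            if match not in found:
                found.append(match)
    return found

def choose_tools(query: str) -> list[str]:
    """Pick exact tools, semantic tools, or both, based on the query text."""
    terms = exact_terms(query)
    tools = ["grep"] if terms else []
    # Whatever is left after removing exact terms is natural language.
    leftover = query
    for term in terms:
        leftover = leftover.replace(term, " ")
    if len(re.findall(r"[A-Za-z]{3,}", leftover)) >= 4:
        tools.append("semantic_search")
    return tools or ["semantic_search"]

print(choose_tools("STRIPE_WEBHOOK_SECRET"))
print(choose_tools("why do customers get confused by our refund policy"))
print(choose_tools("why does calculateFinalPrice ignore the discount described in the pricing policy"))
```

The first query routes to exact search only, the second to semantic search only, and the third to both.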

A production-ready code agent should not be a chatbot with a vector database attached. It should be closer to a junior developer with a terminal, editor, search tools, test runner, and enough judgment to know when it has weak evidence.

What this means for developers

If you are building AI coding tools in 2026, do not start by wiring up embeddings. Start with the boring tools:

glob to find likely files
grep or ripgrep for exact search
file reads with line ranges
language-server symbol lookup
test discovery and execution
git history when behavior changed over time
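Two of these tools fit in a dozen lines each. The sketch below shows a glob tool and a file read with line ranges, using only Python's standard library. The function names, the return formats, and the demo file layout are assumptions, not the interface of any specific agent framework.

```python
import tempfile
from pathlib import Path

def find_files(root: Path, pattern: str) -> list[str]:
    """Glob tool: return matching paths relative to root, sorted."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())

def read_lines(path: Path, start: int, end: int) -> str:
    """Read tool: lines start..end (1-indexed, inclusive), each prefixed with its number."""
    lines = path.read_text().splitlines()
    start = max(1, start)
    end = min(len(lines), end)
    return "\n".join(f"{n}: {lines[n - 1]}" for n in range(start, end + 1))

# Hypothetical repo layout, created in a temp directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "routes").mkdir(parents=True)
    (root / "tests").mkdir()
    (root / "src" / "routes" / "checkout.ts").write_text(
        "line one\nline two\nline three\nline four\n"
    )
    (root / "tests" / "checkout.test.ts").write_text("// test placeholder\n")

    matches = find_files(root, "**/checkout*")
    excerpt = read_lines(root / "src" / "routes" / "checkout.ts", 2, 3)

print(matches)  # ['src/routes/checkout.ts', 'tests/checkout.test.ts']
print(excerpt)
```

Returning line numbers with every read is a deliberate choice. It lets the agent cite exactly what it saw and ask for the next range without re-reading the whole file.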

Then add semantic search where it earns its keep.

That last part is important. RAG is infrastructure. Every index needs chunking, syncing, invalidation, permissions, ranking, evaluation, and debugging. If grep plus file reads solve the problem, that is not primitive. That is good engineering.

Final thought

RAG used to feel like the magic layer that made LLMs useful over private data. For many use cases, it still is.

But codebases are not just private data. They are executable systems with structure. The best AI agents are starting to treat them that way.

So no, RAG is not always the answer anymore.

Sometimes the answer is:

rg "the thing that broke"

And honestly, that feels very developer-coded.

References

Anthropic, "Building agents with the Claude Agent SDK" — https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk
Anthropic, "How we built our multi-agent research system" — https://www.anthropic.com/engineering/multi-agent-research-system
LlamaIndex documentation, "Welcome to LlamaIndex" — https://developers.llamaindex.ai/python/framework/
LlamaIndex documentation, "Multi-agent patterns in LlamaIndex" — https://developers.llamaindex.ai/python/framework/understanding/agent/multi_agent/
arXiv, "GrepRAG" paper on lightweight lexical retrieval for repository-level code completion — https://arxiv.org/pdf/2601.23254
arXiv, "Do Not Treat Code as Natural Language" / Hydra repository-level code generation — https://arxiv.org/pdf/2602.11671
arXiv, "LARGER: Lexically Anchored Repository Graph Exploration and Retrieval" — https://arxiv.org/html/2605.16352

Top comments (4)

Harjot Singh • May 29

grep + symbols beating RAG for codesearch was the right call - embeddings hide more bugs than they catch in well-structured code. for greenfield saas gen the same logic: structured code-graph > vector recall. been baking that into moonshift's gen pipeline. $3 per shipped saas. happy to compare retrieval choices if u r working on the same problem space.

Nimesh Kulkarni • May 30

Yep bro, you are right!
Yep, I'm on it.

RAGPrep • Jun 1

Good piece, and the framing is sharper than most "RAG is dead" takes that have been circulating. The distinction between RAG-as-default vs RAG-as-one-tool-among-several is the right one, and code is the cleanest case where alternatives often outperform.
A few things worth tightening from the field:
RAG fails on code for specific structural reasons, not because RAG is broken. Code has dependency relationships, scope, type information, and execution semantics that vector similarity simply doesn't capture. A function that calls another function is semantically related in a way that embeddings model poorly. AST-aware search, symbol resolution, and graph traversal capture these relationships natively. Of course they outperform vector retrieval — they're the right tool for that data structure. This isn't an indictment of RAG, it's an indictment of using RAG on data that has stronger structure than embeddings can represent.
The "agentic search" framing is doing a lot of work in your piece. Agentic search is still using retrieval underneath — it's just deciding what to retrieve dynamically rather than retrieving once upfront. For codebases, that often means hybrid: AST search for symbol-aware queries, embedding search for natural-language code intent ("find the function that handles user authentication"), graph traversal for dependency questions. The agent picks the tool. RAG isn't being replaced, it's becoming one tool the agent calls when embedding similarity is the right primitive.
For non-code domains the argument inverts. Policy documents, legal text, support documentation, knowledge bases — these don't have the structural relationships that code has. Vector retrieval is still the right tool because the data has no stronger structure to exploit. Agentic search on a 50,000-document policy corpus without RAG underneath would be extraordinarily expensive and wouldn't be more accurate.
The strongest version of the argument in your piece, which I'd want to see expanded: RAG was oversold as the universal solution. It is the right primitive for unstructured text where similarity is the strongest signal you have. It is the wrong primitive for code, where syntactic and semantic relationships are explicit and traversable. Choose the tool to match the data structure.

Nimesh Kulkarni • Jun 2

👍