Last week, an agent was asked a very normal question in a very not-normal codebase:
“Add audit logging to the user deletion flow.”
It found a deleteUser() function.
It found an AuditService.
It made the change.
It passed local checks.
And it was still wrong.
Why? Because in this repo, user deletion actually happened through a saga, the audit event was emitted from a worker, and the “obvious” function it edited was only used in tests. The agent didn’t fail because it was dumb. It failed because it had a flat view of a graph-shaped system.
That’s the real reason MCP agents hallucinate in complex codebases: they retrieve files, not relationships.
The problem isn’t just context windows
A lot of people frame this as a token problem:
- repo too big
- too many files
- not enough context
- model guesses
That’s true, but incomplete.
In large systems, the hard part isn’t finding a file. It’s understanding:
- which service actually owns the behavior
- which code path is production vs dead code
- what calls what
- what data shape flows where
- which permissions or policies gate execution
- which tool or MCP server should even be used
A vector search can find “similar text.”
It does not reliably tell an agent:
“this method is a wrapper around a deprecated internal API, and the real side effect happens three hops later in a queue consumer.”
That’s where a knowledge graph helps.
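To make that concrete, here is a minimal sketch of the idea in plain JavaScript. The node names and edge types are hypothetical, but the point is the traversal: a breadth-first walk over typed edges finds the real side effect several hops away, where a text match would stop at the entry point.

```javascript
// Hypothetical graph: an adjacency list mapping each node to typed edges.
const edges = {
  "api/deleteUser": [{ to: "queue/userEvents", type: "emits" }],
  "queue/userEvents": [{ to: "worker/deletionConsumer", type: "consumes" }],
  "worker/deletionConsumer": [{ to: "db/hardDelete", type: "calls" }],
};

// Breadth-first walk: collect everything reachable downstream of a node,
// including side effects several hops away.
function downstream(start) {
  const seen = new Set();
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const edge of edges[node] || []) {
      if (!seen.has(edge.to)) {
        seen.add(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return [...seen];
}

console.log(downstream("api/deleteUser"));
// The real side effect (db/hardDelete) is three hops from the entry point.
```

A keyword search over these files would surface api/deleteUser and stop; the traversal surfaces the queue consumer where the deletion actually lands.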
What the graph gives the agent
Think of a knowledge graph as a map of the codebase and tooling:
- files
- functions
- classes
- APIs
- services
- schemas
- owners
- MCP tools
- auth policies
- runtime dependencies
And, more importantly, the edges between them:
- calls
- imports
- owns
- emits
- consumes
- requires_role
- served_by
- deprecated_in_favor_of
So instead of asking:
“What file mentions user deletion?”
the agent can ask:
“What is the production execution path for user deletion, and what policy + audit components are attached to it?”
That’s a much better question.
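Here is a sketch of what answering that better question can look like, assuming hypothetical node names and a deprecated flag on edges: a depth-first search that refuses deprecated edges returns the production path, not just any mention.

```javascript
// Hypothetical graph: adjacency list with a deprecated flag on legacy edges.
const depGraph = {
  "api/deleteUser": [
    { to: "legacy/deleteUserSync", type: "calls", deprecated: true },
    { to: "saga/userDeletion", type: "emits" },
  ],
  "saga/userDeletion": [{ to: "audit/logUserDeletion", type: "calls" }],
  "legacy/deleteUserSync": [],
  "audit/logUserDeletion": [],
};

// Depth-first search for a path that avoids deprecated edges and cycles.
function productionPath(from, to, path = [from]) {
  if (from === to) return path;
  for (const edge of depGraph[from] || []) {
    if (edge.deprecated) continue; // skip non-production routes
    if (path.includes(edge.to)) continue; // avoid cycles
    const found = productionPath(edge.to, to, [...path, edge.to]);
    if (found) return found;
  }
  return null;
}

console.log(productionPath("api/deleteUser", "audit/logUserDeletion"));
// Returns the path through the saga; the deprecated sync route is never taken.
```

The deprecated edge to legacy/deleteUserSync is exactly the kind of trap a text match falls into and a typed traversal walks past.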
The shape of the fix
Here’s the mental model:
User request
|
v
LLM agent
|
+--> vector search: "find relevant files"
|
+--> knowledge graph: "find real relationships"
|
v
Grounded plan
|
v
MCP tools / code changes
Vector search is still useful. Keep it.
But in a complex repo, vector search should retrieve candidates, and the graph should validate the path.
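That division of labor can be sketched in a few lines. The candidate list and reachability set below are hypothetical stand-ins for real vector-search output and a real graph query:

```javascript
// Hypothetical: files a vector search returned for "user deletion".
const candidates = ["api/deleteUser", "tests/deleteUser.test", "docs/deletion"];

// Hypothetical: nodes reachable from the production entry point,
// as computed by a graph traversal.
const reachableFromProd = new Set([
  "api/deleteUser",
  "saga/userDeletion",
  "audit/logUserDeletion",
]);

// Vector search proposes; the graph disposes. Keep only candidates
// that sit on a real execution path.
const grounded = candidates.filter((c) => reachableFromProd.has(c));
console.log(grounded); // the test file and the docs page are rejected
```

The test-only file that fooled the agent in the opening story would be filtered out here, because nothing in production reaches it.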
A tiny runnable example
If you want to see the pattern in code, here’s a minimal graph query example in Node using graphology:
npm install graphology

const Graph = require("graphology");

// Build a small directed graph of the deletion flow.
const graph = new Graph();
graph.addNode("api/deleteUser");
graph.addNode("worker/userDeletionSaga");
graph.addNode("audit/logUserDeletion");

// Typed edges capture relationships, not just textual co-occurrence.
graph.addEdge("api/deleteUser", "worker/userDeletionSaga", { type: "emits" });
graph.addEdge("worker/userDeletionSaga", "audit/logUserDeletion", { type: "calls" });

// Walk outbound neighbors to see what the API entry point actually triggers.
console.log("Downstream from api/deleteUser:");
graph.forEachOutboundNeighbor("api/deleteUser", (node) => console.log("-", node));
That example is tiny, but the idea scales: once your agent can traverse relationships instead of matching text, it stops “fixing” the wrong place.
Where this matters most with MCP
MCP makes agents more useful because they can actually do things: read code, call internal tools, inspect docs, hit APIs.
It also makes mistakes more expensive.
If an agent hallucinates while choosing among 50+ tools, or picks the right tool with the wrong assumptions about the code path, you get confident nonsense with side effects.
In practice, the worst failures I’ve seen look like this:
- agent retrieves a plausible file
- agent infers architecture from naming
- agent calls the wrong MCP tool or edits the wrong layer
- output looks clean, but behavior is wrong
A knowledge graph reduces that by giving the agent a way to verify:
- “Is this path actually reachable?”
- “What service owns this?”
- “What tool is allowed to act here?”
- “What approval or policy is required before execution?”
If you already have OPA or another policy engine in place, great. Use it. The graph doesn’t replace policy; it gives the agent better grounding before policy enforcement kicks in.
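As a sketch of that grounding step, here is a check an agent could run before it even plans a tool call. The tool names and role mapping are hypothetical; the point is that requires_role edges from the graph let the agent rule out calls early, before the policy engine ever sees them.

```javascript
// Hypothetical requires_role edges: the role each tool invocation needs.
const requiresRole = {
  "tool/deleteUserData": "admin",
  "tool/readAuditLog": "auditor",
};

// Grounding check: refuse to plan a call the graph says the current
// principal cannot make. This complements policy enforcement, it does
// not replace it.
function canPlan(tool, roles) {
  const needed = requiresRole[tool];
  return needed === undefined || roles.includes(needed);
}

console.log(canPlan("tool/deleteUserData", ["support"])); // false
console.log(canPlan("tool/deleteUserData", ["admin"])); // true
```

An agent that consults this before acting wastes fewer steps and, more importantly, never confidently plans an action it was never allowed to take.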
Practical advice if you want to implement this
You do not need a giant “AI graph platform” project to get value.
Start small:
- build nodes for files, functions, services, MCP tools
- add edges for imports, calls, ownership, and auth requirements
- mark deprecated paths explicitly
- let retrieval fetch top candidate files
- let graph traversal rank or reject candidate actions
Even a partial graph can dramatically cut false assumptions.
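The last two steps above can be sketched in a few lines. The candidate actions and their graph-derived facts (deprecated flags, inbound call counts) are hypothetical, but the shape is the point: reject deprecated targets outright, then rank the rest by how much of the graph actually depends on them.

```javascript
// Hypothetical candidate actions annotated with facts from the graph.
const actions = [
  { target: "legacy/deleteUserSync", deprecated: true, inboundCalls: 0 },
  { target: "api/deleteUser", deprecated: false, inboundCalls: 4 },
  { target: "tests/deleteUser.test", deprecated: false, inboundCalls: 0 },
];

// Reject deprecated paths, then rank by real dependency weight.
const ranked = actions
  .filter((a) => !a.deprecated)
  .sort((a, b) => b.inboundCalls - a.inboundCalls);

console.log(ranked.map((a) => a.target));
// The production entry point outranks the test-only file.
```

Even this crude scoring would have steered the agent from the opening story away from the test-only function.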
A simple rule of thumb:
If your agent can answer “what mentions this?” but not “what actually depends on this?”, it will hallucinate in production-shaped repos.
Try it yourself
If you’re working with MCP servers or agent-heavy workflows, a few free tools that may help:
- Want to check your MCP server? Try https://tools.authora.dev
- Run npx @authora/agent-audit to scan your codebase
- Add a verified badge to your agent: https://passport.authora.dev
- Check out https://github.com/authora-dev/awesome-agent-security for more resources
The takeaway
Agents don’t just need more context.
They need structured context.
In simple repos, embeddings and file search can get surprisingly far.
In complex codebases, the missing piece is usually relationship awareness: execution paths, ownership, policy, and tool boundaries.
That’s what knowledge graphs are good at.
Not because they’re fancy, but because your system is already a graph whether your agent knows it or not.
How are you grounding agents in large codebases today: embeddings, static analysis, graphs, something else? Drop your approach below.
-- Authora team
This post was created with AI assistance.