Last week, an agent was asked a very normal question in a very not-normal codebase:
“Add audit logging to the user deletion flow.”
It found a deleteUser() function.
It found an AuditService.
It made the change.
It passed local checks.
And it was still wrong.
Why? Because in this repo, user deletion actually happened through a saga, the audit event was emitted from a worker, and the “obvious” function it edited was only used in tests. The agent didn’t fail because it was dumb. It failed because it had a flat view of a graph-shaped system.
That’s the real reason MCP agents hallucinate in complex codebases: they retrieve files, not relationships.
The problem isn’t just context windows
A lot of people frame this as a token problem:
- repo too big
- too many files
- not enough context
- model guesses
That’s true, but incomplete.
In large systems, the hard part isn’t finding a file. It’s understanding:
- which service actually owns the behavior
- which code path is production vs dead code
- what calls what
- what data shape flows where
- which permissions or policies gate execution
- which tool or MCP server should even be used
A vector search can find “similar text.”
It does not reliably tell an agent:
“this method is a wrapper around a deprecated internal API, and the real side effect happens three hops later in a queue consumer.”
That’s where a knowledge graph helps.
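To make that concrete, here is a minimal sketch of the idea in plain JavaScript. The node names and edge types are hypothetical, but the point is the traversal: a breadth-first walk over typed edges finds the real side effect several hops away, where a text match would stop at the entry point.

```javascript
// Hypothetical graph: an adjacency list mapping each node to typed edges.
const edges = {
  "api/deleteUser": [{ to: "queue/userEvents", type: "emits" }],
  "queue/userEvents": [{ to: "worker/deletionConsumer", type: "consumes" }],
  "worker/deletionConsumer": [{ to: "db/hardDelete", type: "calls" }],
};

// Breadth-first walk: collect everything reachable downstream of a node,
// including side effects several hops away.
function downstream(start) {
  const seen = new Set();
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const edge of edges[node] || []) {
      if (!seen.has(edge.to)) {
        seen.add(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return [...seen];
}

console.log(downstream("api/deleteUser"));
// The real side effect (db/hardDelete) is three hops from the entry point.
```

A keyword search over these files would surface api/deleteUser and stop; the traversal surfaces the queue consumer where the deletion actually lands.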
What the graph gives the agent
Think of a knowledge graph as a map of the codebase and tooling:
- files
- functions
- classes
- APIs
- services
- schemas
- owners
- MCP tools
- auth policies
- runtime dependencies
And, more importantly, the edges between them:
- calls
- imports
- owns
- emits
- consumes
- requires_role
- served_by
- deprecated_in_favor_of
So instead of asking:
“What file mentions user deletion?”
the agent can ask:
“What is the production execution path for user deletion, and what policy + audit components are attached to it?”
That’s a much better question.
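Here is a sketch of what answering that better question can look like, assuming hypothetical node names and a deprecated flag on edges: a depth-first search that refuses deprecated edges returns the production path, not just any mention.

```javascript
// Hypothetical graph: adjacency list with a deprecated flag on legacy edges.
const depGraph = {
  "api/deleteUser": [
    { to: "legacy/deleteUserSync", type: "calls", deprecated: true },
    { to: "saga/userDeletion", type: "emits" },
  ],
  "saga/userDeletion": [{ to: "audit/logUserDeletion", type: "calls" }],
  "legacy/deleteUserSync": [],
  "audit/logUserDeletion": [],
};

// Depth-first search for a path that avoids deprecated edges and cycles.
function productionPath(from, to, path = [from]) {
  if (from === to) return path;
  for (const edge of depGraph[from] || []) {
    if (edge.deprecated) continue; // skip non-production routes
    if (path.includes(edge.to)) continue; // avoid cycles
    const found = productionPath(edge.to, to, [...path, edge.to]);
    if (found) return found;
  }
  return null;
}

console.log(productionPath("api/deleteUser", "audit/logUserDeletion"));
// Returns the path through the saga; the deprecated sync route is never taken.
```

The deprecated edge to legacy/deleteUserSync is exactly the kind of trap a text match falls into and a typed traversal walks past.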
The shape of the fix
Here’s the mental model:
User request
|
v
LLM agent
|
+--> vector search: "find relevant files"
|
+--> knowledge graph: "find real relationships"
|
v
Grounded plan
|
v
MCP tools / code changes
Vector search is still useful. Keep it.
But in a complex repo, vector search should retrieve candidates, and the graph should validate the path.
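That division of labor can be sketched in a few lines. The candidate list and reachability set below are hypothetical stand-ins for real vector-search output and a real graph query:

```javascript
// Hypothetical: files a vector search returned for "user deletion".
const candidates = ["api/deleteUser", "tests/deleteUser.test", "docs/deletion"];

// Hypothetical: nodes reachable from the production entry point,
// as computed by a graph traversal.
const reachableFromProd = new Set([
  "api/deleteUser",
  "saga/userDeletion",
  "audit/logUserDeletion",
]);

// Vector search proposes; the graph disposes. Keep only candidates
// that sit on a real execution path.
const grounded = candidates.filter((c) => reachableFromProd.has(c));
console.log(grounded); // the test file and the docs page are rejected
```

The test-only file that fooled the agent in the opening story would be filtered out here, because nothing in production reaches it.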
A tiny runnable example
If you want to see the pattern in code, here’s a minimal graph query example in Node using graphology:
npm install graphology

const Graph = require("graphology");

// Build a small directed graph of the deletion flow.
const graph = new Graph();
graph.addNode("api/deleteUser");
graph.addNode("worker/userDeletionSaga");
graph.addNode("audit/logUserDeletion");

// Typed edges capture relationships, not just textual co-occurrence.
graph.addEdge("api/deleteUser", "worker/userDeletionSaga", { type: "emits" });
graph.addEdge("worker/userDeletionSaga", "audit/logUserDeletion", { type: "calls" });

// Walk outbound neighbors to see what the API entry point actually triggers.
console.log("Downstream from api/deleteUser:");
graph.forEachOutboundNeighbor("api/deleteUser", (node) => console.log("-", node));
That example is tiny, but the idea scales: once your agent can traverse relationships instead of matching text, it stops “fixing” the wrong place.
Where this matters most with MCP
MCP makes agents more useful because they can actually do things: read code, call internal tools, inspect docs, hit APIs.
It also makes mistakes more expensive.
If an agent hallucinates while choosing among 50+ tools, or picks the right tool with the wrong assumptions about the code path, you get confident nonsense with side effects.
In practice, the worst failures I’ve seen look like this:
- agent retrieves a plausible file
- agent infers architecture from naming
- agent calls the wrong MCP tool or edits the wrong layer
- output looks clean, but behavior is wrong
A knowledge graph reduces that by giving the agent a way to verify:
- “Is this path actually reachable?”
- “What service owns this?”
- “What tool is allowed to act here?”
- “What approval or policy is required before execution?”
If you already have OPA or another policy engine in place, great. Use it. The graph doesn’t replace policy; it gives the agent better grounding before policy enforcement kicks in.
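As a sketch of that grounding step, here is a check an agent could run before it even plans a tool call. The tool names and role mapping are hypothetical; the point is that requires_role edges from the graph let the agent rule out calls early, before the policy engine ever sees them.

```javascript
// Hypothetical requires_role edges: the role each tool invocation needs.
const requiresRole = {
  "tool/deleteUserData": "admin",
  "tool/readAuditLog": "auditor",
};

// Grounding check: refuse to plan a call the graph says the current
// principal cannot make. This complements policy enforcement, it does
// not replace it.
function canPlan(tool, roles) {
  const needed = requiresRole[tool];
  return needed === undefined || roles.includes(needed);
}

console.log(canPlan("tool/deleteUserData", ["support"])); // false
console.log(canPlan("tool/deleteUserData", ["admin"])); // true
```

An agent that consults this before acting wastes fewer steps and, more importantly, never confidently plans an action it was never allowed to take.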
Practical advice if you want to implement this
You do not need a giant “AI graph platform” project to get value.
Start small:
- build nodes for files, functions, services, MCP tools
- add edges for imports, calls, ownership, and auth requirements
- mark deprecated paths explicitly
- let retrieval fetch top candidate files
- let graph traversal rank or reject candidate actions
Even a partial graph can dramatically cut false assumptions.
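The last two steps above can be sketched in a few lines. The candidate actions and their graph-derived facts (deprecated flags, inbound call counts) are hypothetical, but the shape is the point: reject deprecated targets outright, then rank the rest by how much of the graph actually depends on them.

```javascript
// Hypothetical candidate actions annotated with facts from the graph.
const actions = [
  { target: "legacy/deleteUserSync", deprecated: true, inboundCalls: 0 },
  { target: "api/deleteUser", deprecated: false, inboundCalls: 4 },
  { target: "tests/deleteUser.test", deprecated: false, inboundCalls: 0 },
];

// Reject deprecated paths, then rank by real dependency weight.
const ranked = actions
  .filter((a) => !a.deprecated)
  .sort((a, b) => b.inboundCalls - a.inboundCalls);

console.log(ranked.map((a) => a.target));
// The production entry point outranks the test-only file.
```

Even this crude scoring would have steered the agent from the opening story away from the test-only function.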
A simple rule of thumb:
If your agent can answer “what mentions this?” but not “what actually depends on this?”, it will hallucinate in production-shaped repos.
Try it yourself
If you’re working with MCP servers or agent-heavy workflows, a few free tools that may help:
- Want to check your MCP server? Try https://tools.authora.dev
- Run npx @authora/agent-audit to scan your codebase
- Add a verified badge to your agent: https://passport.authora.dev
- Check out https://github.com/authora-dev/awesome-agent-security for more resources
The takeaway
Agents don’t just need more context.
They need structured context.
In simple repos, embeddings and file search can get surprisingly far.
In complex codebases, the missing piece is usually relationship awareness: execution paths, ownership, policy, and tool boundaries.
That’s what knowledge graphs are good at.
Not because they’re fancy, but because your system is already a graph whether your agent knows it or not.
How are you grounding agents in large codebases today: embeddings, static analysis, graphs, something else? Drop your approach below.
-- Authora team
This post was created with AI assistance.