
Authora Dev


Why context windows keep breaking AI agents (and how knowledge graphs fix it)

Last week, an agent in a coding workflow looked perfectly fine for the first 20 minutes.

It knew the repo structure. It remembered the ticket. It even used the right MCP tools to inspect files and open a PR.

Then the session got longer.

A few more tool calls. More logs. More intermediate reasoning. More pasted docs. And suddenly the agent started acting like a teammate who joined the meeting halfway through: repeating work, forgetting constraints, and asking for data it had already seen.

That’s the part nobody tells you about “long-running” AI agents: they don’t really have memory. They have a context window budget.

And once that budget fills up, older facts get dropped, compressed, or mangled.

The real problem: context overflow looks like bad reasoning

When an agent fails after a long session, we often blame:

  • the model
  • the prompt
  • the tool
  • the framework

Sometimes the actual issue is simpler: the agent can’t keep all the important state in working memory anymore.

This gets worse with MCP-based agents because they’re constantly pulling in fresh context:

  • tool schemas
  • file contents
  • API responses
  • policy docs
  • previous actions
  • approval requirements

If everything is shoved back into the prompt on every turn, you eventually hit a wall.

Why summarization alone isn’t enough

A common fix is “just summarize old context.”

That helps, but summaries are lossy. They flatten details that may become important later.

Example:

  • “User asked to deploy only to staging”
  • “Database migration requires approval”
  • “This MCP server can read secrets but not rotate them”
  • “Alice delegated access to build-bot for 2 hours”

Those aren’t just notes. They’re relationships.

If you summarize them too aggressively, the agent loses the structure that tells it why something matters.
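Those relational facts can be kept as structured triples instead of summary prose. A minimal sketch in plain JavaScript — the field names and entity names here are illustrative, not a real schema:

```javascript
// Sketch: the facts above stored as subject–relation–object triples
// instead of free-text summary lines.
const facts = [
  { subject: "alice",       relation: "requested_deploy_to", object: "staging" },
  { subject: "db-migration", relation: "requires",           object: "approval" },
  { subject: "mcp-server",  relation: "can_read",            object: "secrets" },
  { subject: "mcp-server",  relation: "cannot",              object: "rotate_secrets" },
  // "for 2 hours" survives as data instead of being flattened away
  { subject: "alice",       relation: "delegated_to",        object: "build-bot", ttlMinutes: 120 },
];

// A lossy summary would collapse these; a triple store keeps them queryable.
const deployTargets = facts
  .filter(f => f.relation === "requested_deploy_to")
  .map(f => f.object);

console.log(deployTargets); // ["staging"]
```

Nothing here is compressed: when the agent later asks "where was I allowed to deploy?", the answer is a filter, not a guess about what a summary meant.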

Knowledge graphs work better because agents need relationships, not transcripts

Instead of storing memory as a giant conversation log, store it as connected facts:

  • entities: user, repo, server, token, environment
  • actions: deployed, approved, delegated, scanned
  • relationships: can-access, owns, depends-on, blocked-by, approved-by

That gives the agent a memory system it can query instead of rereading.

A simple mental model:

```
[User: Alice] --delegated--> [Agent: build-bot]
[build-bot] --can-access--> [Repo: checkout-service]
[Repo: checkout-service] --deploys-to--> [Env: staging]
[Env: production] --requires--> [Approval: human]
[MCP: deploy-server] --exposes--> [Tool: deploy_app]
```

Now the agent doesn’t need the full transcript to answer:

  • Can I deploy this?
  • Who approved it?
  • Which MCP tool should I use?
  • Is this delegation still valid?

It just queries the graph.

What this looks like in practice

You don’t need a PhD project here. A lightweight pattern works:

  1. Keep short-term context in the prompt

    • current task
    • latest tool outputs
    • immediate plan
  2. Store durable memory in a graph

    • identities
    • permissions
    • resources
    • prior decisions
    • tool capabilities
    • delegation chains
  3. Retrieve only relevant subgraphs per step

    • not the whole history
    • just the facts connected to the current task

This matters a lot for MCP because tool usage is rarely just “call function X.” It’s usually constrained by identity, access, policy, and prior state.
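Step 3 can be as simple as a bounded walk outward from the entities named in the current task. A minimal sketch over an in-memory triple list — every entity and relation name is invented for illustration:

```javascript
// Sketch: retrieve only facts reachable from the current task's entities,
// within a small hop limit, instead of replaying the whole transcript.
const edges = [
  ["alice", "delegated", "build-bot"],
  ["build-bot", "can_deploy", "staging"],
  ["checkout-service", "deploys_to", "staging"],
  ["production", "requires", "human-approval"],
  ["billing-service", "owned_by", "bob"], // unrelated to the current task
];

function relevantSubgraph(seeds, maxHops = 2) {
  const seen = new Set(seeds);
  const picked = new Set();
  let frontier = [...seeds];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next = [];
    for (const [s, rel, o] of edges) {
      if (!frontier.includes(s) && !frontier.includes(o)) continue;
      picked.add(`${s} -${rel}-> ${o}`);
      for (const node of [s, o]) {
        if (!seen.has(node)) {
          seen.add(node);
          next.push(node);
        }
      }
    }
    frontier = next;
  }
  return [...picked];
}

// The billing-service fact never reaches the prompt for this task.
console.log(relevantSubgraph(["build-bot", "checkout-service"]));
```

The hop limit is the knob that keeps prompt size bounded no matter how long the session runs.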

A tiny runnable example

Here’s a simple Node example using an in-memory graph to model agent memory as relationships instead of chat history:

```shell
npm install graphology
```
```javascript
const Graph = require("graphology");

const graph = new Graph();

// Entities become nodes...
graph.addNode("alice", { type: "user" });
graph.addNode("build-bot", { type: "agent" });
graph.addNode("staging", { type: "env" });
graph.addNode("production", { type: "env" });

// ...and relationships become directed, labeled edges.
graph.addDirectedEdge("alice", "build-bot", { rel: "delegated" });
graph.addDirectedEdge("build-bot", "staging", { rel: "can_deploy" });
graph.addDirectedEdge("production", "build-bot", { rel: "requires_human_approval" });

// Answering a question is now a graph lookup, not a transcript re-read.
console.log("Agent can deploy to staging:",
  graph.hasDirectedEdge("build-bot", "staging")
);

console.log("Production approval required:",
  graph.someEdge("production", "build-bot", edge =>
    graph.getEdgeAttribute(edge, "rel") === "requires_human_approval"
  )
);
```

That example is intentionally small, but the pattern scales: store facts and relationships once, retrieve them when needed.

Why this is especially useful for MCP agents

MCP gives agents a clean way to interact with tools, but the hard part isn’t just tool calling. It’s knowing:

  • which tools exist
  • which identity the agent is acting as
  • what scope that identity has
  • whether delegation is valid
  • whether the action needs approval
  • what happened earlier in the workflow

That’s memory, and memory is mostly graph-shaped.

If you’re already using OPA or another policy engine for authorization, great — keep using it. A knowledge graph doesn’t replace policy. It gives the agent a better way to remember the facts that policy depends on.
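For instance, the "Alice delegated access to build-bot for 2 hours" fact can live in the graph with an expiry that the agent checks before acting, while the policy engine still makes the final call. A hedged sketch in plain JavaScript, not a real OPA integration:

```javascript
// Sketch: a delegation stored as a fact with an expiry timestamp.
// The policy engine still decides; the graph just remembers the inputs.
const delegations = new Map();

function delegate(from, to, ttlMs, now = Date.now()) {
  delegations.set(`${from}->${to}`, { expiresAt: now + ttlMs });
}

function isDelegationValid(from, to, now = Date.now()) {
  const d = delegations.get(`${from}->${to}`);
  return d !== undefined && now < d.expiresAt;
}

// Fixed timestamps for the example, so it is deterministic.
const t0 = 0;
delegate("alice", "build-bot", 2 * 60 * 60 * 1000, t0); // 2 hours

console.log(isDelegationValid("alice", "build-bot", t0 + 60 * 60 * 1000));     // true: 1h in
console.log(isDelegationValid("alice", "build-bot", t0 + 3 * 60 * 60 * 1000)); // false: expired
```

Without that stored expiry, a summarized transcript tends to keep "Alice delegated access" and silently drop "for 2 hours" — exactly the detail a policy check needs.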

A practical architecture

```
User request
   |
   v
Agent runtime
   |
   +--> short-term prompt context
   |
   +--> graph lookup
   |      - identities
   |      - tool permissions
   |      - prior approvals
   |      - resource relationships
   |
   +--> MCP tool call
   |
   +--> write new facts back to graph
```

This is how you stop agents from “forgetting” critical constraints halfway through a workflow.

Not by making the prompt longer.

By giving the agent a memory model that matches the problem.
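The write-back step can be a thin wrapper around each tool call. A sketch in plain JavaScript — the tool name, agent name, and `invoke` callback are all hypothetical stand-ins, not a real MCP client API:

```javascript
// Sketch: wrap each tool call so its outcome is written back as facts,
// so later steps query memory instead of re-reading the transcript.
const memory = [];

function recordFact(subject, relation, object) {
  memory.push({ subject, relation, object, at: Date.now() });
}

// Hypothetical wrapper: invoke is whatever actually talks to the MCP server.
function callToolWithMemory(agent, toolName, args, invoke) {
  const result = invoke(toolName, args);
  recordFact(agent, "called", toolName);
  if (toolName === "deploy_app") {
    recordFact(agent, "deployed_to", args.env);
  }
  return result;
}

// Usage with a fake tool runner standing in for a real MCP call:
callToolWithMemory("build-bot", "deploy_app", { env: "staging" }, () => ({ ok: true }));

const deploys = memory
  .filter(f => f.subject === "build-bot" && f.relation === "deployed_to")
  .map(f => f.object);
console.log(deploys); // ["staging"]
```

The design choice that matters is recording outcomes as facts at the moment they happen, rather than hoping they survive a later summarization pass.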


The takeaway

If your agent gets worse as the session gets longer, you may not have a reasoning problem.

You may have a memory architecture problem.

Context windows are great for working memory. They’re terrible as a source of truth.

Knowledge graphs won’t magically fix every agent, but they’re one of the most practical ways to preserve identity, permissions, and task state without drowning the model in its own transcript.

How are you handling agent memory today — summaries, vector search, graphs, or something else? Drop your approach below.

-- Authora team

This post was created with AI assistance.
