DEV Community

Cover image for Your AI Agent Keeps Forgetting. Here's Why Context Is the Real Architecture Problem
Arief Warazuhudien
Arief Warazuhudien

Posted on

Your AI Agent Keeps Forgetting. Here's Why Context Is the Real Architecture Problem

Your finance team tries an AI agent to help close the books. It has access to data. It seems capable. But the results are unsettling: sometimes it cites an accounting policy that expired last quarter, sometimes it mixes data from different legal entities, and sometimes it simply forgets it already approved a step. The team starts to wonder—is this agent helping, or creating more work?

This isn't a model problem. It's a context problem.

Many teams try to patch this with longer prompts, more complex instructions, or more aggressive document retrieval. The result is unstable. The agent looks brilliant one moment, then pulls the wrong document, forgets a prior decision, or violates an access boundary the next.

For an enterprise, context isn't an afterthought. It's the operational layer that determines whether an agent can make decisions that are relevant, safe, and accountable.

The Misconception: Context Is Just "More Information"

When an agent works on a single case, it rarely depends on one data source. It needs to combine transaction status from ERP, policies from a knowledge base, entity relationships from master data, decision history from previous workflows, and access boundaries based on user identity. You cannot dump all of this raw data into a prompt and expect clarity.

The context layer is what transforms raw data and knowledge into usable context for decision-making. It does four things:

  • Selects what is truly relevant for the task at hand.
  • Interprets information with business meaning—distinguishing an active policy from an old draft.
  • Permissions access so the agent only sees what it's allowed to.
  • Packages context in a form the agent can use efficiently.

Without these four functions, agents fall into two bad patterns: either they rely on bloated prompts crammed with every possible instruction, or they depend on uncontrolled retrieval that returns too much or too little.

A refined watercolor diagram showing the three-layer architecture: Data Foundation at the bottom, Context Layer (RAG, Knowledge Graph, Enterprise Memory) in the middle, and Agent Execution at the top, with control points for selection, interpretation, permissioning, and packaging on the left side.
The context layer sits between your data foundation and agent execution, with four control functions ensuring agents get the right, safe, and usable context.

RAG: Retrieval That Respects Boundaries

The most common component of a context layer is retrieval-augmented generation (RAG). Its job is straightforward: help the agent find relevant documents from the enterprise knowledge base and use them to answer, reason, or prepare an action.

For many use cases, this is a sensible starting point. Service desks reading knowledge articles. HR operations answering policy questions. Procurement referencing SOPs and contracts. Legal ops comparing clauses.

But good RAG is far harder than dumping documents into a vector database. Its quality is determined by six factors upstream of the search itself:

  1. Source quality. If your corpus mixes official policies, old drafts, informal emails, and orphaned files, retrieval will produce noise. RAG is only as good as its corpus.
  2. Chunking strategy. Documents must be split into retrievable units. Too large, and retrieval is fuzzy. Too small, and context breaks. Enterprise chunking should follow business document structure, not character counts.
  3. Metadata. Often more important than embeddings. Effective dates, version numbers, region, function, confidentiality level, active status, and document owner all make retrieval far more precise.
  4. Search strategy. Similarity search alone rarely suffices. Combine semantic search, keyword filters, metadata filters, and sometimes query expansion based on workflow context.
  5. Reranking. Initial results need reordering so the most relevant and authoritative pieces appear first—especially when multiple documents seem similar but have different business status.
  6. Answer evaluation. Don't judge RAG by whether the answer "sounds good." Test whether the agent actually retrieved the right document, cited a current policy, avoided mixing conflicting sources, and produced a genuinely useful answer.

One of the most dangerous mistakes: building technically clever RAG that is blind to permissions. An agent should not retrieve a document just because it's semantically relevant. It must also check whether that document is accessible to the user or workflow it represents. Permission-aware RAG applies access control during retrieval, not after the answer is formed.

Knowledge Graphs: When Relationships Matter More Than Documents

If RAG helps agents find what is written, knowledge graphs help them understand what is connected to what. A knowledge graph explicitly represents business entities and their relationships: customers, products, suppliers, contracts, assets, locations, policies, risks, employees, cases. And the relationships between them: a customer has a contract, a supplier provides components for a product, a product is subject to a specific policy.

For enterprise agents, graphs are valuable because many operational decisions cannot be made from a single document or table. They depend on a web of relationships.

Consider a supply chain control tower. When a shipment is delayed, the agent needs to understand: which customer order is this shipment tied to? What products are in that order? Which suppliers do those products depend on? Does that customer have a priority SLA? Which locations are affected? Are there escalation policies? All of this is far easier to model as a graph than as a stack of documents.

Many organizations avoid knowledge graphs because they imagine a massive, expensive, enterprise-wide project. That's unnecessary. A more realistic approach: start with domain-specific graphs for priority use cases. A graph for vendor-contract-category-policy relationships in procurement. A graph for customer-product-ticket-SLA in customer service. A graph for entity-account-journal-control in finance close.

Domain-first graphs deliver three advantages: faster time to value, easier validation by business owners, and simpler governance than trying to map the entire company at once.

Enterprise Memory: Remembering Without Enshrining Mistakes

The third component is memory. Memory allows an agent to retain context that isn't available in a single prompt or query. This matters because most enterprise work spans multiple steps, multiple days, and sometimes multiple teams.

But memory in an enterprise must be disciplined. Not everything needs to be remembered, and not all memories should be treated equally.

Four types of memory need distinction:

  • Session memory: Context within a single conversation or interaction. The agent remembers you're discussing invoice #4023. Useful for coherence, but usually doesn't need long-term storage.
  • Workflow memory: Status of ongoing work. What steps have been completed, what documents have been reviewed, what decisions have been made, who has approved, what exceptions remain open. Critical for workflows like finance close, procurement case management, or incident response.
  • User memory: Specific preferences or context about a user—preferred report format, working patterns. Can improve experience but must be managed carefully due to privacy and fairness concerns.
  • Institutional memory: Longer-lasting organizational learning. Patterns of recurring exceptions, treatments that worked before, human feedback on agent recommendations, repeated operational knowledge. Most valuable for continuous improvement, but also most risky if uncurated.

Without memory, agents operate like they have operational amnesia. Every case starts from zero. Handoffs between sessions are poor. Previous decisions are ignored. Human feedback is lost. Long workflows become fragile.

In IT operations, an agent handling recurring incidents should remember which runbook worked before, who the relevant approver is, and what system dependencies are common root causes. In collections, an agent should remember previous promise-to-pay commitments, customer responses, and follow-up actions already taken—so it doesn't send contradictory communications.

Enterprise memory requires four minimum disciplines: retention (what is stored, for how long, when deleted), privacy (memory may contain sensitive data, so storage and use must follow strict access policies), audit (the company must be able to explain what memory was used to produce a recommendation), and correction (if an agent stores a wrong conclusion, human feedback must be able to correct or flag that memory).

The Architecture Decision That Determines Trust

RAG, knowledge graphs, and memory don't replace each other. They complement each other. RAG helps agents retrieve relevant knowledge from documents. Knowledge graphs help agents understand business entity relationships. Memory helps agents maintain continuity across sessions, workflows, and operational learning.

In a mature enterprise workflow, all three work together. In procurement exception handling, RAG retrieves the relevant purchasing policy and contract clauses, the graph shows relationships between requester, category, supplier, contract, and approval path, and memory recalls that a similar case was previously rejected due to incomplete documentation. In finance close, RAG retrieves accounting guidance and closing SOPs, the graph maps entity-account-journal-control relationships, and memory stores the history of exceptions and previous controller decisions.

This is where the context layer becomes an execution layer, not just a search layer.

What this means in practice

For CIOs, the question is whether your organization has a governable context layer, or whether you're still relying on prompt engineering and ad-hoc retrieval. For COOs, on priority workflows, what context does the agent actually need to make relevant and safe decisions? For CHROs, if agents start storing user memory or institutional memory, are your privacy, fairness, and correction policies ready? For transformation leaders, are your agentic use cases built on trustworthy enterprise context, or on a demo retrieval that looks convincing but won't scale?

If the answers are unclear, the next priority isn't adding more agents. It's building a context layer that is accurate, relevant, and safe. Because that's where operational trust in agents is truly formed.


This article is based on original research and architecture guidance published at Your AI Agent Keeps Forgetting. Here's Why Context Is the Real Architecture Problem.

Top comments (0)