DEV Community

Cover image for Your Multi-Agent System Has a Memory Problem
sly-the-fox
sly-the-fox

Posted on

Your Multi-Agent System Has a Memory Problem

Every multi-agent system I've seen break in production broke for the same reason: agents couldn't remember what already happened.

Not hallucination. Not bad prompts. Memory. The system had no way to persist decisions, track what was active, or share context between agents that needed to coordinate.

I run 39 agents in a production operating system. Solving the memory problem was harder than designing the agents themselves.

The three kinds of agent memory

Agent memory isn't one thing. It's at least three distinct problems, and most systems only solve the first.

1. Session memory (what happened in this conversation)

This is what you get for free with most LLM APIs. The context window. It works until the conversation ends, then it's gone.

# This is NOT memory. This is a conversation buffer.
messages = [
    {"role": "user", "content": "Deploy the new config"},
    {"role": "assistant", "content": "Config deployed to staging."}
]
# Session ends. Everything above disappears.
Enter fullscreen mode Exit fullscreen mode

2. Shared state (what's true right now across all agents)

This is where most systems fall apart. When Agent A makes a decision, Agent B needs to know about it before it acts. Not eventually. Before.

# Shared state: a set of canonical files every agent reads
SHARED_STATE = {
    "active/now.md": "Current tasks, owners, and status",
    "active/priorities.md": "Ranked priority list",
    "active/blockers.md": "What's stuck and why",
    "core/history/decisions.md": "Decisions with rationale"
}

def before_action(agent, task):
    """Every agent reads shared state before acting."""
    context = {}
    for path, purpose in SHARED_STATE.items():
        context[path] = read_file(path)
    return context
Enter fullscreen mode Exit fullscreen mode

The key insight: shared state must be file-based (or database-backed), not held in any single agent's context. Agents come and go. The state persists.

3. Institutional memory (what happened over time, why decisions were made)

This is the layer nobody builds until they need it. Two weeks into running a multi-agent system, someone asks "why is the deployment config set to X?" No agent remembers. The decision happened in a conversation that's long gone.

# A decision record with rationale
DECISION = {
    "id": "DEC-042",
    "date": "2026-03-15",
    "decision": "Use file-based shared state instead of Redis",
    "rationale": "Agents run in ephemeral sessions. Files persist across "
                 "sessions without requiring a running service.",
    "decided_by": "architect",
    "alternatives_considered": ["Redis", "SQLite", "env vars"]
}
Enter fullscreen mode Exit fullscreen mode

If you don't record the "why," you'll revisit the same decisions repeatedly. I've watched agents re-debate resolved questions because the rationale wasn't captured anywhere they could read.

The shared state pattern

Here's the pattern that works in my system. It's simple, and that's the point.

Rule 1: Canonical files are the source of truth.

Not agent memory. Not conversation history. Files on disk. Every agent reads them. A small set of agents are authorized to write them.

Rule 2: Read before you act.

Before any agent takes action, it reads relevant shared state. This is non-negotiable. An agent that acts without reading current state will contradict decisions made since its last session.

Rule 3: Write-back is immediate.

When an agent completes a task, it updates shared state in the same operation. Not "later." Not "at the end of the session." Immediately. Stale state causes cascading errors.

def complete_task(agent, task, result):
    """Task completion always includes state update."""
    # 1. Do the work
    execute(task)

    # 2. Update shared state (same operation)
    update_file("active/now.md", mark_complete(task))
    update_file("core/history/decisions.md", record_decision(task, result))

    # 3. Never separate these steps
Enter fullscreen mode Exit fullscreen mode

Rule 4: Ownership is explicit.

Not every agent can write to every file. The PM agent owns active/now.md. The architect owns core/history/decisions.md. Contention on shared files goes to a governor.

What this looks like in practice

My system's shared state layer is a directory called active/ with seven files:

File Purpose Owner
now.md Current tasks and status PM
priorities.md Ranked priorities Strategist
blockers.md What's stuck PM
risks.md Active risk register Sentinel
improvements.md Proposed system changes Improver
inbox.md Unrouted incoming items Router
coherence-metrics.md System health signals Evaluator

Every agent reads the files relevant to its work before acting. Seven agents write to them. The governor resolves conflicts.

Is it elegant? No. It's a directory of markdown files. But it's been running for two weeks without a single state conflict, and every agent sees the same picture of reality.

Start here

If your multi-agent system has more than two agents:

  1. Create one shared file that tracks "what's active right now"
  2. Make every agent read it before acting
  3. Make the responsible agent update it after completing work
  4. Log decisions with rationale, not just outcomes

You can get sophisticated later. Databases, event streams, vector stores. But the pattern underneath is always the same: agents need a shared, persistent, authoritative picture of the world. Without it, you're running parallel amnesiacs.


Building Sigil for cryptographic audit trails, because shared state is only trustworthy if you can prove it wasn't tampered with. More on agent architecture: The Alignment Layer.

Top comments (0)