
Jamie Cole

Three Memory Mistakes That Kill AI Agents in Production

Last week I watched an agent spiral into a loop, repeatedly trying to complete a task it had already finished three runs ago. The fix took five minutes once I understood what went wrong. The problem? It had no memory of its own history.

Here are the three memory mistakes I see most often in autonomous agent code, and how to fix each one.


Mistake 1: Treating the context window as your only memory

The context window feels like memory because the model can "see" everything in it. But it's not memory — it's working space. When your process exits, it's gone. When you hit the token limit, the oldest parts are truncated or dropped.

I've seen agents built where the entire history lives in the conversation list:

messages = []

def run_agent():
    while True:
        response = client.messages.create(
            messages=messages,
            ...
        )
        messages.append({"role": "assistant", "content": response.content})
        # This list grows forever, context gets truncated, agent "forgets" old context

Two problems with this:

It's expensive. Every API call sends the full history. After 10 cycles, you're sending 10x the tokens for the same response quality.

It's brittle. Context truncation is silent. The model doesn't tell you it can't see messages from 3 hours ago — it just acts like they didn't happen.

The fix: Separate your memory into three tiers.

# Tier 1: Current context window (what the model sees this run)
# Keep this small — 5-10 recent messages maximum

# Tier 2: State file (what persists between runs)
# Everything important goes here: task status, decisions, findings
state = {
    "current_task": "...",
    "completed_steps": [...],
    "findings": {...},
    "last_run": "2026-02-26T07:00:00Z"
}

# Tier 3: External storage (searchable knowledge)
# SQLite, vector DB, files — for things you need to query

When you start each run, build the context from your state file. Don't carry the conversation forward — reconstruct it:

import json

def build_context(state: dict) -> list[dict]:
    """Build a fresh context from state, not from accumulated messages."""
    summary = f"""
Current task: {state['current_task']}
Completed: {', '.join(state['completed_steps'][-5:])}
Key findings: {json.dumps(state.get('findings', {}), indent=2)}
Last run: {state['last_run']}
"""
    return [{"role": "user", "content": f"Here is your current state:\n{summary}\n\nWhat should you do next?"}]

Each run starts fresh with a compact context built from durable state. No accumulation, no truncation, no surprise forgetting.
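Tying the pieces together, a full run cycle loads durable state, rebuilds a compact context, does one unit of work, and persists state before exiting. A minimal sketch, where `call_model` is a hypothetical stand-in for the real API call:

```python
import json
from pathlib import Path

STATE_PATH = Path("state.json")

def load_state() -> dict:
    """Tier 2: durable state that survives between runs."""
    try:
        return json.loads(STATE_PATH.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {"current_task": "", "completed_steps": [], "findings": {}}

def save_state(state: dict) -> None:
    """Atomic write: temp file, then rename."""
    tmp = STATE_PATH.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(STATE_PATH)

def build_context(state: dict) -> list[dict]:
    """Tier 1: compact context, rebuilt from state on every run."""
    summary = (
        f"Current task: {state['current_task']}\n"
        f"Completed: {', '.join(state['completed_steps'][-5:])}"
    )
    return [{"role": "user", "content": f"Your state:\n{summary}\n\nWhat next?"}]

def run_once(call_model) -> dict:
    state = load_state()               # durable state in
    context = build_context(state)     # fresh, small context
    reply = call_model(context)        # hypothetical stand-in for the API call
    state["completed_steps"].append(reply)
    save_state(state)                  # persist before the process exits
    return state
```

Because every run reconstructs its context from state, you can kill and restart the process at any point and lose at most the in-flight step.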


Mistake 2: Writing state directly to disk

This one caused a data loss incident in one of my early agents. The agent was mid-write when the cron job timer triggered a new run, which tried to read the half-written state file and got a JSON parse error.

# This is wrong
def save_state(state):
    with open("state.json", "w") as f:
        json.dump(state, f)  # If this crashes mid-write, state.json is corrupt

On most filesystems, writes are not atomic. A crash (or a second process reading the file) at the wrong moment gives you a partial write that's no longer valid JSON.

The fix is two lines:

import json
from pathlib import Path

def save_state(state: dict, path: Path = Path("state.json")) -> None:
    """Atomic state write — safe against crashes and concurrent reads."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)  # Atomic on POSIX — either succeeds fully or not at all

Path.replace() is atomic on POSIX systems (Linux, macOS). The file either appears at the destination in full or doesn't appear at all. No partial writes.
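One caveat worth knowing: the rename is atomic, but it doesn't guarantee the bytes have reached the disk. If you also care about durability across power loss, fsync the temp file before the rename. A sketch of that variant:

```python
import json
import os
from pathlib import Path

def save_state_durable(state: dict, path: Path = Path("state.json")) -> None:
    """Atomic AND durable: force bytes to disk before swapping into place."""
    tmp = path.with_suffix(".tmp")
    with open(tmp, "w") as f:
        f.write(json.dumps(state, indent=2))
        f.flush()
        os.fsync(f.fileno())  # ensure the data is on disk, not just in page cache
    tmp.replace(path)         # atomic swap into place
```

Whether you need this depends on your deployment; for a cron-driven agent on one machine, the plain atomic rename is usually enough.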

Add a backup for extra safety. Write the new state first, then rotate, so a crash at any point still leaves a readable copy on disk:

def save_state(state: dict, path: Path = Path("state.json")) -> None:
    # Keep a backup of the last known-good state
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))  # Write the new state first
    if path.exists():
        path.replace(path.with_suffix(".bak"))  # Current → backup
    tmp.replace(path)  # New → current

And always wrap loads with recovery:

def load_state(path: Path = Path("state.json")) -> dict:
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # Try the backup
        backup = path.with_suffix(".bak")
        if backup.exists():
            try:
                state = json.loads(backup.read_text())
                print("Recovered from backup")
                return state
            except json.JSONDecodeError:
                pass
        return {}  # Fresh start

Total cost: ~15 lines. Prevents corruption that can take hours to debug.
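If you want to convince yourself the recovery path works, simulate a torn write. This sketch bundles the save/load pair into self-contained form and deliberately corrupts the current file:

```python
import json
import tempfile
from pathlib import Path

def save_state(state: dict, path: Path) -> None:
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    if path.exists():
        path.replace(path.with_suffix(".bak"))  # rotate last known-good state
    tmp.replace(path)

def load_state(path: Path) -> dict:
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        backup = path.with_suffix(".bak")
        if backup.exists():
            try:
                return json.loads(backup.read_text())
            except json.JSONDecodeError:
                pass
        return {}

path = Path(tempfile.mkdtemp()) / "state.json"
save_state({"version": 1}, path)
save_state({"version": 2}, path)   # .bak now holds version 1
path.write_text('{"version": 2')   # simulate a crash mid-write: invalid JSON
recovered = load_state(path)       # falls back to the backup: version 1
```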


Mistake 3: Not distinguishing task state from conversation history

This is the subtlest mistake, and it's why agents "forget" things they should know.

There are two fundamentally different things an agent needs to remember:

What has happened (conversation history — ephemeral, for context)

What is true (task state — durable, for continuity)

Most beginner agent code treats these the same. The result is agents that know what they said last turn but forget what they decided last week.

Example of confusing the two:

# Wrong: storing decisions only in conversation history
messages = [
    {"role": "user", "content": "Research the market for X"},
    {"role": "assistant", "content": "I've decided to focus on segment Y because..."},
    # This "decision" is just text in a list. Next run: gone.
]

The decision to "focus on segment Y" is task state. It should be written to a file. The conversation is just how you got there.
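The same example with the ephemeral and durable halves separated. A minimal sketch; the state layout is illustrative, not a fixed schema:

```python
import json
import tempfile
from pathlib import Path

# Ephemeral: the conversation merely narrates the decision
messages = [
    {"role": "user", "content": "Research the market for X"},
    {"role": "assistant", "content": "I've decided to focus on segment Y because..."},
]

# Durable: the decision itself is written to state explicitly
state = {"decisions": [], "findings": {}}
state["decisions"].append("focus on segment Y")

path = Path(tempfile.mkdtemp()) / "state.json"
path.write_text(json.dumps(state, indent=2))  # survives the next run
```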

The fix: Extract durable state from the model's output and write it explicitly.

import json

def extract_and_store_decision(response_text: str, state: dict) -> dict:
    """Ask the model to extract structured state from its own response."""

    extraction = client.messages.create(
        messages=[
            {"role": "user", "content": f"""
Extract any decisions or findings from this agent response as JSON.
Return ONLY a JSON object with these possible keys:
- decisions: list of decisions made
- findings: key-value pairs of what was learned
- next_step: what to do next
- status: current task status

Response to extract from:
{response_text}
"""}
        ],
        ...
    )

    try:
        extracted = json.loads(extraction.content[0].text)
        # Merge into state
        if "decisions" in extracted:
            state.setdefault("decisions", []).extend(extracted["decisions"])
        if "findings" in extracted:
            state.setdefault("findings", {}).update(extracted["findings"])
        if "next_step" in extracted:
            state["next_step"] = extracted["next_step"]
    except json.JSONDecodeError:
        pass

    return state

This is more expensive (two API calls per turn) but the agent now has persistent memory of its own reasoning. The conversation window can be cleared; the state remains.


Summary

Three mistakes, three fixes:

| Mistake | Fix |
| --- | --- |
| Context-only memory | Three-tier memory: context + state file + external storage |
| Direct disk writes | Atomic writes: write to .tmp, rename to .json |
| Conflating history and state | Extract structured state from model output, store durably |

None of these fixes is complex. They're the kind of thing you don't think about until you've had an agent corrupt its state or forget a decision it made three days ago.


Jamie Cole — indie developer, UK. I write about autonomous agent patterns from building my own.

Longer version of these patterns (with more code and edge cases): Autonomous AI Agents with Claude — A Practical Builder's Guide

If you're about to ship an agent, I also made a production readiness checklist — 50 checks before you go live.
