Diven Rastdus

How to Give Your AI Agent a Memory That Actually Works

Ask any engineer who has shipped a production agent what the hardest problem is. The answer is always the same: memory.

Models forget everything between invocations. Every call to the API starts from an empty context window. The model has no awareness of previous conversations, previous tasks, previous errors, or previous successes. It does not remember that it already tried the web search approach and it failed. It does not remember that the user's database runs on port 5434. It does not remember that the file format it is parsing has a known bug requiring a specific workaround.

Every piece of state that matters must be reconstructed from context at every invocation. This is fundamentally different from any other software system you have built.

The problem compounds with scale. A simple chatbot that forgets between turns is annoying but tolerable. An agent completing a multi-week research project cannot forget what it did last session. An agent managing a production system cannot forget the sequence of actions it already took. An agent building a codebase cannot forget the architectural decisions made in earlier sessions.

There are three distinct sub-problems here, each requiring a different solution:

  1. Within-session state: How does the agent remember what it did ten tool calls ago?
  2. Cross-session state: How does the agent remember what happened in previous sessions?
  3. Knowledge state: How does the agent maintain a growing body of knowledge about its domain?

None of these have a single clean answer. But each has patterns that work.

The Three Types of Agent Memory

Memory researchers use a framework from cognitive psychology that maps well onto agent systems.

Episodic memory is memory of what happened: the sequence of events, actions, and observations. In an agent, this is the action log -- which tools were called, with what parameters, and what they returned. This is the primary source of within-session state.

Semantic memory is memory of what is known: facts, concepts, relationships, patterns. In an agent, this is accumulated knowledge about the domain, the user's preferences, and learned facts from previous interactions. This is the primary source of cross-session knowledge retention.

Procedural memory is memory of how to do things: skills, workflows, standard procedures. In an agent, this is the set of documented approaches -- "when processing invoices, always validate the total against line items before submitting" or "when the database query returns empty, check whether the filters are too restrictive before concluding the data does not exist."

Each type needs a different storage strategy, and most production agents need all three.
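The three types can be made concrete as a small taxonomy. This is an illustrative sketch, not a fixed API: the record shape, the `route_to_store` helper, and the backend names are assumptions that mirror the patterns discussed below.

```python
from dataclasses import dataclass
from enum import Enum

class MemoryType(Enum):
    EPISODIC = "episodic"      # what happened: actions, observations
    SEMANTIC = "semantic"      # what is known: facts, preferences
    PROCEDURAL = "procedural"  # how to do things: rules, workflows

@dataclass
class MemoryRecord:
    content: str
    memory_type: MemoryType

def route_to_store(record: MemoryRecord) -> str:
    """Pick a storage backend per memory type (illustrative mapping)."""
    return {
        MemoryType.EPISODIC: "action_log",      # append-only log table
        MemoryType.SEMANTIC: "vector_store",    # embeddings for retrieval
        MemoryType.PROCEDURAL: "lessons_file",  # loaded every session
    }[record.memory_type]
```

The mapping is the point: each memory type has a natural home, and the patterns below cover each in turn.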

Pattern 1: File-Based State

The simplest and most underestimated memory architecture is structured markdown files. Not glamorous. Remarkably effective for agents that operate on structured, human-readable state.

The pattern: the agent maintains markdown files representing its current understanding of the world. At session start, the agent reads these files. During the session, the agent writes updates as it learns new information. At session end, the files persist state for the next session.

A typical setup uses three files:

  • MEMORY.md -- master state: current projects, key decisions, important context
  • ACTIVE.md -- current tasks in flight with status
  • LESSONS.md -- mistakes made and the rules that prevent them from repeating

In code, loading that state at session start looks like this:
from pathlib import Path
from datetime import datetime

STATE_DIR = Path("/home/agent/state")

def read_agent_state() -> dict:
    """Read all state files at session start."""
    state_files = {
        "memory": "MEMORY.md",    # semantic: what the agent knows
        "active": "ACTIVE.md",    # working: current tasks
        "lessons": "LESSONS.md",  # procedural: how to handle situations
    }
    state = {}
    for key, filename in state_files.items():
        filepath = STATE_DIR / filename
        state[key] = filepath.read_text() if filepath.exists() else ""
    return state

def build_system_prompt_with_state(base_prompt: str) -> str:
    """Inject agent state into the system prompt."""
    state = read_agent_state()
    sections = []

    if state["memory"]:
        sections.append(f"## Your Memory\n{state['memory']}")
    if state["active"]:
        sections.append(f"## Active Tasks\n{state['active']}")
    if state["lessons"]:
        sections.append(f"## Lessons Learned\n{state['lessons']}")

    context = "\n\n".join(sections)
    return f"{base_prompt}\n\n---\n\n{context}"
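The write side of the pattern is symmetric: when the agent learns something, it appends to the relevant file. A minimal sketch, assuming the same `STATE_DIR` as above; the timestamped-heading format is an assumption, so adapt it to however your state files are structured.

```python
from pathlib import Path
from datetime import datetime

STATE_DIR = Path("/home/agent/state")  # assumed location, matching the reader above

def update_agent_state(filename: str, entry: str, state_dir: Path = STATE_DIR) -> None:
    """Append a timestamped entry to a state file, creating it if needed."""
    state_dir.mkdir(parents=True, exist_ok=True)
    filepath = state_dir / filename
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with open(filepath, "a") as f:
        f.write(f"\n## {stamp}\n\n{entry}\n")
```

Appending rather than rewriting keeps the file's history intact, which pairs well with the git-diffing workflow described below.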

Why this works better than it sounds:

Human-readable. You can open these files and see exactly what the agent knows and is doing. Debugging is immediate compared to trying to interpret vector database embeddings or opaque JSON blobs.

Version-controllable. Files can be committed to git, giving you a complete history of how the agent's state evolved. When something goes wrong, you can diff the state files to see exactly what changed.

Zero infrastructure. No vector database to provision, no embedding model to pay for, no search index to maintain.

The limitation: file-based state does not scale to large amounts of information. If the agent has thousands of past interactions, loading all of them into every context window is impractical. File-based state works well for structured, curated information -- key decisions, active tasks, critical rules. It does not work for "find me something similar to this from my past experiences."

Pattern 2: Vector Databases for Semantic Retrieval

When the agent needs to find relevant past experiences given a description of the current situation, you need semantic search: "find me the most relevant things from the past even though I cannot specify exactly what they are."

Vector databases solve this. Convert text to numerical vectors (embeddings) that capture semantic meaning, store those vectors, and retrieve the most semantically similar ones when needed.

The RAG (Retrieval Augmented Generation) pattern is the standard implementation:

  1. Ingest: Convert documents or experiences to embeddings and store them.
  2. Query: Embed the current situation and retrieve the most similar items.
  3. Augment: Inject the retrieved items into the agent's context.
  4. Generate: The model reasons with both the current query and the retrieved context.

Here is a practical implementation using pgvector, which runs inside Postgres and avoids the operational complexity of a separate vector database service:

import os

import numpy as np
import openai
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect(os.environ["DATABASE_URL"])
register_vector(conn)
embedding_client = openai.OpenAI()

def get_embedding(text: str) -> list[float]:
    response = embedding_client.embeddings.create(
        model="text-embedding-3-small",
        input=text.replace("\n", " ")
    )
    return response.data[0].embedding

def store_memory(content: str, memory_type: str) -> None:
    # np.array lets the registered pgvector adapter serialize the embedding
    embedding = np.array(get_embedding(content))
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO agent_memories (content, memory_type, embedding, created_at) "
            "VALUES (%s, %s, %s, NOW())",
            (content, memory_type, embedding)
        )
    conn.commit()

def retrieve_relevant_memories(query: str, top_k: int = 5) -> list[dict]:
    query_embedding = np.array(get_embedding(query))
    with conn.cursor() as cur:
        cur.execute(
            """SELECT content, memory_type,
                      1 - (embedding <=> %s::vector) as similarity
               FROM agent_memories
               WHERE 1 - (embedding <=> %s::vector) > 0.7
               ORDER BY similarity DESC
               LIMIT %s""",
            [query_embedding, query_embedding, top_k]
        )
        rows = cur.fetchall()
    return [{"content": r[0], "memory_type": r[1], "similarity": float(r[2])} for r in rows]

And the schema to set it up:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
    id          SERIAL PRIMARY KEY,
    content     TEXT NOT NULL,
    memory_type TEXT NOT NULL,
    embedding   VECTOR(1536),
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON agent_memories
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

The important limitation: vector databases are lossy by design. Semantic retrieval finds "similar," not "exact." If you need to retrieve a specific fact -- "what API key did the agent store last session?" -- vector search is the wrong tool. Use a structured database for deterministic lookups.

Pattern 3: Structured Databases for Relational State

Some agent memory is relational: structured records with specific fields, queryable with precise filters. User preferences, task status, configuration values -- these need a structured database.

The text-to-SQL pattern lets agents query structured data naturally:

import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

DATABASE_SCHEMA = """
Tables:
- tasks(id, title, status, created_at, priority)
- user_preferences(key, value, updated_at)
- action_log(id, action_type, tool_name, params, result, created_at)
"""

def query_database(natural_language_query: str) -> list[dict]:
    response = client.messages.create(
        model="claude-sonnet-4.6",
        max_tokens=512,
        system=f"""Convert natural language to PostgreSQL SELECT queries.
        Return ONLY the SQL. Never generate INSERT, UPDATE, DELETE, or DROP.
        Schema: {DATABASE_SCHEMA}""",
        messages=[{"role": "user", "content": natural_language_query}]
    )

    sql = response.content[0].text.strip()

    # Enforce read-only at the code level, not just in the prompt
    if not sql.upper().strip().startswith("SELECT"):
        raise ValueError(f"Non-SELECT query generated: {sql}")

    with conn.cursor() as cur:
        cur.execute(sql)
        columns = [desc[0] for desc in cur.description]
        rows = cur.fetchall()

    return [dict(zip(columns, row)) for row in rows]

Note that the SELECT-only check is enforced in code, not just in the prompt. The model generating SQL is given strict instructions to produce only SELECT statements, and this is verified programmatically. Even with read-only queries, your database user should have only SELECT permissions on the tables the agent can access. Principle of least privilege applies at the database layer.
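A bare prefix check can be hardened further. Here is one way to do it as a hedged sketch; the function name, the keyword list, and the rejection rules are assumptions, and a conservative check like this will occasionally reject legitimate queries (e.g. a forbidden keyword inside a string literal), which is usually the right trade-off for an agent.

```python
import re

# Word-boundary match so column names like created_at pass,
# but bare DML/DDL keywords do not.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|GRANT|CREATE)\b",
    re.IGNORECASE,
)

def validate_readonly_sql(sql: str) -> str:
    """Reject anything that is not a single plain SELECT statement."""
    cleaned = sql.strip().rstrip(";")
    if ";" in cleaned:
        raise ValueError("Multiple statements are not allowed")
    if not cleaned.upper().startswith("SELECT"):
        raise ValueError("Only SELECT statements are allowed")
    if FORBIDDEN.search(cleaned):
        raise ValueError("Query contains a forbidden keyword")
    return cleaned
```

Even with this, the database-level SELECT-only grant described above remains the real boundary; the code check just fails fast with a clearer error.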

Pattern 4: The Hybrid Architecture

Production agents rarely need just one memory type. They need structured state, semantic retrieval, and exact-match lookups working together.

Think about memory in terms of temperature:

Hot memory is in the context window right now. Immediately available, limited by context size, costs tokens. Put here: current task state, recent conversation history, immediate tool results.

Warm memory is quickly retrievable -- well-indexed files, cached queries. Requires a retrieval step (milliseconds) but holds more than the context window. Put here: state files, frequently accessed facts, recent episodic memory.

Cold memory is stored but not cached -- full action logs, archived knowledge. Retrieval requires database query or embedding lookup. Put here: long-term episodic memory, comprehensive knowledge bases.

class HybridMemoryManager:
    def __init__(self, state_dir: str, db_url: str):
        self.state_dir = Path(state_dir)
        self.conn = psycopg2.connect(db_url)
        register_vector(self.conn)

    def load_hot_context(self) -> str:
        """Load warm memory into context for the current session."""
        sections = []
        for filename in ["MEMORY.md", "ACTIVE.md", "LESSONS.md"]:
            filepath = self.state_dir / filename
            if filepath.exists():
                content = filepath.read_text()
                if content.strip():
                    sections.append(f"## {filepath.stem}\n{content}")
        return "\n\n".join(sections)

    def retrieve_relevant_context(self, query: str) -> str:
        """Pull relevant cold memory into context."""
        memories = retrieve_relevant_memories(query, top_k=3)
        if not memories:
            return ""
        formatted = "\n\n".join([
            f"[Relevance: {m['similarity']:.0%}]\n{m['content']}"
            for m in memories
        ])
        return f"## Relevant Past Experiences\n{formatted}"

    def store_episode(self, summary: str, outcome: str) -> None:
        """Store a completed episode in cold memory."""
        store_memory(f"Task: {summary}\nOutcome: {outcome}", "episodic")

The Lessons Pattern: Memory That Prevents Repeated Mistakes

The most valuable memory pattern in production agents is also the simplest: after every failure, write down what went wrong and the rule that prevents it from happening again.

This is the lessons pattern. An agent that remembers its mistakes and applies corrective rules consistently outperforms an agent with a better model but no failure memory. Models are expensive to upgrade. Rules are free to write.

The pattern has two parts: capture and replay.

Capture happens when something goes wrong:

## 2026-03-15: Date range query returned no results for valid data

What happened: Queried the users table for records created "this week."
Got zero results. Spent three tool calls investigating before discovering
that "this week" was interpreted as starting Monday, but the database
stores timestamps in UTC and the query ran at 11pm on Sunday local time.

Rule: Always use explicit UTC timestamps, not relative date expressions.
Convert user-supplied dates to UTC before building queries.

Replay happens at session start. The lessons file is loaded into context, and the model reads the rules before starting work. The rules become part of the active context, not just historical documentation.

def automated_lesson_capture(error_description: str, context: dict) -> str:
    """Use the model to extract a reusable rule from an error."""
    response = client.messages.create(
        model="claude-sonnet-4.6",
        max_tokens=512,
        system="""Extract generalizable rules from specific errors.
        Given an error and context, produce:
        TITLE: [class of mistake]
        WHAT: [what happened, 2-3 sentences]
        RULE: [rule that prevents this in the future]""",
        messages=[{"role": "user", "content": f"Error: {error_description}\nContext: {context}"}]
    )

    timestamp = datetime.now().strftime("%Y-%m-%d")
    lesson = f"\n## {timestamp}\n\n{response.content[0].text}\n\n---\n"

    lessons_file = Path("state/LESSONS.md")
    with open(lessons_file, "a") as f:
        f.write(lesson)

    # Also store in vector DB for semantic retrieval
    store_memory(lesson, "procedural")
    return lesson

The lessons pattern is what separates agents that get better over time from agents that repeat the same mistakes indefinitely. The implementation overhead is low: one file, one write on failure, one read at session start. The benefit compounds with every mistake the agent learns from.

Choosing the Right Pattern

Three questions drive the decision:

How structured is the information? Structured data (key-value pairs, task status, preferences) belongs in a relational database. Unstructured data (conversations, learned facts, experience summaries) belongs in a vector store or file system.

How often is it accessed? Information accessed every session belongs in warm memory (files). Information accessed occasionally based on relevance belongs in cold memory (vector database). Information needed for the current operation belongs in the context window.

Does it need to be exact or approximate? Exact lookups (get the API key, get task #42) go to a relational database. Approximate retrieval (find experiences similar to this situation) goes to a vector database.

Start with file-based state. It handles most memory needs with minimal infrastructure. Add vector search when accumulated history makes loading all of it into context impractical. Add a structured database when you have relational data agents need to query precisely. This progression matches the actual complexity of most agents and avoids over-engineering before you know what you need.
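The three questions can be encoded as a simple decision heuristic. This is one possible encoding, not a definitive rule; the function name and the priority ordering are assumptions.

```python
def choose_memory_store(structured: bool, every_session: bool, exact: bool) -> str:
    """Map the three questions above to a storage choice (illustrative heuristic)."""
    if structured or exact:
        return "relational database"   # precise filters, deterministic lookups
    if every_session:
        return "state files"           # warm memory, loaded at session start
    return "vector store"              # cold memory, retrieved by relevance
```

The ordering matters: exactness and structure trump access frequency, because an approximate answer to an exact question is a bug, while a slow answer is merely a cost.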

Memory is not optional infrastructure. It is one of an agent's core components. An agent without memory is a stateless function, not an agent.


This post is adapted from Production AI Agents: Build, Deploy, and Monetize Autonomous Systems, available on Amazon Kindle. The book goes deeper with 12 chapters of real code, battle-tested patterns, and a complete hands-on tutorial.

I build production AI systems. More at astraedus.dev.
