Kevin
How Conversation Memory Actually Works in AI Agents

Ask most people how AI assistants remember things, and you'll get vague answers about context windows and vector databases. The reality is both simpler and more nuanced than the marketing suggests.

I've been running a self-hosted AI assistant (OpenClaw) as my daily driver for several months. The memory system is one of the things I've spent the most time thinking about, because it's the difference between a useful assistant and a forgetful chatbot.

Here's how it actually works — not in theory, but in practice.

The Context Window Is Not Memory

This is the most common misconception. The context window is the model's "working memory" — everything it can see during a single conversation. For modern models, this is somewhere between 128K and 1M tokens. That's a lot of text.

But it's not persistent. When you start a new conversation, the context window is empty. The model doesn't remember yesterday's conversation, your preferences, or the decision you made last week. It's starting fresh every time.

This is why many AI products feel inconsistent. You tell them your name, your preferred coding style, your project context. The next day, they've forgotten everything.

Real memory requires something beyond the context window. It requires persistence.

Two Layers of Memory

OpenClaw's memory system uses two layers, and the distinction between them is the key design decision.

Layer 1: Working Memory (MEMORY.md)

This is a curated Markdown file that gets injected into every conversation turn. Every time the agent starts processing a message, MEMORY.md is part of its context. The agent sees it. Always.

Think of it as the agent's always-available notepad. It contains the things that should always be in the agent's awareness: your name, your role, ongoing projects, key preferences, important decisions.

The critical constraint: because it's injected every turn, it consumes tokens every turn. A large MEMORY.md eats into your context window permanently. This creates a natural pressure to keep it concise — only the most important, most frequently relevant information belongs here.
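The injection itself is mechanically simple. Here's a minimal sketch of what "injected into every conversation turn" means in practice — the function name and message shape are my own illustration, not OpenClaw's actual internals:

```python
from pathlib import Path

def build_context(workspace: Path, user_message: str) -> list[dict]:
    """Assemble one turn's context. Working memory is re-read and
    prepended every single time, so its tokens are paid every turn."""
    memory = (workspace / "MEMORY.md").read_text(encoding="utf-8")
    return [
        {"role": "system", "content": f"## Working memory\n{memory}"},
        {"role": "user", "content": user_message},
    ]
```

Because the file is read fresh each turn, edits you make to MEMORY.md take effect on the very next message — no restart, no re-indexing.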

Layer 2: Daily Memory (memory/YYYY-MM-DD.md)

These are daily log files that the agent writes to but doesn't automatically read. They're accessed on demand through the memory_search and memory_get tools.

When the agent decides something is worth remembering but not worth keeping in always-on context, it writes to the daily log. Next week, if the agent needs to recall what happened on a specific day, it searches the daily logs.

This is essentially the difference between things you always know (your name, where you live) and things you can look up (what you had for lunch last Tuesday). Both are "memory," but they have very different access patterns and costs.

Why Not Just Use a Vector Database?

The obvious question is: why not use embeddings and vector search like every other AI memory system?

Here's the thing — OpenClaw's memory system is just files. Markdown files on disk. No vector database, no embedding pipeline, no RAG system. The agent reads and writes text.

This feels almost irresponsibly simple compared to the architectures being presented at AI conferences. But it has properties that more complex systems lack:

Transparency. You can open MEMORY.md in any text editor and see exactly what the agent remembers. You can edit it. You can delete things. You can add things. Try doing that with a vector database.

Debuggability. When the agent says something that seems based on outdated information, you can grep the memory files and find the source. There's no "the embedding was close to this other embedding" mystery.

Version control. The workspace (including memory) can be a Git repo. You can track how your agent's memory evolves over time, roll back to a previous state, or diff changes.

Zero infrastructure. No database to maintain, no embedding model to run, no index to rebuild. The file system is the storage layer.

The trade-off is search quality. A file-based search with memory_search is less sophisticated than cosine similarity over dense embeddings. For a personal assistant, this turns out to be fine. The agent usually knows roughly when something happened or what topic it relates to, so keyword-based search in daily logs works well enough.
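To make the trade-off concrete, a keyword search over date-stamped files can be as little as this — a simplified stand-in for what a tool like memory_search might do, not its actual implementation:

```python
from pathlib import Path

def search_memory(workspace: Path, query: str) -> list[tuple[str, str]]:
    """Case-insensitive keyword scan over all daily logs.
    Returns (date, matching line) pairs in chronological order."""
    hits = []
    for log in sorted((workspace / "memory").glob("*.md")):
        for line in log.read_text(encoding="utf-8").splitlines():
            if query.lower() in line.lower():
                hits.append((log.stem, line))
    return hits
```

No recall ranking, no embeddings — but every hit comes back with the day it happened, which is usually the retrieval cue that matters for a personal assistant.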

For a system with millions of memory entries across thousands of users, you'd need something more sophisticated. But OpenClaw is a personal assistant, not a knowledge management platform. The simpler approach fits the use case.

Context Compression

Long conversations eventually hit the context window limit. OpenClaw handles this with compaction — essentially asking the model to summarize the conversation so far, then replacing the full history with the summary.

What gets preserved:

  • Key decisions and outcomes
  • User preferences and instructions
  • Important context that would affect future responses
  • Tool results that are still relevant

What gets dropped:

  • Verbose intermediate steps
  • Redundant explanations
  • Tool call details that are no longer relevant

You can trigger compaction manually (/compact) or let it happen automatically when the context approaches its limit.

The interesting design choice is that compaction is lossy. It's not a lossless compression of the conversation — it's an opinionated summary. Information is lost. The model decides what's important enough to keep.

This means that very long conversations gradually lose detail in their early parts. The model remembers the gist of what was discussed three hours ago, but not the exact wording. This mirrors how human memory works, and it's usually acceptable.
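The mechanism can be sketched in a few lines. This is my own simplified illustration of the compaction pattern, not OpenClaw's code; the summarize parameter stands in for a call to the underlying model:

```python
COMPACT_PROMPT = (
    "Summarize the conversation so far. Keep key decisions, user "
    "preferences, and still-relevant tool results; drop verbose "
    "intermediate steps and redundant explanations."
)

def compact(history: list[dict], summarize) -> list[dict]:
    """Replace the full message history with a lossy, opinionated summary.

    `summarize` is any callable mapping a prompt string to a summary
    string (in practice, another model call). The return value becomes
    the new, much shorter history."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    summary = summarize(f"{COMPACT_PROMPT}\n\n{transcript}")
    return [{"role": "system", "content": f"## Conversation summary\n{summary}"}]
```

Note what the signature makes explicit: the original history is discarded, and the model's judgment about what mattered is all that survives.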

Session Lifecycle

Memory isn't just about what gets remembered. It's about when conversations start and end.

OpenClaw supports several session lifecycle models:

  • Daily reset: Sessions expire at 4 AM by default. Each morning, you start fresh (but MEMORY.md persists).
  • Idle timeout: Sessions can expire after a period of inactivity.
  • Manual reset: Send /new to explicitly start a fresh conversation.
  • Persistent: Sessions never expire unless manually reset.

The daily reset is the default, and I think it's well-chosen. It prevents context from accumulating indefinitely (which would trigger constant compaction and degrade response quality) while maintaining day-to-day continuity through the persistent MEMORY.md.
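The daily-reset policy reduces to a small boundary check. A sketch under the assumptions above (4 AM default, session expires at the first reset boundary after it started); the function is illustrative, not OpenClaw's actual scheduler:

```python
from datetime import datetime, time, timedelta

def session_expired(started: datetime, now: datetime,
                    reset_at: time = time(4, 0)) -> bool:
    """Daily-reset policy: the session expires once the clock passes
    the next reset boundary (4 AM by default) after it started."""
    boundary = datetime.combine(started.date(), reset_at)
    if started.time() >= reset_at:
        # Session began after today's boundary, so it survives
        # until tomorrow's.
        boundary += timedelta(days=1)
    return now >= boundary
```

The 4 AM choice (rather than midnight) means a late-night session isn't cut off mid-conversation at 12:00 — a small detail that matters for a daily-driver assistant.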

The Memory Curation Problem

Here's something the documentation doesn't emphasize enough: the quality of your agent's memory depends on curation.

A MEMORY.md that grows unchecked becomes a dumping ground of outdated preferences, abandoned project notes, and contradictory instructions. The agent's behavior becomes inconsistent because its context is noisy.

The best approach I've found is treating MEMORY.md like you would a personal wiki: periodically review it, remove outdated entries, consolidate related items, and keep it focused on what's currently relevant.

This is manual work, and it's one of the hidden costs of running a persistent AI assistant. The agent can help with curation (you can ask it to review and clean up its own memory), but the judgment calls are yours.

What Good Memory Looks Like

After months of use, here's what a well-maintained memory system looks like:

  • MEMORY.md is 2-3 pages of concise, current information
  • Daily logs capture decisions, task outcomes, and temporary context
  • The agent can recall conversations from weeks ago by searching daily logs
  • The agent's behavior is consistent because its context is clean
  • You can audit what the agent knows by reading the files

It's not magic. It's not a neural network storing experiences in latent space. It's organized text that gets read every turn.

And that's the point. The best memory system isn't the most technically sophisticated one. It's the one you can understand, control, and maintain.


Full documentation: OpenClaw Docs

GitHub: openclaw/openclaw


This is Part 5 of a series on AI agent infrastructure. Follow for more.
