DEV Community

丁久

Posted on • Originally published at dingjiu1989-hue.github.io

AI Agents Memory Patterns: Working, Episodic, Semantic, and Reflective Memory

This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.

Why Memory Matters for AI Agents

An AI agent without memory is like a developer who forgets everything after each function call — you can only work with what's in front of you. Memory is what turns a stateless LLM call into a coherent agent that learns from past interactions, maintains context across sessions, and builds up knowledge over time. In 2026, several memory patterns have proven themselves in production. Here's what works.

Memory Hierarchy

| Memory Type | Scope | Duration | Storage | Retrieval | Example |
| --- | --- | --- | --- | --- | --- |
| Working Memory | Single conversation | Current session | Context window (prompt) | Direct inclusion | Recent messages in a chat |
| Episodic Memory | User/agent history | Days to months | Vector DB + metadata | Semantic search + recency | Past conversations, decisions made |
| Semantic Memory | Facts, knowledge | Persistent | Vector DB / graph DB / document store | Semantic search + structured queries | User preferences, learned procedures |
| Procedural Memory | How to do things | Persistent | Code / workflows / prompts | Routed by task type | Agent tool definitions, SOPs |
| Reflective Memory | Meta-cognition | Persistent | Summarized insights | Triggered by patterns | "User prefers concise answers on weekdays" |

Pattern 1: Summarization + Sliding Window (Basic)

The simplest pattern that works. Keep the last N messages (sliding window) plus a running summary of everything before that. When the conversation exceeds context limits, summarize the oldest messages and prepend the summary to the context. Implementation: after every K messages, call the LLM to update the summary: "Here's the previous summary and new messages. Produce an updated summary that captures key decisions, facts, and context." This pattern alone handles 80% of agent memory needs. Tools like MemGPT (now Letta) use this pattern with automatic context management.
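A minimal sketch of this pattern, assuming a `call_llm(prompt) -> str` client function you supply (hypothetical name; swap in your actual LLM client). The window and refresh sizes are illustrative:

```python
from collections import deque

WINDOW = 10   # keep the last N messages verbatim
EVERY_K = 5   # refresh the summary after every K evicted messages

class SummaryWindowMemory:
    def __init__(self, call_llm):
        self.call_llm = call_llm           # hypothetical LLM client: str -> str
        self.window = deque(maxlen=WINDOW)
        self.summary = ""
        self._evicted = []                 # messages that fell out of the window

    def add(self, role, content):
        if len(self.window) == self.window.maxlen:
            self._evicted.append(self.window[0])  # about to be pushed out
        self.window.append((role, content))
        if len(self._evicted) >= EVERY_K:
            self._refresh_summary()

    def _refresh_summary(self):
        old = "\n".join(f"{r}: {c}" for r, c in self._evicted)
        self.summary = self.call_llm(
            f"Previous summary:\n{self.summary}\n\nNew messages:\n{old}\n\n"
            "Produce an updated summary that captures key decisions, facts, and context."
        )
        self._evicted = []

    def build_context(self):
        # Prompt = running summary + the verbatim sliding window.
        parts = []
        if self.summary:
            parts.append(f"[Conversation summary]\n{self.summary}")
        parts.extend(f"{r}: {c}" for r, c in self.window)
        return "\n".join(parts)
```

The summary refresh happens lazily on `add`, so a single LLM call amortizes over K evicted messages rather than firing on every turn.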

Pattern 2: Vector-Backed Episodic Memory (Intermediate)

Store every significant interaction as an "episode" in a vector database. Each episode: the user query, the agent's response/action, the outcome, relevant metadata (timestamp, topic tags, sentiment). On each new interaction: embed the user's query, retrieve top-K related past episodes, and include them as context. This gives the agent a form of "recollection" — it can reference past interactions that are semantically similar. Key implementation detail: include a recency boost (multiply similarity score by a time decay factor) so recent interactions are weighted higher.
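A sketch of the store-and-recall loop with the recency boost applied as an exponential time decay. The `embed` function is a stand-in for your embedding model, and the in-memory list stands in for a real vector DB; the seven-day half-life is an assumption to tune:

```python
import math
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class EpisodicMemory:
    HALF_LIFE = 7 * 24 * 3600  # similarity weight halves every 7 days (illustrative)

    def __init__(self, embed):
        self.embed = embed       # hypothetical: text -> list[float]
        self.episodes = []       # (vector, record) pairs; a vector DB in production

    def store(self, query, response, outcome, tags=(), ts=None):
        record = {"query": query, "response": response, "outcome": outcome,
                  "tags": list(tags), "ts": ts if ts is not None else time.time()}
        self.episodes.append((self.embed(query), record))

    def recall(self, query, k=3, now=None):
        now = now if now is not None else time.time()
        qv = self.embed(query)
        scored = []
        for vec, rec in self.episodes:
            decay = 0.5 ** ((now - rec["ts"]) / self.HALF_LIFE)
            scored.append((cosine(qv, vec) * decay, rec))
        scored.sort(key=lambda s: s[0], reverse=True)
        return [rec for _, rec in scored[:k]]
```

Multiplying similarity by the decay factor (rather than adding) keeps irrelevant-but-recent episodes from outranking relevant-but-older ones.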

Pattern 3: Structured Knowledge Graph (Advanced)

For agents that need to track entities and relationships: extract structured facts from conversations and store them in a graph or relational database. "User X prefers Python for data processing tasks" → (User:X)-[PREFERS]->(Language:Python, Context:"data processing"). On each interaction: retrieve relevant facts by entity matching, use them to personalize the response. This is more complex to implement but gives precise, queryable memory. Tools like LangGraph and Neo4j's LLM Knowledge Graph Builder automate much of the extraction.
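As a toy illustration of the triple structure (a real deployment would use Neo4j or a relational schema, not an in-memory set), the "User X prefers Python" fact above might be stored and queried like this:

```python
from collections import defaultdict

class FactStore:
    """Minimal (subject, predicate, object, context) triple store."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)  # entity-matching index

    def add(self, subject, predicate, obj, context=""):
        t = (subject, predicate, obj, context)
        self.triples.add(t)
        self.by_subject[subject].add(t)

    def facts_about(self, subject, predicate=None):
        # Retrieve facts by entity, optionally filtered by relationship type.
        return [t for t in self.by_subject[subject]
                if predicate is None or t[1] == predicate]

store = FactStore()
store.add("User:X", "PREFERS", "Language:Python", "data processing")
```

Because facts are structured rather than embedded, queries like "everything we know about User:X" are exact lookups instead of fuzzy similarity searches.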

Pattern 4: Reflection and Self-Improvement

Periodically (every N interactions, or triggered by low-quality responses), the agent reflects: "Review the last 10 interactions. What patterns do you notice? What could you do better? What user preferences have emerged?" The reflections are stored as compressed insights and included in future contexts. This is the pattern used by agents that improve over time — they "learn" that certain approaches work better for certain users or tasks. Implementation: a cron-style reflection job that runs asynchronously, so it never blocks the user interaction.

Production Considerations

| Concern | Approach |
| --- | --- |
| Memory bloat (too many stored episodes degrade retrieval) | Prune old/low-importance memories. Score each memory by recency × relevance × importance; delete below a threshold. |
| Privacy / sensitive data | Filter PII before storing. Let users view and delete their memory. Implement memory expiration policies. |
| Cost (embedding and storing every interaction) | Batch embedding. Store only "significant" interactions (decisions made, preferences stated, errors encountered); skip routine exchanges. |
| Hallucinated memories (agent "remembers" something incorrectly) | Store the original interaction alongside the summarized memory. Periodically audit memory accuracy with spot checks. |
| Latency (retrieval takes time) | Cache recent memories in-process. Retrieve non-critical context asynchronously. Two-stage retrieval: fast vector search, then rerank. |
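The recency × relevance × importance pruning rule can be sketched in a few lines. The half-life and threshold values below are illustrative assumptions, not a standard:

```python
import time

def memory_score(mem, query_relevance, now=None, half_life_days=30.0):
    """Score = recency * relevance * importance.

    `mem` is assumed to carry a Unix timestamp `ts` and an
    `importance` in [0, 1]; recency decays exponentially with age.
    """
    now = now if now is not None else time.time()
    age_days = (now - mem["ts"]) / 86400
    recency = 0.5 ** (age_days / half_life_days)
    return recency * query_relevance * mem["importance"]

def prune(memories, threshold=0.1, now=None):
    # For pruning (no active query), use a neutral relevance of 1.0
    # so only recency and importance decide what survives.
    return [m for m in memories if memory_score(m, 1.0, now=now) >= threshold]
```

The same `memory_score` function can double as the ranking key at retrieval time, with `query_relevance` set to the vector-similarity score.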

Starting point for 2026: Implement Pattern 1 (summarization + sliding window) first — it solves the immediate problem of context limits and handles most use cases. Add Pattern 2 (vector episodic memory) when users say "you don't remember our previous conversations." Add Pattern 3 (knowledge graph) when you need precise fact recall about entities.



