In Part 1, I mentioned there's one memory type that LLMs literally cannot implement — not with clever prompting, not with RAG, not with any current technique. Today we find out which one, and why it's the exact type multi-agent systems need most.
But first, let's build the vocabulary.
Why Taxonomy Matters
When someone says "my AI agent needs memory," they could mean completely different things:
- "I need it to remember what we discussed 5 minutes ago" → Short-term
- "I need it to remember the user's name across sessions" → Long-term semantic
- "I need it to remember that it tried approach X and it failed" → Episodic
- "I need it to get better at a specific task over time" → Procedural
Building the wrong type of memory for your use case wastes effort and creates frustrating user experiences.
Let's break down each type.
2.1 Short-Term Memory (Working Memory)
What it is: Information the system is actively processing right now.
Human equivalent: The thoughts you're holding in your head while solving a problem. Limited, temporary, high-attention.
In LLMs: This is the context window. Everything the model can see in a single API call.
┌─────────────────────────────────────────┐
│ Context Window │
│ │
│ [System Prompt] │
│ [Previous messages in this session] │
│ [Retrieved memories] │
│ [Current user message] │
│ │
│ ← Everything here is "working memory" │
└─────────────────────────────────────────┘
Characteristics:
| Property | Description |
|---|---|
| Duration | Single session or API call |
| Capacity | Limited by context window (128K-200K tokens typically) |
| Access | Immediate — model sees everything at once |
| Cost | Expensive — every token is processed |
When it's sufficient:
- Single-turn Q&A
- Short conversations (< 50 messages)
- Tasks that don't require historical context
When it breaks down:
- Long conversations (token limit)
- Multi-session continuity (context clears between sessions)
- Cost-sensitive applications at scale
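The "long conversations" failure mode above can be sketched in a few lines. This is a minimal, hypothetical trimming strategy: drop the oldest messages once a token budget is exceeded, always keeping the system prompt. Token counts are approximated as characters divided by four; a real system would use the model's actual tokenizer.

```python
# Minimal working-memory management: keep the newest messages that fit
# a token budget. chars/4 is a rough heuristic, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # ~4 chars per token, on average

def trim_to_budget(system_prompt: str, messages: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the most recent messages that fit."""
    remaining = budget - estimate_tokens(system_prompt)
    kept = []
    for msg in reversed(messages):      # walk newest-first
        cost = estimate_tokens(msg)
        if cost > remaining:
            break                       # oldest messages fall off here
        kept.append(msg)
        remaining -= cost
    return kept[::-1]                   # restore chronological order

history = ["msg one", "msg two", "a much longer third message " * 10]
window = trim_to_budget("You are helpful.", history, budget=75)
# "msg one" is dropped: the oldest message no longer fits the budget
```

Note what trimming costs you: anything dropped is gone for good unless it was also written to long-term memory, which is exactly the handoff the next section covers.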
2.2 Long-Term Memory (Persistent Storage)
What it is: Information that persists beyond a single session, stored externally and retrieved when needed.
Human equivalent: Everything you "know" that isn't currently in your active thoughts. Vast, durable, requires recall.
In AI systems: External databases (PostgreSQL, vector stores, etc.) that store information and inject relevant pieces into context when needed.
┌──────────────────┐ ┌──────────────────┐
│ Session 1 │ │ Session 2 │
│ │ │ │
│ "My name is │ │ "What's my │
│ John" │ │ name?" │
│ │ │ │
└────────┬─────────┘ └────────┬─────────┘
│ │
│ Extract & Store │ Retrieve
▼ ▼
┌─────────────────────────────────────────────────┐
│ Long-Term Memory Store │
│ │
│ { "user_name": "John", stored: "2025-12-01" } │
│ │
└─────────────────────────────────────────────────┘
Characteristics:
| Property | Description |
|---|---|
| Duration | Permanent (until explicitly deleted) |
| Capacity | Virtually unlimited |
| Access | Requires retrieval step (adds latency) |
| Cost | Storage cost + retrieval cost (much cheaper than context) |
The tradeoff: You can store unlimited information, but you can only retrieve and inject a subset into the context window. Retrieval quality becomes critical.
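The extract-store-retrieve loop in the diagram can be sketched with a toy store. This is deliberately naive: retrieval here is word-overlap matching against keys, where a real system would use PostgreSQL or a vector store with embedding search. The class and method names are illustrative, not from any particular library.

```python
# A toy long-term memory store showing the shape of the loop:
# session 1 extracts & stores a fact, session 2 retrieves it.
import re

class LongTermMemory:
    def __init__(self):
        self.facts: dict[str, str] = {}

    def store(self, key: str, value: str) -> None:
        self.facts[key] = value  # overwrite: last write wins

    def retrieve(self, query: str, limit: int = 3) -> list[str]:
        """Return facts whose key words overlap the query words."""
        words = set(re.findall(r"\w+", query.lower()))
        hits = [v for k, v in self.facts.items()
                if words & set(k.lower().split("_"))]
        return hits[:limit]  # only a subset ever reaches the context window

memory = LongTermMemory()
memory.store("user_name", "John")             # session 1: extract & store
context = memory.retrieve("what's my name?")  # session 2: retrieve
```

Even in this toy, the tradeoff is visible: `retrieve` caps what gets injected, so the quality of the matching logic decides what the model ever sees.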
2.3 Episodic Memory (Specific Events)
What it is: Memory of specific events, conversations, or experiences — tied to a particular time and context.
Human equivalent: "Remember that meeting last Tuesday where John got angry about the deadline?"
In AI systems: Stored conversation logs, interaction records, or event traces with temporal metadata.
# Episodic memory record
{
"type": "episodic",
"timestamp": "2025-12-10T14:30:00Z",
"session_id": "abc123",
"event": "User discussed funding strategy",
"context": "User was exploring pre-seed vs seed options",
"participants": ["user", "assistant"],
"outcome": "Decided to target pre-seed first"
}
Key properties:
- Time-stamped — When did this happen?
- Contextual — What was the surrounding situation?
- Specific — Not abstracted into general facts
- Queryable by time — "What did we discuss last week?"
Use cases:
- "Continue where we left off"
- "What did I say about X last time?"
- Audit trails and compliance
- Debugging agent behavior ("Why did it do that?")
Retrieval challenge: Episodic memories are often verbose. You can't inject entire conversation transcripts. Systems typically:
- Store full episodes
- Generate summaries for retrieval
- Fetch full detail only when specifically needed
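The store-full / retrieve-summaries / fetch-detail pattern above might look like the sketch below. The summaries here are hand-written; in practice an LLM would generate them when the episode is stored. All class and field names are illustrative.

```python
# Episodic store: cheap summaries for routine recall,
# full transcripts fetched only on demand.
from dataclasses import dataclass, field

@dataclass
class Episode:
    session_id: str
    timestamp: str
    summary: str                # short, cheap to inject into context
    transcript: list[str] = field(default_factory=list)  # full detail

class EpisodicStore:
    def __init__(self):
        self.episodes: list[Episode] = []

    def add(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def recall_summaries(self, since: str) -> list[str]:
        """Query by time: only summaries go back to the agent."""
        return [e.summary for e in self.episodes if e.timestamp >= since]

    def fetch_detail(self, session_id: str) -> list[str]:
        """Full transcript, fetched only when specifically needed."""
        for e in self.episodes:
            if e.session_id == session_id:
                return e.transcript
        return []

store = EpisodicStore()
store.add(Episode("abc123", "2025-12-10", "Discussed funding strategy",
                  ["User: pre-seed or seed?", "Assistant: depends on traction"]))
recent = store.recall_summaries(since="2025-12-01")
```

The timestamp field is what makes "What did we discuss last week?" a cheap query instead of a full-transcript scan.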
2.4 Semantic Memory (Facts & Knowledge)
What it is: Abstracted facts, knowledge, and information — divorced from the specific episode where they were learned.
Human equivalent: "Paris is the capital of France." You know this, but you probably don't remember when you learned it.
In AI systems: Extracted facts, user profiles, preferences, and knowledge bases.
# Semantic memory records
{
"type": "semantic",
"fact": "User's name is John",
"confidence": 1.0,
"source": "explicit_statement",
"first_seen": "2025-11-01",
"last_confirmed": "2025-12-15"
}
{
"type": "semantic",
"fact": "User prefers concise responses without bullet points",
"confidence": 0.8,
"source": "inferred_from_feedback",
"first_seen": "2025-11-15"
}
Key properties:
- Abstracted — General facts, not tied to specific moments
- Updateable — Facts can change; need conflict resolution
- Confidence-scored — Some facts are more certain than others
- Categorizable — Can be organized (preferences, demographics, knowledge)
The extraction problem:
Converting episodic to semantic memory requires judgment:
Episode: "User said 'I just started at Microsoft last week,
really excited about the AI team there'"
Extracted semantic memories:
- User works at Microsoft (high confidence)
- User is on the AI team (high confidence)
- User started recently (medium confidence — "last week" will become stale)
- User is excited about their job (medium confidence — emotional state may change)
What to extract? At what confidence? This is where LLM-based extraction comes in — and where systems differ in sophistication.
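Once facts are extracted, they still have to be merged into the store, and that's where the "updateable" property bites. Here is one possible conflict-resolution policy, sketched with the same field names as the records above: re-confirming a fact bumps `last_confirmed`, while a contradicting fact only supersedes the old one if its confidence is at least as high. This is one policy among many, not a standard.

```python
# Upserting semantic facts with simple conflict resolution.
def upsert_fact(store: dict, key: str, fact: str,
                confidence: float, date: str) -> None:
    existing = store.get(key)
    if existing is None:
        store[key] = {"fact": fact, "confidence": confidence,
                      "first_seen": date, "last_confirmed": date}
    elif existing["fact"] == fact:
        # Same fact observed again: re-confirm, keep first_seen
        existing["last_confirmed"] = date
        existing["confidence"] = max(existing["confidence"], confidence)
    elif confidence >= existing["confidence"]:
        # Contradiction with equal-or-higher confidence: supersede
        store[key] = {"fact": fact, "confidence": confidence,
                      "first_seen": date, "last_confirmed": date}
    # else: keep the higher-confidence existing fact

profile = {}
upsert_fact(profile, "employer", "User works at Microsoft", 0.9, "2025-11-01")
upsert_fact(profile, "employer", "User works at Microsoft", 0.9, "2025-12-15")
upsert_fact(profile, "employer", "User works at Google", 0.5, "2025-12-20")
# Low-confidence contradiction is rejected; Microsoft fact survives
```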
2.5 Procedural Memory (How to Do Things)
What it is: Memory of how to perform tasks, skills, and behaviors.
Human equivalent: How to ride a bike, how to write code, how to negotiate. You don't consciously recall steps — you just do it.
In AI systems: This is the trickiest category. Current implementations include:
| Approach | How It Works |
|---|---|
| Fine-tuning | Bake procedures into model weights |
| Few-shot examples | Store examples of correct behavior, inject when relevant |
| Tool configurations | Remember how to use specific tools/APIs |
| Workflow templates | Store successful action sequences |
# Procedural memory example
{
"type": "procedural",
"skill": "deploy_langraph_agent",
"steps": [
"Validate entry point exists",
"Check dependencies in requirements.txt",
"Build Docker container",
"Push to registry",
"Update deployment config"
],
"learned_from": ["session_xyz", "session_abc"],
"success_rate": 0.94
}
Why it's hard for LLMs:
LLMs don't truly "learn" procedures from memory injection — they follow instructions. Procedural memory in current systems is really:
- Storing successful patterns
- Retrieving relevant patterns
- Injecting them as instructions/examples
True procedural learning would require weight updates (fine-tuning), which can't happen at runtime.
Where this matters:
- Agents that repeat similar tasks should get better over time
- Learning user-specific workflows ("When I say 'deploy', run these 5 steps")
- Tool use patterns ("This API needs auth header X, learned from past failures")
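The store-retrieve-inject pattern described above can be sketched as prompt assembly: stored workflows are matched by a trigger word and prepended to the prompt as instructions, since the model's weights can't be updated at runtime. The trigger matching and record shape are illustrative; a real system would use semantic matching rather than substring checks.

```python
# Procedural "memory" as pattern retrieval + prompt injection.
workflows = {
    "deploy": {
        "steps": ["Validate entry point exists",
                  "Check dependencies in requirements.txt",
                  "Build Docker container",
                  "Push to registry",
                  "Update deployment config"],
        "success_rate": 0.94,
    },
}

def inject_procedure(user_message: str) -> str:
    """Prepend a matching workflow's steps to the prompt as instructions."""
    for trigger, wf in workflows.items():
        if trigger in user_message.lower():
            steps = "\n".join(f"{i + 1}. {s}"
                              for i, s in enumerate(wf["steps"]))
            return (f"When the user asks to {trigger}, follow these steps:\n"
                    f"{steps}\n\nUser: {user_message}")
    return f"User: {user_message}"  # no matching procedure: plain prompt

prompt = inject_procedure("Please deploy the agent")
```

This is why the section calls it "not truly learning": the procedure lives in the prompt, not the model, and vanishes the moment it isn't injected.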
2.6 The Memory Hierarchy in Cognitive Architectures
Now let's see how these types compose in a complete system:
┌─────────────────────────────────────────────────────────────────┐
│ Agent Runtime │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Working Memory │ │
│ │ (Context Window) │ │
│ │ │ │
│ │ • Current conversation │ │
│ │ • Retrieved long-term memories │ │
│ │ • Active task state │ │
│ └────────────────────────────────────────────────────────────┘ │
│ ↑↓ │
│ ┌────────────────────┴───────────────────┐ │
│ │ Retrieval Layer │ │
│ │ (Semantic search, filtering, ranking)│ │
│ └────────────────────┬───────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Long-Term Memory │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Episodic │ │ Semantic │ │ Procedural │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • Sessions │ │ • Facts │ │ • Skills │ │ │
│ │ │ • Events │ │ • Prefs │ │ • Patterns │ │ │
│ │ │ • Logs │ │ • Knowledge │ │ • Workflows │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
The flow:
- Input arrives → Goes into working memory
- Retrieval triggers → Query long-term stores for relevant memories
- Context assembly → Combine current input + retrieved memories
- Processing → LLM generates response using full context
- Storage → Extract new memories from interaction, persist to long-term
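The five steps above, condensed into one hypothetical turn handler. The retrieval, generation, and extraction stages are stand-ins (substring lookup, a pluggable `llm` callable, and a trivial "my name is X" rule) for a real retrieval layer, LLM call, and extraction pipeline.

```python
# One turn through the memory hierarchy, step by step.
def handle_turn(user_input: str, long_term: dict, llm) -> str:
    # 1. Input arrives -> working memory
    working_memory = [user_input]
    # 2. Retrieval triggers -> query long-term store for relevant facts
    retrieved = [v for k, v in long_term.items() if k in user_input.lower()]
    # 3. Context assembly -> retrieved memories + current input
    context = retrieved + working_memory
    # 4. Processing -> the model sees the assembled context
    response = llm(context)
    # 5. Storage -> extract new memories from the interaction
    words = user_input.lower().split()
    if "name" in words and "is" in words:
        long_term["name"] = user_input.split()[-1]
    return response

store = {}
handle_turn("My name is John", store, llm=lambda ctx: "Nice to meet you!")
reply = handle_turn("Do you know my name?", store,
                    llm=lambda ctx: f"Context seen: {ctx}")
# Second turn retrieves "John" even though turn 1's context is gone
```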
Different systems prioritize different memory types:
| System | Primary Focus |
|---|---|
| ChatGPT Memory | Semantic (user facts/preferences) |
| Claude Memory | Semantic + light Episodic (cross-conversation recall) |
| LangGraph Checkpointers | Episodic (workflow state) |
| mem0 | Semantic (extracted facts) |
| Aegis Memory | Semantic + Episodic + Scoped (multi-agent aware) |
Memory Type Decision Framework
When designing memory for your application, ask:
| Question | If Yes → Memory Type |
|---|---|
| Do I need to recall what happened at a specific time? | Episodic |
| Do I need to recall facts about the user/world? | Semantic |
| Do I need the agent to improve at tasks over time? | Procedural |
| Do I just need continuity within a session? | Working memory (context) is enough |
Most applications need a combination:
Personal Assistant:
├── Semantic (user preferences, facts) — Primary
├── Episodic (recent conversations) — Secondary
└── Procedural (user's common workflows) — Nice to have
Research Agent:
├── Episodic (what sources were checked) — Primary
├── Semantic (extracted findings) — Primary
└── Procedural (search strategies that work) — Secondary
Multi-Agent System:
├── Semantic (shared knowledge base) — Primary
├── Episodic (handoff history, who did what) — Primary
├── Scoped access (who can see what) — Critical ← This is the gap
└── Procedural (team coordination patterns) — Secondary
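Since scoping is a property cutting across memory types rather than a type itself, it can be sketched as a filter applied at retrieval time: every record carries a scope label, and an agent only sees records its scope set covers. The scope labels and agent names here are illustrative, not from any existing system.

```python
# Memory scoping: an access check layered over any memory type.
from dataclasses import dataclass

@dataclass
class ScopedRecord:
    content: str
    scope: str   # e.g. "private:researcher", "team", "global"

def visible_to(agent_scopes: set[str],
               records: list[ScopedRecord]) -> list[str]:
    """Return only the records this agent is allowed to read."""
    return [r.content for r in records if r.scope in agent_scopes]

records = [
    ScopedRecord("Draft findings", "private:researcher"),
    ScopedRecord("Handoff: writer takes over", "team"),
    ScopedRecord("User prefers concise output", "global"),
]

# The writer agent never sees the researcher's private scratchpad
writer_view = visible_to({"private:writer", "team", "global"}, records)
```

Note the filter is orthogonal to whether the record is episodic, semantic, or procedural, which is exactly why scoping doesn't fit the standard taxonomy.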
Module 2 Summary
| Memory Type | Duration | Content | AI Implementation |
|---|---|---|---|
| Working (Short-term) | Single session | Active context | Context window |
| Episodic | Permanent | Specific events | Conversation logs, event stores |
| Semantic | Permanent | Abstracted facts | Knowledge bases, user profiles |
| Procedural | Permanent | Skills, behaviors | Few-shot examples, fine-tuning |
Key insight: Most "memory" products today focus on semantic memory (facts extraction). Episodic is often just raw logs without smart retrieval. Procedural is largely unsolved at runtime.
The multi-agent gap: None of the standard memory types address who can access what. In multi-agent systems, you need memory scoping — which isn't a type of memory, but a property that cuts across all types.
In the next part, we'll cover how products implement memory today: how leading AI systems such as Claude's memory and ChatGPT's memory are designed, why they work well for chat-centric products, and why those same approaches break down for developer-focused use cases.