In Part 1, I mentioned there's one memory type that LLMs literally cannot implement — not with clever prompting, not with RAG, not with any current technique. Today we find out which one, and why it's the exact type multi-agent systems need most.
But first, let's build the vocabulary.
Why Taxonomy Matters
When someone says "my AI agent needs memory," they could mean completely different things:
- "I need it to remember what we discussed 5 minutes ago" → Short-term
- "I need it to remember the user's name across sessions" → Long-term semantic
- "I need it to remember that it tried approach X and it failed" → Episodic
- "I need it to get better at a specific task over time" → Procedural
Building the wrong type of memory for your use case wastes effort and creates frustrating user experiences.
Let's break down each type.
2.1 Short-Term Memory (Working Memory)
What it is: Information the system is actively processing right now.
Human equivalent: The thoughts you're holding in your head while solving a problem. Limited, temporary, high-attention.
In LLMs: This is the context window. Everything the model can see in a single API call.
┌─────────────────────────────────────────┐
│ Context Window │
│ │
│ [System Prompt] │
│ [Previous messages in this session] │
│ [Retrieved memories] │
│ [Current user message] │
│ │
│ ← Everything here is "working memory" │
└─────────────────────────────────────────┘
Characteristics:
| Property | Description |
|---|---|
| Duration | Single session or API call |
| Capacity | Limited by context window (128K-200K tokens typically) |
| Access | Immediate — model sees everything at once |
| Cost | Expensive — every token is processed |
When it's sufficient:
- Single-turn Q&A
- Short conversations (< 50 messages)
- Tasks that don't require historical context
When it breaks down:
- Long conversations (token limit)
- Multi-session continuity (context clears between sessions)
- Cost-sensitive applications at scale
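The "long conversations" failure mode above can be sketched in a few lines. This is a minimal, hypothetical trimming strategy: drop the oldest messages once a token budget is exceeded, always keeping the system prompt. Token counts are approximated as characters divided by four; a real system would use the model's actual tokenizer.

```python
# Minimal working-memory management: keep the newest messages that fit
# a token budget. chars/4 is a rough heuristic, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # ~4 chars per token, on average

def trim_to_budget(system_prompt: str, messages: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the most recent messages that fit."""
    remaining = budget - estimate_tokens(system_prompt)
    kept = []
    for msg in reversed(messages):      # walk newest-first
        cost = estimate_tokens(msg)
        if cost > remaining:
            break                       # oldest messages fall off here
        kept.append(msg)
        remaining -= cost
    return kept[::-1]                   # restore chronological order

history = ["msg one", "msg two", "a much longer third message " * 10]
window = trim_to_budget("You are helpful.", history, budget=75)
# "msg one" is dropped: the oldest message no longer fits the budget
```

Note what trimming costs you: anything dropped is gone for good unless it was also written to long-term memory, which is exactly the handoff the next section covers.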
2.2 Long-Term Memory (Persistent Storage)
What it is: Information that persists beyond a single session, stored externally and retrieved when needed.
Human equivalent: Everything you "know" that isn't currently in your active thoughts. Vast, durable, requires recall.
In AI systems: External databases (PostgreSQL, vector stores, etc.) that store information and inject relevant pieces into context when needed.
┌──────────────────┐ ┌──────────────────┐
│ Session 1 │ │ Session 2 │
│ │ │ │
│ "My name is │ │ "What's my │
│ John" │ │ name?" │
│ │ │ │
└────────┬─────────┘ └────────┬─────────┘
│ │
│ Extract & Store │ Retrieve
▼ ▼
┌─────────────────────────────────────────────────┐
│ Long-Term Memory Store │
│ │
│ { "user_name": "John", stored: "2025-12-01" } │
│ │
└─────────────────────────────────────────────────┘
Characteristics:
| Property | Description |
|---|---|
| Duration | Permanent (until explicitly deleted) |
| Capacity | Virtually unlimited |
| Access | Requires retrieval step (adds latency) |
| Cost | Storage cost + retrieval cost (much cheaper than context) |
The tradeoff: You can store unlimited information, but you can only retrieve and inject a subset into the context window. Retrieval quality becomes critical.
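The extract-store-retrieve loop in the diagram can be sketched with a toy store. This is deliberately naive: retrieval here is word-overlap matching against keys, where a real system would use PostgreSQL or a vector store with embedding search. The class and method names are illustrative, not from any particular library.

```python
# A toy long-term memory store showing the shape of the loop:
# session 1 extracts & stores a fact, session 2 retrieves it.
import re

class LongTermMemory:
    def __init__(self):
        self.facts: dict[str, str] = {}

    def store(self, key: str, value: str) -> None:
        self.facts[key] = value  # overwrite: last write wins

    def retrieve(self, query: str, limit: int = 3) -> list[str]:
        """Return facts whose key words overlap the query words."""
        words = set(re.findall(r"\w+", query.lower()))
        hits = [v for k, v in self.facts.items()
                if words & set(k.lower().split("_"))]
        return hits[:limit]  # only a subset ever reaches the context window

memory = LongTermMemory()
memory.store("user_name", "John")             # session 1: extract & store
context = memory.retrieve("what's my name?")  # session 2: retrieve
```

Even in this toy, the tradeoff is visible: `retrieve` caps what gets injected, so the quality of the matching logic decides what the model ever sees.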
2.3 Episodic Memory (Specific Events)
What it is: Memory of specific events, conversations, or experiences — tied to a particular time and context.
Human equivalent: "Remember that meeting last Tuesday where John got angry about the deadline?"
In AI systems: Stored conversation logs, interaction records, or event traces with temporal metadata.
# Episodic memory record
{
"type": "episodic",
"timestamp": "2025-12-10T14:30:00Z",
"session_id": "abc123",
"event": "User discussed funding strategy",
"context": "User was exploring pre-seed vs seed options",
"participants": ["user", "assistant"],
"outcome": "Decided to target pre-seed first"
}
Key properties:
- Time-stamped — When did this happen?
- Contextual — What was the surrounding situation?
- Specific — Not abstracted into general facts
- Queryable by time — "What did we discuss last week?"
Use cases:
- "Continue where we left off"
- "What did I say about X last time?"
- Audit trails and compliance
- Debugging agent behavior ("Why did it do that?")
Retrieval challenge: Episodic memories are often verbose. You can't inject entire conversation transcripts. Systems typically:
- Store full episodes
- Generate summaries for retrieval
- Fetch full detail only when specifically needed
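The store-full / retrieve-summaries / fetch-detail pattern above might look like the sketch below. The summaries here are hand-written; in practice an LLM would generate them when the episode is stored. All class and field names are illustrative.

```python
# Episodic store: cheap summaries for routine recall,
# full transcripts fetched only on demand.
from dataclasses import dataclass, field

@dataclass
class Episode:
    session_id: str
    timestamp: str
    summary: str                # short, cheap to inject into context
    transcript: list[str] = field(default_factory=list)  # full detail

class EpisodicStore:
    def __init__(self):
        self.episodes: list[Episode] = []

    def add(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def recall_summaries(self, since: str) -> list[str]:
        """Query by time: only summaries go back to the agent."""
        return [e.summary for e in self.episodes if e.timestamp >= since]

    def fetch_detail(self, session_id: str) -> list[str]:
        """Full transcript, fetched only when specifically needed."""
        for e in self.episodes:
            if e.session_id == session_id:
                return e.transcript
        return []

store = EpisodicStore()
store.add(Episode("abc123", "2025-12-10", "Discussed funding strategy",
                  ["User: pre-seed or seed?", "Assistant: depends on traction"]))
recent = store.recall_summaries(since="2025-12-01")
```

The timestamp field is what makes "What did we discuss last week?" a cheap query instead of a full-transcript scan.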
2.4 Semantic Memory (Facts & Knowledge)
What it is: Abstracted facts, knowledge, and information — divorced from the specific episode where they were learned.
Human equivalent: "Paris is the capital of France." You know this, but you probably don't remember when you learned it.
In AI systems: Extracted facts, user profiles, preferences, and knowledge bases.
# Semantic memory records
{
"type": "semantic",
"fact": "User's name is John",
"confidence": 1.0,
"source": "explicit_statement",
"first_seen": "2025-11-01",
"last_confirmed": "2025-12-15"
}
{
"type": "semantic",
"fact": "User prefers concise responses without bullet points",
"confidence": 0.8,
"source": "inferred_from_feedback",
"first_seen": "2025-11-15"
}
Key properties:
- Abstracted — General facts, not tied to specific moments
- Updateable — Facts can change; need conflict resolution
- Confidence-scored — Some facts are more certain than others
- Categorizable — Can be organized (preferences, demographics, knowledge)
The extraction problem:
Converting episodic to semantic memory requires judgment:
Episode: "User said 'I just started at Microsoft last week,
really excited about the AI team there'"
Extracted semantic memories:
- User works at Microsoft (high confidence)
- User is on the AI team (high confidence)
- User started recently (medium confidence — "last week" will become stale)
- User is excited about their job (medium confidence — emotional state may change)
What to extract? At what confidence? This is where LLM-based extraction comes in — and where systems differ in sophistication.
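Once facts are extracted, they still have to be merged into the store, and that's where the "updateable" property bites. Here is one possible conflict-resolution policy, sketched with the same field names as the records above: re-confirming a fact bumps `last_confirmed`, while a contradicting fact only supersedes the old one if its confidence is at least as high. This is one policy among many, not a standard.

```python
# Upserting semantic facts with simple conflict resolution.
def upsert_fact(store: dict, key: str, fact: str,
                confidence: float, date: str) -> None:
    existing = store.get(key)
    if existing is None:
        store[key] = {"fact": fact, "confidence": confidence,
                      "first_seen": date, "last_confirmed": date}
    elif existing["fact"] == fact:
        # Same fact observed again: re-confirm, keep first_seen
        existing["last_confirmed"] = date
        existing["confidence"] = max(existing["confidence"], confidence)
    elif confidence >= existing["confidence"]:
        # Contradiction with equal-or-higher confidence: supersede
        store[key] = {"fact": fact, "confidence": confidence,
                      "first_seen": date, "last_confirmed": date}
    # else: keep the higher-confidence existing fact

profile = {}
upsert_fact(profile, "employer", "User works at Microsoft", 0.9, "2025-11-01")
upsert_fact(profile, "employer", "User works at Microsoft", 0.9, "2025-12-15")
upsert_fact(profile, "employer", "User works at Google", 0.5, "2025-12-20")
# Low-confidence contradiction is rejected; Microsoft fact survives
```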
2.5 Procedural Memory (How to Do Things)
What it is: Memory of how to perform tasks, skills, and behaviors.
Human equivalent: How to ride a bike, how to write code, how to negotiate. You don't consciously recall steps — you just do it.
In AI systems: This is the trickiest category. Current implementations include:
| Approach | How It Works |
|---|---|
| Fine-tuning | Bake procedures into model weights |
| Few-shot examples | Store examples of correct behavior, inject when relevant |
| Tool configurations | Remember how to use specific tools/APIs |
| Workflow templates | Store successful action sequences |
# Procedural memory example
{
"type": "procedural",
"skill": "deploy_langraph_agent",
"steps": [
"Validate entry point exists",
"Check dependencies in requirements.txt",
"Build Docker container",
"Push to registry",
"Update deployment config"
],
"learned_from": ["session_xyz", "session_abc"],
"success_rate": 0.94
}
Why it's hard for LLMs:
LLMs don't truly "learn" procedures from memory injection — they follow instructions. Procedural memory in current systems is really:
- Storing successful patterns
- Retrieving relevant patterns
- Injecting them as instructions/examples
True procedural learning would require weight updates (fine-tuning), which can't happen at runtime.
Where this matters:
- Agents that repeat similar tasks should get better over time
- Learning user-specific workflows ("When I say 'deploy', run these 5 steps")
- Tool use patterns ("This API needs auth header X, learned from past failures")
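The store-retrieve-inject pattern described above can be sketched as prompt assembly: stored workflows are matched by a trigger word and prepended to the prompt as instructions, since the model's weights can't be updated at runtime. The trigger matching and record shape are illustrative; a real system would use semantic matching rather than substring checks.

```python
# Procedural "memory" as pattern retrieval + prompt injection.
workflows = {
    "deploy": {
        "steps": ["Validate entry point exists",
                  "Check dependencies in requirements.txt",
                  "Build Docker container",
                  "Push to registry",
                  "Update deployment config"],
        "success_rate": 0.94,
    },
}

def inject_procedure(user_message: str) -> str:
    """Prepend a matching workflow's steps to the prompt as instructions."""
    for trigger, wf in workflows.items():
        if trigger in user_message.lower():
            steps = "\n".join(f"{i + 1}. {s}"
                              for i, s in enumerate(wf["steps"]))
            return (f"When the user asks to {trigger}, follow these steps:\n"
                    f"{steps}\n\nUser: {user_message}")
    return f"User: {user_message}"  # no matching procedure: plain prompt

prompt = inject_procedure("Please deploy the agent")
```

This is why the section calls it "not truly learning": the procedure lives in the prompt, not the model, and vanishes the moment it isn't injected.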
2.6 The Memory Hierarchy in Cognitive Architectures
Now let's see how these types compose in a complete system:
┌─────────────────────────────────────────────────────────────────┐
│ Agent Runtime │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Working Memory │ │
│ │ (Context Window) │ │
│ │ │ │
│ │ • Current conversation │ │
│ │ • Retrieved long-term memories │ │
│ │ • Active task state │ │
│ └────────────────────────────────────────────────────────────┘ │
│ ↑↓ │
│ ┌────────────────────┴───────────────────┐ │
│ │ Retrieval Layer │ │
│ │ (Semantic search, filtering, ranking)│ │
│ └────────────────────┬───────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Long-Term Memory │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Episodic │ │ Semantic │ │ Procedural │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • Sessions │ │ • Facts │ │ • Skills │ │ │
│ │ │ • Events │ │ • Prefs │ │ • Patterns │ │ │
│ │ │ • Logs │ │ • Knowledge │ │ • Workflows │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
The flow:
- Input arrives → Goes into working memory
- Retrieval triggers → Query long-term stores for relevant memories
- Context assembly → Combine current input + retrieved memories
- Processing → LLM generates response using full context
- Storage → Extract new memories from interaction, persist to long-term
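The five steps above, condensed into one hypothetical turn handler. The retrieval, generation, and extraction stages are stand-ins (substring lookup, a pluggable `llm` callable, and a trivial "my name is X" rule) for a real retrieval layer, LLM call, and extraction pipeline.

```python
# One turn through the memory hierarchy, step by step.
def handle_turn(user_input: str, long_term: dict, llm) -> str:
    # 1. Input arrives -> working memory
    working_memory = [user_input]
    # 2. Retrieval triggers -> query long-term store for relevant facts
    retrieved = [v for k, v in long_term.items() if k in user_input.lower()]
    # 3. Context assembly -> retrieved memories + current input
    context = retrieved + working_memory
    # 4. Processing -> the model sees the assembled context
    response = llm(context)
    # 5. Storage -> extract new memories from the interaction
    words = user_input.lower().split()
    if "name" in words and "is" in words:
        long_term["name"] = user_input.split()[-1]
    return response

store = {}
handle_turn("My name is John", store, llm=lambda ctx: "Nice to meet you!")
reply = handle_turn("Do you know my name?", store,
                    llm=lambda ctx: f"Context seen: {ctx}")
# Second turn retrieves "John" even though turn 1's context is gone
```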
Different systems prioritize different memory types:
| System | Primary Focus |
|---|---|
| ChatGPT Memory | Semantic (user facts/preferences) |
| Claude Memory | Semantic + light Episodic (cross-conversation recall) |
| LangGraph Checkpointers | Episodic (workflow state) |
| mem0 | Semantic (extracted facts) |
| Aegis Memory | Semantic + Episodic + Scoped (multi-agent aware) |
Memory Type Decision Framework
When designing memory for your application, ask:
| Question | If Yes → Memory Type |
|---|---|
| Do I need to recall what happened at a specific time? | Episodic |
| Do I need to recall facts about the user/world? | Semantic |
| Do I need the agent to improve at tasks over time? | Procedural |
| Do I just need continuity within a session? | Working memory (context) is enough |
Most applications need a combination:
Personal Assistant:
├── Semantic (user preferences, facts) — Primary
├── Episodic (recent conversations) — Secondary
└── Procedural (user's common workflows) — Nice to have
Research Agent:
├── Episodic (what sources were checked) — Primary
├── Semantic (extracted findings) — Primary
└── Procedural (search strategies that work) — Secondary
Multi-Agent System:
├── Semantic (shared knowledge base) — Primary
├── Episodic (handoff history, who did what) — Primary
├── Scoped access (who can see what) — Critical ← This is the gap
└── Procedural (team coordination patterns) — Secondary
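Since scoping is a property cutting across memory types rather than a type itself, it can be sketched as a filter applied at retrieval time: every record carries a scope label, and an agent only sees records its scope set covers. The scope labels and agent names here are illustrative, not from any existing system.

```python
# Memory scoping: an access check layered over any memory type.
from dataclasses import dataclass

@dataclass
class ScopedRecord:
    content: str
    scope: str   # e.g. "private:researcher", "team", "global"

def visible_to(agent_scopes: set[str],
               records: list[ScopedRecord]) -> list[str]:
    """Return only the records this agent is allowed to read."""
    return [r.content for r in records if r.scope in agent_scopes]

records = [
    ScopedRecord("Draft findings", "private:researcher"),
    ScopedRecord("Handoff: writer takes over", "team"),
    ScopedRecord("User prefers concise output", "global"),
]

# The writer agent never sees the researcher's private scratchpad
writer_view = visible_to({"private:writer", "team", "global"}, records)
```

Note the filter is orthogonal to whether the record is episodic, semantic, or procedural, which is exactly why scoping doesn't fit the standard taxonomy.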
Module 2 Summary
| Memory Type | Duration | Content | AI Implementation |
|---|---|---|---|
| Working (Short-term) | Single session | Active context | Context window |
| Episodic | Permanent | Specific events | Conversation logs, event stores |
| Semantic | Permanent | Abstracted facts | Knowledge bases, user profiles |
| Procedural | Permanent | Skills, behaviors | Few-shot examples, fine-tuning |
Key insight: Most "memory" products today focus on semantic memory (facts extraction). Episodic is often just raw logs without smart retrieval. Procedural is largely unsolved at runtime.
The multi-agent gap: None of the standard memory types address who can access what. In multi-agent systems, you need memory scoping — which isn't a type of memory, but a property that cuts across all types.
In the next part, we'll cover how products implement memory today: how leading AI systems such as Claude's memory and ChatGPT's memory are designed, why they work well for chat-centric products, and why those same approaches break down for developer-focused use cases.