Arulnidhi Karunanidhi

Part 2: The 5 Types of Memory Every AI Agent Needs (And Why LLMs Can Only Do 2)

In Part 1, I mentioned there's one memory type that LLMs literally cannot implement — not with clever prompting, not with RAG, not with any current technique. Today we find out which one, and why it's the exact type multi-agent systems need most.
But first, let's build the vocabulary.

Why Taxonomy Matters

When someone says "my AI agent needs memory," they could mean completely different things:

  • "I need it to remember what we discussed 5 minutes ago" → Short-term
  • "I need it to remember the user's name across sessions" → Long-term semantic
  • "I need it to remember that it tried approach X and it failed" → Episodic
  • "I need it to get better at a specific task over time" → Procedural

Building the wrong type of memory for your use case wastes effort and creates frustrating user experiences.

Let's break down each type.

2.1 Short-Term Memory (Working Memory)

What it is: Information the system is actively processing right now.

Human equivalent: The thoughts you're holding in your head while solving a problem. Limited, temporary, high-attention.

In LLMs: This is the context window. Everything the model can see in a single API call.

┌─────────────────────────────────────────┐
│           Context Window                │
│                                         │
│  [System Prompt]                        │
│  [Previous messages in this session]    │
│  [Retrieved memories]                   │
│  [Current user message]                 │
│                                         │
│  ← Everything here is "working memory"  │
└─────────────────────────────────────────┘

Characteristics:

| Property | Description |
|---|---|
| Duration | Single session or API call |
| Capacity | Limited by the context window (typically 128K–200K tokens) |
| Access | Immediate — the model sees everything at once |
| Cost | Expensive — every token is processed |

When it's sufficient:

  • Single-turn Q&A
  • Short conversations (< 50 messages)
  • Tasks that don't require historical context

When it breaks down:

  • Long conversations (token limit)
  • Multi-session continuity (context clears between sessions)
  • Cost-sensitive applications at scale
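When working memory hits its token limit, the usual fallback is truncation: keep the most recent turns that fit a budget. A minimal sketch (token counts are approximated by word count here; a real system would use the model's tokenizer, and `trim_to_budget` is an illustrative helper, not a library function):

```python
def trim_to_budget(messages, max_tokens=200):
    """Drop the oldest messages until the history fits the token budget."""
    def approx_tokens(msg):
        # crude approximation: one token per whitespace-separated word
        return len(msg["content"].split())

    kept = []
    total = 0
    # Walk newest-first so recent turns survive truncation.
    for msg in reversed(messages):
        cost = approx_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "first message " * 50},   # ~100 "tokens"
    {"role": "assistant", "content": "short reply"},
    {"role": "user", "content": "latest question"},
]
trimmed = trim_to_budget(history, max_tokens=60)
print(len(trimmed))  # 2 — the long first message was dropped
```

Note what this loses: the oldest context silently disappears, which is exactly the failure mode long-term memory exists to fix.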

2.2 Long-Term Memory (Persistent Storage)

What it is: Information that persists beyond a single session, stored externally and retrieved when needed.

Human equivalent: Everything you "know" that isn't currently in your active thoughts. Vast, durable, requires recall.

In AI systems: External databases (PostgreSQL, vector stores, etc.) that store information and inject relevant pieces into context when needed.

┌──────────────────┐         ┌──────────────────┐
│   Session 1      │         │   Session 2      │
│                  │         │                  │
│  "My name is     │         │  "What's my      │
│   John"          │         │   name?"         │
│                  │         │                  │
└────────┬─────────┘         └────────┬─────────┘
         │                            │
         │ Extract & Store            │ Retrieve
         ▼                            ▼
┌─────────────────────────────────────────────────┐
│              Long-Term Memory Store             │
│                                                 │
│  { "user_name": "John", stored: "2025-12-01" }  │
│                                                 │
└─────────────────────────────────────────────────┘

Characteristics:

| Property | Description |
|---|---|
| Duration | Permanent (until explicitly deleted) |
| Capacity | Virtually unlimited |
| Access | Requires a retrieval step (adds latency) |
| Cost | Storage cost + retrieval cost (much cheaper than context) |

The tradeoff: You can store unlimited information, but you can only retrieve and inject a subset into the context window. Retrieval quality becomes critical.
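The extract-and-store / retrieve flow in the diagram can be sketched with a plain dict standing in for the database (`MemoryStore` is illustrative; real systems would use PostgreSQL, a vector store, etc., as noted above):

```python
import datetime

class MemoryStore:
    """Toy long-term store: survives across sessions, unlike the context window."""

    def __init__(self):
        self._facts = {}

    def store(self, key, value):
        self._facts[key] = {
            "value": value,
            "stored": datetime.date.today().isoformat(),
        }

    def retrieve(self, key):
        record = self._facts.get(key)
        return record["value"] if record else None

store = MemoryStore()

# Session 1: extract a fact from the conversation and persist it.
store.store("user_name", "John")

# Session 2: a fresh context window, but the fact survives.
print(store.retrieve("user_name"))  # John
```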


2.3 Episodic Memory (Specific Events)

What it is: Memory of specific events, conversations, or experiences — tied to a particular time and context.

Human equivalent: "Remember that meeting last Tuesday where John got angry about the deadline?"

In AI systems: Stored conversation logs, interaction records, or event traces with temporal metadata.

# Episodic memory record
{
    "type": "episodic",
    "timestamp": "2025-12-10T14:30:00Z",
    "session_id": "abc123",
    "event": "User discussed funding strategy",
    "context": "User was exploring pre-seed vs seed options",
    "participants": ["user", "assistant"],
    "outcome": "Decided to target pre-seed first"
}

Key properties:

  • Time-stamped — When did this happen?
  • Contextual — What was the surrounding situation?
  • Specific — Not abstracted into general facts
  • Queryable by time — "What did we discuss last week?"

Use cases:

  • "Continue where we left off"
  • "What did I say about X last time?"
  • Audit trails and compliance
  • Debugging agent behavior ("Why did it do that?")

Retrieval challenge: Episodic memories are often verbose. You can't inject entire conversation transcripts. Systems typically:

  1. Store full episodes
  2. Generate summaries for retrieval
  3. Fetch full detail only when specifically needed
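The three-step pattern above might look like this in code: a two-tier store where search runs over short summaries and the full transcript is fetched only on demand. (The keyword-overlap scoring is a crude stand-in for real semantic retrieval, and all names here are illustrative.)

```python
def _tokens(text):
    # crude tokenizer: lowercase and strip common punctuation
    return set(w.strip("?,.!") for w in text.lower().split())

class EpisodicStore:
    def __init__(self):
        # session_id -> {"summary", "transcript", "timestamp"}
        self._episodes = {}

    def store(self, session_id, summary, transcript, timestamp):
        self._episodes[session_id] = {
            "summary": summary,
            "transcript": transcript,
            "timestamp": timestamp,
        }

    def search_summaries(self, query):
        """Rank episodes by keyword overlap between the query and summaries."""
        terms = _tokens(query)
        scored = [
            (len(terms & _tokens(ep["summary"])), sid, ep["summary"])
            for sid, ep in self._episodes.items()
        ]
        return [(sid, summary) for score, sid, summary
                in sorted(scored, reverse=True) if score > 0]

    def fetch_full(self, session_id):
        # Step 3: load the verbose transcript only when specifically needed.
        return self._episodes[session_id]["transcript"]

store = EpisodicStore()
store.store("abc123",
            "User discussed funding strategy, decided to target pre-seed",
            ["user: Should I raise pre-seed or seed?",
             "assistant: Given your stage, pre-seed first..."],
            "2025-12-10T14:30:00Z")

hits = store.search_summaries("what did we decide about funding?")
print(store.fetch_full(hits[0][0])[0])
```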

2.4 Semantic Memory (Facts & Knowledge)

What it is: Abstracted facts, knowledge, and information — divorced from the specific episode where they were learned.

Human equivalent: "Paris is the capital of France." You know this, but you probably don't remember when you learned it.

In AI systems: Extracted facts, user profiles, preferences, and knowledge bases.

# Semantic memory records
{
    "type": "semantic",
    "fact": "User's name is John",
    "confidence": 1.0,
    "source": "explicit_statement",
    "first_seen": "2025-11-01",
    "last_confirmed": "2025-12-15"
}

{
    "type": "semantic", 
    "fact": "User prefers concise responses without bullet points",
    "confidence": 0.8,
    "source": "inferred_from_feedback",
    "first_seen": "2025-11-15"
}

Key properties:

  • Abstracted — General facts, not tied to specific moments
  • Updateable — Facts can change; need conflict resolution
  • Confidence-scored — Some facts are more certain than others
  • Categorizable — Can be organized (preferences, demographics, knowledge)

The extraction problem:

Converting episodic to semantic memory requires judgment:

Episode: "User said 'I just started at Microsoft last week, 
          really excited about the AI team there'"

Extracted semantic memories:
- User works at Microsoft (high confidence)
- User is on the AI team (high confidence)  
- User started recently (medium confidence — "last week" will become stale)
- User is excited about their job (medium confidence — emotional state may change)

What to extract? At what confidence? This is where LLM-based extraction comes in — and where systems differ in sophistication.
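The update side of semantic memory can be sketched as a simple upsert rule: a new fact about the same subject overwrites the existing record only when it is at least as confident. (`upsert_fact` and its fields are illustrative, modeled on the records above.)

```python
def upsert_fact(facts, subject, value, confidence, seen_on):
    """Store or update a semantic fact with naive conflict resolution."""
    existing = facts.get(subject)
    if existing is None or confidence >= existing["confidence"]:
        facts[subject] = {
            "value": value,
            "confidence": confidence,
            # Preserve when the fact was first observed.
            "first_seen": existing["first_seen"] if existing else seen_on,
            "last_confirmed": seen_on,
        }
    return facts

facts = {}
upsert_fact(facts, "employer", "Microsoft", 1.0, "2025-11-01")  # explicit statement
upsert_fact(facts, "employer", "Google", 0.4, "2025-12-01")     # weak inference

# The low-confidence inference does not overwrite the explicit statement.
print(facts["employer"]["value"])  # Microsoft
```

Real systems need subtler policies (recency vs. confidence, contradiction detection, fact expiry for things like "started last week"), but the shape is the same.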


2.5 Procedural Memory (How to Do Things)

What it is: Memory of how to perform tasks, skills, and behaviors.

Human equivalent: How to ride a bike, how to write code, how to negotiate. You don't consciously recall steps — you just do it.

In AI systems: This is the trickiest category. Current implementations include:

| Approach | How It Works |
|---|---|
| Fine-tuning | Bake procedures into model weights |
| Few-shot examples | Store examples of correct behavior, inject when relevant |
| Tool configurations | Remember how to use specific tools/APIs |
| Workflow templates | Store successful action sequences |

# Procedural memory example
{
    "type": "procedural",
    "skill": "deploy_langgraph_agent",
    "steps": [
        "Validate entry point exists",
        "Check dependencies in requirements.txt",
        "Build Docker container",
        "Push to registry",
        "Update deployment config"
    ],
    "learned_from": ["session_xyz", "session_abc"],
    "success_rate": 0.94
}

Why it's hard for LLMs:

LLMs don't truly "learn" procedures from memory injection — they follow instructions. Procedural memory in current systems is really:

  1. Storing successful patterns
  2. Retrieving relevant patterns
  3. Injecting them as instructions/examples

True procedural learning would require weight updates (fine-tuning), which can't happen at runtime.

Where this matters:

  • Agents that repeat similar tasks should get better over time
  • Learning user-specific workflows ("When I say 'deploy', run these 5 steps")
  • Tool use patterns ("This API needs auth header X, learned from past failures")
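The store–retrieve–inject loop above can be sketched in a few lines (the procedure records, scoring by success rate, and `format_prompt` helper are all illustrative):

```python
procedures = [
    {"skill": "deploy_agent", "success_rate": 0.94,
     "steps": ["Validate entry point", "Build container", "Push to registry"]},
    {"skill": "deploy_agent", "success_rate": 0.61,
     "steps": ["Build container", "Push to registry"]},
]

def best_procedure(skill):
    """Retrieve the highest-success pattern for a skill, or None."""
    candidates = [p for p in procedures if p["skill"] == skill]
    return max(candidates, key=lambda p: p["success_rate"], default=None)

def format_prompt(task, procedure):
    """Inject the stored pattern as plain instructions — not true learning."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(procedure["steps"]))
    return f"Task: {task}\nFollow this proven workflow:\n{steps}"

proc = best_procedure("deploy_agent")
print(format_prompt("deploy the support agent", proc))
```

Note that nothing in the model changes: the "skill" lives entirely in the store and the prompt, which is exactly the limitation described above.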

2.6 The Memory Hierarchy in Cognitive Architectures

Now let's see how these types compose in a complete system:

┌─────────────────────────────────────────────────────────────────┐
│                        Agent Runtime                            │
│                                                                 │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                   Working Memory                           │ │
│  │                   (Context Window)                         │ │
│  │                                                            │ │
│  │  • Current conversation                                    │ │
│  │  • Retrieved long-term memories                            │ │
│  │  • Active task state                                       │ │
│  └────────────────────────────────────────────────────────────┘ │
│                              ↑↓                                 │
│         ┌────────────────────┴───────────────────┐              │
│         │           Retrieval Layer              │              │
│         │   (Semantic search, filtering, ranking)│              │
│         └────────────────────┬───────────────────┘              │
│                              ↓                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                   Long-Term Memory                       │   │
│  │                                                          │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │   │
│  │  │  Episodic   │  │  Semantic   │  │ Procedural  │       │   │
│  │  │             │  │             │  │             │       │   │
│  │  │ • Sessions  │  │ • Facts     │  │ • Skills    │       │   │
│  │  │ • Events    │  │ • Prefs     │  │ • Patterns  │       │   │
│  │  │ • Logs      │  │ • Knowledge │  │ • Workflows │       │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘       │   │
│  │                                                          │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The flow:

  1. Input arrives → Goes into working memory
  2. Retrieval triggers → Query long-term stores for relevant memories
  3. Context assembly → Combine current input + retrieved memories
  4. Processing → LLM generates response using full context
  5. Storage → Extract new memories from interaction, persist to long-term

Different systems prioritize different memory types:

| System | Primary Focus |
|---|---|
| ChatGPT Memory | Semantic (user facts/preferences) |
| Claude Memory | Semantic + light episodic (cross-conversation recall) |
| LangGraph Checkpointers | Episodic (workflow state) |
| mem0 | Semantic (extracted facts) |
| Aegis Memory | Semantic + episodic + scoped (multi-agent aware) |

Memory Type Decision Framework

When designing memory for your application, ask:

| Question | If Yes → Memory Type |
|---|---|
| Do I need to recall what happened at a specific time? | Episodic |
| Do I need to recall facts about the user/world? | Semantic |
| Do I need the agent to improve at tasks over time? | Procedural |
| Do I just need continuity within a session? | Working memory (context) is enough |
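The framework above reduces to a tiny helper (purely illustrative):

```python
def memory_types_needed(recall_events=False, recall_facts=False,
                        improve_over_time=False):
    """Map the decision-framework questions to memory types worth building."""
    needed = []
    if recall_events:
        needed.append("episodic")
    if recall_facts:
        needed.append("semantic")
    if improve_over_time:
        needed.append("procedural")
    # If every answer is "no", working memory (the context window) is enough.
    return needed or ["working"]

# A personal assistant: facts are primary, recent events secondary.
print(memory_types_needed(recall_events=True, recall_facts=True))
```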

Most applications need a combination:

Personal Assistant:
├── Semantic (user preferences, facts) — Primary
├── Episodic (recent conversations) — Secondary
└── Procedural (user's common workflows) — Nice to have

Research Agent:
├── Episodic (what sources were checked) — Primary
├── Semantic (extracted findings) — Primary
└── Procedural (search strategies that work) — Secondary

Multi-Agent System:
├── Semantic (shared knowledge base) — Primary
├── Episodic (handoff history, who did what) — Primary
├── Scoped access (who can see what) — Critical ← This is the gap
└── Procedural (team coordination patterns) — Secondary

Module 2 Summary

| Memory Type | Duration | Content | AI Implementation |
|---|---|---|---|
| Working (short-term) | Single session | Active context | Context window |
| Episodic | Permanent | Specific events | Conversation logs, event stores |
| Semantic | Permanent | Abstracted facts | Knowledge bases, user profiles |
| Procedural | Permanent | Skills, behaviors | Few-shot examples, fine-tuning |

Key insight: Most "memory" products today focus on semantic memory (facts extraction). Episodic is often just raw logs without smart retrieval. Procedural is largely unsolved at runtime.

The multi-agent gap: None of the standard memory types address who can access what. In multi-agent systems, you need memory scoping — which isn't a type of memory, but a property that cuts across all types.


In the next part, we'll cover how products implement memory today: how leading AI systems — such as Claude's memory and ChatGPT's memory — are designed, why they work well for chat-centric products, and why those same approaches break down for developer-focused use cases.

