<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arulnidhi Karunanidhi</title>
    <description>The latest articles on DEV Community by Arulnidhi Karunanidhi (@arulnidhi_karunanidhi_7ff).</description>
    <link>https://dev.to/arulnidhi_karunanidhi_7ff</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3047648%2F0281d56e-0f07-449d-90fb-71eec3948f78.jpg</url>
      <title>DEV Community: Arulnidhi Karunanidhi</title>
      <link>https://dev.to/arulnidhi_karunanidhi_7ff</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arulnidhi_karunanidhi_7ff"/>
    <language>en</language>
    <item>
      <title>Part 2: The 5 Types of Memory Every AI Agent Needs (And Why LLMs Can Only Do 2)</title>
      <dc:creator>Arulnidhi Karunanidhi</dc:creator>
      <pubDate>Tue, 10 Feb 2026 16:54:42 +0000</pubDate>
      <link>https://dev.to/arulnidhi_karunanidhi_7ff/part-2-the-5-types-of-memory-every-ai-agent-needs-and-why-llms-can-only-do-2-73c</link>
      <guid>https://dev.to/arulnidhi_karunanidhi_7ff/part-2-the-5-types-of-memory-every-ai-agent-needs-and-why-llms-can-only-do-2-73c</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/arulnidhi_karunanidhi_7ff/foundations-why-memory-matters-in-ai-36np"&gt;Part 1&lt;/a&gt;, I mentioned there's one memory type that LLMs literally cannot implement — not with clever prompting, not with RAG, not with any current technique. Today we find out which one, and why it's the exact type multi-agent systems need most.&lt;br&gt;
But first, let's build the vocabulary.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Taxonomy Matters
&lt;/h2&gt;

&lt;p&gt;When someone says "my AI agent needs memory," they could mean completely different things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I need it to remember what we discussed 5 minutes ago" → Short-term&lt;/li&gt;
&lt;li&gt;"I need it to remember the user's name across sessions" → Long-term semantic&lt;/li&gt;
&lt;li&gt;"I need it to remember that it tried approach X and it failed" → Episodic&lt;/li&gt;
&lt;li&gt;"I need it to get better at a specific task over time" → Procedural&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Building the wrong type of memory for your use case wastes effort and creates frustrating user experiences.&lt;/p&gt;

&lt;p&gt;Let's break down each type.&lt;/p&gt;
&lt;h2&gt;
  
  
  2.1 Short-Term Memory (Working Memory)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Information the system is actively processing right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human equivalent:&lt;/strong&gt; The thoughts you're holding in your head while solving a problem. Limited, temporary, high-attention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In LLMs:&lt;/strong&gt; This is the &lt;strong&gt;context window&lt;/strong&gt;. Everything the model can see in a single API call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────┐
│           Context Window                │
│                                         │
│  [System Prompt]                        │
│  [Previous messages in this session]    │
│  [Retrieved memories]                   │
│  [Current user message]                 │
│                                         │
│  ← Everything here is "working memory"  │
└─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Duration&lt;/td&gt;
&lt;td&gt;Single session or API call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Capacity&lt;/td&gt;
&lt;td&gt;Limited by context window (128K-200K tokens typically)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access&lt;/td&gt;
&lt;td&gt;Immediate — model sees everything at once&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Expensive — every token is processed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When it's sufficient:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-turn Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;Short conversations (&amp;lt; 50 messages)&lt;/li&gt;
&lt;li&gt;Tasks that don't require historical context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When it breaks down:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long conversations (token limit)&lt;/li&gt;
&lt;li&gt;Multi-session continuity (context clears between sessions)&lt;/li&gt;
&lt;li&gt;Cost-sensitive applications at scale&lt;/li&gt;
&lt;/ul&gt;
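&lt;p&gt;When a conversation outgrows the window, the oldest turns have to be dropped or summarized before the next call. A minimal sketch of budget-based trimming (the 4-characters-per-token ratio is a rough heuristic, not a real tokenizer):&lt;/p&gt;

```python
def estimate_tokens(text):
    # Rough heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages, budget_tokens):
    """Keep the most recent messages that fit within the token budget."""
    kept = []
    total = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if total + cost > budget_tokens:
            break  # this message (and everything older) gets dropped
        kept.append(msg)
        total += cost
    kept.reverse()  # restore chronological order
    return kept
```

Production systems usually summarize the dropped turns into a compact recap instead of discarding them outright.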




&lt;h2&gt;
  
  
  2.2 Long-Term Memory (Persistent Storage)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Information that persists beyond a single session, stored externally and retrieved when needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human equivalent:&lt;/strong&gt; Everything you "know" that isn't currently in your active thoughts. Vast, durable, requires recall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In AI systems:&lt;/strong&gt; External databases (PostgreSQL, vector stores, etc.) that store information and inject relevant pieces into context when needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────┐         ┌──────────────────┐
│   Session 1      │         │   Session 2      │
│                  │         │                  │
│  "My name is     │         │  "What's my      │
│   John"          │         │   name?"         │
│                  │         │                  │
└────────┬─────────┘         └────────┬─────────┘
         │                            │
         │ Extract &amp;amp; Store            │ Retrieve
         ▼                            ▼
┌─────────────────────────────────────────────────┐
│              Long-Term Memory Store             │
│                                                 │
│  { "user_name": "John", stored: "2025-12-01" }  │
│                                                 │
└─────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Property&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Duration&lt;/td&gt;
&lt;td&gt;Permanent (until explicitly deleted)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Capacity&lt;/td&gt;
&lt;td&gt;Virtually unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access&lt;/td&gt;
&lt;td&gt;Requires retrieval step (adds latency)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Storage cost + retrieval cost (much cheaper than context)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The tradeoff:&lt;/strong&gt; You can store unlimited information, but you can only retrieve and inject a subset into the context window. Retrieval quality becomes critical.&lt;/p&gt;
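&lt;p&gt;The extract-and-store / retrieve cycle in the diagram can be sketched with a plain dictionary standing in for the external store. This is a toy: a real system would use PostgreSQL or a vector store, and LLM-based extraction rather than the naive pattern match below.&lt;/p&gt;

```python
import re

memory_store = {}  # stands in for an external database

def extract_and_store(user_id, message):
    """Session 1: pull a stable fact out of the message and persist it."""
    match = re.search(r"[Mm]y name is (\w+)", message)
    if match:
        memory_store[(user_id, "user_name")] = match.group(1)

def retrieve(user_id, key):
    """Session 2: fetch the fact back, even though the context was cleared."""
    return memory_store.get((user_id, key))
```

Because the store lives outside the model, the fact survives even though session 2 starts with an empty context window.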




&lt;h2&gt;
  
  
  2.3 Episodic Memory (Specific Events)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Memory of specific events, conversations, or experiences — tied to a particular time and context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human equivalent:&lt;/strong&gt; "Remember that meeting last Tuesday where John got angry about the deadline?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In AI systems:&lt;/strong&gt; Stored conversation logs, interaction records, or event traces with temporal metadata.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Episodic memory record
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episodic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-12-10T14:30:00Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abc123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User discussed funding strategy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User was exploring pre-seed vs seed options&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;participants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outcome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Decided to target pre-seed first&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key properties:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time-stamped&lt;/strong&gt; — When did this happen?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contextual&lt;/strong&gt; — What was the surrounding situation?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specific&lt;/strong&gt; — Not abstracted into general facts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queryable by time&lt;/strong&gt; — "What did we discuss last week?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Continue where we left off"&lt;/li&gt;
&lt;li&gt;"What did I say about X last time?"&lt;/li&gt;
&lt;li&gt;Audit trails and compliance&lt;/li&gt;
&lt;li&gt;Debugging agent behavior ("Why did it do that?")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Retrieval challenge:&lt;/strong&gt; Episodic memories are often verbose. You can't inject entire conversation transcripts. Systems typically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Store full episodes&lt;/li&gt;
&lt;li&gt;Generate summaries for retrieval&lt;/li&gt;
&lt;li&gt;Fetch full detail only when specifically needed&lt;/li&gt;
&lt;/ol&gt;
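&lt;p&gt;The three-step pattern above can be sketched as a two-tier store: summaries are what get searched and injected, and full transcripts are fetched only on demand. The first-line "summary" here is a placeholder for a real LLM-generated one.&lt;/p&gt;

```python
episodes = {}  # session_id maps to {"summary": ..., "transcript": ...}

def store_episode(session_id, transcript):
    # Placeholder summarization: take the first line of the transcript.
    # A production system would generate this summary with an LLM.
    summary = transcript.split("\n")[0]
    episodes[session_id] = {"summary": summary, "transcript": transcript}

def search_summaries(keyword):
    """Cheap first pass: match against compact summaries only."""
    return [sid for sid, ep in episodes.items() if keyword in ep["summary"]]

def fetch_full(session_id):
    """Expensive second pass: pull the full transcript when detail is needed."""
    return episodes[session_id]["transcript"]
```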




&lt;h2&gt;
  
  
  2.4 Semantic Memory (Facts &amp;amp; Knowledge)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Abstracted facts, knowledge, and information — divorced from the specific episode where they were learned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human equivalent:&lt;/strong&gt; "Paris is the capital of France." You know this, but you probably don't remember &lt;em&gt;when&lt;/em&gt; you learned it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In AI systems:&lt;/strong&gt; Extracted facts, user profiles, preferences, and knowledge bases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Semantic memory records
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s name is John&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;explicit_statement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;first_seen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-11-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_confirmed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-12-15&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User prefers concise responses without bullet points&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inferred_from_feedback&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;first_seen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-11-15&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key properties:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Abstracted&lt;/strong&gt; — General facts, not tied to specific moments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updateable&lt;/strong&gt; — Facts can change; need conflict resolution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence-scored&lt;/strong&gt; — Some facts are more certain than others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Categorizable&lt;/strong&gt; — Can be organized (preferences, demographics, knowledge)&lt;/li&gt;
&lt;/ul&gt;
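&lt;p&gt;"Updateable" and "confidence-scored" together imply an upsert rule. One reasonable policy (an assumption, not a canonical algorithm): a new observation overwrites an old fact only when its confidence is at least as high, and a lower-confidence re-confirmation of the same value just refreshes the timestamp.&lt;/p&gt;

```python
facts = {}  # (user_id, key) maps to {"value", "confidence", "last_confirmed"}

def upsert_fact(user_id, key, value, confidence, date):
    record = facts.get((user_id, key))
    if record is None or confidence >= record["confidence"]:
        # New fact, or at least as trustworthy as what we had: overwrite.
        facts[(user_id, key)] = {"value": value, "confidence": confidence,
                                 "last_confirmed": date}
    elif record["value"] == value:
        # Lower-confidence re-confirmation of the same value: refresh the date.
        record["last_confirmed"] = date
```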

&lt;p&gt;&lt;strong&gt;The extraction problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Converting episodic to semantic memory requires judgment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Episode: "User said 'I just started at Microsoft last week, 
          really excited about the AI team there'"

Extracted semantic memories:
- User works at Microsoft (high confidence)
- User is on the AI team (high confidence)  
- User started recently (medium confidence — "last week" will become stale)
- User is excited about their job (medium confidence — emotional state may change)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What to extract? At what confidence? This is where LLM-based extraction comes in — and where systems differ in sophistication.&lt;/p&gt;




&lt;h2&gt;
  
  
  2.5 Procedural Memory (How to Do Things)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Memory of how to perform tasks, skills, and behaviors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human equivalent:&lt;/strong&gt; How to ride a bike, how to write code, how to negotiate. You don't consciously recall steps — you just &lt;em&gt;do&lt;/em&gt; it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In AI systems:&lt;/strong&gt; This is the trickiest category. Current implementations include:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;How It Works&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bake procedures into model weights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Few-shot examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Store examples of correct behavior, inject when relevant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool configurations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Remember how to use specific tools/APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Workflow templates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Store successful action sequences&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Procedural memory example
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;procedural&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deploy_langraph_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Validate entry point exists&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Check dependencies in requirements.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build Docker container&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Push to registry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Update deployment config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;learned_from&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_xyz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.94&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it's hard for LLMs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LLMs don't truly "learn" procedures from memory injection — they follow instructions. Procedural memory in current systems is really:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Storing successful patterns&lt;/li&gt;
&lt;li&gt;Retrieving relevant patterns&lt;/li&gt;
&lt;li&gt;Injecting them as instructions/examples&lt;/li&gt;
&lt;/ol&gt;
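&lt;p&gt;The store/retrieve/inject loop can be sketched as: keep the highest-success-rate procedure per skill, and render its steps into the prompt as numbered instructions. Field names follow the example record above; the prompt format itself is an assumption.&lt;/p&gt;

```python
procedures = {}  # skill name maps to its best procedural record

def store_procedure(record):
    """Keep only the best-performing procedure for each skill."""
    current = procedures.get(record["skill"])
    if current is None or record["success_rate"] > current["success_rate"]:
        procedures[record["skill"]] = record

def inject_as_instructions(skill):
    """Render stored steps as instructions the LLM can follow."""
    record = procedures.get(skill)
    if record is None:
        return ""
    lines = ["When asked to {}, follow these steps:".format(skill)]
    for i, step in enumerate(record["steps"], start=1):
        lines.append("{}. {}".format(i, step))
    return "\n".join(lines)
```

Note that nothing here changes the model itself; the "skill" improves only because better instructions get injected over time.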

&lt;p&gt;True procedural learning would require weight updates (fine-tuning), which can't happen at runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agents that repeat similar tasks should get better over time&lt;/li&gt;
&lt;li&gt;Learning user-specific workflows ("When I say 'deploy', run these 5 steps")&lt;/li&gt;
&lt;li&gt;Tool use patterns ("This API needs auth header X, learned from past failures")&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2.6 The Memory Hierarchy in Cognitive Architectures
&lt;/h2&gt;

&lt;p&gt;Now let's see how these types compose in a complete system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────┐
│                        Agent Runtime                            │
│                                                                 │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                   Working Memory                           │ │
│  │                   (Context Window)                         │ │
│  │                                                            │ │
│  │  • Current conversation                                    │ │
│  │  • Retrieved long-term memories                            │ │
│  │  • Active task state                                       │ │
│  └────────────────────────────────────────────────────────────┘ │
│                              ↑↓                                 │
│         ┌────────────────────┴───────────────────┐              │
│         │           Retrieval Layer              │              │
│         │   (Semantic search, filtering, ranking)│              │
│         └────────────────────┬───────────────────┘              │
│                              ↓                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                   Long-Term Memory                       │   │
│  │                                                          │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │   │
│  │  │  Episodic   │  │  Semantic   │  │ Procedural  │       │   │
│  │  │             │  │             │  │             │       │   │
│  │  │ • Sessions  │  │ • Facts     │  │ • Skills    │       │   │
│  │  │ • Events    │  │ • Prefs     │  │ • Patterns  │       │   │
│  │  │ • Logs      │  │ • Knowledge │  │ • Workflows │       │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘       │   │
│  │                                                          │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Input arrives&lt;/strong&gt; → Goes into working memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval triggers&lt;/strong&gt; → Query long-term stores for relevant memories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context assembly&lt;/strong&gt; → Combine current input + retrieved memories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing&lt;/strong&gt; → LLM generates response using full context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt; → Extract new memories from interaction, persist to long-term&lt;/li&gt;
&lt;/ol&gt;
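&lt;p&gt;The five steps above can be compressed into one turn-handling function. This is a deliberately naive sketch: retrieval is keyword overlap instead of semantic search, and &lt;code&gt;generate&lt;/code&gt; is a caller-supplied stand-in for the LLM call.&lt;/p&gt;

```python
long_term = []  # the long-term store: a list of memory strings

def retrieve_relevant(query, k=2):
    """Step 2: naive keyword retrieval (real systems use semantic search)."""
    scored = [(sum(w in mem.lower() for w in query.lower().split()), mem)
              for mem in long_term]
    scored.sort(reverse=True)
    return [mem for score, mem in scored[:k] if score > 0]

def run_turn(user_message, generate):
    # Steps 1-3: assemble working memory from input plus retrieved memories.
    memories = retrieve_relevant(user_message)
    context = "\n".join(memories + [user_message])
    # Step 4: the LLM generates a response from the assembled context.
    reply = generate(context)
    # Step 5: persist the new interaction for future retrieval.
    long_term.append(user_message)
    return reply
```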

&lt;p&gt;&lt;strong&gt;Different systems prioritize different memory types:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Primary Focus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT Memory&lt;/td&gt;
&lt;td&gt;Semantic (user facts/preferences)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Memory&lt;/td&gt;
&lt;td&gt;Semantic + light Episodic (cross-conversation recall)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph Checkpointers&lt;/td&gt;
&lt;td&gt;Episodic (workflow state)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mem0&lt;/td&gt;
&lt;td&gt;Semantic (extracted facts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aegis Memory&lt;/td&gt;
&lt;td&gt;Semantic + Episodic + Scoped (multi-agent aware)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Memory Type Decision Framework
&lt;/h2&gt;

&lt;p&gt;When designing memory for your application, ask:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;If Yes → Memory Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Do I need to recall &lt;em&gt;what happened&lt;/em&gt; at a specific time?&lt;/td&gt;
&lt;td&gt;Episodic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Do I need to recall &lt;em&gt;facts&lt;/em&gt; about the user/world?&lt;/td&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Do I need the agent to &lt;em&gt;improve at tasks&lt;/em&gt; over time?&lt;/td&gt;
&lt;td&gt;Procedural&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Do I just need continuity within a session?&lt;/td&gt;
&lt;td&gt;Working memory (context) is enough&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
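&lt;p&gt;The framework can be folded into a small helper that composes an answer for a given set of requirements. The flag names are assumptions mirroring the table rows above.&lt;/p&gt;

```python
def choose_memory_types(recall_events=False, recall_facts=False,
                        improve_at_tasks=False):
    """Map the framework's yes/no questions to the memory types to build."""
    types = []
    if recall_events:
        types.append("episodic")
    if recall_facts:
        types.append("semantic")
    if improve_at_tasks:
        types.append("procedural")
    if not types:
        # Continuity within a single session needs no external store.
        types.append("working memory only")
    return types
```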

&lt;p&gt;&lt;strong&gt;Most applications need a combination:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Personal Assistant:
├── Semantic (user preferences, facts) — Primary
├── Episodic (recent conversations) — Secondary
└── Procedural (user's common workflows) — Nice to have

Research Agent:
├── Episodic (what sources were checked) — Primary
├── Semantic (extracted findings) — Primary
└── Procedural (search strategies that work) — Secondary

Multi-Agent System:
├── Semantic (shared knowledge base) — Primary
├── Episodic (handoff history, who did what) — Primary
├── Scoped access (who can see what) — Critical ← This is the gap
└── Procedural (team coordination patterns) — Secondary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Module 2 Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Type&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;th&gt;AI Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Working (Short-term)&lt;/td&gt;
&lt;td&gt;Single session&lt;/td&gt;
&lt;td&gt;Active context&lt;/td&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic&lt;/td&gt;
&lt;td&gt;Permanent&lt;/td&gt;
&lt;td&gt;Specific events&lt;/td&gt;
&lt;td&gt;Conversation logs, event stores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;Permanent&lt;/td&gt;
&lt;td&gt;Abstracted facts&lt;/td&gt;
&lt;td&gt;Knowledge bases, user profiles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Procedural&lt;/td&gt;
&lt;td&gt;Permanent&lt;/td&gt;
&lt;td&gt;Skills, behaviors&lt;/td&gt;
&lt;td&gt;Few-shot examples, fine-tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Most "memory" products today focus on &lt;strong&gt;semantic memory&lt;/strong&gt; (facts extraction). &lt;strong&gt;Episodic&lt;/strong&gt; is often just raw logs without smart retrieval. &lt;strong&gt;Procedural&lt;/strong&gt; is largely unsolved at runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The multi-agent gap:&lt;/strong&gt; None of the standard memory types address &lt;strong&gt;who can access what&lt;/strong&gt;. In multi-agent systems, you need memory scoping — which isn't a type of memory, but a &lt;strong&gt;property&lt;/strong&gt; that cuts across all types.&lt;/p&gt;




&lt;p&gt;In the next &lt;strong&gt;Part&lt;/strong&gt;, we'll cover &lt;strong&gt;how products implement memory today&lt;/strong&gt;: how leading AI systems, such as Claude's memory and ChatGPT's memory, are designed, why they work well for chat-centric products, and why those same approaches break down for developer-focused use cases.&lt;/p&gt;




</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Part 1: Foundations - Why Memory Matters in AI</title>
      <dc:creator>Arulnidhi Karunanidhi</dc:creator>
      <pubDate>Mon, 09 Feb 2026 15:59:57 +0000</pubDate>
      <link>https://dev.to/arulnidhi_karunanidhi_7ff/foundations-why-memory-matters-in-ai-36np</link>
      <guid>https://dev.to/arulnidhi_karunanidhi_7ff/foundations-why-memory-matters-in-ai-36np</guid>
      <description>&lt;h2&gt;
  
  
  1.1 The Stateless Nature of LLMs
&lt;/h2&gt;

&lt;p&gt;Let's start with a truth that seems counterintuitive when you're chatting with Claude or ChatGPT:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Large Language Models have no memory.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every single time you send a message, the model starts completely fresh. It has no idea who you are, what you discussed before, or what preferences you have. It's like talking to someone with total amnesia: every conversation begins at zero.&lt;/p&gt;

&lt;p&gt;But the conversation feels continuous, right?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trick: Your entire conversation history is sent with every message.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you send "Can you explain that differently?", what actually reaches the model is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[System prompt: You are Claude, made by Anthropic...]
[Message 1: User asked about Python decorators]
[Message 2: Claude explained decorators with examples]
[Message 3: User says "Can you explain that differently?"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model reads everything, generates a response, and then &lt;strong&gt;forgets everything&lt;/strong&gt;. The next time you send a message, the whole history is sent again.&lt;/p&gt;

&lt;p&gt;This is what we mean by &lt;strong&gt;stateless&lt;/strong&gt;: the model itself stores nothing between calls. All "memory" is an illusion created by passing context back and forth.&lt;/p&gt;
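&lt;p&gt;The loop above can be sketched in a few lines of Python. Note that &lt;code&gt;call_model&lt;/code&gt; is a hypothetical stub standing in for any real LLM API; the point is that the caller, not the model, keeps the history and re-sends all of it on every call:&lt;/p&gt;

```python
def call_model(messages):
    # Hypothetical stand-in for a real LLM API call; the model only
    # ever sees exactly what it is handed in `messages`.
    return f"(response based on {len(messages)} messages)"

history = [{"role": "system", "content": "You are a helpful assistant."}]

def send(user_text):
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the ENTIRE history goes in, every time
    history.append({"role": "assistant", "content": reply})
    return reply

send("Explain Python decorators")
send("Can you explain that differently?")
# By the second call, the model received the system prompt plus all
# four earlier messages alongside the new one.
print(len(history))  # 5
```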




&lt;h2&gt;
  
  
  1.2 Context Windows: The Illusion of Memory
&lt;/h2&gt;

&lt;p&gt;The "context window" is the maximum amount of text a model can process in a single call. Think of it as the model's working memory, everything it can "see" at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context window sizes (as of late 2025):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 3.5 Sonnet&lt;/td&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4&lt;/td&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 1.5 Pro&lt;/td&gt;
&lt;td&gt;2M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A token is roughly ¾ of a word. So 200K tokens ≈ 150,000 words ≈ a 500-page book.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This sounds huge. So what's the problem?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three issues:&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue 1: Cost
&lt;/h3&gt;

&lt;p&gt;Every token you send costs money. If you're building an application with 1,000 users, each sending 10 messages per day, and you're stuffing 50K tokens of history into each call...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,000 users × 10 messages × 50K tokens = 500M input tokens/day
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At Claude's pricing ($3/1M input tokens for Sonnet), that's &lt;strong&gt;$1,500/day&lt;/strong&gt; just on input tokens. And that's before the model generates any output.&lt;/p&gt;
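&lt;p&gt;That arithmetic is easy to check. The user count, message volume, and per-call token load are the illustrative numbers from above, not measurements:&lt;/p&gt;

```python
# Back-of-envelope cost of context stuffing, using the figures above.
users, msgs_per_day, tokens_per_call = 1_000, 10, 50_000
price_per_million = 3.00  # USD per 1M input tokens (Sonnet-class pricing)

daily_input_tokens = users * msgs_per_day * tokens_per_call
daily_cost = daily_input_tokens / 1_000_000 * price_per_million
print(daily_input_tokens, daily_cost)  # 500000000 1500.0
```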

&lt;h3&gt;
  
  
  Issue 2: Latency
&lt;/h3&gt;

&lt;p&gt;More tokens = slower responses. The model has to process everything you send before generating the first word of its response. With 100K tokens of context, you might wait 5-10 seconds before seeing any output.&lt;/p&gt;

&lt;h3&gt;
  
  
  Issue 3: The "Lost in the Middle" Problem
&lt;/h3&gt;

&lt;p&gt;Research has shown that LLMs pay more attention to the &lt;strong&gt;beginning&lt;/strong&gt; and &lt;strong&gt;end&lt;/strong&gt; of their context window, and less attention to the middle. If you stuff a 200K context window full of conversation history, the model might miss important details from 3 hours ago that are buried in the middle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Beginning - High Attention]
...
[Middle - Lower Attention] ← Important detail about user's project here
...
[End - High Attention]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is why "just use a bigger context window" isn't a complete solution.&lt;/p&gt;




&lt;h2&gt;
  
  
  1.3 The Forgetting Problem: What Happens After 128K Tokens?
&lt;/h2&gt;

&lt;p&gt;Let's make this concrete.&lt;/p&gt;

&lt;p&gt;Imagine you're building a personal assistant that helps a user over weeks or months. They discuss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Their job (software engineer at a fintech startup)&lt;/li&gt;
&lt;li&gt;Their preferences (likes concise answers, hates bullet points)&lt;/li&gt;
&lt;li&gt;Their projects (building a recommendation engine)&lt;/li&gt;
&lt;li&gt;Their schedule (busy Mondays, prefers async communication)&lt;/li&gt;
&lt;li&gt;Hundreds of small details mentioned in passing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After a few weeks of daily use, you have &lt;strong&gt;millions of tokens&lt;/strong&gt; of conversation history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What do you do?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: Truncate (Delete Old Messages)
&lt;/h3&gt;

&lt;p&gt;Just keep the most recent N messages. Simple, but brutal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Day 1: User mentions they're allergic to shellfish
Day 2-30: Various conversations
Day 31: User asks for dinner recommendations
Assistant: "How about this great lobster restaurant?" 💀
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model forgot because you deleted the context where the allergy was mentioned.&lt;/p&gt;
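&lt;p&gt;A minimal sketch shows why naive truncation fails. The message list and the cutoff of 20 are purely illustrative:&lt;/p&gt;

```python
# Naive truncation: keep only the most recent N messages. Anything
# older, including the shellfish allergy from day 1, is silently lost.
MAX_MESSAGES = 20

def truncate(history):
    return history[-MAX_MESSAGES:]

history = [{"role": "user", "content": "I'm allergic to shellfish"}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(100)]

window = truncate(history)
# The allergy message is no longer anywhere in what the model sees.
print(any("shellfish" in m["content"] for m in window))  # False
```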

&lt;h3&gt;
  
  
  Option B: Summarize
&lt;/h3&gt;

&lt;p&gt;Periodically compress old conversations into summaries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Original (5000 tokens):
- Long conversation about user's job search
- Details about companies they applied to
- Specific concerns about salary negotiation

Summary (200 tokens):
"User is job searching in tech, has applied to several companies, 
concerned about salary negotiation."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better, but you lose nuance. Which companies? What were the specific concerns? Summaries are lossy compression.&lt;/p&gt;
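&lt;p&gt;The lossiness is easy to demonstrate with a toy summarizer. The data and the one-line summary rule here are purely illustrative:&lt;/p&gt;

```python
# Summaries are lossy compression: the gist survives, specifics don't.
original = {
    "topic": "job search",
    "companies": ["Acme", "Globex", "Initech"],
    "concern": "salary negotiation, specifically equity vs base",
}

def summarize(conv):
    # Keeps the broad concern, drops companies and specifics entirely.
    return f"User is job searching; concerned about {conv['concern'].split(',')[0]}."

summary = summarize(original)
print("Acme" in summary)  # False: the company names are gone
```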

&lt;h3&gt;
  
  
  Option C: Extract and Store
&lt;/h3&gt;

&lt;p&gt;Pull out key facts and store them separately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Facts extracted:
- User works at: TechCorp (software engineer)
- User preference: concise answers
- User allergy: shellfish
- User project: recommendation engine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the foundation of what memory systems like mem0 do. But now you need a system to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Decide what's worth extracting&lt;/li&gt;
&lt;li&gt;Store it somewhere&lt;/li&gt;
&lt;li&gt;Retrieve relevant facts for each new conversation&lt;/li&gt;
&lt;li&gt;Handle conflicts (user changed jobs, old fact is now wrong)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;This is the memory problem.&lt;/strong&gt; And it's why a whole category of tools exists to solve it.&lt;/p&gt;
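&lt;p&gt;A toy version of those four requirements might look like this. The dictionary-based store and the key names are illustrative only, not how mem0 or any real memory tool is implemented:&lt;/p&gt;

```python
# Toy fact store: selective storage, retrieval, and conflict handling.
facts = {}  # key -> current value; one value per key resolves conflicts

def store_fact(key, value):
    facts[key] = value  # a newer fact simply overwrites the stale one

def retrieve(keys):
    return {k: facts[k] for k in keys if k in facts}

store_fact("employer", "TechCorp")
store_fact("allergy", "shellfish")
store_fact("employer", "FinStart")  # user changed jobs: old fact replaced

print(retrieve(["employer", "allergy"]))
# {'employer': 'FinStart', 'allergy': 'shellfish'}
```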




&lt;h2&gt;
  
  
  1.4 Human Memory vs Machine Memory: A Conceptual Framework
&lt;/h2&gt;

&lt;p&gt;To build good AI memory systems, it helps to understand how human memory actually works. Not because we should copy it exactly, but because it reveals what kinds of memory matter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Human Memory Types
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Sensory Memory&lt;/strong&gt; (milliseconds)&lt;br&gt;
Raw input from the senses. Mostly irrelevant for AI; the closest analogue is the stream of tokens before they're processed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short-Term / Working Memory&lt;/strong&gt; (seconds to minutes)&lt;br&gt;
What you're actively thinking about right now. Limited capacity — humans can hold about 7±2 items.&lt;/p&gt;

&lt;p&gt;For LLMs: This is the &lt;strong&gt;context window&lt;/strong&gt;. What the model can "see" in a single call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-Term Memory&lt;/strong&gt; — This is where it gets interesting:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;What It Stores&lt;/th&gt;
&lt;th&gt;Human Example&lt;/th&gt;
&lt;th&gt;AI Equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Episodic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specific events&lt;/td&gt;
&lt;td&gt;"Last Tuesday's meeting"&lt;/td&gt;
&lt;td&gt;Conversation logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Facts &amp;amp; knowledge&lt;/td&gt;
&lt;td&gt;"Paris is in France"&lt;/td&gt;
&lt;td&gt;Extracted facts, knowledge bases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Procedural&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How to do things&lt;/td&gt;
&lt;td&gt;Riding a bike&lt;/td&gt;
&lt;td&gt;Fine-tuned behaviors, tool usage patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  The Key Insight
&lt;/h3&gt;

&lt;p&gt;Humans don't remember everything. We:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consolidate&lt;/strong&gt; — Important things move from short-term to long-term&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forget&lt;/strong&gt; — Unimportant things decay&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reconstruct&lt;/strong&gt; — We don't replay memories perfectly; we rebuild them from fragments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Associate&lt;/strong&gt; — Memories connect to each other (one memory triggers another)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good AI memory systems need similar properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not everything should be stored (selective extraction)&lt;/li&gt;
&lt;li&gt;Old irrelevant memories should fade (decay/relevance scoring)&lt;/li&gt;
&lt;li&gt;Retrieval should be associative, not just keyword-based (semantic search)&lt;/li&gt;
&lt;li&gt;Memory should be reconstructible from fragments (summarization + facts)&lt;/li&gt;
&lt;/ul&gt;
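&lt;p&gt;Decay plus relevance can be sketched as a simple half-life weighting. The 30-day half-life and the &lt;code&gt;similarity&lt;/code&gt; inputs are assumptions for illustration; a real system would derive similarity from embeddings:&lt;/p&gt;

```python
# Relevance-with-decay scoring: older memories fade unless strongly
# related to the query. HALF_LIFE_DAYS is an assumed tuning parameter.
HALF_LIFE_DAYS = 30

def decay_weight(age_days):
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def score(similarity, age_days):
    return similarity * decay_weight(age_days)

# A highly relevant month-old memory can still outrank a weakly
# relevant brand-new one.
print(score(0.9, 30) > score(0.3, 0))  # True
```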
&lt;h3&gt;
  
  
  The Gap
&lt;/h3&gt;

&lt;p&gt;Here's what current LLM products (Claude's memory, ChatGPT's memory) give you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Preferences ✓ (semantic memory)
Key Facts ✓ (semantic memory)
Conversation Recall ✓ (episodic memory, limited)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what they &lt;strong&gt;don't&lt;/strong&gt; handle well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Multi-agent shared memory ✗
Memory scoping (who knows what) ✗
Memory validation (is this fact still true?) ✗
Procedural memory for agents ✗
Memory across applications ✗
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gap is exactly where developer-facing memory tools (mem0, Supermemory, Aegis Memory) come in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Module 1 Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Key Takeaway&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Stateless LLMs&lt;/td&gt;
&lt;td&gt;Models remember nothing; context is re-sent every call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Windows&lt;/td&gt;
&lt;td&gt;Limited size, costly, slow, attention problems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The Forgetting Problem&lt;/td&gt;
&lt;td&gt;Can't keep everything; need selective storage &amp;amp; retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Types&lt;/td&gt;
&lt;td&gt;Episodic (events), Semantic (facts), Procedural (skills)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The Gap&lt;/td&gt;
&lt;td&gt;Product memory ≠ Agent/Developer memory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;In Part 2, we'll answer a question that trips up most developers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the difference between episodic and semantic memory, and why does it matter for your agent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We'll build a complete taxonomy mapping human memory research to AI implementation. You'll learn why LLMs can fake most memory types but struggle with one critical category: the same one that multi-agent systems need most.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How to Add Persistent Memory to CrewAI Agents</title>
      <dc:creator>Arulnidhi Karunanidhi</dc:creator>
      <pubDate>Sat, 07 Feb 2026 07:52:33 +0000</pubDate>
      <link>https://dev.to/arulnidhi_karunanidhi_7ff/how-to-add-persistent-memory-to-crewai-agents-2o63</link>
      <guid>https://dev.to/arulnidhi_karunanidhi_7ff/how-to-add-persistent-memory-to-crewai-agents-2o63</guid>
      <description>&lt;h2&gt;
  
  
  What is Aegis Memory?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/quantifylabs/aegis-memory" rel="noopener noreferrer"&gt;Aegis Memory&lt;/a&gt; is an open-source memory layer built specifically for multi-agent AI systems. It solves the "amnesia problem" where AI agents forget everything between sessions.&lt;/p&gt;

&lt;p&gt;Unlike built-in framework memory (which is session-only), Aegis provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent storage&lt;/strong&gt; that survives restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search&lt;/strong&gt; across all memories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scoped access control&lt;/strong&gt; (private, shared, global)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-improvement patterns&lt;/strong&gt; (ACE) for agents that learn over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this tutorial, you'll learn how to integrate Aegis Memory with CrewAI in under 10 minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;You build a CrewAI agent. It works great. Then you restart it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Run 1
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My name is Alex, I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m a Python developer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Agent: "Nice to meet you, Alex!"
&lt;/span&gt;
&lt;span class="c1"&gt;# Run 2 (new session)
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s my name?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Agent: "I don't know your name."
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything your agent learned? Gone.&lt;/p&gt;

&lt;p&gt;This isn't a bug; it's how LLMs work. Context windows reset. Sessions end. Memory disappears.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But multi-agent systems need persistent memory.&lt;/strong&gt; Agents that remember users. Teams that share knowledge. Systems that learn from mistakes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: Aegis Memory
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/quantifylabs/aegis-memory" rel="noopener noreferrer"&gt;Aegis Memory&lt;/a&gt; is an open-source memory layer for multi-agent systems. It gives your CrewAI agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory&lt;/strong&gt; that survives restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search&lt;/strong&gt; (query by meaning, not keywords)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scoped access&lt;/strong&gt; (private, shared, or global memories)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-improvement&lt;/strong&gt; through ACE patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's add it to a CrewAI project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aegis-memory crewai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the Aegis server (requires Docker):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/quantifylabs/aegis-memory.git
&lt;span class="nb"&gt;cd &lt;/span&gt;aegis-memory
docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Basic Integration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aegis_memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AegisClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;

&lt;span class="c1"&gt;# Connect to Aegis
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AegisClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8741&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store a memory
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User prefers concise responses and dark mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-private&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Query memories semantically
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s preferences?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# Output: "User prefers concise responses and dark mode"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 3: CrewAI Integration
&lt;/h2&gt;

&lt;p&gt;Here's a research crew with persistent memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aegis_memory.integrations.crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AegisCrewMemory&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize Aegis memory for CrewAI
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AegisCrewMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8741&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create agents with shared memory
&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research Analyst&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find accurate information on given topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expert at gathering and analyzing data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content Writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write clear, engaging content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Skilled at transforming research into readable content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Same memory instance = shared knowledge
&lt;/span&gt;    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define tasks
&lt;/span&gt;&lt;span class="n"&gt;research_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the latest trends in AI agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summary of key trends&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;writing_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a blog post based on the research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Draft blog post&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create and run the crew
&lt;/span&gt;&lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;research_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writing_task&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happens:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Researcher finds information → stored in shared memory&lt;/li&gt;
&lt;li&gt;Writer queries memory → gets researcher's findings&lt;/li&gt;
&lt;li&gt;Next run → both agents remember what they learned&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  How Memory Scopes Work
&lt;/h2&gt;

&lt;p&gt;Aegis provides three scopes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjpoy6t22qyr60k7bxc0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwjpoy6t22qyr60k7bxc0.png" alt="How Aegis Memory scopes works" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Private: only the researcher can see this
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Internal analysis notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-private&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Shared: researcher and writer can see this
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Key finding: AI agent market growing 40% YoY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-shared&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;shared_with_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Global: all agents can see this
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Company policy: always cite sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
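&lt;p&gt;The scope rules above boil down to a visibility check at retrieval time. Here's a minimal plain-Python sketch of that filtering logic — this mimics the &lt;code&gt;scope&lt;/code&gt; semantics shown above, not the actual Aegis internals:&lt;/p&gt;

```python
# Illustration only: how scoped visibility can be resolved at read time.
# This is NOT the Aegis API -- just the filtering rule the scopes imply.

def visible_to(memory: dict, agent_id: str) -> bool:
    """Return True if `agent_id` may read this memory record."""
    scope = memory["scope"]
    if scope == "global":        # every agent can read global memories
        return True
    if scope == "agent-shared":  # the owner plus an explicit share list
        return (agent_id == memory["agent_id"]
                or agent_id in memory.get("shared_with_agents", []))
    return agent_id == memory["agent_id"]  # private: owner only

store = [
    {"content": "Key finding: AI agent market growing 40% YoY",
     "agent_id": "researcher", "scope": "agent-shared",
     "shared_with_agents": ["writer"]},
    {"content": "Company policy: always cite sources",
     "agent_id": "admin", "scope": "global"},
]

writer_view = [m["content"] for m in store if visible_to(m, "writer")]
# The writer sees both records; an unrelated agent sees only the global one.
```

&lt;p&gt;The point of making scope explicit at write time is that reads become a cheap filter instead of an access-control decision scattered across agents.&lt;/p&gt;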






&lt;h2&gt;
  
  
  Why Not Just Use CrewAI's Built-in Memory?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsm1ta4srp6imt22mz22y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsm1ta4srp6imt22mz22y.png" alt="Crew AI Built-in memory vs Aegis Memory" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CrewAI's built-in memory is designed around a single session; Aegis is built for production systems where memory has to persist across sessions and be shared across agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Once you have persistent memory working, you can add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory voting&lt;/strong&gt;: Track which memories help vs harm task completion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reflections&lt;/strong&gt;: Store lessons from failures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playbooks&lt;/strong&gt;: Query proven strategies before starting tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are called &lt;a href="https://www.aegismemory.com/blog/self-improving-ai-agents-ace-pattern/" rel="noopener noreferrer"&gt;ACE patterns&lt;/a&gt; — they help agents improve over time.&lt;/p&gt;
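&lt;p&gt;Memory voting, for example, can start as nothing more than a score per memory: upvote memories retrieved before successful tasks, downvote them after failures, and flag low scorers for pruning. A hypothetical sketch (my own illustration, not the Aegis API):&lt;/p&gt;

```python
from collections import defaultdict

class MemoryVotes:
    """Hypothetical helper: track whether retrieved memories help or harm tasks."""

    def __init__(self):
        self.scores = defaultdict(int)

    def record_outcome(self, retrieved_ids, task_succeeded):
        # Upvote every memory used in a successful task, downvote on failure.
        delta = 1 if task_succeeded else -1
        for mem_id in retrieved_ids:
            self.scores[mem_id] += delta

    def prune_candidates(self, threshold=-2):
        # Memories that consistently precede failures become prune candidates.
        return [m for m, s in self.scores.items() if s <= threshold]

votes = MemoryVotes()
votes.record_outcome(["m1", "m2"], task_succeeded=True)
votes.record_outcome(["m2"], task_succeeded=False)
votes.record_outcome(["m2"], task_succeeded=False)
votes.record_outcome(["m2"], task_succeeded=False)
# m1 ends at +1; m2 ends at -2 and becomes a prune candidate.
```

&lt;p&gt;Reflections and playbooks build on the same loop: the outcome signal you record here is what tells the agent which lessons and strategies are worth writing down.&lt;/p&gt;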




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/quantifylabs/aegis-memory" rel="noopener noreferrer"&gt;github.com/quantifylabs/aegis-memory&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://docs.aegismemory.com" rel="noopener noreferrer"&gt;docs.aegismemory.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full tutorial&lt;/strong&gt;: &lt;a href="https://www.aegismemory.com/blog/add-persistent-memory-to-crewai/" rel="noopener noreferrer"&gt;aegismemory.com/blog/add-persistent-memory-to-crewai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built something cool with Aegis Memory? Drop a comment; I'd love to see it! :)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
