Chaitrali Kakde
Building Smarter AI: Types of Memory and How to Make Your Agent Remember

I’ve worked with multiple AI agents and noticed a common issue: they forget.

No matter how advanced the model is, most AI agents lose context fast. They forget earlier parts of a conversation, start hallucinating, or respond with completely irrelevant answers. This doesn’t just frustrate users; it breaks the illusion of intelligence and consistency that makes AI feel “real.”

In this blog, we’ll break down what memory truly means for AI systems, why it’s so difficult to implement, and explore the best tools, frameworks, and approaches available today for adding reliable, long-term memory to a voice-based AI agent.

Why Memory Matters

Imagine you’re talking to an assistant that forgets everything you said 30 seconds ago. That’s what most LLMs are doing today.

While these models can generate amazing text, they don’t actually remember anything: they just process what’s inside the current context window (the tokens you send in a single API call). Once that context is gone, the model forgets everything.

This becomes a real problem when you want your agent to:

  • Remember user preferences across sessions
  • Learn from past interactions
  • Maintain long conversations (especially in voice agents)
  • Reduce hallucinations and irrelevant answers

In short, memory is what makes intelligence feel human.

The Real Problem — Context Limits

Large language models (LLMs) like GPT, Claude, or Gemini have something called a context window — a limit on how much information they can “see” at once.

For example:

  • GPT-4 Turbo supports up to 128k tokens (roughly 300 pages of text)
  • Claude 3.5 Sonnet supports a 200k-token context window

That might sound like a lot, but once your agent handles long conversations, retrieves documents, or processes voice transcripts, that limit gets hit quickly. When the context runs out, older parts of the conversation get trimmed, meaning your AI literally forgets earlier turns.

This forgetfulness often leads to hallucinations, because the model tries to guess what it doesn’t remember.
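
To make the trimming concrete, here is a toy sketch of how a fixed token budget forces the oldest turns out of context. The word-count “tokenizer” and the budget are invented for illustration; real systems count tokens with the model’s actual tokenizer.

```python
# Toy sketch: approximate tokens as words and drop the oldest turns
# until the conversation fits the budget.
def trim_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    trimmed = list(messages)
    while len(trimmed) > 1 and sum(len(m.split()) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # the earliest turn is the first to be "forgotten"
    return trimmed

history = ["my name is Sam", "I prefer aisle seats", "book me a flight to Pune"]
print(trim_to_budget(history, max_tokens=9))  # → ['book me a flight to Pune']
```

Notice that the user’s name is one of the first things to go, which is exactly why the model starts guessing.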

Types of Memory in AI

When we talk about “AI memory,” we’re actually talking about three kinds of memory systems — similar to how humans process information.

  1. Short-Term Memory: This is what the model sees in its current context window. It’s fast but temporary — once the session ends, it’s gone.

  2. Working Memory: This is the in-process “scratchpad” an agent uses to reason or plan during a single task. For example, if it’s generating steps to complete a workflow, this working memory helps it stay organized.

  3. Long-Term Memory: This is what gives agents continuity. Long-term memory stores key details, summaries, or embeddings from past sessions and retrieves them later. This is where most of today’s innovation is happening.
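
The three tiers can be modeled in a few lines of purely illustrative Python; the class and field names here are invented, not taken from any framework.

```python
# Illustrative only: short-term memory is bounded like a context window,
# while long-term memory persists whatever we explicitly choose to keep.
class AgentMemory:
    def __init__(self, short_term_limit: int = 4):
        self._limit = short_term_limit
        self.short_term: list[str] = []  # recent turns (the "context window")
        self.working: dict = {}          # scratchpad for the current task
        self.long_term: list[str] = []   # facts kept across sessions

    def observe(self, turn: str) -> None:
        """Add a turn to short-term memory, evicting the oldest if full."""
        self.short_term.append(turn)
        if len(self.short_term) > self._limit:
            self.short_term.pop(0)

    def remember(self, fact: str) -> None:
        """Promote a fact to long-term memory so it survives the session."""
        self.long_term.append(fact)

mem = AgentMemory(short_term_limit=2)
for turn in ["hi", "my name is Sam", "what's the weather?"]:
    mem.observe(turn)
mem.remember("user's name is Sam")
print(mem.short_term)  # the greeting has already been evicted
print(mem.long_term)
```

The interesting design question is the `remember` call: deciding *what* is worth promoting to long-term memory is where real systems differ.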

How Developers Are Solving It

There’s no single “perfect” solution yet, but several promising approaches exist.

1. Vector Databases

Tools like Pinecone, Weaviate, FAISS, or Chroma let you store pieces of past conversations as embeddings — mathematical representations of meaning.

When a new query comes in, the system searches for similar embeddings to bring back relevant memories.

Example flow:

  • Convert each message or summary into an embedding
  • Store it in a vector database
  • Retrieve top-N similar chunks before each new prompt
  • Append them to the model input

This method is popular because it’s scalable and model-agnostic.
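
The flow above can be sketched end to end without any external services. This toy version uses a bag-of-words “embedding” and cosine similarity; a real setup would use a trained embedding model and one of the vector databases listed above.

```python
import math

def embed(text: str, vocab: list[str]) -> list[int]:
    """Toy embedding: count how often each vocabulary word appears."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a: list[int], b: list[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocab = ["flight", "seat", "aisle", "hotel", "vegan"]
memories = ["user prefers aisle seat", "user is vegan", "user booked a hotel"]
index = [(m, embed(m, vocab)) for m in memories]  # the "store" step

query = "what seat does the user like"
q_vec = embed(query, vocab)
# The "retrieve top-N" step: rank stored memories by similarity to the query
ranked = sorted(index, key=lambda pair: cosine(q_vec, pair[1]), reverse=True)
print(ranked[0][0])  # → user prefers aisle seat
```

The top-ranked memories would then be appended to the model input before the new prompt, exactly as in the flow above.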

2. Memory Frameworks

Several frameworks make it easier to plug memory into LLM-based agents:

  • LangChain Memory Components — includes ConversationBufferMemory, VectorStoreMemory, and hybrid memory options.
  • LlamaIndex (formerly GPT-Index) — provides retrieval and summarization tools for long-term memory.
  • Mem0 — an open-source, dynamic memory layer designed for AI agents. It combines semantic search with time-based retention logic.
  • OpenAI Assistants API (Threads) — supports persistent sessions through “thread IDs,” which keep conversation history on OpenAI’s side.

Each of these takes a different approach, but the goal is the same — give the agent continuity across interactions.

3. Custom Implementations

You can also build your own lightweight memory layer using:

  • Redis for caching short-term history
  • PostgreSQL for structured, long-term storage
  • JSON logs + embeddings for minimal setups

Developers often mix summarization + embedding to balance performance and memory size.
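
As a sketch of the minimal “JSON logs” option, here is a tiny append-only memory layer; the file format, schema, and class name are invented for illustration.

```python
import json
import os
import tempfile

class JsonMemory:
    """Minimal append-only memory store backed by a JSON-lines file."""
    def __init__(self, path: str):
        self.path = path

    def append(self, user_id: str, fact: str) -> None:
        with open(self.path, "a") as f:
            f.write(json.dumps({"user_id": user_id, "fact": fact}) + "\n")

    def recall(self, user_id: str) -> list[str]:
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            entries = [json.loads(line) for line in f]
        return [e["fact"] for e in entries if e["user_id"] == user_id]

path = os.path.join(tempfile.mkdtemp(), "memory.jsonl")
store = JsonMemory(path)
store.append("sam", "prefers window seats")
store.append("alex", "is vegetarian")
print(store.recall("sam"))  # → ['prefers window seats']
```

Swapping the file for Redis or PostgreSQL changes only the storage backend; the append/recall shape stays the same.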

Implementation: Adding Long-Term Memory with Mem0

Now that we’ve explored the theory behind memory in AI agents, let’s look at a practical way to implement it using Mem0, an open-source platform designed to give AI agents persistent, long-term memory.

In this example, we’ll create a Concierge Voice Agent that remembers returning users and personal details over time.

💡 For a full working example, check out the GitHub repo:
https://github.com/videosdk-live/agents-quickstart/tree/main/Memory

Prerequisites

  • A Mem0 API key, available from the Mem0 dashboard.
  • Ensure your agent environment is set up per the AI Voice Agent Quickstart. This is the baseline app where we'll implement the memory features in the steps below.

Step 1: Create a Dedicated Memory Manager

We’ll start by creating a Mem0MemoryManager class that wraps Mem0’s API.
It handles three core operations:

  • Fetching user memories
  • Storing new memories
  • Deciding which messages are worth remembering

Here’s what it looks like:

memory_utils.py

from mem0.client.main import AsyncMemoryClient

class Mem0MemoryManager:
    """Handles all interactions with the Mem0 API."""
    def __init__(self, api_key: str, user_id: str):
        self.user_id = user_id
        self._client = AsyncMemoryClient(api_key=api_key)

    async def fetch_recent_memories(self, limit: int = 5) -> list[str]:
        """Retrieves the most recent memories for the user."""
        try:
            response = await self._client.get_all(filters={"user_id": self.user_id}, limit=limit)
            return [entry.get("memory", "") for entry in response]
        except Exception as e:
            print(f"Error fetching memories: {e}")
            return []

    def should_store(self, user_message: str) -> bool:
        """Determines if a message contains keywords worth remembering."""
        keywords = ("remember", "preference", "my name is", "likes", "dislike")
        return any(keyword in user_message.lower() for keyword in keywords)

    async def record_memory(self, user_message: str, assistant_message: str | None = None):
        """Stores a conversational turn in Mem0."""
        messages = [{"role": "user", "content": user_message}]
        if assistant_message:
            messages.append({"role": "assistant", "content": assistant_message})
        await self._client.add(messages, user_id=self.user_id)


Step 2: Access Memory to Personalize the Agent

Next, we’ll fetch stored memories and inject them into the agent’s system prompt.
This allows the agent to greet users personally and maintain continuity across sessions.

main.py

class MemoryAgent(Agent):
    def __init__(self, instructions: str, remembered_facts: list[str] | None = None):
        self._remembered_facts = remembered_facts or []
        super().__init__(instructions=instructions)

    async def on_enter(self):
        # Use the retrieved facts for a personalized greeting
        if self._remembered_facts:
            top_fact = "; ".join(self._remembered_facts[:2])
            await self.session.say(f"Welcome back! I remember that {top_fact}. What can I help you with?")
        else:
            await self.session.say("Hello! How can I help today?")

# This helper function runs at the start of the session
async def build_agent_instructions(memory_manager: Mem0MemoryManager | None) -> tuple[str, list[str]]:
    base_instructions = "You are a helpful voice concierge..."
    if not memory_manager:
        return base_instructions, []

    # Fetches memories and adds them to the system prompt
    remembered_facts = await memory_manager.fetch_recent_memories()
    if not remembered_facts:
        return base_instructions, []

    memory_lines = "\n".join(f"- {fact}" for fact in remembered_facts)
    enriched_instructions = f"{base_instructions}\n\nKnown details about this caller:\n{memory_lines}"
    return enriched_instructions, remembered_facts


Step 3: Storing New Memories with a Custom Conversation Flow

To capture new facts as they come up, we subclass the conversation flow so that each completed turn is screened and, if relevant, stored in Mem0.

memory_utils.py

from videosdk.agents import Agent, ConversationFlow

class Mem0ConversationFlow(ConversationFlow):
    """A custom flow that records memories after each turn."""
    def __init__(self, agent: Agent, memory_manager: Mem0MemoryManager, **kwargs):
        super().__init__(agent=agent, **kwargs)
        self._memory_manager = memory_manager
        self._pending_user_message: str | None = None

    async def run(self, transcript: str):
        self._pending_user_message = transcript
        # Stream the standard conversation turn through to the caller while
        # keeping a copy of the full response for memory storage
        chunks: list[str] = []
        async for chunk in super().run(transcript):
            chunks.append(chunk)
            yield chunk
        full_response = "".join(chunks)

        # After the response, decide if the turn should be stored in memory
        if self._pending_user_message and self._memory_manager.should_store(self._pending_user_message):
            await self._memory_manager.record_memory(self._pending_user_message, full_response or None)

        self._pending_user_message = None

Step 4: Assembling the Agent Session

Integrate all components in your main application entry point. Initialize your memory manager, use it to build personalized agent instructions, and configure your session with the enhanced conversation flow.

main.py

import os

async def start_session(context: JobContext):
    # 1. Setup memory manager
    memory_manager = Mem0MemoryManager(api_key=os.getenv("MEM0_API_KEY"), user_id="demo-user")

    # 2. Build agent with personalized instructions
    instructions, facts = await build_agent_instructions(memory_manager)
    agent = MemoryAgent(instructions=instructions, remembered_facts=facts)

    # 3. Setup conversation flow with memory capabilities
    conversation_flow = Mem0ConversationFlow(agent=agent, memory_manager=memory_manager, ...)

    # 4. Create the session with the custom flow
    session = AgentSession(
        agent=agent,
        pipeline=pipeline, # your pipeline
        conversation_flow=conversation_flow
    )

    # ... rest of your session and job context setup

Step 5: Run the Agent

main.py

from videosdk.agents import WorkerJob, JobContext, RoomOptions

def make_context() -> JobContext:
    return JobContext(room_options=RoomOptions(name="Concierge Agent", playground=True))

if __name__ == "__main__":
    WorkerJob(entrypoint=start_session, jobctx=make_context).start()

This will initialize the session using your start_session function from Step 4 and keep the worker alive.

Features

  • Persistent Memory: Remembers user preferences and details across conversations
  • Smart Memory Detection: Automatically stores important information based on keywords
  • Personalized Greetings: Welcomes returning users with remembered facts
  • Voice Interface: Full voice conversation with speech-to-text and text-to-speech

💡 We’d Love to Hear From You!

  • Have you tried adding memory to your AI or voice-based agent yet?
  • What challenges did you face while implementing persistent or conversational memory?
  • Are you exploring cascading pipelines, realtime pipelines, or a hybrid memory architecture?
  • How do you think long-term memory will change the way AI voice assistants interact with users?

👉 Share your thoughts, roadblocks, or success stories in the comments or join our Discord community. We’re excited to learn from your journey and help you build even better AI-powered communication tools!
