DEV Community

raghava kotala
raghava kotala

Posted on

Memory in LlamaIndex: Building Intelligent Chat Agents

This blog post is designed for developers building chat applications, AI assistants, or conversational interfaces who want to implement persistent memory capabilities. Whether you're creating customer support bots, personal assistants, or recommendation systems, you'll learn how to make your agents remember user context, preferences, and conversation history. By following the practical examples and code snippets provided, you can implement these memory features in your own LlamaIndex-based applications within hours.

Memory is fundamental to human intelligence—it shapes our identity, enables learning from experiences, and allows us to build meaningful relationships. In traditional software systems, memory has been simply categorized as RAM for immediate processing and disk storage for persistence. When you send a message to a friend, it gets processed in memory and then stored in a database for future reference.

However, the emergence of Large Language Models (LLMs) is transforming how we interact with technology. Users increasingly expect natural, conversational interfaces where they can say "ask Bob if he'd like to go to the Batman movie" and have the system understand context, intent, and nuances. This requires a more sophisticated approach to memory—one that goes beyond simple data storage to understand and remember user preferences, conversation history, and behavioral patterns.

In this blog post, we'll explore how to implement intelligent memory systems in chat agents using LlamaIndex. We'll cover three essential types of memory:

  1. Short-term Memory: Storing recent interactions and conversation context
  2. Static Memory: Maintaining persistent user profile information
  3. Dynamic Memory: Extracting and storing user behavioral patterns and preferences

By the end of this post, you'll understand how to create chat agents that remember who your users are, what they've discussed, and how they prefer to interact—making every conversation feel more personal and contextually aware.

Types of Memory in LlamaIndex

LlamaIndex provides a flexible framework for implementing various types of memory in chat agents. Let's delve into the three key memory components you can leverage:

1. Short-term Memory: Memory.from_defaults

Short-term memory is crucial for keeping track of recent interactions and maintaining conversation context. The Memory.from_defaults class in LlamaIndex allows you to store information on a per-user basis, using a unique user ID (UID) for retrieval. By default, this information is stored in an in-memory database, but you can configure it to use any database URI of your choice.

Memory retrieval example)

2. Static Memory: StaticMemoryBlock

Static memory is used to store persistent user profile information, such as name, age, and location. Instead of embedding this information in the system prompt (which is not designed for user-specific data), you can use the StaticMemoryBlock. This dedicated memory block can store system instructions, user pre-defined data, and other information that should be readily available during a conversation.

Static memory block example

3. Dynamic Memory: FactExtractionMemoryBlock

Dynamic memory enables the extraction and storage of user behavioral patterns and preferences. For instance, one user may prefer responses with emojis, another may prefer technical details, and yet another may enjoy jokes. The FactExtractionMemoryBlock in LlamaIndex allows you to extract important information from conversations using prompts and store it as facts.

Fact extraction example

Dynamic memory is particularly powerful because it adapts to individual user preferences over time. However, as of now, LlamaIndex does not provide built-in support for storing these facts in a database—you'll need to implement this functionality yourself.

Memory Types Comparison

Here's a quick comparison of when to use each memory type:

Scenario Short-term Memory Static Memory Dynamic Memory
Purpose Store recent interactions and conversation context Maintain persistent user profile information Extract and store user behavioral patterns and preferences
Customer Support Bots ✅ Must have - track order ID, status mentioned in conversation ⚠️ Might be useful ⚠️ Can be useful for personalizing responses
Personal Assistant Bots ✅ Must have - needs conversation context ✅ Must have - needs user details ✅ Could enhance experience by personalizing interactions
E-commerce Recommendation Bots ✅ Must have - needs conversation context ✅ Must have - needs user details ✅ Must have - needs user preferences

Example: Memory in Action

For example, if a user says, "Hi! My name is raghav," the system switches to that user and the following gets added to the user message:

<memory>
<core_info>
I am raghav, a 30-year-old Data Scientist from Bangalore.
</core_info>
<extracted_info>
<fact>The user's name is Raghav.</fact>
</extracted_info>
</memory>
Enter fullscreen mode Exit fullscreen mode

Here, <core_info> represents static information about the user, and <extracted_info> contains the facts extracted from the conversation. Though the fact is already present in static memory, this is just for demonstration purposes—we can specify instructions to extract other information like preferences, hobbies, etc.

Whatever follow-up messages from the user will have this memory block appended to it. Below we can see the response from the bot:

Bot response with memory

Complete Implementation

Here's a complete working example that demonstrates all three memory types:

"""
LlamaIndex Memory Blog - Chat Agent with Persistent Memory
Demonstrates fact extraction and SQLite persistence
"""

import asyncio
import sqlite3
import json
import os
from llama_index.core.memory import StaticMemoryBlock, FactExtractionMemoryBlock, Memory
from llama_index.core.agent.workflow import FunctionAgent
import model
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# =============================================================================
# CONFIGURATION
# =============================================================================

# LLM Configuration
LLM = OpenAI()
EMBED_MODEL = OpenAIEmbedding()

# User specific data:
user_data = {
    "raghav": {
        "name": "raghav",
        "age": 30,
        "profession": "Data Scientist",
        "location": "Bangalore",
    },
    "pavan": {
        "name": "pavan",
        "age": 25,
        "profession": "Test Engineer",
        "location": "Hyderabad",
    }
}

# Global state
agent_dict = {}
agent = FunctionAgent(tools=[], llm=LLM)

def create_memory(user_id: str) -> Memory:
    """Create a new memory instance for a user"""

    STATIC_BLOCK = StaticMemoryBlock(
        name="core_info", 
        static_content=f"I am {user_data[user_id]['name']}, a {user_data[user_id]['age']}-year-old {user_data[user_id]['profession']} from {user_data[user_id]['location']}.", 
        priority=0
    )

    fact_extraction_block = FactExtractionMemoryBlock(
        name="extracted_info", 
        llm=LLM, 
        max_facts=50, 
        priority=1
    )

    return Memory.from_defaults(
        session_id=user_id,
        token_limit=300,
        chat_history_token_ratio=0.0002,
        token_flush_size=500,
        memory_blocks=[STATIC_BLOCK, fact_extraction_block],
        insert_method="user",
        async_database_uri="sqlite+aiosqlite:///data/chat_memory.db",
        table_name="chat_sessions",
    )

def get_agent(user_id: str) -> Memory:
    """Get or create memory for a user"""
    if user_id not in agent_dict:
        agent_dict[user_id] = create_memory(user_id)
    return agent_dict[user_id]

# =============================================================================
# MAIN APPLICATION
# =============================================================================

async def main():
    """Main application loop"""
    print("🤖 LlamaIndex Memory Blog - Chat Agent")
    print("Commands:")
    print("  'message:user_id' - Chat with specific user")
    print("  'exit' - Quit application")
    print("=" * 50)

    memory = None

    while True:
        try:
            user_message = input("\nUser: ").strip()

            if user_message == "exit":
                print("👋 Goodbye!")
                break

            elif ":" in user_message:
                user_id = user_message.split(":")[-1].strip()
                memory = get_agent(user_id)
                print(f"💭 Switched to user: {user_id}")

            if memory is None:
                print("⚠️  Please specify a user ID first using format 'message:user_id'")
                continue

            # Process the message
            response = await agent.run(user_msg=user_message, memory=memory)
            chat_messages = memory.get()
            print(f"🤖 Response: {response}")

        except KeyboardInterrupt:
            print("\n👋 Goodbye!")
            break
        except Exception as e:
            print(f"❌ Error: {e}")

if __name__ == "__main__":
    asyncio.run(main())
Enter fullscreen mode Exit fullscreen mode

Conclusion

Most organizations already have access to essential user information. Supplying these details to chat agents, tailored to specific business needs, can significantly enhance contextual understanding and personalization. While there are additional complexities to consider, this post focused on the core memory components in LlamaIndex that empower you to develop smarter, more responsive chat agents.

For more detailed information, check out the original LlamaIndex blog post.


What type of memory do you think would be most valuable for your chat agent use case? Share your thoughts in the comments below!

Top comments (0)