This blog post is designed for developers building chat applications, AI assistants, or conversational interfaces who want to implement persistent memory capabilities. Whether you're creating customer support bots, personal assistants, or recommendation systems, you'll learn how to make your agents remember user context, preferences, and conversation history. By following the practical examples and code snippets provided, you can implement these memory features in your own LlamaIndex-based applications within hours.
Memory is fundamental to human intelligence—it shapes our identity, enables learning from experiences, and allows us to build meaningful relationships. In traditional software systems, memory has been simply categorized as RAM for immediate processing and disk storage for persistence. When you send a message to a friend, it gets processed in memory and then stored in a database for future reference.
However, the emergence of Large Language Models (LLMs) is transforming how we interact with technology. Users increasingly expect natural, conversational interfaces where they can say "ask Bob if he'd like to go to the Batman movie" and have the system understand context, intent, and nuances. This requires a more sophisticated approach to memory—one that goes beyond simple data storage to understand and remember user preferences, conversation history, and behavioral patterns.
In this blog post, we'll explore how to implement intelligent memory systems in chat agents using LlamaIndex. We'll cover three essential types of memory:
- Short-term Memory: Storing recent interactions and conversation context
- Static Memory: Maintaining persistent user profile information
- Dynamic Memory: Extracting and storing user behavioral patterns and preferences
By the end of this post, you'll understand how to create chat agents that remember who your users are, what they've discussed, and how they prefer to interact—making every conversation feel more personal and contextually aware.
Types of Memory in LlamaIndex
LlamaIndex provides a flexible framework for implementing various types of memory in chat agents. Let's delve into the three key memory components you can leverage:
1. Short-term Memory: Memory.from_defaults
Short-term memory is crucial for keeping track of recent interactions and maintaining conversation context. The Memory.from_defaults factory method in LlamaIndex creates a memory store on a per-user basis, using a unique user ID as the session key for retrieval. By default, this information is stored in an in-memory database, but you can configure it to use any database URI of your choice.
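For instance, a minimal per-user memory backed by SQLite might look like this (the session ID and database URI are illustrative values):

from llama_index.core.memory import Memory

# Illustrative sketch: session_id identifies the user, and the async
# database URI switches persistence from in-memory to SQLite.
memory = Memory.from_defaults(
    session_id="raghav",  # one memory store per user
    async_database_uri="sqlite+aiosqlite:///data/chat_memory.db",
)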
2. Static Memory: StaticMemoryBlock
Static memory is used to store persistent user profile information, such as name, age, and location. Instead of embedding this information in the system prompt (which is not designed for user-specific data), you can use the StaticMemoryBlock. This dedicated memory block can store system instructions, predefined user data, and other information that should be readily available during a conversation.
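A static block is straightforward to construct; here is a minimal sketch with placeholder profile content:

from llama_index.core.memory import StaticMemoryBlock

core_info = StaticMemoryBlock(
    name="core_info",
    static_content="I am raghav, a 30-year-old Data Scientist from Bangalore.",
    priority=0,  # priority 0 means the block is always kept in context
)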
3. Dynamic Memory: FactExtractionMemoryBlock
Dynamic memory enables the extraction and storage of user behavioral patterns and preferences. For instance, one user may prefer responses with emojis, another may prefer technical details, and yet another may enjoy jokes. The FactExtractionMemoryBlock in LlamaIndex allows you to extract important information from conversations using prompts and store it as facts.
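Constructing the block is similar to the static case. A minimal sketch (the block name and fact limit are illustrative):

from llama_index.core.memory import FactExtractionMemoryBlock
from llama_index.llms.openai import OpenAI

extracted_info = FactExtractionMemoryBlock(
    name="extracted_info",
    llm=OpenAI(),  # the LLM that extracts facts from flushed messages
    max_facts=50,  # upper bound on the number of retained facts
    priority=1,
)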
Dynamic memory is particularly powerful because it adapts to individual user preferences over time. However, as of now, LlamaIndex does not provide built-in support for storing these facts in a database—you'll need to implement this functionality yourself.
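One simple approach is to serialize the block's facts into a table of your own. The sketch below assumes the block exposes its extracted facts as a facts list of strings (verify this against your installed version); the user_facts table is hypothetical:

import json
import sqlite3

from llama_index.core.memory import FactExtractionMemoryBlock

def save_facts(user_id: str, block: FactExtractionMemoryBlock) -> None:
    # Hypothetical persistence helper: assumes `block.facts` is a list
    # of strings, which you should confirm for your LlamaIndex version.
    conn = sqlite3.connect("data/user_facts.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_facts (user_id TEXT PRIMARY KEY, facts TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO user_facts VALUES (?, ?)",
        (user_id, json.dumps(block.facts)),
    )
    conn.commit()
    conn.close()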
Memory Types Comparison
Here's a quick comparison of when to use each memory type:
| Scenario | Short-term Memory | Static Memory | Dynamic Memory |
| --- | --- | --- | --- |
| Purpose | Store recent interactions and conversation context | Maintain persistent user profile information | Extract and store user behavioral patterns and preferences |
| Customer Support Bots | ✅ Must have - track order IDs and statuses mentioned in the conversation | ⚠️ Might be useful | ⚠️ Can be useful for personalizing responses |
| Personal Assistant Bots | ✅ Must have - needs conversation context | ✅ Must have - needs user details | ✅ Could enhance experience by personalizing interactions |
| E-commerce Recommendation Bots | ✅ Must have - needs conversation context | ✅ Must have - needs user details | ✅ Must have - needs user preferences |
Example: Memory in Action
For example, if a user says, "Hi! My name is raghav," the system switches to that user and the following gets added to the user message:
<memory>
<core_info>
I am raghav, a 30-year-old Data Scientist from Bangalore.
</core_info>
<extracted_info>
<fact>The user's name is Raghav.</fact>
</extracted_info>
</memory>
Here, <core_info> represents static information about the user, and <extracted_info> contains the facts extracted from the conversation. Though the fact is already present in static memory, this is just for demonstration purposes; we can specify instructions to extract other information such as preferences, hobbies, and so on.
Every follow-up message from the user will have this memory block appended to it, giving the bot the context it needs to respond.
Complete Implementation
Here's a complete working example that demonstrates all three memory types:
"""
LlamaIndex Memory Blog - Chat Agent with Persistent Memory
Demonstrates fact extraction and SQLite persistence
"""
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.memory import FactExtractionMemoryBlock, Memory, StaticMemoryBlock
from llama_index.llms.openai import OpenAI
# =============================================================================
# CONFIGURATION
# =============================================================================
# LLM Configuration
LLM = OpenAI()  # defaults to OpenAI's chat model; requires OPENAI_API_KEY
# User specific data:
user_data = {
    "raghav": {
        "name": "raghav",
        "age": 30,
        "profession": "Data Scientist",
        "location": "Bangalore",
    },
    "pavan": {
        "name": "pavan",
        "age": 25,
        "profession": "Test Engineer",
        "location": "Hyderabad",
    },
}
# Global state
memory_dict = {}  # per-user Memory instances
agent = FunctionAgent(tools=[], llm=LLM)
def create_memory(user_id: str) -> Memory:
    """Create a new memory instance for a user"""
    profile = user_data[user_id]
    static_block = StaticMemoryBlock(
        name="core_info",
        static_content=(
            f"I am {profile['name']}, a {profile['age']}-year-old "
            f"{profile['profession']} from {profile['location']}."
        ),
        priority=0,  # priority 0: this block is always kept in context
    )
    fact_extraction_block = FactExtractionMemoryBlock(
        name="extracted_info",
        llm=LLM,
        max_facts=50,
        priority=1,
    )
    return Memory.from_defaults(
        session_id=user_id,
        token_limit=300,
        # A tiny chat-history ratio flushes messages to the memory
        # blocks almost immediately, which makes the demo easy to follow.
        chat_history_token_ratio=0.0002,
        token_flush_size=500,
        memory_blocks=[static_block, fact_extraction_block],
        insert_method="user",  # inject memory into the user message
        async_database_uri="sqlite+aiosqlite:///data/chat_memory.db",
        table_name="chat_sessions",
    )
def get_memory(user_id: str) -> Memory:
    """Get or create the memory for a user"""
    if user_id not in user_data:
        raise KeyError(f"Unknown user: {user_id}")
    if user_id not in memory_dict:
        memory_dict[user_id] = create_memory(user_id)
    return memory_dict[user_id]
# =============================================================================
# MAIN APPLICATION
# =============================================================================
async def main():
    """Main application loop"""
    print("🤖 LlamaIndex Memory Blog - Chat Agent")
    print("Commands:")
    print("  'message:user_id' - Chat with specific user")
    print("  'exit' - Quit application")
    print("=" * 50)

    memory = None
    while True:
        try:
            user_message = input("\nUser: ").strip()
            if user_message == "exit":
                print("👋 Goodbye!")
                break
            elif ":" in user_message:
                # "message:user_id" switches the active user; strip the
                # suffix so only the message itself reaches the agent.
                user_message, _, user_id = user_message.rpartition(":")
                user_message = user_message.strip()
                user_id = user_id.strip()
                memory = get_memory(user_id)
                print(f"💭 Switched to user: {user_id}")
                if not user_message:
                    continue  # profile switch only, nothing to send
            if memory is None:
                print("⚠️ Please specify a user ID first using format 'message:user_id'")
                continue

            # Process the message with the active user's memory attached
            response = await agent.run(user_msg=user_message, memory=memory)
            print(f"🤖 Response: {response}")
        except KeyboardInterrupt:
            print("\n👋 Goodbye!")
            break
        except Exception as e:
            print(f"❌ Error: {e}")


if __name__ == "__main__":
    asyncio.run(main())
Conclusion
Most organizations already have access to essential user information. Supplying these details to chat agents, tailored to specific business needs, can significantly enhance contextual understanding and personalization. While there are additional complexities to consider, this post focused on the core memory components in LlamaIndex that empower you to develop smarter, more responsive chat agents.
For more detailed information, check out the original LlamaIndex blog post.
What type of memory do you think would be most valuable for your chat agent use case? Share your thoughts in the comments below!