Have you ever wondered how chatbots like ChatGPT remember your previous messages? Or how they maintain context across multiple conversations? If you're a student diving into generative AI, understanding memory management is one of the most crucial skills you'll need.
In this tutorial, we'll build a complete chatbot application using LangGraph that demonstrates two different memory strategies: temporary (in-memory) and persistent (database-backed). By the end, you'll understand how to implement conversation memory in your own AI applications.
👉 GitHub Repository: JaimeLucena/langgraph-memory-chatbot
🎯 What You'll Learn
- How to use LangGraph for building conversational AI workflows
- Implementing dual memory modes (temporary vs persistent)
- Managing conversation state with checkpoints
- Integrating tools (Wikipedia, Weather) into your chatbot
- Building a full-stack AI application with FastAPI and Streamlit
🧠 Why Memory Matters in Conversational AI
When you chat with an AI, each message needs context from previous messages. Without memory, every interaction would be isolated—the AI wouldn't remember your name, preferences, or what you discussed earlier.
Memory in AI chatbots serves two main purposes:
- Context Preservation: Maintains conversation history so the AI can reference earlier messages
- State Management: Tracks the conversation flow and user preferences across sessions
LangGraph provides powerful abstractions for managing this memory through checkpointers—components that save and restore conversation state.
🏗️ Project Architecture
Our chatbot uses a clean, modular architecture:
User Input → FastAPI Backend → LangGraph Workflow → Memory Store → Response
The LangGraph workflow handles:
- Processing user messages
- Deciding when to use tools
- Managing conversation state
- Persisting memory via checkpoints
📦 Setting Up the Project
First, let's set up our environment. We'll use uv for fast dependency management:
# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone the repository
git clone https://github.com/JaimeLucena/langgraph-memory-chatbot.git
cd langgraph-memory-chatbot
# Install dependencies
uv sync
Create a .env file:
OPENAI_API_KEY=sk-your-api-key-here
OPENAI_MODEL=gpt-4o-mini
SQLITE_PATH=.data/memory.sqlite
SYSTEM_PROMPT=You are a helpful and concise assistant.
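The rest of the code reads these values from a settings object (you'll see settings.system_prompt later). The repository's actual Settings class isn't shown in this post; here's a minimal sketch of how it could be loaded with pydantic-settings, with field names assumed from the .env keys above:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Read values from .env; field names are assumed to mirror the keys above.
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    sqlite_path: str = ".data/memory.sqlite"
    system_prompt: str = "You are a helpful and concise assistant."

settings = Settings()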
🧩 Understanding LangGraph Memory with Checkpointers
The heart of our memory system is LangGraph's checkpointer concept. A checkpointer is responsible for saving and loading conversation state.
Two Memory Modes
1. Temporary Memory (MemorySaver)
- Stored in RAM
- Fast but lost on server restart
- Perfect for testing and ephemeral conversations
2. Persistent Memory (SqliteSaver)
- Stored in SQLite database
- Survives server restarts
- Essential for production applications
Let's see how we implement this in code:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import StateGraph, END

def build_graph(checkpointer=None):
    """
    Builds the LangGraph workflow.
    If no checkpointer is provided, uses the in-memory MemorySaver (temporary).
    For persistent mode, pass a SqliteSaver instance.
    """
    graph = StateGraph(ChatState)

    # Add nodes
    graph.add_node("respond", respond_node)
    graph.add_node("tools", TOOLS_NODE)

    # Set the entry point
    graph.set_entry_point("respond")

    # Conditional routing
    graph.add_conditional_edges("respond", needs_tools, {
        "tools": "tools",
        "end": END
    })

    # After tools, return to the respond node
    graph.add_edge("tools", "respond")

    # Use the provided checkpointer or default to MemorySaver
    return graph.compile(checkpointer=checkpointer or MemorySaver())
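Later, the FastAPI endpoint picks a graph per request via graphs[mode]. That dictionary isn't shown in this post, but building both variants is straightforward; here's a plausible sketch, assuming the mode keys "temporary" and "persistent" and the SQLITE_PATH from the .env file:

import os
import sqlite3
from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver

# Temporary: state lives in RAM and disappears when the process stops.
temporary_graph = build_graph(MemorySaver())

# Persistent: state is written to SQLite and survives restarts.
os.makedirs(".data", exist_ok=True)
conn = sqlite3.connect(".data/memory.sqlite", check_same_thread=False)  # shared across FastAPI threads
persistent_graph = build_graph(SqliteSaver(conn))

graphs = {"temporary": temporary_graph, "persistent": persistent_graph}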
The ChatState: Our Conversation Container
The state is defined as a TypedDict that LangGraph uses to track conversation data:
from typing import Annotated, TypedDict, List
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
class ChatState(TypedDict):
    messages: Annotated[List[BaseMessage], add_messages]
    user_input: str
The add_messages reducer automatically merges new messages into the existing list, which is perfect for conversation history.
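To see what that means in practice, you can call the reducer directly. This is just an illustration of the merge behavior, not code from the repository:

from langchain_core.messages import AIMessage, HumanMessage
from langgraph.graph.message import add_messages

history = [HumanMessage(content="Hi, I'm Ana.")]
new = [AIMessage(content="Nice to meet you, Ana!")]

# add_messages appends new messages (and updates existing ones by id)
# instead of replacing the list, so every node that returns
# {"messages": [...]} extends the conversation history.
merged = add_messages(history, new)
print([m.content for m in merged])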
🔄 How Memory Persistence Works
When a user sends a message, here's what happens:
- State Loading: LangGraph loads the previous conversation state using the thread_id (our session_id)
- Message Processing: The new message is added to the state
- LLM Invocation: The model processes the conversation with full context
- State Saving: The updated state is saved back to the checkpointer
Let's look at how we invoke the graph with memory:
@app.post("/chat", response_model=ChatResponse)
def chat(req: ChatRequest):
    mode = req.memory
    graph = graphs[mode]  # Select the temporary or persistent graph

    # Use session_id as thread_id for state management
    config = {"configurable": {"thread_id": req.session_id}}

    # Invoke with empty messages - the checkpointer loads the history automatically!
    result = graph.invoke(
        {"user_input": req.message, "messages": []},
        config=config
    )

    # Extract the AI's response
    messages = result.get("messages", [])
    reply = messages[-1].content if messages else ""

    return ChatResponse(
        session_id=req.session_id,
        reply=reply,
        mode=mode
    )
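The ChatRequest and ChatResponse models aren't shown above. Judging from the fields the endpoint uses (session_id, message, memory, reply, mode), they look roughly like this; the exact definitions in the repository may differ:

from typing import Literal
from pydantic import BaseModel

class ChatRequest(BaseModel):
    session_id: str   # also used as the LangGraph thread_id
    message: str      # the user's new message
    memory: Literal["temporary", "persistent"] = "temporary"

class ChatResponse(BaseModel):
    session_id: str
    reply: str
    mode: str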
The magic here is that even though we pass an empty messages list, the checkpointer automatically loads the previous conversation history for that thread_id!
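You can see this by invoking the graph twice with the same thread_id. A toy example, assuming the graph built earlier and an OPENAI_API_KEY in your environment:

graph = build_graph()  # the temporary MemorySaver is enough for this demo
config = {"configurable": {"thread_id": "demo-session"}}

graph.invoke({"user_input": "Hi, my name is Ana.", "messages": []}, config=config)
result = graph.invoke({"user_input": "What's my name?", "messages": []}, config=config)

# The second call starts from the saved checkpoint, so the model can answer "Ana".
print(result["messages"][-1].content)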
🛠️ Implementing the Respond Node
The respond_node is where the conversation logic happens. It needs to:
- Load and trim conversation history
- Decide whether to use tools
- Invoke the LLM
- Return the response
def respond_node(state: ChatState):
    # 1. Build the system prompt + trimmed conversation history
    system = SystemMessage(content=settings.system_prompt)
    history = state.get("messages", [])
    trimmed = trimmer.invoke([system, *history])

    # 2. Decide tool usage (only if the user requested it via slash commands)
    wants_tools = user_requested_tools(state["user_input"])
    just_ran_tool = last_message_is_tool(history)

    # 3. Bind tools only if needed and we haven't just run one
    if wants_tools and not just_ran_tool:
        llm = make_llm().bind_tools(TOOLS, tool_choice="auto")
    else:
        llm = make_llm()  # No tools, just conversation

    # 4. Append the user message (avoiding duplicates)
    msgs_in = trimmed[:]
    if should_append_user(history, state["user_input"]):
        user_msg = HumanMessage(state["user_input"])
        msgs_in.append(user_msg)
    else:
        user_msg = None

    # 5. Invoke the model
    ai_msg: AIMessage = llm.invoke(msgs_in)

    # 6. Return the messages to be persisted
    out = []
    if user_msg is not None:
        out.append(user_msg)
    out.append(ai_msg)
    return {"messages": out}
Token Management: Preventing Context Overflow
One critical aspect of memory management is preventing the conversation from exceeding the model's context window. We use LangChain's trim_messages utility:
from langchain_core.messages import trim_messages
trimmer = trim_messages(
    strategy="last",           # Keep the most recent messages
    max_tokens=1200,           # Stay within the token budget
    token_counter=make_llm()   # Accurate token counting via the model
)
This ensures we always keep the most recent, relevant messages while staying within token limits.
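As a quick illustration (not from the repository), the trimmer is a runnable, so you can invoke it on any message list and it returns only the newest messages that fit under the budget:

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

long_history = [SystemMessage(content="You are a helpful and concise assistant.")]
for i in range(200):
    long_history.append(HumanMessage(content=f"Question {i}"))
    long_history.append(AIMessage(content=f"Answer {i}"))

kept = trimmer.invoke(long_history)
print(len(long_history), "->", len(kept))  # only the most recent messages survive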
🔀 Conditional Routing: When to Use Tools
LangGraph's conditional edges let us create dynamic workflows. We route to the tools node only when the LLM requests tool usage:
def needs_tools(state: ChatState) -> str:
    """
    Check whether the last AI message contains tool calls.
    If yes, route to the tools node; otherwise, end the conversation.
    """
    for msg in reversed(state.get("messages", [])):
        if isinstance(msg, AIMessage):
            if getattr(msg, "tool_calls", None):
                return "tools"
            break
    return "end"
The graph uses this function to decide the next step:
graph.add_conditional_edges(
    "respond",
    needs_tools,
    {"tools": "tools", "end": END}
)
graph.add_edge("tools", "respond")  # After tools, return to respond
🚀 Running the Application
Start the backend:
uv run uvicorn app.main:app --reload --port 8000
In another terminal, start the Streamlit UI:
uv run streamlit run app/ui.py
Now you can test both memory modes:
- Temporary Mode: Messages are lost on page refresh
- Persistent Mode: Conversations survive restarts and are stored in SQLite
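You can also exercise both modes directly against the API. A small client sketch, assuming the request fields shown earlier, that memory accepts "temporary" or "persistent", and that the requests package is available:

import requests

API = "http://localhost:8000/chat"

def send(session_id: str, message: str, memory: str) -> str:
    payload = {"session_id": session_id, "message": message, "memory": memory}
    resp = requests.post(API, json=payload)
    resp.raise_for_status()
    return resp.json()["reply"]

# Persistent mode: restart the backend between these two calls and the
# second answer should still remember the first message.
print(send("demo-1", "My favorite language is Python.", "persistent"))
print(send("demo-1", "What is my favorite language?", "persistent"))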
💡 Key Takeaways for Students
1. Checkpointers are the Key to Memory
LangGraph's checkpointers abstract away the complexity of state management. You just need to:
- Choose the right checkpointer (MemorySaver vs SqliteSaver)
- Use thread_id to identify conversations
- Let LangGraph handle loading/saving automatically
2. State Design Matters
Your ChatState TypedDict defines what gets remembered. Use add_messages reducer for automatic message merging.
3. Token Management is Critical
Always trim conversation history to prevent context overflow. Use trim_messages with accurate token counting.
4. Conditional Routing Enables Complex Flows
Use conditional edges to create dynamic workflows that adapt based on conversation state.
🎓 Learning Path Recommendations
If you're new to LangGraph and conversational AI, here's a suggested learning path:
- Start Simple: Build a basic chatbot without memory
- Add Temporary Memory: Implement MemorySaver for in-memory conversations
- Upgrade to Persistent: Add SqliteSaver for production-ready persistence
- Add Tools: Integrate external APIs and function calling
- Optimize: Implement token trimming and advanced routing
🔗 Resources
- GitHub Repository: JaimeLucena/langgraph-memory-chatbot
🎯 Conclusion
Building a chatbot with proper memory management is a fundamental skill in conversational AI. LangGraph makes this accessible through its checkpointing system, allowing you to focus on the conversation logic rather than state management boilerplate.
The dual memory approach (temporary vs persistent) gives you flexibility for different use cases—from quick testing to production applications that need to remember users across sessions.
Next Steps:
- Experiment with different checkpointer strategies
- Add more tools to your chatbot
- Implement user authentication and multi-user support
- Explore advanced LangGraph features like human-in-the-loop workflows
Happy building! 🚀
If you found this tutorial helpful, consider starring the repository and sharing it with other students learning AI!