Why Your AI Agent Feels Like a Goldfish
You’ve seen the demos: an AI agent that browses the web, writes code, or analyzes data from a single prompt. It feels like magic, until you ask a follow-up question. Suddenly, it has forgotten everything you just discussed. Your agent can think, but it can't remember. This fundamental limitation turns a potentially powerful assistant into a frustrating conversational partner with the memory span of a goldfish.
The core issue lies in how we typically interact with Large Language Models (LLMs). Each API call is stateless; the model has no inherent memory of past interactions. While context windows are growing, stuffing an entire conversation history into a prompt is a clumsy, expensive, and ultimately limited workaround.
In this guide, we’ll move beyond simple stateless prompts. We’ll architect a practical AI agent that can reason over tasks and maintain a persistent memory across sessions. We'll use open-source tools to build a system that learns from past interactions, making it genuinely useful for complex, multi-step projects.
The Blueprint: Reasoning, Memory, and Tools
A functional agent needs three core components:
- A Reasoning Engine: The LLM that processes instructions and decides on actions.
- Memory: A system to store, retrieve, and reflect on past interactions.
- Tools: Functions the agent can call to interact with the world (e.g., search, calculate, write files).
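Stripped of any framework, the interaction between these three parts is a simple loop. Here is an illustrative sketch in plain Python; the `model` callable, the decision dict shape, and the tool names are stand-ins of my own, not a real API:

```python
def run_agent(task, model, tools, max_steps=5):
    """Minimal reason/act loop: the model decides, tools act, the transcript remembers."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        decision = model(transcript)  # Reasoning engine picks the next step
        if decision["action"] == "finish":
            return decision["answer"]
        # Tool call: interact with the world
        observation = tools[decision["action"]](decision["input"])
        transcript += f"\nObservation: {observation}"  # Memory of the run so far
    return "Stopped: step limit reached"

# A scripted stand-in model: first asks for a calculation, then finishes
def scripted_model(transcript):
    if "Observation" not in transcript:
        return {"action": "calculate", "input": "2 + 2"}
    return {"action": "finish", "answer": transcript.split("Observation: ")[-1]}

print(run_agent("add 2 and 2", scripted_model, {"calculate": lambda e: str(eval(e))}))
# → 4
```

Everything that follows is this loop with a real LLM as `model`, real tools, and memory that survives beyond a single transcript.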
We'll use the following stack for our build:
- Ollama: To run a local, open-source LLM (like `llama3.1` or `mistral`).
- LangChain: A framework for chaining LLM calls, tools, and memory.
- ChromaDB: A lightweight, embeddings-based vector database for memory.
Step 1: Setting Up the Foundation
First, ensure you have Ollama installed and running a model. Then, install the Python libraries.
```bash
pip install langchain langchain-community langchain-chroma
```
Now, let's initialize our core LLM and a conversational memory buffer for short-term context.
```python
from langchain_community.llms import Ollama
from langchain.memory import ConversationBufferMemory
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Initialize the LLM (ensure Ollama is running with e.g., 'llama3.1')
llm = Ollama(model="llama3.1")

# Create a memory buffer that replays the conversation history into each prompt
short_term_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# A simple prompt template that includes the memory
prompt_template = """
You are a helpful AI assistant. You have access to tools and a memory of our conversation.

Relevant past context:
{chat_history}

Human: {input}
Assistant:"""
prompt = PromptTemplate.from_template(prompt_template)

# Create a basic chain
conversation_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    memory=short_term_memory,
    verbose=True,  # Helpful for debugging
)
```
This gives us a chat agent with short-term memory. But when the buffer fills up or the session ends, memories are lost. We need long-term storage.
Step 2: Implementing Persistent Memory with a Vector Store
For long-term memory, we need to save "memories" (past interactions, facts, results) in a way the agent can later search and retrieve. This is where embeddings and a vector database come in.
We'll use LangChain's `VectorStoreRetrieverMemory`. It stores the essence of each interaction as an embedding vector and lets the agent surface relevant past memories based on the current query.
```python
from langchain_community.embeddings import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain.memory import VectorStoreRetrieverMemory
from langchain.schema import Document

# Initialize the embedding model (a dedicated embedding model like nomic-embed-text works well)
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Create a persistent Chroma vector store on disk
persist_directory = "./agent_memory_db"
vectorstore = Chroma(
    collection_name="agent_long_term_memory",
    embedding_function=embeddings,
    persist_directory=persist_directory,
)

# Create a retriever that returns the top 3 most relevant memories
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Wrap the retriever in a memory class
long_term_memory = VectorStoreRetrieverMemory(retriever=retriever)

# Helper function to save a memory
def save_memory(human_input, ai_output, metadata=None):
    """Saves an interaction to long-term memory."""
    memory_text = f"Human: {human_input}\nAssistant: {ai_output}"
    doc = Document(page_content=memory_text, metadata=metadata or {})
    vectorstore.add_documents([doc])
    # Recent Chroma versions persist to disk automatically when persist_directory is set
```
Now, our agent has two memory systems: a short-term buffer for immediate context and a long-term vector store for important, searchable facts.
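Under the hood, "searchable" means nearest-neighbor lookup over embedding vectors. Here is a toy illustration of what the vector store is doing for us, with tiny hand-made 2-D vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, memories, k=2):
    """Return the k stored memories whose vectors are closest to the query."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return [m["text"] for m in ranked[:k]]

# Hand-made vectors: the first axis loosely means "programming languages",
# the second "scheduling" — real embeddings have hundreds of dimensions
memories = [
    {"text": "User prefers Python", "vec": [1.0, 0.1]},
    {"text": "Meeting is on Friday", "vec": [0.1, 1.0]},
    {"text": "User dislikes Java", "vec": [0.9, 0.2]},
]
print(retrieve([1.0, 0.0], memories, k=2))
# → ['User prefers Python', 'User dislikes Java']
```

Chroma does exactly this (far more efficiently, with real embeddings from `nomic-embed-text`) every time the retriever runs.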
Step 3: Giving the Agent Tools to Act
Reasoning without action is just contemplation. Let's give our agent simple tools by wrapping plain Python functions with LangChain's `Tool` class.
```python
from langchain.agents import Tool
from datetime import datetime

# Tool 1: A calculator
def calculator(query: str) -> str:
    """Useful for performing arithmetic calculations."""
    try:
        # WARNING: eval() is dangerous in production. Use a safe parser like `asteval` instead.
        # This is for demonstration only.
        result = eval(query)
        return f"The result of '{query}' is {result}."
    except Exception as e:
        return f"Calculation error: {e}"

# Tool 2: Get the current time
def get_current_time(_) -> str:
    """Useful for knowing the current date and time."""
    return f"The current date and time is: {datetime.now().isoformat()}"

# Wrap the functions as Tools
tools = [
    Tool(name="Calculator", func=calculator, description="Performs math calculations. Input should be a valid arithmetic expression."),
    Tool(name="Time", func=get_current_time, description="Gets the current date and time."),
]
```
Step 4: Assembling the Complete Agent
We now combine reasoning (LLM), memory (short + long term), and tools into a single agent. We'll use the ReAct (Reason + Act) paradigm, which prompts the LLM to think step-by-step before using a tool.
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent

# Pull a ReAct prompt template from the LangChain Hub
react_prompt = hub.pull("hwchase17/react")

# Create the ReAct agent
agent = create_react_agent(llm, tools, react_prompt)

# Create the Agent Executor, which manages the Thought -> Action -> Observation loop
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=short_term_memory,  # Short-term context
    verbose=True,
    handle_parsing_errors=True,
)

# Enhanced execution function with long-term memory
def execute_with_memory(user_input):
    """Runs the agent and saves the interaction to long-term memory."""
    # 1. RETRIEVE: Fetch relevant past memories
    relevant_memories = long_term_memory.load_memory_variables({"input": user_input})["history"]
    context_with_memories = f"Relevant past memories:\n{relevant_memories}\n\nCurrent task: {user_input}"

    # 2. REASON & ACT: Let the agent execute
    result = agent_executor.invoke({"input": context_with_memories})
    ai_output = result["output"]

    # 3. STORE: Save this interaction for the future
    save_memory(user_input, ai_output, metadata={"timestamp": datetime.now().isoformat()})
    return ai_output

# Let's test it!
print(execute_with_memory("What is 15% of 280?"))
# The agent will use the Calculator tool.

# On a later run, this memory will be retrievable:
print(execute_with_memory("What was that percentage we calculated earlier?"))
# The retriever should find the memory of the first calculation.
```
Taking It Further: Reflection and Summarization
A sophisticated agent doesn't just store raw conversations; it reflects on them. You can implement a separate "reflection" step where periodically, another LLM call analyzes recent memories and generates concise summaries or insights, which are then stored as higher-level memories. This prevents memory bloat and elevates the quality of retrievals.
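A reflection pass can be a plain function that batches raw memories through a summarizer. In this sketch, `summarize` is a pluggable callable of my own devising: in the real agent it would wrap an Ollama call and the results would go back into the vector store, but here it's a dummy so the sketch runs standalone:

```python
def reflect(raw_memories, summarize, batch_size=5):
    """Condense batches of raw memories into higher-level summary memories."""
    summaries = []
    for i in range(0, len(raw_memories), batch_size):
        batch = raw_memories[i:i + batch_size]
        prompt = "Summarize the key facts in these interactions:\n" + "\n".join(batch)
        summaries.append(summarize(prompt))
    return summaries

# Dummy summarizer for demonstration; swap in an LLM call in practice
memories = [f"interaction {n}" for n in range(12)]
summaries = reflect(memories, lambda prompt: f"summary of {len(prompt.splitlines()) - 1} interactions")
print(summaries)
# → ['summary of 5 interactions', 'summary of 5 interactions', 'summary of 2 interactions']
```

Run periodically (say, every N interactions), a step like this keeps the vector store populated with dense, high-signal memories instead of an ever-growing pile of raw transcripts.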
From Goldfish to Elephant
We've moved from a stateless LLM call to an agentic system with layered memory and the ability to reason using tools. This architecture forms the backbone of truly useful AI assistants that can work on long-running projects, remember your preferences, and build upon past work.
The next step is to expand its toolset (web search, code execution, file I/O) and refine the memory retrieval strategy. Experiment with different embedding models and retrieval methods to see what gives your agent the best "recall."
Your Call to Action: Clone this basic setup and start extending it. Add a tool to write notes to a file. Implement a reflection loop. The goal isn't to build a monolithic system, but to create a modular, evolving assistant that genuinely augments your workflow. Stop building agents that forget. Start building agents that learn.
Share what you build—the community learns from each unique implementation. What will your agent remember?