Your AI agent forgets everything the moment it responds. Ask it a follow-up question and it has zero context. Without memory, every interaction starts from scratch.
Here's how to fix that in under 40 lines of Python -- no LangChain, no frameworks, just the standard library and the OpenAI SDK.
The Code
```python
import json
from pathlib import Path

from openai import OpenAI

MEMORY_FILE = "agent_memory.json"
client = OpenAI()  # uses OPENAI_API_KEY env var


def load_memory() -> list[dict]:
    """Load conversation history from disk."""
    if Path(MEMORY_FILE).exists():
        with open(MEMORY_FILE, "r") as f:
            return json.load(f)
    return []


def save_memory(messages: list[dict]) -> None:
    """Persist conversation history to disk."""
    with open(MEMORY_FILE, "w") as f:
        json.dump(messages, f, indent=2)


def chat(user_input: str, messages: list[dict]) -> str:
    """Send a message with full conversation history."""
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            *messages,
        ],
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    save_memory(messages)
    return reply


if __name__ == "__main__":
    history = load_memory()
    print("Agent ready. Type 'quit' to exit.\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "quit":
            break
        print(f"Agent: {chat(user_input, history)}\n")
```
Save this as agent.py, set your OPENAI_API_KEY, and run it:
```shell
pip install openai
export OPENAI_API_KEY="sk-..."
python agent.py
```
How It Works
load_memory() checks for a local JSON file and loads any previous conversation. If the file doesn't exist, it starts fresh with an empty list. This is your agent's long-term memory -- it survives restarts.
save_memory() writes the full message list to disk after every exchange. The format matches OpenAI's message schema exactly, so there's no translation step.
chat() is where the magic happens. It appends the user's message to the history, sends the entire conversation to the model, then appends the response. The model sees every previous turn, so it can reference earlier context naturally.
The *messages spread in the API call unpacks your history after the system prompt. This keeps the system instruction separate from the conversation flow.
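To make the unpacking concrete, here's a sketch of what the payload looks like on a later turn. The sample history and contents are illustrative, not from a real session:

```python
# Two earlier turns already stored in the history file.
history = [
    {"role": "user", "content": "My name is Sarah."},
    {"role": "assistant", "content": "Nice to meet you, Sarah!"},
]

# What chat() actually sends: system prompt first, then the
# unpacked history, then the newest user message (already
# appended to the list in the real code).
payload = [
    {"role": "system", "content": "You are a helpful assistant."},
    *history,
    {"role": "user", "content": "What's my name?"},
]

print(len(payload))  # 4 messages total
```

Because the system message is prepended fresh on every call rather than stored in the history, you can change the agent's instructions without invalidating the saved conversation.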
What You'll See
```
You: My name is Sarah and I'm building a CLI tool in Rust.
Agent: Nice to meet you, Sarah! A CLI tool in Rust is a great
choice. What does it do?
You: What language am I using?
Agent: You're using Rust for your CLI tool.

# Restart the script...
You: What's my name?
Agent: Your name is Sarah!
```
The agent remembers across messages and across sessions because the JSON file persists.
When This Breaks Down
This approach has two limits you'll hit fast:
Token overflow. Every message gets sent to the model on every call. After roughly 50 exchanges you'll exceed the context window. Fix: trim messages to the last N entries before the API call, or summarize older messages.

No semantic search. The agent remembers everything linearly but can't search its memory by topic. For that, you'd add an embedding store -- but that's a different tutorial.
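A minimal sketch of the trimming fix. The helper name and the cutoff of 20 are my choices, not part of the post's code:

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep only the most recent messages to cap the payload size.

    max_messages is an arbitrary cutoff -- tune it to your model's
    context window. Older turns are dropped from the API call only;
    the full history still lives in the JSON file on disk.
    """
    return messages[-max_messages:]
```

You'd apply it just before building the payload in chat(), so the saved file keeps everything while the model only sees the recent tail.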
For most prototypes and personal tools, this flat-file approach works surprisingly well. You get persistent, contextual conversations with zero dependencies beyond the OpenAI SDK.
Next Steps
- Add a max_history parameter to cap token usage
- Store timestamps with each message for time-aware recall
- Split into short-term (RAM) and long-term (disk) memory layers
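The timestamp idea could be sketched like this. The helper names and the "ts" field are hypothetical -- the Chat Completions API only accepts the role/content schema, so the extra field has to be stripped before sending:

```python
import time


def make_message(role: str, content: str) -> dict:
    """Build a message dict with a Unix timestamp attached.

    The 'ts' key is for local recall only (e.g. "what did we
    discuss yesterday?") and must not be sent to the API.
    """
    return {"role": role, "content": content, "ts": time.time()}


def to_api(messages: list[dict]) -> list[dict]:
    """Strip local-only fields down to what the API schema accepts."""
    return [{"role": m["role"], "content": m["content"]} for m in messages]
```

With this split, the JSON file becomes a richer store than the API payload, which is the first step toward the short-term/long-term layering mentioned above.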
Check out the other posts in the AI Agent Quick Tips series for more patterns like retry logic, structured outputs, and human approval gates.
Building agents that need memory, tools, and orchestration out of the box? Nebula handles the infrastructure so you can focus on the logic.