Julien L
Give your AI agent a real memory in 50 lines of Python

Your AI agent is brilliant for exactly one conversation. Then it forgets everything.

It doesn't remember that the user prefers dark mode. It can't recall that it already solved this exact problem last Tuesday. It has no idea which approach worked and which one failed.

Most developers fix this by duct-taping a vector store, a Redis cache, and a conversation log together. That's three services to maintain for something that should be built into the engine.

I wanted a single pip install that gives an AI agent the same three types of memory that cognitive science describes for humans: what it knows (semantic), what it experienced (episodic), and what it knows how to do (procedural). This distinction comes from Endel Tulving's foundational work (1972) and Larry Squire's taxonomy of memory systems (2004), and it has been successfully implemented in cognitive architectures like SOAR and ACT-R for decades.

So I built it into VelesDB.

The problem with "memory" in AI agents today

Most agent frameworks give you a flat list of messages or a vector store. That's like giving a human a notebook and calling it memory.

Real memory has structure:

  • Semantic memory is your knowledge base. Facts, concepts, relationships. "Python is interpreted." "The user's timezone is UTC+1."
  • Episodic memory is your autobiography. What happened, when, in what context. "Last Tuesday, the user asked about deployment and I suggested Docker."
  • Procedural memory is your muscle memory. Skills, workflows, learned behaviors. "When the user asks about bugs, first check the logs, then reproduce the issue."
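Before touching any database, the distinction is easy to picture in plain Python. This is an illustrative sketch of the three record shapes, not VelesDB's internal model:

```python
import time
from dataclasses import dataclass

@dataclass
class Fact:  # semantic: timeless knowledge about the world or the user
    content: str

@dataclass
class Event:  # episodic: what happened, anchored to a moment in time
    description: str
    timestamp: int

@dataclass
class Procedure:  # procedural: a reusable skill with a confidence score
    name: str
    steps: list[str]
    confidence: float

fact = Fact("User prefers dark mode")
event = Event("User asked about deployment", int(time.time()))
skill = Procedure("debug_slow_search", ["check logs", "reproduce the issue"], 0.85)
```

The key structural difference: facts have no timestamp, events always do, and only procedures carry a confidence value that can change with experience.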

LangChain gives you ConversationBufferMemory. That's episodic at best, and it's just a list. No semantic search, no temporal queries, no skill learning. If you want all three memory types, you're stitching together Pinecone + Redis + Postgres - three services, three APIs, three failure modes.

Setup: one line

pip install velesdb

No Docker. No Redis. No Postgres. The entire engine - vector search, graph storage, and the agent memory SDK - ships as a ~3MB native binary inside the Python wheel.

A complete agent memory in 50 lines

Here's a personal assistant that remembers facts about the user, recalls past conversations, and learns which approaches work:

import velesdb
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
db = velesdb.Database("./agent_brain")
memory = db.agent_memory(384)

def embed(text):
    return model.encode(text).tolist()

# ---- Semantic: store facts the agent learns ----
facts = [
    (1, "User prefers dark mode in all applications"),
    (2, "User's timezone is Europe/Paris (UTC+1)"),
    (3, "User is building a RAG pipeline with LangChain"),
]
for fact_id, text in facts:
    memory.semantic.store(fact_id, text, embed(text))

# ---- Episodic: record what happened during conversations ----
now = int(time.time())
events = [
    (1, "User asked how to chunk PDFs for RAG", now - 86400),
    (2, "Suggested recursive text splitter, user said it worked", now - 3600),
    (3, "User reported slow vector search on 50K vectors", now),
]
for event_id, text, ts in events:
    memory.episodic.record(event_id, text, ts, embed(text))

# ---- Procedural: learn skills from experience ----
memory.procedural.learn(1, "debug_slow_search", [
    "Check vector count and dimension",
    "Verify HNSW index is built",
    "Profile the query with timing",
    "Consider reducing top_k or adding filters"
], embed("debug slow vector search performance issues"), 0.85)

memory.procedural.learn(2, "help_with_rag", [
    "Ask about data source format (PDF, web, DB)",
    "Recommend chunking strategy based on source",
    "Suggest embedding model for their language",
    "Show how to store and search with VelesDB"
], embed("help user build a RAG retrieval pipeline"), 0.92)

That's your agent's brain. Persisted to disk, searchable, and it survives restarts. Now let's use it.

Querying each memory type

Semantic: "What do I know about this user?"

# The agent needs context about the user before responding
facts = memory.semantic.query(embed("What is the user currently building?"), top_k=3)
for fact in facts:
    print(f"[{fact['score']:.2f}] {fact['content']}")
[0.32] User is building a RAG pipeline with LangChain
[0.11] User's timezone is Europe/Paris (UTC+1)
[0.10] User prefers dark mode in all applications

The agent now knows the user is working with LangChain and RAG before the user even mentions it in this session.

Episodic: "Have we dealt with this before?"

# User says: "My search is slow again"
similar_events = memory.episodic.recall_similar(embed("my vector search is slow"), top_k=2)
for event in similar_events:
    print(f"[{event['score']:.2f}] {event['description']}")
[0.76] User reported slow vector search on 50K vectors
[0.10] User asked how to chunk PDFs for RAG

The agent remembers this exact problem has happened before, with a strong similarity score. Instead of starting from scratch, it can reference past context.
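Under the hood, this kind of recall is nearest-neighbor search over the stored embeddings. Here's a toy sketch of the idea in plain Python, using cosine similarity over made-up 2-d vectors in place of real 384-d sentence embeddings (VelesDB's actual index is a different, optimized implementation):

```python
import math

def cosine(a, b):
    # cosine similarity: dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 2-d "embeddings" standing in for real sentence vectors
events = {
    "slow vector search on 50K vectors": [0.9, 0.1],
    "how to chunk PDFs for RAG":         [0.1, 0.9],
}
query = [0.8, 0.2]  # "my vector search is slow"
ranked = sorted(events, key=lambda k: cosine(events[k], query), reverse=True)
# ranked[0] is the slow-search event, mirroring the strong [0.76] hit above
```

A brute-force scan like this works fine for a handful of events; at scale, an approximate index (such as HNSW) does the same ranking without comparing the query against every stored vector.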

You can also query by time:

# What happened in the last hour?
recent = memory.episodic.recent(limit=5)

# What happened before yesterday?
old_events = memory.episodic.older_than(before=int(time.time()) - 86400, limit=10)

Procedural: "What should I do here?"

# The user mentions slow search. The agent looks up its playbook.
procedures = memory.procedural.recall(embed("vector search is slow, need to debug performance"), top_k=1)
for proc in procedures:
    print(f"Procedure: {proc['name']} (confidence: {proc['confidence']:.0%})")
    for i, step in enumerate(proc['steps'], 1):
        print(f"  {i}. {step}")
Procedure: debug_slow_search (confidence: 85%)
  1. Check vector count and dimension
  2. Verify HNSW index is built
  3. Profile the query with timing
  4. Consider reducing top_k or adding filters

The agent has a learned playbook. And the best part: it gets better over time.

The feedback loop: agents that learn

After the agent follows a procedure and it works, reinforce it:

# The debug_slow_search procedure worked - boost its confidence
memory.procedural.reinforce(1, success=True)

If it failed:

# The procedure didn't help this time
memory.procedural.reinforce(1, success=False)

Confidence scores adjust automatically: +0.1 on success, -0.05 on failure, clamped to [0, 1]. It's a deliberately simple mechanism, but over time the agent naturally favors procedures with a track record of working. The idea of reinforcing procedural knowledge through experience is inspired by cognitive architectures like SOAR and ACT-R, though VelesDB keeps the implementation pragmatic rather than trying to model the full complexity of human skill acquisition.
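The update rule itself fits in a few lines. This is a plain-Python sketch of the arithmetic described above, not VelesDB's source:

```python
def reinforce(confidence: float, success: bool) -> float:
    # +0.1 on success, -0.05 on failure, clamped to [0, 1]
    delta = 0.1 if success else -0.05
    return max(0.0, min(1.0, confidence + delta))

# A procedure at 0.85 that keeps working quickly saturates at 1.0
c = 0.85
for outcome in (True, True):
    c = reinforce(c, outcome)
```

Note the asymmetry: failures erode confidence half as fast as successes build it, so one bad run doesn't discard a generally useful playbook.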

Real use case: a support agent that gets smarter

Here's how these three memory types work together in a realistic scenario:

def handle_user_message(memory, message, model):
    """A support agent that uses all three memory types."""
    msg_vec = model.encode(message).tolist()

    # 1. What do I know about this user? (semantic)
    context = memory.semantic.query(msg_vec, top_k=3)

    # 2. Have we seen this problem before? (episodic)
    past = memory.episodic.recall_similar(msg_vec, top_k=3)

    # 3. What's my best playbook for this? (procedural)
    playbooks = memory.procedural.recall(msg_vec, top_k=1, min_confidence=0.5)

    # Build the prompt with real context
    prompt = f"User message: {message}\n\n"
    if context:
        prompt += "Known facts about this user:\n"
        prompt += "\n".join(f"- {c['content']}" for c in context) + "\n\n"
    if past:
        prompt += "Relevant past interactions:\n"
        prompt += "\n".join(f"- {p['description']}" for p in past) + "\n\n"
    if playbooks:
        p = playbooks[0]
        prompt += f"Recommended procedure ({p['name']}, {p['confidence']:.0%} confidence):\n"
        prompt += "\n".join(f"  {i+1}. {s}" for i, s in enumerate(p['steps']))

    return prompt

# Example
prompt = handle_user_message(memory, "My vector search is really slow", model)
print(prompt)

The agent doesn't just answer the question. It brings the full context: who the user is, what happened before, and what to do about it. That's the difference between a chatbot and an assistant.

What this replaces

| What you need | Without VelesDB | With VelesDB |
| --- | --- | --- |
| Semantic memory | Pinecone/Chroma + custom metadata | memory.semantic.store() |
| Episodic memory | Redis/Postgres + timestamp queries | memory.episodic.record() |
| Procedural memory | Custom code + JSON files | memory.procedural.learn() |
| Infrastructure | 3 services, 3 APIs, Docker | pip install velesdb |
| Disk footprint | 500MB+ Docker images | ~3MB binary, data files only |

Limitations (being honest)

VelesDB is a single-node embedded database. It's not a replacement for a distributed system.

  • Scaling: if your agent needs to handle millions of memories across multiple machines, use a client-server vector database
  • Multi-process: VelesDB locks the database directory, so one process at a time (fine for most agent use cases)
  • Embedding model: you bring your own; VelesDB stores and searches vectors but doesn't generate them

For the 90% of agent projects that run on a single machine with thousands to hundreds of thousands of memories, this is the entire backend.

Getting started

pip install velesdb
import velesdb
import time

db = velesdb.Database("./agent_brain")
memory = db.agent_memory(384)  # dimension of your embedding model

# Store a fact
memory.semantic.store(1, "User likes Python", [0.1, 0.2, ...])  # 384-dim vector

# Record an event
memory.episodic.record(1, "User asked about Python", int(time.time()), [0.1, ...])

# Learn a procedure
memory.procedural.learn(1, "help_with_python", ["Check version", "Suggest docs"], [0.1, ...], 0.8)

Full docs: velesdb.com/en
GitHub: github.com/cyberlife-coder/VelesDB
Python SDK (MIT): pypi.org/project/velesdb

References

The Agent Memory SDK is grounded in decades of cognitive science research:

  • Tulving, E. (1972). Episodic and Semantic Memory. In E. Tulving & W. Donaldson (Eds.), Organization of Memory (pp. 381-403). Academic Press. The foundational paper that introduced the distinction between semantic memory (general knowledge) and episodic memory (personal experiences).
  • Squire, L.R. (2004). Memory systems of the brain: A brief history and current perspective. Neurobiology of Learning and Memory, 82(3), 171-177. The modern taxonomy that maps declarative (semantic + episodic) and nondeclarative (procedural) memory to distinct brain systems.
  • Anderson, J.R. (1996). ACT-R: A Theory of Higher Level Cognition. Human-Computer Interaction, 12(4), 439-462. The cognitive architecture that models how declarative knowledge compiles into procedural skills through practice. VelesDB's procedural.reinforce() borrows the core idea (procedures gain or lose confidence with experience) while keeping the implementation simple and predictable.
  • Laird, J.E., Newell, A., & Rosenbloom, P.S. (1987). SOAR: An Architecture for General Intelligence. Artificial Intelligence, 33(1), 1-64. The first cognitive architecture to unify problem-solving, learning, and multiple memory systems in a single agent framework.

I'm building VelesDB as a source-available local-first database for AI applications. If you're working on AI agents and struggling with memory management, I'd like to hear about your setup. What are you currently using to persist agent state between sessions?
