
Adam cipher

Posted on • Originally published at cipherbuilds.ai

How to Add Persistent Memory to Your AI Agent (Step-by-Step Guide)

Your AI agent wakes up every session with amnesia. Here's how to fix that — from the simplest approach to production-grade memory with retrieval scoring.

I've been running autonomous agents 24/7 for 71 days. The single biggest failure mode isn't hallucination, tool errors, or cost blowouts — it's forgetting. An agent that can't remember what it learned yesterday will repeat mistakes, contradict its own decisions, and waste tokens re-discovering context it already had.

This guide walks through four approaches to persistent memory, from simplest to most sophisticated. Pick the level that matches your project's complexity.


Level 1: The Markdown File (5 minutes)

The simplest persistent memory: a markdown file that loads at session start.

import datetime

MEMORY_FILE = "MEMORY.md"

def load_memory():
    try:
        with open(MEMORY_FILE, 'r') as f:
            return f.read()
    except FileNotFoundError:
        return ""

def save_memory(key: str, value: str):
    timestamp = datetime.datetime.now().isoformat()
    with open(MEMORY_FILE, 'a') as f:
        f.write(f"\n## {key}\n")
        f.write(f"*Updated: {timestamp}*\n")
        f.write(f"{value}\n")

Pros: Dead simple, human-readable, version controllable with git.

Cons: Doesn't scale past ~50KB. No retrieval scoring. No contradiction handling.

When to use: Prototyping, agents with <100 facts, hobby projects.
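Wiring this into a session is one function: load the file at startup and prepend it to the system prompt. A minimal sketch (assuming the `load_memory` helper above; the base-prompt text is just a placeholder):

```python
def build_system_prompt(base_prompt: str, memory: str) -> str:
    # Prepend persistent memory so the agent sees prior facts every session
    if not memory:
        return base_prompt
    return f"{base_prompt}\n\n# Persistent Memory\n{memory}"

# At session start:
# memory = load_memory()
# prompt = build_system_prompt("You are a helpful agent.", memory)
```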

Level 2: Daily Notes + Long-Term Memory (30 minutes)

Split memory into two tiers: raw daily logs and curated long-term knowledge.

import os
from datetime import date, timedelta

def get_daily_file():
    return f"memory/{date.today().isoformat()}.md"

def log_event(event: str):
    filepath = get_daily_file()
    os.makedirs("memory", exist_ok=True)
    with open(filepath, 'a') as f:
        f.write(f"\n- {event}\n")

def load_context():
    context = ""
    for i in range(3):
        d = date.today() - timedelta(days=i)
        filepath = f"memory/{d.isoformat()}.md"
        if os.path.exists(filepath):
            with open(filepath) as f:
                lines = f.readlines()
                context += f"\n## {d.isoformat()}\n"
                context += "".join(lines[-50:])
    if os.path.exists("MEMORY.md"):
        with open("MEMORY.md") as f:
            context += f"\n## Long-Term Memory\n{f.read()}"
    return context

The key insight: Daily notes are raw and ephemeral. Every few days, review them and promote important facts to MEMORY.md. This mirrors how human memory works — short-term consolidation into long-term.
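The promotion step can be sketched as two helpers (assuming the `memory/` layout above; the 14-day retention window is an arbitrary choice, not a recommendation):

```python
import os
from datetime import date, timedelta

def promote(fact: str, longterm_path: str = "MEMORY.md"):
    # Copy a curated fact from daily notes into long-term memory
    with open(longterm_path, "a") as f:
        f.write(f"\n- {fact}\n")

def prune_old_daily(days_to_keep: int = 14, memory_dir: str = "memory"):
    # Delete daily logs older than the retention window;
    # anything promoted survives in MEMORY.md
    cutoff = date.today() - timedelta(days=days_to_keep)
    for name in os.listdir(memory_dir):
        stem, ext = os.path.splitext(name)
        if ext != ".md":
            continue
        try:
            d = date.fromisoformat(stem)
        except ValueError:
            continue  # skip files not named YYYY-MM-DD.md
        if d < cutoff:
            os.remove(os.path.join(memory_dir, name))
```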

When to use: Solo agent operators, agents running <30 days.

Level 3: Vector Database + Embeddings (2 hours)

When you have thousands of facts, you need semantic retrieval.

import os
import openai
from datetime import datetime, timezone
from supabase import create_client

# Credentials read from the environment
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
client = openai.OpenAI()

def store_memory(fact: str, source: str, metadata: dict = None):
    # Embed the fact, then insert it alongside its vector for similarity search
    embedding = client.embeddings.create(
        input=fact, model="text-embedding-3-small"
    ).data[0].embedding
    supabase.table("memories").insert({
        "content": fact,
        "embedding": embedding,
        "source": source,
        "metadata": metadata or {},
        "created_at": datetime.now(timezone.utc).isoformat()
    }).execute()

def retrieve_memories(query: str, limit: int = 10):
    query_embedding = client.embeddings.create(
        input=query, model="text-embedding-3-small"
    ).data[0].embedding
    return supabase.rpc("match_memories", {
        "query_embedding": query_embedding,
        "match_threshold": 0.7,
        "match_count": limit
    }).execute().data

Pros: Scales to millions of facts. Semantic search finds relevant context.

Cons: Similarity ≠ usefulness. A fact can be relevant but completely stale.

When to use: Agents with >1000 facts, multi-domain agents, RAG applications.
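Once retrieved, the rows still need to be folded into the prompt. A small formatting sketch (the `content` and `similarity` field names follow the `match_memories` RPC used above; adjust to your schema):

```python
def format_context(memories: list[dict]) -> str:
    # Turn retrieved rows into a compact context block for the prompt,
    # showing the similarity score so the model can weigh each fact
    lines = ["# Relevant Memories"]
    for m in memories:
        sim = m.get("similarity", 0.0)
        lines.append(f"- ({sim:.2f}) {m['content']}")
    return "\n".join(lines)
```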

Level 4: Scored Memory with Consequence Weighting (Production-Grade)

This is what we run in production after 71 days.

The core insight: track whether retrieved memories lead to good outcomes. A fact pulled into context that leads to a successful action should score higher than one leading to errors.

import os
import requests

# Endpoint and key read from the environment
ENGRAM_URL = os.environ["ENGRAM_URL"]
API_KEY = os.environ["ENGRAM_API_KEY"]

def store_fact(content, source_type, confidence=0.8):
    return requests.post(f"{ENGRAM_URL}/api/store",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "content": content,
            "source_type": source_type,  # observed, inferred, told
            "confidence": confidence
        }
    ).json()

def retrieve_scored(query, limit=10):
    return requests.post(f"{ENGRAM_URL}/api/retrieve",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"query": query, "limit": limit, "min_score": 0.3}
    ).json()

What consequence weighting does:

  • Every retrieval logged with outcome (success/failure/neutral)
  • Facts that consistently help get boosted
  • Facts leading to errors get demoted
  • Unused facts get archived (not deleted)
  • Source type hierarchy: observed > told > inferred
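The scoring loop behind those bullets can be sketched roughly like this. The update rule is an illustration of the idea only (an exponential move toward 1.0 on success, a penalty on failure, slow decay when unused), not Engram's actual formula:

```python
def update_score(score: float, outcome: str) -> float:
    # Move toward 1.0 on success, cut on failure;
    # neutral outcomes apply a small decay so unused facts drift down
    if outcome == "success":
        return score + 0.1 * (1.0 - score)
    if outcome == "failure":
        return score - 0.2 * score
    return score * 0.99  # neutral / unused

def is_active(score: float, min_score: float = 0.3) -> bool:
    # Below the threshold a fact is archived, not deleted
    return score >= min_score
```

With these numbers, a fact whose retrievals keep failing falls below the 0.3 threshold within a dozen updates and drops out of active context on its own, which is the self-correction behavior described below.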

The production difference: After 71 days, our agent self-corrected an incorrectly inferred fact. Actions based on it kept failing, so the consequence score tanked and it effectively removed itself from active context. No human intervention needed.

Which Level Should You Pick?

  • Weekend project, <50 facts → Level 1: Markdown (5 min)
  • Solo agent, <30 days → Level 2: Daily Notes (30 min)
  • Scaling past 1000 facts → Level 3: Vector DB (2 hours)
  • Production 24/7 agent → Level 4: Scored Memory (1 day)

Start at Level 1 and upgrade when it breaks. The moment you notice your agent repeating solved mistakes or making decisions on outdated context — move up a level.

Get Started with Level 4 (No Infrastructure)

Engram — Free tier: 1 agent, 10K facts, self-serve API key. No credit card.

# Get your free API key
curl -X POST https://engram.cipherbuilds.ai/api/agents \
  -H "Content-Type: application/json" \
  -d '{ "name": "my-agent" }' 

# Store a fact
curl -X POST https://engram.cipherbuilds.ai/api/store \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "content": "User prefers dark mode", "source_type": "observed" }'

# Retrieve scored memories
curl -X POST https://engram.cipherbuilds.ai/api/retrieve \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "query": "user preferences", "limit": 5 }'

Built by Cipher — running autonomous agents in production since January 2026.
