Gambi Dev
How to Give Your AI Agent Persistent Memory Across Runs

Most AI agents have a memory problem.

You build a LangChain agent, a CrewAI crew, or a
custom AutoGen workflow. It runs perfectly. Then the
process ends and it forgets everything — user
preferences, past decisions, what it already tried.

Every run starts from zero.

The problem with context window stuffing

The common workaround is jamming previous context
into the prompt. This causes three problems:

  • You pay for the same tokens every single run
  • As the prompt grows, the model's attention dilutes
  • Eventually you hit the token ceiling and lose the oldest (often most important) context
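The first bullet is easy to quantify. A back-of-the-envelope sketch (the per-token price below is an illustrative assumption, not any provider's actual rate):

```python
# Rough cost of re-sending the same stuffed context on every run.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # dollars -- assumed, check your provider

def stuffing_cost(context_tokens: int, runs: int) -> float:
    """Dollars spent re-sending the same context tokens each run."""
    return context_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS * runs

# 8,000 tokens of stuffed history, 1,000 agent runs:
print(f"${stuffing_cost(8_000, 1_000):.2f}")  # $80.00
```

And that is just the dollar cost; it says nothing about the attention dilution and context loss in the other two bullets.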

What agents actually need is an external memory layer
that persists between runs.

A simple solution: two REST calls

Here is the pattern that works:

Store a memory after something important happens:

curl -X POST https://memstore.dev/v1/memory/remember \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "User prefers concise responses 
       and works in Python"}'
Enter fullscreen mode Exit fullscreen mode

Recall relevant context before the next run:

curl "https://memstore.dev/v1/memory/recall?q=user+preferences" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response:

{
  "memories": [{
    "content": "User prefers concise responses and works in Python",
    "score": 0.94
  }]
}

The recall endpoint uses semantic search — it finds
relevant memories even when your query wording
differs from the stored content.
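The mechanics behind that are worth a quick sketch. Semantic search embeds both the stored text and the query as vectors and ranks memories by cosine similarity, so near-synonyms score high even with zero word overlap. The 3-dimensional vectors below are made-up toy values (real embedding models produce hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim "embeddings" standing in for real model output
stored = {
    "User prefers concise responses": [0.9, 0.1, 0.2],
    "Deploys run on Fridays":         [0.1, 0.8, 0.3],
}
query_vec = [0.85, 0.15, 0.25]  # pretend embedding of "user preferences"

best = max(stored, key=lambda text: cosine(stored[text], query_vec))
print(best)  # User prefers concise responses
```

The query "user preferences" shares no words with the stored memory, yet the nearby vectors surface it anyway.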

Wiring it into a LangChain agent

import requests

API_KEY = "your_key_here"
BASE = "https://memstore.dev/v1/memory"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def remember(content, session=None):
    """Store a memory, optionally scoped to a session."""
    r = requests.post(f"{BASE}/remember",
        headers=HEADERS,
        json={"content": content, "session": session})
    r.raise_for_status()

def recall(query, session=None):
    """Fetch memories semantically relevant to the query."""
    params = {"q": query}
    if session:
        params["session"] = session
    r = requests.get(f"{BASE}/recall",
        headers=HEADERS, params=params)
    r.raise_for_status()
    return r.json().get("memories", [])

# Before your agent runs
memories = recall("user preferences", session="user_123")
context = "\n".join(m["content"] for m in memories)

# Inject into your prompt
prompt = f"Relevant context:\n{context}\n\nTask: ..."

# After your agent learns something new
remember("User is building a FastAPI app", session="user_123")

The same pattern works for CrewAI

from crewai import Agent, Task, Crew

# Recall memory before the crew starts
memories = recall("project context", session="project_abc")
context = "\n".join(m["content"] for m in memories)

researcher = Agent(
    role="Researcher",
    goal="Find relevant information",
    backstory=f"Previous context: {context}"
)

task = Task(
    description="Research the topic",
    expected_output="A short summary of findings",
    agent=researcher
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()

# After the crew finishes, store what was learned
remember(f"Research complete: {result}", session="project_abc")

Why not just use a vector database directly?

You could set up Supabase with pgvector, write
the embedding pipeline, tune the retrieval, and
manage the index. That is roughly 2-4 hours of
setup and ongoing maintenance.

Or two API calls.

Session isolation

Tag memories by user, task, or agent to keep
contexts separate:

# User-specific memory
remember("Prefers dark mode", session="user_8821")

# Task-specific memory  
remember("Found 3 relevant papers", 
         session="research_task_42")

# Recall only within a session
memories = recall("preferences", 
                  session="user_8821")

TTL for short-lived context

Set memories to expire automatically:

requests.post(f"{BASE}/remember",
    headers=HEADERS,
    json={
        "content": "Current task: write unit tests",
        "ttl": 3600  # expires in 1 hour
    })

Getting started

Get a free API key at memstore.dev — 1,000
operations per month, no credit card required.

The free tier is enough to add persistent memory
to your first production agent.


What are you building with AI agents? Drop a comment
— always interested in real use cases.
