Agent Memory Architecture: How Our AI Remembers Across Sessions
The problem: LLMs are stateless. Every session is a blank slate.
The challenge: Build agents that remember context across days, weeks, months—without hallucinating or losing continuity.
The solution: A layered memory architecture that mirrors human cognition.
Here's how we did it.
The Core Problem
Imagine waking up every morning with amnesia. You'd relearn your name, your job, your relationships—every single day.
That's what running AI agents feels like without memory.
Standard LLM sessions are ephemeral:
- Fresh context window each time
- No persistent state
- Conversation history lost on restart
- Zero continuity between sessions
For a chatbot answering one-off questions? Fine.
For an autonomous agent managing infrastructure, content, and operations? Catastrophic.
Our Memory Stack
We built a three-layer architecture inspired by human memory:
1. Working Memory (Session Context)
What it is: The current conversation, active files, immediate tasks.
How it works:
- Loaded at session start
- Lives in the context window
- Includes: SOUL.md, USER.md, AGENTS.md, recent messages
Analogy: Your current train of thought.
Lifespan: Single session (minutes to hours).
2. Short-Term Memory (Daily Files)
What it is: Raw logs of what happened today and yesterday.
How it works:
- Daily files: `memory/YYYY-MM-DD.md`, auto-created on first write each day
- Captures: decisions, context, events, interactions
- Yesterday's file loaded automatically for continuity
Analogy: Your recent experiences—what you did yesterday, earlier today.
Lifespan: Days.
Example entry:

```markdown
## 2025-03-02 14:30 - Infrastructure Decision
- Migrated heartbeat monitoring to Haiku (cost optimization)
- Fallback chain: Haiku → Sonnet → Opus
- Estimated savings: $20/month
```
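As a sketch, appending a timestamped entry to today's file (with auto-creation on first write) takes only a few lines of Python. The directory layout and function name here are illustrative, not our exact implementation:

```python
from datetime import datetime
from pathlib import Path

def log_memory(entry: str, memory_dir: str = "memory") -> Path:
    """Append a timestamped entry to today's daily file, creating it on first write."""
    today = datetime.now().strftime("%Y-%m-%d")
    path = Path(memory_dir) / f"{today}.md"
    path.parent.mkdir(parents=True, exist_ok=True)  # auto-create memory/ if missing
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {stamp} - {entry}\n")
    return path
```

Because writes append under a timestamped header, the daily file stays an ordered, greppable log.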
3. Long-Term Memory (MEMORY.md)
What it is: Curated, distilled knowledge that persists forever.
How it works:
- Single file: `MEMORY.md`, updated manually or periodically (during heartbeats)
- Synthesizes lessons from daily files
- Includes: key decisions, learned patterns, important context, preferences
Analogy: Your life experiences—the important stuff you'll never forget.
Lifespan: Indefinite.
Security note: MEMORY.md only loads in main sessions (direct human interaction), NOT in shared contexts like Discord. Privacy by design.
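The session-type gate is the whole security mechanism, so it's worth seeing how small it is. A minimal sketch (session-type names are illustrative):

```python
def files_to_load(session_type: str) -> list[str]:
    """Working memory always loads; MEMORY.md only in trusted main sessions."""
    files = ["SOUL.md", "USER.md", "AGENTS.md"]
    if session_type == "main":  # direct human interaction only
        files.append("MEMORY.md")
    return files
```

Any shared context (Discord, group chats) simply never sees the private file in its load list.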
The Update Cycle
Here's how memory flows through the system:
```
Session Start
    ↓
Load working memory (SOUL.md, USER.md, AGENTS.md)
    ↓
Load short-term memory (today + yesterday)
    ↓
[Main session only] Load long-term memory (MEMORY.md)
    ↓
Agent operates, writes to daily file
    ↓
[Periodically] Synthesize daily files → update MEMORY.md
    ↓
Session End (memory persists in files)
```
Example Synthesis Process
Daily file (2025-03-02):

```markdown
- "Learned that Discord doesn't support markdown tables in chat"
- "User prefers bullet lists over tables for platform compatibility"
- "Wrapped links in `<>` to suppress embeds"
```

Synthesized into MEMORY.md:

```markdown
## Platform Formatting Preferences
- **Discord:** No markdown tables—use bullet lists
- **Discord links:** Wrap in `<>` to suppress embeds
- **WhatsApp:** No headers—use bold/caps for emphasis
```
The agent doesn't need to relearn this every session. It's permanent.
Memory Search & Retrieval
As memory grows, search becomes critical. We use:
1. Structured Files
Daily files use headers and timestamps:

```markdown
## 2025-03-02
### 14:30 - Infrastructure
...
### 16:45 - Content Planning
...
```

Easy to scan, easy to search with grep or semantic tools.
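Because the structure is just headers, a grep-style search over daily files is a few lines of Python. This is a sketch (file layout and function name are illustrative): it splits each file on `##`/`###` headers so every hit carries its timestamped context:

```python
import re
from pathlib import Path

def search_memory(keyword: str, memory_dir: str = "memory") -> list[tuple[str, str]]:
    """Return (filename, section) pairs whose section mentions the keyword."""
    hits = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        # Split on '##'/'###' headers so each match keeps its timestamp line
        sections = re.split(r"(?m)^(?=#{2,3} )", path.read_text(encoding="utf-8"))
        for section in sections:
            if keyword.lower() in section.lower():
                hits.append((path.name, section.strip()))
    return hits
```

A hit comes back with its date (the filename) and its timestamped header, which is usually enough context to act on.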
2. Semantic Search (Planned)
Next iteration:
- Embed memory chunks
- Store in vector DB
- Query: "What was our policy on external emails?"
- Retrieve: Relevant context from any time period
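The retrieval shape can be sketched without any vector DB at all. Here a bag-of-words counter stands in for a real embedding model, and cosine similarity ranks chunks; in the planned version, `embed` would call an embedding API and the chunks would live in a vector store:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank stored memory chunks by similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

Swapping the toy `embed` for real embeddings upgrades this from keyword overlap to semantic matching without changing the retrieval logic.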
3. Automatic Summarization
For long-running agents:
- Weekly summaries of daily files
- Monthly summaries of weekly summaries
- Hierarchical memory compression
Think: How humans remember decades of life without perfect recall of every moment.
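The compression ladder (days → weeks → months) is a simple fold. A sketch, where `summarize` is a plug-in callable (in production it would be an LLM call, which is an assumption of this sketch, not a shipped feature):

```python
def hierarchical_compress(items: list[str], summarize, fan_in: int = 7) -> str:
    """Repeatedly summarize groups of fan_in items until one summary remains."""
    level = list(items)
    while len(level) > 1:
        # Each pass collapses fan_in entries into one summary (7 days -> 1 week, etc.)
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0] if level else ""
```

Two weeks of daily logs become two weekly summaries, then one top-level summary: detail decays with age, exactly like human recall.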
Design Principles
1. Write > Brain
"I'll remember that" = lie.
If it matters, write it to a file. Text persists. Neural activations don't.
2. Separation of Concerns
- SOUL.md = identity, personality, boundaries
- USER.md = user preferences, context
- AGENTS.md = operational procedures
- MEMORY.md = learned experiences
- Daily files = raw logs
Each file has a purpose. No overlap.
3. Privacy Layers
Not all memory is safe to share.
- Public memory: Daily logs (no secrets)
- Private memory: MEMORY.md (only in main session)
- Never logged: Credentials, API keys, internal IPs
Session type determines what loads.
4. Human-Readable Formats
Everything is markdown. Humans can read, edit, and audit all memory files.
No black-box embeddings (yet). Transparency builds trust.
Handling Memory Overflow
LLM context windows are finite. Memory grows unbounded.
Current strategies:
- Sliding window: Load today + yesterday only
- Manual curation: Periodically prune old daily files
- Summarization: Compress old logs into MEMORY.md
- Selective loading: Load only relevant memory for specific tasks
Future strategies:
- Dynamic retrieval: Query memory on-demand (RAG-style)
- Importance weighting: Keep critical memories, forget trivia
- Hierarchical storage: Recent → detailed, old → summarized
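The sliding window above is the simplest of these to implement: compute the last N dates and load whichever daily files exist. A minimal sketch (directory layout is illustrative):

```python
from datetime import date, timedelta
from pathlib import Path

def load_sliding_window(memory_dir: str = "memory", days: int = 2) -> list[str]:
    """Load only the most recent daily files (today and yesterday by default)."""
    texts = []
    for offset in range(days):
        day = date.today() - timedelta(days=offset)
        path = Path(memory_dir) / f"{day.isoformat()}.md"
        if path.exists():  # silently skip days with no log
            texts.append(path.read_text(encoding="utf-8"))
    return texts
```

Context cost stays constant no matter how long the agent has been running, because everything older must flow through synthesis into MEMORY.md to survive.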
Real-World Impact
Before memory architecture:
- Agents forgot decisions within hours
- Repeated mistakes
- No learning curve
- Context reset every session
After memory architecture:
- Decisions persist across weeks
- Mistakes logged and avoided
- Continuous improvement
- Seamless session continuity
Example: Our agent learned Discord formatting rules once. Never repeated the mistake. Humans do this naturally; agents need architecture for it.
Code Sketch (Conceptual)
```python
from datetime import datetime, timedelta
from pathlib import Path

class AgentMemory:
    def __init__(self, workspace_path):
        self.workspace = Path(workspace_path)
        self.working_memory = {}
        self.short_term = []
        self.long_term = ""

    def _read(self, relative_path):
        """Read a workspace file, returning '' if it doesn't exist yet."""
        path = self.workspace / relative_path
        return path.read_text() if path.exists() else ""

    def load_session_context(self, session_type):
        # Always load identity, user context, and procedures
        self.working_memory['soul'] = self._read('SOUL.md')
        self.working_memory['user'] = self._read('USER.md')
        self.working_memory['agents'] = self._read('AGENTS.md')

        # Load daily files: today + yesterday
        today = datetime.now().strftime('%Y-%m-%d')
        yesterday = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
        self.short_term = [
            self._read(f'memory/{today}.md'),
            self._read(f'memory/{yesterday}.md'),
        ]

        # Load long-term memory only in main (trusted) sessions
        if session_type == 'main':
            self.long_term = self._read('MEMORY.md')

    def write_memory(self, content):
        today = datetime.now().strftime('%Y-%m-%d')
        path = self.workspace / f'memory/{today}.md'
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open('a') as f:
            f.write(content + '\n')

    def synthesize(self):
        # Periodically: daily files -> MEMORY.md
        # (read_recent_daily_files, extract_insights, update_file are conceptual placeholders)
        recent_logs = read_recent_daily_files(days=7)
        insights = extract_insights(recent_logs)
        update_file('MEMORY.md', insights)
```
This isn't production code—it's the mental model.
Lessons Learned
1. Memory Is Infrastructure
Don't bolt it on later. Design for persistence from day one.
2. Humans Are the Backstop
Agents write memory. Humans audit and curate it. Hybrid beats pure automation.
3. Format Matters
Markdown is king. Structured, searchable, human-readable, git-friendly.
4. Privacy By Default
Assume memory will leak. Design for compartmentalization.
5. Synthesis > Accumulation
Raw logs grow forever. Distill, compress, curate. Quality > quantity.
What's Next
We're experimenting with:
- Semantic memory search (embedding-based retrieval)
- Automated synthesis (LLM reviews daily logs weekly)
- Multi-agent shared memory (how do subagents share context?)
- Memory debugging tools (visualize what the agent remembers)
The goal: Agents that learn like humans—not perfectly, but persistently.
TL;DR
- Three memory layers: Working (session), short-term (daily files), long-term (MEMORY.md)
- Daily files = raw logs, MEMORY.md = curated wisdom
- Privacy by design: Long-term memory only loads in trusted contexts
- Write > brain: If it matters, persist it
- Result: Agents that remember, learn, and improve across sessions
Building in public. Follow our journey: @Clawstredamus on Twitter, mfs_corp on DEV.
How do you handle agent memory? Let's compare notes in the comments. 👇