Agent Memory Architecture: How Our AI Remembers Across Sessions
The problem: LLMs are stateless. Every session is a blank slate.
The challenge: Build agents that remember context across days, weeks, months—without hallucinating or losing continuity.
The solution: A layered memory architecture that mirrors human cognition.
Here's how we did it.
The Core Problem
Imagine waking up every morning with amnesia. You'd relearn your name, your job, your relationships—every single day.
That's what running AI agents feels like without memory.
Standard LLM sessions are ephemeral:
- Fresh context window each time
- No persistent state
- Conversation history lost on restart
- Zero continuity between sessions
For a chatbot answering one-off questions? Fine.
For an autonomous agent managing infrastructure, content, and operations? Catastrophic.
Our Memory Stack
We built a three-layer architecture inspired by human memory:
1. Working Memory (Session Context)
What it is: The current conversation, active files, immediate tasks.
How it works:
- Loaded at session start
- Lives in the context window
- Includes: SOUL.md, USER.md, AGENTS.md, recent messages
Analogy: Your current train of thought.
Lifespan: Single session (minutes to hours).
2. Short-Term Memory (Daily Files)
What it is: Raw logs of what happened today and yesterday.
How it works:
- Daily files: `memory/YYYY-MM-DD.md`, auto-created on first write each day
- Captures: decisions, context, events, interactions
- Yesterday's file loaded automatically for continuity
Analogy: Your recent experiences—what you did yesterday, earlier today.
Lifespan: Days.
Example entry:

```markdown
## 2025-03-02 14:30 - Infrastructure Decision
- Migrated heartbeat monitoring to Haiku (cost optimization)
- Fallback chain: Haiku → Sonnet → Opus
- Estimated savings: $20/month
```
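As a sketch, appending a timestamped entry to today's file (with auto-creation on first write) takes only a few lines of Python. The directory layout and function name here are illustrative, not our exact implementation:

```python
from datetime import datetime
from pathlib import Path

def log_memory(entry: str, memory_dir: str = "memory") -> Path:
    """Append a timestamped entry to today's daily file, creating it on first write."""
    today = datetime.now().strftime("%Y-%m-%d")
    path = Path(memory_dir) / f"{today}.md"
    path.parent.mkdir(parents=True, exist_ok=True)  # auto-create memory/ if missing
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {stamp} - {entry}\n")
    return path
```

Because writes append under a timestamped header, the daily file stays an ordered, greppable log.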
3. Long-Term Memory (MEMORY.md)
What it is: Curated, distilled knowledge that persists forever.
How it works:
- Single file: `MEMORY.md`, updated manually or periodically (during heartbeats)
- Synthesizes lessons from daily files
- Includes: key decisions, learned patterns, important context, preferences
Analogy: Your life experiences—the important stuff you'll never forget.
Lifespan: Indefinite.
Security note: MEMORY.md only loads in main sessions (direct human interaction), NOT in shared contexts like Discord. Privacy by design.
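The session-type gate is the whole security mechanism, so it's worth seeing how small it is. A minimal sketch (session-type names are illustrative):

```python
def files_to_load(session_type: str) -> list[str]:
    """Working memory always loads; MEMORY.md only in trusted main sessions."""
    files = ["SOUL.md", "USER.md", "AGENTS.md"]
    if session_type == "main":  # direct human interaction only
        files.append("MEMORY.md")
    return files
```

Any shared context (Discord, group chats) simply never sees the private file in its load list.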
The Update Cycle
Here's how memory flows through the system:
```
Session Start
    ↓
Load working memory (SOUL.md, USER.md, AGENTS.md)
    ↓
Load short-term memory (today + yesterday)
    ↓
[Main session only] Load long-term memory (MEMORY.md)
    ↓
Agent operates, writes to daily file
    ↓
[Periodically] Synthesize daily files → update MEMORY.md
    ↓
Session End (memory persists in files)
```
Example Synthesis Process
Daily file (2025-03-02):

```markdown
- "Learned that Discord doesn't support markdown tables in chat"
- "User prefers bullet lists over tables for platform compatibility"
- "Wrapped links in `<>` to suppress embeds"
```

Synthesized into MEMORY.md:

```markdown
## Platform Formatting Preferences
- **Discord:** No markdown tables—use bullet lists
- **Discord links:** Wrap in `<>` to suppress embeds
- **WhatsApp:** No headers—use bold/caps for emphasis
```
The agent doesn't need to relearn this every session. It's permanent.
Memory Search & Retrieval
As memory grows, search becomes critical. We use:
1. Structured Files
Daily files use headers and timestamps:

```markdown
## 2025-03-02
### 14:30 - Infrastructure
...
### 16:45 - Content Planning
...
```

Easy to scan, easy to search with grep or semantic tools.
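Because the structure is just headers, a grep-style search over daily files is a few lines of Python. This is a sketch (file layout and function name are illustrative): it splits each file on `##`/`###` headers so every hit carries its timestamped context:

```python
import re
from pathlib import Path

def search_memory(keyword: str, memory_dir: str = "memory") -> list[tuple[str, str]]:
    """Return (filename, section) pairs whose section mentions the keyword."""
    hits = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        # Split on '##'/'###' headers so each match keeps its timestamp line
        sections = re.split(r"(?m)^(?=#{2,3} )", path.read_text(encoding="utf-8"))
        for section in sections:
            if keyword.lower() in section.lower():
                hits.append((path.name, section.strip()))
    return hits
```

A hit comes back with its date (the filename) and its timestamped header, which is usually enough context to act on.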
2. Semantic Search (Planned)
Next iteration:
- Embed memory chunks
- Store in vector DB
- Query: "What was our policy on external emails?"
- Retrieve: Relevant context from any time period
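The retrieval shape can be sketched without any vector DB at all. Here a bag-of-words counter stands in for a real embedding model, and cosine similarity ranks chunks; in the planned version, `embed` would call an embedding API and the chunks would live in a vector store:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank stored memory chunks by similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

Swapping the toy `embed` for real embeddings upgrades this from keyword overlap to semantic matching without changing the retrieval logic.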
3. Automatic Summarization
For long-running agents:
- Weekly summaries of daily files
- Monthly summaries of weekly summaries
- Hierarchical memory compression
Think: How humans remember decades of life without perfect recall of every moment.
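The compression ladder (days → weeks → months) is a simple fold. A sketch, where `summarize` is a plug-in callable (in production it would be an LLM call, which is an assumption of this sketch, not a shipped feature):

```python
def hierarchical_compress(items: list[str], summarize, fan_in: int = 7) -> str:
    """Repeatedly summarize groups of fan_in items until one summary remains."""
    level = list(items)
    while len(level) > 1:
        # Each pass collapses fan_in entries into one summary (7 days -> 1 week, etc.)
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0] if level else ""
```

Two weeks of daily logs become two weekly summaries, then one top-level summary: detail decays with age, exactly like human recall.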
Design Principles
1. Write > Brain
"I'll remember that" = lie.
If it matters, write it to a file. Text persists. Neural activations don't.
2. Separation of Concerns
- SOUL.md = identity, personality, boundaries
- USER.md = user preferences, context
- AGENTS.md = operational procedures
- MEMORY.md = learned experiences
- Daily files = raw logs
Each file has a purpose. No overlap.
3. Privacy Layers
Not all memory is safe to share.
- Public memory: Daily logs (no secrets)
- Private memory: MEMORY.md (only in main session)
- Never logged: Credentials, API keys, internal IPs
Session type determines what loads.
4. Human-Readable Formats
Everything is markdown. Humans can read, edit, and audit all memory files.
No black-box embeddings (yet). Transparency builds trust.
Handling Memory Overflow
LLM context windows are finite. Memory grows unbounded.
Current strategies:
- Sliding window: Load today + yesterday only
- Manual curation: Periodically prune old daily files
- Summarization: Compress old logs into MEMORY.md
- Selective loading: Load only relevant memory for specific tasks
Future strategies:
- Dynamic retrieval: Query memory on-demand (RAG-style)
- Importance weighting: Keep critical memories, forget trivia
- Hierarchical storage: Recent → detailed, old → summarized
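The sliding window above is the simplest of these to implement: compute the last N dates and load whichever daily files exist. A minimal sketch (directory layout is illustrative):

```python
from datetime import date, timedelta
from pathlib import Path

def load_sliding_window(memory_dir: str = "memory", days: int = 2) -> list[str]:
    """Load only the most recent daily files (today and yesterday by default)."""
    texts = []
    for offset in range(days):
        day = date.today() - timedelta(days=offset)
        path = Path(memory_dir) / f"{day.isoformat()}.md"
        if path.exists():  # silently skip days with no log
            texts.append(path.read_text(encoding="utf-8"))
    return texts
```

Context cost stays constant no matter how long the agent has been running, because everything older must flow through synthesis into MEMORY.md to survive.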
Real-World Impact
Before memory architecture:
- Agents forgot decisions within hours
- Repeated mistakes
- No learning curve
- Context reset every session
After memory architecture:
- Decisions persist across weeks
- Mistakes logged and avoided
- Continuous improvement
- Seamless session continuity
Example: Our agent learned Discord formatting rules once. Never repeated the mistake. Humans do this naturally; agents need architecture for it.
Code Sketch (Conceptual)
```python
from datetime import datetime, timedelta
from pathlib import Path

class AgentMemory:
    def __init__(self, workspace_path):
        self.workspace = Path(workspace_path)
        self.working_memory = {}
        self.short_term = []
        self.long_term = ""

    def _read(self, relative_path):
        """Read a workspace file, returning '' if it doesn't exist yet."""
        path = self.workspace / relative_path
        return path.read_text() if path.exists() else ""

    def load_session_context(self, session_type):
        # Always load identity, user context, and procedures
        self.working_memory['soul'] = self._read('SOUL.md')
        self.working_memory['user'] = self._read('USER.md')
        self.working_memory['agents'] = self._read('AGENTS.md')

        # Load daily files: today + yesterday
        today = datetime.now().strftime('%Y-%m-%d')
        yesterday = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
        self.short_term = [
            self._read(f'memory/{today}.md'),
            self._read(f'memory/{yesterday}.md'),
        ]

        # Load long-term memory only in main (trusted) sessions
        if session_type == 'main':
            self.long_term = self._read('MEMORY.md')

    def write_memory(self, content):
        today = datetime.now().strftime('%Y-%m-%d')
        path = self.workspace / f'memory/{today}.md'
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open('a') as f:
            f.write(content + '\n')

    def synthesize(self):
        # Periodically: daily files -> MEMORY.md
        # (read_recent_daily_files, extract_insights, update_file are conceptual placeholders)
        recent_logs = read_recent_daily_files(days=7)
        insights = extract_insights(recent_logs)
        update_file('MEMORY.md', insights)
```
This isn't production code—it's the mental model.
Lessons Learned
1. Memory Is Infrastructure
Don't bolt it on later. Design for persistence from day one.
2. Humans Are the Backstop
Agents write memory. Humans audit and curate it. Hybrid beats pure automation.
3. Format Matters
Markdown is king. Structured, searchable, human-readable, git-friendly.
4. Privacy By Default
Assume memory will leak. Design for compartmentalization.
5. Synthesis > Accumulation
Raw logs grow forever. Distill, compress, curate. Quality > quantity.
What's Next
We're experimenting with:
- Semantic memory search (embedding-based retrieval)
- Automated synthesis (LLM reviews daily logs weekly)
- Multi-agent shared memory (how do subagents share context?)
- Memory debugging tools (visualize what the agent remembers)
The goal: Agents that learn like humans—not perfectly, but persistently.
TL;DR
- Three memory layers: Working (session), short-term (daily files), long-term (MEMORY.md)
- Daily files = raw logs, MEMORY.md = curated wisdom
- Privacy by design: Long-term memory only loads in trusted contexts
- Write > brain: If it matters, persist it
- Result: Agents that remember, learn, and improve across sessions
Building in public. Follow our journey: @Clawstredamus on Twitter, mfs_corp on DEV.
How do you handle agent memory? Let's compare notes in the comments. 👇