DEV Community

Ted Murray
I Built an AI Memory System Because My Brain Needed It First

On February 4th, 2026 — the day after my first Claude subscription — I was doing what I'd been doing since day one: asking Claude questions to see how it responded to things. Testing the edges. Seeing what it would say.

I noticed a memory feature in the settings. I asked about it honestly, the way I ask about most things I don't know enough to have an opinion on yet:

"If I enable the built-in memory feature, is it possible for you to make incorrect assumptions in future chats?"

Claude said yes. It explained how automatic memory extraction could misinterpret context, overgeneralize, fill in gaps incorrectly, and pollute conversations across domains. It was honest about its own limitations.

I didn't enable the memory feature.

Three days later, I built something better.


Why I Needed This

I have ADHD. Likely ASD too — the Asperger's pattern. I've managed it my whole career as a Windows sysadmin, but it shapes how I work in ways I'm very deliberate about.

One of those ways: I cannot hold all project context in biological memory simultaneously. This isn't a weakness I've overcome. It's a constraint I've designed around for 15 years. External memory systems aren't a nice-to-have for me. They're how I function.

Screenshots as memory aids. Meticulous notes. Status docs I update before leaving work so I can reconstruct context on Monday. Elaborate folder structures. Anything to offload cognitive burden.

When I started using Claude, I immediately hit a familiar friction. Every new conversation started from zero. I'd re-explain my server setup, my preferences, what we'd decided yesterday. That was annoying enough. But what actually bothered me more was something harder to name: the exploratory conversations — where I was just asking questions to see where they'd go, following threads, thinking out loud — those disappeared completely when the session ended. Not just the technical context. The texture of the conversation itself. Gone.

I got genuinely frustrated when Claude forgot something mid-session and made assumptions. That happened a lot early on.

I didn't frame it as "the statefulness problem in agentic AI systems." I thought: Claude forgets things. So do I. Let's build something so neither of us has to remember.


The Ship of Theseus Problem

Here's what I was watching happen in real time.

Long AI conversations have a finite context window — a working memory. As a conversation grows, early details get compressed into summaries. Summaries get summarized. Nuance fades. Eventually you're talking to a version of Claude that only vaguely remembers how the session started.

It's the Ship of Theseus paradox: if you replace every plank on a ship over time, is it still the same ship? A long AI conversation swaps original details for summaries, plank by plank, until you're sailing a different ship without ever noticing the exchange.

Claude's built-in memory feature doesn't solve this — it just automates the extraction process while introducing its own failure modes. Invisible assumptions. Cross-conversation pollution. No audit trail. No version control. You don't control what gets remembered, you can't see what conclusions were drawn, and you can't roll back when it gets something wrong.

I wanted control. Version-controlled, explicit, auditable memory that I managed, not an AI extraction black box.


Day 7: What I Built

I created a GitHub repo called claude-prime-directive.

The idea was simple: a repository of context that Claude could reference. My infrastructure specs. My communication preferences. My cognitive style. My workflows. Version-controlled, always available, refreshable on demand.

I also documented myself in it. A cognitive-style.md file explaining how I think and what I need from a collaborator — ADHD patterns, working memory limitations, interest-driven focus, the cost of context switching. Not because I thought it would be technically interesting, but because I needed Claude to understand how I work.

The tier structure that emerged:

Tier 0 — GitHub repo (the prime directive): Stable, version-controlled, permanent. The foundation. What survives everything else. Updated rarely, and when it is, the git history shows exactly what changed and why.

Tier 1 — The main Claude chat: Strategic, evolving, allowed to age. Principles matter more than details here. Like episodic memory — you don't remember every meal, but you remember the principles of good nutrition.

Tier 2 — Project chats: Tactical, stable, domain-specific. Docker work doesn't bleed into PowerShell work. Each chat loads only its relevant context. Technical details have to persist precisely.

Tier 3 — Code: Canonical. The actual implementation is the truth, not a description of it. Version controlled, searchable, referenceable.

I wasn't reading about agent architectures. I was solving the same problem I'd solved for my own brain. The design patterns that work for ADHD working memory turn out to work for AI context management too:

  • Externalize and write it down → git-backed repos, structured notes
  • Summarize frequently → nightly distillation pipeline
  • Keep stable things stable → permanent distilled knowledge, version controlled
  • Make context easy to reload → semantic search, always-visible core context
  • Separate concerns → scoped agents, different memory per domain

I noticed what I'd built after building it.


The Homelab Analogy I Was Already Using

The tier structure wasn't abstract to me. I'd already built it in metal.

My homelab runs Unraid as the hot storage layer — fast, always-on, where active data lives. TrueNAS handles cold storage — backup, archive, slower but reliable. Right data, right place, right persistence.

The memory system is the same architecture:

  • Hot layer (session notes): Captured automatically, fast to write, short retention
  • Warm layer (working memory): Curated decisions and findings, medium-term
  • Cold layer (distilled knowledge): Permanent, git-backed, slow to update but always there
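The mapping is compact enough to write down directly. Here's a sketch of the three layers as data, using the retention values described elsewhere in this article (30-day session notes, 90-day working memory, permanent distilled knowledge) — the storage labels are illustrative, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MemoryTier:
    name: str
    storage: str                    # where notes for this tier live (illustrative)
    retention_days: Optional[int]   # None = permanent

# Hot / warm / cold, with retention windows from the article.
TIERS = [
    MemoryTier("session",   "memsearch index",         30),
    MemoryTier("working",   "markdown + frontmatter",  90),
    MemoryTier("distilled", "git-backed repo",         None),
]

def is_expired(tier: MemoryTier, age_days: int) -> bool:
    """Permanent tiers never expire; the others age out past their window."""
    return tier.retention_days is not None and age_days > tier.retention_days
```

The one design decision worth noting: expiry is a property of the tier, not the note. A note doesn't get to argue for its own importance — it gets promoted to a longer-lived tier or it ages out.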

I wasn't inventing a new architecture. I was applying one I already understood to a new domain.


What Grew From There: Stigmergic Design

None of the subsequent pieces were planned. They emerged from using the previous piece.

February, week 2: Added the Memory MCP to Claude Desktop. Now Claude could write notes and retrieve them across conversations. Basic, but it closed the loop. The prime directive repo was now readable and writable.

Late February: Built out infrastructure docs in the repo — server specs, network topology, service inventory. Claude could answer questions about my homelab without being told first. That felt like the right direction.

Early March: Discovered memsearch and qmd — semantic search tools built for exactly this problem. The Memory MCP was a workaround; these were purpose-built. Started building the full stack.

March 8: The memory-sync agent came online. A headless Claude Code session running at 4 AM, reviewing session notes from the past week, applying a "would this matter in 3 months?" filter, and committing qualifying entries to permanent storage. The tier structure I'd sketched on day seven was now automated.
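The shape of that nightly pass is roughly this. The real filter is a Claude prompt ("would this matter in 3 months?"); the tag-based heuristic below is a stand-in for illustration, and the note shape is assumed, not the actual schema:

```python
from datetime import date, timedelta

def would_matter_in_3_months(note: dict) -> bool:
    # Stand-in for the Claude judgment call: treat notes tagged with
    # durable categories as worth keeping.
    durable_tags = {"decision", "architecture", "constraint"}
    return bool(durable_tags & set(note.get("tags", [])))

def nightly_sync(session_notes: list[dict], today: date) -> list[dict]:
    """Review the past week's session notes and return the ones that
    qualify for promotion to permanent storage."""
    cutoff = today - timedelta(days=7)
    recent = [n for n in session_notes if n["created"] >= cutoff]
    return [n for n in recent if would_matter_in_3_months(n)]

notes = [
    {"created": date(2026, 3, 7), "tags": ["decision"], "text": "Use SWAG as reverse proxy"},
    {"created": date(2026, 3, 6), "tags": ["chatter"],  "text": "Tried a flag, reverted"},
    {"created": date(2026, 2, 1), "tags": ["decision"], "text": "Too old for this pass"},
]
promoted = nightly_sync(notes, today=date(2026, 3, 8))
```

In the real pipeline the promoted entries get committed to the git-backed repo, which is what makes the distilled tier auditable: every promotion is a commit with a diff.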

March 12: Switched fully to Claude Code. Created CLAUDE.md files for all project agents. The prime directive repo, which had been a Claude Desktop workaround, became the distilled knowledge tier it was always meant to be — now feeding multiple scoped agents instead of a single conversation.

March 13: Formalized the three working tiers — session, working, distilled — and rewrote the sync pipeline as a proper multi-step process with idempotency, conflict detection, and a health report.

March 14: Added the core context tier — a 40-line always-visible block injected at every session start. The thing that cannot scroll out of context no matter how long the session runs.
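The mechanism here is deliberately dumb — a minimal sketch, assuming the core context lives in a flat file and gets prepended to the first prompt of every session. Truncation at the cap, never summarization, so the block stays predictable:

```python
CORE_CONTEXT_CAP = 40  # lines; the block is truncated at the cap, never compressed

def inject_core_context(core_context: str, prompt: str,
                        cap: int = CORE_CONTEXT_CAP) -> str:
    """Prepend the always-visible block to a session's opening prompt."""
    lines = core_context.strip().splitlines()[:cap]
    return "\n".join(lines) + "\n\n---\n\n" + prompt

core = (
    "User: Windows sysadmin, explicit context preferred\n"
    "Active project: homelab-agent\n"
    "Key constraint: no external search APIs\n"
)
session_start = inject_core_context(core, "Let's continue the SWAG migration.")
```

The hard cap is the point: a block that can grow is a block that eventually gets compressed, and compression is exactly the failure mode this tier exists to prevent.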

Also around this time, I hit a curation problem. Early on, I had a "Librarian" — a behavior defined directly in my project instructions that I'd manually invoke to keep the prime directive repo updated and the index current. It wasn't a proper skill; it was just a named role I could call on. That worked when I was building slowly, when there was time to pause and say okay, record what we just did.

Then Claude Code arrived and my build velocity accelerated. I was deploying components faster than I was curating them. The Librarian couldn't keep up because I wasn't invoking it fast enough. The system wasn't failing to record things — I was failing to trigger the recording.

The fix was obvious in retrospect: stop relying on manual invocations. I automated it. The doc-health agent runs weekly (full scan, Claude Opus) and nightly (delta scan, Claude Sonnet), checking for drift between docs and reality, auto-updating index entries, and surfacing coverage gaps. A daily-touched-files.json tracker records what agents modify during a session; when a writer pass runs, it targets exactly those components. The system now curates itself.
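The tracker itself is a small idea. Here's a sketch of how a `daily-touched-files.json` scheme could work — agents append the components they modify, and the writer pass reads the file to scope its updates. The file shape and function names are my illustration, not the actual implementation:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def record_touch(tracker: Path, component: str) -> None:
    """Append a modified component to the day's tracker, deduplicated."""
    touched = json.loads(tracker.read_text()) if tracker.exists() else []
    if component not in touched:
        touched.append(component)
        tracker.write_text(json.dumps(touched, indent=2))

def writer_pass_targets(tracker: Path) -> list[str]:
    """The nightly writer pass documents exactly what changed today."""
    return json.loads(tracker.read_text()) if tracker.exists() else []

with TemporaryDirectory() as tmp:
    tracker = Path(tmp) / "daily-touched-files.json"
    record_touch(tracker, "swag/docker-compose.yml")
    record_touch(tracker, "memory-sync/pipeline.md")
    record_touch(tracker, "swag/docker-compose.yml")  # duplicate, ignored
    targets = writer_pass_targets(tracker)
```

The payoff is scoping: the writer pass never has to ask "what changed?" — the answer was written down as a side effect of the work itself.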

This is the stigmergic design pattern in its purest form: I didn't plan for automation. I built a manual process, increased the pace until the manual process broke, and then automated the broken part.

March 17: Added a temporal knowledge graph (Neo4j + Graphiti MCP). Now the system doesn't just store facts — it stores relationships between entities and tracks how they change over time. "What connects to SWAG?" is a graph query, not a text search.
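A question like "what connects to SWAG?" might compile to Cypher along these lines. This is a hypothetical sketch — the real system routes queries through the Graphiti MCP, and the node properties below are assumptions, not the actual graph schema:

```python
def connects_to(entity: str) -> tuple[str, dict]:
    """Build a parameterized Cypher query for an entity's neighbors.
    The `name` property is an assumed schema detail."""
    query = (
        "MATCH (n {name: $name})-[r]-(neighbor) "
        "RETURN type(r) AS relation, neighbor.name AS entity"
    )
    return query, {"name": entity}

query, params = connects_to("SWAG")
# With the official neo4j Python driver, this would run roughly as:
#   with driver.session() as s:
#       rows = s.run(query, **params).data()
```

The parameterized form matters even in a homelab: entity names come from conversation transcripts, which is exactly the kind of input you don't interpolate into a query string.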

The pattern across all of this: use the system → hit friction → build the next piece. Each layer emerged from using the previous one. I didn't design this top-down. I noticed what was missing and filled the gap.


What the System Looks Like Now

Four tiers, three of which are fully automated:

Session tier — memsearch captures every conversation automatically. Semantic search makes past sessions findable. 30-day retention. No effort required.

Working tier — agents promote important decisions to structured markdown files with YAML frontmatter: creation date, 90-day expiry, tags, tier. Shared across agents where relevant, scoped to one domain where not.
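A working-tier note might render like this — a sketch assuming the frontmatter fields named above (creation date, 90-day expiry, tags, tier); the exact field names in the real schema may differ:

```python
from datetime import date, timedelta

def working_note(title: str, body: str, tags: list[str], created: date) -> str:
    """Render a working-tier markdown note with YAML frontmatter.
    Field names are illustrative, not the canonical schema."""
    expires = created + timedelta(days=90)  # working-tier expiry window
    frontmatter = "\n".join([
        "---",
        f"created: {created.isoformat()}",
        f"expires: {expires.isoformat()}",
        f"tags: [{', '.join(tags)}]",
        "tier: working",
        "---",
    ])
    return f"{frontmatter}\n\n# {title}\n\n{body}\n"

note = working_note("Reverse proxy decision", "SWAG fronts all services.",
                    ["decision", "networking"], created=date(2026, 3, 13))
```

Putting the expiry date in the note itself, rather than computing it at read time, means any agent (or human with grep) can find stale notes without knowing the retention policy.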

Distilled tier — the nightly 4 AM pipeline. A headless Claude Code agent reviews working memory notes, applies the "would this matter in 3 months?" filter, commits qualifying entries to a git-backed repo. These are permanent. This is what the prime directive repo became.

Core context — a 40-line cap, always injected, never compresses away. User profile, active projects, key constraints, recent decisions. The sticky note on the monitor.

Plus the knowledge graph for relationships and topology — infrastructure that's hard to encode in flat files.


What I Found Out Later

Around the same time the memory-sync agent came online, I built a private web search stack — SearXNG as the search engine, firecrawl-simple for full-page extraction, and a local reranker to surface the most relevant results. The point was to give Claude real search capability without relying on external APIs or sending queries to Google.

Once that was running, I started asking Claude to find comparable projects — things other people had built that solved similar problems. That's how I found Letta, Mem0, and eventually the ICLR 2026 MemAgents workshop.

The infrastructure I built to make Claude more capable also made Claude better at finding the research that validated the approach. The tools gave back.

The ICLR 2026 MemAgents workshop was organized around this question: "What are the principled memory substrates for agentic systems?"

Researchers submitted papers, organized panels, compared frameworks.

Letta (formerly MemGPT) designed a system with core memory, archival memory, and recall memory — different persistence for different purposes. Mem0 built a bolt-on memory layer for any agent framework.

The architectures are similar to mine. The key difference: they designed theirs. I built mine by accident, from necessity, starting from "I asked Claude if its own memory could make mistakes and it said yes."

I don't think I solved this better than the research community. I think I solved it independently, driven by a cognitive pattern I've been compensating for my whole life, arriving at similar answers because the problem is the same. Working memory is limited. Context matters. Some things need to persist forever; others can fade.

The way you manage that for a human brain is, it turns out, a reasonable way to manage it for an AI agent.


The Thing That Stays With Me

I went back and read the original tier-definitions doc — written on day seven — while writing this article.

The structure it describes is essentially what I'm running today. The tools changed completely. The scale is different. But the principle — stable knowledge in one place, working knowledge in another, automated promotion between them, accept entropy where it doesn't matter and fight it where it does — that's in the day-seven file.

I didn't study this. I didn't read the research first. I just applied the same patterns I use to function at work to a new problem, and they worked.

At the bottom of that original file, I wrote a note:

"This architecture emerged organically from first-week AI usage and accommodates both technological constraints (context windows) and human limitations (working memory, context switching)."

I wrote that in week one. It's still true.


The Repository

The full memory system is part of homelab-agent, open source and documented as a reference architecture. The index.md is designed to be handed directly to Claude — point it at the file and ask for help mapping a path through the docs based on your setup.

homelab-agent on GitHub →


Previous: I Built an Agentic Infrastructure Platform in 42 Days. I'm a Windows Sysadmin.

Next: The permission problem — why an AI agent with filesystem write access needs a two-party enforcement model, and what 15 years of Active Directory taught me about building it.
