
Ted Murray

How to Give Claude Code a Memory

I wrote about why I built a memory system for Claude. The short version: Claude's built-in memory extracts facts automatically with no audit trail, no version control, and no way to scope what different agents see. I wanted control.

This post is the practical companion. If you want to build your own, here's how mine works, what each piece does, and the order I'd set it up in if I were starting today.

Everything here is open source. The full stack is documented in homelab-agent.


What You're Building

A memory system with three properties:

  1. Persistent — context survives across sessions. Monday's agent knows what Friday's agent decided.
  2. Searchable — agents find relevant context automatically, not by loading everything into the prompt.
  3. Scoped — different agents see different things. Your infrastructure agent doesn't need your code review history.

The system has three tiers of memory, two search tools, and an optional knowledge graph. You don't need all of it on day one.


The Minimum Viable Memory

If you set up nothing else, do this. It takes ten minutes and gets you 70% of the value.

CLAUDE.md files

Claude Code reads CLAUDE.md files automatically. One in your home directory for global context. One in each project directory for project-specific context. This is your foundation.

Your global ~/.claude/CLAUDE.md should contain:

  • Who you are and how you work (role, preferences, communication style)
  • Your infrastructure overview (hosts, IPs, key services)
  • Rules that apply everywhere (don't push to main, don't SSH without asking)

Your project CLAUDE.md files should contain:

  • What this project is and what the agent's scope covers
  • Project-specific conventions and constraints
  • Pointers to where relevant documentation lives

This isn't memory in the dynamic sense — it's stable configuration. But it's the single highest-impact thing you can do. Every session starts with this context loaded automatically.
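To make this concrete, here's a condensed sketch of what a global CLAUDE.md might look like. The hostnames, IPs, and rules are placeholders — substitute your own:

```markdown
# Who I am
Homelab operator. Prefer terse answers. Explain before running commands.

# Infrastructure
- nas.lan (10.0.0.10): storage, backups
- docker-host.lan (10.0.0.20): all containerized services

# Rules (apply everywhere)
- Never push to main
- Never SSH to a host without asking first
```

Short and declarative works best — every line here is loaded into every session, so treat each one as paid-for context.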

Directory-based working memory

Create a memory directory:

~/.claude/memory/
├── shared/           # Cross-agent knowledge
└── agents/
    ├── dev/          # Dev agent's notes
    ├── research/     # Research agent's notes
    └── ops/          # Ops agent's notes

Tell your agents (via CLAUDE.md) to write notes here during sessions. Use a simple frontmatter format:

---
tier: working
created: 2026-03-15
source: dev
expires: 2026-06-13
tags: [docker, decision]
---

The expiry date matters. Working memory should have a 90-day TTL. If a note is still relevant after 90 days, it should be promoted to permanent storage. If not, it was temporary context that served its purpose.

The shared/ directory is for cross-agent knowledge — decisions that affect multiple projects. The agents/ subdirectories are scoped — each agent reads its own directory plus shared.

This is just markdown files in directories. No database, no service, no dependencies. It works immediately and it's human-readable, git-trackable, and greppable.


Adding Search: memsearch

Directory-based memory has a problem: agents have to know what file to read. Once you have more than a dozen notes, you need search.

memsearch is a Claude Code plugin that indexes markdown files using local embeddings and auto-injects relevant context at session start. No API calls. No external service. It runs locally using sentence-transformers.

What memsearch does:

  • Indexes your memory directories into a local vector store
  • At session start, searches the index for context relevant to the conversation
  • Auto-injects matching notes into the context window
  • Captures session summaries automatically via a Stop hook

The session capture is important. When a Claude Code session ends, memsearch writes a summary to its own memory store. Next time you start a session in that project, relevant past sessions surface automatically. You don't have to write anything — it happens.

Install it as a Claude Code plugin, point it at your memory directories, and you get semantic search over your notes with zero ongoing effort.

What memsearch doesn't do

memsearch is great for automatic recall — "surface relevant context without being asked." It's not great for intentional search — "find me the note where I decided to use Traefik instead of Caddy." For that, you want a proper search tool.


Adding Intentional Search: qmd

qmd is a hybrid search tool that combines BM25 keyword matching with vector embeddings and LLM reranking. It serves results via MCP, so any agent can search.

Why both memsearch and qmd?

  • memsearch = automatic recall. Surfaces relevant context at session start without being asked. Good for "remind me of things I should know."
  • qmd = intentional search. Agent explicitly queries when it needs specific information. Good for "find the decision about X" or "what does the architecture doc say about Y."

qmd indexes multiple collections — memory notes, infrastructure docs, compose files, whatever you point it at. The hybrid approach (keywords + semantics + reranking) outperforms pure vector search on technical documentation where exact terms matter.
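The post doesn't specify how qmd fuses its result lists, but reciprocal rank fusion is a common way to merge a keyword ranking with a vector ranking, and it illustrates why hybrid beats either list alone — a sketch:

```python
"""Reciprocal rank fusion: merge ranked lists without comparing raw scores."""


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked doc-id lists; k dampens the advantage of top ranks."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


# A doc that appears in both rankings can outrank a doc that tops
# only one of them — exactly the behavior you want on technical docs
# where the exact term and the semantic neighborhood both matter.
keyword = ["traefik-note", "caddy-note", "dns-doc"]
vector = ["proxy-overview", "traefik-note", "caddy-note"]
print(rrf([keyword, vector]))
```

An LLM reranker then reorders the fused top-N, which is cheap because N is small.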

If you have GPU acceleration available, enable it. Embedding time dropped from 3+ minutes to under a minute on my setup using Vulkan on an AMD Radeon 780M iGPU.


The Three-Tier Pipeline

Once you have working memory and search, you'll hit a new problem: memory accumulates. Session notes pile up. Working notes expire but some of them contain decisions you'll want forever.

The three-tier pipeline solves this:

Session tier — Raw, auto-captured. memsearch writes these. 30-day retention. No curation needed. This is your "what happened recently" layer.

Working tier — Agent-curated. Agents write structured notes with frontmatter during sessions. 90-day expiry. This is your "active decisions and context" layer.

Distilled tier — Permanent, git-backed. Notes that pass the "would this matter in 3 months?" test get promoted here. This is your "settled knowledge" layer. Version-controlled so you have full history.

The promotion path is always upward: session notes get reviewed and important items become working notes. Working notes older than 14 days get evaluated for distillation. Distilled notes are permanent.
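The tier rules above reduce to a small decision function. A sketch with the thresholds from the text (30/14/90 days); the action names are illustrative:

```python
"""Decide what the pipeline should do with a note, given its tier and age."""
from datetime import date


def next_action(tier: str, created: date, today: date) -> str:
    age = (today - created).days
    if tier == "session":
        return "expire" if age > 30 else "review-for-working"
    if tier == "working":
        if age > 90:
            return "expire"
        return "evaluate-for-distilled" if age > 14 else "keep"
    if tier == "distilled":
        return "keep"  # permanent, git-backed
    raise ValueError(f"unknown tier: {tier}")
```

The actual review ("would this matter in 3 months?") is judgment — agent or human — but the scheduling around it is mechanical.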

Automating the pipeline

I run a headless Claude Code agent at 4 AM that handles the promotion pipeline automatically. It:

  1. Scans session notes from the past week across all project stores
  2. Promotes durable items to working tier
  3. Reviews working notes older than 14 days
  4. Promotes qualifying notes to the distilled tier (git-backed)
  5. Expires stale working notes past 90 days
  6. Deduplicates (merges topical duplicates)
  7. Logs metrics and generates a health report

You don't need this on day one. Start with manual curation — read your working notes occasionally, promote the important ones, delete the stale ones. Automate when the volume makes manual curation a burden.


Core Context: The Sticky Note

There's a fourth layer that sits outside the pipeline: core context.

This is a small file (I cap mine at 40 lines) that gets injected at every session start via a SessionStart hook, before any tools run. It contains:

  • User profile (role, key skills, cognitive style)
  • Active projects and their current status
  • Key constraints (things every agent must know)
  • Recent decisions (the last few important choices made across any project)

The 40-line cap is deliberate. This file sits above the context window's compression threshold — it never gets summarized away, no matter how long the session runs. If it's too big, it crowds out working memory. Keep it tight.
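A SessionStart hook script can enforce the cap mechanically — in Claude Code, a SessionStart hook's stdout is added to the session context. A sketch; the file path and cap are my choices, not a fixed convention:

```python
"""SessionStart hook: emit the core context file, enforcing the line cap."""
import sys
from pathlib import Path

CORE = Path.home() / ".claude" / "memory" / "core-context.md"
MAX_LINES = 40


def load_core(path: Path = CORE, cap: int = MAX_LINES) -> str:
    lines = path.read_text().splitlines()
    if len(lines) > cap:
        # Warn loudly rather than silently crowding out working memory.
        print(f"core context is {len(lines)} lines (cap {cap}); trim it",
              file=sys.stderr)
        lines = lines[:cap]
    return "\n".join(lines)


if __name__ == "__main__":
    print(load_core())
```

Truncating is a blunt fallback — the warning is the real signal that the file needs editing.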

The distinction from CLAUDE.md: CLAUDE.md is stable configuration that changes rarely. Core context is dynamic — it reflects what's happening now. Active projects change. Recent decisions rotate. The core context file gets updated by a skill whenever something important shifts.


Private Web Search: SearXNG-MCP

This isn't memory in the traditional sense, but it feeds the memory system. When your agents can search the web privately, the results become part of the knowledge base.

SearXNG is a self-hosted meta-search engine. It queries multiple search backends (Google, Bing, DuckDuckGo, and dozens more) without sending your queries to any single provider. No API keys, no per-search costs, no tracking.

I built searxng-mcp to expose SearXNG as an MCP server with three tools:

  • search — query SearXNG, get structured results with titles, URLs, snippets, and source engines
  • search_and_fetch — search + fetch full text of the top result
  • fetch_url — fetch and extract readable text from any URL

Results are reranked by a local ML model before being returned. Full page content is fetched via Firecrawl (handles JavaScript-rendered pages). GitHub URLs are handled natively via the GitHub API.

Why does this matter for memory? Because when your research agent searches the web, evaluates options, and writes a recommendation to working memory, that recommendation is grounded in current information — not model training data. The search tool feeds the memory system with sourced, dated, real-world information.


Optional: Knowledge Graph

Everything above uses flat files and search indexes. For most setups, that's enough. But there's a category of question that text search can't answer well: relationship queries.

"What services depend on port 8080?" "What changed about SWAG config this week?" "What connects to the message bus?" These are graph queries — the answer is about relationships between entities, not about retrieving a document.

I use Graphiti with Neo4j for this. Graphiti is a temporal knowledge graph — facts have validity windows, so when something changes, the old fact gets superseded rather than polluting results.

The knowledge graph is fed automatically by the same pipeline that handles memory sync. When the nightly agent processes session notes, it also ingests relevant facts into the graph. Infrastructure state changes (deploys, service adds/removes, network changes) get added directly.

This is genuinely optional. If your queries are mostly "find relevant context" (text search handles this) rather than "what's related to what" (graph handles this), you don't need it. I added it three weeks into building the memory system, not on day one.
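The supersession idea is worth seeing in miniature. This toy store is not Graphiti's API — it just shows what "facts have validity windows" means: asserting a new value closes the old fact instead of deleting it, so history survives without polluting current queries:

```python
"""Toy temporal fact store: new assertions supersede, never delete."""
from datetime import date


class FactStore:
    def __init__(self):
        # (subject, predicate, value, valid_from, valid_to); valid_to=None means current
        self.facts = []

    def assert_fact(self, subj, pred, value, on: date):
        for i, (s, p, v, start, end) in enumerate(self.facts):
            if s == subj and p == pred and end is None:
                # Close the old fact's validity window instead of removing it.
                self.facts[i] = (s, p, v, start, on)
        self.facts.append((subj, pred, value, on, None))

    def current(self, subj, pred):
        for s, p, v, start, end in self.facts:
            if s == subj and p == pred and end is None:
                return v
        return None
```

Querying "what was grafana's port in January?" is then a range check over the windows — something a flat note file can't answer without re-reading every revision.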


Setup Order

If I were starting from scratch today, I'd build in this order:

Week 1: Foundation

  1. Write your global ~/.claude/CLAUDE.md — who you are, your infrastructure, your rules
  2. Write project CLAUDE.md files for each project directory
  3. Create the memory directory structure (~/.claude/memory/shared/, ~/.claude/memory/agents/)
  4. Define your frontmatter format (tier, created, source, expires, tags)
  5. Tell your agents (via CLAUDE.md) to write notes during sessions
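Step 3 is a one-liner to automate. A sketch with the root parameterized; the agent names are the examples from earlier, not a requirement:

```python
"""Bootstrap the week-1 memory directory layout."""
from pathlib import Path

AGENTS = ["dev", "research", "ops"]


def bootstrap(root: Path) -> list[Path]:
    """Create shared/ plus one scoped directory per agent; idempotent."""
    dirs = [root / "shared"] + [root / "agents" / a for a in AGENTS]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    bootstrap(Path.home() / ".claude" / "memory")
```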

Week 2: Search

  1. Install memsearch — automatic context recall and session capture
  2. Deploy qmd — intentional search over memory + docs
  3. Index your memory directories and any infrastructure documentation

Week 3: Pipeline

  1. Start manually reviewing working notes — promote the important ones, delete the stale ones
  2. Write the core context file and inject it via a SessionStart hook
  3. When manual curation becomes a burden, automate with a nightly sync agent

Week 4+: Extensions

  1. Deploy SearXNG + searxng-mcp for private web search
  2. Add the knowledge graph if you're hitting relationship query limits
  3. Build skills (reusable instruction sets) for common memory operations

Don't try to build it all at once. Each layer should earn its place by solving a friction you actually feel.


What It Feels Like

The before state: every session starts cold. You re-explain your setup. You re-state your preferences. You forget what you decided last week because the conversation where you decided it is gone.

The after state: you sit down on Monday morning and the agent already knows about the Docker change you made Friday, the monitoring alert from Saturday, and the research you did Sunday. It knows because the memory pipeline captured those events, the semantic search surfaced them as relevant, and the knowledge graph connected them to the services they affected.

The system isn't perfect. Memory sync sometimes promotes things that don't matter. Search sometimes misses things that do. The knowledge graph needs entity resolution tuning. But the baseline — persistent, searchable, scoped context that accumulates and connects without manual curation — changes how you work with AI agents.

It stops being a tool you instruct and starts being a collaborator that remembers.


The Repository

Everything described here is open source and documented in detail:

homelab-agent on GitHub — the full stack with component docs for memsearch, memory-sync, qmd, Graphiti, and more.

The component docs are thorough (2000+ lines each for the major pieces). The index.md at the root is designed to be handed directly to Claude — point it at the file and tell it to help you map a path through the docs based on your setup.

If you build your own version of this, it will look different from mine. Your infrastructure is different, your workflow is different, your agents handle different domains. That's the point. The architecture transfers. The implementation is yours.


Previous: I Built an AI Memory System Because My Brain Needed It First

Next in series: I Manage a Team of AI Agents. I Had to Build My Own Management Tools.
