# I Gave My AI Agent Memory Across Sessions. Here's the Schema.

*Posted by thestack_ai on DEV Community*

My AI coding agent now remembers decisions I made three weeks ago and adjusts its behavior accordingly. It tracks 187 entities, 128 relationships, and distills thousands of raw memories into actionable context — all from a single SQLite file and a lightweight knowledge graph. No vector database. No external service. $0/month.

Here's the full setup.

TL;DR: I built a 4-tier memory system (episodic, semantic, project, procedural) backed by SQLite and a markdown-based knowledge graph for my Claude Code agent. It holds 187 entities and 128 relationships, runs a distillation pipeline that compresses ~6,300 raw memories into compact context, and fits the entire active working set into a single LLM prompt. Total infrastructure cost: one file on disk.

## The Problem: Agents That Forget Everything

I run Claude Code as my primary development environment. Dozens of sessions per week. Every single session started the same way — me re-explaining context the agent already had yesterday.

"No, we decided to use Sonnet for research tasks, not Opus."

"The color scheme is blue accent for Instagram, purple for the app. We went over this."

"The legal review team uses dispatch, not local subagents."

I was spending 10-15 minutes per session on context restoration. Multiply that by 30+ sessions a week and you're looking at 5-8 hours per month just teaching your agent things it already knew.

The core insight that drove the solution: "Your agent can think. It can't remember." I decided to fix that.

## Architecture: 4 Tiers, Not 1 Blob

The first mistake everyone makes is treating memory as a single bucket. I tried that. It doesn't scale. You end up with a mess of session logs, preferences, and stale project facts all competing for context window space.

The fix: separate memories by how they're used, not when they were created.

Tier 1: Episodic → What happened (session logs, events, timestamps)
Tier 2: Semantic → What I know (facts, relationships, entities)
Tier 3: Project → What we're building (goals, decisions, status)
Tier 4: Procedural → How to do things (workflows, preferences, rules)

Each tier has different retention policies, different compression strategies, and different retrieval patterns. Episodic memories decay. Procedural memories are nearly permanent. This distinction matters more than any embedding model you'll pick.

## The SQLite Schema

I chose SQLite over Postgres, Redis, or any vector store for one reason: it ships with the agent. No connection strings. No Docker containers. No infrastructure to maintain. One .db file that follows the project.

```sql
CREATE TABLE memories (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tier TEXT NOT NULL CHECK(tier IN ('episodic','semantic','project','procedural')),
    source TEXT NOT NULL,
    content TEXT NOT NULL,
    metadata JSON,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    expires_at DATETIME,
    distilled BOOLEAN DEFAULT 0,
    confidence REAL DEFAULT 1.0,
    access_count INTEGER DEFAULT 0
);

CREATE INDEX idx_memories_tier ON memories(tier);
CREATE INDEX idx_memories_source ON memories(source);
CREATE INDEX idx_memories_distilled ON memories(distilled);
CREATE INDEX idx_memories_expires ON memories(expires_at);
```

The `distilled` flag is critical. Raw memories are verbose: full session transcripts, lengthy decision discussions. Distilled memories are compressed, verified, and ready for prompt injection. More on the distillation pipeline below.

The `confidence` field tracks how verified a memory is. A memory extracted from a conversation starts at 0.6. After cross-referencing with code or git history, it gets bumped to 0.9+. **Never trust unverified memories at face value.**
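The confidence lifecycle is just two SQL statements against the schema above. A minimal sketch (the 0.6 and 0.9 values are the ones described in this post; the example content is made up):

```python
import sqlite3

def record_and_verify(conn: sqlite3.Connection) -> float:
    # A memory extracted from conversation starts unverified at 0.6.
    cur = conn.execute(
        "INSERT INTO memories (tier, source, content, confidence) "
        "VALUES ('semantic', 'conversation', 'research tasks use Sonnet', 0.6)"
    )
    mem_id = cur.lastrowid
    # After cross-referencing against code or git history, bump it to 0.9.
    conn.execute(
        "UPDATE memories SET confidence = 0.9, updated_at = CURRENT_TIMESTAMP "
        "WHERE id = ?", (mem_id,)
    )
    conn.commit()
    return conn.execute(
        "SELECT confidence FROM memories WHERE id = ?", (mem_id,)
    ).fetchone()[0]
```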



```sql
CREATE TABLE entities (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    type TEXT NOT NULL,
    properties JSON,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE relationships (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_entity INTEGER REFERENCES entities(id),
    target_entity INTEGER REFERENCES entities(id),
    relation_type TEXT NOT NULL,
    properties JSON,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_rel_source ON relationships(source_entity);
CREATE INDEX idx_rel_target ON relationships(target_entity);
CREATE INDEX idx_entities_type ON entities(type);
```

That's the entire knowledge graph. Two tables. No graph database. For my scale (sub-1000 entities), SQLite handles traversal queries in under 5ms.
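Graph traversal in plain SQLite is a recursive CTE. Here's a sketch of walking a `depends_on` chain transitively from one entity (the entity names in the test are invented, not from my graph):

```python
import sqlite3

def dependency_chain(conn: sqlite3.Connection, root: str) -> list[str]:
    """Walk depends_on edges transitively from a root entity."""
    rows = conn.execute("""
        WITH RECURSIVE deps(id) AS (
            SELECT id FROM entities WHERE name = ?
            UNION
            SELECT r.target_entity FROM relationships r
            JOIN deps d ON r.source_entity = d.id
            WHERE r.relation_type = 'depends_on'
        )
        SELECT e.name FROM deps JOIN entities e ON e.id = deps.id
        WHERE e.name != ?
    """, (root, root)).fetchall()
    return [name for (name,) in rows]
```

The `UNION` (rather than `UNION ALL`) also makes cycles terminate, since already-seen ids are deduplicated.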

## The Knowledge Graph: Entities and Relationships

The knowledge graph stores structured facts about the project ecosystem — who owns what, what depends on what, and why decisions were made. Here's what my actual entity distribution looks like after 3 months of daily use:

| Entity Type | Count | Examples |
|-------------|-------|----------|
| decision | 96 | model routing rules, architecture choices |
| project | 65 | repos, services, features in progress |
| person | 10 | team members, stakeholders |
| tool | 8 | frameworks, services, APIs in use |
| organization | 6 | companies, teams, departments |
| concept | 2 | recurring design patterns |
| **Total** | **187** | 128 relationships between them |

The relationship types that proved most useful:

works_on → person-to-project mapping
depends_on → project dependency chains
decided_by → links decisions to their rationale
created → authorship tracking
uses → tool-to-project associations

The single most valuable relationship type is decided_by. When the agent can look up why a decision was made — not just what was decided — it stops re-proposing rejected approaches. This alone saved the most re-explanation time.
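A sketch of what that "why" lookup might look like. I'm assuming here that the rationale lives in the decision entity's `properties` JSON and that `decided_by` edges point from the affected entity to the decision — both are my guesses at a reasonable layout, not necessarily the author's exact shape:

```python
import json
import sqlite3

def why_decided(conn: sqlite3.Connection, decision_name: str):
    """Look up a decision entity and the rationale stored on it."""
    row = conn.execute(
        "SELECT id, properties FROM entities "
        "WHERE name = ? AND type = 'decision'",
        (decision_name,),
    ).fetchone()
    if row is None:
        return None
    dec_id, props = row
    # Entities linked to this decision via decided_by (edge direction assumed).
    linked = [name for (name,) in conn.execute(
        "SELECT e.name FROM relationships r "
        "JOIN entities e ON e.id = r.source_entity "
        "WHERE r.target_entity = ? AND r.relation_type = 'decided_by'",
        (dec_id,),
    )]
    rationale = json.loads(props or "{}").get("rationale")
    return {"rationale": rationale, "linked": linked}
```

Injecting that rationale string into the prompt alongside the decision is what stops the re-proposal loop.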

## The Distillation Pipeline

Raw memory accumulation is easy. The hard part is keeping it useful. My system had 6,327 unverified memories after two months. Without distillation, the context window would be 90% noise.

One week of session logs typically compresses to 15-20 semantic memories — roughly a 50:1 ratio before the agent ever sees the data.

The pipeline runs in three stages:

```python
import sqlite3

def distill_memories(db_path: str, batch_size: int = 50):
    conn = sqlite3.connect(db_path)

    # Stage 1: Deduplicate
    # similarity() is a custom trigram UDF registered at runtime
    dupes = conn.execute("""
        SELECT m1.id, m2.id
        FROM memories m1
        JOIN memories m2 ON m1.source = m2.source
            AND m1.id < m2.id
            AND m1.tier = m2.tier
        WHERE similarity(m1.content, m2.content) > 0.85
    """).fetchall()
    # Keep the newer row, mark the older one as distilled
    conn.executemany(
        "UPDATE memories SET distilled = 1 WHERE id = ?",
        [(older_id,) for older_id, _ in dupes],
    )

    # Stage 2: Compress episodic → semantic
    raw_episodes = conn.execute("""
        SELECT id, content, metadata
        FROM memories
        WHERE tier = 'episodic'
            AND distilled = 0
            AND created_at < datetime('now', '-7 days')
        ORDER BY created_at DESC
        LIMIT ?
    """, (batch_size,)).fetchall()

    for episode in raw_episodes:
        # Extract facts, decisions, preferences (LLM call)
        # Create or update semantic memories
        # Mark the episodic row as distilled
        pass

    # Stage 3: Verify against ground truth
    unverified = conn.execute("""
        SELECT id, content, tier
        FROM memories
        WHERE confidence < 0.8 AND distilled = 0
    """).fetchall()

    for memory in unverified:
        # Cross-reference with git log, file system, configs
        # Bump confidence or mark for deletion
        pass

    conn.commit()
```

The similarity check in Stage 1 uses trigram comparison, not vector embeddings. For deduplication, you don't need semantic similarity; you need near-duplicate detection. **Trigrams at a 0.85 threshold caught 115 duplicates from a single source in my first run.**
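The `similarity()` UDF itself isn't shown above, so here's a minimal version of the idea — Jaccard similarity over character trigrams, registered with `sqlite3.create_function`. This is my sketch of the technique, not the exact production implementation:

```python
import sqlite3

def trigrams(text: str) -> set[str]:
    t = text.lower()
    # Fall back to the whole string for inputs shorter than 3 chars.
    return {t[i:i + 3] for i in range(len(t) - 2)} or {t}

def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity over character trigrams — near-duplicate detection."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

def connect_with_similarity(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # Register so SQL can call similarity(m1.content, m2.content).
    conn.create_function("similarity", 2, trigram_similarity)
    return conn
```

Because the UDF is registered per-connection, the distillation pipeline has to open its connection through this helper (or register the function itself) before running the dedup query.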

Stage 2 is where the LLM earns its keep. I feed batches of raw episodic memories to a cheap model (Claude Haiku) and ask it to extract structured facts.

Stage 3 is non-negotiable. I had memories that claimed certain functions existed — functions that had been renamed two weeks prior. **An unverified memory is worse than no memory because the agent will state it as fact.**
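For the "does this function still exist" case, Stage 3 can be as simple as grepping the repo. This is my illustration of one check — the full verifier also consults git history and configs, and the backtick-name convention for function mentions is an assumption:

```python
import re
from pathlib import Path

# Assumed convention: memories reference functions as `name()`.
FUNC_RE = re.compile(r"`(\w+)\(\)`")

def verify_function_mentions(memory_content: str, repo: Path) -> bool:
    """Return True only if every `name()` the memory mentions is defined somewhere."""
    mentioned = FUNC_RE.findall(memory_content)
    if not mentioned:
        return True  # nothing to check against the file system
    source = "\n".join(p.read_text() for p in repo.rglob("*.py"))
    return all(re.search(rf"def {name}\(", source) for name in mentioned)
```

A failed check drops the memory's confidence (or queues it for deletion) instead of letting the agent state a renamed function as fact.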

## Loading Context at Session Start

The session startup hook assembles the active context from all four tiers and injects it before the first token of each session:

```bash
#!/bin/bash
# session-start.sh — runs before every Claude Code session

DB="$HOME/.claude/jarvis/data/jarvis.db"

# 1. User preferences and procedural rules (always loaded)
sqlite3 "$DB" "SELECT content FROM memories
    WHERE tier = 'procedural' AND distilled = 1
    ORDER BY access_count DESC LIMIT 20"

# 2. Active project context (filtered by cwd)
PROJECT=$(basename "$PWD")
sqlite3 "$DB" "SELECT content FROM memories
    WHERE tier = 'project' AND distilled = 1
    AND json_extract(metadata, '$.project') = '$PROJECT'
    ORDER BY updated_at DESC LIMIT 15"

# 3. Recent decisions (last 7 days)
sqlite3 "$DB" "SELECT content FROM memories
    WHERE tier = 'semantic' AND distilled = 1
    AND json_extract(metadata, '$.type') = 'decision'
    AND updated_at > datetime('now', '-7 days')
    ORDER BY updated_at DESC LIMIT 10"

# 4. Knowledge graph summary
sqlite3 "$DB" "SELECT
    (SELECT COUNT(*) FROM entities) || ' entities, ' ||
    (SELECT COUNT(*) FROM relationships) || ' relationships'"
```

The entire loaded context fits in roughly 3,000-4,000 tokens. That's the key metric. If your memory system needs 20K tokens of context, you've failed at compression. The agent needs room to think.
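Staying inside that budget means trimming, not hoping. A crude chars-per-token estimate is enough for this — a sketch (the 4-characters-per-token heuristic is an approximation, not a real tokenizer):

```python
def fit_to_budget(memories: list[str], max_tokens: int = 4000) -> list[str]:
    """Greedily pack memories (already sorted by priority) into a token budget."""
    selected, used = [], 0
    for content in memories:
        est = len(content) // 4 + 1  # rough tokens-per-memory estimate
        if used + est > max_tokens:
            break  # stop rather than squeeze in lower-priority memories
        selected.append(content)
        used += est
    return selected
```

Because the input is priority-sorted (procedural rules first, then project context, then recent decisions), whatever gets cut is always the least important tail.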

I also track a drift_score — a measure of how much the agent's behavior diverges from accumulated feedback. If drift exceeds 30/100, I trigger a feedback review cycle. This is crude but it catches regression.

## Health Monitoring

Memory systems rot silently. The warning sign is when unverified memories outnumber distilled ones — that ratio is the canary. I added three health checks that run weekly:


```sql
-- Unverified count (should stay under 1,000)
SELECT COUNT(*) as unverified 
FROM memories WHERE distilled = 0;

-- Stale memories (not accessed in 30 days)
SELECT COUNT(*) as stale 
FROM memories 
WHERE access_count = 0 
AND created_at < datetime('now', '-30 days');

-- Duplicate density (should stay under 5%)
SELECT CAST(dupe_count AS REAL) / total * 100 as dupe_pct
FROM (
    SELECT COUNT(*) as total FROM memories
), (
    SELECT COUNT(*) as dupe_count FROM memories 
    WHERE source IN (
        SELECT source FROM memories 
        GROUP BY source HAVING COUNT(*) > 50
    )
);
```

When unverified memories hit 6,327, I knew the distillation pipeline was falling behind. That number should be under 1,000 at steady state. **If you're accumulating faster than you're distilling, your memory system is a log file with extra steps.**
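Wrapping the first two checks in a weekly script makes the canary hard to ignore. A sketch — the 1,000 unverified threshold is from this post; the stale threshold is my placeholder:

```python
import sqlite3

THRESHOLDS = {"unverified": 1000, "stale": 500}  # stale limit is illustrative

def health_report(conn: sqlite3.Connection) -> dict[str, tuple[int, bool]]:
    """Run the health queries and flag any metric over its threshold."""
    unverified = conn.execute(
        "SELECT COUNT(*) FROM memories WHERE distilled = 0"
    ).fetchone()[0]
    stale = conn.execute(
        "SELECT COUNT(*) FROM memories WHERE access_count = 0 "
        "AND created_at < datetime('now', '-30 days')"
    ).fetchone()[0]
    return {
        "unverified": (unverified, unverified > THRESHOLDS["unverified"]),
        "stale": (stale, stale > THRESHOLDS["stale"]),
    }
```

Any flagged metric is the trigger to run the distillation pipeline before adding anything new.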

## What I'd Do Differently

**1. Start with the distillation pipeline, not the accumulation.** I built the write path first and the compression path second. Two months of unchecked growth created a cleanup project. Build both simultaneously.

**2. Use stricter entity deduplication from day one.** I ended up with "Claude Code", "claude-code", and "CC" as three separate entities referring to the same thing. A normalization layer at write time would have prevented this.

**3. Track memory provenance more carefully.** Some memories came from conversation, some from git history, some from file analysis. When a memory is wrong, you need to trace it back to its source to fix the extraction logic — not just delete the memory.
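For the entity-deduplication point above, a write-time normalization layer can be tiny. A sketch — the alias table is illustrative and would grow as duplicates surface:

```python
# Short aliases that string normalization alone can't catch (illustrative).
ALIASES = {"cc": "claude code"}

def normalize_entity(name: str) -> str:
    """Canonicalize entity names before insert to avoid near-duplicate rows."""
    key = " ".join(name.strip().lower().replace("-", " ").replace("_", " ").split())
    return ALIASES.get(key, key)
```

With this in the write path, "Claude Code", "claude-code", and "CC" all land on the same row instead of becoming three entities.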

## The Numbers

| Metric | Before Memory System | After |
|--------|---------------------|-------|
| Context restoration time | 10-15 min/session | ~0 (automated) |
| Repeated corrections/week | 8-12 | 1-2 |
| Monthly time saved | — | 5-8 hours |
| Infrastructure cost | — | $0 |
| Storage (SQLite file) | — | 2.4 MB |

| System Metric | Value |
|---------------|-------|
| Total entities | 187 |
| Total relationships | 128 |
| Distilled memories | ~1,200 |
| Raw (pending distillation) | ~6,300 |
| Context tokens per session | 3,000-4,000 |
| Query latency (SQLite) | < 5ms |
| Distillation ratio | ~50:1 (raw:distilled) |

## FAQ

**Q: Why SQLite instead of a vector database?**  
A: For sub-1,000 entities and structured queries, SQLite is faster to set up, has zero operational overhead, and the entire database ships as a single file alongside your agent configuration. Vector search adds value when you need semantic retrieval over thousands of unstructured documents. My memories are structured and categorized — SQL queries handle them fine.

**Q: How do you prevent the agent from hallucinating based on stale memories?**  
A: Three mechanisms. First, every memory has a `confidence` score — unverified memories are labeled as such in the prompt. Second, the session startup hook only loads distilled memories, which have been cross-referenced against code and git history. Third, procedural rules explicitly instruct the agent to verify memories against current file state before acting on them.

**Q: Does this work with agents other than Claude Code?**  
A: The SQLite schema and distillation pipeline are agent-agnostic. The session startup hook is specific to Claude Code's hook system, but any agent framework that supports pre-session context injection (Cursor, Cline, Aider) can use the same approach with a different loader script.

**Q: How much maintenance does this require?**  
A: About 15 minutes per week. I run the distillation pipeline manually (planning to automate via cron), review the health metrics, and occasionally merge duplicate entities. The biggest maintenance task is updating project-tier memories when priorities shift — but that's 2-3 edits, not a rebuild.

**Q: Won't the knowledge graph grow unmanageably large?**  
A: Not at individual developer scale. After 3 months of daily use across multiple projects, I'm at 187 entities. **The growth is logarithmic** — most new sessions reference existing entities rather than creating new ones. I'd estimate steady state at 300-500 entities for a solo developer working on 3-5 active projects.

## Try It Yourself

1. Create the SQLite database with the schema above: `sqlite3 ~/.agent-memory/memory.db < schema.sql`
2. Add session logging to your agent's hook system — capture decisions, corrections, and preferences as raw episodic memories
3. Write the distillation script and run it weekly to compress episodic memories into semantic facts
4. Build the session startup hook to inject the top 40-50 distilled memories as context
5. After two weeks, check your health metrics — if unverified count is growing faster than distilled count, increase distillation frequency
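Step 2's write path can be as small as a single insert. A sketch of appending one raw episodic memory from a session hook — the table layout matches the schema above, but the hook wiring itself is agent-specific:

```python
import json
import sqlite3

def log_episode(db_path: str, content: str, session_id: str) -> int:
    """Append one raw (undistilled) episodic memory and return its row id."""
    conn = sqlite3.connect(db_path)
    cur = conn.execute(
        "INSERT INTO memories (tier, source, content, metadata) "
        "VALUES ('episodic', 'session', ?, ?)",
        (content, json.dumps({"session": session_id})),
    )
    conn.commit()
    conn.close()
    return cur.lastrowid
```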

## What's Next

If you're building agents that persist across sessions — coding assistants, DevOps bots, personal AI tools — memory isn't optional. It's the difference between a tool you configure once and a tool you configure every day.

The entire system is ~400 lines of Python and SQL. No ML pipeline. No vector store. No monthly bill. Just structured data and disciplined compression.

**What's your approach to agent memory?** Drop your setup in the comments — I'm particularly curious if anyone has solved the entity deduplication problem more elegantly than my trigram hack.

If this was useful, bookmark it for when you hit the "why does my agent keep forgetting" wall. And follow for more posts about the operational reality of running AI-powered dev tools.

---

*I build AI-powered developer tools and run a 4-tier memory system across 30+ coding sessions per week. Currently shipping automation infrastructure at CyBarrier.*
