Andrej Karpathy sketched out a beautiful idea for AI agent memory. Conversations flow into daily logs. Daily logs get compiled into a wiki. The wiki gets injected back into the next session. Your agent builds its own knowledge base over time.
53K stars on GitHub. I read the code, loved the concept, and built the whole thing into my Claude Code setup.
My version has 8 stars. Size doesn't matter... in this case.
Because three weeks later, half my sessions were losing context and I had no idea why.
## What went wrong
The original architecture assumes you have full control over your AI pipeline. Server-side execution. Background processes that always finish cleanly. Programmatic access to transcripts.
Claude Code runs on your laptop. You close the lid, the background process dies. You hit a long session, the transcript parser chokes. You forget to check the logs, and the compile pipeline fails silently for weeks.
Here's what my setup looked like after I copied the pattern:
```bash
# session-end.sh (the old version)
# Extracts last 100 turns from transcript
# Spawns flush.py in background
# flush.py calls claude -p Opus to summarize
# flush.py checks if it's after 6pm
# If yes, spawns compile.py (also background)
# compile.py calls claude -p Sonnet to build wiki articles
# CLAUDE_INVOKED_BY guard prevents infinite recursion
# SHA-256 hash comparison skips unchanged logs
```
That's a lot of moving parts for "remember what I did today."
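Even the one part that worked, the hash check, is its own little program. A sketch in Python of what that step does (file and function names here are mine, not the original pipeline's):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Content hash of a daily log file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def should_compile(log: Path, state_dir: Path) -> bool:
    """Skip recompiling a daily log whose content hasn't changed
    since the last run, by comparing against a recorded hash."""
    marker = state_dir / (log.name + ".sha256")
    current = sha256_of(log)
    if marker.exists() and marker.read_text().strip() == current:
        return False                 # unchanged: skip the expensive compile
    marker.write_text(current)       # record for the next run
    return True
```

Fifteen lines for a cache check, and it was the only subprocess stage that never failed.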
I tracked the results over three weeks across 7 active projects. The auto-flush hook fired on every session end. About half the time, the background process either timed out, failed to parse the transcript, or exited silently with no output. The daily log file was either empty or never created.
The after-6pm auto-compile? It literally never triggered once in production. The hash check worked fine. The subprocess just never got there.
I was running what looked like a sophisticated knowledge pipeline. In reality, every other session vanished into nothing.
## Three fixes that actually worked
I didn't scrap the architecture. The two-layer memory concept is genuinely good. Hot cache for quick patterns, wiki for deep knowledge, daily logs as the raw source. That part stays.
What I killed was the automation layer.
### Fix 1: One manual command instead of background automation
I replaced the entire auto-flush pipeline with a single skill called /close-day. You type it when you're done working. That's it.
The difference is simple. When the background process ran, it had no project context. It was parsing raw transcript JSON, trying to figure out what mattered. When you call /close-day inside your session, the agent still has the full picture. It knows what you worked on, what decisions you made, what's pending for tomorrow.
```bash
# session-end.sh (the new version)
#!/usr/bin/env bash
set -euo pipefail

if [[ -n "${CLAUDE_INVOKED_BY:-}" ]]; then exit 0; fi

PROJECT_DIR="${CLAUDE_PROJECT_DIR:-$(pwd)}"
mkdir -p "$PROJECT_DIR/.claude/state"

HOOK_INPUT=$(cat)
SESSION_ID=$(echo "$HOOK_INPUT" | python3 -c \
  "import sys,json; print(json.load(sys.stdin).get('session_id','unknown'))" \
  2>/dev/null || echo "unknown")

echo "$(date '+%Y-%m-%d %H:%M:%S') SessionEnd: $SESSION_ID" \
  >> "$PROJECT_DIR/.claude/state/flush.log"
exit 0
```
30 lines. Logs a timestamp. Done. The actual knowledge capture happens when you deliberately ask for it.
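For completeness, the script above gets wired in through Claude Code's hooks configuration. A minimal sketch of what that registration looks like in `.claude/settings.json` (check the hooks documentation for your version; the schema shown here is my recollection, not copied from the kit):

```json
{
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/session-end.sh"
          }
        ]
      }
    ]
  }
}
```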
### Fix 2: Date tags on everything
Every entry in MEMORY.md now carries a [YYYY-MM-DD] tag. Every update to the project backlog, every note in the session handoff file. When /close-day runs, the agent greps for today's date across all structured files and knows exactly what changed.
This matters when you have long days. I sometimes run 10 to 15 sessions before calling it. The old system tried to parse each session transcript separately and merge them. The new system doesn't care about individual sessions. It looks at the end state of every file and asks: what has today's date on it?
```markdown
## Proven Patterns

| Pattern | Evidence |
|---------|----------|
| Flush deprecated, /close-day replaces [2026-04-17] | ~50% failure rate |
| README sells simplicity [2026-04-17] | Marketer ICP test |
| NSP self-cleaning rule [2026-04-17] | 455 lines trimmed to 161 |
```
No transcript parsing. No background merge logic. Just dates and grep.
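The whole scan fits in a few lines. A Python sketch of the idea (the `[YYYY-MM-DD]` tag format is from my files above; the function name is mine):

```python
import re
from datetime import date
from pathlib import Path

DATE_TAG = re.compile(r"\[(\d{4}-\d{2}-\d{2})\]")

def todays_entries(files, today=None):
    """Return (file, line) pairs whose [YYYY-MM-DD] tag matches today.
    Looks only at the end state of each file, never at transcripts."""
    today = today or date.today().isoformat()
    hits = []
    for path in files:
        for line in Path(path).read_text().splitlines():
            if today in DATE_TAG.findall(line):
                hits.append((str(path), line.strip()))
    return hits
```

Ten sessions or one session, the result is the same: whatever carries today's tag is what changed.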
### Fix 3: Move the knowledge base out of .claude/
This one was embarrassing. The compile pipeline uses claude -p to transform daily logs into structured wiki articles. For weeks, it reported success but wrote nothing. Zero wiki articles created.
The problem: Claude Code treats everything under .claude/ as a sensitive directory. Write operations get silently blocked or require special permissions that background subprocesses don't have.
Moving knowledge/ from .claude/memory/knowledge/ to the project root fixed it instantly. One directory move.
```
# Before (broken)
.claude/memory/knowledge/concepts/
.claude/memory/knowledge/connections/

# After (works)
knowledge/concepts/
knowledge/connections/
```
I spent actual weeks debugging recursion guards and subprocess timeouts when the real issue was a file path.
## What my setup looks like now
The daily workflow is three steps:
- Open a session. Context loads automatically from a Python hook that injects the wiki index and recent daily logs.
- Work normally. Safety hooks checkpoint progress every 50 exchanges and before context compression.
- Type /close-day when done. Agent synthesizes today's changes into a daily article.
That's the whole system. Tomorrow's session picks up where today left off.
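The context-loading hook from step one can be sketched like this. It relies on the fact that whatever a SessionStart hook prints to stdout gets injected into the new session; the directory names (`knowledge/INDEX.md`, `memory/daily/`) are my guesses at a layout, not the kit's actual paths:

```python
#!/usr/bin/env python3
"""SessionStart hook sketch: print the wiki index plus the most
recent daily logs; Claude Code injects the output into context."""
from pathlib import Path

def build_context(root: Path, recent: int = 3) -> str:
    parts = []
    index = root / "knowledge" / "INDEX.md"   # hypothetical wiki index
    if index.exists():
        parts.append(index.read_text())
    daily = root / "memory" / "daily"         # hypothetical log directory
    if daily.exists():
        for log in sorted(daily.glob("*.md"))[-recent:]:
            parts.append(f"## {log.stem}\n{log.read_text()}")
    return "\n\n".join(parts)

if __name__ == "__main__":
    print(build_context(Path.cwd()))
```

Note it only reads files and prints; all the synthesis work happens inside the session, where the agent has context.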
After a few weeks, the knowledge base has real substance. Cross-referenced wiki articles about project patterns, architectural decisions, lessons from debugging sessions. All searchable, all structured, all built from actual work rather than raw transcript scraping.
## The part I keep thinking about
Karpathy published four principles for working with LLM agents. Principle number two is "Simplicity First. Minimum code that solves the problem."
I took his agent memory concept and built a 300-line automation pipeline on top of it. Background subprocesses, recursion guards, hash comparisons, transcript parsers. And then I replaced all of it with one 30-line script and a manual command.
The architecture was his. The over-engineering was mine.
The simplified version is open source. Takes about five minutes to set up:
[awrshift/claude-memory-kit](https://github.com/awrshift/claude-memory-kit): The OS layer for Claude Code — memory, hooks, knowledge pipeline, experiments. No external deps. v3
Claude Memory Kit
Your Claude agent remembers everything. Across sessions. Across projects. Zero setup.
The Problem
Every new Claude session starts from zero. Yesterday's decisions, last week's research, the bug you fixed three days ago — gone. You waste the first 10 minutes re-explaining what Claude already knew.
Claude Memory Kit fixes this in 3 commands. No API cost. Runs on your existing subscription.
Get Started
```bash
git clone https://github.com/awrshift/claude-memory-kit.git my-project
cd my-project
claude
```
That's it. Claude sets everything up and asks a few questions (your name, project name, language).
Tip: Type /tour after setup — Claude walks you through the system using your actual files.
Before and After
| | Without Memory Kit | With Memory Kit |
|---|---|---|
| New session | Starts from zero. "What project is this?" | Knows your project, last session, current tasks |
| After 10 sessions | Nothing accumulated | Searchable wiki of decisions, patterns, lessons |
| Multiple projects | Total chaos | Each project has its own knowledge base |
Built from 700+ production sessions across 7 projects. If you hit similar issues with agent memory, I'd genuinely like to hear about it in the comments.

