DEV Community

Pi
Pi

Posted on • Originally published at Medium

Teaching my AI to remember 6,679 of my own coding sessions

A better model doesn't know how I fixed the same bug six weeks ago. A full-text index of my own past sessions does — and my agents have to search it before they're allowed to guess.

The most useful upgrade I gave my AI this year wasn't a smarter model. It was a database of my own history.

I run a handful of small apps as one person and a flock of AI agents. Every time I sit down to work I open a coding session, and that session gets saved to disk as a transcript — every prompt, every command, every tool call. Over a couple of years that piled up into thousands of files nobody ever opened again. All the context for "how did I set up that signing key" or "what broke the build last time" was in there. It just wasn't searchable, so in practice it was gone.

A teammate would remember. A company of one doesn't have one. So I built the memory instead.

The problem: a company of one forgets

In a normal team, institutional memory lives in people. Someone remembers the migration you did in March, the workaround for that flaky test, the reason you picked library A over B. You ask them, they tell you, and you don't redo the investigation from scratch.

Working alone, that memory has exactly one location: my head. And my head is a bad database — it doesn't do full-text search, and it quietly drops rows. The transcripts on disk had everything, but a folder of thousands of JSONL files is a graveyard, not a memory.

So the goal was narrow: make every past session searchable, and make my agents actually use it.

Two small layers I write by hand sit on top of one large layer the machine builds

Two small layers I write by hand sit on top of one large layer the machine builds.

Two layers I write by hand

The memory has three layers, and the top two are small and human.

Layer 1 is an index file — a single MEMORY.md — loaded into context at the start of every session. It's deliberately tiny, roughly one line per memory, so it costs almost nothing to always carry. It doesn't hold knowledge; it holds pointers to knowledge: "don't assert UI state from an old screenshot — re-check it first."

Layer 2 is the knowledge itself — one Markdown file per fact, in a memory/ folder, read only when a line in the index looks relevant to what I'm doing right now. A correction someone gave me, a constraint that isn't visible in the code, a reference link.

Together these are the curated tier: small, high-signal, hand-edited. They're great, and they have a ceiling. I am never going to hand-write a note for all 6,679 sessions. That's what the third layer is for.

The layer the machine builds

Layer 3 is a full-text index of every session transcript, built by a roughly 230-line Python script and SQLite's FTS5 engine.

The shape is simple. Three tables: sessions (one row of metadata per session), turns (one row per turn — role, timestamp, the text), and turns_fts, an FTS5 virtual table that indexes the text for search. Triggers keep the search index in sync whenever a turn is inserted or deleted, so I never maintain it by hand.

Indexing is incremental. The script walks my project log directories, and for each transcript it compares the file's modified time against when it last indexed that session. Unchanged files are skipped; a changed session has its old turns deleted and re-read. Re-running it is cheap and idempotent. To survive several coding sessions running at once, the database opens in WAL mode with a 30-second busy timeout.

Searching is one command:

python3 session_indexer.py --search "signing key"

That runs an FTS5 MATCH, ranks the hits by relevance, and prints a snippet() of each — the matching text with a few words of context on either side, best matches first. I can scope it to a single project. It isn't semantic search and it isn't trying to be: it's grep that ranks, over a corpus my own keyboard produced.

As of the last index run, that corpus is 6,679 sessions and 286,382 turns — a 460 MB database spanning roughly thirty project folders, with the biggest single project holding 2,415 sessions on its own. The number climbs every day, because every session I finish becomes searchable the next time the indexer runs. (The figure in my own setup docs still says "1,800+." The system outgrew its own documentation months ago.)

The index does nothing until “look it up” beats “make it up” as the default

The index does nothing until "look it up" beats "make it up" as the default.

The rule that makes it worth the disk space

Here's the part that actually matters, and it isn't the database. It's a rule.

An index nobody queries is just expensive disk. So the instruction my agents run under is blunt: when I ask a "how did we do this last time" question — anything that reaches back to past work — search the index first and quote what you find. If nothing matches, say "no related record," and do not invent one.

That's the whole trick. The cost of a confident-but-wrong answer about my own past is high; the cost of a search is milliseconds. So searching is mandatory and guessing is banned. Parts of my setup already lean on this — the skill I use to draft these very articles runs a session search to ground itself in things I actually did, instead of letting the model produce a plausible-sounding anecdote that never happened.

If that sounds familiar, it's the same lesson as an automation I killed a few weeks ago: a system is only worth running if it changes an outcome you can point to. This one earns its keep, because every grounded answer is one I didn't have to reconstruct from memory — or get wrong.

Keeping it fresh without thinking about it

A memory you have to remember to update isn't a memory. So the indexer runs itself, on a cadence tied to how I work rather than to a fixed clock.

When a session starts, a cheap hook just reads what's already there — no scanning. The real work happens when I clear my context to start something new: that's the natural checkpoint, so that's when the indexer does an incremental pass and a pattern scan. And in case I go a long stretch without clearing, a backstop runs the heavy work at most once every two hours. In practice the index refreshes five to eight times a day, and I never type the command.

Something else rides the same rails. A small detector reads my recent prompts and commands, strips out the secrets and paths and numbers, and fingerprints what's left. When the same fingerprint shows up across three or more separate sessions, it surfaces as a candidate — "you've done this by hand several times; maybe it should be a script." It doesn't promote anything on its own; it flags the repetition and waits for me to decide. The memory notices when I'm repeating myself.

What I'd actually copy from this

If you work alone — or on a team small enough that you are the institutional memory — the parts worth stealing aren't the schema:

  • Index your own history, not just the world's. The model already knows the world. What it can't know is what you did last March. That gap is the cheapest, highest-leverage thing you can close.
  • Make search mandatory before guessing. The index does nothing until "look it up" beats "make it up" as the default. Write that rule down where the agent reads it.
  • Keep the hand-written layer tiny. Curate pointers, not encyclopedias, and let the machine hold the long tail.
  • Put the upkeep on a schedule, not on your willpower. If refreshing the memory depends on you remembering to, it will rot.

A better model is the same upgrade everyone else gets. A searchable memory of your own work is the one edge that's actually yours — and for a company of one, it's the closest thing I have to a colleague who was there the whole time.

Top comments (0)