BangBoo01

Posted on Jun 15

Your AI agent has amnesia. Here's the file architecture I use to fix it.

#ai #llm #agents #machinelearning

Most agents I build start life the same way: capable, fast, and completely amnesiac. They have no opinions, no voice, and they forget everything the moment the session ends. They're a search engine with extra steps.

After rebuilding the same scaffolding for the Nth time, I converged on a small set of plain Markdown files and a memory model that survives restarts. No framework, no database — just files an agent reads at the start of every session and writes to as it goes. Here's the whole thing.

The problem, precisely

Two separate failures get lumped together as "my agent has no memory":

No identity. Every session it re-derives who it is from scratch, so it's blandly helpful and has no consistent voice or judgment.
No continuity. Facts it learned yesterday — your name, your stack, a decision you made — are gone today.

You fix them with two different layers.

Layer 1: Identity (who it is)

A few static files the agent reads first, every session:

SOUL.md — personality, tone, boundaries. The non-negotiables. "Be direct, not rude. Have opinions. Don't send half-baked replies to external channels."
IDENTITY.md — name, vibe, one-line self-concept.
USER.md — who it's helping, and how they like to work.
AGENTS.md — operating rules + the session ritual (what to read, in what order, before doing anything).

These rarely change. They're the constitution.
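
To make the session ritual concrete, here is a minimal sketch of the "read these first" step. The four file names come from the list above; the read order, the `load_identity` helper, and the `## <file>` prompt layout are assumptions for illustration, not a prescribed format.

```python
from pathlib import Path

# File names come from the article; the read order and the prompt layout
# are assumptions made for this sketch.
IDENTITY_FILES = ["SOUL.md", "IDENTITY.md", "USER.md", "AGENTS.md"]


def load_identity(root: Path) -> str:
    """Concatenate the identity files into one system-prompt preamble."""
    sections = []
    for name in IDENTITY_FILES:
        path = root / name
        if not path.is_file():
            # A missing file is skipped rather than treated as fatal.
            continue
        text = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {name}\n{text}")
    return "\n\n".join(sections)
```

The returned string would typically be placed at the top of the system prompt before any task-specific instructions.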

Layer 2: Memory (what it knows) — the 3-layer model

This is the part people get wrong. One giant memory.txt doesn't scale: it either grows unbounded or gets overwritten. Split it by lifespan:

a) Daily notes — raw, append-only

memory/2026-06-15.md. Everything that happened today, written as it happens. Cheap, lossy, never edited. This is working memory.
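
A minimal sketch of the append-only write, assuming one file per day under `memory/`. The `append_daily_note` name, the `- HH:MM text` line format, and the `now` parameter (there only to make the function testable) are hypothetical choices, not part of the original setup.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def append_daily_note(memory_dir: Path, text: str, now: Optional[datetime] = None) -> Path:
    """Append one timestamped line to memory/YYYY-MM-DD.md and return the path."""
    now = now or datetime.now()
    memory_dir.mkdir(parents=True, exist_ok=True)
    note_path = memory_dir / f"{now:%Y-%m-%d}.md"
    # Mode "a" only ever adds to the end of the file, so earlier notes
    # are never rewritten.
    with note_path.open("a", encoding="utf-8") as f:
        f.write(f"- {now:%H:%M} {text.strip()}\n")
    return note_path
```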

b) Long-term memory — curated

MEMORY.md. The distilled essence. Periodically (I do it on idle cycles), the agent reads recent daily notes, extracts what's worth keeping forever, and writes it here. Old/irrelevant entries get pruned. This is the equivalent of a human reviewing their journal and updating their mental model.
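
The file plumbing around that curation pass might look like the sketch below. Deciding what is worth keeping is left to the agent (or a human); this only gathers recent daily notes and appends the chosen entries. The seven-day window, the helper names, and the duplicate check are assumptions.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import List


def recent_daily_notes(memory_dir: Path, today: date, days: int = 7) -> List[Path]:
    """Return daily-note files from the last `days` days, oldest first."""
    cutoff = today - timedelta(days=days)
    picked = []
    for path in memory_dir.glob("*.md"):
        try:
            day = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not a YYYY-MM-DD.md file
        if cutoff < day <= today:
            picked.append((day, path))
    return [p for _, p in sorted(picked)]


def promote(memory_file: Path, entries: List[str]) -> int:
    """Append entries not already in the curated file; return how many were added."""
    existing = set()
    if memory_file.is_file():
        lines = memory_file.read_text(encoding="utf-8").splitlines()
        existing = {line.strip() for line in lines}
    added = 0
    with memory_file.open("a", encoding="utf-8") as f:
        for entry in entries:
            line = f"- {entry.strip()}"
            if line == "- " or line in existing:
                continue  # skip blanks and exact duplicates
            f.write(line + "\n")
            existing.add(line)
            added += 1
    return added
```

Pruning is deliberately not automated here; at small scale, editing the curated file by hand keeps every removal visible in the diff.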

c) Recall — retrieval at the moment of need

Before answering anything about prior work, decisions, or preferences, the agent searches its memory files and pulls only the relevant lines into context. You don't load everything every turn: the small curated file stays in context and acts as the index, and anything more detailed is fetched on demand.
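
As a rough illustration of recall, the sketch below does a plain keyword match over the curated file and the daily notes, which is about as close to "debugging is grep" as it gets. The `recall` function, the word-overlap scoring, and the directory layout (`MEMORY.md` next to a `memory/` folder) are assumptions; a real setup might use the agent's own search tool instead.

```python
import re
from pathlib import Path
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens, with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def recall(root: Path, query: str, limit: int = 5) -> List[Tuple[str, str]]:
    """Return up to `limit` (file name, line) pairs sharing the most words with the query."""
    terms = {w for w in tokens(query) if len(w) > 2}
    # Curated file first, then daily notes from newest to oldest.
    files = [root / "MEMORY.md"] + sorted((root / "memory").glob("*.md"), reverse=True)
    scored = []
    for path in files:
        if not path.is_file():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            hits = len(terms & tokens(line))
            if hits:
                scored.append((hits, path.name, line.strip()))
    # The sort is stable, so ties keep the file order chosen above.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, line) for _, name, line in scored[:limit]]
```

Only the returned lines go into context, so the cost per turn stays flat as the daily notes pile up.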

The flow: raw daily notes → curated long-term → recall on demand. Each layer has a different lifespan and a different cost, which is the whole point.

Why files instead of a vector DB

For a single agent, plain Markdown wins on the things that actually matter day to day:

You can read and edit its mind in a text editor. Debugging "why did it think X" is grep.
It's portable. Works with Claude, a local model, a custom loop — anything that can read a file.
It's diffable. Version it with git and watch the agent's understanding evolve.

Vectors are great when you have a large corpus to search. The identity and curated-memory layers are small and benefit more from being legible than from being embedded.

The one trick that makes it real

Write it down or it didn't happen. "Mental notes" don't survive a session restart — files do. The single most important rule in AGENTS.md is: when you learn something durable, write it to a file now. Everything above is just giving that instinct a place to put things.

I packaged this whole thing — the template files, a longer guide on each layer, and a fully worked example agent ("Pip," a research assistant with the personality and all four memory types filled in so you can see a finished one rather than blanks) — as a drop-in kit. If you'd rather copy a working setup than build it from scratch: AI Soul Kit (Core ¥980 / Plus ¥3,800).

But honestly, the architecture above is the part that matters. Steal it.

Top comments (14)

xulingfeng • Jun 15

The 'daily notes → curated memory' split is exactly the piece most people skip. Everything goes into one file and then nothing survives the trim — or everything survives and nothing is useful.

BangBoo01 • Jun 17

Exactly the failure mode I kept hitting. The heuristic that finally worked
for me: only promote something to long-term if it would change a future
decision. Stuff that just records what happened stays in the daily note (or
gets dropped); stuff that changes how the agent should act next time gets
promoted — "user prefers X over Y", "this approach failed because Z". Pure
event logs don't earn a slot.

Keeps the curated file small enough to actually load every session, which is
the whole point. Are you trimming by hand, or have you tried automating the
promotion decision? That part still feels more art than science to me.

xulingfeng • Jun 17

Yeah, this is my actual setup. We run Hermes Agent under the hood — borrowed ideas from Mem0, the daily notes ↔ curated memory split, and a few others, then hacked it together until it stuck.
The three layers are what survived:
1 — Importance score (1-5). Auto-assigned. If I correct something, it gets +1. Recurring patterns get +1. Cold for 30 days? -0.02. Above 3 gets promoted, below 2 gets dropped.
2 — Entity linking. No orphan memories. Every entry ties to a person, a project, or a tool. "User prefers X over Y" links back to your profile. Search hit rate doubled.
3 — Hard overrides. Skill docs and identity config never expire. No automated score can touch them.
On the "art vs science" part — I landed on auto-tag, manual confirm. The system spits out a promotion list once a day, I glance through it, approve or kill. Keeps the curated set tight without me guessing what belongs.
What's your volume? Dozens a day or hundreds? The approach changes a lot depending on scale.
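
The scoring rule in this comment can be sketched roughly as follows. This is not the commenter's code: the `Entry` type and function names are hypothetical, "above 3" is read as strictly greater than 3, and the cold penalty is assumed to apply once per review pass after 30 idle days.

```python
from dataclasses import dataclass
from datetime import date

PROMOTE_ABOVE = 3.0
DROP_BELOW = 2.0
COLD_AFTER_DAYS = 30
COLD_PENALTY = 0.02


@dataclass
class Entry:
    text: str
    score: float     # importance on the 1-5 scale
    last_used: date  # last time the entry was corrected or retrieved


def clamp(score: float) -> float:
    """Keep the score inside the 1-5 range."""
    return max(1.0, min(5.0, score))


def bump(entry: Entry, today: date) -> None:
    """Apply the +1 for a user correction or a recurring pattern."""
    entry.score = clamp(entry.score + 1.0)
    entry.last_used = today


def decay_if_cold(entry: Entry, today: date) -> None:
    """Subtract the cold penalty if the entry has been idle for 30+ days (assumed reading)."""
    if (today - entry.last_used).days >= COLD_AFTER_DAYS:
        entry.score = clamp(entry.score - COLD_PENALTY)


def decide(entry: Entry) -> str:
    """Return 'promote', 'drop', or 'keep' based on the thresholds."""
    if entry.score > PROMOTE_ABOVE:
        return "promote"
    if entry.score < DROP_BELOW:
        return "drop"
    return "keep"
```

Hard overrides would sit outside this function entirely: entries marked as skill docs or identity config would simply never be passed to `decay_if_cold` or `decide`.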

BangBoo01 • Jun 17

This is a great breakdown. The importance score with slow decay (-0.02/day
cold) is cleaner than what I do, and "entity linking doubled search hit rate"
is the kind of number I wish more people shared — the no-orphan-memories rule
especially.

Volume-wise I'm deliberately at the low end: dozens a day at most, usually
fewer, single agent with a human in the loop. That's the whole reason plain
markdown + a once-a-day manual promotion pass holds up — at your scale I'd
want exactly the auto-scoring you described; manual confirm doesn't survive
hundreds/day.

"Auto-tag, manual confirm" is where I landed too. Feels like the honest
answer until the promotion model is trustworthy enough to run unattended.
Has the score ever needed a "never drop" floor beyond your hard overrides,
or do importance + entity links catch everything that matters?

xulingfeng • Jun 17

Good question. Short answer — I haven't needed a "never drop" floor yet. The three layers (importance + entity links + hard overrides) catch everything that genuinely can't be lost.
But there's one edge case I keep thinking about: entries sitting at exactly 3.0 (the promotion line). They teeter between "worth keeping" and "about to expire" — one correction pushes them +1, 30 days of silence drops them -0.02. The manual pass exists specifically to catch that gray zone.
A harder problem isn't "can't lose" — it's "no longer relevant." A memory like "user is working on project X" is factually correct after the project ends, but contextually dead. I call it context drift, and neither importance scoring nor entity links catch it. I just clean it up during the manual pass. At your volume (daily notes → curated memory), I'd guess you hit this more often?
Have you run into the same thing — not "must not delete" but "still there but shouldn't be"?

BangBoo01 • Jun 17

Context drift is the one that actually bites me too — way more than "must
not delete." Two things help at my scale:

I filter drift at promotion time, not cleanup time. I bias what gets
promoted toward drift-resistant phrasing. "User prefers X over Y" is close
to timeless; "user is working on project X" has an expiry baked in. When a
stateful fact heads for the curated file, I either rewrite it as the durable
lesson underneath it or leave it in the daily note to age out.

For stateful facts worth keeping, legibility is the whole defense:
because the curated file gets re-read every session, a shipped project
jumps out to a human eye fast. Same manual pass you do; re-reading
constantly is what surfaces the drift.

What I haven't cracked: auto-detecting a stale "state" fact before a human
notices the project ended. Your entity links could almost do it though — if
a linked project entity goes cold, flag its dependent memories for review.
Have you tried propagating decay through the entity graph like that?
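
A speculative sketch of that idea, since neither of us has built it: if every entity a memory links to has gone cold, flag the memory for human review rather than deleting it. The data shapes, the `flag_for_review` name, and the 30-day cutoff are all assumptions for illustration.

```python
from datetime import date
from typing import Dict, List


def flag_for_review(
    entity_last_active: Dict[str, date],
    memory_links: Dict[str, List[str]],
    today: date,
    cold_after_days: int = 30,
) -> List[str]:
    """Return memories whose linked entities have all gone cold."""
    cold = {
        name
        for name, last in entity_last_active.items()
        if (today - last).days >= cold_after_days
    }
    flagged = []
    for memory, entities in memory_links.items():
        # Flag only when every linked entity is cold, so a memory that
        # still hangs off an active person or tool is left alone.
        if entities and all(e in cold for e in entities):
            flagged.append(memory)
    return flagged
```

Flagging instead of pruning keeps the manual pass in charge, which is the cautious option while over-pruning is still an open question.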

xulingfeng • Jun 17

Good question — and no🤣, we haven't gotten there yet either.
What we do have: per-entry decay (-0.02 after 30 days cold), importance >= 3 locked in, and entity links wired up (403 entities, ~35% coverage). But graph-propagation of decay? Not built.
Main reason: we haven't felt the pain yet. 552 entries with the three-layer filter (scoring + decay + manual pass) handles the drift well enough at this scale. When we hit a few thousand, yeah, entity-graph decay is probably the right next move.
Out of curiosity — what's your scale? And what scenario made you feel the need for it?

BangBoo01 • Jun 17

Way smaller than you — single agent, human in the loop, curated set sits
around 100-150 entries, never thousands. So I've never needed scoring; the
manual re-read stays cheap at that size. Different regime from your 552
entirely.

The scenario that made drift click: the agent kept opening sessions acting
on "user is heads-down on project X" for a couple weeks after X had shipped,
and framed a recommendation around a priority that was already dead. Every
word factually true — it was just running on last month's context. That's
when "still there but shouldn't be" stopped being abstract.

My fix is more editorial than algorithmic: at promotion time I write the
durable lesson, not the current state, so there's just less stateful stuff
to rot. The worked example agent I bundled with the kit is basically that
discipline applied end to end — happy to point you at it if you ever want to
see it in practice rather than described.

But at your scale the entity-graph decay is the more interesting frontier.
If you build it, I'm genuinely curious whether cold-entity propagation
over-prunes — that feels like the risk.

xulingfeng • Jun 17

Fair point on over-pruning. Haven't run into it yet — 552 entries with the three-layer filter still lands in "cheap enough to just manually check." If we ever build the graph decay, I'd keep it conservative — decay by hops, not by how much. Curious to see that kit when you drop it though 👀

BangBoo01 • Jun 17

Oh it's already up — put it on Gumroad: altezza6.gumroad.com/l/ai-soul-kit

Core is the template set you've already got; the Plus tier is the worked
example agent (Pip) — that's where the drift-resistant promotion discipline
is shown applied end to end, with memory filled in across every type rather
than blank templates.

"Decay by hops, not by amount" is a sharp call, btw — conservative is right.
Aggressive decay punishing well-connected entities is exactly the failure
I'd worry about. If you ever wire it up I'd love a postmortem.

Suny Choudhary • Jun 19

Good architecture idea. Agent memory needs structure, not just bigger context windows.

But teams should also define what the agent is allowed to remember. Otherwise memory files become a quiet place where secrets, customer data, and internal context pile up.

BangBoo01 • Jun 19

Strong point — and it's the flip side of the legibility argument. Plain
files cut both ways: a secret in a markdown file is greppable, which is great
for the owner and bad if the repo ever leaks.

What's worked for me is making the promotion rule double as a redaction rule:
nothing enters the curated long-term file unless it's a durable preference or
lesson. Credentials, tokens, raw customer data never qualify by definition,
so they never get promoted out of the ephemeral notes. And for sensitive
things that ARE load-bearing, I store a reference ("uses provider X"), never
the value.

At team scale though that politeness won't hold — curious how you'd enforce
it. A policy doc people ignore, or something that actually scans entries on
write?
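
For the "scans entries on write" option, a minimal sketch might look like this. The three patterns are illustrative only and nowhere near a complete secret detector; `guarded_append` and `looks_sensitive` are hypothetical names.

```python
import re
from pathlib import Path

# Illustrative patterns only; a real guard would need a much broader set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"),
]


def looks_sensitive(text: str) -> bool:
    """True if any pattern matches somewhere in the text."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def guarded_append(path: Path, entry: str) -> bool:
    """Append the entry unless it matches a secret pattern; return True if written."""
    if looks_sensitive(entry):
        return False
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {entry.strip()}\n")
    return True
```

A reference such as "uses provider X" passes the check, while an entry containing `api_key = ...` is refused before it reaches the file.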

Comment deleted

BangBoo01 • Jun 22

Fair — security enforcement and curation quality are two different jobs, and you're right that the first one can't live in a guidelines file once you've got multiple agents touching credentials and customer data. That's a real layer.

The piece here is aimed one level down: a single agent's signal-to-noise — what's worth keeping vs. letting decay. There, legibility is the enforcement, because every write is greppable and diffable — you can see exactly what landed and reverse it.

So the two stack rather than compete: a scan-on-write guard up top, a curation policy underneath. Curious how LangProtect handles false positives on writes — that's usually where these guards get switched off.
