AI agents do not usually break mature codebases because they cannot write code.
They break them because they forget why the code is shaped the way it is.
They forget the package boundary.
They forget that the feature shipped but is still flagged off.
They forget we are only halfway through a multi-stage migration.
They forget that one abstraction already caused problems once.
Then a later session makes a locally reasonable change that is globally wrong.
When people talk about AI memory, the conversation often drifts toward the same
idea:
give the agent more.
More chat history.
More docs.
More logs.
More old reasoning.
I am skeptical of the idea that better AI memory means storing more.
If all of that just turns into a larger pile to drag from session to session,
the memory layer becomes a junk drawer.
The agent has "more context," but the work does not get more coherent.
It just gets noisier.
I think the goal is much narrower:
an AI-assisted project needs to preserve the small set of durable facts that are
load-bearing later.
Not everything.
Just the things that stop a later session from making a locally reasonable and
globally wrong decision.
Most project context is temporary.
The exact wording of a prompt is temporary.
The full output of a log search is temporary.
The back-and-forth while exploring an implementation is temporary.
The rough dead ends you hit during one session are mostly temporary too.
What actually needs to survive is a much smaller set.
For me, it looks something like this:
- decisions
- boundaries
- authoritative sources
- failure modes
- mandatory conventions
That is the real memory layer.
Decisions are the obvious one.
Not "we discussed this once."
I mean:
- what was decided
- why it was decided
- whether it is still active
- what it replaced
- what is now off the table
That last part matters more than people admit.
If a new decision supersedes an old one, the relationship has to be explicit.
Otherwise every later session has to guess whether the old rule is still alive.
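One way to make that relationship explicit is to give each decision a status and a pointer to whatever it replaced. This is a minimal sketch, not a prescribed format: the `Decision` fields, the `active_decisions` helper, and the retry example are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """One durable decision record. All field names are illustrative."""
    id: str
    what: str                          # what was decided
    why: str                           # why it was decided
    status: str                        # "active" or "superseded"
    supersedes: Optional[str] = None   # id of the decision this one replaced
    ruled_out: tuple[str, ...] = ()    # options that are now off the table


def active_decisions(records: list[Decision]) -> list[Decision]:
    """Keep decisions marked active that no other record claims to replace."""
    replaced = {d.supersedes for d in records if d.supersedes}
    return [d for d in records if d.status == "active" and d.id not in replaced]


# Placeholder data: the content of these decisions is invented.
records = [
    Decision("D1", "Retry in the client", "Fewer moving parts", "superseded"),
    Decision("D2", "Retry in the gateway", "Client retries amplified load",
             "active", supersedes="D1", ruled_out=("client-side retries",)),
]

for d in active_decisions(records):
    print(d.id, "replaces", d.supersedes, "and rules out", d.ruled_out)
```

With a record like this, a later session does not have to guess whether the old rule is alive: it is either in the active set or it is not.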
A memory layer also has to store status.
I hit this recently on a staged release.
One release shipped the metrics and observer path for a future cutover, but the
actual cutover was still dormant. The next step was not "start the future
implementation." The next step was "watch the baseline for the agreed window,
then decide."
That is a very different kind of memory.
The important fact was not just that code had shipped.
It was:
- this capability is live but still dormant
- the follow-up is intentionally blocked for now
- we are in the observation window
- the earlier decision has not been superseded yet
Without that state recorded somewhere durable, a later session can easily treat
"the metrics shipped" as "the gate is cleared" and start nudging the project
forward too early.
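A status record for that situation might look like the sketch below. The field names, the `may_start_follow_up` check, and the dates are assumptions made for illustration; the only thing taken from the release itself is the rule that shipping does not clear the gate.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CapabilityStatus:
    """Durable status for one staged capability. Field names are illustrative."""
    name: str
    shipped: bool             # the code is released
    enabled: bool             # the behavior is actually switched on
    observe_until: date       # end of the agreed observation window
    follow_up_approved: bool  # a human has decided to proceed


def may_start_follow_up(s: CapabilityStatus, today: date) -> bool:
    """Shipping alone never clears the gate.

    The observation window must have ended and a human must have approved.
    """
    return s.shipped and today > s.observe_until and s.follow_up_approved


# Placeholder values: live but dormant, still inside the window, not approved.
status = CapabilityStatus(
    name="cutover-metrics",
    shipped=True,
    enabled=False,
    observe_until=date(2025, 1, 31),
    follow_up_approved=False,
)

print(may_start_follow_up(status, date(2025, 1, 15)))
```

The point of the explicit `follow_up_approved` flag is that even after the window ends, the next step is a decision, not an automatic start.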
Boundaries are the next thing.
This is where a lot of mature-repo drift starts.
The code the agent writes can be perfectly competent and still be wrong because
it landed in the wrong layer.
The package that defines canonical query types should not quietly become the
package that knows about one storage engine.
The service that owns orchestration should not quietly absorb unrelated business
logic just because the local edit looked convenient.
A useful memory layer needs to preserve those boundaries in plain language:
- what belongs here
- what does not belong here
- where the adjacent responsibility actually lives
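Those three questions fit in a very small record. The sketch below is one possible shape, with hypothetical package names (`query_types`, `storage_adapters`) standing in for whatever the real project uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Boundary:
    """Plain-language boundary for one package. Names are illustrative."""
    package: str
    belongs: tuple[str, ...]                       # what belongs here
    elsewhere: dict[str, str] = field(default_factory=dict)
    # elsewhere maps "what does not belong here" to "where it actually lives"


def describe(b: Boundary) -> str:
    """Render the boundary as sentences an agent can read at session start."""
    lines = [f"{b.package} owns: {', '.join(b.belongs)}."]
    for concern, owner in b.elsewhere.items():
        lines.append(f"{b.package} must not contain {concern}; that lives in {owner}.")
    return "\n".join(lines)


# Hypothetical example mirroring the query-types case above.
query_types = Boundary(
    package="query_types",
    belongs=("canonical query types",),
    elsewhere={"storage-engine specifics": "storage_adapters"},
)

print(describe(query_types))
```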
Authoritative sources are another big one.
Projects often have several places that mention the same thing:
- code
- docs
- tickets
- release notes
- generated configs
If the agent cannot tell which source actually owns a field, a state transition,
or a behavior contract, it can make a change that looks harmless and still
quietly violate the system.
The project has to say, somewhere durable:
- this path owns this behavior
- this service is authoritative for this record
- this generated file should not be hand-edited
- this contract is defined here, not inferred from nearby code
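As a rough sketch, ownership can be a small lookup table that an agent or a pre-edit check consults before touching a path. The patterns, owner names, and the `owner_of` helper are hypothetical; only the standard-library `fnmatch` call is real.

```python
from fnmatch import fnmatch
from typing import Optional

# (path pattern, authoritative owner, may be hand-edited?)
# Every entry here is a placeholder for illustration.
OWNERSHIP = [
    ("config/generated/*.yaml", "tools/gen_config.py", False),
    ("services/billing/*", "billing service", True),
]


def owner_of(path: str) -> Optional[tuple[str, bool]]:
    """Return (owner, hand_editable) for the first matching pattern, else None."""
    for pattern, owner, hand_editable in OWNERSHIP:
        if fnmatch(path, pattern):
            return owner, hand_editable
    return None


match = owner_of("config/generated/app.yaml")
if match is not None and not match[1]:
    print(f"Do not hand-edit; regenerate via {match[0]}")
```

A `None` result is informative too: it means nobody has recorded who owns the path, which is itself worth surfacing before an edit.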
Failure modes are the other kind of memory people under-capture.
Some of the most valuable project knowledge is not "how it works."
It is:
- what broke before
- which direction created drift
- which retry shape caused trouble
- which abstraction looked elegant and turned into a maintenance tax
If that only lives in one old chat or in your head, the agent will keep
rediscovering the same trap with total confidence.
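A failure-mode record does not need much structure to be useful. The sketch below assumes a hypothetical `FailureMode` type keyed by area, with invented example entries, so a session can look up known traps before working in that area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One trap the project has already hit. Field names are illustrative."""
    area: str         # part of the system the trap applies to
    what_broke: str   # what happened last time
    avoid: str        # the shape of change that caused it


# Placeholder entries: the incidents described are invented.
FAILURE_MODES = [
    FailureMode("retries", "uncapped retries amplified an outage",
                "retry loops without a cap or backoff"),
    FailureMode("query-layer", "a generic repository wrapper became a maintenance tax",
                "new abstraction layers over the query types"),
]


def warnings_for(area: str) -> list[str]:
    """Return human-readable warnings recorded for one area."""
    return [f"Avoid {f.avoid}: {f.what_broke}."
            for f in FAILURE_MODES if f.area == area]


for line in warnings_for("retries"):
    print(line)
```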
Mandatory conventions matter too.
Not every common pattern is a rule.
Not every rule can be inferred from code.
Some things really are optional style preferences.
Some things are hard constraints.
If the agent cannot tell the difference, it will treat both like vibes.
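One way to keep that difference from blurring is to record the level next to the rule. This is a sketch with made-up rules; the `Level` enum and `must_follow` helper are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    HARD = "hard constraint"
    PREFERENCE = "style preference"


@dataclass(frozen=True)
class Convention:
    rule: str
    level: Level
    inferable_from_code: bool  # False means the agent cannot learn it by reading the repo


# Placeholder rules for illustration only.
CONVENTIONS = [
    Convention("Schema changes go through a migration file", Level.HARD, False),
    Convention("Prefer early returns over nested conditionals", Level.PREFERENCE, True),
]


def must_follow(conventions: list[Convention]) -> list[str]:
    """Return only the rules that are hard constraints."""
    return [c.rule for c in conventions if c.level is Level.HARD]


print(must_follow(CONVENTIONS))
```

The `inferable_from_code` flag marks the rules most worth writing down: a hard constraint that cannot be inferred from the code is exactly the kind a fresh session will miss.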
A good memory layer should also encode when the agent must pause and ask for
human review.
Some decisions should not be made autonomously at all. Introducing a new Kafka
consumer, creating a new RPC boundary, changing auth or billing behavior,
removing backward compatibility, or adding a new queue or datastore are not
just implementation details. They are moments where the project should require
human judgment on purpose.
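The pause-and-ask rule can be written down as data rather than left to the agent's judgment. In this sketch the tag names are hypothetical labels for the categories listed above, and how a change gets tagged is deliberately left out.

```python
# Change categories that should always stop for human review.
# Tag names are illustrative labels for the categories named in the text.
REVIEW_REQUIRED = frozenset({
    "new_kafka_consumer",
    "new_rpc_boundary",
    "auth_change",
    "billing_change",
    "remove_backward_compat",
    "new_queue",
    "new_datastore",
})


def needs_human_review(change_tags: set[str]) -> set[str]:
    """Return the subset of tags on a proposed change that require review."""
    return set(change_tags) & REVIEW_REQUIRED


proposed = {"refactor", "new_queue"}
blocking = needs_human_review(proposed)
if blocking:
    print("Pause and ask a human about:", sorted(blocking))
```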
That is why I think the useful memory question is not:
"How do I make the agent remember more?"
It is:
"What absolutely has to survive a fresh session?"
That is a much better filter.
It also makes the "what not to store" side much clearer.
A durable memory layer probably does not need:
- every exploration path
- every copied stack trace
- every draft explanation
- every implementation detail from every task
- giant transcript dumps pretending to be knowledge
Those things belong in the window while you are working.
They do not all belong in the store.
This is why I keep coming back to one idea:
memory without a schema is just a bigger transcript.
And bigger transcripts rot.
They get longer.
They get noisier.
They become expensive to reread.
They make retrieval harder.
They turn "remember this" into "search this pile and hope."
Structured memory is less exciting to talk about.
It is also much more useful.
A small, boring memory layer with decisions, boundaries, ownership, and failure
modes will usually beat a richer but sloppier pile of context.
That does not have to mean some grand new platform.
It could be:
- a few markdown files
- a strict JSON ledger
- a tiny SQLite table
- a set of durable docs with clear update rules
The implementation matters less than the shape.
The shape is the point.
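To make "shape" concrete, here is a sketch of the tiny SQLite option using Python's standard `sqlite3` module. The table name, columns, and allowed values are assumptions; the idea it illustrates is that the schema itself refuses anything that is not one of the durable record kinds.

```python
import sqlite3

# Hypothetical schema: one table, five record kinds, explicit status.
SCHEMA = """
CREATE TABLE memory (
    id          INTEGER PRIMARY KEY,
    kind        TEXT NOT NULL CHECK (kind IN
                ('decision', 'boundary', 'source', 'failure_mode', 'convention')),
    summary     TEXT NOT NULL,
    rationale   TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'active'
                CHECK (status IN ('active', 'superseded', 'archived')),
    supersedes  INTEGER REFERENCES memory(id)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO memory (kind, summary, rationale) VALUES (?, ?, ?)",
    ("boundary", "query_types holds no storage-engine code", "keeps the types canonical"),
)


def rejects(kind: str) -> bool:
    """Return True if the schema refuses a record of this kind."""
    try:
        conn.execute(
            "INSERT INTO memory (kind, summary, rationale) VALUES (?, 'x', 'y')",
            (kind,),
        )
        return False
    except sqlite3.IntegrityError:
        return True


print(rejects("transcript"))
```

A transcript dump has nowhere to go in a table like this, which is the intended effect.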
And it should not grow forever without judgment.
Some memory stops being useful because the project has moved on.
A boundary gets replaced.
A migration completes.
A temporary constraint expires.
A failure mode gets designed out.
If those records stay mixed in with the live ones forever, the memory layer
slowly turns back into the same clutter problem it was supposed to fix.
So part of a healthy memory layer is occasional pruning.
Not deleting history recklessly.
I mean deliberately:
- archiving records that are no longer active
- marking superseded decisions clearly
- removing stale task-local notes from durable memory
- keeping the live memory surface small enough to stay high-signal
That matters for quality and for cost.
A smaller, cleaner memory layer is easier to query, easier to trust, and easier
to carry forward across sessions.
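A pruning pass along those lines can be very plain. The sketch below assumes a hypothetical `Record` type with a status and a task-local flag, and splits the store into what stays live, what moves to an archive, and what leaves durable memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    id: str
    summary: str
    status: str               # "active", "superseded", or "expired"
    task_local: bool = False  # a note that only mattered during one task


def prune(records: list[Record]) -> tuple[list[Record], list[Record], list[Record]]:
    """Split records into (live, archive, dropped).

    Nothing superseded or expired is deleted; it moves to the archive.
    Only task-local notes are removed from durable memory.
    """
    live = [r for r in records if r.status == "active" and not r.task_local]
    archive = [r for r in records if r.status != "active" and not r.task_local]
    dropped = [r for r in records if r.task_local]
    return live, archive, dropped


# Placeholder records for illustration.
store = [
    Record("R1", "Cutover is gated on the observation window", "active"),
    Record("R2", "Old package boundary", "superseded"),
    Record("R3", "Scratch notes from one debugging session", "active", task_local=True),
]

live, archive, dropped = prune(store)
print(len(live), len(archive), len(dropped))
```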
If your AI sessions keep feeling like first contact, I do not think the answer
is automatically more memory.
It may just be better memory.
What is the smallest set of durable records that would stop your project from
forgetting itself between sessions?