DEV Community

Mary Olowu

AI Can Write the Code. It Still Forgets the Decisions That Matter.

A lot of AI coding advice quietly assumes the same thing:

if the output is bad, you probably need a better model, a better prompt, or more tooling.

Sometimes that is true.

But one AI coding failure keeps showing up for me, and I do not think a better model is the real fix.

In one session, we make a decision that is supposed to guide the rest of the project.

Then in a later session, the model answers that same question differently and starts nudging the project down another path.

Usually it is more subtle than "the code is wrong."

We already decided that deprecated paths stay backward compatible for a reason, that receivers fan out to downstream consumers instead of owning business logic inline, and that idempotency gets enforced before side effects fire. Then a later session solves the local task as if those decisions were optional because it only sees the immediate diff.

Nothing is obviously broken right away.

The code still looks competent.
It still compiles.
It still sounds reasonable.

But the project starts to feel scattered.

It no longer feels like one person with memory has been carrying the work forward.

That changed how I think about AI coding.

On an ongoing project, the bigger issue is often not generation quality. It is continuity.

The model does not know:

  • which decisions have already been made
  • which tradeoffs we have already accepted
  • which docs are still authoritative
  • what has changed recently
  • what should not be changed again

That is not really an intelligence problem.

It is a memory problem.

I feel this most on a solo-dev monorepo, where I am not just using AI for one-off code generation. I am also using it for backlog triage, bug capture, planning, reports, and picking work back up across sessions.

The frustrating part is not that the model cannot code.

It is that it codes confidently, every session, while waking up without durable context.

Sometimes the missing memory is shallow and local.

A simple rule in the codebase or in CLAUDE.md helps a lot:

  • follow the existing conventions
  • match the existing code patterns
  • use FIFO here, not LIFO
  • do not add a second library when the current one already covers the job

That kind of memory is useful and surprisingly high leverage.
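
For concreteness, a guardrails file can be as small as a few bullet points. The fragment below is a hypothetical CLAUDE.md sketch; the specific rules and wording are invented for illustration, not a real project's file:

```markdown
# Project conventions (read before generating code)

- Follow the existing conventions in this repo; match surrounding code patterns.
- Queues in this codebase are FIFO. Do not introduce LIFO semantics anywhere.
- Do not add a second library when the current one already covers the job.
- Before writing a new helper, check whether the module already has one.
```

Because the file lives in the repo, every fresh session sees the same rules without anyone re-typing them into a prompt.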

But the harder problem is when the missing memory is much deeper than code style.

It is about remembering why the project should not go a certain direction again.

Things like:

  • deprecated behavior stays backward compatible until the migration path is actually complete
  • receivers fan out work instead of embedding downstream business logic directly
  • idempotency has to happen before side effects, not after them
  • this webhook should update the existing record, not create a second one
  • this state transition only happens after this other condition is true
  • duplicate events should be absorbed here, not after side effects have already fired
  • this source is authoritative for this field, so do not let another path quietly overwrite it
  • this module already has a helper for this logic, so do not bypass it and create a second path
  • do not bring in a new dependency to solve a problem the existing stack already solves
  • do not create a retry flow that can turn into an infinite loop
  • do not quietly undo an earlier system decision because the current session cannot see its history
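
Several of those rules can be made concrete in a few lines. The sketch below is a hypothetical webhook receiver, with in-memory stand-ins for real stores and invented names throughout. It shows two of the decisions above: duplicates are absorbed before any side effect fires, and the handler updates the existing record instead of creating a second one.

```python
seen_events: set[str] = set()   # stands in for a durable dedupe store
records: dict[str, dict] = {}   # stands in for the system of record
notifications: list[str] = []   # stands in for downstream consumers

def notify_downstream(record_id: str) -> None:
    """Fan out to consumers; the receiver owns no business logic itself."""
    notifications.append(record_id)

def process_webhook(event_id: str, record_id: str, payload: dict) -> bool:
    # Idempotency is enforced BEFORE any side effect fires.
    if event_id in seen_events:
        return False  # duplicate absorbed here, silently
    seen_events.add(event_id)

    # Update the existing record rather than creating a second one.
    record = records.setdefault(record_id, {})
    record.update(payload)

    # Side effects happen only after the dedupe check has passed.
    notify_downstream(record_id)
    return True
```

A later session that reorders these steps, putting the notify call before the dedupe check, produces code that still compiles and still looks reasonable, which is exactly why the rule has to be written down somewhere the session can see it.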

Those decisions are usually load-bearing.

They were made for a reason.

Forgetting why they exist is a bit like forgetting why a house has support pillars in the frame. Once the reason disappears, the pillar starts to look optional. Then removing it or building around it the wrong way starts to feel harmless, right up until the cost shows up somewhere else.

This is where AI-written code starts to feel different from human-guided code.

A person with memory usually carries more invisible continuity into the work.

They remember:

  • why the earlier choice was made
  • what problem we were trying to avoid
  • which convention is mandatory versus just common
  • which "reasonable" branch is actually the wrong one for this project

Without that continuity, AI can produce code that looks fine in isolation while introducing costly mistakes into the project over time.

If the model keeps re-litigating the same decision, reopening the same tradeoff, or proposing work that was already decided against, the problem is not just generation quality. The system has no reliable memory layer.

That is why I have become much more interested in boring project context than in prompt tricks.

What helped me was giving AI a few stable places to look:

  • short repo guardrails
  • maintainer docs for durable context
  • lightweight local memory for session continuity
  • real systems of record for backlog and releases
  • explicit notes about patterns to keep following and failure modes to avoid
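
The "durable context" piece does not need special tooling. A plain decision log in the repo works; the entry below is a made-up example of the shape I mean, not from a real project:

```markdown
## YYYY-MM-DD — Receivers fan out; they do not own business logic

Status: decided. Do not reopen without a migration plan.
Why: inline business logic in receivers made retries unsafe and
duplicated rules across consumers.
Consequence: new downstream behavior lives in a consumer, not in
the receiver.
```

The "Status" and "Why" lines are the point: they record not just the decision but the reason it should survive a session that cannot see its history.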

None of that is glamorous.

It is also what made the biggest difference.

Once I had that structure, the sessions stopped feeling like first contact every time.

The model still made mistakes. It still needed review. It still needed boundaries.

But the failures got more honest.

Instead of "the AI is useless," the problem became easier to diagnose:

  • the memory is stale
  • the docs are weak
  • the workflow has no source of truth
  • the instructions are doing the job that documentation should be doing
  • a deeper architectural rule is being treated like a surface-level style preference

That is a much better problem to have because you can actually fix it.

I think a lot of AI coding frustration is really project-memory failure wearing a model-shaped mask.

People keep trying to solve it with one more model upgrade or one more agent when the actual missing piece is memory that survives the chat window.

That does not mean model quality is irrelevant.

It means there is a ceiling on how useful any model can be if the project keeps forgetting its own load-bearing decisions.

The shift for me was simple:

I stopped asking, "How do I make the model smarter?"

I started asking, "How do I stop a later session from quietly taking the project in a different direction?"

The future of AI coding is not just better generation.

It is better memory around the decisions that hold the work up.

What breaks AI coding more often in your projects: weak generation, or weak continuity?
