
My AI Agent Keeps Forgetting Everything

Stephen J Newhouse on April 07, 2026

My AI Agent Keeps Forgetting Everything; So do I... I have multiple sclerosis. Some days are better than others, but one thing is constant: repe...

Jess Lee

Shout out to @diet-code103!!

Elmar Chavez

This is a step forward, but making AI work consistently over the long run will always be a challenge. Almost everything in software engineering involves subjective decisions, and these hallucinations and inconsistencies prove it.

Daniel Yarmoluk

A compressed knowledge graph, particularly on MS, at the .md level... I would be happy to do that for you to help with your memory issue.

leob

I think you need to explain that a bit more clearly for the rest of us to understand - are you proposing a different (or "better", even) approach than what the author proposed?

Daniel Yarmoluk

Well, I don’t need to do a thing. However, is that a kind request for further explanation?

leob

No you don't have to do anything, but you could ;-)

My point basically is that the author already seems to have a pretty good grasp of the issue, and how to tackle it :-)

Daniel Yarmoluk

Fair point — let me explain.

The author's five-file structure is excellent execution tracking. What I was gesturing at is a different layer: instead of storing project context as flat markdown files, you compress it into a knowledge graph — nodes and edges representing concepts, decisions, and relationships, serialized as .md.

The practical difference: flat files grow linearly. A knowledge graph stays compact because relationships replace repetition. The agent doesn't re-read "we use Postgres" buried in a decisions log — it traverses a typed edge from DatabaseChoice → Postgres with the rationale attached. Context retrieval becomes a graph query, not a document scan.
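
A minimal sketch of the kind of typed node/edge structure I mean, in Python. The names (DatabaseChoice, resolved_as) and the .md serialization are purely illustrative, not anything from the article:

```python
# Illustrative sketch only: a tiny typed knowledge graph kept in memory,
# serialized to markdown. Node and relation names are made up for the example.
from dataclasses import dataclass, field

@dataclass
class Edge:
    relation: str      # e.g. "resolved_as"
    target: str        # e.g. "Postgres"
    rationale: str     # why the decision was made

@dataclass
class Graph:
    nodes: dict[str, list[Edge]] = field(default_factory=dict)

    def add(self, source: str, relation: str, target: str, rationale: str) -> None:
        self.nodes.setdefault(source, []).append(Edge(relation, target, rationale))

    def query(self, source: str, relation: str) -> list[Edge]:
        # Graph query instead of a document scan: follow the typed edge directly.
        return [e for e in self.nodes.get(source, []) if e.relation == relation]

    def to_markdown(self) -> str:
        lines = ["# Project knowledge graph"]
        for source, edges in self.nodes.items():
            for e in edges:
                lines.append(f"- {source} --{e.relation}--> {e.target} ({e.rationale})")
        return "\n".join(lines)

g = Graph()
g.add("DatabaseChoice", "resolved_as", "Postgres", "JSONB plus existing ops expertise")
print(g.query("DatabaseChoice", "resolved_as")[0].target)  # "Postgres"
```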

So not a better approach — a different abstraction built on a similar idea. Stephen's five-file structure could sit underneath a KG layer: the files feed the graph, the graph feeds the agent.

The MS angle was specific: for someone managing cognitive fatigue, a compressed, queryable knowledge graph reduces the mental overhead of re-orienting the agent each session. Less to re-explain, because the structure carries more of the context automatically.

leob

Thank you, that makes a lot of sense:

"A knowledge graph stays compact because relationships replace repetition"

leob

Impressive, both Diet-Coder's effort and yours ...

With all of these separate efforts going on, I'm starting to wonder if it's time for Anthropic to pull together some sort of "standard" and bake it into CC? Because right now everyone seems to be scrambling to reinvent this wheel, with different approaches and different ambition levels ...

Andrew Rozumny

This hits way too close.

My biggest frustration isn’t even “new session = no memory” — I’m used to that.

It’s when the agent forgets things inside the same session / project flow.

I’ll explain architecture, constraints, decisions — everything looks aligned.
Then 20–30 messages later it starts drifting, ignores earlier decisions, or straight up contradicts them.

That’s where it becomes painful, because it’s not just context loss — it’s trust loss.

And I’ve tried the usual fixes:
• long system prompts
• “single source of truth” docs
• summaries

But like you said — they mix static knowledge with dynamic state, and the agent just can’t prioritize what matters.

The idea of separating memory by type instead of just “more context” makes a lot of sense.

Curious — have you noticed this helping with in-session drift, or mostly across sessions?

joinwell52

This resonates — we hit the exact same primitive from a different angle.

Your AA-MA solves "how does a single agent keep its own memory across sessions." We hit the same wall (Markdown + structure + separation by behavior type) trying to solve a different problem: how do N agents coordinate without a broker.

The core insight we converged on independently:

  • You separate knowledge by how it behaves (static / decisions / state / plan / log) — 5 files per task
  • We separate work by routing (filename encodes sender-to-recipient) — directory encodes status

Both exploit the same fact: the filesystem is already a state machine. rename is atomic (POSIX). ls is a full diagnostic. You get visibility + atomicity + zero infra, if you stop trying to mediate everything through a chat context.
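
A minimal sketch of that pattern, assuming a made-up filename convention (sender__to__recipient__timestamp.md) and status directories; illustrative only, not FCoP's actual spec:

```python
# Illustrative sketch only: filename encodes sender -> recipient, directory encodes
# status, and os.rename moves a task between statuses atomically (POSIX, same filesystem).
import os
import time

ROOT = "coordination"
STATUSES = ("inbox", "in_progress", "done")

def setup() -> None:
    for status in STATUSES:
        os.makedirs(os.path.join(ROOT, status), exist_ok=True)

def send(sender: str, recipient: str, body: str) -> str:
    # e.g. coordination/inbox/agent-a__to__agent-b__1718000000.md
    name = f"{sender}__to__{recipient}__{int(time.time())}.md"
    with open(os.path.join(ROOT, "inbox", name), "w") as f:
        f.write(body)
    return name

def claim(name: str) -> None:
    # Atomic rename: either this agent gets the task or another agent already took it.
    os.rename(os.path.join(ROOT, "inbox", name),
              os.path.join(ROOT, "in_progress", name))

def inbox_for(recipient: str) -> list[str]:
    # "ls is a full diagnostic": listing a directory is the whole status query.
    return [n for n in os.listdir(os.path.join(ROOT, "inbox"))
            if f"__to__{recipient}__" in n]

setup()
msg = send("agent-a", "agent-b", "# Task\nReview the PR checklist")
print(inbox_for("agent-b"))  # ['agent-a__to__agent-b__<timestamp>.md']
claim(msg)                   # atomically moves it to in_progress/
```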

And your "None of this was designed upfront — each piece was bolted on
after a failure made it obvious" is the exact pattern we observed. After
48 hours of 4 Cursor agents running on a minimal rulebook, they had
invented 6 coordination patterns we hadn't written (broadcast addressing,
anonymous role slots, traceability frontmatter, subtask sub-folders…).
All of them surfaced as new filenames in a shared folder. None of this
is designable. It emerges.

Field report + MIT protocol: github.com/joinwell52-AI/FCoP

Genuinely curious what happens if AA-MA's per-task 5-file memory sits underneath FCoP's routing layer. Feels like they compose, not conflict.

Re @leob's "time for a standard?" — I suspect this won't come from Anthropic, because the whole point is tool-neutral. If it works across Claude Code, Cursor, and Codex, it has to come from users. Which is what we're both doing :)

Max Quimby

The distinction you're drawing here — separating knowledge by behavioral type (what changes vs. what doesn't) — is the insight that most "just use CLAUDE.md" advice misses. Treating a single instruction file as both strategy and execution state creates the hallucination problem you described: the agent can't tell the difference between a settled architectural decision and current task state.

The five-file structure maps well to how working memory actually functions: long-term facts, deliberate decisions, current focus, planning, and audit trail. What strikes me is that this is really typed memory — you're enforcing contracts between information types so the agent can't confuse "we always use postgres" with "this PR is still in review."

One thing I've found useful on a similar structure: a versioned decisions log where you append rather than overwrite. If an agent re-litigates a settled decision, you can trace exactly when and why it was resolved — helpful during post-mortems when you're not sure whether the agent worked from stale context or genuinely hit an edge case.
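
A minimal sketch of what that append-only log could look like (the file name and entry format are assumptions, not the author's actual layout):

```python
# Illustrative sketch only: an append-only decisions log, so settled decisions are
# never overwritten, only superseded by a later dated entry.
from datetime import date

DECISIONS_FILE = "decisions.md"

def record_decision(decision_id: str, summary: str, rationale: str) -> None:
    # Append, never rewrite: re-litigated decisions get a new dated entry,
    # so you can trace exactly when and why something was resolved.
    entry = (
        f"\n## {decision_id} ({date.today().isoformat()})\n"
        f"- Decision: {summary}\n"
        f"- Rationale: {rationale}\n"
    )
    with open(DECISIONS_FILE, "a") as f:
        f.write(entry)

record_decision("DB-001", "Use Postgres for primary storage",
                "Relational constraints plus JSONB cover current needs")
```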

The part about this emerging from real regulated-industry failures rather than theoretical design resonates — these patterns always look obvious in retrospect.