
My AI Agent Keeps Forgetting Everything

Stephen J Newhouse on April 07, 2026

My AI Agent Keeps Forgetting Everything; So do I... I have multiple sclerosis. Some days are better than others, but one thing is constant: repe...

Jess Lee

Shout out to @diet-code103!!

Elmar Chavez

This is a step forward, but making AI work consistently over the long run will always be a challenge. Almost everything in software engineering involves subjective decisions, and these hallucinations and inconsistencies prove it.

Daniel Yarmoluk

A compressed knowledge graph, particularly on MS, at the .md level... I would be happy to do that for you to help with your memory issue.

leob

I think you need to explain that a bit more clearly for the rest of us to understand - are you proposing a different (or "better", even) approach than what the author proposed?

Daniel Yarmoluk

Well, I don’t need to do a thing. However, is that a kind request for further explanation?

leob

No you don't have to do anything, but you could ;-)

My point basically is that the author already seems to have a pretty good grasp of the issue, and how to tackle it :-)

Daniel Yarmoluk

Fair point — let me explain.

The author's five-file structure is excellent execution tracking. What I was gesturing at is a different layer: instead of storing project context as flat markdown files, you compress it into a knowledge graph — nodes and edges representing concepts, decisions, and relationships, serialized as .md.

The practical difference: flat files grow linearly. A knowledge graph stays compact because relationships replace repetition. The agent doesn't re-read "we use Postgres" buried in a decisions log — it traverses a typed edge from DatabaseChoice → Postgres with the rationale attached. Context retrieval becomes a graph query, not a document scan.
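
A minimal sketch of the kind of typed node/edge structure I mean, in Python. The names (DatabaseChoice, resolved_as) and the .md serialization are purely illustrative, not anything from the article:

```python
# Illustrative sketch only: a tiny typed knowledge graph kept in memory,
# serialized to markdown. Node and relation names are made up for the example.
from dataclasses import dataclass, field

@dataclass
class Edge:
    relation: str      # e.g. "resolved_as"
    target: str        # e.g. "Postgres"
    rationale: str     # why the decision was made

@dataclass
class Graph:
    nodes: dict[str, list[Edge]] = field(default_factory=dict)

    def add(self, source: str, relation: str, target: str, rationale: str) -> None:
        self.nodes.setdefault(source, []).append(Edge(relation, target, rationale))

    def query(self, source: str, relation: str) -> list[Edge]:
        # Graph query instead of a document scan: follow the typed edge directly.
        return [e for e in self.nodes.get(source, []) if e.relation == relation]

    def to_markdown(self) -> str:
        lines = ["# Project knowledge graph"]
        for source, edges in self.nodes.items():
            for e in edges:
                lines.append(f"- {source} --{e.relation}--> {e.target} ({e.rationale})")
        return "\n".join(lines)

g = Graph()
g.add("DatabaseChoice", "resolved_as", "Postgres", "JSONB plus existing ops expertise")
print(g.query("DatabaseChoice", "resolved_as")[0].target)  # "Postgres"
```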

So not a better approach — a different abstraction built on a similar idea. Stephen's five-file structure could sit underneath a KG layer: the files feed the graph, the graph feeds the agent.

The MS angle was specific: for someone managing cognitive fatigue, a compressed, queryable knowledge graph reduces the mental overhead of re-orienting the agent each session. Less to re-explain, because the structure carries more of the context automatically.

leob

Thank you, that makes a lot of sense:

"A knowledge graph stays compact because relationships replace repetition"

leob

Impressive, both Diet-Coder's effort and yours ...

With all of these separate efforts going on, I'm starting to wonder if it's time for Anthropic to pull together some sort of "standard" and bake it into CC? Because right now everyone seems to be scrambling to reinvent this wheel, with different approaches and different ambition levels ...

Andrew Rozumny

This hits way too close.

My biggest frustration isn’t even “new session = no memory” — I’m used to that.

It’s when the agent forgets things inside the same session / project flow.

I’ll explain architecture, constraints, decisions — everything looks aligned.
Then 20–30 messages later it starts drifting, ignores earlier decisions, or straight up contradicts them.

That’s where it becomes painful, because it’s not just context loss — it’s trust loss.

And I’ve tried the usual fixes:
• long system prompts
• “single source of truth” docs
• summaries

But like you said — they mix static knowledge with dynamic state, and the agent just can’t prioritize what matters.

The idea of separating memory by type instead of just “more context” makes a lot of sense.

Curious — have you noticed this helping with in-session drift, or mostly across sessions?

joinwell52

This resonates — we hit the exact same primitive from a different angle.

Your AA-MA solves "how does a single agent keep its own memory across sessions." We hit the same wall (Markdown + structure + separation by behavior type) trying to solve a different problem: how do N agents coordinate without a broker.

The core insight we converged on independently:

  • You separate knowledge by how it behaves (static / decisions / state / plan / log) — 5 files per task
  • We separate work by routing (filename encodes sender-to-recipient) — directory encodes status

Both exploit the same fact: the filesystem is already a state machine. rename is atomic (POSIX). ls is a full diagnostic. You get visibility + atomicity + zero infra, if you stop trying to mediate everything through a chat context.
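
A minimal sketch of that pattern, assuming a made-up filename convention (sender__to__recipient__timestamp.md) and status directories; illustrative only, not FCoP's actual spec:

```python
# Illustrative sketch only: filename encodes sender -> recipient, directory encodes
# status, and os.rename moves a task between statuses atomically (POSIX, same filesystem).
import os
import time

ROOT = "coordination"
STATUSES = ("inbox", "in_progress", "done")

def setup() -> None:
    for status in STATUSES:
        os.makedirs(os.path.join(ROOT, status), exist_ok=True)

def send(sender: str, recipient: str, body: str) -> str:
    # e.g. coordination/inbox/agent-a__to__agent-b__1718000000.md
    name = f"{sender}__to__{recipient}__{int(time.time())}.md"
    with open(os.path.join(ROOT, "inbox", name), "w") as f:
        f.write(body)
    return name

def claim(name: str) -> None:
    # Atomic rename: either this agent gets the task or another agent already took it.
    os.rename(os.path.join(ROOT, "inbox", name),
              os.path.join(ROOT, "in_progress", name))

def inbox_for(recipient: str) -> list[str]:
    # "ls is a full diagnostic": listing a directory is the whole status query.
    return [n for n in os.listdir(os.path.join(ROOT, "inbox"))
            if f"__to__{recipient}__" in n]

setup()
msg = send("agent-a", "agent-b", "# Task\nReview the PR checklist")
print(inbox_for("agent-b"))  # ['agent-a__to__agent-b__<timestamp>.md']
claim(msg)                   # atomically moves it to in_progress/
```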

And your "None of this was designed upfront — each piece was bolted on
after a failure made it obvious" is the exact pattern we observed. After
48 hours of 4 Cursor agents running on a minimal rulebook, they had
invented 6 coordination patterns we hadn't written (broadcast addressing,
anonymous role slots, traceability frontmatter, subtask sub-folders…).
All of them surfaced as new filenames in a shared folder. None of this
is designable. It emerges.

Field report + MIT protocol: github.com/joinwell52-AI/FCoP

Genuinely curious what happens if AA-MA's per-task 5-file memory sits underneath FCoP's routing layer. Feels like they compose, not conflict.

Re @leob's "time for a standard?" — I suspect this won't come from Anthropic, because the whole point is tool-neutral. If it works across Claude Code, Cursor, and Codex, it has to come from users. Which is what we're both doing :)

Max Quimby

The distinction you're drawing here — separating knowledge by behavioral type (what changes vs. what doesn't) — is the insight that most "just use CLAUDE.md" advice misses. Treating a single instruction file as both strategy and execution state creates the hallucination problem you described: the agent can't tell the difference between a settled architectural decision and current task state.

The five-file structure maps well to how working memory actually functions: long-term facts, deliberate decisions, current focus, planning, and audit trail. What strikes me is that this is really typed memory — you're enforcing contracts between information types so the agent can't confuse "we always use postgres" with "this PR is still in review."

One thing I've found useful on a similar structure: a versioned decisions log where you append rather than overwrite. If an agent re-litigates a settled decision, you can trace exactly when and why it was resolved — helpful during post-mortems when you're not sure whether the agent worked from stale context or genuinely hit an edge case.
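
A minimal sketch of what that append-only log could look like (the file name and entry format are assumptions, not the author's actual layout):

```python
# Illustrative sketch only: an append-only decisions log, so settled decisions are
# never overwritten, only superseded by a later dated entry.
from datetime import date

DECISIONS_FILE = "decisions.md"

def record_decision(decision_id: str, summary: str, rationale: str) -> None:
    # Append, never rewrite: re-litigated decisions get a new dated entry,
    # so you can trace exactly when and why something was resolved.
    entry = (
        f"\n## {decision_id} ({date.today().isoformat()})\n"
        f"- Decision: {summary}\n"
        f"- Rationale: {rationale}\n"
    )
    with open(DECISIONS_FILE, "a") as f:
        f.write(entry)

record_decision("DB-001", "Use Postgres for primary storage",
                "Relational constraints plus JSONB cover current needs")
```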

The part about this emerging from real regulated-industry failures rather than theoretical design resonates — these patterns always look obvious in retrospect.