The Real Problem With AI Agents Isn't the Code — It's the Memory

#waysofworking #mcp #toolchain

It's been a week since my first post, and I'm already back with two more.
There's an issue I keep running into that nobody is talking about enough, and I don't mean setting up skills. You ask an agent to implement a feature. Three days later, another agent works in the same area and contradicts it. Different patterns. Different assumptions. Not because agents are bad at coding, but because they don't have a human's working memory. You end up feeling like you're starting from scratch, or worrying about what was just done.
My hypothesis: the solution isn't a better agent. It's better governance infrastructure.
I've been writing about the toolchain I'm using to test that (Jira, Confluence, GitHub, and AWS connected through MCP) and what a session actually looks like when it works. As a measure of what this approach produces, I also wrote a one-off technical deep dive into the multi-IDP authentication layer that will ship. It's the only time I'll go that deep on architecture, because what I'm building is centered on the solution, not the technical detail.
Both posts are here

Top comments (1)

Theo Valmis • May 12

This is the exact problem. Agent B doesn't know what Agent A decided, so it re-derives the architecture from scratch and arrives at a different answer. The code compiles, tests pass, but now you have two contradictory patterns living in the same codebase.

The MCP + Confluence approach is smart for capturing decisions, but I wonder about retrieval fidelity. When Agent B starts a session, does it actually pull the right prior decisions, or does it just get the most recent page? The relevance problem feels harder than the storage problem.

Curious how you handle the case where an agent legitimately should override a prior decision -- not because of memory loss, but because requirements changed. That's the boundary between "drift" and "evolution" and it's hard to distinguish programmatically.
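
A rough sketch of one way that boundary could be made checkable, assuming each decision is stored as a structured record rather than free text. Everything here is hypothetical (the `Decision` fields, the `supersedes` link, the `classify` labels) and is not taken from the post's actual setup. The idea is that an override counts as "evolution" only when it explicitly names the decision it replaces and cites a reason; a contradicting pattern without that link gets flagged as "drift".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Decision:
    """Hypothetical decision record, e.g. one row per Confluence entry."""
    id: str
    area: str                          # e.g. "auth/session"
    pattern: str                       # short label for the chosen approach
    supersedes: Optional[str] = None   # id of the decision this one replaces
    reason: Optional[str] = None       # e.g. a ticket for the changed requirement


def active_decision(log: List[Decision], area: str) -> Optional[Decision]:
    """Return the latest decision in `area` that nothing has superseded."""
    superseded = {d.supersedes for d in log if d.supersedes}
    candidates = [d for d in log if d.area == area and d.id not in superseded]
    return candidates[-1] if candidates else None


def classify(log: List[Decision], proposed: Decision) -> str:
    """Label a proposed decision relative to the current log."""
    current = active_decision(log, proposed.area)
    if current is None:
        return "new"            # nothing decided in this area yet
    if proposed.pattern == current.pattern:
        return "consistent"     # same approach as the active decision
    if proposed.supersedes == current.id and proposed.reason:
        return "evolution"      # explicit, justified override
    return "drift"              # contradicts without acknowledging the prior


# Usage with made-up records
log = [Decision("D-1", "auth/session", "jwt-cookie")]
silent = Decision("D-2", "auth/session", "server-session")
explicit = Decision("D-2", "auth/session", "server-session",
                    supersedes="D-1", reason="requirement changed")
print(classify(log, silent))    # drift
print(classify(log, explicit))  # evolution
```

This only moves the problem: the agent still has to retrieve the active decision for the right area before it can declare an override, so the relevance question above remains the hard part.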