Agents need a black box recorder, not more memory

Every agent product eventually ends up talking about memory.

Longer memory. Better memory. Shared memory. Vector memory. Persistent memory.

I get why. Anyone who has used coding agents for real work has hit the same
wall: the agent loses context, forgets what happened in another client, repeats
itself, or makes a change that is hard to reconstruct later.

But I think "memory" is the wrong primary frame.

The more useful question is:

After the run is over, can I answer what happened?

Not just what the final answer was. What actually happened.

  • What did the user ask?
  • What files, tools, docs, and prior context were in play?
  • Why did the agent call a tool?
  • Which model produced that action?
  • What changed?
  • What did it cost?
  • Can I replay, audit, or explain the chain?

That is less like a second brain and more like a black box recorder.

The pain is showing up everywhere

The agent tooling conversations I keep seeing are not only about storage.
They are about operational trust.

One MCP discussion described the problem of context being trapped inside one
client. You can brainstorm on mobile, continue in the web app, then open a
coding agent locally and it has no idea what just happened.

That is not just a memory problem. It is a continuity problem.

Another thread proposed a standard audit context for AI-initiated MCP tool
calls: why the AI invoked a tool, and which model produced that invocation.

That is not just a logging problem. It is an accountability problem.
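
A minimal sketch of what that audit context could look like, attached to a
tool call. None of these fields exist in the MCP spec today; the shape and
every name here are assumptions for illustration:

```typescript
// Hypothetical audit context for an AI-initiated MCP tool call.
// These fields are not part of the MCP spec; all names are illustrative.
interface ToolCallAuditContext {
  reason: string;      // why the agent invoked the tool
  model: string;       // which model produced the invocation
  runId: string;       // the run or session this call belongs to
  requestedAt: string; // ISO-8601 timestamp
}

interface AuditedToolCall {
  tool: string;
  arguments: Record<string, unknown>;
  audit: ToolCallAuditContext;
}

const call: AuditedToolCall = {
  tool: "filesystem.write",
  arguments: { path: "src/config.ts" },
  audit: {
    reason: "User asked to update the default API endpoint",
    model: "gpt-4o",
    runId: "run-7f3a",
    requestedAt: new Date().toISOString(),
  },
};
```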

Other threads are circling server identity, tool provenance, permission specs,
and tool bills of materials. People are asking questions like:

  • Who published this tool?
  • Did its metadata change?
  • What capabilities does it require?
  • Why should an agent be allowed to call it?

That is not just a security problem. It is a trust problem.
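
A tool bill of materials answering those questions might be a small record
per tool. Again, a sketch under assumptions, not any existing spec; every
field name below is mine:

```typescript
// Hypothetical provenance record for a single tool. Not any real spec;
// every field name is an assumption for illustration.
interface ToolProvenance {
  name: string;
  publisher: string;               // who published this tool
  publisherKeyFingerprint: string; // lets clients verify identity
  metadataHash: string;            // detects silent metadata changes
  capabilities: string[];          // what the tool requires
  allowReason?: string;            // why an agent may call it, if approved
}

const record: ToolProvenance = {
  name: "db-migrate",
  publisher: "tools.example.dev",
  publisherKeyFingerprint: "<key-fingerprint>",
  metadataHash: "<sha256-of-tool-metadata>",
  capabilities: ["network:outbound", "env:DATABASE_URL"],
  allowReason: "Approved for schema migrations in this workspace",
};
```

A client holding records like this can refuse a call the moment the stored
metadata hash stops matching what the server advertises.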

Then there are the everyday developer headaches: unexpected token usage, credits
attached to the wrong workspace, orphaned local subprocesses, tool calls that
worked in one environment but not another.

That is not just an observability problem. It is a run-truth problem.

"Memory" hides too much

When we call all of this memory, we flatten several different needs into one
word.

Developers do need agents to remember useful context.

But they also need agents to preserve the reasoning trail around important
work:

  • task intent
  • active context
  • files and tools touched
  • model/tool calls
  • permission and trust assumptions
  • cost/token/process anomalies
  • receipts for important actions
  • a replayable or inspectable run history

Those are not all the same feature.

An agent can remember a fact and still be impossible to audit.

An agent can summarize a conversation and still leave you unable to explain why
it deleted a file, called a tool, burned tokens, or trusted a server.

The product shape I want

The layer I want is local-first and boring in the best way.

It sits under agent work and records enough truth that the user or another
agent can come back later and ask:

What happened here?

And get a useful answer.

Not a hallucinated summary. Not a vague activity feed. Not a giant dashboard
about dashboards.

A compact chain:

  1. The user asked this.
  2. The agent saw this context.
  3. It chose these tools for these reasons.
  4. These tool calls happened.
  5. These files or external states changed.
  6. This was the cost/runtime footprint.
  7. These actions were approved, deferred, or blocked.
  8. This is what a future agent should trust or re-check.
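
As a sketch of how that chain could be stored, each link could be one typed
event in an append-only run log. This is purely illustrative and not AMK's
actual format; every shape below is an assumption:

```typescript
// Purely illustrative run log: one event kind per link in the chain above.
// Not AMK's actual format; all shapes are assumptions.
type RunEvent =
  | { kind: "user_request"; text: string }                                  // 1
  | { kind: "context"; files: string[]; tools: string[] }                   // 2
  | { kind: "tool_choice"; tool: string; reason: string }                   // 3
  | { kind: "tool_call"; tool: string; args: unknown; ok: boolean }         // 4
  | { kind: "state_change"; path: string; diffHash: string }                // 5
  | { kind: "footprint"; tokens: number; costUsd: number; wallMs: number }  // 6
  | { kind: "approval"; action: string;
      decision: "approved" | "deferred" | "blocked" }                       // 7
  | { kind: "handoff"; trust: string[]; recheck: string[] };                // 8

// "What happened here?" becomes a walk over the ordered log.
const run: RunEvent[] = [
  { kind: "user_request", text: "Rename the config module" },
  { kind: "tool_choice", tool: "fs.move", reason: "User asked for a rename" },
  { kind: "tool_call", tool: "fs.move",
    args: { from: "config.ts", to: "settings.ts" }, ok: true },
  { kind: "state_change", path: "settings.ts", diffHash: "<sha256-of-diff>" },
  { kind: "footprint", tokens: 2140, costUsd: 0.03, wallMs: 5400 },
];
```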

That would make agents safer to use for real work.

It would also make them easier to improve, because the failures would be
visible:

  • lost context
  • stale assumptions
  • wrong tool trust
  • runaway cost
  • missing approval
  • environment drift
  • actions with no durable deliverable

The phrase I keep coming back to

Agents do not only need memory.

They need a local truth layer.

Something closer to:

inspect, replay, and trust agent work across tools and clients.

That is the direction I am exploring with AMK.

The goal is not another knowledge base. The goal is to make "what happened?"
answerable after the run is over.

Because once agents are doing real work, that question matters more than almost
anything else.
