<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Morgan</title>
    <description>The latest articles on DEV Community by Morgan (@morganlabs).</description>
    <link>https://dev.to/morganlabs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3929982%2Ff3df85bf-1850-4ed6-bf6e-955fe7798d76.png</url>
      <title>DEV Community: Morgan</title>
      <link>https://dev.to/morganlabs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/morganlabs"/>
    <language>en</language>
    <item>
      <title>Agents need a black box recorder, not more memory</title>
      <dc:creator>Morgan</dc:creator>
      <pubDate>Thu, 14 May 2026 20:21:21 +0000</pubDate>
      <link>https://dev.to/morganlabs/agents-need-a-black-box-recorder-not-more-memory-4hpg</link>
      <guid>https://dev.to/morganlabs/agents-need-a-black-box-recorder-not-more-memory-4hpg</guid>
      <description>&lt;p&gt;Every agent product eventually ends up talking about memory.&lt;/p&gt;

&lt;p&gt;Longer memory. Better memory. Shared memory. Vector memory. Persistent memory.&lt;/p&gt;

&lt;p&gt;I get why. Anyone who has used coding agents for real work has hit the same wall: the agent loses context, forgets what happened in another client, repeats itself, or makes a change that is hard to reconstruct later.&lt;/p&gt;

&lt;p&gt;But I think "memory" is the wrong primary frame.&lt;/p&gt;

&lt;p&gt;The more useful question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;After the run is over, can I answer what happened?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not just what the final answer was. What actually happened.&lt;/p&gt;

&lt;p&gt;What did the user ask?&lt;/p&gt;

&lt;p&gt;What files, tools, docs, and prior context were in play?&lt;/p&gt;

&lt;p&gt;Why did the agent call a tool?&lt;/p&gt;

&lt;p&gt;Which model produced that action?&lt;/p&gt;

&lt;p&gt;What changed?&lt;/p&gt;

&lt;p&gt;What did it cost?&lt;/p&gt;

&lt;p&gt;Can I replay, audit, or explain the chain?&lt;/p&gt;

&lt;p&gt;That is less like a second brain and more like a black box recorder.&lt;/p&gt;

&lt;h2&gt;The pain is showing up everywhere&lt;/h2&gt;

&lt;p&gt;The agent tooling conversations I keep seeing are not only about storage. They are about operational trust.&lt;/p&gt;

&lt;p&gt;One MCP discussion described the problem of context being trapped inside one client. You can brainstorm on mobile, continue in the web app, then open a coding agent locally and it has no idea what just happened.&lt;/p&gt;

&lt;p&gt;That is not just a memory problem. It is a continuity problem.&lt;/p&gt;

&lt;p&gt;Another thread proposed standard audit context for AI-initiated MCP tool calls: why the AI invoked a tool, and which model produced that invocation.&lt;/p&gt;

&lt;p&gt;That is not just a logging problem. It is an accountability problem.&lt;/p&gt;
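
&lt;p&gt;To make that concrete: here is a rough sketch of what audit context on a tool call could look like. Every field name below is my own placeholder, not something from the MCP spec:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical audit context attached to an AI-initiated tool call.
// All field names are illustrative placeholders, not part of any MCP spec.
interface ToolCallAuditContext {
  reason: string;     // why the agent decided to invoke this tool
  model: string;      // which model produced the invocation
  sessionId: string;  // the conversation or run the call came from
  initiator: "model" | "user" | "automation";
  at: string;         // ISO 8601 timestamp
}

const example: ToolCallAuditContext = {
  reason: "User asked to rename the config file; need filesystem access",
  model: "example-model-2026-05",
  sessionId: "run-0192",
  initiator: "model",
  at: "2026-05-14T20:21:21Z",
};
&lt;/code&gt;&lt;/pre&gt;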

&lt;p&gt;Other threads are circling server identity, tool provenance, permission specs, and tool bills of materials. People are asking questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who published this tool?&lt;/li&gt;
&lt;li&gt;Did its metadata change?&lt;/li&gt;
&lt;li&gt;What capabilities does it require?&lt;/li&gt;
&lt;li&gt;Why should an agent be allowed to call it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not just a security problem. It is a trust problem.&lt;/p&gt;
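
&lt;p&gt;A tool bill of materials entry might look something like the sketch below. These fields are assumptions on my part, not an existing standard:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical tool bill of materials entry. Answers: who published it,
// has it changed, what can it do, and why is an agent allowed to call it?
interface ToolProvenanceRecord {
  toolName: string;
  publisher: string;           // who published this tool
  metadataHash: string;        // compare across runs to detect silent metadata changes
  capabilities: string[];      // e.g. ["fs:read", "net:outbound"]
  approvalNote: string | null; // why an agent may call it, or null if unreviewed
}
&lt;/code&gt;&lt;/pre&gt;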

&lt;p&gt;Then there are the everyday developer headaches: unexpected token usage, credits attached to the wrong workspace, orphaned local subprocesses, tool calls that worked in one environment but not another.&lt;/p&gt;

&lt;p&gt;That is not just observability. It is run truth.&lt;/p&gt;

&lt;h2&gt;"Memory" hides too much&lt;/h2&gt;

&lt;p&gt;When we call all of this memory, we flatten several different needs into one word.&lt;/p&gt;

&lt;p&gt;Developers do need agents to remember useful context.&lt;/p&gt;

&lt;p&gt;But they also need agents to preserve the reasoning trail around important work (one possible event shape is sketched after this list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task intent&lt;/li&gt;
&lt;li&gt;active context&lt;/li&gt;
&lt;li&gt;files and tools touched&lt;/li&gt;
&lt;li&gt;model/tool calls&lt;/li&gt;
&lt;li&gt;permission and trust assumptions&lt;/li&gt;
&lt;li&gt;cost/token/process anomalies&lt;/li&gt;
&lt;li&gt;receipts for important actions&lt;/li&gt;
&lt;li&gt;a replayable or inspectable run history&lt;/li&gt;
&lt;/ul&gt;
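
&lt;p&gt;As a sketch only: if each of those needs maps to an appended event, a single trail entry could look like this, assuming an append-only log per run. All names are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Minimal sketch of one appended trail event. Placeholder names throughout.
type TrailEventKind =
  | "task_intent"
  | "context_attached"
  | "file_touched"
  | "tool_call"
  | "permission_decision"
  | "cost_sample"
  | "receipt";

interface TrailEvent {
  runId: string;
  seq: number;       // ordering within the run, so history is replayable
  kind: TrailEventKind;
  actor: string;     // the model or tool that produced the event
  payload: unknown;  // kind-specific details
  costUsd?: number;  // set on cost_sample events
  at: string;        // ISO 8601 timestamp
}
&lt;/code&gt;&lt;/pre&gt;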

&lt;p&gt;Those are not all the same feature.&lt;/p&gt;

&lt;p&gt;An agent can remember a fact and still be impossible to audit.&lt;/p&gt;

&lt;p&gt;An agent can summarize a conversation and still leave you unable to explain why it deleted a file, called a tool, burned tokens, or trusted a server.&lt;/p&gt;

&lt;h2&gt;The product shape I want&lt;/h2&gt;

&lt;p&gt;The layer I want is local-first and boring in the best way.&lt;/p&gt;

&lt;p&gt;It sits under agent work and records enough truth that the user or another agent can come back later and ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What happened here?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And get a useful answer.&lt;/p&gt;

&lt;p&gt;Not a hallucinated summary. Not a vague activity feed. Not a giant dashboard about dashboards.&lt;/p&gt;

&lt;p&gt;A compact chain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user asked this.&lt;/li&gt;
&lt;li&gt;The agent saw this context.&lt;/li&gt;
&lt;li&gt;It chose these tools for these reasons.&lt;/li&gt;
&lt;li&gt;These tool calls happened.&lt;/li&gt;
&lt;li&gt;These files or external states changed.&lt;/li&gt;
&lt;li&gt;This was the cost/runtime footprint.&lt;/li&gt;
&lt;li&gt;These actions were approved, deferred, or blocked.&lt;/li&gt;
&lt;li&gt;This is what a future agent should trust or re-check.&lt;/li&gt;
&lt;/ol&gt;
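
&lt;p&gt;Given an append-only trail like the TrailEvent sketch above, recovering that chain is close to a sort and a walk. Illustrative only, not a real API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Illustrative: rebuild a readable chain from one run's recorded events.
// Assumes the TrailEvent sketch above; none of this is a real API.
function summarizeRun(events: TrailEvent[]): string[] {
  const ordered = events.slice().sort(function (a, b) {
    return a.seq - b.seq; // replay order
  });
  const lines: string[] = [];
  for (const e of ordered) {
    lines.push(e.seq + ". [" + e.kind + "] by " + e.actor + " at " + e.at);
  }
  return lines;
}
&lt;/code&gt;&lt;/pre&gt;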

&lt;p&gt;That would make agents safer to use for real work.&lt;/p&gt;

&lt;p&gt;It would also make them easier to improve, because the failures would be visible (a toy check over the trail follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lost context&lt;/li&gt;
&lt;li&gt;stale assumptions&lt;/li&gt;
&lt;li&gt;wrong tool trust&lt;/li&gt;
&lt;li&gt;runaway cost&lt;/li&gt;
&lt;li&gt;missing approval&lt;/li&gt;
&lt;li&gt;environment drift&lt;/li&gt;
&lt;li&gt;actions with no durable deliverable&lt;/li&gt;
&lt;/ul&gt;
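
&lt;p&gt;And because the trail is structured, spotting those failures becomes a query rather than archaeology. A toy example, using the made-up costUsd field from the sketch above:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Toy check: flag runaway cost from recorded cost_sample events.
// The budget and field names are made up for illustration.
function exceededBudget(events: TrailEvent[], budgetUsd: number): boolean {
  let total = 0;
  for (const e of events) {
    if (e.kind === "cost_sample") {
      total += e.costUsd ?? 0;
    }
  }
  return total &amp;gt; budgetUsd;
}
&lt;/code&gt;&lt;/pre&gt;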

&lt;h2&gt;The phrase I keep coming back to&lt;/h2&gt;

&lt;p&gt;Agents do not only need memory.&lt;/p&gt;

&lt;p&gt;They need a local truth layer.&lt;/p&gt;

&lt;p&gt;Something closer to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;inspect, replay, and trust agent work across tools and clients.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the direction I am exploring with AMK.&lt;/p&gt;

&lt;p&gt;The goal is not another knowledge base. The goal is to make "what happened?" answerable after the run is over.&lt;/p&gt;

&lt;p&gt;Because once agents are doing real work, that question matters more than almost anything else.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>agents</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
