<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Armorer Labs</title>
    <description>The latest articles on DEV Community by Armorer Labs (@armorer_labs).</description>
    <link>https://dev.to/armorer_labs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3926042%2F65e84c1c-3670-4ad8-8b59-fafffc931bb4.png</url>
      <title>DEV Community: Armorer Labs</title>
      <link>https://dev.to/armorer_labs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/armorer_labs"/>
    <language>en</language>
    <item>
      <title>Why block counts are not enough for agent safety</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:36:42 +0000</pubDate>
      <link>https://dev.to/armorer_labs/why-block-counts-are-not-enough-for-agent-safety-590l</link>
      <guid>https://dev.to/armorer_labs/why-block-counts-are-not-enough-for-agent-safety-590l</guid>
      <description>&lt;p&gt;A block count is not an audit record.&lt;/p&gt;

&lt;p&gt;If an agent guard says it blocked 200 actions, I still need to know whether those blocks were correct.&lt;/p&gt;

&lt;p&gt;Were they real risks?&lt;/p&gt;

&lt;p&gt;Were they false positives?&lt;/p&gt;

&lt;p&gt;Did the policy match the intended scope?&lt;/p&gt;

&lt;p&gt;Did the guard normalize the action correctly?&lt;/p&gt;

&lt;p&gt;Could a human reviewer reproduce the decision later?&lt;/p&gt;

&lt;p&gt;For agent safety, I care less about the headline count and more about the decision record behind each allowed or blocked action.&lt;/p&gt;

&lt;p&gt;A useful receipt should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;requested action&lt;/li&gt;
&lt;li&gt;tool or capability&lt;/li&gt;
&lt;li&gt;actor / session / run id&lt;/li&gt;
&lt;li&gt;normalized params or params hash&lt;/li&gt;
&lt;li&gt;policy or rule version&lt;/li&gt;
&lt;li&gt;decision&lt;/li&gt;
&lt;li&gt;reason code&lt;/li&gt;
&lt;li&gt;evidence or replay pointer&lt;/li&gt;
&lt;li&gt;result&lt;/li&gt;
&lt;/ul&gt;
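&lt;p&gt;Here is a minimal Python sketch of one such receipt. The field names, the &lt;code&gt;hash_params&lt;/code&gt; helper, and the example values are hypothetical, not Armorer Guard's actual schema. "Normalized" here only means sorting JSON keys before hashing with SHA-256.&lt;/p&gt;

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def hash_params(params: dict) -> str:
    """Hash params after a simple normalization: sorted keys, compact separators."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DecisionReceipt:
    # Field names are illustrative, not Armorer Guard's actual schema.
    requested_action: str
    tool: str
    actor: str
    session_id: str
    run_id: str
    params_hash: str
    policy_version: str
    decision: str       # e.g. "allow" or "block"
    reason_code: str    # machine-readable reason, e.g. "OUTSIDE_WORKSPACE"
    evidence_ref: str   # pointer to a replay bundle or log entry
    result: str

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


receipt = DecisionReceipt(
    requested_action="delete a report file",
    tool="fs.delete",
    actor="coding-agent",
    session_id="sess-1",
    run_id="run-42",
    params_hash=hash_params({"path": "/tmp/report.csv"}),
    policy_version="fs-policy@3",
    decision="block",
    reason_code="OUTSIDE_WORKSPACE",
    evidence_ref="replay/run-42/step-7.json",
    result="not executed",
)
print(receipt.to_json())
```

&lt;p&gt;Hashing canonical params lets receipts for the same call be matched without storing raw values that might contain secrets. Low-entropy values can still be guessed from a plain hash, though.&lt;/p&gt;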

&lt;p&gt;This is the thinking behind Armorer Guard.&lt;/p&gt;

&lt;p&gt;Repo:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer-Guard" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer-Guard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And it pairs with Armorer, the local control plane around agent setup, jobs, logs, approvals, and recovery:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The goal is not to make agents timid. The goal is to make agent decisions inspectable enough that teams can actually trust, debug, and improve them.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>monitoring</category>
      <category>security</category>
    </item>
    <item>
      <title>The boring checklist before running a new local agent</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:34:58 +0000</pubDate>
      <link>https://dev.to/armorer_labs/the-boring-checklist-before-running-a-new-local-agent-1cn1</link>
      <guid>https://dev.to/armorer_labs/the-boring-checklist-before-running-a-new-local-agent-1cn1</guid>
      <description>&lt;p&gt;Before I run a new local agent, I want a boring checklist.&lt;/p&gt;

&lt;p&gt;Not hype. Not a demo video. Just operational basics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What will it install?&lt;/li&gt;
&lt;li&gt;Where will it store state?&lt;/li&gt;
&lt;li&gt;What provider credentials does it need?&lt;/li&gt;
&lt;li&gt;Which folders can it read or write?&lt;/li&gt;
&lt;li&gt;Which tools or MCP servers can it call?&lt;/li&gt;
&lt;li&gt;Does it run in a container or directly on the host?&lt;/li&gt;
&lt;li&gt;Where are the logs?&lt;/li&gt;
&lt;li&gt;How do I stop it?&lt;/li&gt;
&lt;li&gt;How do I resume a failed run?&lt;/li&gt;
&lt;li&gt;How do I remove it cleanly?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the layer I think local agents are missing.&lt;/p&gt;

&lt;p&gt;Frameworks help you build agents. Model providers help you run inference. MCP helps tools plug in.&lt;/p&gt;

&lt;p&gt;But operators still need a local control plane for setup, jobs, logs, approvals, and recovery.&lt;/p&gt;

&lt;p&gt;That is what Armorer is trying to become:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And for consequential actions, Armorer Guard is the companion layer for decision receipts:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer-Guard" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer-Guard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you run local agents, I would love feedback on what belongs in this checklist.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>devops</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Coding agents need branch policy at runtime</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:34:33 +0000</pubDate>
      <link>https://dev.to/armorer_labs/coding-agents-need-branch-policy-at-runtime-4gi7</link>
      <guid>https://dev.to/armorer_labs/coding-agents-need-branch-policy-at-runtime-4gi7</guid>
      <description>&lt;p&gt;Telling a coding agent "do not push to main" is useful.&lt;/p&gt;

&lt;p&gt;It is not enough.&lt;/p&gt;

&lt;p&gt;Branch policy has to be a runtime boundary.&lt;/p&gt;

&lt;p&gt;For agent-driven coding workflows, I want the runner to know and record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;current branch&lt;/li&gt;
&lt;li&gt;protected branches&lt;/li&gt;
&lt;li&gt;allowed git commands&lt;/li&gt;
&lt;li&gt;whether commits are allowed&lt;/li&gt;
&lt;li&gt;whether push is allowed&lt;/li&gt;
&lt;li&gt;whether a human approved the action&lt;/li&gt;
&lt;li&gt;diff scope&lt;/li&gt;
&lt;li&gt;files touched&lt;/li&gt;
&lt;li&gt;commit hash&lt;/li&gt;
&lt;li&gt;rollback path&lt;/li&gt;
&lt;/ul&gt;
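&lt;p&gt;One way to make that list a runtime boundary is a check the runner applies before executing any git command the agent proposes. This Python sketch is hypothetical. The &lt;code&gt;BranchPolicy&lt;/code&gt; shape, the reason codes, and the "require_approval" outcome are illustrative. Its refspec parsing is also deliberately simplified; a real runner would need to handle remotes, push options, and &lt;code&gt;git -C&lt;/code&gt; style invocations.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchPolicy:
    protected_branches: frozenset
    allowed_commands: frozenset   # git subcommands the agent may run at all
    allow_commit: bool
    allow_push: bool


def check_git_command(policy, current_branch, argv, human_approved=False):
    """Return (decision, reason_code) for an argv like ["git", "push", "origin", "HEAD:main"]."""
    if argv[:1] != ["git"] or len(argv) == 1:
        return ("block", "NOT_A_GIT_COMMAND")
    sub = argv[1]
    if sub not in policy.allowed_commands:
        return ("block", "COMMAND_NOT_ALLOWED")
    on_protected = current_branch in policy.protected_branches
    if sub == "commit":
        if not policy.allow_commit:
            return ("block", "COMMIT_DISABLED")
        if on_protected and not human_approved:
            return ("require_approval", "COMMIT_ON_PROTECTED_BRANCH")
    if sub == "push":
        if not policy.allow_push:
            return ("block", "PUSH_DISABLED")
        # Simplified refspec check: "main", "HEAD:main", and "refs/heads/main" all count.
        targets = {a.split(":")[-1].removeprefix("refs/heads/") for a in argv[2:]}
        if (on_protected or targets.intersection(policy.protected_branches)) and not human_approved:
            return ("require_approval", "PUSH_TO_PROTECTED_BRANCH")
    return ("allow", "WITHIN_POLICY")


policy = BranchPolicy(
    protected_branches=frozenset({"main", "release"}),
    allowed_commands=frozenset({"status", "diff", "add", "commit", "push"}),
    allow_commit=True,
    allow_push=True,
)
print(check_git_command(policy, "agent/fix-123", ["git", "push", "origin", "HEAD:main"]))
```

&lt;p&gt;Each returned pair is what belongs in the record: the decision, plus a reason code that names the boundary that applied.&lt;/p&gt;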

&lt;p&gt;If an agent violates policy, the interesting question is not only "what did the instructions say?"&lt;/p&gt;

&lt;p&gt;It is: which runtime boundary allowed the action?&lt;/p&gt;

&lt;p&gt;This is the type of operating surface we want in Armorer: agents as supervised jobs with visible state and controls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And for higher-risk actions, Armorer Guard should leave a compact decision receipt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ArmorerLabs/Armorer-Guard" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer-Guard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instructions are documentation. Runtime boundaries are control.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>devops</category>
      <category>git</category>
    </item>
    <item>
      <title>Agent evals should explain why they passed</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:34:21 +0000</pubDate>
      <link>https://dev.to/armorer_labs/agent-evals-should-explain-why-they-passed-g8k</link>
      <guid>https://dev.to/armorer_labs/agent-evals-should-explain-why-they-passed-g8k</guid>
      <description>&lt;p&gt;A passing agent eval is not always reassuring.&lt;/p&gt;

&lt;p&gt;Sometimes it means the agent behaved correctly.&lt;/p&gt;

&lt;p&gt;Sometimes it means the eval got too narrow, the fixture got stale, or the evaluator rewarded the wrong behavior.&lt;/p&gt;

&lt;p&gt;A passing eval should leave evidence.&lt;/p&gt;

&lt;p&gt;For agent systems, I want each eval run to record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model and provider&lt;/li&gt;
&lt;li&gt;prompt/skill version&lt;/li&gt;
&lt;li&gt;tool surface&lt;/li&gt;
&lt;li&gt;fixture state&lt;/li&gt;
&lt;li&gt;expected behavior&lt;/li&gt;
&lt;li&gt;actual behavior&lt;/li&gt;
&lt;li&gt;evidence path&lt;/li&gt;
&lt;li&gt;cost and latency&lt;/li&gt;
&lt;li&gt;evaluator decision&lt;/li&gt;
&lt;li&gt;reason code&lt;/li&gt;
&lt;/ul&gt;
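&lt;p&gt;Here is a Python sketch of what that record could look like. It includes a validation step that refuses to treat a bare "pass" as usable. The class, field names, and example values are illustrative and do not belong to any specific eval framework.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    # Illustrative field names mirroring the list above.
    model: str
    provider: str
    prompt_version: str
    tool_surface: tuple     # names of tools the agent could call during the eval
    fixture_id: str         # identifies the fixture state, e.g. a snapshot hash
    expected: str
    actual: str
    evidence_path: str
    cost_usd: float
    latency_ms: int
    decision: str           # "pass" or "fail"
    reason_code: str        # why the evaluator decided, e.g. "OUTPUT_MATCHED_SPEC"

    def problems(self) -> list:
        """List reasons this record cannot be trusted; an empty list means it is usable."""
        issues = []
        if self.decision not in ("pass", "fail"):
            issues.append("unknown decision")
        if not self.reason_code:
            issues.append("missing reason code")
        if not self.evidence_path:
            issues.append("missing evidence path")
        return issues


record = EvalRecord(
    model="example-model",
    provider="example-provider",
    prompt_version="triage-skill@7",
    tool_surface=("read_file", "run_tests"),
    fixture_id="fixture-sha-abc123",
    expected="tests pass without editing test files",
    actual="tests pass; 2 source files edited",
    evidence_path="runs/eval-19/test-output.txt",
    cost_usd=0.04,
    latency_ms=8200,
    decision="pass",
    reason_code="",
)
print(record.problems())
```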

&lt;p&gt;The reason code matters because "passed" is not a diagnosis. It is a label.&lt;/p&gt;

&lt;p&gt;This is one of the ideas behind Armorer Guard: agent gates and evaluators should create decision receipts that can be inspected later.&lt;/p&gt;

&lt;p&gt;Repo:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer-Guard" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer-Guard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And Armorer is the local layer where those agent runs can be installed, observed, stopped, repaired, and replayed:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Green dashboards are nice. Replayable receipts are better.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>testing</category>
    </item>
    <item>
      <title>Local AI agents should be easier to uninstall</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:33:45 +0000</pubDate>
      <link>https://dev.to/armorer_labs/local-ai-agents-should-be-easier-to-uninstall-310m</link>
      <guid>https://dev.to/armorer_labs/local-ai-agents-should-be-easier-to-uninstall-310m</guid>
      <description>&lt;p&gt;One underrated test for local AI-agent tooling: can you uninstall it cleanly?&lt;/p&gt;

&lt;p&gt;A lot of local agent setups sprawl across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;env files&lt;/li&gt;
&lt;li&gt;provider config&lt;/li&gt;
&lt;li&gt;Docker containers&lt;/li&gt;
&lt;li&gt;local databases&lt;/li&gt;
&lt;li&gt;MCP server config&lt;/li&gt;
&lt;li&gt;project folders&lt;/li&gt;
&lt;li&gt;generated logs&lt;/li&gt;
&lt;li&gt;background jobs&lt;/li&gt;
&lt;li&gt;secrets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I cannot answer what was installed, I probably cannot confidently remove it.&lt;/p&gt;

&lt;p&gt;That is why uninstall is part of trust.&lt;/p&gt;

&lt;p&gt;A local agent control plane should know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what it installed&lt;/li&gt;
&lt;li&gt;what config it created&lt;/li&gt;
&lt;li&gt;what jobs it started&lt;/li&gt;
&lt;li&gt;what containers or processes belong to it&lt;/li&gt;
&lt;li&gt;where logs live&lt;/li&gt;
&lt;li&gt;which secrets/config keys are referenced&lt;/li&gt;
&lt;li&gt;what can be safely removed&lt;/li&gt;
&lt;/ul&gt;
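&lt;p&gt;The simplest version of "knowing what it installed" is a ledger written at install time and replayed in reverse at removal time. This Python sketch is a hypothetical illustration. It covers only files and directories inside a temporary folder. Containers, background jobs, and secret references would each need their own handlers.&lt;/p&gt;

```python
import json
import os
import shutil
import tempfile


class InstallLedger:
    """Hypothetical append-only record of what an installer created."""

    def __init__(self, path):
        self.path = path
        self.entries = []

    def record(self, kind, target):
        # Persist after every entry so an interrupted install still leaves a list of what it created.
        self.entries.append({"kind": kind, "target": target})
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.entries, f, indent=2)

    def uninstall(self):
        """Remove only what the ledger recorded, newest first."""
        removed = []
        for entry in reversed(self.entries):
            target = entry["target"]
            if entry["kind"] == "file" and os.path.isfile(target):
                os.remove(target)
                removed.append(target)
            elif entry["kind"] == "dir" and os.path.isdir(target):
                shutil.rmtree(target)
                removed.append(target)
            # Containers, background jobs, and secret references would need their own handlers.
        return removed


root = tempfile.mkdtemp()
ledger = InstallLedger(os.path.join(root, "ledger.json"))

state_dir = os.path.join(root, "agent-state")
os.makedirs(state_dir)
ledger.record("dir", state_dir)

env_file = os.path.join(root, "agent.env")
with open(env_file, "w", encoding="utf-8") as f:
    f.write("PROVIDER=local\n")
ledger.record("file", env_file)

removed = ledger.uninstall()
print(removed)
```

&lt;p&gt;Removing only recorded paths is the point. The uninstaller never has to guess which files in a shared folder belong to the agent.&lt;/p&gt;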

&lt;p&gt;This is one of the boring but important reasons we are building Armorer.&lt;/p&gt;

&lt;p&gt;Repo:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pitch is not magic autonomy. It is local control: install, configure, run, observe, stop, repair, and eventually remove agents without guessing what state is left behind.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>tooling</category>
    </item>
    <item>
      <title>MCP tools need runtime records, not just manifests</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:33:23 +0000</pubDate>
      <link>https://dev.to/armorer_labs/mcp-tools-need-runtime-records-not-just-manifests-19d5</link>
      <guid>https://dev.to/armorer_labs/mcp-tools-need-runtime-records-not-just-manifests-19d5</guid>
      <description>&lt;p&gt;MCP makes tool wiring much cleaner.&lt;/p&gt;

&lt;p&gt;But a manifest is not the same as a runtime record.&lt;/p&gt;

&lt;p&gt;A manifest tells you what tools might exist. A runtime record tells you what the agent actually saw and did.&lt;/p&gt;

&lt;p&gt;For each agent run, I want to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which MCP servers were connected&lt;/li&gt;
&lt;li&gt;which tool schemas/descriptions were exposed&lt;/li&gt;
&lt;li&gt;which tool versions were active&lt;/li&gt;
&lt;li&gt;which calls were made&lt;/li&gt;
&lt;li&gt;which params were passed&lt;/li&gt;
&lt;li&gt;what state changed&lt;/li&gt;
&lt;li&gt;which calls required approval&lt;/li&gt;
&lt;li&gt;what result came back&lt;/li&gt;
&lt;/ul&gt;
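&lt;p&gt;One way to capture "what capability surface did the agent have" without storing every tool description verbatim is to fingerprint the tool list the agent saw at the start of each run. You can then diff fingerprints between runs. This Python sketch assumes each tool entry carries the &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and &lt;code&gt;inputSchema&lt;/code&gt; fields of an MCP &lt;code&gt;tools/list&lt;/code&gt; result. The helper names are hypothetical.&lt;/p&gt;

```python
import hashlib
import json


def surface_fingerprint(servers):
    """servers maps a server name to the tool entries the agent was shown.

    Each entry is shaped like an MCP tools/list item (name, description, inputSchema).
    Returns one hash per tool plus one hash for the whole surface.
    """
    per_tool = {}
    for server, tools in sorted(servers.items()):
        for tool in tools:
            blob = json.dumps(tool, sort_keys=True, separators=(",", ":"))
            per_tool[f"{server}/{tool['name']}"] = hashlib.sha256(blob.encode("utf-8")).hexdigest()
    surface = hashlib.sha256(json.dumps(per_tool, sort_keys=True).encode("utf-8")).hexdigest()
    return {"tools": per_tool, "surface": surface}


def diff_surfaces(before, after):
    """Report which tools appeared, disappeared, or changed between two runs."""
    b, a = before["tools"], after["tools"]
    return {
        "added": sorted(set(a) - set(b)),
        "removed": sorted(set(b) - set(a)),
        "changed": sorted(k for k in set(a).intersection(b) if a[k] != b[k]),
    }


read_file = {"name": "read_file", "description": "Read a file", "inputSchema": {"type": "object"}}
run_1 = surface_fingerprint({"fs": [read_file]})
run_2 = surface_fingerprint({
    "fs": [dict(read_file, description="Read a file and return its contents")],
    "web": [{"name": "fetch", "description": "Fetch a URL", "inputSchema": {"type": "object"}}],
})
print(diff_surfaces(run_1, run_2))
```

&lt;p&gt;A changed description is worth flagging even when the tool name stays the same. The description is part of what the model reads when it decides whether to call the tool.&lt;/p&gt;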

&lt;p&gt;This matters because the operational question is rarely only "is this MCP server installed?"&lt;/p&gt;

&lt;p&gt;The better question is: during this specific run, what capability surface did the agent have, and what did it do with it?&lt;/p&gt;

&lt;p&gt;That is one reason we are building Armorer as a local control plane around agents:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And Armorer Guard as a decision-record layer for consequential actions:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer-Guard" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer-Guard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MCP gives agents hands. The operations layer needs to give humans a ledger.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>mcp</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Five receipts every AI agent run should leave behind</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:33:11 +0000</pubDate>
      <link>https://dev.to/armorer_labs/five-receipts-every-ai-agent-run-should-leave-behind-3aac</link>
      <guid>https://dev.to/armorer_labs/five-receipts-every-ai-agent-run-should-leave-behind-3aac</guid>
      <description>&lt;p&gt;When an AI agent finishes a task, I do not only want a final answer.&lt;/p&gt;

&lt;p&gt;I want an operating record.&lt;/p&gt;

&lt;p&gt;Here are the five receipts I want from every run.&lt;/p&gt;

&lt;h2&gt;1. Setup receipt&lt;/h2&gt;

&lt;p&gt;What agent ran? Which model/provider did it use? Which project, environment, and config were loaded? Which MCP servers or tools were available?&lt;/p&gt;

&lt;p&gt;Without this, a successful run is hard to reproduce and a failed run is hard to debug.&lt;/p&gt;

&lt;h2&gt;2. Tool receipt&lt;/h2&gt;

&lt;p&gt;Every consequential tool call should have a compact record: tool name, normalized params or hash, result, latency, error state, and whether the call changed anything.&lt;/p&gt;

&lt;h2&gt;3. Approval receipt&lt;/h2&gt;

&lt;p&gt;If a human approved something, record what they approved. Not just "approved" in a transcript, but capability, scope, policy, timestamp, and run id.&lt;/p&gt;
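&lt;p&gt;Recording the scope also makes it possible to enforce it. In this hypothetical Python sketch, scope is a path prefix and approvals carry an expiry. Both are assumptions layered on top of the capability, scope, policy, timestamp, and run id listed above.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Approval:
    approver: str
    capability: str        # e.g. "fs.write"
    scope: str             # hypothetical: a path prefix the approval covers
    policy_version: str
    run_id: str
    granted_at: datetime
    ttl: timedelta         # assumption: approvals expire instead of living forever


def approval_covers(approval, run_id, capability, target, now=None):
    """True only for the same run, the same capability, a target inside scope, and before expiry."""
    if now is None:
        now = datetime.now(timezone.utc)
    if approval.run_id != run_id or approval.capability != capability:
        return False
    if now > approval.granted_at + approval.ttl:
        return False
    prefix = approval.scope.rstrip("/") + "/"
    return target == approval.scope or target.startswith(prefix)


t0 = datetime(2026, 6, 21, 19, 0, tzinfo=timezone.utc)
grant = Approval("alice", "fs.write", "/repo/docs", "fs-policy@3", "run-7", t0, timedelta(minutes=30))
print(approval_covers(grant, "run-7", "fs.write", "/repo/docs/intro.md", now=t0 + timedelta(minutes=5)))
```

&lt;p&gt;Appending "/" before the prefix check keeps an approval for &lt;code&gt;/repo/docs&lt;/code&gt; from silently covering &lt;code&gt;/repo/docs-old&lt;/code&gt;.&lt;/p&gt;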

&lt;h2&gt;4. Evidence receipt&lt;/h2&gt;

&lt;p&gt;If the agent made a claim or decision, what evidence did it use? File path, command output, API response, test result, or artifact.&lt;/p&gt;

&lt;h2&gt;5. Recovery receipt&lt;/h2&gt;

&lt;p&gt;If the run failed, what can be retried? What state changed? What should be rolled back or resumed?&lt;/p&gt;

&lt;p&gt;This is the shape we are building toward with Armorer and Armorer Guard.&lt;/p&gt;

&lt;p&gt;Armorer is the local control plane for running and supervising agents:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Armorer Guard is the runtime decision/receipt layer:&lt;br&gt;
&lt;a href="https://github.com/ArmorerLabs/Armorer-Guard" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer-Guard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If this is a problem you feel in your agent workflows, feedback or stars on the repos would help a lot.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Local coding agents need a control plane</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:28:48 +0000</pubDate>
      <link>https://dev.to/armorer_labs/local-coding-agents-need-a-control-plane-538d</link>
      <guid>https://dev.to/armorer_labs/local-coding-agents-need-a-control-plane-538d</guid>
      <description>&lt;p&gt;Local coding agents are getting good enough that the bottleneck is no longer always the model.&lt;/p&gt;

&lt;p&gt;The bottleneck is the boring operating surface around the agent.&lt;/p&gt;

&lt;p&gt;When I run local agents, I want answers to simple questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what agents are installed?&lt;/li&gt;
&lt;li&gt;which provider and model is each one configured to use?&lt;/li&gt;
&lt;li&gt;what jobs are currently running?&lt;/li&gt;
&lt;li&gt;where are the logs?&lt;/li&gt;
&lt;li&gt;what tools can this agent call?&lt;/li&gt;
&lt;li&gt;what failed last time?&lt;/li&gt;
&lt;li&gt;how do I stop, resume, or repair a run?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most agent frameworks focus on building the agent. That is useful. But once you have more than one workflow, you start needing the same things every other local service needs: state, visibility, controls, and recovery.&lt;/p&gt;

&lt;p&gt;That is the reason we are building Armorer.&lt;/p&gt;

&lt;p&gt;Armorer is an experimental local control plane for AI agents. The goal is not to replace Claude Code, Codex, local LLM workflows, MCP servers, or whatever agent stack you already like. The goal is to give those workflows a local operations layer.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The mental model is simple: agents should feel less like loose scripts and more like supervised jobs.&lt;/p&gt;

&lt;p&gt;A good local agent setup should make it easy to see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;installed agents&lt;/li&gt;
&lt;li&gt;running jobs&lt;/li&gt;
&lt;li&gt;configuration state&lt;/li&gt;
&lt;li&gt;provider setup&lt;/li&gt;
&lt;li&gt;logs and recent output&lt;/li&gt;
&lt;li&gt;approvals and action history&lt;/li&gt;
&lt;li&gt;recovery paths after failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are experimenting with local or self-hosted agents, I would love feedback. Stars help, but the more useful thing is hearing where your current agent setup still feels messy.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>monitoring</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Agent guards need receipts, not just block counts</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:28:22 +0000</pubDate>
      <link>https://dev.to/armorer_labs/agent-guards-need-receipts-not-just-block-counts-15kf</link>
      <guid>https://dev.to/armorer_labs/agent-guards-need-receipts-not-just-block-counts-15kf</guid>
      <description>&lt;p&gt;A lot of AI-agent safety tooling is framed around blocking bad actions.&lt;/p&gt;

&lt;p&gt;Blocking matters, but it is not enough.&lt;/p&gt;

&lt;p&gt;If a guard blocks 100 actions, I still want to know whether those decisions were correct. If it allows one action, I want to know why that action was allowed. If something goes wrong later, I want a record that can be inspected without replaying a whole chat transcript.&lt;/p&gt;

&lt;p&gt;That is the idea behind Armorer Guard.&lt;/p&gt;

&lt;p&gt;Armorer Guard is about runtime decision records for agent actions: compact receipts that explain what was requested, what policy or rule evaluated it, what evidence was used, what decision was made, and what changed afterward.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/ArmorerLabs/Armorer-Guard" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer-Guard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The shape I keep coming back to is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;requested action&lt;/li&gt;
&lt;li&gt;actor / session / run id&lt;/li&gt;
&lt;li&gt;tool or capability&lt;/li&gt;
&lt;li&gt;normalized params or params hash&lt;/li&gt;
&lt;li&gt;policy or gate version&lt;/li&gt;
&lt;li&gt;decision&lt;/li&gt;
&lt;li&gt;reason code&lt;/li&gt;
&lt;li&gt;result&lt;/li&gt;
&lt;li&gt;evidence bundle or replay pointer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point is not to make agents less useful. The point is to make consequential agent behavior inspectable.&lt;/p&gt;

&lt;p&gt;A model transcript is good debugging context, but it should not be the only audit record. The action boundary should leave boring, structured evidence.&lt;/p&gt;

&lt;p&gt;If you are building agent gateways, MCP tooling, coding-agent guardrails, eval harnesses, or approval systems, I would love feedback on the repo. A star is appreciated, but a sharp issue or critique is even better.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>monitoring</category>
      <category>security</category>
    </item>
    <item>
      <title>Agent demos are easy. Agent operations need receipts.</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sun, 21 Jun 2026 19:20:00 +0000</pubDate>
      <link>https://dev.to/armorer_labs/agent-demos-are-easy-agent-operations-need-receipts-2315</link>
      <guid>https://dev.to/armorer_labs/agent-demos-are-easy-agent-operations-need-receipts-2315</guid>
      <description>&lt;p&gt;I keep seeing the same pattern with AI agents: the demo works, the first workflow is exciting, and then the boring operational questions show up.&lt;/p&gt;

&lt;p&gt;What is installed?&lt;/p&gt;

&lt;p&gt;Which model/provider/config is this run using?&lt;/p&gt;

&lt;p&gt;What tool calls happened?&lt;/p&gt;

&lt;p&gt;Which actions needed approval?&lt;/p&gt;

&lt;p&gt;Can I replay the failure, resume the run, or prove what changed?&lt;/p&gt;

&lt;p&gt;That gap is what we are building around at Armorer Labs.&lt;/p&gt;

&lt;h2&gt;Armorer&lt;/h2&gt;

&lt;p&gt;Armorer is a local control plane for AI agents. The goal is to make local and self-hosted agent workflows feel less like scattered scripts and more like supervised jobs: installable, configurable, observable, stoppable, and recoverable.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The framing I like is: not another agent framework, but the local operations layer around the agents you already want to run.&lt;/p&gt;

&lt;h2&gt;Armorer Guard&lt;/h2&gt;

&lt;p&gt;Armorer Guard is the companion idea for runtime decisions. If an agent, workflow, MCP server, or tool gateway makes a decision, I want a structured receipt for it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what was requested&lt;/li&gt;
&lt;li&gt;what policy/rule/gate evaluated it&lt;/li&gt;
&lt;li&gt;what evidence was used&lt;/li&gt;
&lt;li&gt;what decision was made&lt;/li&gt;
&lt;li&gt;what changed afterward&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/ArmorerLabs/Armorer-Guard" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer-Guard&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Why this matters&lt;/h2&gt;

&lt;p&gt;I do not think production agent systems will be trusted because the model sounds confident. They will be trusted because they leave boring, inspectable records.&lt;/p&gt;

&lt;p&gt;If you are building or running agents locally, I would love feedback on both repos. Stars are obviously helpful, but the more useful thing is sharp criticism: what would make these tools worth installing in your own workflow?&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Agent browser runs need receipts, not just screenshots</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sat, 20 Jun 2026 22:29:13 +0000</pubDate>
      <link>https://dev.to/armorer_labs/agent-browser-runs-need-receipts-not-just-screenshots-2j23</link>
      <guid>https://dev.to/armorer_labs/agent-browser-runs-need-receipts-not-just-screenshots-2j23</guid>
      <description>&lt;p&gt;Agentic browser work is getting good enough that teams are starting to trust it for real workflows: researching, filling forms, testing dashboards, and operating internal tools.\n\nBut once an agent can browse and click, the hard question changes from can it do the task to can we prove what happened?\n\nAt Armorer Labs, we keep coming back to a simple pattern: separate the agent planning loop from the control plane that owns tool permissions, policy, human approvals, and run receipts.\n\nThat separation matters most around side effects. A browser agent should be able to propose a click, a post, or a submit action, but the runtime should record the requested action, the page context, the approval scope, and the final verification evidence. Screenshots are helpful, but they are not enough by themselves. Teams need a structured trail that says what was attempted, what was approved, what was actually clicked, and what URL or artifact proves completion.\n\nFor Armorer and Armorer Guard, the goal is to make that operational layer local-first: agents can run on your machine or server, while the guard/control plane keeps the audit trail and enforces policy before external side effects happen.\n\nThe interesting design question is where this boundary should live. MCP gateway? Browser tool wrapper? Agent runtime? A separate local control plane?\n\nMy current bias: put the enforcement as close to the tool call as possible, then store receipts somewhere durable enough that a human can inspect them later.\n\nCurious how others are handling this for production browser agents. Are you logging raw traces, structured receipts, human approvals, or all three?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Agent frameworks create workflows. Production needs run receipts.</title>
      <dc:creator>Armorer Labs</dc:creator>
      <pubDate>Sat, 20 Jun 2026 21:32:14 +0000</pubDate>
      <link>https://dev.to/armorer_labs/agent-frameworks-create-workflows-production-needs-run-receipts-222g</link>
      <guid>https://dev.to/armorer_labs/agent-frameworks-create-workflows-production-needs-run-receipts-222g</guid>
      <description>&lt;p&gt;Everyone is comparing agent frameworks: LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude Code, Codex, MCP routers, custom harnesses.&lt;/p&gt;

&lt;p&gt;That comparison matters, but it misses the layer that starts hurting once the demo works.&lt;/p&gt;

&lt;p&gt;The framework creates the workflow. It does not automatically answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what is installed and running locally?&lt;/li&gt;
&lt;li&gt;which tools, MCP servers, skills, and providers are mounted?&lt;/li&gt;
&lt;li&gt;what repo, files, or workspace state were in scope?&lt;/li&gt;
&lt;li&gt;what did the agent change?&lt;/li&gt;
&lt;li&gt;which actions created side effects?&lt;/li&gt;
&lt;li&gt;which actions required approval, warning, redaction, block, or review?&lt;/li&gt;
&lt;li&gt;what evidence came from tests, evals, traces, or browser checks?&lt;/li&gt;
&lt;li&gt;what can be retried, resumed, rolled back, or cleaned up safely?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the layer we are building Armorer for: a local control plane around agents.&lt;/p&gt;

&lt;p&gt;The split we are converging on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Armorer: sessions, jobs, tool inventory, config, approvals, run records, and recovery&lt;/li&gt;
&lt;li&gt;Armorer Guard: fast runtime decisions on proposed tool calls and model/tool-output transitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to replace agent frameworks. It is to make agents operable once they exist.&lt;/p&gt;

&lt;p&gt;The artifact I keep coming back to is a run receipt.&lt;/p&gt;

&lt;p&gt;A useful agent run receipt should capture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the agent/app, version, and config&lt;/li&gt;
&lt;li&gt;the mounted tools, MCP servers, skills, and providers&lt;/li&gt;
&lt;li&gt;the workspace/repo/files in scope&lt;/li&gt;
&lt;li&gt;checkpoints before and after the run&lt;/li&gt;
&lt;li&gt;tool calls and side effects&lt;/li&gt;
&lt;li&gt;approval and review decisions&lt;/li&gt;
&lt;li&gt;test/eval/check evidence&lt;/li&gt;
&lt;li&gt;retry, resume, rollback, and cleanup state&lt;/li&gt;
&lt;/ol&gt;
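&lt;p&gt;For item 4, a checkpoint can be as simple as hashing every file in the workspace before and after the run. This Python sketch shows one possible approach, not how Armorer implements checkpoints. It only compares file contents, so permission changes and anything outside the workspace folder go unnoticed.&lt;/p&gt;

```python
import hashlib
import os
import tempfile


def checkpoint(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                snapshot[os.path.relpath(full, root)] = hashlib.sha256(f.read()).hexdigest()
    return snapshot


def changed_files(before, after):
    """Compare two checkpoints taken before and after an agent run."""
    return {
        "created": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(p for p in set(before).intersection(after) if before[p] != after[p]),
    }


def write(path, text):
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


workspace = tempfile.mkdtemp()
write(os.path.join(workspace, "a.txt"), "original")
write(os.path.join(workspace, "b.txt"), "to be deleted")
before = checkpoint(workspace)

# Stand-in for the agent run's side effects.
write(os.path.join(workspace, "a.txt"), "edited by agent")
os.remove(os.path.join(workspace, "b.txt"))
write(os.path.join(workspace, "c.txt"), "new file")

after = checkpoint(workspace)
print(changed_files(before, after))
```

&lt;p&gt;The diff gives the receipt its "what did the agent change" section. Keeping the before-snapshot also tells a cleanup step which files it should not touch.&lt;/p&gt;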

&lt;p&gt;Without this, debugging agent runs turns into transcript archaeology.&lt;/p&gt;

&lt;p&gt;With it, operating agents starts to feel more like operating software again.&lt;/p&gt;

&lt;p&gt;Repos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Armorer: &lt;a href="https://github.com/ArmorerLabs/Armorer" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Armorer Guard: &lt;a href="https://github.com/ArmorerLabs/Armorer-Guard" rel="noopener noreferrer"&gt;https://github.com/ArmorerLabs/Armorer-Guard&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Questions I would love feedback on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is the minimum useful run receipt for an agent session?&lt;/li&gt;
&lt;li&gt;Which approval events should become first-class history?&lt;/li&gt;
&lt;li&gt;Where should MCP/tool metadata stop and runtime policy begin?&lt;/li&gt;
&lt;li&gt;What recovery action do you wish your agent harness exposed after a bad run?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Disclosure: I am building Armorer and Armorer Guard.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
