DEV Community

Armorer Labs

Agent traces are not enough. Agent runs need operating records.

Most production-agent discussions eventually land on observability.

That is good. Traces matter.

But I think traces are only one slice of what teams actually need once agents start touching tools, files, tickets, browsers, MCP servers, credentials, or customer-facing systems.

A trace answers: what happened inside this run?

An operating record answers a wider set of questions:

  • Which agent was installed and running?
  • Which model/provider/config was active?
  • Which MCP servers and tools were visible to the agent?
  • Which permissions were granted for this run?
  • Which actions required approval?
  • What did the agent actually change?
  • What failed, retried, or timed out?
  • Can another person replay the decision later?
  • Can I stop, recover, or uninstall the system cleanly?

That second set of questions is the part I keep seeing teams rebuild ad hoc.
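Several of those questions (what changed, what failed or retried, whether the run can be recovered) reduce to keeping an append-only event history in which each side effect carries its rollback path, if it has one. The Python sketch below is a minimal illustration under stated assumptions. `RunEvent`, `EventHistory`, and the `undo` callback are hypothetical names, not part of Armorer. A real compensating action would call whatever system the agent touched.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunEvent:
    """One entry in a run's event history (hypothetical schema)."""
    kind: str                                   # e.g. "tool_call", "side_effect", "retry", "timeout"
    detail: str                                 # human-readable description
    undo: Optional[Callable[[], None]] = None   # compensating action, if one exists
    timestamp: float = field(default_factory=time.time)


class EventHistory:
    """Append-only log: events are added, never edited or deleted."""

    def __init__(self) -> None:
        self._events: List[RunEvent] = []

    def record(self, event: RunEvent) -> None:
        self._events.append(event)

    def events(self) -> List[RunEvent]:
        return list(self._events)  # return a copy so callers cannot rewrite history

    def rollback(self) -> List[str]:
        """Undo side effects newest-first and report what still needs a human."""
        report: List[str] = []
        for event in reversed(self._events):
            if event.kind != "side_effect":
                continue
            if event.undo is not None:
                event.undo()
                report.append(f"undone: {event.detail}")
            else:
                report.append(f"needs manual recovery: {event.detail}")
        # The rollback itself becomes part of the record.
        self.record(RunEvent("rollback", f"{len(report)} side effects processed"))
        return report


# Hypothetical run: one reversible side effect, one that cannot be undone automatically.
files = {"notes.txt"}
history = EventHistory()
history.record(RunEvent("tool_call", "fs.write draft.md"))
files.add("draft.md")
history.record(RunEvent("side_effect", "created draft.md", undo=lambda: files.discard("draft.md")))
history.record(RunEvent("side_effect", "posted ticket comment"))

print(history.rollback())
# ['needs manual recovery: posted ticket comment', 'undone: created draft.md']
print(files)  # {'notes.txt'}
```

The rollback is recorded as an event rather than erasing earlier entries, so the history still shows what the agent did before it was unwound.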

The shift from prompt debugging to run operations

When an agent is just a demo, the prompt feels like the center of the system.

When an agent is running every day, the center shifts to operations:

  • setup state
  • tool exposure
  • run boundaries
  • approval policy
  • event history
  • rollback path
  • cost and latency drift
  • evidence for what happened

This is especially true with MCP. A manifest can tell you which tools exist. It does not, by itself, tell you which tools were exposed to a specific agent run, which arguments were passed, what side effects happened, and why a guard allowed or blocked the action.
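To make the manifest-versus-exposure distinction concrete, here is a minimal Python sketch. It assumes a simplified manifest, a dict from server name to advertised tool names, that stands in for what MCP servers list. It also assumes a per-run allowlist of `server/tool` names. `effective_tools`, `record_call`, and `ToolCallRecord` are hypothetical, and a plain exposure check stands in for a real guard.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class ToolCallRecord:
    """What one tool call looked like from the run's point of view (hypothetical schema)."""
    run_id: str
    tool: str                    # qualified as "server/tool", an assumed naming convention
    arguments: Dict[str, Any]
    decision: str                # "allow" or "block"
    reason: str                  # why the check allowed or blocked it
    side_effects: List[str] = field(default_factory=list)


def effective_tools(manifest: Dict[str, List[str]], allowlist: Set[str]) -> Set[str]:
    """Tools this run could actually see: advertised by a server AND granted to the run."""
    advertised = {f"{server}/{tool}" for server, tools in manifest.items() for tool in tools}
    return advertised & allowlist


def record_call(run_id: str, exposed: Set[str], tool: str, arguments: Dict[str, Any]) -> ToolCallRecord:
    """Record a call attempt; a plain exposure check stands in for a real guard here."""
    if tool in exposed:
        return ToolCallRecord(run_id, tool, dict(arguments), "allow", "tool exposed to this run")
    return ToolCallRecord(run_id, tool, dict(arguments), "block", "tool not in this run's effective set")


manifest = {"github": ["list_issues", "create_issue", "delete_repo"], "fs": ["read_file"]}
exposed = effective_tools(manifest, {"github/list_issues", "github/create_issue", "fs/read_file"})
call = record_call("run-42", exposed, "github/delete_repo", {"repo": "demo"})

print(sorted(exposed))  # ['fs/read_file', 'github/create_issue', 'github/list_issues']
print(call.decision, "|", call.reason)  # block | tool not in this run's effective set
```

The manifest alone would list `delete_repo`. The run record shows that this run never had it, and that the attempt was blocked for that reason.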

What I want from agent infrastructure

For local and self-hosted agents, I want a boring control surface:

  1. Install the agent.
  2. Configure the provider and runtime.
  3. See which tools and permissions exist.
  4. Start and stop runs.
  5. Inspect the job state.
  6. Require approvals for risky actions.
  7. Keep receipts for what happened.
  8. Uninstall cleanly.

That sounds less exciting than a new agent demo, but it is the layer that makes repeated use feel sane.
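As a rough illustration of steps 4 to 7 of that list (start and stop runs, inspect state, require approvals, keep receipts), here is a minimal Python sketch. `AgentRun`, `RunState`, and the `approve` callback are hypothetical. In practice approval would block on a human rather than on a stub. Install and uninstall are out of scope here.

```python
from enum import Enum
from typing import Callable, List, Set


class RunState(Enum):
    IDLE = "idle"
    RUNNING = "running"
    STOPPED = "stopped"


class AgentRun:
    """Hypothetical run handle: start/stop, inspect, gate risky actions, keep receipts."""

    def __init__(self, run_id: str, risky_actions: Set[str], approve: Callable[[str], bool]) -> None:
        self.run_id = run_id
        self.state = RunState.IDLE
        self.risky_actions = risky_actions
        self.approve = approve          # would ask a person; stubbed in the example below
        self.receipts: List[str] = []

    def start(self) -> None:
        if self.state is not RunState.IDLE:
            raise RuntimeError(f"cannot start from {self.state.value}")
        self.state = RunState.RUNNING
        self.receipts.append("started")

    def act(self, action: str) -> bool:
        """Perform an action, asking for approval first if it is marked risky."""
        if self.state is not RunState.RUNNING:
            raise RuntimeError("run is not active")
        if action in self.risky_actions and not self.approve(action):
            self.receipts.append(f"denied: {action}")
            return False
        self.receipts.append(f"done: {action}")
        return True

    def stop(self) -> None:
        self.state = RunState.STOPPED
        self.receipts.append("stopped")

    def inspect(self) -> dict:
        return {"run_id": self.run_id, "state": self.state.value, "receipts": list(self.receipts)}


# Example: a reviewer (stubbed) who denies every risky action.
run = AgentRun("run-7", risky_actions={"send_email"}, approve=lambda action: False)
run.start()
run.act("read_ticket")
run.act("send_email")
run.stop()
print(run.inspect())
# {'run_id': 'run-7', 'state': 'stopped', 'receipts': ['started', 'done: read_ticket', 'denied: send_email', 'stopped']}
```

Denied actions still produce a receipt, so the record shows both what ran and what was refused.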

Where Armorer fits

This is what we are building Armorer around: a local control plane for AI agents.

Armorer is not meant to be another agent framework. The goal is to sit around agents and make the operational state visible: installed agents, setup, running jobs, local configuration, approvals, audit trails, and recovery.

Repo: https://github.com/ArmorerLabs/Armorer

Where Armorer Guard fits

Armorer Guard is the companion piece: a local Rust guard layer for agent inputs and tool-call risk.

The key idea is that a guard decision should not just be a yes/no result or a block count. It should leave a small record that someone can inspect later: what was evaluated, what policy applied, why the decision happened, and what the runtime did with it.

Repo: https://github.com/ArmorerLabs/Armorer-Guard
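A record of that shape might look like the following Python sketch, written in Python for brevity even though Armorer Guard itself is Rust. It carries the four things named above: what was evaluated, which policy applied, why, and what the runtime did. The substring-matching policies, their ids, and `GuardRecord`/`evaluate` are made up for illustration. They are not Armorer Guard's rules or API.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class GuardRecord:
    evaluated: str        # the input or tool call that was checked
    policy: str           # identifier of the policy that applied
    decision: str         # "allow" or "block"
    reason: str           # why the decision happened
    runtime_action: str   # what the runtime did with the decision


# Hypothetical policies: (policy id, forbidden substring, reason).
POLICIES: List[Tuple[str, str, str]] = [
    ("no-secret-exfil", "AWS_SECRET_ACCESS_KEY", "argument references a credential"),
    ("no-force-push", "push --force", "destructive git operation"),
]


def evaluate(tool_call: str) -> GuardRecord:
    """Return a decision plus the evidence for it, not just a yes/no."""
    for policy_id, needle, reason in POLICIES:
        if needle in tool_call:
            return GuardRecord(tool_call, policy_id, "block", reason, "call skipped; run paused for review")
    return GuardRecord(tool_call, "default-allow", "allow", "no policy matched", "call forwarded to tool")


record = evaluate("shell: git push --force origin main")
print(json.dumps(asdict(record), indent=2))
```

Serializing the record to JSON keeps it easy to store next to the run and to read back later, which is the point of leaving a record rather than a counter.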

The question I am trying to answer

What is the minimum useful operating record for an AI-agent run?

My current answer is:

  • run identity
  • agent/runtime version
  • effective tool/capability set
  • inputs and relevant context
  • policy/guard decisions
  • approvals
  • side effects
  • recovery/stop state
  • evidence links
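One way to pin those fields down is a small serializable schema. The Python dataclass below is a sketch under assumptions. The field names are hypothetical, and "agent/runtime version" is split into two fields. Guard decisions and approvals are kept as plain dicts to stay short. Evidence links are just strings pointing at traces, logs, or diffs.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class OperatingRecord:
    run_id: str                                                 # run identity
    agent_version: str                                          # agent/runtime version
    runtime_version: str
    effective_tools: List[str]                                  # effective tool/capability set
    inputs: Dict[str, Any]                                      # inputs and relevant context
    guard_decisions: List[Dict[str, str]] = field(default_factory=list)
    approvals: List[Dict[str, str]] = field(default_factory=list)
    side_effects: List[str] = field(default_factory=list)
    stop_state: str = "running"                                 # recovery/stop state
    evidence: List[str] = field(default_factory=list)           # links to traces, logs, diffs

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "OperatingRecord":
        return OperatingRecord(**json.loads(text))


# Illustrative values only.
rec = OperatingRecord(
    run_id="run-42",
    agent_version="example-agent 0.3.1",
    runtime_version="python 3.12",
    effective_tools=["fs/read_file", "github/list_issues"],
    inputs={"task": "triage new issues"},
)
rec.guard_decisions.append({"tool": "github/delete_repo", "decision": "block", "policy": "no-destructive"})
rec.side_effects.append("labeled issue #12")
rec.stop_state = "completed"
rec.evidence.append("traces/run-42.jsonl")

restored = OperatingRecord.from_json(rec.to_json())
print(restored == rec)  # True
```

The round trip matters more than the exact fields: if a record can be written out and read back unchanged, someone else can load it later and replay the decisions it describes.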

Curious how other people are modeling this. If you are running agents in production or locally with MCP-heavy workflows, what fields do you wish every run left behind?
