Most production-agent discussions eventually land on observability.
That is good. Traces matter.
But I think traces are only one slice of what teams actually need once agents start touching tools, files, tickets, browsers, MCP servers, credentials, or customer-facing systems.
A trace answers: what happened inside this run?
An operating record answers a wider set of questions:
- Which agent was installed and running?
- Which model/provider/config was active?
- Which MCP servers and tools were visible to the agent?
- Which permissions were granted for this run?
- Which actions required approval?
- What did the agent actually change?
- What failed, retried, or timed out?
- Can another person replay the decision later?
- Can I stop, recover, or uninstall the system cleanly?
That second set of questions is the part I keep seeing teams rebuild ad hoc.
The shift from prompt debugging to run operations
When an agent is just a demo, the prompt feels like the center of the system.
When an agent is running every day, the center shifts to operations:
- setup state
- tool exposure
- run boundaries
- approval policy
- event history
- rollback path
- cost and latency drift
- evidence for what happened
This is especially true with MCP. A manifest can tell you which tools exist. It does not, by itself, tell you which tools were exposed to a specific agent run, which arguments were passed, what side effects happened, or why a guard allowed or blocked an action.
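To make the gap between "tools that exist" and "tools this run could use" concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the tool names, the `effective_tools` and `check_call` helpers, and the `ToolCall` shape are illustrations, not part of MCP or any Armorer API. It assumes the effective set is simply the intersection of the manifest and a per-run allowlist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation observed during a run (hypothetical shape)."""
    tool: str
    arguments: dict
    allowed: bool
    reason: str


def effective_tools(manifest_tools, run_allowlist):
    """Tools a run can actually see: declared in the manifest AND allowed for this run."""
    return sorted(set(manifest_tools) & set(run_allowlist))


def check_call(tool, arguments, exposed):
    """Record the call, the arguments, and the reason, not just a yes/no."""
    if tool in exposed:
        return ToolCall(tool, arguments, True, "tool is in the run's effective set")
    return ToolCall(tool, arguments, False, "tool was not exposed to this run")


# Hypothetical manifest and per-run allowlist.
manifest = ["read_file", "write_file", "create_ticket", "delete_branch"]
allowlist = ["read_file", "create_ticket", "send_email"]  # send_email is not in the manifest

exposed = effective_tools(manifest, allowlist)
calls = [
    check_call("read_file", {"path": "README.md"}, exposed),
    check_call("delete_branch", {"name": "main"}, exposed),
]

for call in calls:
    print(call.tool, call.arguments, "allowed" if call.allowed else "blocked", "-", call.reason)
```

The manifest alone lists four tools, but this run only ever saw two of them. Storing `exposed` and `calls` with the run is what lets someone answer the per-run questions later.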
What I want from agent infrastructure
For local and self-hosted agents, I want a boring control surface:
- Install the agent.
- Configure the provider and runtime.
- See which tools and permissions exist.
- Start and stop runs.
- Inspect the job state.
- Require approvals for risky actions.
- Keep receipts for what happened.
- Uninstall cleanly.
That sounds less exciting than a new agent demo, but it is the layer that makes repeated use feel sane.
Where Armorer fits
This is what we are building Armorer around: a local control plane for AI agents.
Armorer is not meant to be another agent framework. The goal is to sit around agents and make the operational state visible: installed agents, setup, running jobs, local configuration, approvals, audit trails, and recovery.
Repo: https://github.com/ArmorerLabs/Armorer
Where Armorer Guard fits
Armorer Guard is the companion piece: a local Rust guard layer for agent inputs and tool-call risk.
The key idea is that a guard decision should not just be a yes/no result or a block count. It should leave a small record that someone can inspect later: what was evaluated, what policy applied, why the decision happened, and what the runtime did with it.
Repo: https://github.com/ArmorerLabs/Armorer-Guard
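As an illustration of what such a decision record could contain, here is a small Python sketch. It is not taken from Armorer Guard, which is written in Rust and may model this differently. The `GuardDecision` fields, the `no-recursive-delete` policy name, and the toy pattern check are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class GuardDecision:
    """Hypothetical record of one guard decision."""
    evaluated: str       # what was evaluated
    policy: str          # which policy applied
    decision: str        # "allow" or "block"
    reason: str          # why the decision happened
    runtime_action: str  # what the runtime did with the decision
    decided_at: str      # UTC timestamp, ISO 8601


def evaluate_shell_command(command: str) -> GuardDecision:
    """Toy policy for illustration only: block commands containing 'rm -rf'."""
    risky = "rm -rf" in command
    return GuardDecision(
        evaluated=f"shell command: {command}",
        policy="no-recursive-delete",
        decision="block" if risky else "allow",
        reason="matched pattern 'rm -rf'" if risky else "no risky pattern matched",
        runtime_action="call skipped, run continued" if risky else "call executed",
        decided_at=datetime.now(timezone.utc).isoformat(),
    )


record = evaluate_shell_command("rm -rf /tmp/build")
line = json.dumps(asdict(record))  # one JSON line per decision, ready for an append-only log
print(line)
```

A block counter would only say that one call was blocked. The record says which call, under which policy, for what reason, and what happened next.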
The question I am trying to answer
What is the minimum useful operating record for an AI-agent run?
My current answer is:
- run identity
- agent/runtime version
- effective tool/capability set
- inputs and relevant context
- policy/guard decisions
- approvals
- side effects
- recovery/stop state
- evidence links
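One way to pin this list down is to write it as a data structure. The sketch below is a hypothetical Python shape, not a schema from Armorer or any other tool. Field names, types, and the sample values are assumptions, and "agent/runtime version" is split into two fields for clarity.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Hypothetical minimum operating record for one agent run."""
    run_id: str                  # run identity
    agent_version: str           # agent version
    runtime_version: str         # runtime version
    effective_tools: list        # effective tool/capability set
    inputs: dict                 # inputs and relevant context
    guard_decisions: list = field(default_factory=list)  # policy/guard decisions
    approvals: list = field(default_factory=list)        # approvals
    side_effects: list = field(default_factory=list)     # side effects
    stop_state: str = "running"                          # recovery/stop state
    evidence_links: list = field(default_factory=list)   # evidence links

    def to_json(self) -> str:
        """Serialize with sorted keys so records diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)


# Sample values are invented for illustration.
record = RunRecord(
    run_id="run-0001",
    agent_version="0.1.0",
    runtime_version="0.1.0",
    effective_tools=["create_ticket", "read_file"],
    inputs={"task": "triage new issues"},
)
record.approvals.append({"action": "create_ticket", "approved_by": "alice"})
record.side_effects.append({"tool": "create_ticket", "result": "ticket created"})
record.evidence_links.append("logs/run-0001.jsonl")
record.stop_state = "completed"

print(record.to_json())
```

Keeping the record flat and JSON-serializable means it can be appended to a log, attached to a ticket, or compared across runs without special tooling.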
Curious how other people are modeling this. If you are running agents in production or locally with MCP-heavy workflows, what fields do you wish every run left behind?