Most AI-agent security advice collapses into one sentence: "add guardrails."
That is too vague to implement.
For agents with tools, the useful question is: where should the scanner sit?
Here is the practical map we use for Armorer Guard.
1. Before Tool Execution
This is the obvious boundary.
If an agent is about to call a shell, browser, database, email sender, payment API, or MCP tool, scan the concrete arguments before execution.
You are not asking whether the tool is generally safe. You are asking whether this invocation is safe.
Examples:
- shell command contains destructive flags
- browser navigation points to an attacker-controlled endpoint
- email body includes a secret
- MCP
tools/callarguments include prompt-injected instructions
2. After Tool Results, Before Model Context
This is the boundary teams miss.
Prompt injection often arrives through retrieved content: web pages, docs, tickets, emails, database rows, or MCP tool output.
If that result goes straight back into the model, the attacker is now part of the next prompt.
Scan tool results before they enter context.
3. Before Logs and Memory Writes
Agent traces are useful, but they also become a second leak path.
Scan before writing:
- run logs
- memory
- vector stores
- chat transcripts
- debugging artifacts
This is where credential redaction matters most.
4. Before External Sends
Some actions are irreversible.
The final send boundary deserves its own check:
- email send
- Slack/Discord post
- ticket update
- GitHub comment
- payment/refund
- deployment action
A plan can look safe until the last mile.
5. Feedback Loop
A scanner will have local false positives and false negatives.
The trick is to learn from feedback without silently mutating global model weights or uploading prompts to a cloud service.
Armorer Guard's Learning Loop does that locally:
armorer-guard feedback-record
armorer-guard feedback-export
armorer-guard feedback-stats
Local feedback can adapt local enforcement. Reviewed exports can later feed offline retraining.
Try It
The Rust CLI is on Cargo:
cargo install armorer-guard --locked
The browser demo is here:
https://huggingface.co/spaces/armorer-labs/armorer-guard-demo
Repo:
https://github.com/ArmorerLabs/Armorer-Guard
The short version: do not make guardrails a prompt. Put them at the runtime boundaries where data and actions cross trust zones.
Top comments (0)