DEV Community

Armorer Labs


Where to plug security hooks into AI agents: tool calls, MCP results, logs, and sends

Most AI-agent security advice collapses into one sentence: "add guardrails."

That is too vague to implement.

For agents with tools, the useful question is: where should the scanner sit?

Here is the practical map we use for Armorer Guard.

1. Before Tool Execution

This is the obvious boundary.

If an agent is about to call a shell, browser, database, email sender, payment API, or MCP tool, scan the concrete arguments before execution.

You are not asking whether the tool is generally safe. You are asking whether this invocation is safe.

Examples:

  • shell command contains destructive flags
  • browser navigation points to an attacker-controlled endpoint
  • email body includes a secret
  • MCP tools/call arguments include prompt-injected instructions

2. After Tool Results, Before Model Context

This is the boundary teams miss.

Prompt injection often arrives through retrieved content: web pages, docs, tickets, emails, database rows, or MCP tool output.

If that result goes straight back into the model, the attacker is now part of the next prompt.

Scan tool results before they enter context.
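A minimal sketch of that boundary, assuming a hypothetical `sanitize_result` helper. The injection markers below are toy examples; real prompt-injection detection needs more than keyword matching, but the placement of the hook is the point:

```python
import re

# Illustrative injection markers (examples only, not a real detection set).
INJECTION_MARKERS = [
    re.compile(r"ignore (all|any|previous) instructions", re.I),
    re.compile(r"you are now", re.I),
]

def sanitize_result(text: str) -> tuple[str, bool]:
    """Scan a tool result before it is appended to model context.

    Returns (text to forward, flagged?). Flagged content is annotated
    so the model sees it as untrusted data rather than instructions.
    """
    flagged = any(p.search(text) for p in INJECTION_MARKERS)
    if flagged:
        text = "[UNTRUSTED CONTENT FLAGGED BY SCANNER]\n" + text
    return text, flagged
```

The hook runs between the tool and the model, so retrieved web pages, MCP output, and database rows all pass through it before becoming part of the next prompt.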

3. Before Logs and Memory Writes

Agent traces are useful, but they also become a second leak path.

Scan before writing:

  • run logs
  • memory
  • vector stores
  • chat transcripts
  • debugging artifacts

This is where credential redaction matters most.
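A redaction pass at this boundary can be sketched as a simple pattern-and-replacement table. The two patterns below are illustrative, not a complete secret taxonomy, and `redact` is a hypothetical helper rather than Armorer Guard's implementation:

```python
import re

# Illustrative secret patterns (examples only); a real redactor would
# cover many more credential formats plus entropy-based detection.
SECRET_PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_AWS_KEY]"),
    (re.compile(r"ghp_[A-Za-z0-9]{36}"), "[REDACTED_GITHUB_TOKEN]"),
]

def redact(text: str) -> str:
    """Strip known credential shapes before any persistent write."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running every log, memory, and transcript write through one choke point like this is what keeps a leaked secret from surviving in five different stores.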

4. Before External Sends

Some actions are irreversible.

The final send boundary deserves its own check:

  • email send
  • Slack/Discord post
  • ticket update
  • GitHub comment
  • payment/refund
  • deployment action

A plan can look safe until the last mile.
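A last-mile gate can be sketched as one function that every outbound action must pass, regardless of what earlier plan steps concluded. The allowlist, `gate_send` name, and payload check here are all hypothetical:

```python
# Illustrative final-send gate (assumed names and policy, not a real API).
ALLOWED_EMAIL_DOMAINS = {"example.com"}

def gate_send(action: str, target: str, payload: str) -> bool:
    """Final check before an irreversible external action. True = allow."""
    if action == "email":
        domain = target.rsplit("@", 1)[-1]
        if domain not in ALLOWED_EMAIL_DOMAINS:
            return False  # outbound email restricted to allowlisted domains
    if "BEGIN RSA PRIVATE KEY" in payload:
        return False  # never ship key material, whatever the channel
    return True
```

Because the gate sits at the send boundary itself, it catches cases where each earlier step looked fine in isolation but the composed action does not.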

5. Feedback Loop

Any scanner will produce false positives and false negatives in your environment.

The trick is to learn from feedback without silently mutating global model weights or uploading prompts to a cloud service.

Armorer Guard's Learning Loop does that locally:

```
armorer-guard feedback-record
armorer-guard feedback-export
armorer-guard feedback-stats
```

Local feedback can adapt local enforcement. Reviewed exports can later feed offline retraining.

Try It

The Rust CLI is on Cargo:

```
cargo install armorer-guard --locked
```

The browser demo is here:

https://huggingface.co/spaces/armorer-labs/armorer-guard-demo

Repo:

https://github.com/ArmorerLabs/Armorer-Guard

The short version: do not make guardrails a prompt. Put them at the runtime boundaries where data and actions cross trust zones.
