DEV Community

Armorer Labs


Where to plug security hooks into AI agents: tool calls, MCP results, logs, and sends

Most AI-agent security advice collapses into one sentence: "add guardrails."

That is too vague to implement.

For agents with tools, the useful question is: where should the scanner sit?

Here is the practical map we use for Armorer Guard.

1. Before Tool Execution

This is the obvious boundary.

If an agent is about to call a shell, browser, database, email sender, payment API, or MCP tool, scan the concrete arguments before execution.

You are not asking whether the tool is generally safe. You are asking whether this invocation is safe.

Examples:

  • shell command contains destructive flags
  • browser navigation points to an attacker-controlled endpoint
  • email body includes a secret
  • MCP tools/call arguments include prompt-injected instructions

2. After Tool Results, Before Model Context

This is the boundary teams miss.

Prompt injection often arrives through retrieved content: web pages, docs, tickets, emails, database rows, or MCP tool output.

If that result goes straight back into the model, the attacker is now part of the next prompt.

Scan tool results before they enter context.
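A minimal sketch of that boundary, assuming a hypothetical `sanitize_result` helper. The injection markers below are toy examples; real prompt-injection detection needs more than keyword matching, but the placement of the hook is the point:

```python
import re

# Illustrative injection markers (examples only, not a real detection set).
INJECTION_MARKERS = [
    re.compile(r"ignore (all|any|previous) instructions", re.I),
    re.compile(r"you are now", re.I),
]

def sanitize_result(text: str) -> tuple[str, bool]:
    """Scan a tool result before it is appended to model context.

    Returns (text to forward, flagged?). Flagged content is annotated
    so the model sees it as untrusted data rather than instructions.
    """
    flagged = any(p.search(text) for p in INJECTION_MARKERS)
    if flagged:
        text = "[UNTRUSTED CONTENT FLAGGED BY SCANNER]\n" + text
    return text, flagged
```

The hook runs between the tool and the model, so retrieved web pages, MCP output, and database rows all pass through it before becoming part of the next prompt.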

3. Before Logs and Memory Writes

Agent traces are useful, but they also become a second leak path.

Scan before writing:

  • run logs
  • memory
  • vector stores
  • chat transcripts
  • debugging artifacts

This is where credential redaction matters most.
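A redaction pass at this boundary can be sketched as a simple pattern-and-replacement table. The two patterns below are illustrative, not a complete secret taxonomy, and `redact` is a hypothetical helper rather than Armorer Guard's implementation:

```python
import re

# Illustrative secret patterns (examples only); a real redactor would
# cover many more credential formats plus entropy-based detection.
SECRET_PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_AWS_KEY]"),
    (re.compile(r"ghp_[A-Za-z0-9]{36}"), "[REDACTED_GITHUB_TOKEN]"),
]

def redact(text: str) -> str:
    """Strip known credential shapes before any persistent write."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running every log, memory, and transcript write through one choke point like this is what keeps a leaked secret from surviving in five different stores.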

4. Before External Sends

Some actions are irreversible.

The final send boundary deserves its own check:

  • email send
  • Slack/Discord post
  • ticket update
  • GitHub comment
  • payment/refund
  • deployment action

A plan can look safe until the last mile.
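A last-mile gate can be sketched as one function that every outbound action must pass, regardless of what earlier plan steps concluded. The allowlist, `gate_send` name, and payload check here are all hypothetical:

```python
# Illustrative final-send gate (assumed names and policy, not a real API).
ALLOWED_EMAIL_DOMAINS = {"example.com"}

def gate_send(action: str, target: str, payload: str) -> bool:
    """Final check before an irreversible external action. True = allow."""
    if action == "email":
        domain = target.rsplit("@", 1)[-1]
        if domain not in ALLOWED_EMAIL_DOMAINS:
            return False  # outbound email restricted to allowlisted domains
    if "BEGIN RSA PRIVATE KEY" in payload:
        return False  # never ship key material, whatever the channel
    return True
```

Because the gate sits at the send boundary itself, it catches cases where each earlier step looked fine in isolation but the composed action does not.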

5. Feedback Loop

Any scanner will produce false positives and false negatives in your environment.

The trick is to learn from feedback without silently mutating global model weights or uploading prompts to a cloud service.

Armorer Guard's Learning Loop does that locally:

```
armorer-guard feedback-record
armorer-guard feedback-export
armorer-guard feedback-stats
```

Local feedback can adapt local enforcement. Reviewed exports can later feed offline retraining.

Try It

The Rust CLI is on Cargo:

```
cargo install armorer-guard --locked
```

The browser demo is here:

https://huggingface.co/spaces/armorer-labs/armorer-guard-demo

Repo:

https://github.com/ArmorerLabs/Armorer-Guard

The short version: do not make guardrails a prompt. Put them at the runtime boundaries where data and actions cross trust zones.
