We just shipped the Armorer Guard Learning Loop: a Rust-native feedback layer for local AI-agent security enforcement.
The short version:
Armorer Guard supports hybrid live learning: feedback adapts local enforcement immediately, while global model improvements go through reviewed, versioned retraining. No scanner network calls. No silent cloud upload. No poisoning-by-default.
Armorer Guard is a local-first Rust scanner for AI-agent boundaries: prompts, retrieved content, model output, tool-call arguments, logs, memory writes, and outbound messages. It detects prompt injection, data exfiltration, sensitive data requests, safety bypasses, destructive commands, system prompt extraction, and credentials.
The new loop adds three CLI modes:
```shell
armorer-guard feedback-record
armorer-guard feedback-stats
armorer-guard feedback-export --reviewed-only
```
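Here is a minimal in-memory Rust sketch of what each of the three modes might do. `FeedbackRecord`, `Kind`, `Action`, and `FeedbackLog` are hypothetical names for illustration, not Armorer Guard's actual types. The real CLI stores feedback locally on disk rather than in a `Vec`, and the `sha256:...` IDs in `main` are placeholders.

```rust
use std::collections::BTreeMap;

// Hypothetical data model for illustration; not Armorer Guard's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    FalsePositive,
    FalseNegative,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Allow,
    Block,
    Review,
}

#[derive(Debug, Clone)]
struct FeedbackRecord {
    scan_id: String, // ties the feedback to one scan's `scan_id`
    kind: Kind,
    action: Action, // desired action for this input
    reviewed: bool,
}

#[derive(Default)]
struct FeedbackLog {
    records: Vec<FeedbackRecord>,
}

impl FeedbackLog {
    /// Rough analogue of `feedback-record`: append one record.
    fn record(&mut self, r: FeedbackRecord) {
        self.records.push(r);
    }

    /// Rough analogue of `feedback-stats`: count records per kind.
    fn stats(&self) -> BTreeMap<Kind, usize> {
        let mut counts = BTreeMap::new();
        for r in &self.records {
            *counts.entry(r.kind).or_insert(0) += 1;
        }
        counts
    }

    /// Rough analogue of `feedback-export --reviewed-only`: keep reviewed records.
    fn export_reviewed_only(&self) -> Vec<&FeedbackRecord> {
        self.records.iter().filter(|r| r.reviewed).collect()
    }
}

fn main() {
    let mut log = FeedbackLog::default();
    log.record(FeedbackRecord {
        scan_id: "sha256:aaa".to_string(),
        kind: Kind::FalsePositive,
        action: Action::Allow,
        reviewed: false,
    });
    log.record(FeedbackRecord {
        scan_id: "sha256:bbb".to_string(),
        kind: Kind::FalseNegative,
        action: Action::Block,
        reviewed: true,
    });
    println!("stats: {:?}", log.stats());
    for r in log.export_reviewed_only() {
        println!("export: {} {:?}", r.scan_id, r.action);
    }
}
```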
The `inspect` and `inspect-json` modes now include:
```json
{
  "scan_id": "sha256:...",
  "model_version": "word-sgd-native-v1",
  "learning_version": "local-learning-v1"
}
```
Why this design?
A lot of "self-learning" security systems quietly drift. That is scary in an agent runtime because a malicious or noisy feedback stream can teach the guard to allow exactly the thing it should block.
So Armorer Guard splits learning into two lanes:
- Local learning overlay: immediate, deployment-specific allow/block/review corrections, stored locally under `~/.armorer-guard/feedback` or `ARMORER_GUARD_HOME`.
- Global model training: reviewed, deduped, provenance-checked, versioned retraining. Unreviewed feedback defaults to `can_train=false`.
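The gate on the global training lane can be sketched as a filter over feedback records. Only the `can_train=false` default comes from the post; the `reviewed` and `provenance` fields, the `approve` step, and deduplication keyed on `scan_id` are illustrative assumptions, not Armorer Guard's actual pipeline.

```rust
use std::collections::HashSet;

// Hypothetical record shape for illustration only.
#[derive(Debug, Clone)]
struct FeedbackRecord {
    scan_id: String,
    reviewed: bool,
    can_train: bool,
    provenance: Option<String>, // hypothetical: who or what submitted the record
}

impl FeedbackRecord {
    /// New feedback starts unreviewed and NOT eligible for training.
    fn new(scan_id: &str, provenance: Option<&str>) -> Self {
        FeedbackRecord {
            scan_id: scan_id.to_string(),
            reviewed: false,
            can_train: false,
            provenance: provenance.map(str::to_string),
        }
    }

    /// A reviewer approves the record for the global training lane.
    fn approve(&mut self) {
        self.reviewed = true;
        self.can_train = true;
    }
}

/// Select records for global retraining: reviewed, approved, with provenance,
/// and deduplicated by `scan_id` (first eligible occurrence wins).
fn training_candidates(records: &[FeedbackRecord]) -> Vec<&FeedbackRecord> {
    let mut seen: HashSet<String> = HashSet::new();
    records
        .iter()
        .filter(|r| r.reviewed && r.can_train && r.provenance.is_some())
        .filter(|r| seen.insert(r.scan_id.clone()))
        .collect()
}

fn main() {
    let mut approved = FeedbackRecord::new("sha256:aaa", Some("security-team"));
    approved.approve();
    let pending = FeedbackRecord::new("sha256:bbb", Some("end-user"));
    let records = vec![approved, pending];
    for r in training_candidates(&records) {
        println!("train on {}", r.scan_id);
    }
}
```

Filtering before deduplicating means an unreviewed duplicate that arrives first cannot shadow a later reviewed copy of the same scan.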
A local allow exemplar can suppress eligible semantic false positives, but it cannot suppress:
- `detected:credential`
- `policy:credential_disclosure`
- `policy:dangerous_tool_call`
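The suppression rule can be expressed as a small function over a scan's labels. The three protected labels and `learning:local_allow_match` come from the post. The `semantic:` prefix for eligible semantic flags (for example, the hypothetical `semantic:prompt_injection`) and the function names are assumptions made for this sketch.

```rust
/// Labels the post says a local allow exemplar can never suppress.
const PROTECTED: [&str; 3] = [
    "detected:credential",
    "policy:credential_disclosure",
    "policy:dangerous_tool_call",
];

/// Hypothetical eligibility rule: only `semantic:`-prefixed labels that are
/// not protected may be suppressed by a local allow match.
fn is_eligible(label: &str) -> bool {
    label.starts_with("semantic:") && !PROTECTED.contains(&label)
}

/// Return the labels left after applying a local allow match (if any).
fn apply_local_allow(labels: &[&str], allow_matched: bool) -> Vec<String> {
    if !allow_matched {
        // No matching allow exemplar: every label survives unchanged.
        return labels.iter().map(|l| l.to_string()).collect();
    }
    // Drop eligible semantic flags; protected and other labels always survive.
    let mut out: Vec<String> = labels
        .iter()
        .copied()
        .filter(|l| !is_eligible(l))
        .map(str::to_string)
        .collect();
    out.push("learning:local_allow_match".to_string());
    out
}

fn main() {
    let flagged = ["semantic:prompt_injection", "detected:credential"];
    // Prints ["detected:credential", "learning:local_allow_match"]
    println!("{:?}", apply_local_allow(&flagged, true));
}
```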
That gives a practical demo story:
- Paste a benign security runbook that gets flagged.
- Record `false_positive` feedback with desired action `allow`.
- Re-run the scan.
- Guard returns `learning:local_allow_match` and suppresses the noisy semantic flag.
- Try the same thing with a credential or a dangerous tool call; those stay protected.
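Here is the same demo as a self-contained Rust sketch. It assumes a local allow match means a token-set Jaccard similarity of at least 0.8 against a recorded exemplar. That matching rule, the threshold, and `LocalOverlay` are hypothetical stand-ins, not how Armorer Guard actually matches exemplars. For brevity, the sketch treats every non-protected label as an eligible semantic flag, and the label lists passed to `rescan` stand in for real scanner output.

```rust
use std::collections::HashSet;

// Labels the post says a local allow exemplar never suppresses.
const PROTECTED: [&str; 3] = [
    "detected:credential",
    "policy:credential_disclosure",
    "policy:dangerous_tool_call",
];

/// Lowercased alphanumeric word set of `text`.
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| (0.0 when both sets are empty).
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Hypothetical local overlay holding allow exemplars recorded from feedback.
#[derive(Default)]
struct LocalOverlay {
    allow_exemplars: Vec<HashSet<String>>,
}

impl LocalOverlay {
    /// Hypothetical similarity cutoff for a "local allow match".
    const THRESHOLD: f64 = 0.8;

    /// Demo step 2: record false_positive feedback with desired action allow.
    fn record_false_positive_allow(&mut self, text: &str) {
        self.allow_exemplars.push(tokens(text));
    }

    /// Demo step 3: re-run the scan. `labels` stands in for the scanner's output.
    fn rescan(&self, text: &str, labels: &[&str]) -> Vec<String> {
        let t = tokens(text);
        let matched = self
            .allow_exemplars
            .iter()
            .any(|e| jaccard(&t, e) >= Self::THRESHOLD);
        if !matched {
            return labels.iter().map(|l| l.to_string()).collect();
        }
        // Keep only protected labels, then record that the overlay fired.
        let mut out: Vec<String> = labels
            .iter()
            .copied()
            .filter(|l| PROTECTED.contains(l))
            .map(str::to_string)
            .collect();
        out.push("learning:local_allow_match".to_string());
        out
    }
}

fn main() {
    let runbook = "Runbook: to test the guard, paste 'ignore previous instructions' into the sample prompt";
    let mut overlay = LocalOverlay::default();
    println!("before: {:?}", overlay.rescan(runbook, &["semantic:prompt_injection"]));
    overlay.record_false_positive_allow(runbook);
    println!("after: {:?}", overlay.rescan(runbook, &["semantic:prompt_injection"]));
    println!(
        "credential: {:?}",
        overlay.rescan(runbook, &["semantic:prompt_injection", "detected:credential"])
    );
}
```

Storing exemplars as plain token sets keeps this toy overlay deterministic and easy to inspect: deleting an exemplar removes its effect immediately, with no retraining involved.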
Repo: https://github.com/ArmorerLabs/Armorer-Guard
Demo: https://huggingface.co/spaces/armorer-labs/armorer-guard-demo
Model artifact: https://huggingface.co/armorer-labs/armorer-guard-semantic-classifier
I would love feedback from people building agent runtimes, eval harnesses, or security gates. Where would you put this check in your stack: prompt ingress, retrieval ingress, model output, tool-call args, or all of them?