DEV Community

Why Regex Can’t Stop AI SQL Injection (And How AST Parsing Can)

Introduction

The rise of "Text-to-SQL" AI agents has created a massive security headache. Giving an LLM access to your database is incredibly powerful, but fundamentally unsafe. Prompt engineering is easily bypassed, and while read-only database credentials stop destructive writes, they do nothing to stop an AI from executing an unbounded SELECT query that brings your production database to its knees via Denial of Service (DoS).

The Regex Fallacy

To solve this, many teams implement middleware that uses regular expressions to scan incoming SQL for keywords like DELETE, DROP, or TRUNCATE.

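This is a losing battle. SQL is a complex, declarative language, and a destructive statement does not have to sit at the start of a query. A clever attacker (or a hallucinating LLM) can slip past such a filter by wrapping the write in a data-modifying Common Table Expression (CTE):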

WITH recursive_trap AS (
    DELETE FROM users RETURNING *
) 
SELECT * FROM recursive_trap;

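A regex looking for ^DELETE completely misses this, because the statement begins with WITH. Dropping the anchor and matching the keyword anywhere only trades misses for false positives, such as rejecting a harmless query that contains the string literal 'DELETE'. The deeper problem is that SQL is not a regular language: parentheses, subqueries, and CTEs nest to arbitrary depth, and a classical regular expression cannot track that nesting.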
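To make the failure concrete, here is a minimal Go sketch of two naive filters. Both patterns and the audit_log table are hypothetical illustrations, not taken from any real firewall: the anchored one misses the CTE above, and the unanchored one blocks a read-only query.

```go
package main

import (
	"fmt"
	"regexp"
)

// Two naive filters of the kind described above (both hypothetical).
var (
	// anchored only fires when the statement starts with a destructive verb.
	anchored = regexp.MustCompile(`(?i)^\s*(DELETE|DROP|TRUNCATE)\b`)
	// anywhere fires on the keyword wherever it appears, even inside a string literal.
	anywhere = regexp.MustCompile(`(?i)\b(DELETE|DROP|TRUNCATE)\b`)
)

func main() {
	cte := "WITH recursive_trap AS (DELETE FROM users RETURNING *) SELECT * FROM recursive_trap;"
	harmless := "SELECT * FROM audit_log WHERE action = 'DELETE';"

	fmt.Println(anchored.MatchString(cte))      // false: the destructive CTE slips through
	fmt.Println(anywhere.MatchString(harmless)) // true: a read-only query gets blocked
}
```

Neither pattern can be tuned out of this trade-off, because neither knows whether DELETE is a statement, an identifier, or data.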

The Solution: Abstract Syntax Trees (AST)

The only deterministic way to understand a query is to parse it exactly how the database parses it.

I built AgentIAM, an open-source Postgres proxy written in Go. Instead of reading strings, AgentIAM intercepts the Postgres wire protocol and uses pg_query_go (a wrapper around PostgreSQL's native C parser) to convert the incoming SQL into an AST.

By representing the query as a tree structure, we can use a recursive Visitor pattern to walk through every node. If we find a DeleteStmt node—no matter how deeply it is buried inside a subquery or CTE—we instantly reject the query.
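The recursive walk can be sketched with a toy tree. The Node type below is a hypothetical, heavily simplified stand-in and is not the structure pg_query_go returns; only the node kind names (SelectStmt, WithClause, CommonTableExpr, DeleteStmt) mirror PostgreSQL's own.

```go
package main

import "fmt"

// Node is a hypothetical, simplified parse-tree node. The real PostgreSQL
// parse tree has many more node types and typed fields.
type Node struct {
	Kind     string  // e.g. "SelectStmt", "DeleteStmt", "CommonTableExpr"
	Children []*Node // CTEs, subqueries, and other nested nodes
}

// forbidden lists the statement kinds this sketch rejects.
var forbidden = map[string]bool{
	"DeleteStmt":   true,
	"DropStmt":     true,
	"TruncateStmt": true,
}

// findForbidden walks the tree depth-first and returns the first forbidden
// kind it meets, or "" when the tree is clean.
func findForbidden(n *Node) string {
	if n == nil {
		return ""
	}
	if forbidden[n.Kind] {
		return n.Kind
	}
	for _, child := range n.Children {
		if kind := findForbidden(child); kind != "" {
			return kind
		}
	}
	return ""
}

func main() {
	// Hand-built tree for:
	// WITH recursive_trap AS (DELETE FROM users RETURNING *) SELECT * FROM recursive_trap;
	tree := &Node{Kind: "SelectStmt", Children: []*Node{
		{Kind: "WithClause", Children: []*Node{
			{Kind: "CommonTableExpr", Children: []*Node{
				{Kind: "DeleteStmt"},
			}},
		}},
	}}
	fmt.Println(findForbidden(tree)) // DeleteStmt
}
```

Because the check runs on node kinds rather than text, the depth of the DELETE and the formatting of the query make no difference.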

Beyond Blocking: AST Rewriting

Blocking bad queries is only half the battle. To prevent DoS attacks from unbounded SELECT statements, AgentIAM goes a step further: it mutates the AST on the fly.

If the proxy traverses the tree and finds a SelectStmt that lacks a Limit node, it dynamically injects a LIMIT 100 node into the tree, deparses the AST back into a SQL string, and forwards it to the database. The AI agent gets its data, and your database survives.
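The mutate-then-deparse step might look roughly like this. The Select struct, enforceLimit, and deparse are hypothetical names for a toy model and do not reflect AgentIAM's code or pg_query_go's types; the sketch only shows the shape of the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// Select is a hypothetical, minimal model of a SELECT statement.
type Select struct {
	Columns []string
	Table   string
	Limit   *int // nil means the query has no LIMIT clause
}

// defaultLimit is the row cap injected into unbounded queries.
const defaultLimit = 100

// enforceLimit injects the default LIMIT when none is present.
// It reports whether the tree was modified.
func enforceLimit(s *Select) bool {
	if s.Limit != nil {
		return false
	}
	n := defaultLimit
	s.Limit = &n
	return true
}

// deparse turns the (possibly mutated) tree back into SQL text.
func deparse(s *Select) string {
	sql := "SELECT " + strings.Join(s.Columns, ", ") + " FROM " + s.Table
	if s.Limit != nil {
		sql += fmt.Sprintf(" LIMIT %d", *s.Limit)
	}
	return sql
}

func main() {
	q := &Select{Columns: []string{"*"}, Table: "massive_table"}
	enforceLimit(q)
	fmt.Println(deparse(q)) // SELECT * FROM massive_table LIMIT 100
}
```

Working on the tree instead of appending " LIMIT 100" to the string avoids corrupting queries that already end in a LIMIT, a semicolon, or a trailing comment.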

The Challenges of Wire Proxies

Building this required a deep understanding of the Postgres wire protocol (version 3, which the pgproto3 Go package encodes and decodes). Intercepting the Parse, Bind, and Execute messages of the extended query protocol requires careful state management so that the client driver (like psycopg2) doesn't desync and crash.
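For a sense of what the proxy has to decode, here is a stdlib-only sketch that pulls the SQL text out of a frontend Parse ('P') message. It follows the documented message layout (one type byte, a big-endian int32 length that counts itself but not the type byte, then NUL-terminated strings). The function names parseMsgQuery and buildParse are hypothetical, and a real proxy would more likely lean on pgproto3 than hand-roll this.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// parseMsgQuery extracts the prepared-statement name and SQL text from a
// Parse message. Parameter type OIDs that follow the query are ignored.
func parseMsgQuery(msg []byte) (name, query string, err error) {
	if len(msg) < 5 || msg[0] != 'P' {
		return "", "", errors.New("not a Parse message")
	}
	length := int(binary.BigEndian.Uint32(msg[1:5]))
	if length < 4 || len(msg) < 1+length {
		return "", "", errors.New("truncated message")
	}
	body := msg[5 : 1+length]

	i := bytes.IndexByte(body, 0) // end of statement name
	if i < 0 {
		return "", "", errors.New("unterminated statement name")
	}
	rest := body[i+1:]
	j := bytes.IndexByte(rest, 0) // end of query text
	if j < 0 {
		return "", "", errors.New("unterminated query")
	}
	return string(body[:i]), string(rest[:j]), nil
}

// buildParse assembles a Parse message with zero parameter types (demo helper).
func buildParse(name, query string) []byte {
	body := append([]byte(name), 0)
	body = append(body, query...)
	body = append(body, 0, 0, 0) // query NUL terminator + int16 parameter count of 0
	msg := []byte{'P', 0, 0, 0, 0}
	binary.BigEndian.PutUint32(msg[1:5], uint32(4+len(body)))
	return append(msg, body...)
}

func main() {
	name, query, err := parseMsgQuery(buildParse("stmt1", "SELECT 1"))
	fmt.Println(name, query, err) // stmt1 SELECT 1 <nil>
}
```

If the proxy rewrites the query, the length field has to be recomputed before forwarding, which is one of the ways a client and server can fall out of sync.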

If you're building AI data integrations, stop relying on regex. Check out the open-source implementation of AgentIAM on GitHub and see how AST manipulation can secure your agents.

AgentIAM

A Postgres wire proxy that blocks SQL injection from AI agents at the AST level.

Connecting Large Language Models (LLMs) directly to your database for "Text-to-SQL" functionality is incredibly dangerous. AgentIAM sits between your LangChain/LlamaIndex agent and your database, intercepting Postgres wire protocol traffic to parse and block destructive queries before they can execute.

License: AGPL v3


🛑 The Problem

If you give an AI Agent a database connection, it will eventually try to delete data or overwhelm your database.

  • Prompt Injection: An attacker can easily trick the LLM into generating DELETE FROM users; instead of a harmless query.
  • Denial of Service (DoS): An LLM might accidentally run SELECT * FROM massive_table;, attempting to fetch millions of rows and crashing your database server.
  • Regex Evasion: Standard SQL firewalls that use regular expressions can be easily bypassed using nested Common Table Expressions (CTEs), subqueries, or obscure formatting.

Relying on "prompt engineering" or…
