DEV Community

Cover image for What every `?` in your SQL is hiding
Eitamos Ring
Eitamos Ring

Posted on

What every `?` in your SQL is hiding

Take a query that comes out of pg_stat_statements:

SELECT date_trunc(?, o.created_at) AS week,
       count(*) AS total
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= ?
  AND o.amount > ?
  AND c.plan = ?
GROUP BY ?
ORDER BY 2 DESC
LIMIT ?
Enter fullscreen mode Exit fullscreen mode

Six question marks. Each one means something completely different.

The first, inside date_trunc, expects a string like 'week' — it's telling the function which time bucket to use. The second is a timestamp comparing against created_at. The third is a number comparing against amount. The fourth is a string joined through to the customers table — it has to match a plan value over there. The fifth, sitting bare inside GROUP BY, is a positional integer like 1, pointing back at the first column in the SELECT list. It's not a value, it's an index. The sixth, after LIMIT, is a page-size integer.

Six placeholders, four different value types, two completely different kinds of integer. There isn't a regex that gets all six right — not without re-implementing a SQL parser inside it.

postgresparser is the open-source Go/ANTLR PostgreSQL parser we maintain at ValkDB. Until this release, when you got back an AST, every ? was just a leaf node with positional information and nothing else. The parser knew exactly what each ? meant — it had to, in order to parse — but it never told you. So everyone downstream fell back to regular expressions, string scanning, and increasingly elaborate guesswork.

This week's release tells you what every ? actually is.


The new API

result, _ := analysis.AnalyzeSQL(querySQL)

for _, p := range result.Placeholders {
    fmt.Printf("placeholder %d: role=%s column=%s\n",
        p.Index, p.Role, p.ColumnRef)
}
Enter fullscreen mode Exit fullscreen mode
placeholder 1: role=function_arg     column=         (date_trunc, arg 0)
placeholder 2: role=where_value      column=created_at
placeholder 3: role=where_value      column=amount
placeholder 4: role=where_value      column=plan
placeholder 5: role=group_by_ordinal column=
placeholder 6: role=limit            column=
Enter fullscreen mode Exit fullscreen mode

Six placeholders, six correct classifications, no string scanning. Switch on the role, fill in the right value.


How the old way failed

Without role information, this is the pipeline most tools end up with:

┌─────────────────────┐
│  Normalized SQL     │
│  with ? placeholders│
└──────────┬──────────┘
           │
           ▼
┌─────────────────────────────────┐
│   Regex sweep for "?"           │
│                                 │
│   finds ? in string literals    │
│   finds ? in comments           │
│   can't see GROUP BY context    │
│   mis-IDs JSONB ? operator      │
│   picks same value twice for    │
│   same column on >= and <       │
└──────────┬──────────────────────┘
           │
           ▼
┌─────────────────────┐
│  Hand-written       │
│  per-position guess │
│  (fragile)          │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Substituted SQL    │
│  often broken       │
└─────────────────────┘
Enter fullscreen mode Exit fullscreen mode

The role-aware version skips all of that by walking the parse tree the parser already built. String literals are leaves of their own kind, so question marks inside them are never seen as placeholders. Comments are stripped before tree construction. The JSONB operator is parsed as an operator node, not a placeholder leaf, so it never enters the placeholder list. GROUP BY and ORDER BY ordinals carry their own dedicated role. And every placeholder's syntactic role — its actual position in the grammar — comes back attached.


The five footguns this release closes

1. The JSONB ? operator is not a placeholder

PostgreSQL has three jsonb operators that look like placeholder tokens:

WHERE data ? 'key'                  -- "does jsonb contain top-level key?"
WHERE data ?| array['a','b']        -- "any of these keys?"
WHERE data ?& array['a','b']        -- "all of these keys?"
Enter fullscreen mode Exit fullscreen mode

A regex sweep can't tell these apart from real placeholders. The new placeholder list excludes JSONB operator tokens by construction.

2. INTERVAL ? actually parses

Before this release, INTERVAL ? was rejected with a syntax error — a real problem if you consume pg_stat_statements, because every query that uses an interval literal gets normalized to that form. The grammar now accepts a parameter token in interval-operand position.

3. ? inside string literals stays inside string literals

WHERE notes = 'has a ?'
WHERE notes = 'don''t mark me ?'
Enter fullscreen mode Exit fullscreen mode

The collector walks the parse tree, never the raw SQL — so string-literal ? and comment ? simply don't appear in the placeholder list.

4. GROUP BY ? is an ordinal, not a value

pg_stat_statements rewrites GROUP BY 1, 2 to GROUP BY ?, ?. These placeholders need to be substituted with positional integers referring to SELECT-list slots — not with arbitrary values. A dedicated role makes this explicit.

5. Function-argument placeholders need to know their function

SELECT date_trunc(?, created_at), extract(? FROM created_at) FROM t
Enter fullscreen mode Exit fullscreen mode

The first ? must be a string like 'week'. The second must be a string like 'year'. Both are function-args, but the function differs — so the right substitution differs. Each placeholder of this kind now carries its parent function name and argument index.


Who this is for

If you build an ORM or query builder and you've ever wanted to type-check a placeholder before binding to it, this is for you. If you build a SQL linter, a migration tool that rewrites queries between dialects, a monitoring agent that ingests pg_stat_statements, an AI-assisted SQL generator that emits parameterized queries — same. The common thread is that you have a normalized SQL string with ? placeholders in it, and you need to know what each one means before you can do anything useful.

If that sounds like work you've done, you've probably written a private placeholder classifier already. With this release, you don't have to.


Closing

The parser tells you what the SQL says; type inference belongs a layer up. The API stays narrow on purpose — roles, positions, and the structural context needed to make sense of them. Function-wrapper exposure on column usage is next on the roadmap; lateral-join and recursive-CTE refinements after that.

The parser knew. Now it tells you.


postgresparser — open-source PostgreSQL parser. Go, ANTLR-based. Contributions welcome.

Top comments (0)