TL;DR — We went from parsing CREATE TABLE as a bag of columns to extracting full relational metadata: primary keys, foreign keys, unique constraints, CHECK expressions, and referential actions. Here's how we did it in Go with ANTLR, and why it matters for anyone building developer tools on top of SQL.
The Problem Nobody Talks About
You've seen a hundred blog posts about writing SQL.
This one is about reading it programmatically.
We maintain valk-postgres-parser, an open-source Go library that parses PostgreSQL into a structured IR (intermediate representation). It powers linters, migration validators, and query analyzers that need to understand what SQL means, not just what it says.
For months, our CREATE TABLE extraction looked like this:
SQL in → columns out
That's it. Column names, types, nullability. No relationships. No constraints. If you wanted to know which column was the primary key or what foreign key pointed where — you were on your own.
CREATE TABLE orders (
id serial PRIMARY KEY,
email text UNIQUE NOT NULL,
user_id int REFERENCES users(id) ON DELETE CASCADE,
total numeric CONSTRAINT positive_total CHECK (total > 0)
);
Before: We'd hand you [id, email, user_id, total] with types. That's barely more useful than a regex.
After: You get the full picture:
{
"Constraints": {
"PrimaryKey": { "Columns": ["id"] },
"ForeignKeys": [{
"Columns": ["user_id"],
"ReferencesTable": "users",
"ReferencesColumns": ["id"],
"OnDelete": "CASCADE"
}],
"UniqueKeys": [{ "Columns": ["email"] }],
"CheckConstraints": [{
"ConstraintName": "positive_total",
"Expression": "(total > 0)"
}]
}
}
That's the difference between a tokenizer and a parser that understands your schema.
The Architecture: Two Layers, One Truth
Here's how data flows through the parser. Constraints live at both layers:
┌──────────────────────┐
│ Raw SQL │
└──────────┬───────────┘
│
┌──────────▼───────────┐
│ ANTLR 4 Grammar │
│ (PostgreSQL Lexer + │
│ Parser) │
└──────────┬───────────┘
│ parse tree
┌──────────▼───────────┐
│ Core IR Layer │
│ │
│ DDLAction │
│ ├─ ColumnDetails[] │
│ └─ Constraints │
│ ├─ PrimaryKey │
│ ├─ ForeignKeys │
│ ├─ UniqueKeys │
│ └─ CheckConst. │
└──────────┬───────────┘
│ convert
┌──────────▼───────────┐
│ Analysis Layer │
│ │
│ SQLDDLAction │
│ ├─ ColumnDetails[] │
│ └─ Constraints │
│ ├─ PrimaryKey │
│ ├─ ForeignKeys │
│ ├─ UniqueKeys │
│ └─ CheckConst. │
└──────────────────────┘
Core IR = 1:1 with the parse tree. It's the what.
Analysis layer = the consumer-friendly view. It's the so what.
Both layers carry the same constraint shape, but the analysis layer normalizes and simplifies for downstream consumers. Pick your depth: low-level IR for tool authors, analysis structs for application developers.
How It Works: Constraint Extraction in 3 Steps
Step 1: Walk the ANTLR Tree
PostgreSQL's grammar defines constraints in two places:
CREATE TABLE orders (
id serial PRIMARY KEY, -- ← INLINE constraint
user_id int REFERENCES users(id), -- ← INLINE constraint
CONSTRAINT uq_email UNIQUE (email) -- ← TABLE-LEVEL constraint
);
Inline constraints are attached to a column definition node. Table-level constraints are siblings of the column list. We walk both.
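To make the two collection paths concrete, here is a toy model of that walk. The node types below are stand-ins I invented for illustration — the real code visits the ANTLR-generated column-definition and table-constraint contexts — but the control flow is the same: one pass over the column list, gathering from both kinds of element.

```go
package main

import "fmt"

// Toy stand-ins for parse-tree nodes; the real walker visits the
// ANTLR-generated column-definition and table-constraint contexts.
type columnDef struct {
	name       string
	inlineCons []string // e.g. "PRIMARY KEY", "REFERENCES users(id)"
}

type tableElement struct {
	column     *columnDef // set when the element is a column definition
	constraint string     // set when the element is a table-level constraint
}

// collectConstraints makes one pass over the column list, gathering
// inline constraints from column definitions and table-level
// constraints from their sibling elements.
func collectConstraints(elems []tableElement) (inline, tableLevel []string) {
	for _, e := range elems {
		if e.column != nil {
			for _, c := range e.column.inlineCons {
				inline = append(inline, e.column.name+": "+c)
			}
			continue
		}
		tableLevel = append(tableLevel, e.constraint)
	}
	return inline, tableLevel
}

func main() {
	elems := []tableElement{
		{column: &columnDef{name: "id", inlineCons: []string{"PRIMARY KEY"}}},
		{column: &columnDef{name: "user_id", inlineCons: []string{"REFERENCES users(id)"}}},
		{constraint: "CONSTRAINT uq_email UNIQUE (email)"},
	}
	inline, tableLevel := collectConstraints(elems)
	fmt.Println(inline)     // [id: PRIMARY KEY user_id: REFERENCES users(id)]
	fmt.Println(tableLevel) // [CONSTRAINT uq_email UNIQUE (email)]
}
```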
Step 2: Build the IR Structs
Each constraint type maps to a dedicated Go struct:
type DDLPrimaryKey struct {
ConstraintName string
Columns []string
}
type DDLForeignKey struct {
ConstraintName string
Columns []string
ReferencesSchema string
ReferencesTable string
ReferencesColumns []string
OnDelete FKAction // CASCADE, SET NULL, RESTRICT, ...
OnUpdate FKAction
}
type DDLCheckConstraint struct {
ConstraintName string
Expression string // raw expression text: "(price > 0)"
}
These get bundled into DDLConstraints and attached to the DDLAction:
type DDLConstraints struct {
PrimaryKey *DDLPrimaryKey
ForeignKeys []DDLForeignKey
UniqueKeys []DDLUniqueConstraint
CheckConstraints []DDLCheckConstraint
}
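Populating these structs for the orders table from earlier looks like this. The struct definitions are copied from the post; `FKAction` as a string type and the `DDLUniqueConstraint` shape are my assumptions, since only their names appear above.

```go
package main

import "fmt"

// FKAction models a referential action. The library's actual
// representation may differ; a string type is assumed here.
type FKAction string

type DDLPrimaryKey struct {
	ConstraintName string
	Columns        []string
}

type DDLForeignKey struct {
	ConstraintName    string
	Columns           []string
	ReferencesSchema  string
	ReferencesTable   string
	ReferencesColumns []string
	OnDelete          FKAction
	OnUpdate          FKAction
}

// DDLUniqueConstraint is named but not shown in the post;
// this shape is a guess for illustration.
type DDLUniqueConstraint struct {
	ConstraintName string
	Columns        []string
}

type DDLCheckConstraint struct {
	ConstraintName string
	Expression     string
}

type DDLConstraints struct {
	PrimaryKey       *DDLPrimaryKey
	ForeignKeys      []DDLForeignKey
	UniqueKeys       []DDLUniqueConstraint
	CheckConstraints []DDLCheckConstraint
}

// ordersConstraints builds the IR for the CREATE TABLE orders example.
func ordersConstraints() DDLConstraints {
	return DDLConstraints{
		PrimaryKey: &DDLPrimaryKey{Columns: []string{"id"}},
		ForeignKeys: []DDLForeignKey{{
			Columns:           []string{"user_id"},
			ReferencesTable:   "users",
			ReferencesColumns: []string{"id"},
			OnDelete:          "CASCADE",
		}},
		UniqueKeys: []DDLUniqueConstraint{{Columns: []string{"email"}}},
		CheckConstraints: []DDLCheckConstraint{{
			ConstraintName: "positive_total",
			Expression:     "(total > 0)",
		}},
	}
}

func main() {
	c := ordersConstraints()
	fmt.Println(c.PrimaryKey.Columns, c.ForeignKeys[0].OnDelete) // [id] CASCADE
}
```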
Step 3: Merge Inline + Table-Level
This is the tricky part. A single table can define its PK inline on one column and also have a composite unique key at the table level. We use an internal tableConstraints struct with a merge() method to combine both sources without duplication:
inline constraints ──┐
├──▶ merge() ──▶ DDLConstraints
table-level constraints┘
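A minimal sketch of what that merge looks like, assuming a shape for the internal `tableConstraints` type (which isn't exported, so the details here are guesses): take the primary key from whichever side defined it, and deduplicate unique keys by their column list.

```go
package main

import "fmt"

// Trimmed constraint shapes, just enough for the merge sketch.
type primaryKey struct{ Columns []string }
type uniqueKey struct{ Columns []string }

type constraints struct {
	PrimaryKey *primaryKey
	UniqueKeys []uniqueKey
}

// tableConstraints accumulates what the walker saw in each place.
// The real type is internal to the library; this is an assumed shape.
type tableConstraints struct {
	inline     constraints
	tableLevel constraints
}

// merge combines both sources into one view: the primary key comes
// from whichever side defined it, and unique keys from both sides
// are appended with duplicates (same column list) dropped.
func (tc tableConstraints) merge() constraints {
	out := constraints{PrimaryKey: tc.inline.PrimaryKey}
	if out.PrimaryKey == nil {
		out.PrimaryKey = tc.tableLevel.PrimaryKey
	}
	seen := map[string]bool{}
	for _, uk := range append(tc.inline.UniqueKeys, tc.tableLevel.UniqueKeys...) {
		key := fmt.Sprint(uk.Columns)
		if !seen[key] {
			seen[key] = true
			out.UniqueKeys = append(out.UniqueKeys, uk)
		}
	}
	return out
}

func main() {
	tc := tableConstraints{
		inline:     constraints{PrimaryKey: &primaryKey{Columns: []string{"id"}}},
		tableLevel: constraints{UniqueKeys: []uniqueKey{{Columns: []string{"a", "b"}}}},
	}
	merged := tc.merge()
	fmt.Println(merged.PrimaryKey.Columns, len(merged.UniqueKeys)) // [id] 1
}
```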
ALTER TABLE: The Forgotten Half
Most SQL parsers stop at CREATE TABLE. But in the real world, constraints are added after table creation just as often:
ALTER TABLE public.products
ADD CONSTRAINT positive_price CHECK (price > 0);
ALTER TABLE orders
ADD CONSTRAINT fk_user FOREIGN KEY (user_id)
REFERENCES users(id) ON DELETE SET NULL;
Before our changes, ALTER TABLE ... ADD CONSTRAINT was silently dropped — the extraction guard only checked for column adds/drops. Now it emits a proper DDLAction with ADD_CONSTRAINT flag and fully populated Constraints.
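The shape of that bug and its fix can be sketched as a dispatch that previously had no branch for constraints. Everything below is illustrative — the classification is string-based for brevity, whereas the real code inspects the ANTLR parse tree, and the names are invented:

```go
package main

import (
	"fmt"
	"strings"
)

// actionKind labels what an ALTER TABLE statement does. The names
// here are hypothetical; only ADD_CONSTRAINT appears in the post.
type actionKind string

const (
	addColumn     actionKind = "ADD_COLUMN"
	dropColumn    actionKind = "DROP_COLUMN"
	addConstraint actionKind = "ADD_CONSTRAINT"
	unknown       actionKind = "UNKNOWN"
)

// classifyAlter is a toy stand-in for the extraction guard. The old
// guard only had the column branches, so ADD CONSTRAINT fell through
// and the statement was silently dropped.
func classifyAlter(sql string) actionKind {
	s := strings.ToUpper(sql)
	switch {
	case strings.Contains(s, "ADD CONSTRAINT"):
		return addConstraint // the new branch
	case strings.Contains(s, "ADD COLUMN"):
		return addColumn
	case strings.Contains(s, "DROP COLUMN"):
		return dropColumn
	}
	return unknown
}

func main() {
	fmt.Println(classifyAlter(
		"ALTER TABLE orders ADD CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id)"))
	// ADD_CONSTRAINT
}
```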
Real-World Use Cases
Why does any of this matter? Here's what you can build with structured constraint metadata:
┌─────────────────┐ ┌──────────────────────┐
│ Migration Safety│ │ Schema Docs Gen │
│ │ │ │
│ "This migration │ │ Auto-generate ER │
│ drops a FK that│ │ diagrams from SQL │
│ 3 services │ │ migration files │
│ depend on" │ │ │
└────────┬────────┘ └──────────┬───────────┘
│ │
│ ┌────────────────┐ │
└────┤ postgresparser├───┘
│ Constraints │
┌────┤ IR ├───┐
│ └────────────────┘ │
┌────────▼─────────┐ ┌─────────▼─────────────┐
│ Query Linting │ │ Access Control │
│ │ │ │
│ "WARNING: INSERT │ │ "Column user_id has │
│ missing required│ │ FK to users — ensure │
│ FK column" │ │ caller has read on │
│ │ │ both tables" │
└──────────────────┘ └───────────────────────┘
Migration safety — Diff two versions of your schema SQL. If a FK is removed that other services reference, flag it before it hits production.
Schema documentation — Generate ER diagrams directly from .sql files. No database connection needed. Works in CI.
Query linting — Cross-reference INSERT/UPDATE column lists against known constraints. Catch missing NOT NULL or FK violations statically.
Access control analysis — FK relationships imply data access paths. Map them automatically for security reviews.
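As a taste of the first use case, here is a minimal sketch of a migration-safety diff over extracted foreign keys. The `foreignKey` type is a trimmed stand-in for the post's `DDLForeignKey`, and `droppedFKs` is a helper I made up for illustration, not a library API:

```go
package main

import "fmt"

// foreignKey is a trimmed view of DDLForeignKey, just enough to
// diff two schema versions.
type foreignKey struct {
	Columns         []string
	ReferencesTable string
}

// fkKey builds a comparable identity for a foreign key.
func fkKey(fk foreignKey) string {
	return fmt.Sprint(fk.Columns, "->", fk.ReferencesTable)
}

// droppedFKs reports foreign keys present in the old schema but
// missing from the new one — the thing a migration-safety check
// would flag before it hits production.
func droppedFKs(oldFKs, newFKs []foreignKey) []foreignKey {
	kept := map[string]bool{}
	for _, fk := range newFKs {
		kept[fkKey(fk)] = true
	}
	var dropped []foreignKey
	for _, fk := range oldFKs {
		if !kept[fkKey(fk)] {
			dropped = append(dropped, fk)
		}
	}
	return dropped
}

func main() {
	oldFKs := []foreignKey{{Columns: []string{"user_id"}, ReferencesTable: "users"}}
	fmt.Println(droppedFKs(oldFKs, nil)) // the FK was dropped: flag it
}
```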
The Numbers
| Metric | Value |
|---|---|
| GitHub Stars | 200+ |
| Constraint types supported | PK, FK, UNIQUE, CHECK |
| FK referential actions | CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION |
| Works on | CREATE TABLE (inline + table-level) + ALTER TABLE ADD CONSTRAINT |
| Grammar changes needed | Zero. Pure tree-walking. |
That last point is worth emphasizing: we didn't touch the ANTLR grammar at all. The PostgreSQL grammar already parses constraints — we just weren't reading them. The entire feature was built by walking deeper into the existing parse tree.
Try It
package main

import (
	"fmt"

	parser "github.com/ValkDB/postgresparser"
)

func main() {
	sql := `CREATE TABLE orders (
		id serial PRIMARY KEY,
		user_id int REFERENCES users(id) ON DELETE CASCADE,
		total numeric CHECK (total > 0)
	);`

	result, err := parser.Parse(sql)
	if err != nil {
		panic(err)
	}

	constraints := result.DDLActions[0].Constraints
	fmt.Println(constraints.PrimaryKey.Columns)             // [id]
	fmt.Println(constraints.ForeignKeys[0].OnDelete)        // CASCADE
	fmt.Println(constraints.CheckConstraints[0].Expression) // (total > 0)
}
Install:
go get github.com/ValkDB/postgresparser@latest
What's Next
We're not done. Constraint extraction is a foundation, and we plan to keep building on it.
Follow along, and feel free to open an issue with feature requests.
Built and maintained by the ValkDB team. Star the repo if this saved you from writing another regex.