TL;DR — We went from parsing CREATE TABLE as a bag of columns to extracting full relational metadata: primary keys, foreign keys, unique constraints, CHECK expressions, and referential actions. Here's how we did it in Go with ANTLR, and why it matters for anyone building developer tools on top of SQL.
The Problem Nobody Talks About
You've seen a hundred blog posts about writing SQL.
This one is about reading it programmatically.
We maintain valk-postgres-parser, an open-source Go library that parses PostgreSQL into a structured IR (intermediate representation). It powers linters, migration validators, and query analyzers that need to understand what SQL means, not just what it says.
For months, our CREATE TABLE extraction looked like this:
SQL in → columns out
That's it. Column names, types, nullability. No relationships. No constraints. If you wanted to know which column was the primary key or what foreign key pointed where — you were on your own.
CREATE TABLE orders (
id serial PRIMARY KEY,
email text UNIQUE NOT NULL,
user_id int REFERENCES users(id) ON DELETE CASCADE,
total numeric CONSTRAINT positive_total CHECK (total > 0)
);
Before: We'd hand you [id, email, user_id, total] with types. That's barely more useful than a regex.
After: You get the full picture:
{
"Constraints": {
"PrimaryKey": { "Columns": ["id"] },
"ForeignKeys": [{
"Columns": ["user_id"],
"ReferencesTable": "users",
"ReferencesColumns": ["id"],
"OnDelete": "CASCADE"
}],
"UniqueKeys": [{ "Columns": ["email"] }],
"CheckConstraints": [{
"ConstraintName": "positive_total",
"Expression": "(total > 0)"
}]
}
}
That's the difference between a tokenizer and a parser that understands your schema.
The Architecture: Two Layers, One Truth
Here's how data flows through the parser. Constraints live at both layers:
┌──────────────────────┐
│ Raw SQL │
└──────────┬───────────┘
│
┌──────────▼───────────┐
│ ANTLR 4 Grammar │
│ (PostgreSQL Lexer + │
│ Parser) │
└──────────┬───────────┘
│ parse tree
┌──────────▼───────────┐
│ Core IR Layer │
│ │
│ DDLAction │
│ ├─ ColumnDetails[] │
│ └─ Constraints │
│ ├─ PrimaryKey │
│ ├─ ForeignKeys │
│ ├─ UniqueKeys │
│ └─ CheckConst. │
└──────────┬───────────┘
│ convert
┌──────────▼───────────┐
│ Analysis Layer │
│ │
│ SQLDDLAction │
│ ├─ ColumnDetails[] │
│ └─ Constraints │
│ ├─ PrimaryKey │
│ ├─ ForeignKeys │
│ ├─ UniqueKeys │
│ └─ CheckConst. │
└──────────────────────┘
Core IR = 1:1 with the parse tree. It's the what.
Analysis layer = the consumer-friendly view. It's the so what.
Both layers carry the same constraint shape, but the analysis layer normalizes and simplifies for downstream consumers. Pick your depth: low-level IR for tool authors, analysis structs for application developers.
How It Works: Constraint Extraction in 3 Steps
Step 1: Walk the ANTLR Tree
PostgreSQL's grammar defines constraints in two places:
CREATE TABLE orders (
id serial PRIMARY KEY, -- ← INLINE constraint
user_id int REFERENCES users(id), -- ← INLINE constraint
CONSTRAINT uq_email UNIQUE (email) -- ← TABLE-LEVEL constraint
);
Inline constraints are attached to a column definition node. Table-level constraints are siblings of the column list. We walk both.
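To make the two collection paths concrete, here is a toy model of that walk. The node types below are stand-ins I invented for illustration — the real code visits the ANTLR-generated column-definition and table-constraint contexts — but the control flow is the same: one pass over the column list, gathering from both kinds of element.

```go
package main

import "fmt"

// Toy stand-ins for parse-tree nodes; the real walker visits the
// ANTLR-generated column-definition and table-constraint contexts.
type columnDef struct {
	name       string
	inlineCons []string // e.g. "PRIMARY KEY", "REFERENCES users(id)"
}

type tableElement struct {
	column     *columnDef // set when the element is a column definition
	constraint string     // set when the element is a table-level constraint
}

// collectConstraints makes one pass over the column list, gathering
// inline constraints from column definitions and table-level
// constraints from their sibling elements.
func collectConstraints(elems []tableElement) (inline, tableLevel []string) {
	for _, e := range elems {
		if e.column != nil {
			for _, c := range e.column.inlineCons {
				inline = append(inline, e.column.name+": "+c)
			}
			continue
		}
		tableLevel = append(tableLevel, e.constraint)
	}
	return inline, tableLevel
}

func main() {
	elems := []tableElement{
		{column: &columnDef{name: "id", inlineCons: []string{"PRIMARY KEY"}}},
		{column: &columnDef{name: "user_id", inlineCons: []string{"REFERENCES users(id)"}}},
		{constraint: "CONSTRAINT uq_email UNIQUE (email)"},
	}
	inline, tableLevel := collectConstraints(elems)
	fmt.Println(inline)     // [id: PRIMARY KEY user_id: REFERENCES users(id)]
	fmt.Println(tableLevel) // [CONSTRAINT uq_email UNIQUE (email)]
}
```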
Step 2: Build the IR Structs
Each constraint type maps to a dedicated Go struct:
type DDLPrimaryKey struct {
ConstraintName string
Columns []string
}
type DDLForeignKey struct {
ConstraintName string
Columns []string
ReferencesSchema string
ReferencesTable string
ReferencesColumns []string
OnDelete FKAction // CASCADE, SET NULL, RESTRICT, ...
OnUpdate FKAction
}
type DDLCheckConstraint struct {
ConstraintName string
Expression string // raw expression text: "(price > 0)"
}
These get bundled into DDLConstraints and attached to the DDLAction:
type DDLConstraints struct {
PrimaryKey *DDLPrimaryKey
ForeignKeys []DDLForeignKey
UniqueKeys []DDLUniqueConstraint
CheckConstraints []DDLCheckConstraint
}
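Populating these structs for the orders table from earlier looks like this. The struct definitions are copied from the post; `FKAction` as a string type and the `DDLUniqueConstraint` shape are my assumptions, since only their names appear above.

```go
package main

import "fmt"

// FKAction models a referential action. The library's actual
// representation may differ; a string type is assumed here.
type FKAction string

type DDLPrimaryKey struct {
	ConstraintName string
	Columns        []string
}

type DDLForeignKey struct {
	ConstraintName    string
	Columns           []string
	ReferencesSchema  string
	ReferencesTable   string
	ReferencesColumns []string
	OnDelete          FKAction
	OnUpdate          FKAction
}

// DDLUniqueConstraint is named but not shown in the post;
// this shape is a guess for illustration.
type DDLUniqueConstraint struct {
	ConstraintName string
	Columns        []string
}

type DDLCheckConstraint struct {
	ConstraintName string
	Expression     string
}

type DDLConstraints struct {
	PrimaryKey       *DDLPrimaryKey
	ForeignKeys      []DDLForeignKey
	UniqueKeys       []DDLUniqueConstraint
	CheckConstraints []DDLCheckConstraint
}

// ordersConstraints builds the IR for the CREATE TABLE orders example.
func ordersConstraints() DDLConstraints {
	return DDLConstraints{
		PrimaryKey: &DDLPrimaryKey{Columns: []string{"id"}},
		ForeignKeys: []DDLForeignKey{{
			Columns:           []string{"user_id"},
			ReferencesTable:   "users",
			ReferencesColumns: []string{"id"},
			OnDelete:          "CASCADE",
		}},
		UniqueKeys: []DDLUniqueConstraint{{Columns: []string{"email"}}},
		CheckConstraints: []DDLCheckConstraint{{
			ConstraintName: "positive_total",
			Expression:     "(total > 0)",
		}},
	}
}

func main() {
	c := ordersConstraints()
	fmt.Println(c.PrimaryKey.Columns, c.ForeignKeys[0].OnDelete) // [id] CASCADE
}
```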
Step 3: Merge Inline + Table-Level
This is the tricky part. A single table can define its PK inline on one column and also have a composite unique key at the table level. We use an internal tableConstraints struct with a merge() method to combine both sources without duplication:
inline constraints ──┐
├──▶ merge() ──▶ DDLConstraints
table-level constraints┘
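A minimal sketch of what that merge looks like, assuming a shape for the internal `tableConstraints` type (which isn't exported, so the details here are guesses): take the primary key from whichever side defined it, and deduplicate unique keys by their column list.

```go
package main

import "fmt"

// Trimmed constraint shapes, just enough for the merge sketch.
type primaryKey struct{ Columns []string }
type uniqueKey struct{ Columns []string }

type constraints struct {
	PrimaryKey *primaryKey
	UniqueKeys []uniqueKey
}

// tableConstraints accumulates what the walker saw in each place.
// The real type is internal to the library; this is an assumed shape.
type tableConstraints struct {
	inline     constraints
	tableLevel constraints
}

// merge combines both sources into one view: the primary key comes
// from whichever side defined it, and unique keys from both sides
// are appended with duplicates (same column list) dropped.
func (tc tableConstraints) merge() constraints {
	out := constraints{PrimaryKey: tc.inline.PrimaryKey}
	if out.PrimaryKey == nil {
		out.PrimaryKey = tc.tableLevel.PrimaryKey
	}
	seen := map[string]bool{}
	for _, uk := range append(tc.inline.UniqueKeys, tc.tableLevel.UniqueKeys...) {
		key := fmt.Sprint(uk.Columns)
		if !seen[key] {
			seen[key] = true
			out.UniqueKeys = append(out.UniqueKeys, uk)
		}
	}
	return out
}

func main() {
	tc := tableConstraints{
		inline:     constraints{PrimaryKey: &primaryKey{Columns: []string{"id"}}},
		tableLevel: constraints{UniqueKeys: []uniqueKey{{Columns: []string{"a", "b"}}}},
	}
	merged := tc.merge()
	fmt.Println(merged.PrimaryKey.Columns, len(merged.UniqueKeys)) // [id] 1
}
```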
ALTER TABLE: The Forgotten Half
Most SQL parsers stop at CREATE TABLE. But in the real world, constraints are added after table creation just as often:
ALTER TABLE public.products
ADD CONSTRAINT positive_price CHECK (price > 0);
ALTER TABLE orders
ADD CONSTRAINT fk_user FOREIGN KEY (user_id)
REFERENCES users(id) ON DELETE SET NULL;
Before our changes, ALTER TABLE ... ADD CONSTRAINT was silently dropped — the extraction guard only checked for column adds/drops. Now it emits a proper DDLAction with ADD_CONSTRAINT flag and fully populated Constraints.
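The shape of that bug and its fix can be sketched as a dispatch that previously had no branch for constraints. Everything below is illustrative — the classification is string-based for brevity, whereas the real code inspects the ANTLR parse tree, and the names are invented:

```go
package main

import (
	"fmt"
	"strings"
)

// actionKind labels what an ALTER TABLE statement does. The names
// here are hypothetical; only ADD_CONSTRAINT appears in the post.
type actionKind string

const (
	addColumn     actionKind = "ADD_COLUMN"
	dropColumn    actionKind = "DROP_COLUMN"
	addConstraint actionKind = "ADD_CONSTRAINT"
	unknown       actionKind = "UNKNOWN"
)

// classifyAlter is a toy stand-in for the extraction guard. The old
// guard only had the column branches, so ADD CONSTRAINT fell through
// and the statement was silently dropped.
func classifyAlter(sql string) actionKind {
	s := strings.ToUpper(sql)
	switch {
	case strings.Contains(s, "ADD CONSTRAINT"):
		return addConstraint // the new branch
	case strings.Contains(s, "ADD COLUMN"):
		return addColumn
	case strings.Contains(s, "DROP COLUMN"):
		return dropColumn
	}
	return unknown
}

func main() {
	fmt.Println(classifyAlter(
		"ALTER TABLE orders ADD CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id)"))
	// ADD_CONSTRAINT
}
```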
Real-World Use Cases
Why does any of this matter? Here's what you can build with structured constraint metadata:
┌─────────────────┐ ┌──────────────────────┐
│ Migration Safety│ │ Schema Docs Gen │
│ │ │ │
│ "This migration │ │ Auto-generate ER │
│ drops a FK that│ │ diagrams from SQL │
│ 3 services │ │ migration files │
│ depend on" │ │ │
└────────┬────────┘ └──────────┬───────────┘
│ │
│ ┌────────────────┐ │
└────┤ postgresparser├───┘
│ Constraints │
┌────┤ IR ├───┐
│ └────────────────┘ │
┌────────▼─────────┐ ┌─────────▼─────────────┐
│ Query Linting │ │ Access Control │
│ │ │ │
│ "WARNING: INSERT │ │ "Column user_id has │
│ missing required│ │ FK to users — ensure │
│ FK column" │ │ caller has read on │
│ │ │ both tables" │
└──────────────────┘ └───────────────────────┘
Migration safety — Diff two versions of your schema SQL. If a FK is removed that other services reference, flag it before it hits production.
Schema documentation — Generate ER diagrams directly from .sql files. No database connection needed. Works in CI.
Query linting — Cross-reference INSERT/UPDATE column lists against known constraints. Catch missing NOT NULL or FK violations statically.
Access control analysis — FK relationships imply data access paths. Map them automatically for security reviews.
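As a taste of the first use case, here is a minimal sketch of a migration-safety diff over extracted foreign keys. The `foreignKey` type is a trimmed stand-in for the post's `DDLForeignKey`, and `droppedFKs` is a helper I made up for illustration, not a library API:

```go
package main

import "fmt"

// foreignKey is a trimmed view of DDLForeignKey, just enough to
// diff two schema versions.
type foreignKey struct {
	Columns         []string
	ReferencesTable string
}

// fkKey builds a comparable identity for a foreign key.
func fkKey(fk foreignKey) string {
	return fmt.Sprint(fk.Columns, "->", fk.ReferencesTable)
}

// droppedFKs reports foreign keys present in the old schema but
// missing from the new one — the thing a migration-safety check
// would flag before it hits production.
func droppedFKs(oldFKs, newFKs []foreignKey) []foreignKey {
	kept := map[string]bool{}
	for _, fk := range newFKs {
		kept[fkKey(fk)] = true
	}
	var dropped []foreignKey
	for _, fk := range oldFKs {
		if !kept[fkKey(fk)] {
			dropped = append(dropped, fk)
		}
	}
	return dropped
}

func main() {
	oldFKs := []foreignKey{{Columns: []string{"user_id"}, ReferencesTable: "users"}}
	fmt.Println(droppedFKs(oldFKs, nil)) // the FK was dropped: flag it
}
```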
The Numbers
| Metric | Value |
|---|---|
| GitHub Stars | 200+ |
| Constraint types supported | PK, FK, UNIQUE, CHECK |
| FK referential actions | CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION |
| Works on | CREATE TABLE (inline + table-level) + ALTER TABLE ADD CONSTRAINT |
| Grammar changes needed | Zero. Pure tree-walking. |
That last point is worth emphasizing: we didn't touch the ANTLR grammar at all. The PostgreSQL grammar already parses constraints — we just weren't reading them. The entire feature was built by walking deeper into the existing parse tree.
Try It
package main

import (
	"fmt"

	parser "github.com/ValkDB/postgresparser"
)

func main() {
	sql := `CREATE TABLE orders (
		id serial PRIMARY KEY,
		user_id int REFERENCES users(id) ON DELETE CASCADE,
		total numeric CHECK (total > 0)
	);`

	result, err := parser.Parse(sql)
	if err != nil {
		panic(err)
	}

	constraints := result.DDLActions[0].Constraints
	fmt.Println(constraints.PrimaryKey.Columns)             // [id]
	fmt.Println(constraints.ForeignKeys[0].OnDelete)        // CASCADE
	fmt.Println(constraints.CheckConstraints[0].Expression) // (total > 0)
}
Install:
go get github.com/ValkDB/postgresparser@latest
What's Next
We're not done. Constraint extraction is a foundation, and we plan to keep building on it.
Follow along, and feel free to open an issue with feature requests.
Built and maintained by the ValkDB team. Star the repo if this saved you from writing another regex.