Why we built this
We needed PostgreSQL SQL parsing in environments where CGO was not an option:
- Alpine containers
- AWS Lambda
- Distroless images
- Scratch builds
- ARM deployments
- Anywhere CGO_ENABLED=0 is required
Most existing approaches either:
- Depend on native Postgres parser bindings
- Require CGO
- Require running a Postgres server
- Are too heavy for infrastructure tooling
So we built a pure Go PostgreSQL parser.
The goal
Not to replace Postgres parsing.
Not to be 100% server-compatible.
The goal was simple:
Give infrastructure and tooling systems structured query data safely and deterministically.
What it extracts
The parser outputs an intermediate representation (IR) with:
- Tables (with aliases)
- Columns
- Joins
- WHERE filters
- GROUP BY
- ORDER BY
- CTEs
- Subqueries
Example
result, err := postgresparser.ParseSQL(`
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.active = true
GROUP BY u.name
ORDER BY order_count DESC
`)
if err != nil {
	log.Fatal(err) // stop here: result is not usable when parsing fails
}
fmt.Println(result.Command) // "SELECT"
fmt.Println(result.Tables) // users, orders with aliases
fmt.Println(result.Columns) // u.name, COUNT(o.id) AS order_count
fmt.Println(result.Where) // ["u.active=true"]
fmt.Println(result.JoinConditions) // ["o.user_id=u.id"]
fmt.Println(result.GroupBy) // ["u.name"]
fmt.Println(result.ColumnUsage) // each column with its role: filter, join, projection, group, order
Now tooling can answer:
- What tables does this query touch?
- What joins exist?
- What filters are applied?
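To make the first of those questions concrete, here is a minimal sketch of a helper that collects the distinct tables touched by a batch of parsed queries. The `TableRef` and `QueryIR` types are hypothetical stand-ins written for this example; the field names and shapes of the parser's real result type are assumptions and may differ.

```go
package main

import "sort"

// TableRef is a hypothetical stand-in for one table entry in the parser's IR.
type TableRef struct {
	Name  string
	Alias string
}

// QueryIR is a hypothetical, trimmed-down stand-in for the parse result.
// The real result type may use different field names and types.
type QueryIR struct {
	Command string
	Tables  []TableRef
}

// TablesTouched returns the sorted, de-duplicated table names
// referenced across all of the given queries.
func TablesTouched(queries []QueryIR) []string {
	seen := make(map[string]struct{})
	for _, q := range queries {
		for _, t := range q.Tables {
			seen[t.Name] = struct{}{}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names) // stable output keeps reports and diffs deterministic
	return names
}
```

Sorting the result is a deliberate choice: Go map iteration order is randomized, and tooling output is easier to diff when it is stable.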
Why ANTLR + Pure Go
We evaluated:
- libpg_query bindings
- WASM approaches
- regex / string parsing
- custom parsers
Tradeoffs we cared about
| Requirement | Why |
|---|---|
| Pure Go | Simpler deploy, fewer runtime risks |
| No CGO | Works in restricted environments |
| Deterministic behavior | Important for tooling / analysis |
| Performance | Needed for production workloads |
ANTLR gave us:
- Mature grammar ecosystem
- Strong parsing guarantees
- Good performance with SLL mode
Performance
Most real-world queries parse in roughly 70–350 microseconds using SLL prediction mode.
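Numbers like these depend on hardware and query shape, so it can be worth measuring on your own workload. The following is a rough timing sketch, not the benchmark behind the figures above. `ParseFunc` and `AverageParseTime` are hypothetical names introduced for illustration; the parse call is passed in as a function so the helper does not assume anything about the library's API.

```go
package main

import "time"

// ParseFunc is a hypothetical adapter around whatever parse call is being measured.
type ParseFunc func(sql string) error

// AverageParseTime calls parse n times on the same SQL text and returns
// the mean wall-clock time per call. It stops at the first parse error.
func AverageParseTime(parse ParseFunc, sql string, n int) (time.Duration, error) {
	if n <= 0 {
		return 0, nil
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		if err := parse(sql); err != nil {
			return 0, err
		}
	}
	return time.Since(start) / time.Duration(n), nil
}
```

To measure the parser itself, wrap the `ParseSQL` call from the example in a `ParseFunc` that discards the result and returns the error. For more rigorous measurements, Go's standard `testing` package benchmarks are a better fit than a hand-rolled loop.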
Where this is useful
Typical use cases:
- CI SQL validation
- Query lineage hints
- Migration safety checks
- Static query analysis before deploy
- “What tables does this service touch?” automation
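As an illustration of the migration-safety and CI use cases, the sketch below flags UPDATE and DELETE statements that carry no WHERE filter. `QueryIR` here is a hypothetical, simplified stand-in for the parser's output; its fields are assumptions made for this example rather than the library's actual types.

```go
package main

import "fmt"

// QueryIR is a hypothetical, simplified stand-in for the parser's output.
// The real result type may use different field names and types.
type QueryIR struct {
	Command string   // e.g. "SELECT", "UPDATE", "DELETE"
	Tables  []string // table names referenced by the query
	Where   []string // extracted filter expressions, empty if none
}

// CheckUnfilteredWrites returns one finding for every UPDATE or DELETE
// that has no WHERE filter. An empty result means the check passed.
func CheckUnfilteredWrites(queries []QueryIR) []string {
	var findings []string
	for i, q := range queries {
		isWrite := q.Command == "UPDATE" || q.Command == "DELETE"
		if isWrite && len(q.Where) == 0 {
			findings = append(findings,
				fmt.Sprintf("query %d: %s on %v has no WHERE filter", i, q.Command, q.Tables))
		}
	}
	return findings
}
```

A CI job could run a check like this over every query in a migration file and fail the build when the returned slice is non-empty.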
Open Source
We’ve been using this internally for months and decided to open source it.
If you break it with weird SQL, please open issues — that’s how coverage improves.