So I built Valk Guard.
Most SQL linters scan .sql files. The problem is, most SQL doesn't live in .sql files.
It lives in db.Query() calls. In Goqu builder chains. In SQLAlchemy ORM methods. In migration files mixed with application logic. By the time SQL reaches production, it's been assembled, concatenated, or synthesized by code that no .sql-only tool will ever see.
I built Valk Guard to solve that. It's a static analysis tool that walks your source code's AST, reconstructs the SQL your ORMs and query builders will generate, parses it through a real PostgreSQL grammar, and reports findings in CI-friendly formats. No database connection. No runtime. Just structure.
go install github.com/valkdb/valk-guard/cmd/valk-guard@latest
valk-guard scan .
19 rules enabled by default. Zero config. Takes seconds.
What it actually catches
Here's a Goqu chain in Go:
goqu.From("orders").Delete()
There's no raw SQL anywhere in that line. But Valk Guard walks the Go AST, recognizes the Goqu method chain, synthesizes DELETE FROM orders, feeds it through postgresparser, and fires VG003: DELETE without WHERE may affect all rows.
Same thing with SQLAlchemy:
session.query(User).delete()
No SQL string. Valk Guard's embedded Python AST extractor reconstructs it, and the same rule fires.
The full rule set covers three categories:
Query safety — UPDATE without WHERE (VG002), DELETE without WHERE (VG003), SELECT * (VG001), unbounded SELECT without LIMIT (VG004), leading wildcard LIKE '%...' (VG005), SELECT ... FOR UPDATE without WHERE (VG006).
Dangerous DDL — DROP TABLE / TRUNCATE in application code (VG007), CREATE INDEX without CONCURRENTLY (VG008).
Schema drift — ORM model references a column that migrations dropped (VG101). NOT NULL column missing from model (VG102). Type mismatch between model and DDL (VG103). Model table has no CREATE TABLE in migrations (VG104). Query SELECTs a column that doesn't exist in the schema (VG105). And several more cross-reference checks between your code and your migrations.
That last category is the one I haven't seen elsewhere. Valk Guard reads Go struct tags (db, gorm) and Python __tablename__ / Column(...) definitions, builds a schema snapshot from your migration DDL, and cross-references them. If your ORM model says email exists but your migration dropped it, that's VG101 at PR time — not a runtime panic in production.
Why AST, not AI
This was a deliberate choice, and it's worth explaining.
CI is not a brainstorming session. If a PR check comments on your code and changes its mind between runs, people stop trusting it. If it floods you with false positives, people add --skip-lint and move on. The tool is dead even if the idea was good.
I needed the opposite: same input, same output, every time. Testable. Explainable. Boring in the best way.
AI is useful for exploration and suggestions. But a blocking CI step needs determinism. Structured-output approaches improve an LLM's schema conformance, but they don't make a generative model behave like a static analyzer. The questions Valk Guard answers are structural: "does this statement have a WHERE clause?" "does this builder chain produce a bounded query?" "does this model match this schema?" Those are AST questions, not generation questions.
The same logic applies to regex. Regex is fine when the thing you're checking is a flat string. It falls apart when SQL is buried inside Go method chains or Python ORM calls. You can't regex your way through goqu.From("users").Where(goqu.C("id").Eq(42)).Select("name") and reliably reconstruct the query. You need to parse the source language's AST, understand the builder pattern, and synthesize the SQL. That's what Valk Guard does.
A small number of checks do use targeted regex after parsing — when a parser-extracted clause doesn't expose the exact field a rule needs. But that's regex as a surgical helper on already-parsed output, not regex as the foundation.
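As an illustration of that division of labor (a hypothetical helper, not Valk Guard's real rule code): once the parser has isolated a LIKE pattern literal, a VG005-style check reduces to one anchored regex over that small string, never over raw source files:

```go
package main

import (
	"fmt"
	"regexp"
)

// leadingWildcard matches a pattern literal whose first character
// is %, i.e. an index-defeating leading-wildcard LIKE. The input
// here is already parser-extracted, so the regex stays trivial.
var leadingWildcard = regexp.MustCompile(`^%`)

func hasLeadingWildcard(pattern string) bool {
	return leadingWildcard.MatchString(pattern)
}

func main() {
	fmt.Println(hasLeadingWildcard("%@example.com")) // true
	fmt.Println(hasLeadingWildcard("alice%"))        // false
}
```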
The pipeline
Source files go in. Findings come out. Here's what happens in between:
1. Extraction — Four scanners run concurrently. The raw SQL scanner handles .sql files with proper dollar-quoting and nested block comments. The Go scanner uses go/ast to extract SQL from db.Query, db.Exec, and db.QueryRow. The Goqu scanner walks builder chains and synthesizes SQL. The SQLAlchemy scanner invokes an embedded Python script (stdlib only — no pip dependencies) that parses ORM chains via Python's ast module.
2. Parsing — Every extracted statement goes through postgresparser, a pure-Go PostgreSQL parser I built on ANTLR. It produces a structured IR: tables, columns, joins, WHERE clauses, command type. No CGO, no database connection. Most queries parse in 70–350 µs.
3. Rule evaluation — Rules are dispatched by SQL command type for efficiency. Query rules (VG001–VG008) run against every parsed statement. Schema-drift rules (VG101+) cross-reference ORM model definitions against a migration-derived schema snapshot. Query-schema rules (VG105–VG108) validate that columns and tables referenced in queries actually exist.
4. Output — Findings are deduplicated, sorted by file and line, and formatted as terminal output, JSON, SARIF (for GitHub Code Scanning), or rdjsonl (for reviewdog PR comments).
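Step 3's dispatch-by-command-type idea can be sketched in a few lines. The types and rule logic below are illustrative stand-ins; Valk Guard's real IR and rule interfaces differ:

```go
package main

import "fmt"

// CmdType tags each parsed statement with its SQL command,
// so only the relevant rules run against it.
type CmdType int

const (
	CmdSelect CmdType = iota
	CmdUpdate
	CmdDelete
)

// Stmt is a toy stand-in for the parser's structured IR.
type Stmt struct {
	Cmd      CmdType
	HasWhere bool
}

// Rule pairs a rule ID with a predicate over the IR.
type Rule struct {
	ID    string
	Fires func(Stmt) bool
}

// rulesByCmd dispatches rules by command type: DELETE statements
// never see UPDATE rules, and vice versa.
var rulesByCmd = map[CmdType][]Rule{
	CmdUpdate: {{"VG002", func(s Stmt) bool { return !s.HasWhere }}},
	CmdDelete: {{"VG003", func(s Stmt) bool { return !s.HasWhere }}},
}

func evaluate(s Stmt) []string {
	var ids []string
	for _, r := range rulesByCmd[s.Cmd] {
		if r.Fires(s) {
			ids = append(ids, r.ID)
		}
	}
	return ids
}

func main() {
	fmt.Println(evaluate(Stmt{Cmd: CmdDelete}))                 // [VG003]
	fmt.Println(evaluate(Stmt{Cmd: CmdDelete, HasWhere: true})) // []
}
```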
The whole thing is ~8,100 lines of Go (plus ~700 lines of embedded Python), with a test-to-code ratio of nearly 1:1. Three runtime dependencies: cobra, postgresparser, and yaml. That's it.
CI integration
Valk Guard was designed for pull request workflows. Exit code 0 means clean, 1 means findings, 2 means config/parser error. Hook it into reviewdog and you get inline PR comments:
- name: Run valk-guard
  run: |
    set +e
    valk-guard scan . --format rdjsonl > valk-guard.rdjsonl
    code=$?
    set -e
    if [ "$code" -gt 1 ]; then exit "$code"; fi
- name: Post review comments
  env:
    REVIEWDOG_GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    reviewdog -f=rdjsonl -name="valk-guard" \
      -reporter=github-pr-review -filter-mode=added \
      < valk-guard.rdjsonl
Findings are non-blocking by default. Config errors fail the job. You can see real example PRs with live review comments in the valk-guard-example repo.
Where it fits
Valk Guard is not a runtime firewall, not a database advisor, and not a replacement for EXPLAIN ANALYZE. It's a guardrail for the most common and most expensive SQL mistakes — the ones that happen when someone pushes a DELETE FROM orders without a WHERE at 4pm on a Friday.
It's PostgreSQL-only. It doesn't auto-fix. It doesn't need a running database. It reads your source code, understands your ORMs, and tells you what's going to break — before it merges.
Less magic. More signal. Same answer every run.