<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Eitamos Ring</title>
    <description>The latest articles on DEV Community by Eitamos Ring (@eitamos_ring_0508146ca448).</description>
    <link>https://dev.to/eitamos_ring_0508146ca448</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3395918%2F5ae6659e-ea1d-4142-b982-ec79776600b3.png</url>
      <title>DEV Community: Eitamos Ring</title>
      <link>https://dev.to/eitamos_ring_0508146ca448</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/eitamos_ring_0508146ca448"/>
    <language>en</language>
    <item>
      <title>A Protobuf for Database Schemas</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Wed, 18 Mar 2026 07:52:54 +0000</pubDate>
      <link>https://dev.to/eitamos_ring_0508146ca448/a-protobuf-for-database-schemas-pc8</link>
      <guid>https://dev.to/eitamos_ring_0508146ca448/a-protobuf-for-database-schemas-pc8</guid>
      <description>&lt;p&gt;Every serious system has an interface definition for its wire format. gRPC has protobuf. REST has OpenAPI. GraphQL has its SDL. But databases -- the thing everything else is built on top of -- have nothing.&lt;/p&gt;

&lt;p&gt;Your database schema is one of the most important artifacts in your system. It defines every table, column, type, constraint, relationship, and index. It encodes years of domain decisions. And yet there is no standard, portable, machine-readable format for it.&lt;/p&gt;

&lt;p&gt;We built one. We call it &lt;code&gt;ctxexport.json&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem is older than LLMs
&lt;/h2&gt;

&lt;p&gt;Before you assume this is an AI-context story, consider how many times you have needed your schema outside the database itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Onboarding a new engineer who needs to understand the data model.&lt;/li&gt;
&lt;li&gt;Diffing staging against production to catch drift before a deploy.&lt;/li&gt;
&lt;li&gt;Running a linter in CI to enforce naming conventions or catch missing indexes.&lt;/li&gt;
&lt;li&gt;Generating documentation that is not immediately stale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every time, you end up writing a bespoke script that queries &lt;code&gt;information_schema&lt;/code&gt; or &lt;code&gt;pg_catalog&lt;/code&gt;, parses the output, and feeds it into whatever tool you need. The script is Postgres-specific. It breaks when you add a second schema. Nobody maintains it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pg_dump --schema-only&lt;/code&gt; exists, but it is a restore format, not a consumption format. It is Postgres-specific SQL with &lt;code&gt;SET&lt;/code&gt; statements, ownership clauses, and an ordering designed for replay, not reading. Try parsing it reliably. Try feeding it to a linter. Try diffing two of them without drowning in noise.&lt;/p&gt;

&lt;p&gt;MongoDB is worse. There is no &lt;code&gt;mongodump --schema-only&lt;/code&gt;. Your schema lives in the shape of whatever documents happen to exist. Good luck extracting that into something a tool can reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extract once, use many ways
&lt;/h2&gt;

&lt;p&gt;The core insight behind &lt;code&gt;ctxexport.json&lt;/code&gt; is the same one behind protobuf: &lt;strong&gt;separate the definition from the consumption&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A protobuf &lt;code&gt;.proto&lt;/code&gt; file is written once and compiled to Go structs, Python classes, TypeScript types, gRPC stubs, or REST gateways. The definition is the single source of truth. The consumers are many and varied.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ctxexport.json&lt;/code&gt; works the same way. You extract your schema once -- from Postgres, MongoDB, or whatever backend -- and produce a single canonical JSON file. That file contains entities (tables, views, collections), fields (columns with types, nullability, defaults), edges (foreign keys and inferred references), and access paths (indexes). Everything a tool needs to understand your data model, nothing it does not.&lt;/p&gt;

&lt;p&gt;From that single artifact, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compile to a lighthouse map&lt;/strong&gt; -- a compact table-and-relationship summary that fits in an LLM prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compile to full SQL DDL&lt;/strong&gt; -- standard &lt;code&gt;CREATE TABLE&lt;/code&gt; statements for any subset of tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serve over MCP&lt;/strong&gt; -- give an AI agent schema awareness without database credentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff across environments&lt;/strong&gt; -- compare staging and production schemas as structured data, not text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lint offline&lt;/strong&gt; -- check naming conventions, missing indexes, or orphaned foreign keys in CI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate in CI&lt;/strong&gt; -- catch schema regressions before they reach production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commit to git&lt;/strong&gt; -- your schema becomes a versioned artifact with a real history.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these consumers need to know whether the source was Postgres or MongoDB. None of them need a live database connection. The extraction happened once, upstream, and everything downstream reads the same contract.&lt;/p&gt;
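&lt;p&gt;To make the "lint offline" consumer concrete, here is a minimal Python sketch. It assumes the entity and field key names shown in the minimal entity example later in this post (&lt;code&gt;entities&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;is_pk&lt;/code&gt;); it is illustrative, not part of the dbdense codebase:&lt;/p&gt;

```python
import json
import re

def lint(snapshot_path):
    # Load the extracted snapshot and apply two simple CI checks:
    # snake_case table names, and a primary key on every table.
    with open(snapshot_path) as f:
        snap = json.load(f)
    problems = []
    for entity in snap.get("entities", []):
        name = entity["name"]
        if not re.fullmatch(r"[a-z][a-z0-9_]*", name):
            problems.append(name + ": table name is not snake_case")
        has_pk = any(fld.get("is_pk") for fld in entity.get("fields", []))
        if entity.get("type") == "table" and not has_pk:
            problems.append(name + ": no primary key")
    return problems
```

&lt;p&gt;The point is the shape of the code: no database connection, no SQL parser, just a JSON file and a for loop.&lt;/p&gt;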

&lt;h2&gt;
  
  
  The sidecar pattern
&lt;/h2&gt;

&lt;p&gt;Databases have never been good at carrying human knowledge alongside the schema. Your &lt;code&gt;users.deleted_at&lt;/code&gt; column is a soft-delete flag, but the database only knows it is a &lt;code&gt;timestamp with time zone&lt;/code&gt;. Your &lt;code&gt;orders.payload&lt;/code&gt; column is JSONB with a specific structure, but the database sees an opaque blob.&lt;/p&gt;

&lt;p&gt;A sidecar file (&lt;code&gt;dbdense.yaml&lt;/code&gt;) layers descriptions and value annotations onto the extracted schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;entities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;payments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorized"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paid"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refunded"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;deleted_at&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Soft&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;delete&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;timestamp.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;NULL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;active."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This merges at export time. The compiled DDL gets inline comments like &lt;code&gt;-- Values: pending, authorized, paid, failed, refunded&lt;/code&gt;. Every downstream consumer -- linter, LLM, documentation generator -- picks it up automatically. Write it once in a YAML file committed to the repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why JSON, not SQL
&lt;/h2&gt;

&lt;p&gt;SQL DDL is human-readable but machine-hostile. Parsing &lt;code&gt;CREATE TABLE&lt;/code&gt; statements reliably across dialects is a nightmare. Defaults are quoted differently. Constraints can be inline or out-of-band. Comments use different syntax. There is no standard way to represent a foreign key relationship as structured data.&lt;/p&gt;

&lt;p&gt;JSON is boring and that is the point. It is a declarative state representation -- you look up a table by name, not by parsing DDL statement order. Every language has a JSON parser. The schema is simple: a version string, an array of entities, and an array of edges. You can validate it with a JSON Schema. You can diff it with &lt;code&gt;jq&lt;/code&gt;. You can read it in any language without a SQL parser.&lt;/p&gt;

&lt;p&gt;A minimal entity looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"payments"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"table"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uuid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"is_pk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"not_null"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"values"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"pending"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"paid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"failed"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Flat, predictable, zero ambiguity.&lt;/p&gt;
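&lt;p&gt;And because the shape is flat and name-keyed, the "diff across environments" consumer reduces to dictionary comparison. A hedged sketch, assuming the entity shape above:&lt;/p&gt;

```python
def diff_fields(entity_a, entity_b):
    # Field-level diff of the same entity from two snapshots
    # (e.g. staging vs production), using name-keyed lookups.
    a = {fld["name"]: fld for fld in entity_a["fields"]}
    b = {fld["name"]: fld for fld in entity_b["fields"]}
    shared = set(a).intersection(b)
    return {
        "added": sorted(set(b).difference(a)),
        "removed": sorted(set(a).difference(b)),
        "changed": sorted(n for n in shared if a[n] != b[n]),
    }
```

&lt;p&gt;Try writing that against two &lt;code&gt;pg_dump&lt;/code&gt; outputs.&lt;/p&gt;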

&lt;h2&gt;
  
  
  Stop treating your schema like a black box
&lt;/h2&gt;

&lt;p&gt;The immediate use case is LLM context -- giving AI agents schema awareness without live database access. But the format is deliberately general. If your tool can read JSON, it can read a database schema. That was not true before.&lt;/p&gt;

&lt;p&gt;The project is at &lt;a href="https://github.com/valkdb/dbdense" rel="noopener noreferrer"&gt;github.com/valkdb/dbdense&lt;/a&gt;. The contract is documented in &lt;code&gt;docs/ctxexport-contract.md&lt;/code&gt;. It supports Postgres and MongoDB today. The extractor interface is small enough that adding a new backend is a single file.&lt;/p&gt;

&lt;p&gt;Your database schema is too important to be locked inside the database. Export it. Version it. Build on it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Stop Sending 93K Tokens of Schema to Your LLM Agent!</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Wed, 18 Mar 2026 07:52:06 +0000</pubDate>
      <link>https://dev.to/eitamos_ring_0508146ca448/stop-sending-93k-tokens-of-schema-to-your-llm-agent-5c67</link>
      <guid>https://dev.to/eitamos_ring_0508146ca448/stop-sending-93k-tokens-of-schema-to-your-llm-agent-5c67</guid>
      <description>&lt;p&gt;I've watched agents query &lt;code&gt;information_schema&lt;/code&gt; over and over, spending 4-6 turns just to figure out which tables exist, what columns they have, and how they join. On a 500-table database, the full DDL is around 93,000 tokens. Most questions touch 3-5 tables. On a complex multi-table join, I measured a 64% token reduction by just giving the agent the schema upfront.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/valkdb/dbdense" rel="noopener noreferrer"&gt;dbdense&lt;/a&gt; to fix this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;dbdense is a three-step offline pipeline: &lt;strong&gt;extract&lt;/strong&gt;, &lt;strong&gt;compile&lt;/strong&gt;, &lt;strong&gt;serve&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extract&lt;/strong&gt; connects to your database once and snapshots the schema into a portable JSON file (&lt;code&gt;ctxexport.json&lt;/code&gt;). Tables, columns, types, primary keys, foreign keys, indexes -- everything an LLM needs to write correct queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Compile&lt;/strong&gt; turns that snapshot into two artifacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;lighthouse&lt;/strong&gt; -- a compact table map (~4K tokens for 500 tables). It looks like this:
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; T:users|J:orders,sessions
 T:orders|E:payload,shipping|J:payments,shipments,users
 T:payments|J:orders
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Every table, its FK neighbors, and embedded docs. 23x smaller than full DDL. This stays in the agent's context so it always knows what's available.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full DDL&lt;/strong&gt; -- standard &lt;code&gt;CREATE TABLE&lt;/code&gt; statements with constraints, rendered on demand only for the specific tables the agent asks about.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Serve&lt;/strong&gt; (optional) exposes the lighthouse as an MCP resource and the DDL via an MCP &lt;code&gt;slice&lt;/code&gt; tool. The agent reads the map, picks the tables it needs, and gets back just those definitions.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After the extract, everything runs locally. The compiled artifacts are plain text you can commit to your repo. No database connection needed at runtime.&lt;/p&gt;
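&lt;p&gt;The lighthouse format is simple enough that a consumer can parse it in a few lines. A Python sketch; the segment letters (&lt;code&gt;T&lt;/code&gt; = table, &lt;code&gt;E&lt;/code&gt; = embedded docs, &lt;code&gt;J&lt;/code&gt; = join neighbors) are read off the example above, so check the dbdense docs for the authoritative grammar:&lt;/p&gt;

```python
def parse_lighthouse(text):
    # Parse "T:name|E:a,b|J:x,y" lines into a dict keyed by table name.
    tables = {}
    for line in text.strip().splitlines():
        entry = {"embeds": [], "joins": []}
        name = None
        for seg in line.strip().split("|"):
            tag, _, rest = seg.partition(":")
            if tag == "T":
                name = rest
            elif tag == "E":
                entry["embeds"] = rest.split(",")
            elif tag == "J":
                entry["joins"] = rest.split(",")
        if name:
            tables[name] = entry
    return tables
```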

&lt;h2&gt;
  
  
  No credentials in the agent runtime
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;export&lt;/code&gt; step is the only step that touches the database. After that, &lt;code&gt;compile&lt;/code&gt; and &lt;code&gt;serve&lt;/code&gt; work from the local snapshot. Your production database credentials never need to be in the agent's environment. The tool works offline and air-gapped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;I ran an agentic benchmark: n=3, same 5 questions, same seeded Postgres database (20K+ rows, 8 tables), same model (Claude Sonnet 4). One arm had only a Postgres MCP tool. The other had the same tool plus dbdense schema context injected into the prompt.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Without schema context&lt;/th&gt;
&lt;th&gt;With dbdense&lt;/th&gt;
&lt;th&gt;Delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Correct answers&lt;/td&gt;
&lt;td&gt;13/15&lt;/td&gt;
&lt;td&gt;13/15&lt;/td&gt;
&lt;td&gt;equal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg turns&lt;/td&gt;
&lt;td&gt;4.1&lt;/td&gt;
&lt;td&gt;2.2&lt;/td&gt;
&lt;td&gt;-46%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per run&lt;/td&gt;
&lt;td&gt;285,922&lt;/td&gt;
&lt;td&gt;187,603&lt;/td&gt;
&lt;td&gt;-34%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Same accuracy. 34% fewer tokens. 46% fewer turns.&lt;/p&gt;

&lt;p&gt;The savings scale with query complexity. On simple single-table filters, both arms performed about the same. On a complex multi-table join, the baseline agent spent 6+ turns querying &lt;code&gt;information_schema&lt;/code&gt; to discover the schema. dbdense answered in 2 turns, using 64% fewer tokens for that query.&lt;/p&gt;

&lt;p&gt;The two wrong answers (both on the same question, in both arms) returned identical incorrect results, pointing to question ambiguity rather than a schema context issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sidecar enrichment
&lt;/h2&gt;

&lt;p&gt;Databases lie by omission. A column named &lt;code&gt;status&lt;/code&gt; with type &lt;code&gt;text&lt;/code&gt; tells the LLM nothing about what values are valid. The agent either guesses or wastes a &lt;code&gt;SELECT DISTINCT&lt;/code&gt; turn to find out.&lt;/p&gt;

&lt;p&gt;dbdense supports a &lt;code&gt;dbdense.yaml&lt;/code&gt; sidecar file where you annotate columns with descriptions and enum values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;entities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;payments&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorized"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;paid"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refunded"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;orders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Order&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;lifecycle&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;status."&lt;/span&gt;
        &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confirmed"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delivered"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancelled"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These annotations merge into the compiled DDL as inline SQL comments. The LLM sees &lt;code&gt;-- Values: pending, authorized, paid, failed, refunded&lt;/code&gt; right next to the column definition. No extra queries needed.&lt;/p&gt;

&lt;p&gt;This also works for documenting JSONB structures, MongoDB embedded documents, or anything else the raw schema doesn't capture.&lt;/p&gt;
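&lt;p&gt;Rendering the annotation into the DDL is the easy half. A sketch of what that step might look like; the exact output formatting here is illustrative, not dbdense's actual renderer:&lt;/p&gt;

```python
def render_column(field):
    # One DDL column line with the annotation rendered as an inline
    # SQL comment, roughly like the compiled output described above.
    line = "    {} {}".format(field["name"], field["type"])
    if field.get("not_null"):
        line += " NOT NULL"
    if field.get("values"):
        line += "  -- Values: " + ", ".join(field["values"])
    elif field.get("description"):
        line += "  -- " + field["description"]
    return line
```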

&lt;h2&gt;
  
  
  What it doesn't do
&lt;/h2&gt;

&lt;p&gt;The snapshot is static. If your schema changes, re-run &lt;code&gt;export&lt;/code&gt;. This is intentional -- schemas are stable; questions change.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;slice&lt;/code&gt; tool still depends on the LLM picking the right tables from the lighthouse. dbdense reduces the context problem; it doesn't solve table selection for the model.&lt;/p&gt;

&lt;p&gt;It's not a &lt;code&gt;pg_dump --schema-only&lt;/code&gt; replacement. The renderer covers columns, PKs, FKs, NOT NULL, defaults, unique constraints, and indexes, but skips triggers, RLS policies, and custom types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;install &lt;/span&gt;github.com/valkdb/dbdense/cmd/dbdense@latest
dbdense &lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="nt"&gt;--driver&lt;/span&gt; postgres &lt;span class="nt"&gt;--db&lt;/span&gt; &lt;span class="s2"&gt;"postgres://user:pass@localhost:5432/mydb"&lt;/span&gt; &lt;span class="nt"&gt;--schemas&lt;/span&gt; public
dbdense compile &lt;span class="nt"&gt;--mode&lt;/span&gt; lighthouse &lt;span class="nt"&gt;--in&lt;/span&gt; ctxexport.json &lt;span class="nt"&gt;--out&lt;/span&gt; lighthouse.txt
dbdense compile &lt;span class="nt"&gt;--in&lt;/span&gt; ctxexport.json &lt;span class="nt"&gt;--out&lt;/span&gt; schema.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You now have two files: a lighthouse map and full DDL. Point your agent at them. If you use Claude Code, &lt;code&gt;dbdense init-claude&lt;/code&gt; writes the MCP config for you.&lt;/p&gt;

&lt;p&gt;The project is open source at &lt;a href="https://github.com/valkdb/dbdense" rel="noopener noreferrer"&gt;github.com/valkdb/dbdense&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>From Flat Columns to Full Relationships: How We Taught Our SQL Parser to Understand Table Constraints</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Thu, 12 Mar 2026 14:30:24 +0000</pubDate>
      <link>https://dev.to/eitamos_ring_0508146ca448/from-flat-columns-to-full-relationships-how-we-taught-our-sql-parser-to-understand-table-1cd0</link>
      <guid>https://dev.to/eitamos_ring_0508146ca448/from-flat-columns-to-full-relationships-how-we-taught-our-sql-parser-to-understand-table-1cd0</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — We went from parsing &lt;code&gt;CREATE TABLE&lt;/code&gt; as a bag of columns to extracting full relational metadata: primary keys, foreign keys, unique constraints, CHECK expressions, and referential actions. Here's how we did it in Go with ANTLR, and why it matters for anyone building developer tools on top of SQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;You've seen a hundred blog posts about &lt;em&gt;writing&lt;/em&gt; SQL. &lt;/p&gt;

&lt;p&gt;This one is about &lt;strong&gt;reading&lt;/strong&gt; it programmatically.&lt;/p&gt;

&lt;p&gt;We maintain &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;valk-postgres-parser&lt;/a&gt;, an open-source Go library that parses PostgreSQL into a structured IR (intermediate representation). It powers linters, migration validators, and query analyzers that need to understand what SQL &lt;em&gt;means&lt;/em&gt;, not just what it &lt;em&gt;says&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;For months, our &lt;code&gt;CREATE TABLE&lt;/code&gt; extraction looked like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;SQL in → columns out&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. Column names, types, nullability. No relationships. No constraints. If you wanted to know which column was the primary key or what foreign key pointed where — you were on your own.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;    &lt;span class="nb"&gt;serial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;   &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;  &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;CASCADE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="nb"&gt;numeric&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;positive_total&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; We'd hand you &lt;code&gt;[id, email, user_id, total]&lt;/code&gt; with types. That's barely more useful than a regex.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; You get the full picture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Constraints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PrimaryKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ForeignKeys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ReferencesTable"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ReferencesColumns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"OnDelete"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CASCADE"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"UniqueKeys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Columns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"CheckConstraints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ConstraintName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"positive_total"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"(total &amp;gt; 0)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the difference between a &lt;em&gt;tokenizer&lt;/em&gt; and a &lt;em&gt;parser that understands your schema&lt;/em&gt;.&lt;/p&gt;
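&lt;p&gt;Structured output like this is what makes downstream tooling trivial. For example, a migration linter that flags cascading deletes is a few lines against that JSON (key names taken from the example output above; a consumer sketch, not part of the parser itself):&lt;/p&gt;

```python
def cascade_deletes(table_meta):
    # Scan the parser's constraint output for foreign keys whose
    # referential action would cascade row deletes.
    fks = table_meta.get("Constraints", {}).get("ForeignKeys", [])
    return [(fk["Columns"], fk["ReferencesTable"])
            for fk in fks if fk.get("OnDelete") == "CASCADE"]
```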

&lt;h2&gt;
  
  
  The Architecture: Two Layers, One Truth
&lt;/h2&gt;

&lt;p&gt;Here's how data flows through the parser. Constraints live at both layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    ┌──────────────────────┐
                    │      Raw SQL         │
                    └──────────┬───────────┘
                               │
                    ┌──────────▼───────────┐
                    │   ANTLR 4 Grammar    │
                    │  (PostgreSQL Lexer +  │
                    │       Parser)         │
                    └──────────┬───────────┘
                               │  parse tree
                    ┌──────────▼───────────┐
                    │   Core IR Layer       │
                    │                       │
                    │  DDLAction            │
                    │   ├─ ColumnDetails[]  │
                    │   └─ Constraints      │
                    │       ├─ PrimaryKey   │
                    │       ├─ ForeignKeys  │
                    │       ├─ UniqueKeys   │
                    │       └─ CheckConst.  │
                    └──────────┬───────────┘
                               │  convert
                    ┌──────────▼───────────┐
                    │  Analysis Layer       │
                    │                       │
                    │  SQLDDLAction         │
                    │   ├─ ColumnDetails[]  │
                    │   └─ Constraints      │
                    │       ├─ PrimaryKey   │
                    │       ├─ ForeignKeys  │
                    │       ├─ UniqueKeys   │
                    │       └─ CheckConst.  │
                    └──────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Core IR&lt;/strong&gt; = 1:1 with the parse tree. It's the &lt;em&gt;what&lt;/em&gt;.&lt;br&gt;
&lt;strong&gt;Analysis layer&lt;/strong&gt; = the consumer-friendly view. It's the &lt;em&gt;so what&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Both layers carry the same constraint shape, but the analysis layer normalizes and simplifies for downstream consumers. Pick your depth: low-level IR for tool authors, analysis structs for application developers.&lt;/p&gt;
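&lt;p&gt;To make the conversion concrete, here's a minimal sketch of the kind of normalization the analysis layer performs - defaulting the schema, making referential actions explicit, folding identifier case. The helper and field names beyond those shown above are illustrative, not postgresparser's actual API:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// Core IR foreign key, 1:1 with the parse tree (fields follow the article).
type DDLForeignKey struct {
	ConstraintName   string
	Columns          []string
	ReferencesSchema string
	ReferencesTable  string
	OnDelete         string // empty when the SQL text omitted the clause
}

// ForeignKey is a hypothetical analysis-layer view for illustration.
type ForeignKey struct {
	Name     string
	Columns  []string
	RefTable string // always schema-qualified
	OnDelete string // always explicit
}

// normalizeFK sketches the IR-to-analysis conversion: default the schema to
// "public", default the referential action to NO ACTION (PostgreSQL's own
// default), and lowercase identifiers.
func normalizeFK(fk DDLForeignKey) ForeignKey {
	schema := fk.ReferencesSchema
	if schema == "" {
		schema = "public"
	}
	action := fk.OnDelete
	if action == "" {
		action = "NO ACTION"
	}
	return ForeignKey{
		Name:     strings.ToLower(fk.ConstraintName),
		Columns:  fk.Columns,
		RefTable: schema + "." + strings.ToLower(fk.ReferencesTable),
		OnDelete: action,
	}
}

func main() {
	ir := DDLForeignKey{ConstraintName: "FK_User", Columns: []string{"user_id"}, ReferencesTable: "Users"}
	fmt.Println(normalizeFK(ir).RefTable) // public.users
	fmt.Println(normalizeFK(ir).OnDelete) // NO ACTION
}
```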
&lt;h2&gt;
  
  
  How It Works: Constraint Extraction in 3 Steps
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Step 1: Walk the ANTLR Tree
&lt;/h3&gt;

&lt;p&gt;PostgreSQL's grammar defines constraints in two places:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;serial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;-- ← INLINE constraint&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;   &lt;span class="c1"&gt;-- ← INLINE constraint&lt;/span&gt;
    &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;uq_email&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- ← TABLE-LEVEL constraint&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inline constraints are attached to a column definition node. Table-level constraints are siblings of the column list. We walk both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Build the IR Structs
&lt;/h3&gt;

&lt;p&gt;Each constraint type maps to a dedicated Go struct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;DDLPrimaryKey&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ConstraintName&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Columns&lt;/span&gt;        &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;DDLForeignKey&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ConstraintName&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Columns&lt;/span&gt;           &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;ReferencesSchema&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;ReferencesTable&lt;/span&gt;   &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;ReferencesColumns&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;OnDelete&lt;/span&gt;          &lt;span class="n"&gt;FKAction&lt;/span&gt;   &lt;span class="c"&gt;// CASCADE, SET NULL, RESTRICT, ...&lt;/span&gt;
    &lt;span class="n"&gt;OnUpdate&lt;/span&gt;          &lt;span class="n"&gt;FKAction&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;DDLCheckConstraint&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ConstraintName&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Expression&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;       &lt;span class="c"&gt;// raw expression text: "(price &amp;gt; 0)"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These get bundled into &lt;code&gt;DDLConstraints&lt;/code&gt; and attached to the &lt;code&gt;DDLAction&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;DDLConstraints&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;PrimaryKey&lt;/span&gt;       &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;DDLPrimaryKey&lt;/span&gt;
    &lt;span class="n"&gt;ForeignKeys&lt;/span&gt;      &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;DDLForeignKey&lt;/span&gt;
    &lt;span class="n"&gt;UniqueKeys&lt;/span&gt;       &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;DDLUniqueConstraint&lt;/span&gt;
    &lt;span class="n"&gt;CheckConstraints&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;DDLCheckConstraint&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Merge Inline + Table-Level
&lt;/h3&gt;

&lt;p&gt;This is the tricky part. A single table can define its PK inline on one column &lt;strong&gt;and&lt;/strong&gt; also have a composite unique key at the table level. We use an internal &lt;code&gt;tableConstraints&lt;/code&gt; struct with a &lt;code&gt;merge()&lt;/code&gt; method to combine both sources without duplication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    inline constraints ──┐
                         ├──▶ merge() ──▶ DDLConstraints
  table-level constraints┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
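&lt;p&gt;The real &lt;code&gt;tableConstraints&lt;/code&gt; struct and &lt;code&gt;merge()&lt;/code&gt; are internal, but the dedup idea fits in a few lines. This sketch (covering just PK and unique keys; foreign keys and checks merge the same way) prefers the named table-level unique key when an inline one covers the same columns:&lt;/p&gt;

```go
package main

import "fmt"

type DDLPrimaryKey struct {
	ConstraintName string
	Columns        []string
}

type DDLUniqueConstraint struct {
	ConstraintName string
	Columns        []string
}

type DDLConstraints struct {
	PrimaryKey *DDLPrimaryKey
	UniqueKeys []DDLUniqueConstraint
}

// tableConstraints mirrors the internal helper described in the article:
// constraints collected from column definitions (inline) and from the table
// body (table-level), combined without duplication.
type tableConstraints struct {
	inline     DDLConstraints
	tableLevel DDLConstraints
}

// key builds a dedup key from a constraint's column list.
func key(cols []string) string {
	s := ""
	for _, c := range cols {
		s += c + ","
	}
	return s
}

// merge takes the table-level PK if one exists (a table can only have one),
// and dedupes unique keys by column list, preferring the table-level
// (usually named) version.
func (tc tableConstraints) merge() DDLConstraints {
	out := DDLConstraints{}
	out.PrimaryKey = tc.tableLevel.PrimaryKey
	if out.PrimaryKey == nil {
		out.PrimaryKey = tc.inline.PrimaryKey
	}
	seen := map[string]bool{}
	for _, u := range append(tc.tableLevel.UniqueKeys, tc.inline.UniqueKeys...) {
		if seen[key(u.Columns)] {
			continue
		}
		seen[key(u.Columns)] = true
		out.UniqueKeys = append(out.UniqueKeys, u)
	}
	return out
}

func main() {
	pk := new(DDLPrimaryKey)
	pk.Columns = []string{"id"}
	tc := tableConstraints{
		inline: DDLConstraints{
			PrimaryKey: pk,
			UniqueKeys: []DDLUniqueConstraint{{Columns: []string{"email"}}},
		},
		tableLevel: DDLConstraints{
			UniqueKeys: []DDLUniqueConstraint{
				{ConstraintName: "uq_email", Columns: []string{"email"}},
				{Columns: []string{"org_id", "email"}},
			},
		},
	}
	merged := tc.merge()
	fmt.Println(merged.PrimaryKey.Columns) // [id]
	fmt.Println(len(merged.UniqueKeys))    // 2
}
```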






&lt;h2&gt;
  
  
  ALTER TABLE: The Forgotten Half
&lt;/h2&gt;

&lt;p&gt;Most SQL parsers stop at &lt;code&gt;CREATE TABLE&lt;/code&gt;. But in the real world, constraints are added &lt;em&gt;after&lt;/em&gt; table creation just as often:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;
  &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;positive_price&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
  &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;fk_user&lt;/span&gt; &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before our changes, &lt;code&gt;ALTER TABLE ... ADD CONSTRAINT&lt;/code&gt; was &lt;strong&gt;silently dropped&lt;/strong&gt; — the extraction guard only checked for column adds/drops. Now it emits a proper &lt;code&gt;DDLAction&lt;/code&gt; with the &lt;code&gt;ADD_CONSTRAINT&lt;/code&gt; flag and fully populated &lt;code&gt;Constraints&lt;/code&gt;.&lt;/p&gt;
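&lt;p&gt;Roughly, the first &lt;code&gt;ALTER TABLE&lt;/code&gt; above now yields an action shaped like this. The &lt;code&gt;DDLCheckConstraint&lt;/code&gt; struct matches the one shown earlier; the &lt;code&gt;ActionType&lt;/code&gt;, &lt;code&gt;Schema&lt;/code&gt;, and &lt;code&gt;Table&lt;/code&gt; field names are illustrative stand-ins for the real &lt;code&gt;DDLAction&lt;/code&gt; fields:&lt;/p&gt;

```go
package main

import "fmt"

type DDLCheckConstraint struct {
	ConstraintName string
	Expression     string
}

type DDLConstraints struct {
	CheckConstraints []DDLCheckConstraint
}

// DDLAction sketch: field names other than Constraints are illustrative.
type DDLAction struct {
	ActionType  string // e.g. "ADD_CONSTRAINT"
	Schema      string
	Table       string
	Constraints DDLConstraints
}

// actionForAlter shows the shape emitted for:
//   ALTER TABLE public.products ADD CONSTRAINT positive_price CHECK (price > 0);
func actionForAlter() DDLAction {
	return DDLAction{
		ActionType: "ADD_CONSTRAINT",
		Schema:     "public",
		Table:      "products",
		Constraints: DDLConstraints{
			CheckConstraints: []DDLCheckConstraint{
				{ConstraintName: "positive_price", Expression: "(price > 0)"},
			},
		},
	}
}

func main() {
	a := actionForAlter()
	fmt.Println(a.ActionType, a.Table, a.Constraints.CheckConstraints[0].Expression)
}
```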

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;Why does any of this matter? Here's what you can build with structured constraint metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌─────────────────┐     ┌──────────────────────┐
  │ Migration Safety│     │  Schema Docs Gen     │
  │                 │     │                      │
  │ "This migration │     │ Auto-generate ER     │
  │  drops a FK that│     │ diagrams from SQL    │
  │  3 services     │     │ migration files      │
  │  depend on"     │     │                      │
  └────────┬────────┘     └──────────┬───────────┘
           │                         │
           │    ┌────────────────┐   │
           └────┤  postgresparser├───┘
                │  Constraints   │
           ┌────┤  IR            ├───┐
           │    └────────────────┘   │
  ┌────────▼─────────┐     ┌─────────▼─────────────┐
  │ Query Linting    │     │  Access Control       │
  │                  │     │                       │
  │ "WARNING: INSERT │     │ "Column user_id has   │
  │  missing required│     │  FK to users — ensure │
  │  FK column"      │     │  caller has read on   │
  │                  │     │  both tables"         │
  └──────────────────┘     └───────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Migration safety&lt;/strong&gt; — Diff two versions of your schema SQL. If a FK is removed that other services reference, flag it before it hits production.&lt;/p&gt;
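&lt;p&gt;A migration-safety check reduces to a set diff over constraint metadata. A hedged sketch - the map shape here is a simplified stand-in for the parsed IR:&lt;/p&gt;

```go
package main

import "fmt"

// droppedFKs diffs two schema versions (constraint name to referenced table)
// and reports foreign keys present in the old schema but missing in the new
// one - the dangerous direction for a migration.
func droppedFKs(prev, next map[string]string) []string {
	var gone []string
	for name := range prev {
		if _, ok := next[name]; !ok {
			gone = append(gone, name)
		}
	}
	return gone
}

func main() {
	v1 := map[string]string{"fk_user": "users", "fk_product": "products"}
	v2 := map[string]string{"fk_product": "products"}
	for _, name := range droppedFKs(v1, v2) {
		fmt.Printf("WARNING: migration drops FK %q (→ %s)\n", name, v1[name])
	}
}
```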

&lt;p&gt;&lt;strong&gt;Schema documentation&lt;/strong&gt; — Generate ER diagrams directly from &lt;code&gt;.sql&lt;/code&gt; files. No database connection needed. Works in CI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query linting&lt;/strong&gt; — Cross-reference INSERT/UPDATE column lists against known constraints. Catch missing NOT NULL or FK violations statically.&lt;/p&gt;
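&lt;p&gt;The linting check is equally mechanical once constraints are structured data: compare the statement's column list against the columns the schema says are required. A toy version, with the required set hardcoded where a real linter would derive it from the parsed NOT NULL and default metadata:&lt;/p&gt;

```go
package main

import "fmt"

// missingRequired returns required columns absent from an INSERT's column
// list. "required" would come from constraint metadata: NOT NULL, no default.
func missingRequired(insertCols []string, required []string) []string {
	have := map[string]bool{}
	for _, c := range insertCols {
		have[c] = true
	}
	var missing []string
	for _, c := range required {
		if !have[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	// Suppose orders.user_id and orders.total are NOT NULL without defaults.
	required := []string{"user_id", "total"}
	stmt := []string{"total", "status"} // INSERT INTO orders (total, status) ...
	for _, c := range missingRequired(stmt, required) {
		fmt.Println("WARNING: INSERT missing required column", c)
	}
}
```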

&lt;p&gt;&lt;strong&gt;Access control analysis&lt;/strong&gt; — FK relationships imply data access paths. Map them automatically for security reviews.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Stars&lt;/td&gt;
&lt;td&gt;200+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Constraint types supported&lt;/td&gt;
&lt;td&gt;PK, FK, UNIQUE, CHECK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FK referential actions&lt;/td&gt;
&lt;td&gt;CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Works on&lt;/td&gt;
&lt;td&gt;CREATE TABLE (inline + table-level) + ALTER TABLE ADD CONSTRAINT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grammar changes needed&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Zero. Pure tree-walking.&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That last point is worth emphasizing: &lt;strong&gt;we didn't touch the ANTLR grammar at all&lt;/strong&gt;. The PostgreSQL grammar already parses constraints — we just weren't &lt;em&gt;reading&lt;/em&gt; them. The entire feature was built by walking deeper into the existing parse tree.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="s"&gt;"github.com/ValkDB/postgresparser"&lt;/span&gt;

&lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;`CREATE TABLE orders (
    id serial PRIMARY KEY,
    user_id int REFERENCES users(id) ON DELETE CASCADE,
    total numeric CHECK (total &amp;gt; 0)
);`&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;constraints&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DDLActions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Constraints&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PrimaryKey&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c"&gt;// [id]&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ForeignKeys&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OnDelete&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c"&gt;// CASCADE&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CheckConstraints&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Expression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// (total &amp;gt; 0)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go get github.com/ValkDB/postgresparser@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We're not done. The constraint work lays the groundwork for schema diffing, drift detection, and richer linting on top of the same IR.&lt;br&gt;
Follow along, and feel free to request features.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built and maintained by the &lt;a href="https://github.com/ValkDB" rel="noopener noreferrer"&gt;ValkDB&lt;/a&gt; team. Star the repo if this saved you from writing another regex.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>postgres</category>
      <category>parsing</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How does a linter know your column doesn't exist</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Mon, 09 Mar 2026 08:40:04 +0000</pubDate>
      <link>https://dev.to/eitamos_ring_0508146ca448/how-does-a-linter-know-your-column-doesnt-exist-7ff</link>
      <guid>https://dev.to/eitamos_ring_0508146ca448/how-does-a-linter-know-your-column-doesnt-exist-7ff</guid>
      <description>&lt;p&gt;You write a query that SELECTs ghost_status from the orders table. Your code compiles. Your tests pass. But ghost_status was never created in any migration. In production, that query crashes.&lt;br&gt;
Valk Guard catches this at PR time - with no database connection.&lt;br&gt;
This post walks through exactly how. Not hand-waving. The actual code path, from source file to finding.&lt;br&gt;
The setup&lt;br&gt;
Here's a Go file using Goqu to build a query:&lt;br&gt;
func ListBrokenUserOrderStatus(ctx context.Context) error {&lt;br&gt;
    _, _, err := goqu.From("users").&lt;br&gt;
        LeftJoin(&lt;br&gt;
            goqu.T("orders"),&lt;br&gt;
            goqu.On(goqu.I("orders.user_id").Eq(goqu.I("users.id"))),&lt;br&gt;
        ).&lt;br&gt;
        Select("users.id", "users.email", "orders.ghost_status").&lt;br&gt;
        Where(goqu.I("orders.missing_flag").Eq("pending")).&lt;br&gt;
        ToSQL()&lt;br&gt;
    return err&lt;br&gt;
}&lt;br&gt;
And here's the migration that created the orders table:&lt;br&gt;
CREATE TABLE orders (&lt;br&gt;
    id         SERIAL PRIMARY KEY,&lt;br&gt;
    user_id    INTEGER NOT NULL REFERENCES users(id),&lt;br&gt;
    total      NUMERIC(10,2) NOT NULL,&lt;br&gt;
    status     TEXT NOT NULL DEFAULT 'pending',&lt;br&gt;
    created_at TIMESTAMP DEFAULT now()&lt;br&gt;
);&lt;br&gt;
Notice: the query references orders.ghost_status. The migration never created that column. There is no ghost_status. Valk Guard reports:&lt;br&gt;
VG105: projection column "ghost_status" not found in table "orders" schema; check SELECT list and schema/model mappings&lt;br&gt;
How does it know?&lt;br&gt;
Let's walk through each phase.&lt;br&gt;
Phase 1: Query extraction&lt;br&gt;
The Goqu scanner doesn't look for SQL strings. It walks the Go AST looking for method chains rooted in goqu.From().&lt;br&gt;
When it finds one, it flattens the chain into a list of method calls: From("users") → LeftJoin(...) → Select(...) → Where(...). Each method gets parsed: From gives the base table, LeftJoin gives the join target, Select gives the projection columns, Where gives the predicates.&lt;br&gt;
From these parts, the scanner synthesizes a SQL statement:&lt;br&gt;
SELECT users.id, users.email, orders.ghost_status&lt;br&gt;
FROM users LEFT JOIN orders ON orders.user_id = users.id&lt;br&gt;
WHERE orders.missing_flag = 'pending'&lt;br&gt;
This SQL never existed in your source code. Valk Guard constructed it from the AST of your Go code. That's the key difference from regex-based tools - regex can't walk a method chain and reconstruct what the query builder will produce.&lt;br&gt;
Phase 2: Schema snapshot&lt;br&gt;
Separately, Valk Guard finds all .sql files under your migration paths. Each file gets parsed through postgresparser, and every DDL statement gets applied to a Snapshot - an in-memory representation of your schema's current state.&lt;br&gt;
The snapshot builder processes DDL actions in order:&lt;br&gt;
CREATE TABLE orders (id, user_id, total, status, created_at) → registers the table with five columns&lt;br&gt;
ALTER TABLE orders ADD COLUMN shipped_at TIMESTAMP → adds a sixth column&lt;br&gt;
ALTER TABLE orders DROP COLUMN shipped_at → removes it&lt;/p&gt;
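&lt;p&gt;The builder can be sketched as a map of tables to column sets, with each DDL action applied in order. Type and field names here are illustrative, not Valk Guard's internals:&lt;/p&gt;

```go
package main

import "fmt"

// Action is one parsed DDL action, as the migration walker would hand it to
// the snapshot builder (field names illustrative).
type Action struct {
	Kind    string   // "CREATE_TABLE", "ADD_COLUMN", "DROP_COLUMN"
	Table   string
	Columns []string // all columns for CREATE_TABLE; one element for ADD/DROP
}

// Snapshot maps table name to a set of columns. Applying actions in order
// mirrors running the migrations against an empty database, in memory.
type Snapshot map[string]map[string]bool

func (s Snapshot) apply(a Action) {
	switch a.Kind {
	case "CREATE_TABLE":
		cols := map[string]bool{}
		for _, c := range a.Columns {
			cols[c] = true
		}
		s[a.Table] = cols
	case "ADD_COLUMN":
		s[a.Table][a.Columns[0]] = true
	case "DROP_COLUMN":
		delete(s[a.Table], a.Columns[0])
	}
}

func main() {
	snap := Snapshot{}
	migrations := []Action{
		{Kind: "CREATE_TABLE", Table: "orders", Columns: []string{"id", "user_id", "total", "status", "created_at"}},
		{Kind: "ADD_COLUMN", Table: "orders", Columns: []string{"shipped_at"}},
		{Kind: "DROP_COLUMN", Table: "orders", Columns: []string{"shipped_at"}},
	}
	for _, a := range migrations {
		snap.apply(a)
	}
	fmt.Println(len(snap["orders"]), snap["orders"]["ghost_status"]) // 5 false
}
```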

&lt;p&gt;The end result is a map of table names to column definitions. For &lt;code&gt;orders&lt;/code&gt;, that's &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;total&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;. Five columns. No &lt;code&gt;ghost_status&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the same principle as running all your migrations on an empty database - except it happens in memory, with no database, in microseconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: Rule evaluation
&lt;/h3&gt;

&lt;p&gt;Now VG105 runs. It takes the synthesized SQL (already parsed into a structured IR by postgresparser) and the schema snapshot, and does a straightforward lookup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For each column in the SELECT list with usage type "projection", resolve which table it belongs to (using the alias or the single-table shortcut)&lt;/li&gt;
&lt;li&gt;Look up that table in the snapshot&lt;/li&gt;
&lt;li&gt;Check whether the column exists in the table's column map&lt;/li&gt;
&lt;li&gt;If not → finding&lt;/li&gt;
&lt;/ol&gt;
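&lt;p&gt;A self-contained sketch of that lookup, with simplified stand-ins for the real IR and snapshot types:&lt;/p&gt;

```go
package main

import "fmt"

// Column is a qualified projection column as resolved from the query IR.
type Column struct{ Table, Name string }

// checkProjection sketches the VG105/VG107 lookup: verify the table exists in
// the snapshot, then verify the column exists in that table's column map.
func checkProjection(cols []Column, snap map[string]map[string]bool) []string {
	var findings []string
	for _, c := range cols {
		table, ok := snap[c.Table]
		if !ok {
			findings = append(findings, "VG107: unknown table "+c.Table)
			continue
		}
		if !table[c.Name] {
			findings = append(findings,
				"VG105: projection column \""+c.Name+"\" not found in table \""+c.Table+"\" schema")
		}
	}
	return findings
}

func main() {
	snap := map[string]map[string]bool{
		"orders": {"id": true, "user_id": true, "total": true, "status": true, "created_at": true},
		"users":  {"id": true, "email": true},
	}
	cols := []Column{{"users", "id"}, {"users", "email"}, {"orders", "ghost_status"}}
	for _, f := range checkProjection(cols, snap) {
		fmt.Println(f)
	}
}
```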

&lt;p&gt;For &lt;code&gt;ghost_status&lt;/code&gt;, the column usage says it belongs to &lt;code&gt;orders&lt;/code&gt; (from the &lt;code&gt;orders.ghost_status&lt;/code&gt; qualifier). The snapshot has an &lt;code&gt;orders&lt;/code&gt; table, but &lt;code&gt;ghost_status&lt;/code&gt; is not in its column map. Finding.&lt;/p&gt;

&lt;p&gt;The same logic powers VG106 (unknown filter column - it catches &lt;code&gt;WHERE orders.missing_flag = 'pending'&lt;/code&gt; from the same query) and VG107 (unknown table reference).&lt;/p&gt;

&lt;h2&gt;
  
  
  It also works with ORM models
&lt;/h2&gt;

&lt;p&gt;The same snapshot system powers the schema-drift rules (VG101–VG104). Instead of checking queries against migrations, these rules check ORM models against migrations.&lt;/p&gt;

&lt;p&gt;Say you have a Go struct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type Order struct {
    ID          int    `db:"id"`
    UserID      int    `db:"user_id"`
    Total       string `db:"total"`
    Status      string `db:"status"`
    GhostStatus string `db:"ghost_status"`
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Valk Guard's Go model extractor walks the AST, reads the &lt;code&gt;db&lt;/code&gt; struct tags, and produces a &lt;code&gt;ModelDef&lt;/code&gt; with columns &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;total&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;ghost_status&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;VG101 then compares each model column against the migration snapshot. &lt;code&gt;ghost_status&lt;/code&gt; isn't in the &lt;code&gt;orders&lt;/code&gt; table → finding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VG101: model "orders" references column "ghost_status" not found in table "orders" schema; check migration DDL or update model mapping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two different rules, two different input paths (query vs. model), same schema snapshot, same answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means in practice
&lt;/h2&gt;

&lt;p&gt;You don't need a running database. You don't need to run migrations. You don't need to connect to staging. Valk Guard reads your source code and your migration files, builds everything in memory, and cross-references them statically.&lt;/p&gt;

&lt;p&gt;This runs in CI in seconds. It catches the kind of bug that usually shows up as a &lt;code&gt;column "ghost_status" does not exist&lt;/code&gt; error in your logs at 2am - and moves it to a PR comment at 2pm instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;go install github.com/valkdb/valk-guard/cmd/valk-guard@latest
valk-guard scan .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/ValkDB/valk-guard" rel="noopener noreferrer"&gt;github.com/ValkDB/valk-guard&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>postgres</category>
      <category>devops</category>
      <category>go</category>
    </item>
    <item>
      <title>We didn't want an AI SQL reviewer. We wanted deterministic</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Sat, 07 Mar 2026 16:33:20 +0000</pubDate>
      <link>https://dev.to/eitamos_ring_0508146ca448/we-didnt-want-an-ai-sql-reviewer-we-wanted-deterministic-oh6</link>
      <guid>https://dev.to/eitamos_ring_0508146ca448/we-didnt-want-an-ai-sql-reviewer-we-wanted-deterministic-oh6</guid>
      <description>&lt;p&gt;So we built &lt;a href="https://github.com/ValkDB/valk-guard" rel="noopener noreferrer"&gt;Valk Guard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Most SQL linters scan &lt;code&gt;.sql&lt;/code&gt; files. The problem is, most SQL doesn't live in &lt;code&gt;.sql&lt;/code&gt; files.&lt;/p&gt;

&lt;p&gt;It lives in &lt;code&gt;db.Query()&lt;/code&gt; calls. In Goqu builder chains. In SQLAlchemy ORM methods. In migration files mixed with application logic. By the time SQL reaches production, it's been assembled, concatenated, or synthesized by code that no &lt;code&gt;.sql&lt;/code&gt;-only tool will ever see.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/ValkDB/valk-guard" rel="noopener noreferrer"&gt;Valk Guard&lt;/a&gt; to solve that. It's a static analysis tool that walks your source code's AST, reconstructs the SQL your ORMs and query builders will generate, parses it through a real PostgreSQL grammar, and reports findings in CI-friendly formats. No database connection. No runtime. Just structure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go &lt;span class="nb"&gt;install &lt;/span&gt;github.com/valkdb/valk-guard/cmd/valk-guard@latest
valk-guard scan &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;19 rules enabled by default. Zero config. Takes seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it actually catches
&lt;/h2&gt;

&lt;p&gt;Here's a Goqu chain in Go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;goqu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;From&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"orders"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Delete&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's no raw SQL anywhere in that line. But Valk Guard walks the Go AST, recognizes the Goqu method chain, synthesizes &lt;code&gt;DELETE FROM orders&lt;/code&gt;, feeds it through &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;postgresparser&lt;/a&gt;, and fires &lt;strong&gt;VG003&lt;/strong&gt;: &lt;code&gt;DELETE without WHERE may affect all rows&lt;/code&gt;.&lt;/p&gt;
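&lt;p&gt;Once the SQL is parsed, the rule itself is a trivial structural check. A simplified stand-in for the parsed IR makes the point (the real IR is richer; these fields are illustrative):&lt;/p&gt;

```go
package main

import "fmt"

// ParsedStatement is a minimal stand-in for the IR postgresparser produces.
type ParsedStatement struct {
	Command  string // "SELECT", "UPDATE", "DELETE", ...
	Table    string
	HasWhere bool
}

// vg003 fires on DELETE without WHERE - regardless of whether the statement
// came from a .sql file, a db.Exec call, or a synthesized builder chain.
func vg003(s ParsedStatement) (string, bool) {
	if s.Command == "DELETE" {
		if !s.HasWhere {
			return "VG003: DELETE without WHERE may affect all rows (table " + s.Table + ")", true
		}
	}
	return "", false
}

func main() {
	// What the Goqu scanner synthesizes for goqu.From("orders").Delete():
	stmt := ParsedStatement{Command: "DELETE", Table: "orders", HasWhere: false}
	if msg, fired := vg003(stmt); fired {
		fmt.Println(msg)
	}
}
```

Determinism falls out for free: the rule is a pure function of the parsed statement.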

&lt;p&gt;Same thing with SQLAlchemy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No SQL string. Valk Guard's embedded Python AST extractor reconstructs it, and the same rule fires.&lt;/p&gt;

&lt;p&gt;The full rule set covers three categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query safety&lt;/strong&gt; — &lt;code&gt;UPDATE&lt;/code&gt; without &lt;code&gt;WHERE&lt;/code&gt; (VG002), &lt;code&gt;DELETE&lt;/code&gt; without &lt;code&gt;WHERE&lt;/code&gt; (VG003), &lt;code&gt;SELECT *&lt;/code&gt; (VG001), unbounded &lt;code&gt;SELECT&lt;/code&gt; without &lt;code&gt;LIMIT&lt;/code&gt; (VG004), leading wildcard &lt;code&gt;LIKE '%...'&lt;/code&gt; (VG005), &lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt; without &lt;code&gt;WHERE&lt;/code&gt; (VG006).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dangerous DDL&lt;/strong&gt; — &lt;code&gt;DROP TABLE&lt;/code&gt; / &lt;code&gt;TRUNCATE&lt;/code&gt; in application code (VG007), &lt;code&gt;CREATE INDEX&lt;/code&gt; without &lt;code&gt;CONCURRENTLY&lt;/code&gt; (VG008).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema drift&lt;/strong&gt; — ORM model references a column that migrations dropped (VG101). &lt;code&gt;NOT NULL&lt;/code&gt; column missing from model (VG102). Type mismatch between model and DDL (VG103). Model table has no &lt;code&gt;CREATE TABLE&lt;/code&gt; in migrations (VG104). Query &lt;code&gt;SELECT&lt;/code&gt;s a column that doesn't exist in the schema (VG105). And several more cross-reference checks between your code and your migrations.&lt;/p&gt;

&lt;p&gt;That last category is the one I haven't seen elsewhere. Valk Guard reads Go struct tags (&lt;code&gt;db&lt;/code&gt;, &lt;code&gt;gorm&lt;/code&gt;) and Python &lt;code&gt;__tablename__&lt;/code&gt; / &lt;code&gt;Column(...)&lt;/code&gt; definitions, builds a schema snapshot from your migration DDL, and cross-references them. If your ORM model says &lt;code&gt;email&lt;/code&gt; exists but your migration dropped it, that's VG101 at PR time — not a runtime panic in production.&lt;/p&gt;
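&lt;p&gt;Valk Guard reads the tags statically via &lt;code&gt;go/ast&lt;/code&gt;, but the tag-to-column mapping it extracts is the same one Go's &lt;code&gt;reflect&lt;/code&gt; package sees at runtime - which makes for a compact illustration of the VG101 comparison. The snapshot map here is hand-built where the real tool derives it from migration DDL:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"reflect"
)

// Order is the ORM model. Its email column was dropped by a later migration.
type Order struct {
	ID    int    `db:"id"`
	Email string `db:"email"`
}

// modelColumns extracts the db-tag column names from a struct type.
func modelColumns(v interface{}) []string {
	var cols []string
	for _, f := range reflect.VisibleFields(reflect.TypeOf(v)) {
		if tag, ok := f.Tag.Lookup("db"); ok {
			cols = append(cols, tag)
		}
	}
	return cols
}

func main() {
	// Migration-derived snapshot for "orders": email is gone.
	snapshot := map[string]bool{"id": true}
	for _, col := range modelColumns(Order{}) {
		if !snapshot[col] {
			fmt.Printf("VG101: model \"orders\" references column %q not found in schema\n", col)
		}
	}
}
```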

&lt;h2&gt;
  
  
  Why AST, not AI
&lt;/h2&gt;

&lt;p&gt;This was a deliberate choice, and it's worth explaining.&lt;/p&gt;

&lt;p&gt;CI is not a brainstorming session. If a PR check comments on your code and changes its mind between runs, people stop trusting it. If it floods you with false positives, people add &lt;code&gt;--skip-lint&lt;/code&gt; and move on. The tool is dead even if the idea was good.&lt;/p&gt;

&lt;p&gt;I needed the opposite: same input, same output, every time. Testable. Explainable. Boring in the best way.&lt;/p&gt;

&lt;p&gt;AI is useful for exploration and suggestions. But a blocking CI step needs determinism. Structured-output approaches for LLMs improve schema conformance, but they don't make a generative model behave like a static analyzer. The questions Valk Guard answers are structural: "does this statement have a WHERE clause?" "does this builder chain produce a bounded query?" "does this model match this schema?" Those are AST questions, not generation questions.&lt;/p&gt;

&lt;p&gt;The same logic applies to regex. Regex is fine when the thing you're checking is a flat string. It falls apart when SQL is buried inside Go method chains or Python ORM calls. You can't regex your way through &lt;code&gt;goqu.From("users").Where(goqu.C("id").Eq(42)).Select("name")&lt;/code&gt; and reliably reconstruct the query. You need to parse the source language's AST, understand the builder pattern, and synthesize the SQL. That's what Valk Guard does.&lt;/p&gt;
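&lt;p&gt;For a taste of what that looks like, here's a self-contained program that parses that exact chain with the standard library's &lt;code&gt;go/parser&lt;/code&gt; and flattens the method names in source order - the first step before argument extraction and SQL synthesis, and a simplification of what the real scanner does:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"log"
)

// flattenChain parses a Go expression and returns the top-level method-chain
// call names in source order. Nested chains inside arguments (like
// goqu.C("id").Eq(42)) are deliberately not followed here.
func flattenChain(src string) []string {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		log.Fatal(err)
	}
	var names []string
	// The AST nests right-to-left (the Select call wraps the Where call wraps
	// the From call), so recurse into the receiver first, then record the name.
	var walk func(e ast.Expr)
	walk = func(e ast.Expr) {
		call, ok := e.(*ast.CallExpr)
		if !ok {
			return
		}
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok {
			walk(sel.X) // receiver: the earlier part of the chain
			names = append(names, sel.Sel.Name)
		}
	}
	walk(expr)
	return names
}

func main() {
	src := `goqu.From("users").Where(goqu.C("id").Eq(42)).Select("name")`
	fmt.Println(flattenChain(src)) // [From Where Select]
}
```

From that ordered list plus each call's arguments, synthesizing the SQL is bookkeeping.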

&lt;p&gt;A small number of checks do use targeted regex after parsing — when a parser-extracted clause doesn't expose the exact field a rule needs. But that's regex as a surgical helper on already-parsed output, not regex as the foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline
&lt;/h2&gt;

&lt;p&gt;Source files go in. Findings come out. Here's what happens in between:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Extraction&lt;/strong&gt; — Four scanners run concurrently. The raw SQL scanner handles &lt;code&gt;.sql&lt;/code&gt; files with proper dollar-quoting and nested block comments. The Go scanner uses &lt;code&gt;go/ast&lt;/code&gt; to extract SQL from &lt;code&gt;db.Query&lt;/code&gt;, &lt;code&gt;db.Exec&lt;/code&gt;, and &lt;code&gt;db.QueryRow&lt;/code&gt;. The Goqu scanner walks builder chains and synthesizes SQL. The SQLAlchemy scanner invokes an embedded Python script (stdlib only — no pip dependencies) that parses ORM chains via Python's &lt;code&gt;ast&lt;/code&gt; module.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Parsing&lt;/strong&gt; — Every extracted statement goes through &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;postgresparser&lt;/a&gt;, a pure-Go PostgreSQL parser I built on ANTLR. It produces a structured IR: tables, columns, joins, WHERE clauses, command type. No CGO, no database connection. Most queries parse in 70–350 µs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Rule evaluation&lt;/strong&gt; — Rules are dispatched by SQL command type for efficiency. Query rules (VG001–VG008) run against every parsed statement. Schema-drift rules (VG101+) cross-reference ORM model definitions against a migration-derived schema snapshot. Query-schema rules (VG105–VG108) validate that columns and tables referenced in queries actually exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Output&lt;/strong&gt; — Findings are deduplicated, sorted by file and line, and formatted as terminal output, JSON, SARIF (for GitHub Code Scanning), or rdjsonl (for reviewdog PR comments).&lt;/p&gt;

&lt;p&gt;The whole thing is ~8,100 lines of Go (plus ~700 lines of embedded Python), with nearly 1:1 test coverage. Three runtime dependencies: cobra, postgresparser, and yaml. That's it.&lt;/p&gt;
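&lt;p&gt;The dispatch-by-command-type idea from step 3 is simpler than it sounds. A sketch (types and rule IDs are illustrative, not Valk Guard's real API):&lt;/p&gt;

```go
package main

import "fmt"

// Illustrative shapes only: each rule declares which SQL command types
// it applies to, and the engine runs each parsed statement against the
// matching subset instead of evaluating every rule on every statement.
type Rule struct {
	ID       string
	Commands map[string]bool // e.g. "SELECT", "DELETE", "UPDATE"
}

func applicable(rules []Rule, command string) []Rule {
	var out []Rule
	for _, r := range rules {
		if r.Commands[command] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rules := []Rule{
		{ID: "VG001", Commands: map[string]bool{"DELETE": true, "UPDATE": true}},
		{ID: "VG003", Commands: map[string]bool{"SELECT": true}},
	}
	for _, r := range applicable(rules, "DELETE") {
		fmt.Println(r.ID) // only the DELETE-relevant rule runs
	}
}
```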

&lt;h2&gt;
  
  
  CI integration
&lt;/h2&gt;

&lt;p&gt;Valk Guard was designed for pull request workflows. Exit code 0 means clean, 1 means findings, 2 means config/parser error. Hook it into reviewdog and you get inline PR comments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run valk-guard&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;set +e&lt;/span&gt;
    &lt;span class="s"&gt;valk-guard scan . --format rdjsonl &amp;gt; valk-guard.rdjsonl&lt;/span&gt;
    &lt;span class="s"&gt;code=$?&lt;/span&gt;
    &lt;span class="s"&gt;set -e&lt;/span&gt;
    &lt;span class="s"&gt;if [ "$code" -gt 1 ]; then exit "$code"; fi&lt;/span&gt;

&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Post review comments&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;REVIEWDOG_GITHUB_API_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;reviewdog -f=rdjsonl -name="valk-guard" \&lt;/span&gt;
      &lt;span class="s"&gt;-reporter=github-pr-review -filter-mode=added \&lt;/span&gt;
      &lt;span class="s"&gt;&amp;lt; valk-guard.rdjsonl&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Findings are non-blocking by default. Config errors fail the job. You can see real example PRs with live review comments in the &lt;a href="https://github.com/ValkDB/valk-guard-example" rel="noopener noreferrer"&gt;valk-guard-example&lt;/a&gt; repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it fits
&lt;/h2&gt;

&lt;p&gt;Valk Guard is not a runtime firewall, not a database advisor, and not a replacement for &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;. It's a guardrail for the most common and most expensive SQL mistakes — the ones that happen when someone pushes a &lt;code&gt;DELETE FROM orders&lt;/code&gt; without a &lt;code&gt;WHERE&lt;/code&gt; at 4pm on a Friday.&lt;/p&gt;

&lt;p&gt;It's PostgreSQL-only. It doesn't auto-fix. It doesn't need a running database. It reads your source code, understands your ORMs, and tells you what's going to break — before it merges.&lt;/p&gt;

&lt;p&gt;Less magic. More signal. Same answer every run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo: &lt;a href="https://github.com/ValkDB/valk-guard" rel="noopener noreferrer"&gt;github.com/ValkDB/valk-guard&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Building a PostgreSQL Parser in Go: What Broke After We Open-Sourced It</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Wed, 18 Feb 2026 06:18:53 +0000</pubDate>
      <link>https://dev.to/eitamos_ring_0508146ca448/building-a-postgresql-parser-in-go-what-broke-after-we-open-sourced-it-3i8h</link>
      <guid>https://dev.to/eitamos_ring_0508146ca448/building-a-postgresql-parser-in-go-what-broke-after-we-open-sourced-it-3i8h</guid>
      <description>&lt;h1&gt;
  
  
  Building a PostgreSQL Parser in Go: What Broke After We Open-Sourced It
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;&lt;code&gt;postgresparser&lt;/code&gt;&lt;/a&gt; is a pure-Go PostgreSQL SQL parser. It turns SQL text into structured metadata (tables, columns, joins, filters, DDL actions, parameters) without executing queries.&lt;/p&gt;

&lt;p&gt;We thought it was solid. Open source proved we were wrong.&lt;/p&gt;

&lt;p&gt;Here is what open source forced us to learn.&lt;/p&gt;

&lt;p&gt;The biggest shift was not “more bug reports.” It was use-case expansion.&lt;br&gt;&lt;br&gt;
We built for our workflow. Users showed up with very different workloads.&lt;br&gt;
In the first week after release, most feedback centered on deterministic batch parsing.&lt;/p&gt;
&lt;h2&gt;
  
  
  Our internal assumptions broke immediately
&lt;/h2&gt;

&lt;p&gt;Inside a single team, ambiguous behavior survives because everyone “knows” the rules. Public users do not have that context.&lt;/p&gt;

&lt;p&gt;The first pressure point was multi-statement SQL. We had &lt;code&gt;ParseSQL&lt;/code&gt; (single statement) and figured batch parsing was “close enough.” It was not.&lt;/p&gt;

&lt;p&gt;People were using the parser for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI linting pipelines&lt;/li&gt;
&lt;li&gt;production tools&lt;/li&gt;
&lt;li&gt;LLM wrappers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;People asked practical questions we could not answer cleanly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which exact statement failed?&lt;/li&gt;
&lt;li&gt;Is this a warning or a hard failure?&lt;/li&gt;
&lt;li&gt;Can I map diagnostics to the original SQL text reliably?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those questions forced us to define strict contracts instead of relying on implied behavior.&lt;/p&gt;

&lt;p&gt;If your tool consumes SQL in bulk, batch correlation is everything.&lt;/p&gt;
&lt;h2&gt;
  
  
  Broken behavior example
&lt;/h2&gt;

&lt;p&gt;This input exposed the issue quickly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Early batch behavior made correlation awkward because results were compacted and diagnostics were not statement-first. If you’re building CI checks or migration tooling, “something in the batch failed” is not actionable.&lt;/p&gt;

&lt;p&gt;Now each statement has deterministic correlation (&lt;code&gt;Index&lt;/code&gt;, &lt;code&gt;RawSQL&lt;/code&gt;, &lt;code&gt;Query&lt;/code&gt;, &lt;code&gt;Warnings&lt;/code&gt;), so downstream code can point to the exact source statement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before/after API diff
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- type ParseBatchResult struct {
-   Queries          []*ParsedQuery
-   Warnings         []ParseWarning
-   TotalStatements  int
-   ParsedStatements int
- }
&lt;/span&gt;&lt;span class="gi"&gt;+ type StatementParseResult struct {
+   Index    int
+   RawSQL   string
+   Query    *ParsedQuery   // nil =&amp;gt; IR conversion failure
+   Warnings []ParseWarning // statement-scoped warnings
+ }
+
+ type ParseBatchResult struct {
+   Statements       []StatementParseResult
+   TotalStatements  int
+   ParsedStatements int
+   HasFailures      bool
+ }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That shape is less convenient for quick demos, but much better for real integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real SQL in the wild is much uglier than test fixtures
&lt;/h2&gt;

&lt;p&gt;Open source usage also brought SQL shapes we did not have in internal tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;trailing semicolons and odd whitespace&lt;/li&gt;
&lt;li&gt;invalid syntax in the middle of an otherwise valid batch&lt;/li&gt;
&lt;li&gt;mixed DDL + DML scripts&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ONLY&lt;/code&gt; variants in DDL paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The parser had to become resilient without becoming vague. That meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better statement-level warning attribution&lt;/li&gt;
&lt;li&gt;explicit failure semantics (&lt;code&gt;Query == nil&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;tighter handling across DDL relation extraction paths&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  One concrete snippet (current behavior)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQLAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"total=%d parsed=%d has_failures=%t&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TotalStatements&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParsedStatements&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HasFailures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Statements&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"idx=%d failed=%t warnings=%d raw=%q&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Warnings&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RawSQL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the integration model people asked for: deterministic, inspectable, and boring in the best way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Open source removed our ability to hand-wave edge cases.&lt;/p&gt;

&lt;p&gt;The loop became:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;implement&lt;/li&gt;
&lt;li&gt;get challenged&lt;/li&gt;
&lt;li&gt;simplify&lt;/li&gt;
&lt;li&gt;lock behavior with tests&lt;/li&gt;
&lt;li&gt;document the contract&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That loop made &lt;code&gt;postgresparser&lt;/code&gt; better than it would have been as an internal-only tool.&lt;br&gt;
Internal tools can survive ambiguity. Public libraries cannot.&lt;/p&gt;

&lt;p&gt;If you're building something on top of &lt;code&gt;postgresparser&lt;/code&gt;, open an issue. Real-world SQL keeps improving the contract.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>go</category>
      <category>database</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Building a Pure Go PostgreSQL SQL Parser (No CGO, No Server, No Runtime Dependencies)</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Mon, 09 Feb 2026 17:29:25 +0000</pubDate>
      <link>https://dev.to/eitamos_ring_0508146ca448/building-a-pure-go-postgresql-sql-parser-no-cgo-no-server-no-runtime-dependencies-goi</link>
      <guid>https://dev.to/eitamos_ring_0508146ca448/building-a-pure-go-postgresql-sql-parser-no-cgo-no-server-no-runtime-dependencies-goi</guid>
      <description>&lt;h2&gt;
  
  
  Why we built this
&lt;/h2&gt;

&lt;p&gt;We needed PostgreSQL SQL parsing in environments where CGO was not an option:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alpine containers
&lt;/li&gt;
&lt;li&gt;AWS Lambda
&lt;/li&gt;
&lt;li&gt;Distroless images
&lt;/li&gt;
&lt;li&gt;Scratch builds
&lt;/li&gt;
&lt;li&gt;ARM deployments
&lt;/li&gt;
&lt;li&gt;Anywhere &lt;code&gt;CGO_ENABLED=0&lt;/code&gt; is required
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most existing approaches either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Depend on native Postgres parser bindings
&lt;/li&gt;
&lt;li&gt;Require CGO
&lt;/li&gt;
&lt;li&gt;Require running a Postgres server
&lt;/li&gt;
&lt;li&gt;Are too heavy for infrastructure tooling
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we built a pure Go PostgreSQL parser.&lt;/p&gt;




&lt;h2&gt;
  
  
  The goal
&lt;/h2&gt;

&lt;p&gt;Not to replace Postgres parsing.&lt;br&gt;&lt;br&gt;
Not to be 100% server-compatible.&lt;/p&gt;

&lt;p&gt;The goal was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Give infrastructure and tooling systems structured query data safely and deterministically.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What it extracts
&lt;/h2&gt;

&lt;p&gt;The parser outputs an intermediate representation (IR) with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tables (with aliases)
&lt;/li&gt;
&lt;li&gt;Columns
&lt;/li&gt;
&lt;li&gt;Joins
&lt;/li&gt;
&lt;li&gt;WHERE filters
&lt;/li&gt;
&lt;li&gt;GROUP BY
&lt;/li&gt;
&lt;li&gt;ORDER BY
&lt;/li&gt;
&lt;li&gt;CTEs
&lt;/li&gt;
&lt;li&gt;Subqueries
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;postgresparser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ParseSQL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`
    SELECT u.name, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE u.active = true
    GROUP BY u.name
    ORDER BY order_count DESC
`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c"&gt;// "SELECT"&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tables&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c"&gt;// users, orders with aliases&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c"&gt;// u.name, COUNT(o.id) AS order_count&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c"&gt;// ["u.active=true"]&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JoinConditions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// ["o.user_id=u.id"]&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GroupBy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="c"&gt;// ["u.name"]&lt;/span&gt;
&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ColumnUsage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c"&gt;// each column with its role: filter, join, projection, group, order&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now tooling can answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What tables does this query touch?
&lt;/li&gt;
&lt;li&gt;What joins exist?
&lt;/li&gt;
&lt;li&gt;What filters are applied?
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why ANTLR + Pure Go
&lt;/h2&gt;

&lt;p&gt;We evaluated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;libpg_query bindings
&lt;/li&gt;
&lt;li&gt;WASM approaches
&lt;/li&gt;
&lt;li&gt;regex / string parsing
&lt;/li&gt;
&lt;li&gt;custom parsers
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tradeoffs we cared about
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pure Go&lt;/td&gt;
&lt;td&gt;Simpler deploy, fewer runtime risks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No CGO&lt;/td&gt;
&lt;td&gt;Works in restricted environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic behavior&lt;/td&gt;
&lt;td&gt;Important for tooling / analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Needed for production workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ANTLR gave us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mature grammar ecosystem
&lt;/li&gt;
&lt;li&gt;Strong parsing guarantees
&lt;/li&gt;
&lt;li&gt;Good performance with SLL mode
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;Most real-world queries parse in roughly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;~70–350 microseconds&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
(using SLL prediction mode)&lt;/p&gt;




&lt;h2&gt;
  
  
  Where this is useful
&lt;/h2&gt;

&lt;p&gt;Typical use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI SQL validation
&lt;/li&gt;
&lt;li&gt;Query lineage hints
&lt;/li&gt;
&lt;li&gt;Migration safety checks
&lt;/li&gt;
&lt;li&gt;Static query analysis before deploy
&lt;/li&gt;
&lt;li&gt;“What tables does this service touch?” automation
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Open Source
&lt;/h2&gt;

&lt;p&gt;We’ve been using this internally for months and decided to open source it.&lt;/p&gt;

&lt;p&gt;If you break it with weird SQL, please open issues — that’s how coverage improves.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/ValkDB/postgresparser" rel="noopener noreferrer"&gt;https://github.com/ValkDB/postgresparser&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>go</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why Database Indexes Keep Coming Up in My Performance Work</title>
      <dc:creator>Eitamos Ring</dc:creator>
      <pubDate>Tue, 29 Jul 2025 06:44:05 +0000</pubDate>
      <link>https://dev.to/eitamos_ring_0508146ca448/why-database-indexes-keep-coming-up-in-my-performance-work-2imo</link>
      <guid>https://dev.to/eitamos_ring_0508146ca448/why-database-indexes-keep-coming-up-in-my-performance-work-2imo</guid>
      <description>&lt;p&gt;I bounce between data pipelines, API fires, and new features all week, and there’s this one thing that keeps biting us. Slow pages. And 8 times out of 10 it’s the same root cause: we forgot the right index.&lt;/p&gt;

&lt;p&gt;We had this analytics dashboard: we tuned the React, cached the API, the CDN was spotless. Still slow. The query behind it? Joining big tables and scanning like there’s no tomorrow. No index on the join keys. Oops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A quick demo to prove I’m not just ranting&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I spun up a tiny test on my dev box: made an &lt;code&gt;orders&lt;/code&gt; table, loaded 100k rows, timed a few queries. First with no indexes, then with some obvious ones.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE orders (
  id           SERIAL PRIMARY KEY,
  customer_id  INTEGER NOT NULL,
  order_date   TIMESTAMP NOT NULL,
  total_amount DECIMAL(10,2) NOT NULL,
  status       VARCHAR(50) NOT NULL,
  country      VARCHAR(100)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Before indexes (avg)&lt;/th&gt;
&lt;th&gt;After indexes (avg)&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Customer lookup&lt;/td&gt;
&lt;td&gt;6.11 ms&lt;/td&gt;
&lt;td&gt;0.88 ms&lt;/td&gt;
&lt;td&gt;~7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Status filter&lt;/td&gt;
&lt;td&gt;8.47 ms&lt;/td&gt;
&lt;td&gt;2.41 ms&lt;/td&gt;
&lt;td&gt;~3.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Date range&lt;/td&gt;
&lt;td&gt;6.73 ms&lt;/td&gt;
&lt;td&gt;1.48 ms&lt;/td&gt;
&lt;td&gt;~4.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note: these are from my laptop, which is also running Docker, two IDEs, Slack, and that Electron app we don’t talk about… so, not a lab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The classic slow page (you’ve seen this movie)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Admin page loads in 10 seconds, everyone’s pointing fingers. Frontend swears it’s fine, backend says “works on my machine”. The DB? Doing full table scans through millions of rows because the WHERE clause is on &lt;code&gt;status&lt;/code&gt; and &lt;code&gt;order_date&lt;/code&gt; and, yeah, neither is indexed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Usual suspects&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Foreign keys without matching indexes on the child table&lt;/li&gt;
&lt;li&gt;Date columns everybody filters by (no index)&lt;/li&gt;
&lt;li&gt;Status fields in every WHERE (also no index)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The tiny bit of code that tells the truth&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s how I timed the “customer orders” lookup in Go:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;func testCustomerIDQuery(db *sql.DB, description string) {
    var total time.Duration
    for i := 0; i &amp;lt; numQueries; i++ {
        id := rand.Intn(100000) + 1
        start := time.Now()
        rows, err := db.Query(`
            SELECT id, customer_id, order_date, total_amount, status, country
            FROM orders WHERE customer_id = $1
            LIMIT 10`, id)
        if err == nil {
            rows.Close()
        }
        total += time.Since(start)
    }
    fmt.Printf("%s avg: %v\n", description, total/time.Duration(numQueries))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And the “magic” is not magic, it’s just this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE INDEX idx_customer_id   ON orders(customer_id);
CREATE INDEX idx_status        ON orders(status);
CREATE INDEX idx_total_amount  ON orders(total_amount);
CREATE INDEX idx_order_date    ON orders(order_date);
CREATE INDEX idx_country       ON orders(country);
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Add those, re‑run the exact same queries, and your 10‑second page quietly becomes sub‑second. It’s almost embarrassing how often that’s the fix.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Index size reality check&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;People ask “won’t indexes be huge?”. From the same test (100k rows):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;idx_customer_date&lt;/code&gt;: 3.1 MB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;orders_pkey&lt;/code&gt;: 2.2 MB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idx_order_date&lt;/code&gt;: 2.2 MB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idx_total_amount&lt;/code&gt;: 2.2 MB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idx_customer_id&lt;/code&gt;: 1.9 MB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idx_country&lt;/code&gt;: 712 KB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idx_status&lt;/code&gt;: 688 KB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Call it ~12 MB total. For the speedup you get, that’s cheap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A couple gotchas (learned the hard way)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Composite order matters.&lt;/strong&gt; &lt;code&gt;(customer_id, order_date)&lt;/code&gt; helps &lt;code&gt;WHERE customer_id = ? ORDER BY order_date DESC LIMIT 10&lt;/code&gt;. Flip it and you’ll be sad.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Check the plan.&lt;/strong&gt; &lt;code&gt;EXPLAIN (ANALYZE, BUFFERS)&lt;/code&gt; tells you: you want Index Scan / Index Only Scan, not Seq Scan.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Not every column deserves an index.&lt;/strong&gt; Super low selectivity (like a boolean) usually won’t help with a plain b‑tree.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Writes pay the bill.&lt;/strong&gt; Indexes speed reads, but inserts/updates get a bit slower, so pick the ones you actually use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The boring checklist that works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before you reach for sharding, a rewrite, or a shiny new DB:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Profile the slow endpoint.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;EXPLAIN (ANALYZE, BUFFERS)&lt;/code&gt; the worst query.&lt;/li&gt;
&lt;li&gt;If it’s scanning a big table, add the smallest useful index.&lt;/li&gt;
&lt;li&gt;Re‑test. Ship. Sleep.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It’s not flashy. It won’t wow anyone at a meetup. But it’ll make your app feel fast, which is what users care about anyway.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
