Why your DB audit always finds more than your inventory says

#postgres #supabase #devops #database

The ticket said two

Friday, May first, early afternoon. I open a baseline resync ticket that reports, on the basis of an honest CI diagnosis, "at least two missing objects" between production and the local schema. I start the way you start these things, iteratively. I find the first one in five minutes, replay it in a migration, move to the next. By the fifth, mistrust kicks in — I'm no longer correcting a finite list, I'm discovering it, one patch at a time, with no idea how many remain.

First iteration: a Postgres role agent_readonly absent from the repo. Second: a stripe_customer_id column added one evening to wire up a webhook. Third: a duplicated migration timestamp. Fourth: a missing DROP CASCADE. Fifth: a whole domain table. At that point I stop patching by hand. I dump the catalogs, I comm -23 by category, I produce the full list in ten minutes.

The mechanism

A database that has been alive for several months accumulates drift silently. A role added on a Monday via the web studio to unblock an analysis, a column posted one evening to plug Stripe, a trigger rewritten in a hotfix that was never reported into a migration. Each operation looks benign at the moment it's posted. None leaves a readable trace on the repo side. The operator's memory might hold the last two or three gestures; beyond that it confabulates or forgets. The only way to know the real gap between production and repo is to measure it, head-on, against the system catalogs.

The supabase_migrations.schema_migrations tracker confirms the scale. Fifty-eight versions on the repo side, one hundred and seventy-eight on the production side, zero rows in common. Three months of SQL operations passed through the web studio without being reported into a migration. The ticket said two. The cartography returned over a hundred. Order of magnitude: fifty.

The protocol

The block audit fits in a loop, one category at a time. You dump the production list from the system catalogs, dump the repo list from the migration files, take the difference with comm -23. Repeat for tables, columns, views, functions, triggers, policies, indexes, roles. Ten minutes in total.

# DB block audit — one category at a time
psql "$PROD_URL" -tAc \
  "SELECT tablename FROM pg_tables WHERE schemaname='public' ORDER BY 1" \
  > /tmp/prod-tables.txt

grep -hE '^CREATE TABLE ' supabase/migrations/*.sql \
  | sed -E 's/.*TABLE [^.]*\.?([a-z_]+).*/\1/' | sort -u \
  > /tmp/repo-tables.txt

comm -23 /tmp/prod-tables.txt /tmp/repo-tables.txt
# → tables present in prod, missing from the repo. Loop by category:
#   columns, policies, indexes, triggers, functions.

Once the list is on the table, patch in dependency order — roles first, then tables, columns, indexes, policies, triggers. No more surprises, and the scope of the work is known before you touch the first object.

The rule

Beyond three or four drifts found by iteration, switch to block audit. The cost is fixed, about thirty minutes to map every category. The benefit is knowing the exact scope before patching, rather than discovering the sixth drift after correcting the first five. The rule doesn't depend on the size of the database — it depends on how much time has passed between production and its inventory.

Closing

An inventory that says two and an audit that finds a hundred don't contradict each other. The inventory says what the operator remembers, the audit says what the database contains.

Block audit protocol script, pseudonymized:
github.com/michelfaure/rembrandt-samples/tree/main/db-audit-vs-inventory

Top comments (4)

toshihiro shishido • May 17

The discovering-not-correcting framing nails it. Same here on a Supabase project — Stripe webhook fixes added columns directly in production. We now run pg_dump --schema-only weekly in CI and diff against repo migrations. Anything new = hot-fix.

Sorry if my English sounds weird!!

Michel Faure • May 17

Thanks Toshihiro — that's a very clear description, and the weekly pg_dump --schema-only + diff against migrations is exactly the right safeguard for the structural side of the problem.

The article was probing the other half: a column that already exists starts holding values your TypeScript enum doesn't know about (a Stripe webhook quietly introduces payment_status = 'refunded_pending'). Schema diff stays green, code keeps shipping, audit eventually finds it months later.

Does your anything new = hot-fix rule cover new values in existing enum-ish columns, or only DDL deltas? Most teams treat the latter as in-scope and the former as application logic — which is exactly where divergence accumulates silently.

toshihiro shishido • May 18

Sorry if the English is weird, I used Google Translate!!

Some comments may only be visible to logged-in visitors. Sign in to view all comments.