- Book: Database Playbook: Choosing the Right Store for Every System You Build
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
A WHERE payload->>'status' = 'paid' query has been doing 800ms sequential scans for two months because there is no index on it, and the team that owns the table can't add one cleanly. They tried. The expression index built fine in staging and timed out behind a long-running transaction in production. The status field is buried inside a jsonb blob with eleven other keys, three of which are typed wrong, two of which nobody on the team remembers writing.
This is the migration. Out of the JSON column, into typed columns, with zero downtime and no broken reads. The pattern is called expand-and-contract: you add the new shape next to the old one, move traffic across in stages, and only drop the old shape once the new one has been serving production for long enough to trust.
The shape we're starting from:
CREATE TABLE orders (
id bigserial PRIMARY KEY,
customer_id bigint NOT NULL,
payload jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
payload holds { "status": "paid", "total_cents": 4200, "currency": "USD", ... }. We want status, total_cents, and currency as first-class columns with proper types, constraints, and indexes.
Step 1: Add target columns and backfill
The first instinct is ALTER TABLE orders ADD COLUMN status text NOT NULL. On a table that already has rows, that fails outright because the new column has no values; the usual workaround of adding a DEFAULT rewrites every row while holding an ACCESS EXCLUSIVE lock on Postgres versions before 11. On a 200M-row table, that rewrite is an outage.
The Postgres-friendly version is to add the column nullable, with no default, then backfill in batches.
ALTER TABLE orders ADD COLUMN status text;
ALTER TABLE orders ADD COLUMN total_cents bigint;
ALTER TABLE orders ADD COLUMN currency text;
Three metadata-only operations. They take milliseconds regardless of table size: adding a nullable column with no default is a catalog-only change, and since Postgres 11 even a constant default wouldn't force a rewrite.
Now backfill in batches. Doing this in one statement locks the world; doing it in chunks keeps replication lag bounded and lets you pause if something starts misbehaving.
-- Run repeatedly until 0 rows update.
WITH batch AS (
SELECT id FROM orders
WHERE status IS NULL
ORDER BY id
LIMIT 5000
FOR UPDATE SKIP LOCKED
)
UPDATE orders o
SET
status = payload->>'status',
total_cents = (payload->>'total_cents')::bigint,
currency = payload->>'currency'
FROM batch
WHERE o.id = batch.id;
SKIP LOCKED keeps this loop from fighting application writes. Run it from a worker, sleep 100ms between batches, watch pg_stat_replication.replay_lag if you have replicas. A 50M-row table at 5k rows per batch and 100ms sleep finishes in roughly 17 minutes.
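The worker loop itself is simple enough to sketch. Here's one possible shape for the control flow, with the database call abstracted behind a callable so the pacing logic is visible (backfill_in_batches and run_batch are hypothetical names, not from any particular codebase):

```python
import time

def backfill_in_batches(run_batch, sleep_s=0.1, max_batches=None):
    """Call run_batch() until it reports 0 rows updated.

    run_batch is expected to execute one batched UPDATE (like the
    SKIP LOCKED statement above) and return the number of rows it
    touched. Returns the total number of rows backfilled.
    """
    total = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        updated = run_batch()
        batches += 1
        if updated == 0:
            break  # nothing left with status IS NULL
        total += updated
        time.sleep(sleep_s)  # bound replication lag between batches
    return total
```

In production, run_batch would execute the UPDATE through your driver and return cursor.rowcount. Because each batch commits independently, pausing the migration is just a matter of stopping the worker.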
A note on bad data. The cast (payload->>'total_cents')::bigint will explode on the row where someone wrote "total_cents": "n/a" in 2022. Find those first:
SELECT id, payload->>'total_cents' AS bad_value
FROM orders
WHERE payload->>'total_cents' !~ '^[0-9]+$'
AND payload ? 'total_cents'
LIMIT 50;
Decide per row: fix, default, or quarantine. Don't let the backfill loop discover them at 3 a.m.
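The fix/default/quarantine decision is worth unit-testing before pointing it at production data. A minimal sketch of one possible per-value policy (classify_total_cents is a hypothetical helper, and the default-to-zero choice is illustrative, not a recommendation):

```python
def classify_total_cents(raw):
    """Decide what to do with one payload->>'total_cents' value.

    Returns ('ok', int) for clean values, ('default', 0) for
    missing or empty ones, and ('quarantine', raw) for anything
    that would blow up the ::bigint cast in the backfill.
    """
    if raw is None or raw == "":
        return ("default", 0)
    if isinstance(raw, int):
        return ("ok", raw)
    if isinstance(raw, str) and raw.isdigit():
        return ("ok", int(raw))
    return ("quarantine", raw)
```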
Step 2: Dual-write from the application
Backfill catches existing rows. New writes still go to payload only. Until the application writes to both shapes, the new columns drift the moment you stop the backfill.
In your write path, populate both:
from psycopg2.extras import Json  # adapts a dict to a jsonb parameter

def create_order(db, customer_id, status, total_cents, currency):
    payload = {
        "status": status,
        "total_cents": total_cents,
        "currency": currency,
    }
    db.execute(
        """
        INSERT INTO orders (
            customer_id, payload, status, total_cents, currency
        )
        VALUES (%s, %s, %s, %s, %s)
        """,
        (customer_id, Json(payload), status, total_cents, currency),
    )
Updates need the same treatment. Every code path that writes payload must also write the typed columns. Grep for payload in your codebase before you ship this; the one update path you forget is the one that breaks parity.
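One way to keep an update path from drifting is to write both shapes in a single statement, so they can't disagree even if the transaction is interrupted. A sketch, assuming a psycopg-style driver (build_status_update is a hypothetical helper):

```python
def build_status_update(order_id, new_status):
    """Build one UPDATE that writes both shapes atomically.

    jsonb_set rewrites the key inside payload while the typed
    column gets the same value in the same statement, so the two
    representations can't drift apart.
    """
    sql = """
        UPDATE orders
        SET status = %s,
            payload = jsonb_set(payload, '{status}', to_jsonb(%s::text))
        WHERE id = %s
    """
    return sql, (new_status, new_status, order_id)
```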
If you can't trust the application to dual-write reliably, fall back to a trigger:
CREATE OR REPLACE FUNCTION sync_orders_columns()
RETURNS trigger AS $$
BEGIN
NEW.status := NEW.payload->>'status';
NEW.total_cents := (NEW.payload->>'total_cents')::bigint;
NEW.currency := NEW.payload->>'currency';
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER orders_sync
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION sync_orders_columns();
The trigger is the safety net, not the goal. Triggers add per-row overhead and hide the migration from anyone reading the application code six months later. Use it during the migration, drop it once dual-write is proven.
Step 3: Verify parity before flipping reads
Don't trust the migration just because the backfill finished. Compare the two shapes for every row:
SELECT count(*) AS mismatches
FROM orders
WHERE
status IS DISTINCT FROM payload->>'status'
OR total_cents IS DISTINCT FROM
NULLIF(payload->>'total_cents', '')::bigint
OR currency IS DISTINCT FROM payload->>'currency';
IS DISTINCT FROM treats NULL as comparable, which is what you want; status = NULL is NULL, not false. Any non-zero count is a row to investigate before you keep going.
Run this on a replica if you can. The query is a sequential scan and you don't want it competing with production writes for buffer cache.
The other check that matters: a sample of recent inserts to confirm the dual-write path is wired up. If the trigger is live this is redundant; if you removed the trigger and trust the application, it isn't.
SELECT id, status, payload->>'status' AS pstatus
FROM orders
WHERE created_at > now() - interval '1 hour'
AND status IS DISTINCT FROM payload->>'status';
Zero rows means dual-write is healthy. One row means somebody opened a code path you didn't grep.
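For samples pulled off a replica, the same parity check can run application-side, where it's easier to log and alert on. A sketch under assumed row shape (row_parity_errors is a hypothetical helper; it expects the payload already decoded from jsonb into a dict):

```python
def row_parity_errors(row):
    """Compare one order row's typed columns against its payload.

    row is a dict with 'status', 'total_cents', 'currency' and a
    'payload' dict. Returns the list of column names that disagree,
    treating None like SQL NULL under IS DISTINCT FROM semantics.
    """
    payload = row["payload"]
    errors = []
    if row["status"] != payload.get("status"):
        errors.append("status")
    raw_total = payload.get("total_cents")
    expected = int(raw_total) if raw_total not in (None, "") else None
    if row["total_cents"] != expected:
        errors.append("total_cents")
    if row["currency"] != payload.get("currency"):
        errors.append("currency")
    return errors
```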
Step 4: Add indexes and constraints, then flip reads
Now the typed columns can carry indexes and constraints. Build them concurrently. A plain CREATE INDEX holds a SHARE lock for the duration, which blocks every write to the table (reads keep working, but on a busy table blocked writes are bad enough).
CREATE INDEX CONCURRENTLY idx_orders_status
ON orders (status);
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC);
CONCURRENTLY cannot run inside a transaction, takes longer (two table passes), and can leave an INVALID index behind if it fails. Check after every build:
SELECT indexrelid::regclass, indisvalid
FROM pg_index
WHERE indrelid = 'orders'::regclass;
If indisvalid is false, drop it and rebuild. Don't try to repair it.
The not-null constraint comes next. The naive form scans the whole table while holding an ACCESS EXCLUSIVE lock, blocking everything until the scan finishes:
-- Avoid this on large tables.
ALTER TABLE orders
ALTER COLUMN status SET NOT NULL;
There's a faster route: add a CHECK constraint as NOT VALID, validate it without the heavy lock, then promote it (cheaply, on Postgres 12+).
ALTER TABLE orders
ADD CONSTRAINT orders_status_not_null
CHECK (status IS NOT NULL) NOT VALID;
ALTER TABLE orders
VALIDATE CONSTRAINT orders_status_not_null;
NOT VALID skips the existing-row check and takes only a brief lock. VALIDATE reads every row but holds just a SHARE UPDATE EXCLUSIVE lock, so concurrent reads and writes keep flowing. Once it passes, you can promote to a real NOT NULL (on Postgres 12+, SET NOT NULL sees the validated check and skips the scan, finishing instantly; older versions still scan the table under an ACCESS EXCLUSIVE lock):
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_status_not_null;
With indexes and constraints in place, flip the read path. In the application, change every WHERE payload->>'status' = 'paid' to WHERE status = 'paid'. Ship it behind a feature flag if you can; rollback is then a config change rather than a migration.
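The flag-guarded flip can be as small as choosing between two queries; a sketch (paid_orders_query and the flag plumbing are hypothetical — both branches return the same rows while dual-write holds):

```python
def paid_orders_query(use_typed_column):
    """Pick the read path behind a feature flag.

    While dual-write is live the two queries return identical rows,
    so flipping the flag back is a safe, instant rollback.
    """
    if use_typed_column:
        return "SELECT id FROM orders WHERE status = %s", ("paid",)
    return "SELECT id FROM orders WHERE payload->>'status' = %s", ("paid",)
```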
The 800ms scan that started this post becomes a 2ms index lookup the moment the new query path goes live.
Step 5: Drop the JSON column
Wait. Not next sprint, not next deploy. Wait long enough that you trust the new path under real load. A week of clean parity checks and zero rollbacks is a reasonable bar. Two weeks is better. The cost of waiting is a column on disk; the cost of dropping early is a missing field your billing job needed and nobody noticed in staging.
When you're ready, remove the dual-write code first. Then the trigger if you used one. Then the column.
DROP TRIGGER IF EXISTS orders_sync ON orders;
DROP FUNCTION IF EXISTS sync_orders_columns();
ALTER TABLE orders DROP COLUMN payload;
DROP COLUMN is metadata-only in Postgres. It marks the column dead and leaves the data on disk until the next VACUUM FULL or table rewrite. Fast, safe, reversible-ish (you can ALTER TABLE ... ADD COLUMN back, but the data is gone).
If payload had keys you didn't migrate, archive them first:
CREATE TABLE orders_payload_archive AS
SELECT id, payload, now() AS archived_at
FROM orders;
Cheap insurance. Drop the archive a quarter later when nobody has asked for it.
The hard part isn't the SQL. It's the discipline to leave the old column alone for two weeks while the new one earns its place.
If this was useful
Chapter 6 of the Database Playbook walks through schema-evolution patterns for Postgres, MySQL, and the document stores: when to model relationally, when jsonb actually pays off, and how to migrate between the two without taking the table offline. If your team is staring at a jsonb column that grew teeth, it's the chapter for you.
