The naive version of "add a NOT NULL column to an existing Postgres table" is one of the classic ways to lock production for an extended window. The migration looks innocent. The downtime is real.
This is the step-by-step for doing it safely, the reason each step matters, and the Postgres version differences that change the answer.
Photo by Massimo Botturi on Unsplash
Step 0: Know Your Postgres Version
The single biggest factor in how risky this migration is, is the Postgres major version. The behavior of ALTER TABLE ... ADD COLUMN with a default value changed meaningfully in Postgres 11.
Before Postgres 11: Adding a column with any default value rewrites the entire table. The rewrite takes an ACCESS EXCLUSIVE lock for the duration. On a multi-million-row table, this is minutes of effective downtime.
Postgres 11 and later: Adding a column with a constant default does not rewrite the table. The default is stored in the catalog and applied to existing rows virtually. The operation is fast and takes a brief lock only.
Confirm your version with SELECT version(). If you are on 11 or later, the rest of this gets simpler. If you are on 10 or earlier, you need the full multi-step pattern.
The behavior is documented in the Postgres 11 release notes, and the broader rules are in the ALTER TABLE documentation.
The Safe Pattern (Works on All Versions)
The multi-step pattern that works regardless of version. Each step is one deploy or one separate operation. None of them holds a long lock.
Step 1: Add the Column as Nullable
ALTER TABLE users ADD COLUMN email_verified BOOLEAN;
No default. No NOT NULL. The column exists. Existing rows have NULL in it. The operation is a metadata-only change in modern Postgres versions and takes a brief lock.
This is the cheapest possible schema change. Run it whenever you want.
Step 2: Update the Application to Write the New Column
Deploy a version of the application code that writes a value to email_verified on every insert and on relevant updates. New rows from this point forward have a value. Old rows still have NULL.
If you are dual-writing to support a rollback, also keep writing to whatever column or system the new column is replacing. The dual-write stage of the broader migration playbook covers this.
Step 3: Backfill the Existing Rows in Batches
Update the existing NULL rows in batches. The batched script pattern:
DO $$
DECLARE
batch_size INT := 5000;
last_id BIGINT := 0;
max_id BIGINT;
rows_updated INT;
BEGIN
SELECT MAX(id) INTO max_id FROM users;
WHILE last_id < max_id LOOP
UPDATE users
SET email_verified = false
WHERE id > last_id AND id <= last_id + batch_size
AND email_verified IS NULL;
GET DIAGNOSTICS rows_updated = ROW_COUNT;
last_id := last_id + batch_size;
PERFORM pg_sleep(0.05);
END LOOP;
END $$;
In practice, this lives in a Python or Ruby script with progress tracking and replication-lag monitoring, not in a PL/pgSQL block. But the shape of the operation is the same: bounded range scan on the primary key, small sleep between batches, idempotent update.
The reason for the sleep: it gives the WAL stream a chance to catch up to read replicas and lets concurrent workloads get a fair share of the database's attention.
Step 4: Add the NOT NULL Constraint With NOT VALID
ALTER TABLE users
ADD CONSTRAINT users_email_verified_not_null
CHECK (email_verified IS NOT NULL) NOT VALID;
NOT VALID adds the constraint without scanning the existing data. The operation is fast and takes a brief lock. New rows are required to have a non-NULL value.
The constraint is technically a check constraint, not a NOT NULL column attribute. The behavior is similar (new rows must have a value) but the catalog representation is different. The next steps convert it to a proper NOT NULL attribute.
Step 5: Validate the Constraint
ALTER TABLE users
VALIDATE CONSTRAINT users_email_verified_not_null;
The validate step scans the table to confirm every row satisfies the constraint. It takes a SHARE UPDATE EXCLUSIVE lock, which allows reads and writes but blocks other schema changes. This is the longest operation in the sequence on a large table, but it is concurrent with normal traffic.
If the validation fails, it means the backfill missed rows. Find them, backfill them, re-run the validate.
Step 6: Convert the Constraint to a Column NOT NULL Attribute
This step is optional but cleaner. In Postgres 12 and later, once the CHECK constraint is validated, you can add the NOT NULL attribute to the column without a full table scan:
ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
Postgres uses the existing validated CHECK constraint to satisfy the requirement, so this takes a brief lock instead of a full scan. After that, you can drop the CHECK constraint:
ALTER TABLE users DROP CONSTRAINT users_email_verified_not_null;
On Postgres 11 and earlier, the SET NOT NULL step requires a full table scan. The safer alternative on older versions is to keep the CHECK constraint instead of converting.

Photo by Negative Space on Pexels
What Goes Wrong If You Skip Steps
Skipping the nullable add and trying to do it with a default in one shot, on an older Postgres version. Full table rewrite, ACCESS EXCLUSIVE lock for the duration, production downtime.
Skipping the application dual-write before the backfill. New rows arrive during the backfill with NULL in the new column. The backfill thinks it is done. The validate step fails on the rows that arrived after the backfill started.
Skipping the NOT VALID constraint approach and going straight to SET NOT NULL. Full table scan with ACCESS EXCLUSIVE on older versions. Extended lock.
Skipping the batched backfill and using a single UPDATE. Long lock, WAL flood, replication lag spike. The longer the table, the worse the symptoms.
Each of these is a real production incident pattern. The defensive process is to assume every step matters until proven otherwise on your specific version.
A Note on Default Values
If the new column has a sensible default (false, 0, empty string), include it in the backfill. The application code that writes the new column should also write the default for new rows.
If you want the database to enforce the default for new rows so application bugs do not write NULL, add the default in a separate ALTER:
ALTER TABLE users ALTER COLUMN email_verified SET DEFAULT false;
This is a metadata-only change in modern Postgres versions. It does not rewrite the table. It only affects rows inserted without an explicit value going forward.
Putting It Together
The whole sequence on a modern Postgres version:
- Add nullable column (cheap).
- Deploy app code that dual-writes.
- Backfill in batches.
- Add NOT NULL CHECK with NOT VALID (cheap).
- Validate the constraint (long but concurrent).
- Convert to column NOT NULL attribute (cheap).
- Drop the CHECK constraint.
Seven operations, none of them holding a long exclusive lock. The total wall-clock time is dominated by the backfill in step 3 and the validate in step 5, both of which run concurrent with production traffic.
The broader migration playbook this fits into (the dual-write stage, the gradual cutover, the contract step) is covered at https://137foundry.com/articles/how-to-plan-zero-downtime-database-schema-migration. The free 137Foundry web development overview at the services hub covers the broader engineering practice of putting these patterns in place, including the operational scaffolding around backfills and gradual rollouts.
For Postgres-specific deep dives on lock behavior, the pgexperts blog and PostgresWeekly newsletter archive are both good references that cover the version-by-version evolution of these operations.
The Single Takeaway
The migration that fails badly is the one that tries to do all of this in one statement. The migration that ships safely is the one that takes seven small, well-understood steps. The total engineering time is similar; the operational risk profile is wildly different.
The shape of database migration work in production is mostly choosing the boring path that does not hold long locks. Once that habit is routine, the migrations stop being interesting. Boring migrations are the goal.
Top comments (0)