Testing Database Migrations Safely with a Shadow Schema
Testing Database Migrations Safely with a Shadow Schema
Database migrations are one of the easiest places to introduce production risk, because a change can look correct in code but still fail on real data, indexes, locks, or rollback behavior. A shadow schema lets you test the migration against a copy of your structure and a representative slice of data before you run it on the live database.
Why shadow schema testing helps
A shadow schema is a separate database schema, database, or temporary environment that mirrors production enough to exercise your migration logic. It is useful because it reveals issues that unit tests miss, such as slow index creation, incompatible column types, unexpected NULL values, and data backfill failures.
This approach is especially valuable when a migration includes both schema changes and data transformation. It gives you a repeatable place to validate forward and backward migration behavior without waiting for a production incident.
What you should test
A good migration test suite checks more than “does the script run.” It should verify that the migration preserves data integrity, completes within acceptable time, and behaves correctly when reversed.
Focus on these checks:
- The migration applies cleanly to a production-like snapshot.
- Existing rows still satisfy required constraints after the change.
- Backfills produce the expected values.
- Queries used by the application still work after the migration.
- The rollback path restores the prior schema or leaves the system in a safe state.
A practical workflow
Use a three-phase process:
- Clone the schema into a shadow database.
- Load a representative dataset, ideally including edge cases.
- Run the migration and validate both schema and data expectations.
For small teams, this can be done in CI with a disposable PostgreSQL container. For larger systems, it can run as a separate pipeline stage before deployment.
Example migration
Imagine you want to split a single name field into first_name and last_name.
ALTER TABLE users ADD COLUMN first_name text;
ALTER TABLE users ADD COLUMN last_name text;
UPDATE users
SET first_name = split_part(name, ' ', 1),
last_name = NULLIF(split_part(name, ' ', 2), '');
ALTER TABLE users ALTER COLUMN first_name SET NOT NULL;
This looks simple, but it hides risk. Names with only one word, middle names, or extra spaces may not behave as expected. A shadow schema test helps you catch those cases before deployment.
A test harness in Python
Here is a lightweight example using Python and PostgreSQL to run the migration and assert results.
import psycopg2
conn = psycopg2.connect("dbname=shadow_db user=app password=secret host=localhost")
conn.autocommit = True
cur = conn.cursor()
migration_sql = """
ALTER TABLE users ADD COLUMN first_name text;
ALTER TABLE users ADD COLUMN last_name text;
UPDATE users
SET first_name = split_part(trim(name), ' ', 1),
last_name = NULLIF(split_part(trim(name), ' ', 2), '');
ALTER TABLE users ALTER COLUMN first_name SET NOT NULL;
"""
cur.execute(migration_sql)
cur.execute("""
SELECT id, name, first_name, last_name
FROM users
ORDER BY id
""")
rows = cur.fetchall()
assert rows == "Ada"
assert rows == "Grace"
The point is not just to run SQL. The point is to encode the expectations that matter to your application.
Add data assertions
Schema checks alone are not enough. You should also verify that the migrated data makes sense.
Examples:
- Count rows before and after the migration.
- Check that required fields are never null.
- Confirm that derived values match source data.
- Validate foreign keys still resolve correctly.
- Compare checksum-like aggregates when a full row-by-row comparison is too expensive.
A useful pattern is to calculate a small set of invariants that should remain true after every run.
SELECT
COUNT(*) AS total_users,
COUNT(first_name) AS named_users,
COUNT(*) FILTER (WHERE first_name IS NULL) AS null_first_names
FROM users;
If the migration is supposed to populate every row, null_first_names should be zero.
Test rollback behavior
Migration testing is incomplete if you only verify the “up” path. You also need to know what happens if deployment fails halfway through or if you must revert.
For reversible migrations:
- Create a matching down migration.
- Apply the up migration in the shadow schema.
- Run the down migration.
- Confirm the original schema is restored and key data is still usable.
For irreversible migrations:
- Document that the rollback is operational, not structural.
- Test that the service can fail safely if the deploy is stopped mid-migration.
- Prefer additive changes first, then remove old columns in a later release.
That staged approach reduces pressure on a single risky step.
Handle large tables carefully
Large tables need special attention because a migration can lock rows or run long enough to affect availability. Shadow schema tests should include realistic row counts and index sizes, not just a tiny fixture dataset.
Watch for:
- Full table rewrites caused by certain
ALTER TABLEoperations. - Long-running updates that hold locks.
- Index builds that compete with production traffic.
- Backfills that need batching instead of one giant update.
For example, update in chunks:
UPDATE users
SET first_name = split_part(trim(name), ' ', 1)
WHERE first_name IS NULL
AND id BETWEEN 1 AND 10000;
Then repeat in batches until the table is complete. This is slower to implement, but much safer at scale.
Put it in CI
The most reliable migration testing is automated. Add a pipeline job that spins up a fresh database, loads fixture data, runs the migration, and validates the expected results.
A typical CI sequence is:
- Start a database container.
- Apply the base schema.
- Load sample production-like data.
- Run the migration.
- Execute assertions.
- Tear everything down.
If the migration fails in CI, it should fail the build. That makes the safety check part of the delivery process instead of an optional manual step.
Common mistakes
Teams usually run into the same issues:
- Testing only empty databases.
- Using toy data that misses edge cases.
- Forgetting to test rollback.
- Assuming a migration is safe because it passed locally.
- Skipping performance checks on large tables.
The fix is to make the shadow schema resemble production as closely as is practical, especially in data shape and volume.
A simple checklist
Before shipping a migration, confirm:
- The migration runs from scratch on a shadow database.
- It passes with representative data.
- Key invariants still hold.
- Rollback has been tested or consciously documented as unavailable.
- The migration time and locking behavior are acceptable.
If all five are true, you have a much better chance of a safe production rollout.
Final shape
Shadow schema testing gives you a controlled place to prove that a migration works before real users depend on it. It is one of the most practical quality assurance techniques for teams that change databases regularly, because it catches structural, data, and operational failures together.
The best migrations are not just correct; they are testable, repeatable, and boring to deploy.
-
Rizwan Saleem | https://rizwansaleem.co
Top comments (0)