We migrated our MySQL database. It “worked,” until it didn’t: the new DB’s size didn’t match the old one. Same schema, same rows—different footprint. That tiny mismatch pushed us to build a real migration playbook: understand what’s happening, prove data equality, and leave a paper trail that stakeholders actually trust.
Here’s the journey—told as we lived it—with commands, expected outputs, and the why behind each step.
Step 1 — Capture More Than Just Rows
Before moving data, we grabbed the database’s shape and logic. Otherwise you carry the data but lose the rules that make it behave.
Command
mysqldump -h OLD_HOST -u root -p --no-data --skip-triggers OLD_DB > schema_only.sql
mysqldump -h OLD_HOST -u root -p --no-data --no-create-info --routines --triggers --events OLD_DB > routines.sql
Expected Output
- `schema_only.sql` contains only `CREATE TABLE ...` statements (no `INSERT INTO`).
- `routines.sql` contains `CREATE PROCEDURE`, `CREATE FUNCTION`, `CREATE TRIGGER`, and `CREATE EVENT` statements.
- You’ll likely see `DEFINER=` clauses—note them (see the sketch below).
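If the target server doesn’t have the users named in those `DEFINER=` clauses, the restore can fail or leave objects unusable. One common workaround is to strip them before import—a minimal sketch, assuming mysqldump’s usual backtick-quoted `DEFINER=` syntax; review the output before loading:
sed -E 's/DEFINER=`[^`]+`@`[^`]+`//g' routines.sql > routines_nodefiner.sql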
Why this matters
- Schema/routine parity prevents silent logic drift (e.g., missing trigger = missing audit row).
- Pro tip: keep these files in source control for diffs across migrations.
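That tip pays off immediately after a restore: dump both schemas the same way and diff them. A sketch (file names are ours; expect harmless noise from `AUTO_INCREMENT` counters in table options):
mysqldump -h OLD_HOST -u root -p --no-data --skip-triggers OLD_DB > old_schema.sql
mysqldump -h NEW_HOST -u root -p --no-data --skip-triggers NEW_DB > new_schema.sql
diff old_schema.sql new_schema.sql   # empty output = structurally identical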
Step 2 — Dump & Restore (Safely and Predictably)
We wanted a consistent snapshot without table-level locks that freeze the app.
Command
mysqldump -h OLD_HOST -u root -p \
--single-transaction --quick \
--routines --triggers --events \
--default-character-set=utf8mb4 \
OLD_DB > old_db_dump.sql
mysql -h NEW_HOST -u root -p -e "CREATE DATABASE IF NOT EXISTS NEW_DB DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;"
mysql -h NEW_HOST -u root -p NEW_DB < old_db_dump.sql
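For large databases, compressing in flight cuts disk and transfer overhead. A sketch, assuming gzip is available on both ends:
mysqldump -h OLD_HOST -u root -p --single-transaction --quick \
  --routines --triggers --events --default-character-set=utf8mb4 \
  OLD_DB | gzip > old_db_dump.sql.gz
gunzip < old_db_dump.sql.gz | mysql -h NEW_HOST -u root -p NEW_DB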
Expected Output
- `old_db_dump.sql` is large and full of `INSERT INTO ...` lines.
- `SHOW TABLES FROM NEW_DB;` lists the same tables as OLD_DB.
- Critical tables pass a spot-check: `SELECT COUNT(*) FROM big_table;` (numbers line up or are close; exact checks later—see the generator sketch below).
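When spot checks aren’t enough, a generator can emit an exact `COUNT(*)` statement for every table—run the emitted SQL on both hosts and diff the results. A sketch:
SELECT CONCAT('SELECT ''', table_name, ''' AS tbl, COUNT(*) AS cnt FROM `', table_name, '`;')
FROM information_schema.tables
WHERE table_schema = 'NEW_DB' AND table_type = 'BASE TABLE';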
Why this matters
- `--single-transaction` gives a consistent InnoDB snapshot without blocking writers.
- `--quick` streams rows to keep memory flat.
Now we had data in place—but sizes looked off. Time to measure, then normalize.
Step 3 — Snapshot the Raw Footprint
We measured where space was going before any tuning.
Command
SELECT table_name,
ENGINE,
TABLE_ROWS, -- estimate for InnoDB
ROUND((data_length + index_length)/1024/1024, 2) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'NEW_DB'
ORDER BY size_mb DESC;
Expected Output (sample)
+------------+--------+-----------+---------+
| table_name | ENGINE | TABLE_ROWS| size_mb |
+------------+--------+-----------+---------+
| orders | InnoDB | 1000000 | 350.12 |
| users | InnoDB | 500000 | 80.45 |
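For the single headline number (whole-schema footprint), the same view rolls up—run this against both servers and compare. A sketch:
SELECT ROUND(SUM(data_length + index_length)/1024/1024, 2) AS total_mb
FROM information_schema.tables
WHERE table_schema = 'NEW_DB';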
Why this matters
- In InnoDB, `TABLE_ROWS` is an estimate; sizes reflect fragmentation and stale stats after a bulk load.
- Don’t conclude inequality yet—stats come next.
- The “aha” moment came when we ran `ANALYZE`/`OPTIMIZE`. Here’s the deeper why.
Step 4 — The Deep Dive: ANALYZE and OPTIMIZE (What They Do, and Why You Should Care)
After bulk inserts, InnoDB pages are fragmented and index statistics are stale. The optimizer can’t “see” reality, so it guesses—sometimes badly. Two tools fix that:
ANALYZE TABLE: refresh index statistics (cardinality)
- What it does: Re-samples index distributions and updates cardinality (estimated unique values per index).
- Why it matters: The optimizer chooses join order and index paths based largely on cardinality. Bad cardinality → bad plans.
- Where it lives: With `innodb_stats_persistent=ON` (the default in MySQL 8), stats are stored persistently and survive restarts.
- Histograms: MySQL 8 supports column histograms to model non-indexed predicates:
ANALYZE TABLE my_table UPDATE HISTOGRAM ON col1, col2 WITH 128 BUCKETS;
Check with:
SELECT * FROM information_schema.COLUMN_STATISTICS
WHERE SCHEMA_NAME='NEW_DB' AND TABLE_NAME='my_table';
(If the query returns nothing, confirm you’re on MySQL 8.0+—`COLUMN_STATISTICS` doesn’t exist in 5.7—and that the histogram statement actually ran.)
OPTIMIZE TABLE: rebuild and defragment
- What it does: For InnoDB, effectively rebuilds the table and its indexes (similar to `ALTER TABLE ... ENGINE=InnoDB`), compacting pages and reclaiming space.
- Why it matters: You get a cleaner on-disk layout, tighter B-trees, and often a smaller file size that more closely resembles your source DB.
- Locking/perf: On large tables, it’s heavy. In MySQL 8, many operations are in-place or “instant,” but plan it off-hours for big tables.
Commands
-- For a single table
ANALYZE TABLE my_table;
OPTIMIZE TABLE my_table;
-- Batch all tables
SELECT CONCAT('ANALYZE TABLE `', table_name, '`; OPTIMIZE TABLE `', table_name, '`;')
FROM information_schema.tables
WHERE table_schema='NEW_DB';
(Pipe the generator into mysql to execute in one go.)
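Concretely, that pipe looks like this—a sketch; `-N -B` strip column headers so the generated SQL flows straight into the second client:
# Prompts for the password twice (once for the generator, once for execution)
mysql -h NEW_HOST -u root -p -N -B -e "
  SELECT CONCAT('ANALYZE TABLE \`', table_name, '\`; OPTIMIZE TABLE \`', table_name, '\`;')
  FROM information_schema.tables WHERE table_schema='NEW_DB';
" | mysql -h NEW_HOST -u root -p NEW_DB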
Expected Output
+------------------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------------+----------+----------+----------+
| NEW_DB.my_table | analyze | status | OK |
| NEW_DB.my_table | optimize | status | OK |
Verifying Cardinality Improved
SHOW INDEX FROM my_table;
Expected Output (excerpt)
+----------+------------+----------+--------------+------------+
| Table | Non_unique | Key_name | Column_name | Cardinality|
+----------+------------+----------+--------------+------------+
| my_table | 0 | PRIMARY | id | 999800 |
| my_table | 1 | idx_cust | customer_id | 54012 |
- After `ANALYZE`, Cardinality should look realistic (not suspiciously tiny like 1 or 2 on huge tables).
- If predicates rely on non-indexed columns, consider histograms (above). They don’t change cardinality, but they drastically improve selectivity estimates for those columns.
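To confirm the refreshed stats actually changed the optimizer’s mind, compare plans before and after. A sketch, using `idx_cust`/`customer_id` from the excerpt above:
EXPLAIN SELECT * FROM my_table WHERE customer_id = 42;
-- After ANALYZE, the key column should show idx_cust, and the rows estimate
-- should be plausible (roughly table rows / index cardinality)—not a full
-- scan with key = NULL.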
With stats refreshed and fragmentation reduced, our sizes converged. But “looks good” isn’t enough—we wanted proofs.
Step 5 — Prove Equality with Percona Toolkit (and More You Can Do)
We rely on Percona Toolkit because it’s built for production-grade checks.
pt-table-checksum: detect row-level differences
- How it works: Splits each table into chunks (ranges by PK), computes checksums (CRC32) per chunk on the source, then compares on the target (best with replication; otherwise compare results tables).
- Why it’s great: It scales. Chunking keeps each query small, so you get a precise answer without one monolithic, locking full-table scan.
Command (typical replication setup)
pt-table-checksum \
--host=PRIMARY_HOST --user=USER --password=PASS \
--databases NEW_DB \
--replicate=percona.checksums
Expected Output
TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE
... 0 0 1000000 0 20 0 5.3 NEW_DB.orders
- DIFFS = 0 across all rows/tables = ✅
No replication? Two independent servers?
- Option A: Run pt-table-checksum on both and diff the percona.checksums tables (see the sketch after this list).
- Option B: Use pt-table-sync directly to compare and optionally fix.
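For Option A, one low-tech approach is to export the per-chunk checksums from each server in a stable order and diff the files. A sketch—assumes pt-table-checksum wrote to percona.checksums on both sides, and note the chunk boundaries must line up between runs for the comparison to be meaningful:
mysql -h OLD_HOST -u USER -p -N -B -e \
  "SELECT db, tbl, chunk, this_crc, this_cnt FROM percona.checksums ORDER BY db, tbl, chunk;" > old_sums.tsv
mysql -h NEW_HOST -u USER -p -N -B -e \
  "SELECT db, tbl, chunk, this_crc, this_cnt FROM percona.checksums ORDER BY db, tbl, chunk;" > new_sums.tsv
diff old_sums.tsv new_sums.tsv   # empty output = chunks match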
pt-table-sync: generate the minimal fix
pt-table-sync \
  --print \
  h=OLD_HOST,u=USER,p=PASS,D=OLD_DB \
  h=NEW_HOST,u=USER,p=PASS,D=NEW_DB
Expected Output
- SQL INSERT/UPDATE/DELETE statements (and execution if `--execute`).
- Use `--print` first, review, then add `--execute`.
More Percona tools (worth having on every migration)
- `pt-query-digest`: Analyze slow logs/traces to find the worst queries (post-migration).
- `pt-duplicate-key-checker`: Identify redundant/overlapping indexes before/after migration.
- `pt-index-usage`: See which indexes aren’t used (on a sampled workload).
- `pt-online-schema-change`: Safer online DDL for big tables without long lock times.
Data proven equal, we closed the loop on objects and behavior.
Step 6 — Don’t Forget the “Invisible” Pieces
Missing triggers or a changed collation can pass unnoticed—until a bug report lands.
Commands
SHOW TRIGGERS FROM NEW_DB;
SHOW PROCEDURE STATUS WHERE Db='NEW_DB';
SHOW FUNCTION STATUS WHERE Db='NEW_DB';
SELECT table_name, constraint_name, referenced_table_name
FROM information_schema.key_column_usage
WHERE table_schema='NEW_DB' AND referenced_table_name IS NOT NULL;
SELECT default_character_set_name, default_collation_name
FROM information_schema.schemata
WHERE schema_name='NEW_DB';
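To turn “counts match” into a single comparable result set, a roll-up like this can run on both hosts and be diffed. A sketch:
SELECT 'triggers' AS object_type, COUNT(*) AS cnt
FROM information_schema.triggers WHERE trigger_schema = 'NEW_DB'
UNION ALL
SELECT 'procedures', COUNT(*)
FROM information_schema.routines WHERE routine_schema = 'NEW_DB' AND routine_type = 'PROCEDURE'
UNION ALL
SELECT 'functions', COUNT(*)
FROM information_schema.routines WHERE routine_schema = 'NEW_DB' AND routine_type = 'FUNCTION';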
Expected Output
- Counts and names match OLD_DB for triggers/procs/functions.
- Foreign keys listed as expected.
- Charset/collation align with app expectations (e.g., utf8mb4_0900_ai_ci).
Why this matters
- A single missing trigger can silently break invariants (e.g., stock, audit, denormalized totals).
With the database validated, we packaged results for non-DBA stakeholders.
Step 7 — Report in Plain Language
Engineers love logs; stakeholders love summaries.
Stakeholder Table (example)
| Item | Old DB | New DB | Status |
|---|---|---|---|
| Row Count (orders) | 1,000,000 | 1,000,000 | ✅ Equal |
| Index Count (orders) | 5 | 5 | ✅ Equal |
| Size (MB) | 350 | 348 | ⚠ 0.6% diff (≤5% = OK) |
| Checksums (Percona) | Match | Match | ✅ All chunks DIFFS=0 |
| Triggers/Procedures | 2/3 | 2/3 | ✅ Parity |
| Collation/Charset | utf8mb4/… | utf8mb4/… | ✅ Match |
Expected Output
- Clear pass/fail with small, explained variances.
- A link to the raw validation logs for auditors (appendix).
Migration doesn’t end at restore; we watch for regressions.
Step 8 — Watch the System Breathe (Post-Migration)
We enabled the slow log to catch new bad plans caused by fresh stats.
Commands
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SHOW VARIABLES LIKE 'slow_query_log%';
Expected Output
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| slow_query_log | ON |
Why this matters
- `ANALYZE` can change plans; 1–2 weeks of vigilance pays off.
- Pair with `pt-query-digest` to summarize hotspots quickly (example below).
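A typical invocation—a sketch; the slow-log path is hypothetical and depends on your server config:
pt-query-digest /var/log/mysql/mysql-slow.log > digest_report.txt
# The report ranks queries by total time; start tuning from rank #1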
⸻
What We’ll Keep Doing Next Time
- Run `ANALYZE` (and histograms when needed) to fix cardinality → good plans.
- Run `OPTIMIZE` on big movers to defragment and align sizes.
- Use Percona for truth (`pt-table-checksum` to detect, `pt-table-sync` to fix).
- Validate the “invisibles” (triggers, procs, FKs, collation).
- Report with thresholds (e.g., size diff ≤5% = acceptable).
- Monitor for 1–2 weeks to catch plan regressions early.
⸻
Appendix — Quick “Pass/Fail” Thresholds We Use
- Row count (critical tables): 100% match → PASS
- Percona checksum: DIFFS=0 for all chunks → PASS
- Size variance (post-OPTIMIZE): ≤5% → PASS
- Cardinality sanity (via SHOW INDEX): no suspiciously tiny values on high-cardinality columns → PASS
- Objects (triggers/procs/functions/FKs): full parity → PASS