DEV Community

Mafiree
Mafiree

Posted on

MySQL Schema Migration Without Downtime: A Real Fintech Case Study


Making schema changes on very large MySQL tables can easily interrupt production systems if done with standard DDL (Data Definition Language) operations, because those traditionally block reads and writes for the duration of the change. The blog presents a real-world case study of how Mafiree assisted a fintech customer to evolve their MySQL schema on a table holding over 500 million rows without any downtime — ensuring the application remained fully available throughout the migration process.
At a high level, the case study walks through the technical hurdles encountered, the three-phase technical strategy employed, the tools chosen for each phase, key configuration and performance considerations, and best practices that helped the team carry out the migration safely and reliably.
**

Why Zero Downtime Matters and the Challenge

**
When schema changes are applied directly to large MySQL tables, operations like adding columns, modifying indexes, or changing storage parameters typically require rebuilding the table. A full rebuild locks the table, which in high-traffic environments causes service interruptions — reads and writes can be blocked for minutes to hours, depending on size and load. In mission-critical systems, especially in fintech where transaction throughput and uptime are essential, even short interruptions can be unacceptable.
Traditional ALTER TABLE commands without special handling may put heavy locks and pause database operations while the table is rebuilt. This can seriously degrade user experience and disrupt ongoing transactions, particularly with production databases holding hundreds of millions of rows.
**

The Three-Phase Migration Strategy

**
To avoid blocking production traffic, the migration was structured into three distinct phases, each leveraging different tools and MySQL capabilities:
1. Native MySQL 8.0 Online DDL for Simple Changes
The first phase examined whether the desired schema changes were supported natively by MySQL 8.0 using INSTANT or INPLACE DDL operations.
MySQL 8.0 supports several online DDL operations that modify metadata without requiring a full table rebuild, meaning they complete quickly with minimal locking.

For changes that MySQL can handle in place, these operations were applied first because they avoid overhead entirely.

However, some schema changes — particularly those that fundamentally alter row layout or require rebuilding the table — are not supported as online native DDL. For those, external tools were necessary.
2. gh-ost for Large, Complex Alterations
For the more complex schema updates that MySQL native DDL cannot perform without locks, the team used gh-ost (GitHub Online Schema Transmogrifier):
gh-ost works by creating a shadow copy of the target table and incrementally copying data from the original table into the shadow table.

It listens to MySQL’s binary log to capture ongoing changes on the original table and applies these changes to the shadow copy.

Once the shadow copy is fully in sync with active production traffic, gh-ost performs a coordinated swap of table names in a way that incurs only a very brief lock — typically short enough that users do not notice any interruption.

This tool is well suited for extremely large tables where operations need to run without blocking writes or degrading performance drastically.
3. pt-online-schema-change for Compatibility and Specific Cases
In addition to gh-ost, the team also used pt-online-schema-change from the Percona Toolkit:
This tool also creates a shadow table, then mirrors insert/update/delete operations with triggers while copying data in the background.

It works across a broader range of MySQL versions and has strong compatibility with foreign keys, making it useful when certain constraints or environments wouldn’t easily support gh-ost.

By combining gh-ost, pt-online-schema-change, and native DDL, the team could flexibly choose the best approach for each kind of schema change, balancing safety, compatibility, and performance.
**

Performance, Monitoring, and Benchmarks

**
The case study also emphasizes the importance of configuration tuning and observability during a zero-downtime migration:
In the real migration, the 500M-row table transformation with conservative gh-ost throttling (nice-ratio = 0.5) completed in ~4.5 hours.

If the same migration were run with no throttling on dedicated hardware, it could complete in 1.5–2 hours, though aggressive settings are rarely safe in production.

Essential metrics to watch include replication lag across replicas, InnoDB buffer pool hit ratio, disk I/O load, query latency at the p95/p99 percentiles, and job progress outputs from the tools being used.

If any metric shows stress beyond acceptable thresholds, the migration pause mechanisms in gh-ost and pt-online-schema-change can be used to throttle or pause operations, protecting production workloads.

This surveillance and tuning help to maintain a smooth user experience throughout the change.
**

Best Practices and Operational Advice

**
From the case study’s experience, the following practices emerge as key parts of a successful zero-downtime migration:
Choose the Right Tool for the Job: Use MySQL’s native online DDL where possible. Use gh-ost for heavy, binlog-driven copies, and pt-online-schema-change for compatibility with older environments or foreign keys.

Benchmark and Throttle: Run realistic staging tests and use throttling parameters to balance speed against server load.

Monitor Continuously: Keep an eye on replication lag, buffer pools, I/O, and latency. React early if these start trending poorly.

Plan for Rollback: Even with careful planning, be prepared to abort or revert if issues arise that threaten availability or data stability.

**

Conclusion

**
The Mafiree case study demonstrates that even very large MySQL schema changes can be executed without affecting production uptime, as long as the migration plan is thoughtful and leverages the right mix of native capabilities and specialized tools. By combining MySQL 8.0’s online DDL, gh-ost, and pt-online-schema-change, and by closely monitoring performance and progress indicators, teams can evolve complex database schemas while maintaining seamless service availability.
Ready to unlock MySQL Schema Migration Without Downtime: A Real Fintech Case Study? Dive into the full blog:https://www.mafiree.com/blog/mysql-schema-migration-zero-downtime-case-study

Top comments (0)