The Day I Watched MySQL Alter 47 Million Rows in 0.8 Seconds (And Why Your DDL Operations Are Probably Still Stuck in 2015)
Or: How MySQL 8.4's Instant DDL Made Me Question Everything I Knew About Schema Changes
Let me tell you about the most anticlimactic moment of my career.
We'd been planning "The Great Schema Migration" for months. Our product table had grown to 47 million rows. We needed to add three new columns. The DBA calculated it would take 6 hours of downtime. We scheduled it for Sunday at 2 AM. We wrote a 47-page runbook. We had rollback plans A through F.
I executed the ALTER TABLE command, held my breath, and...
Query OK, 0 rows affected (0.82 sec)
I thought it was a joke. A bug. A hallucination from too much coffee. But no—MySQL 8.4's instant DDL had just casually added three columns to 47 million rows faster than I could blink.
That moment changed how I think about database operations forever.
The Restaurant Kitchen Metaphor
Think of your database as a busy restaurant kitchen during dinner rush.
In the old days (pre-MySQL 8.0), changing the menu meant shutting down the entire kitchen, rebuilding every dish from scratch, and only reopening when everything was perfect. Want to add "gluten-free option" to your lasagna? Sorry, kitchen's closed for 6 hours while we remake every single lasagna in inventory.
MySQL 8.4's instant DDL is like having a head chef who can update the recipe cards without touching a single dish that's already been made. The kitchen never closes. Diners never know anything changed.
But here's the plot twist: Most developers still operate like it's 2015, shutting down their kitchens for changes that could happen instantly.
The Three DDL Operations That Changed My Life
Operation 1: The Instant Column Addition (That Shouldn't Be Possible)
Here's what blew my mind:
-- This used to be a 6-hour maintenance window
-- Now it's literally instant
ALTER TABLE products
ADD COLUMN sustainability_rating INT DEFAULT 0,
ADD COLUMN carbon_footprint DECIMAL(10,2) DEFAULT 0.00,
ADD COLUMN supplier_certification VARCHAR(100) DEFAULT 'pending',
ALGORITHM=INSTANT;
-- Query OK, 0 rows affected (0.04 sec)
-- Records: 0 Duplicates: 0 Warnings: 0
Wait, how?
MySQL 8.4 doesn't rewrite the table. Instead, it records the new column definition in the data dictionary and fills in the default value on the fly whenever it reads rows written before the change:
┌─────────────────────────────────────────────────────┐
│ Traditional ALTER (MySQL 5.7) │
├─────────────────────────────────────────────────────┤
│ │
│ 1. Create temp table with new schema │
│ 2. Copy ALL 47M rows → temp table [6 hours] │
│ 3. Build ALL indexes → temp table [2 hours] │
│ 4. Rename temp → original [locks!] │
│ │
│ Total: 8 hours + extended locking period │
│ │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ Instant ALTER (MySQL 8.4) │
├─────────────────────────────────────────────────────┤
│ │
│ 1. Update data dictionary metadata [0.04 sec] │
│ 2. That's it. Seriously. │
│ │
│ When reading rows: │
│ - Check metadata version │
│ - Apply defaults for "new" columns on-the-fly │
│ - Return complete row to application │
│ │
└─────────────────────────────────────────────────────┘
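You can actually see this bookkeeping. On the 8.x builds I've used, information_schema.INNODB_TABLES exposes counters for instant changes, so a quick sanity check that your ALTER really was metadata-only looks something like this (column names have shifted a bit between minor versions, so treat it as a sketch):
SELECT NAME, N_COLS, INSTANT_COLS, TOTAL_ROW_VERSIONS
FROM information_schema.INNODB_TABLES
WHERE NAME LIKE '%products%';
-- TOTAL_ROW_VERSIONS ticks up by one per instant ALTER;
-- the .ibd file size barely moves.
-- There is a cap (64 row versions, as I understand it) before
-- InnoDB insists on a real rebuild (OPTIMIZE TABLE resets it).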
The catch? Not everything is instant. Here's the compatibility matrix:
-- ✅ INSTANT (sub-second)
ADD COLUMN (with DEFAULT, nullable or not, any position since 8.0.29)
DROP COLUMN
MODIFY COLUMN DEFAULT
RENAME COLUMN
ADD VIRTUAL COLUMN
-- ⚠️ INPLACE (minutes to hours, no table copy, concurrent DML allowed)
ADD INDEX
DROP INDEX
MODIFY COLUMN (same type, different attributes)
CHANGE COLUMN (rename + attribute tweaks)
MODIFY COLUMN ... FIRST/AFTER (reorder an existing column; rebuilds in place)
CHANGE ROW_FORMAT (rebuilds in place)
-- ❌ COPY (hours to days, full table rebuild, blocks concurrent writes)
MODIFY COLUMN (type change: VARCHAR→INT)
CHANGE CHARSET (when existing data must be converted)
DROP PRIMARY KEY (without adding a new one in the same statement)
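Not sure which bucket your specific change lands in? Ask for the cheapest algorithm explicitly and let MySQL refuse up front instead of silently falling back to a table copy. The error text below is paraphrased from memory, but the shape is right:
ALTER TABLE products
MODIFY COLUMN supplier_certification TEXT,
ALGORITHM=INSTANT;
-- ERROR 1846 (0A000): ALGORITHM=INSTANT is not supported.
-- Reason: Cannot change column type INSTANTLY. Try ALGORITHM=COPY/INPLACE.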
Operation 2: The Index That Built Itself While Serving Traffic
Production scenario: Your queries are slow. You need an index. But you have 500 concurrent users and can't afford downtime.
Old me would've scheduled maintenance. New me does this:
-- Create index WITHOUT blocking writes
CREATE INDEX idx_user_registration_date
ON users(registration_date, country_code)
ALGORITHM=INPLACE,
LOCK=NONE;
-- Monitor progress in real-time
SELECT
WORK_COMPLETED,
WORK_ESTIMATED,
ROUND(100 * WORK_COMPLETED / WORK_ESTIMATED, 2) AS pct_complete,
CONCAT(
ROUND(TIMER_WAIT / 1e12),  -- TIMER_WAIT is in picoseconds
' seconds elapsed'
) AS time_spent
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE 'stage/innodb/alter%';
Output:
+----------------+----------------+--------------+--------------------+
| WORK_COMPLETED | WORK_ESTIMATED | pct_complete | time_spent |
+----------------+----------------+--------------+--------------------+
| 23450000 | 47000000 | 49.89 | 127 seconds elapsed|
+----------------+----------------+--------------+--------------------+
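One gotcha: that progress query returns nothing unless the ALTER stage instruments and the stage consumers are switched on in the Performance Schema. If it comes back empty, flip them on first (this is the standard setup-table dance; adjust to taste):
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'stage/innodb/alter%';

UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE '%stages%';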
What's happening under the hood?
Stage 1: Read Rows & Build Sort Buffer
↓
[████████████░░░░░░░░] 60%
↓
Application still writing!
INSERT/UPDATE/DELETE work normally
Changes logged to "online log buffer"
↓
Stage 2: Sort & Write Index
↓
[████████████████░░░░] 80%
↓
Stage 3: Apply Online Log (brief lock)
↓
[████████████████████] 100%
↓
Index ready!
The beautiful part? Your application never knew. Zero downtime. Zero angry users.
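One caveat before you bet the farm on LOCK=NONE: all of those concurrent writes pile up in that online log buffer, and if it overflows, the whole DDL aborts and you start over. The ceiling is innodb_online_alter_log_max_size (128MB by default). Raising it before a long index build on a write-heavy table is cheap insurance; if your version rejects SET GLOBAL for it, set it in my.cnf instead:
SHOW VARIABLES LIKE 'innodb_online_alter_log_max_size';
-- 134217728 (128MB default)

SET GLOBAL innodb_online_alter_log_max_size = 1073741824; -- 1GB

-- If it overflows mid-build, expect something like:
-- ERROR 1799: Creating index ... required more than
-- 'innodb_online_alter_log_max_size' bytes of modification log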
Operation 3: The Partition Drop That Scared Me (Until It Didn't)
We had a logging table with 2 years of data partitioned by month. We needed to drop old partitions. My hands were literally shaking:
-- Dropping 12 months of data (240 million rows)
-- Expected: Server freeze, angry Slack messages, resume polishing
ALTER TABLE access_logs
DROP PARTITION p202301, p202302, p202303, p202304,
p202305, p202306, p202307, p202308,
p202309, p202310, p202311, p202312;
-- Query OK, 0 rows affected (1.23 sec)
1.23 seconds to drop 240 million rows.
The secret? InnoDB doesn't actually delete rows. It just unlinks the partition files and marks them for deletion. The OS handles cleanup in the background:
# What actually happens
$ ls -lh /var/lib/mysql/production/access_logs*
# Before DROP PARTITION
-rw-r----- 1 mysql mysql 18G access_logs#p#p202301.ibd
-rw-r----- 1 mysql mysql 19G access_logs#p#p202302.ibd
# After DROP PARTITION (immediately)
# Files disappear from directory listing
# Space reclaimed by filesystem
# No massive DELETE operation
# No undo log explosion
# No binary log bloat
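Before dropping anything, eyeball what each partition actually holds. information_schema.PARTITIONS gives per-partition row estimates and on-disk size (TABLE_ROWS is an estimate, not an exact count):
SELECT
    PARTITION_NAME,
    TABLE_ROWS,
    ROUND(DATA_LENGTH / 1024 / 1024 / 1024, 1) AS data_gb
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'production'
  AND TABLE_NAME = 'access_logs'
ORDER BY PARTITION_NAME;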
The Query Optimization Detective Story
Now let's talk about the queries themselves. Because DDL speed doesn't matter if your SELECT statements are garbage.
The Mystery of the 47-Second Query
A developer came to me: "This query takes 47 seconds. Help."
-- The problematic query
SELECT
u.username,
u.email,
COUNT(o.id) as order_count,
SUM(o.total_amount) as lifetime_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.registration_date >= '2024-01-01'
AND u.country_code = 'US'
AND u.status = 'active'
GROUP BY u.id, u.username, u.email
HAVING lifetime_value > 1000
ORDER BY lifetime_value DESC
LIMIT 100;
Looks reasonable, right? Wrong. Let's debug it like a crime scene investigation.
Clue #1: The EXPLAIN Output
EXPLAIN FORMAT=TREE
SELECT ...;
-> Limit: 100 row(s)
-> Sort: sum(o.total_amount) DESC
-> Stream results
-> Group aggregate: sum(o.total_amount), count(o.id)
-> Nested loop left join
-> Filter: ((u.status = 'active') and
(u.country_code = 'US') and
(u.registration_date >= DATE'2024-01-01'))
-> Table scan on u
⚠️ Cost: 470,000 rows
-> Index lookup on o using user_id
The smoking gun: Table scan on u with 470,000 rows.
Clue #2: The Missing Index
-- Check existing indexes
SHOW INDEXES FROM users;
+-------+------------+------------------+
| Table | Key_name | Column_name |
+-------+------------+------------------+
| users | PRIMARY | id |
| users | idx_email | email |
+-------+------------+------------------+
We're filtering on country_code, status, and registration_date, but there's no index covering these columns!
The Solution: Composite Index Design
-- The optimal index (order matters!)
CREATE INDEX idx_user_filters
ON users(country_code, status, registration_date)
ALGORITHM=INPLACE,
LOCK=NONE;
-- Why this order?
-- 1. country_code: equality filter, ~200 distinct values
-- 2. status: equality filter, 3-5 distinct values
-- 3. registration_date: the range filter goes LAST so the index
--    can still be used for all three predicates
New EXPLAIN:
-> Limit: 100 row(s)
-> Sort: sum(o.total_amount) DESC
-> Stream results
-> Group aggregate
-> Nested loop left join
-> Index range scan on u using idx_user_filters
✅ Cost: 2,400 rows (200× improvement!)
Result: 47 seconds → 0.3 seconds.
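On 8.0.18+ you can close the loop with EXPLAIN ANALYZE, which actually runs the query and annotates the same tree with real row counts and timings. The numbers below are illustrative, not from a real run:
EXPLAIN ANALYZE
SELECT u.username, u.email,
       COUNT(o.id) AS order_count,
       SUM(o.total_amount) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.registration_date >= '2024-01-01'
  AND u.country_code = 'US'
  AND u.status = 'active'
GROUP BY u.id, u.username, u.email
HAVING lifetime_value > 1000
ORDER BY lifetime_value DESC
LIMIT 100;

-- -> Index range scan on u using idx_user_filters
--      (actual time=0.05..4.2 rows=2391 loops=1)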
The Disk I/O Optimization That Nobody Talks About
Here's a truth bomb: Your database is probably doing 10× more disk I/O than necessary.
The Buffer Pool: Your Database's RAM Disk
InnoDB keeps frequently accessed data in RAM (the buffer pool). But most people configure it wrong:
-- Check current buffer pool size
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- Common mistake: Too small!
+-------------------------+------------+
| Variable_name | Value |
+-------------------------+------------+
| innodb_buffer_pool_size | 134217728 | -- 128MB 😱
+-------------------------+------------+
-- Reality check: What do you actually need?
SELECT
CEILING(SUM(data_length + index_length) / 1024 / 1024 / 1024)
AS total_size_gb
FROM information_schema.tables
WHERE engine = 'InnoDB';
+----------------+
| total_size_gb |
+----------------+
| 47 | -- Your DB is 47GB!
+----------------+
The fix:
-- Set buffer pool to 60-70% of available RAM
-- (If you have 64GB RAM, use 40-45GB for buffer pool)
SET GLOBAL innodb_buffer_pool_size = 42949672960; -- 40GB
-- The buffer pool can be resized online (and has been since MySQL 5.7). No restart needed.
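The resize runs in the background, chunk by chunk (the final size gets rounded to a multiple of innodb_buffer_pool_chunk_size times the number of instances), and you can watch it:
SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';
-- Reports progress while resizing and a completion message when done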
The I/O Thread Configuration Nobody Touches
-- Check current I/O threads
SHOW VARIABLES LIKE 'innodb_%_io_threads';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| innodb_read_io_threads | 4 | -- Usually too few
| innodb_write_io_threads| 4 | -- Ditto
+------------------------+-------+
-- For modern SSDs with high parallelism, raise both to 8-16.
-- Depending on your version these may be startup-only variables:
-- if SET GLOBAL is rejected, put them in my.cnf and restart.
-- [mysqld]
-- innodb_read_io_threads = 16
-- innodb_write_io_threads = 16
-- Monitor if threads are saturated
SELECT
name,
count_star as total_waits,
sum_timer_wait / 1000000000000 as total_wait_sec
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE name LIKE 'wait/io/file/innodb/%'
ORDER BY sum_timer_wait DESC
LIMIT 10;
The Adaptive Hash Index Secret
Most developers don't know this exists:
-- InnoDB automatically builds hash indexes in memory
-- for frequently accessed patterns
SHOW ENGINE INNODB STATUS\G
-- Look for this section:
----------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
----------------------------
Hash table size 34679, used cells 23456
4.56 hash searches/s, 1.23 non-hash searches/s
-- If ratio is low, adaptive hash isn't helping
-- Disable it to free memory:
SET GLOBAL innodb_adaptive_hash_index = OFF;
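If grepping SHOW ENGINE INNODB STATUS feels crude, the same hash-vs-btree counters are exposed through information_schema.INNODB_METRICS on the builds I've checked; note that some InnoDB metrics counters stay at zero until you enable them with innodb_monitor_enable:
SELECT NAME, COUNT
FROM information_schema.INNODB_METRICS
WHERE SUBSYSTEM = 'adaptive_hash_index'
  AND NAME LIKE 'adaptive_hash_searches%';
-- adaptive_hash_searches        -> lookups served from the hash index
-- adaptive_hash_searches_btree  -> lookups that fell through to the B-tree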
The DDL Operations Risk Matrix
Not all DDL operations are created equal. Here's my battle-tested decision tree:
Need Schema Change?
        |
        |
┌───────┴──────────────────────┐
|                              |
Add/Drop/Rename            Touch an Existing
Column or Default?         Column or Index?
|                              |
|                   ┌──────────┴──────────┐
|                   |                     |
INSTANT         Add/Drop Index,       Change Type or
(0.1s)          Reorder Column?       Charset?
|                   |                     |
|                   |                     |
✅ Do it        INPLACE               COPY
                (Minutes)             (Hours)
                    |                     |
                ⚠️ Off-peak          ❌ Last resort
                                     Consider:
                                     1. Blue-green deploy
                                     2. Shadow table
                                     3. Partition swap
Real-world example:
-- ❌ BAD: Change a column's data type (requires COPY)
ALTER TABLE users
MODIFY COLUMN phone VARCHAR(20); -- Was BIGINT: type change = full rebuild
-- Copies the whole table and blocks writes for HOURS
-- ✅ GOOD: Add new column, migrate, drop old
ALTER TABLE users
ADD COLUMN phone_new VARCHAR(20) DEFAULT NULL,
ALGORITHM=INSTANT;
-- Backfill data in batches (short transactions, no long-held locks)
UPDATE users
SET phone_new = phone
WHERE id BETWEEN 1 AND 100000;
-- Repeat in batches...
-- Swap columns atomically (DROP COLUMN and RENAME COLUMN are both instant here)
-- First make sure rows written during the backfill also populated phone_new
-- (trigger or dual-write in the application), or re-run the last batch
ALTER TABLE users
DROP COLUMN phone,
RENAME COLUMN phone_new TO phone,
ALGORITHM=INSTANT;
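That "repeat in batches" step begs to be scripted. Here's a minimal sketch as a stored procedure; the procedure name, batch size, and sleep are my own choices, and it assumes id is a reasonably dense auto-increment primary key:
DELIMITER //
CREATE PROCEDURE backfill_phone_new()
BEGIN
    DECLARE max_id BIGINT;
    DECLARE cursor_id BIGINT DEFAULT 0;
    SELECT COALESCE(MAX(id), 0) INTO max_id FROM users;

    WHILE cursor_id < max_id DO
        UPDATE users
        SET phone_new = phone
        WHERE id > cursor_id
          AND id <= cursor_id + 100000
          AND phone_new IS NULL;     -- idempotent: safe to re-run
        SET cursor_id = cursor_id + 100000;
        DO SLEEP(0.5);               -- let replicas catch up between batches
    END WHILE;
END //
DELIMITER ;

CALL backfill_phone_new();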
The Production Optimization Playbook
Before You Touch Production
-- 1. Test DDL on a replica first
-- 2. Estimate operation time
-- Dry run with ALGORITHM check
ALTER TABLE products
ADD COLUMN new_field INT DEFAULT 0,
ALGORITHM=INSTANT; -- If this fails, it's not instant!
-- 3. Check replication lag
SHOW REPLICA STATUS\G
-- If Seconds_Behind_Source > 60, wait!
-- (SHOW SLAVE STATUS and Seconds_Behind_Master are gone in 8.4)
Monitor During Operation
# Terminal 1: Watch processlist
watch -n 1 "mysql -e 'SHOW PROCESSLIST' | grep -i alter"
# Terminal 2: Monitor table locks
watch -n 1 "mysql -e '
SELECT
OBJECT_NAME,
LOCK_TYPE,
LOCK_STATUS,
LOCK_DURATION
FROM performance_schema.metadata_locks
WHERE OBJECT_TYPE = \"TABLE\"
ORDER BY OBJECT_NAME
'"
# Terminal 3: Monitor disk I/O
iostat -x 1
The Emergency Rollback Plan
-- If DDL goes wrong mid-operation:
-- 1. DON'T kill it reflexively
-- With 8.0+ atomic DDL the table won't be left half-altered,
-- but rolling back a big ALTER can take as long as the ALTER itself
-- 2. Check progress first
SELECT * FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE 'stage/innodb/alter%';
-- 3. If it must be killed:
-- Find the thread ID
SHOW PROCESSLIST;
-- Kill it gracefully
KILL 12345;
-- 4. Verify table integrity
CHECK TABLE products EXTENDED;
-- 5. If corrupted, restore from backup
-- (You DO have backups, right?)
The Query Patterns That Scale
Pattern 1: Covering Indexes
-- Instead of this (index lookup + a clustered-index lookup per row)
CREATE INDEX idx_user_email ON users(email);
SELECT id, username, email, created_at
FROM users
WHERE email = 'user@example.com';
-- Do this (every selected column lives in the index)
CREATE INDEX idx_user_email_covering
ON users(email, username, created_at);
-- id comes for free: InnoDB secondary indexes already carry the primary key
-- Now the query reads ONLY the index and never touches the table
-- Often several times faster for hot lookup patterns
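The tell that a query is fully covered is "Using index" in the Extra column of EXPLAIN:
EXPLAIN
SELECT id, username, email, created_at
FROM users
WHERE email = 'user@example.com'\G
-- ...
--   key: idx_user_email_covering
-- Extra: Using index    <-- answered from the index alone, no table access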
Pattern 2: Partitioning for Time-Series Data
-- For tables that grow without bound
CREATE TABLE metrics (
id BIGINT AUTO_INCREMENT,
metric_name VARCHAR(100),
value DOUBLE,
recorded_at DATETIME,
PRIMARY KEY (id, recorded_at)
)
PARTITION BY RANGE (TO_DAYS(recorded_at)) (
PARTITION p_2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
PARTITION p_2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
PARTITION p_2024_03 VALUES LESS THAN (TO_DAYS('2024-04-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Add next month's partition by splitting the MAXVALUE catch-all
-- (ADD PARTITION is rejected while a MAXVALUE partition exists;
--  REORGANIZE only copies rows in p_future, so keep it empty and it's instant)
ALTER TABLE metrics
REORGANIZE PARTITION p_future INTO (
    PARTITION p_2024_04 VALUES LESS THAN (TO_DAYS('2024-05-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Drop old data (instant)
ALTER TABLE metrics DROP PARTITION p_2024_01;
-- Query benefits: Partition pruning!
SELECT AVG(value)
FROM metrics
WHERE recorded_at >= '2024-03-01'
AND recorded_at < '2024-04-01';
-- Only reads p_2024_03 partition
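Don't take the pruning on faith; plain EXPLAIN has a partitions column that shows exactly which partitions MySQL will touch:
EXPLAIN SELECT AVG(value)
FROM metrics
WHERE recorded_at >= '2024-03-01'
  AND recorded_at < '2024-04-01'\G
-- ...
-- partitions: p_2024_03    <-- everything else is skipped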
Pattern 3: The Histogram Optimization
MySQL 8.0 introduced histograms for better query planning, and they're still badly underused in 8.4:
-- Generate statistics for optimizer
ANALYZE TABLE products
UPDATE HISTOGRAM ON category_id, price_range;
-- Query optimizer now knows data distribution
EXPLAIN FORMAT=JSON
SELECT * FROM products
WHERE category_id = 42
AND price_range BETWEEN 100 AND 500\G
-- Shows accurate row estimates!
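The histograms themselves are queryable from the data dictionary, which is handy because they do not refresh automatically as the data drifts. The JSON keys below are from memory, so peek at the raw HISTOGRAM column if your version differs:
SELECT
    TABLE_NAME,
    COLUMN_NAME,
    HISTOGRAM->>'$."histogram-type"' AS histogram_type,
    HISTOGRAM->>'$."last-updated"'   AS last_updated
FROM information_schema.COLUMN_STATISTICS
WHERE TABLE_NAME = 'products';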
The Metrics That Actually Matter
Stop obsessing over vanity metrics. These are the numbers that predict production problems:
-- 1. Buffer Pool Hit Ratio (should be >99%)
SELECT
(1 - (
SELECT VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads'
) / (
SELECT VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests'
)) * 100 AS hit_ratio_pct;
-- 2. Row Lock Wait Time (average should be well under 100ms)
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN (
'Innodb_row_lock_waits',
'Innodb_row_lock_time_avg',   -- milliseconds
'Innodb_row_lock_time_max'    -- milliseconds
);
-- 3. Temp Table Usage (minimize disk spills!)
SELECT
SUM(SUM_CREATED_TMP_TABLES) AS total_tmp,
SUM(SUM_CREATED_TMP_DISK_TABLES) AS disk_tmp,
ROUND(100 * SUM(SUM_CREATED_TMP_DISK_TABLES) /
SUM(SUM_CREATED_TMP_TABLES), 2) AS disk_tmp_pct
FROM performance_schema.events_statements_summary_global_by_event_name;
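When disk_tmp_pct starts looking ugly, the sys schema (installed by default on 8.x) will point you at the statements responsible:
SELECT *
FROM sys.statements_with_temp_tables
LIMIT 10;
-- One row per digested statement that created internal temp tables,
-- including how many spilled to disk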
What I Wish Someone Had Told Me
Five years into optimizing MySQL, here's what I know:
Instant DDL is real, but it's not magic. Test every operation on staging first.
Index everything important, but not everything. Each index costs write performance.
Your buffer pool is probably too small. And your I/O threads are probably too few.
Partitioning isn't for performance—it's for management. Drop old data instantly instead of deleting billions of rows.
The query that's fast today will be slow tomorrow. Monitor query performance over time.
EXPLAIN is your best friend. If you're not analyzing query plans, you're flying blind.
Disk I/O optimization beats query optimization 9 times out of 10. Get your data in RAM.
Your Turn
That 47-million-row instant ALTER still feels like magic every time I see it. What's your MySQL optimization story? Have you tried instant DDL in production? What broke? What worked?
Drop your experiences in the comments. Especially the disasters—those are the best stories.
And if you're still planning 6-hour maintenance windows for simple column additions in 2025... maybe it's time to upgrade. Your future self will thank you.
Further Reading
- MySQL 8.4 Online DDL Operations
- InnoDB Buffer Pool Optimization
- Query Optimization Guide
- Percona Toolkit (Essential for production MySQL)
Article Metadata
Cover Image Concept: Split-screen animation—left side shows a traditional progress bar crawling at 2% while a developer ages visibly, right side shows "100% Complete" appearing instantly with a stunned developer holding coffee mid-sip.
Meta Description: "I watched MySQL 8.4 add columns to 47 million rows in under a second. Here's everything you need to know about instant DDL, online index creation, and disk I/O optimization that actually works in production."
Tags: #mysql #database #performance #ddl