Igor Nosatov


The Day I Watched MySQL Drop 47 Million Rows in 0.8 Seconds (And Why Your DDL Operations Are Probably Still Stuck in 2015)

Or: How MySQL 8.4's Instant DDL Made Me Question Everything I Knew About Schema Changes


Let me tell you about the most anticlimactic moment of my career.

We'd been planning "The Great Schema Migration" for months. Our product table had grown to 47 million rows. We needed to add three new columns. The DBA calculated it would take 6 hours of downtime. We scheduled it for Sunday at 2 AM. We wrote a 47-page runbook. We had rollback plans A through F.

I executed the ALTER TABLE command, held my breath, and...

Query OK, 0 rows affected (0.82 sec)

I thought it was a joke. A bug. A hallucination from too much coffee. But no—MySQL 8.4's instant DDL had just casually added three columns to 47 million rows faster than I could blink.

That moment changed how I think about database operations forever.

The Restaurant Kitchen Metaphor

Think of your database as a busy restaurant kitchen during dinner rush.

In the old days (pre-MySQL 8.0), changing the menu meant shutting down the entire kitchen, rebuilding every dish from scratch, and only reopening when everything was perfect. Want to add "gluten-free option" to your lasagna? Sorry, kitchen's closed for 6 hours while we remake every single lasagna in inventory.

MySQL 8.4's instant DDL is like having a head chef who can update the recipe cards without touching a single dish that's already been made. The kitchen never closes. Diners never know anything changed.

But here's the plot twist: Most developers still operate like it's 2015, shutting down their kitchens for changes that could happen instantly.

The Three DDL Operations That Changed My Life

Operation 1: The Instant Column Addition (That Shouldn't Be Possible)

Here's what blew my mind:

-- This used to be a 6-hour maintenance window
-- Now it's literally instant

ALTER TABLE products 
ADD COLUMN sustainability_rating INT DEFAULT 0,
ADD COLUMN carbon_footprint DECIMAL(10,2) DEFAULT 0.00,
ADD COLUMN supplier_certification VARCHAR(100) DEFAULT 'pending',
ALGORITHM=INSTANT;

-- Query OK, 0 rows affected (0.04 sec)
-- Records: 0  Duplicates: 0  Warnings: 0

Wait, how?

MySQL 8.4 doesn't rewrite the table. Instead, it stores the new column metadata separately and computes values on-the-fly during reads:

┌─────────────────────────────────────────────────────┐
│  Traditional ALTER (MySQL 5.7)                      │
├─────────────────────────────────────────────────────┤
│                                                      │
│  1. Create temp table with new schema               │
│  2. Copy ALL 47M rows → temp table  [6 hours]      │
│  3. Build ALL indexes → temp table   [2 hours]      │
│  4. Rename temp → original           [locks!]       │
│                                                      │
│  Total: 8 hours + extended locking period           │
│                                                      │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│  Instant ALTER (MySQL 8.4)                          │
├─────────────────────────────────────────────────────┤
│                                                      │
│  1. Update data dictionary metadata  [0.04 sec]     │
│  2. That's it. Seriously.                           │
│                                                      │
│  When reading rows:                                 │
│    - Check metadata version                         │
│    - Apply defaults for "new" columns on-the-fly    │
│    - Return complete row to application             │
│                                                      │
└─────────────────────────────────────────────────────┘
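The metadata trick above is easy to model. Here's a toy Python sketch of the idea (my own illustration, not InnoDB's actual data structures): each physical row remembers the schema version it was written under, and the reader patches in defaults for columns added later.

```python
# Conceptual model of instant ADD COLUMN: rows keep the schema
# version they were written under; readers fill in defaults for
# columns added by later schema versions. Illustrative only.

# Each schema version lists (column_name, default) pairs added at that version.
schema_versions = {
    1: [("id", None), ("name", None)],
    2: [("sustainability_rating", 0)],   # column added "instantly" later
}

# Physical rows are never rewritten: each records its write-time version.
rows = [
    {"_ver": 1, "id": 1, "name": "lamp"},                              # old row
    {"_ver": 2, "id": 2, "name": "desk", "sustainability_rating": 5},  # new row
]

def read_row(row, current_version):
    """Materialize a row under the current schema, filling defaults
    for columns the row predates."""
    out = {k: v for k, v in row.items() if k != "_ver"}
    for ver in range(row["_ver"] + 1, current_version + 1):
        for col, default in schema_versions[ver]:
            out.setdefault(col, default)
    return out

print(read_row(rows[0], 2))  # old row gains sustainability_rating=0 on read
print(read_row(rows[1], 2))  # new row returned as stored
```

The old row on disk never changes; the default only ever exists in metadata, which is why the ALTER is O(1) regardless of table size.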

The catch? Not everything is instant. Here's the compatibility matrix:

-- ✅ INSTANT (sub-second)
ADD COLUMN (with DEFAULT)
ADD COLUMN (nullable)
ADD COLUMN FIRST/AFTER (any position, since 8.0.29)
DROP COLUMN
MODIFY COLUMN DEFAULT
RENAME COLUMN
ADD VIRTUAL COLUMN

-- ⚠️ INPLACE (minutes to hours, no table copy)
ADD INDEX
DROP INDEX
MODIFY COLUMN (same type, different attributes)
CHANGE COLUMN (rename + modify)

-- ❌ COPY (hours to days, full table rebuild)
MODIFY COLUMN (type change: VARCHAR → INT)
CHANGE CHARSET
CHANGE ROW_FORMAT

Operation 2: The Index That Built Itself While Serving Traffic

Production scenario: Your queries are slow. You need an index. But you have 500 concurrent users and can't afford downtime.

Old me would've scheduled maintenance. New me does this:

-- Create index WITHOUT blocking writes
CREATE INDEX idx_user_registration_date 
ON users(registration_date, country_code)
ALGORITHM=INPLACE, 
LOCK=NONE;

-- Monitor progress in real-time
SELECT 
    WORK_COMPLETED,
    WORK_ESTIMATED,
    ROUND(100 * WORK_COMPLETED / WORK_ESTIMATED, 2) AS pct_complete,
    CONCAT(
        ROUND(TIMER_WAIT / 1000000000000),
        ' seconds elapsed'
    ) AS time_spent
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE 'stage/innodb/alter%';

Output:

+----------------+----------------+--------------+--------------------+
| WORK_COMPLETED | WORK_ESTIMATED | pct_complete | time_spent         |
+----------------+----------------+--------------+--------------------+
|      23450000  |     47000000   |       49.89  | 127 seconds elapsed|
+----------------+----------------+--------------+--------------------+

What's happening under the hood?

Stage 1: Read Rows & Build Sort Buffer
  ↓
  [████████████░░░░░░░░] 60%
  ↓
  Application still writing!
  INSERT/UPDATE/DELETE work normally
  Changes logged to "online log buffer"
  ↓
Stage 2: Sort & Write Index
  ↓
  [████████████████░░░░] 80%
  ↓
Stage 3: Apply Online Log (brief lock)
  ↓
  [████████████████████] 100%
  ↓
Index ready!
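In pseudocode terms, the three stages boil down to: scan a snapshot, log concurrent writes on the side, then replay the log under a brief lock. A toy Python model of that flow (invented names, nothing like InnoDB's real row-log machinery):

```python
# Rough model of an online index build: scan a snapshot, keep serving
# writes into a side log, then replay the log before going live.
# Purely illustrative.

table = {1: "alice", 2: "bob"}          # pk -> registration value
online_log = []                         # DML that arrives mid-build

def concurrent_insert(pk, value):
    """Writes during the build hit the table AND the online log."""
    table[pk] = value
    online_log.append(("insert", pk, value))

# Stage 1+2: scan the snapshot and build a sorted index (value -> pk)
snapshot = list(table.items())
index = sorted((v, pk) for pk, v in snapshot)

# ...meanwhile the application keeps writing:
concurrent_insert(3, "carol")

# Stage 3: brief lock, then replay the online log into the new index
for op, pk, value in online_log:
    if op == "insert":
        index.append((value, pk))
index.sort()

print(index)   # includes the row inserted mid-build
```

The only blocking window is the final log replay, which is why the lock at the end of stage 3 is brief no matter how large the table is.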

The beautiful part? Your application never knew. Zero downtime. Zero angry users.

Operation 3: The Partition Drop That Scared Me (Until It Didn't)

We had a logging table with 2 years of data partitioned by month. We needed to drop old partitions. My hands were literally shaking:

-- Dropping 12 months of data (240 million rows)
-- Expected: Server freeze, angry Slack messages, resume polishing

ALTER TABLE access_logs 
DROP PARTITION p202301, p202302, p202303, p202304,
               p202305, p202306, p202307, p202308,
               p202309, p202310, p202311, p202312;

-- Query OK, 0 rows affected (1.23 sec)

1.23 seconds to drop 240 million rows.

The secret? InnoDB doesn't actually delete rows. It just unlinks the partition files and marks them for deletion. The OS handles cleanup in the background:

# What actually happens
$ ls -lh /var/lib/mysql/production/access_logs*

# Before DROP PARTITION
-rw-r----- 1 mysql mysql  18G access_logs#p#p202301.ibd
-rw-r----- 1 mysql mysql  19G access_logs#p#p202302.ibd

# After DROP PARTITION (immediately)
# Files disappear from directory listing
# Space reclaimed by filesystem

# No massive DELETE operation
# No undo log explosion  
# No binary log bloat

The Query Optimization Detective Story

Now let's talk about the queries themselves. Because DDL speed doesn't matter if your SELECT statements are garbage.

The Mystery of the 47-Second Query

A developer came to me: "This query takes 47 seconds. Help."

-- The problematic query
SELECT 
    u.username,
    u.email,
    COUNT(o.id) as order_count,
    SUM(o.total_amount) as lifetime_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.registration_date >= '2024-01-01'
    AND u.country_code = 'US'
    AND u.status = 'active'
GROUP BY u.id, u.username, u.email
HAVING lifetime_value > 1000
ORDER BY lifetime_value DESC
LIMIT 100;

Looks reasonable, right? Wrong. Let's debug it like a crime scene investigation.

Clue #1: The EXPLAIN Output

EXPLAIN FORMAT=TREE 
SELECT ...;
-> Limit: 100 row(s)
    -> Sort: sum(o.total_amount) DESC
        -> Stream results
            -> Group aggregate: sum(o.total_amount), count(o.id)
                -> Nested loop left join  
                    -> Filter: ((u.status = 'active') and 
                                (u.country_code = 'US') and 
                                (u.registration_date >= DATE'2024-01-01'))
                        -> Table scan on u  
                           ⚠️ Cost: 470,000 rows
                    -> Index lookup on o using user_id

The smoking gun: Table scan on u with 470,000 rows.

Clue #2: The Missing Index

-- Check existing indexes
SHOW INDEXES FROM users;
+-------+------------+------------------+
| Table | Key_name   | Column_name      |
+-------+------------+------------------+
| users | PRIMARY    | id               |
| users | idx_email  | email            |
+-------+------------+------------------+

We're filtering on country_code, status, and registration_date, but there's no index covering these columns!

The Solution: Composite Index Design

-- The optimal index (order matters!)
CREATE INDEX idx_user_filters 
ON users(country_code, status, registration_date)
ALGORITHM=INPLACE, 
LOCK=NONE;

-- Why this order?
-- 1. country_code: equality predicate, narrows first
-- 2. status: equality predicate, narrows further
-- 3. registration_date: the range predicate goes LAST, because
--    index columns after a range can't be used for filtering

New EXPLAIN:

-> Limit: 100 row(s)
    -> Sort: sum(o.total_amount) DESC
        -> Stream results
            -> Group aggregate
                -> Nested loop left join
                    -> Index range scan on u using idx_user_filters
                       ✅ Cost: 2,400 rows (200× improvement!)

Result: 47 seconds → 0.3 seconds.

The Disk I/O Optimization That Nobody Talks About

Here's a truth bomb: Your database is probably doing 10× more disk I/O than necessary.

The Buffer Pool: Your Database's RAM Disk

InnoDB keeps frequently accessed data in RAM (the buffer pool). But most people configure it wrong:

-- Check current buffer pool size
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Common mistake: Too small!
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 134217728  |  -- 128MB 😱
+-------------------------+------------+

-- Reality check: What do you actually need?
SELECT 
    CEILING(SUM(data_length + index_length) / 1024 / 1024 / 1024) 
    AS total_size_gb
FROM information_schema.tables
WHERE engine = 'InnoDB';

+----------------+
| total_size_gb  |
+----------------+
|            47  |  -- Your DB is 47GB!
+----------------+

The fix:

-- Set buffer pool to 60-70% of available RAM
-- (If you have 64GB RAM, use 40-45GB for buffer pool)
SET GLOBAL innodb_buffer_pool_size = 42949672960;  -- 40GB

-- Buffer pool resizing is online (no restart needed; it has been since MySQL 5.7)
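If you want to script the sizing rule, here's a small helper. The 60-70% figure is the conventional rule of thumb, not an official formula; MySQL also adjusts the value to a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances, which the sketch mimics by rounding down.

```python
# Rule-of-thumb buffer pool sizing: ~60-70% of RAM on a dedicated
# DB host, rounded down to a multiple of chunk_size * instances
# (MySQL itself adjusts the value to such a multiple).

def recommended_buffer_pool_bytes(total_ram_gb, fraction=0.65,
                                  chunk_bytes=128 * 1024 ** 2,
                                  instances=8):
    """Suggest an innodb_buffer_pool_size value in bytes."""
    target = int(total_ram_gb * 1024 ** 3 * fraction)
    unit = chunk_bytes * instances          # resize granularity
    return (target // unit) * unit

size = recommended_buffer_pool_bytes(64)    # 64 GB host
print(size, "bytes =", size // 1024 ** 3, "GiB")   # lands on 41 GiB
```

Rounding matters: if you SET GLOBAL to a non-multiple, the server silently adjusts it, and your monitoring then disagrees with your config.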

The I/O Thread Configuration Nobody Touches

-- Check current I/O threads
SHOW VARIABLES LIKE 'innodb_%_io_threads';

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| innodb_read_io_threads  | 4     |  -- Usually too few
| innodb_write_io_threads | 4     |  -- Ditto
+-------------------------+-------+

-- For modern SSDs with high parallelism.
-- NOTE: these variables are not dynamic; SET GLOBAL fails.
-- Set them in my.cnf under [mysqld] and restart:
--   innodb_read_io_threads  = 16
--   innodb_write_io_threads = 16

-- Monitor if threads are saturated
SELECT 
    name,
    count_star as total_waits,
    sum_timer_wait / 1000000000000 as total_wait_sec
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE name LIKE 'wait/io/file/innodb/%'
ORDER BY sum_timer_wait DESC
LIMIT 10;

The Adaptive Hash Index Secret

Most developers don't know this exists:

-- InnoDB automatically builds hash indexes in memory
-- for frequently accessed patterns

SHOW ENGINE INNODB STATUS\G

-- Look for this section:
----------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
----------------------------
Hash table size 34679, used cells 23456
4.56 hash searches/s, 1.23 non-hash searches/s

-- If ratio is low, adaptive hash isn't helping
-- Disable it to free memory:
SET GLOBAL innodb_adaptive_hash_index = OFF;

The DDL Operations Risk Matrix

Not all DDL operations are created equal. Here's my battle-tested decision tree:

                    Need Schema Change?
                           |
                           |
              ┌────────────┴────────────┐
              |                         |
          Add/Drop/                Modify Existing
          Rename Column?           Column Type?
              |                         |
              |                         |
      ┌───────┴────────┐               |
      |                |               |
  Metadata-Only    Index/Data     REQUIRES COPY
  Change?          Rebuild?        (Plan downtime)
      |                |               |
      |                |               |
   INSTANT          INPLACE          COPY
   (0.1s)          (Minutes)        (Hours)
      |                |               |
   ✅ Do it        ⚠️ Off-peak      ❌ Last resort
                                     Consider:
                                     1. Blue-green deploy
                                     2. Shadow table
                                     3. Partition swap
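I keep this matrix around as an actual lookup table in my tooling. A minimal sketch (the operation names and tiers mirror the matrix above; this is a personal cheat-sheet, not an exhaustive encoding of MySQL's online-DDL rules):

```python
# Cheat-sheet version of the DDL risk matrix above; a heuristic,
# not an exhaustive encoding of MySQL's online-DDL rules.

DDL_TIERS = {
    "INSTANT": {"add_column", "drop_column", "rename_column",
                "modify_default", "add_virtual_column"},
    "INPLACE": {"add_index", "drop_index", "modify_attributes"},
    "COPY":    {"change_type", "change_charset", "change_row_format"},
}

ADVICE = {
    "INSTANT": "✅ Do it now: sub-second metadata change",
    "INPLACE": "⚠️ Run off-peak and monitor progress",
    "COPY":    "❌ Last resort: shadow table / blue-green / partition swap",
}

def classify_ddl(operation):
    """Map an operation keyword to its risk tier and the advice for it."""
    for tier, ops in DDL_TIERS.items():
        if operation in ops:
            return tier, ADVICE[tier]
    return "UNKNOWN", "Test on staging with an explicit ALGORITHM clause"

print(classify_ddl("add_column"))
print(classify_ddl("change_type"))
```

Anything the table doesn't recognize falls through to "test on staging with an explicit ALGORITHM clause", which is the safe default anyway.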

Real-world example:

-- ❌ BAD: Shrink a column (requires COPY)
ALTER TABLE users 
MODIFY COLUMN phone VARCHAR(20);  -- Was VARCHAR(255): shrinking forces a rebuild
-- Blocks writes for HOURS

-- ✅ GOOD: Add new column, migrate, drop old
ALTER TABLE users 
ADD COLUMN phone_new VARCHAR(20) DEFAULT NULL,
ALGORITHM=INSTANT;

-- Migrate data in batches (no locking)
UPDATE users 
SET phone_new = phone 
WHERE id BETWEEN 1 AND 100000;

-- Repeat in batches...

-- Swap columns atomically
ALTER TABLE users 
DROP COLUMN phone,
CHANGE COLUMN phone_new phone VARCHAR(20),
ALGORITHM=INSTANT;
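The "repeat in batches" step is where people get hurt: one giant UPDATE recreates exactly the locking problem you were trying to avoid. A driver loop might look like this Python sketch, where `execute` is a stand-in for whatever MySQL client call your application uses:

```python
import time

def migrate_in_batches(execute, max_id, batch_size=100_000, pause_s=0.5):
    """Copy phone -> phone_new in primary-key ranges so each UPDATE
    holds row locks only briefly. `execute` is a stand-in for your
    MySQL client's query function."""
    batches = 0
    for start in range(1, max_id + 1, batch_size):
        end = min(start + batch_size - 1, max_id)
        execute(
            "UPDATE users SET phone_new = phone "
            "WHERE id BETWEEN %s AND %s AND phone_new IS NULL",
            (start, end),
        )
        batches += 1
        time.sleep(pause_s)   # let replication catch up between batches
    return batches

# Dry run with a fake client that just records the SQL it was given
issued = []
migrate_in_batches(lambda sql, params: issued.append((sql, params)),
                   max_id=250_000, pause_s=0)
print(len(issued))   # 3 batches for 250k rows at 100k per batch
```

The `AND phone_new IS NULL` guard makes the loop idempotent, so it can be killed and restarted mid-migration without double-writing rows.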

The Production Optimization Playbook

Before You Touch Production

-- 1. Test DDL on a replica first
-- 2. Estimate operation time

-- Force the algorithm: if INSTANT isn't supported for this change,
-- the statement errors out instead of silently falling back to COPY
ALTER TABLE products 
ADD COLUMN new_field INT DEFAULT 0,
ALGORITHM=INSTANT;  -- If this fails, it's not instant!

-- 3. Check replication lag (SHOW SLAVE STATUS is gone in 8.4)
SHOW REPLICA STATUS\G

-- If Seconds_Behind_Source > 60, wait!

Monitor During Operation

# Terminal 1: Watch processlist
watch -n 1 "mysql -e 'SHOW PROCESSLIST' | grep -i alter"

# Terminal 2: Monitor table locks
watch -n 1 "mysql -e '
  SELECT 
    OBJECT_NAME,
    LOCK_TYPE,
    LOCK_STATUS,
    LOCK_MODE
  FROM performance_schema.metadata_locks
  WHERE OBJECT_TYPE = \"TABLE\"
  ORDER BY OBJECT_NAME
'"

# Terminal 3: Monitor disk I/O
iostat -x 1

The Emergency Rollback Plan

-- If DDL goes wrong mid-operation:

-- 1. DON'T KILL IT! 
--    Killing can leave table in inconsistent state

-- 2. Check progress first
SELECT * FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE 'stage/innodb/alter%';

-- 3. If it must be killed:
-- Find the thread ID
SHOW PROCESSLIST;

-- Kill it gracefully
KILL 12345;

-- 4. Verify table integrity
CHECK TABLE products EXTENDED;

-- 5. If corrupted, restore from backup
-- (You DO have backups, right?)

The Query Patterns That Scale

Pattern 1: Covering Indexes

-- Instead of this (causes table lookups)
CREATE INDEX idx_user_email ON users(email);

SELECT id, username, email, created_at
FROM users
WHERE email = 'user@example.com';

-- Do this (all data in index)
CREATE INDEX idx_user_email_covering 
ON users(email, id, username, created_at);

-- Now query reads ONLY index, never touches table
-- 10× faster for common patterns

Pattern 2: Partitioning for Time-Series Data

-- For tables that grow without bound
CREATE TABLE metrics (
    id BIGINT AUTO_INCREMENT,
    metric_name VARCHAR(100),
    value DOUBLE,
    recorded_at DATETIME,
    PRIMARY KEY (id, recorded_at)
)
PARTITION BY RANGE (TO_DAYS(recorded_at)) (
    PARTITION p_2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p_2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION p_2024_03 VALUES LESS THAN (TO_DAYS('2024-04-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Add new partitions monthly: with a MAXVALUE partition in place,
-- ADD PARTITION is rejected, so split the catch-all instead
-- (fast while p_future is empty)
ALTER TABLE metrics 
REORGANIZE PARTITION p_future INTO (
    PARTITION p_2024_04 VALUES LESS THAN (TO_DAYS('2024-05-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Drop old data (instant)
ALTER TABLE metrics DROP PARTITION p_2024_01;

-- Query benefits: Partition pruning!
SELECT AVG(value) 
FROM metrics 
WHERE recorded_at >= '2024-03-01' 
    AND recorded_at < '2024-04-01';
-- Only reads p_2024_03 partition

Pattern 3: The Histogram Optimization

MySQL 8.0 introduced histograms for better query planning, and they remain one of the most underused optimizer features in 8.4:

-- Generate statistics for optimizer
ANALYZE TABLE products 
UPDATE HISTOGRAM ON category_id, price_range;

-- Query optimizer now knows data distribution
EXPLAIN FORMAT=JSON
SELECT * FROM products 
WHERE category_id = 42 
    AND price_range BETWEEN 100 AND 500\G

-- Shows accurate row estimates!
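Conceptually a histogram is just bucketed cumulative counts, and selectivity estimation from one is simple enough to sketch (toy equi-height buckets and made-up numbers; MySQL's actual singleton/equi-height math differs in detail):

```python
# Toy selectivity estimation from an equi-height histogram.
# Buckets: (lower_bound, upper_bound, cumulative_fraction_of_rows),
# e.g. 60% of rows have price <= 150. Numbers are made up.

histogram = [
    (0,   50,  0.10),
    (50,  150, 0.60),
    (150, 400, 0.90),
    (400, 900, 1.00),
]

def selectivity_leq(value):
    """Estimated fraction of rows with column <= value,
    interpolating linearly inside the containing bucket."""
    prev_cum = 0.0
    for lo, hi, cum in histogram:
        if value <= hi:
            frac = (value - lo) / (hi - lo)
            return prev_cum + frac * (cum - prev_cum)
        prev_cum = cum
    return 1.0

def selectivity_between(lo, hi):
    return selectivity_leq(hi) - selectivity_leq(lo)

# Estimated fraction of rows with 100 <= price <= 500
print(round(selectivity_between(100, 500), 3))   # ≈ 0.57
```

Without the histogram the optimizer assumes a uniform distribution and can misestimate a range like this badly, which is exactly what drives bad join orders.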

The Metrics That Actually Matter

Stop obsessing over vanity metrics. These are the numbers that predict production problems:

-- 1. Buffer Pool Hit Ratio (should be >99%)
SELECT 
    (1 - (
        SELECT VARIABLE_VALUE 
        FROM performance_schema.global_status 
        WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads'
    ) / (
        SELECT VARIABLE_VALUE 
        FROM performance_schema.global_status 
        WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests'
    )) * 100 AS hit_ratio_pct;

-- 2. Table Lock Wait Time (should be <100ms avg)
SELECT 
    EVENT_NAME,
    COUNT_STAR,
    AVG_TIMER_WAIT / 1000000000 AS avg_wait_ms
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/lock/table/%'
ORDER BY AVG_TIMER_WAIT DESC;

-- 3. Temp Table Usage (minimize this!)
SELECT 
    SUM(SUM_CREATED_TMP_TABLES) AS total_tmp,
    SUM(SUM_CREATED_TMP_DISK_TABLES) AS disk_tmp,
    ROUND(100 * SUM(SUM_CREATED_TMP_DISK_TABLES) / 
          SUM(SUM_CREATED_TMP_TABLES), 2) AS disk_tmp_pct
FROM performance_schema.events_statements_summary_global_by_event_name;

What I Wish Someone Had Told Me

Five years into optimizing MySQL, here's what I know:

  1. Instant DDL is real, but it's not magic. Test every operation on staging first.

  2. Index everything important, but not everything. Each index costs write performance.

  3. Your buffer pool is probably too small. And your I/O threads are probably too few.

  4. Partitioning isn't for performance—it's for management. Drop old data instantly instead of deleting billions of rows.

  5. The query that's fast today will be slow tomorrow. Monitor query performance over time.

  6. EXPLAIN is your best friend. If you're not analyzing query plans, you're flying blind.

  7. Disk I/O optimization beats query optimization 9 times out of 10. Get your data in RAM.

Your Turn

That 47-million-row instant ALTER still feels like magic every time I see it. What's your MySQL optimization story? Have you tried instant DDL in production? What broke? What worked?

Drop your experiences in the comments. Especially the disasters—those are the best stories.

And if you're still planning 6-hour maintenance windows for simple column additions in 2025... maybe it's time to upgrade. Your future self will thank you.


Tags: #mysql #database #performance #ddl
