Taufiq Abdullah

Posted on Feb 25

When Disk Space is Not Enough: A Lesson in Challenging MySQL Database Cleansing

#database #tutorial #mysql #devops

Database maintenance is rarely a walk in the park, especially when dealing with high-volume transactional data. Recently, We was tasked with cleansing a MySQL database that had ballooned to 220GB. The mission was simple: remove data from 2018 to 2023 across four tables to reclaim space and improve performance.

However, as any Dev knows, "simple" tasks often hide complex traps. Here is how we navigated storage limits and technical hurdles to get the job done.

The Challenge: The 200GB "Elephant" in the Room

Out of the 220GB total size, one single table accounted for 200GB. This table was heavy with transactional records and—the real culprit—BLOB attachments. Our goal was to keep only the data from 2024 onwards.

Phase 1: The "Clone and Swap" Strategy (The Ideal Plan)

Our first approach was a classic "blue-green" table migration:

Clone: Create a clone of the existing tables with a new name.
Filter: Insert only the 2024–present data into the new tables.
Swap: Rename the existing tables and swap them with the new ones.
Cleanup: Drop the old tables once verified.

The Pros: It’s safe. If something goes wrong, the original data remains untouched as a "checkpoint."
The Result in Dev: Flawless. Despite the time required for backup and restore, it worked exactly as expected.

The Production Reality Check

When we moved to production, we hit a brick wall. The production server had a 500GB total capacity, but only 100GB of free space remained. Trying to clone a 200GB table into a 100GB hole is mathematically impossible. We needed a Plan B.

Phase 2: The "Chunked Delete" Strategy (The Safe Bet?)

We pivoted to a more granular approach: deleting data directly from the existing tables. To prevent long-running queries or locking the database, we followed these steps:

Deleted data in year-by-year ranges (2018, 2019, etc.).
Used Bash scripts to automate chunked deletions with LIMIT clauses to keep the system responsive and avoid "hanging" the DB.

The Trap: The `OPTIMIZE TABLE` Paradox

The deletion worked, but MySQL doesn't automatically shrink the file size on disk after a DELETE (due to the "High Water Mark" and data fragmentation). To actually reclaim the 100GB+ of space, we needed to run OPTIMIZE TABLE.

The Critical Oversight: OPTIMIZE TABLE works by creating a temporary copy of the table. Even though we had deleted half the data, the operation still required enough free space to rewrite the entire table. Once again, our 100GB of free space was the bottleneck. We were stuck.

Phase 3: The "Off-Site Surgery" (The Final Solution)

Desperate times call for creative architecture. We decided to perform the "heavy lifting" on a separate environment.

Export: We dumped the massive 200GB table and moved it to a secondary server with enough storage.
Cleanse & Shrink: On the secondary server, we performed the deletions and ran OPTIMIZE TABLE.
The Result: The optimized table dropped significantly below 100GB.
Import: We transferred the now-compacted table back to the production server under a new name.
The Final Swap: Since the new table was now small enough to fit in the 100GB free space, we successfully swapped it with the original and dropped the old 200GB giant.

Key Takeaways for us

Free Space is Your Best Friend: Always check your df -h before planning a migration. Most MySQL maintenance operations (like OPTIMIZE or ALTER) require at least 2x the size of your largest table in free space.
Optimize with Caution: Remember that OPTIMIZE TABLE is not an "in-place" fix; it is a rebuild.
Think Outside the Box: If your local disk is full, use a secondary staging server to process the data before bringing it back home.

Database cleansing is as much about managing infrastructure as it is about writing SQL. Have you ever faced a storage "deadlock" like this? I’d love to hear how you handled it in the comments!

DEV Community

When Disk Space is Not Enough: A Lesson in Challenging MySQL Database Cleansing

The Challenge: The 200GB "Elephant" in the Room

Phase 1: The "Clone and Swap" Strategy (The Ideal Plan)

The Production Reality Check

Phase 2: The "Chunked Delete" Strategy (The Safe Bet?)

The Trap: The `OPTIMIZE TABLE` Paradox

Phase 3: The "Off-Site Surgery" (The Final Solution)

Key Takeaways for us

Top comments (0)

The Challenge: The 200GB "Elephant" in the Room

Phase 1: The "Clone and Swap" Strategy (The Ideal Plan)

The Production Reality Check

Phase 2: The "Chunked Delete" Strategy (The Safe Bet?)

The Trap: The OPTIMIZE TABLE Paradox

Phase 3: The "Off-Site Surgery" (The Final Solution)

Key Takeaways for us

The Trap: The `OPTIMIZE TABLE` Paradox