Database maintenance is rarely a walk in the park, especially when dealing with high-volume transactional data. Recently, we were tasked with cleansing a MySQL database that had ballooned to 220GB. The mission was simple: remove data from 2018 through 2023 across four tables to reclaim space and improve performance.
However, as any developer knows, "simple" tasks often hide complex traps. Here is how we navigated storage limits and technical hurdles to get the job done.
## The Challenge: The 200GB "Elephant" in the Room
Out of the 220GB total size, one single table accounted for 200GB. This table was heavy with transactional records and—the real culprit—BLOB attachments. Our goal was to keep only the data from 2024 onwards.
## Phase 1: The "Clone and Swap" Strategy (The Ideal Plan)
Our first approach was a classic "blue-green" table migration:
- Clone: Create a clone of the existing tables with a new name.
- Filter: Insert only the 2024–present data into the new tables.
- Swap: Rename the existing tables and swap them with the new ones.
- Cleanup: Drop the old tables once verified.
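The steps above can be sketched as a script that prints the migration statements. This is a minimal sketch, not our production code: the table name `transactions`, the column `created_at`, and the database name are placeholders.

```shell
#!/usr/bin/env bash
# Clone-and-swap sketch. "transactions" and "created_at" are hypothetical
# names for illustration; the function only prints the statements, which
# you would pipe to `mysql your_db` to execute.
TABLE="transactions"

print_migration_sql() {
  cat <<SQL
-- 1. Clone the table structure (indexes included)
CREATE TABLE ${TABLE}_new LIKE ${TABLE};
-- 2. Copy only the rows we want to keep
INSERT INTO ${TABLE}_new SELECT * FROM ${TABLE} WHERE created_at >= '2024-01-01';
-- 3. Atomic swap: readers never see a missing table
RENAME TABLE ${TABLE} TO ${TABLE}_old, ${TABLE}_new TO ${TABLE};
-- 4. Drop the old copy once the new one is verified
DROP TABLE ${TABLE}_old;
SQL
}

print_migration_sql   # pipe to: mysql your_db
```

One nice property of this plan: `RENAME TABLE` swaps both names in a single atomic statement, so clients never observe a moment where the table is missing.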
- The Pros: It’s safe. If something goes wrong, the original data remains untouched as a "checkpoint."
- The Result in Dev: Flawless. Despite the time required for backup and restore, it worked exactly as expected.
## The Production Reality Check
When we moved to production, we hit a brick wall. The production server had a 500GB total capacity, but only 100GB of free space remained. Trying to clone a 200GB table into a 100GB hole is mathematically impossible. We needed a Plan B.
## Phase 2: The "Chunked Delete" Strategy (The Safe Bet?)
We pivoted to a more granular approach: deleting data directly from the existing tables. To prevent long-running queries from locking the database, we followed these steps:
- Deleted data in year-by-year ranges (2018, 2019, etc.).
- Used Bash scripts to automate chunked deletions with `LIMIT` clauses to keep the system responsive and avoid "hanging" the DB.
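A minimal sketch of such a script is below. The table name, date column, and batch size are assumptions for illustration; the function only builds the statements, and in practice each one would be piped to `mysql` and repeated until `ROW_COUNT()` reports zero rows deleted.

```shell
#!/usr/bin/env bash
# Chunked year-by-year delete sketch (names are assumptions, not the
# production script). Each statement is bounded by LIMIT so no single
# transaction holds locks for long.
TABLE="transactions"   # assumed table name
COLUMN="created_at"    # assumed date column
BATCH=10000            # rows per delete; tune to your write load

delete_year_sql() {
  local year=$1
  echo "DELETE FROM ${TABLE} WHERE ${COLUMN} >= '${year}-01-01' AND ${COLUMN} < '$((year + 1))-01-01' LIMIT ${BATCH};"
}

for year in 2018 2019 2020 2021 2022 2023; do
  delete_year_sql "$year"   # in practice: repeat via `mysql` until 0 rows affected
done
```

Keeping each batch small (tens of thousands of rows) lets replication and other writers make progress between chunks.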
## The Trap: The `OPTIMIZE TABLE` Paradox
The deletion worked, but MySQL doesn't automatically shrink the file size on disk after a `DELETE` (due to the "high water mark" and data fragmentation). To actually reclaim the 100GB+ of space, we needed to run `OPTIMIZE TABLE`.
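You can watch this happen: after the deletes, `information_schema` reports the freed space as `data_free` (reclaimable) while the file on disk stays the same size. A sketch of that check follows; the `information_schema` columns are standard MySQL, but `your_db` is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: inspect reclaimable space after a large DELETE. The
# information_schema columns are standard MySQL; 'your_db' is a placeholder.
frag_check_sql() {
  cat <<'SQL'
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb,
       ROUND(data_free / 1024 / 1024 / 1024, 1)                    AS reclaimable_gb
FROM information_schema.tables
WHERE table_schema = 'your_db'
ORDER BY data_free DESC;
SQL
}

frag_check_sql   # run with: frag_check_sql | mysql -t your_db
```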
**The Critical Oversight:** `OPTIMIZE TABLE` works by creating a temporary copy of the table. Even though we had deleted half the data, the operation still required enough free space to rewrite the entire table. Once again, our 100GB of free space was the bottleneck. We were stuck.
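The check we should have run first: compare the table's on-disk size against free space before attempting the rebuild. A rough pre-flight sketch, where the datadir path, database, and table names are all assumptions (and the table size is hardcoded here since there is no live server):

```shell
#!/usr/bin/env bash
# Pre-flight sketch: fail fast if there is not enough free space to rebuild
# the table. DATADIR and the db/table names are assumptions for illustration.
DATADIR="${DATADIR:-.}"   # typically /var/lib/mysql

table_size_sql() {
  cat <<'SQL'
SELECT data_length + index_length
FROM information_schema.tables
WHERE table_schema = 'your_db' AND table_name = 'transactions';
SQL
}

# With a live server: table_bytes=$(table_size_sql | mysql -N your_db)
table_bytes=214748364800   # hardcoded 200GB stand-in for this sketch
free_bytes=$(df --output=avail -B1 "$DATADIR" | tail -1)

if [ "$free_bytes" -lt "$table_bytes" ]; then
  echo "NOT ENOUGH SPACE: OPTIMIZE TABLE needs room for a full table copy"
else
  echo "OK to run OPTIMIZE TABLE"
fi
```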
## Phase 3: The "Off-Site Surgery" (The Final Solution)
Desperate times call for creative architecture. We decided to perform the "heavy lifting" on a separate environment.
- Export: We dumped the massive 200GB table and moved it to a secondary server with enough storage.
- Cleanse & Shrink: On the secondary server, we performed the deletions and ran `OPTIMIZE TABLE`.
- The Result: The optimized table dropped significantly below 100GB.
- Import: We transferred the now-compacted table back to the production server under a new name.
- The Final Swap: Since the new table was now small enough to fit in the 100GB free space, we successfully swapped it with the original and dropped the old 200GB giant.
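End to end, the surgery looked roughly like the sequence below. Every hostname, path, and name is a placeholder, and the commands are printed rather than executed so the order is easy to review; `--single-transaction` is the standard mysqldump flag for a consistent InnoDB dump without locking writers.

```shell
#!/usr/bin/env bash
# Off-site surgery sketch; all hosts, paths, and names are placeholders.
run() { echo "+ $*"; }   # swap the echo for real execution

# 1. Export the giant table and ship it to a staging server with free disk
run "mysqldump --single-transaction your_db transactions | gzip > /tmp/transactions.sql.gz"
run "scp /tmp/transactions.sql.gz staging:/data/"

# 2. On staging: restore, delete 2018-2023 in chunks, then rebuild
run "ssh staging 'gunzip < /data/transactions.sql.gz | mysql staging_db'"
run "ssh staging \"mysql staging_db -e 'OPTIMIZE TABLE transactions;'\""

# 3. Import the compacted table back under a new name, then swap and drop
run "mysql your_db -e 'RENAME TABLE transactions TO transactions_old, transactions_compacted TO transactions;'"
run "mysql your_db -e 'DROP TABLE transactions_old;'"
```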
## Key Takeaways for Us

- Free Space is Your Best Friend: Always check `df -h` before planning a migration. Most MySQL maintenance operations (like `OPTIMIZE TABLE` or a table-copying `ALTER`) need enough free space to hold a full copy of your largest table.
- Optimize with Caution: Remember that `OPTIMIZE TABLE` is not an "in-place" fix; it is a full table rebuild.
- Think Outside the Box: If your local disk is full, use a secondary staging server to process the data before bringing it back home.
Database cleansing is as much about managing infrastructure as it is about writing SQL. Have you ever faced a storage "deadlock" like this? I’d love to hear how you handled it in the comments!