We had a wide MySQL table — 8 million rows, 8 GB on disk, mostly index — about to grow 10× from a historical backfill. The instinct was to normalise: pull repeated strings into dimension tables. Classic relational hygiene. But twenty minutes spent measuring cardinality and average length per column pointed somewhere different.
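The measurement itself is cheap. A minimal sketch of that kind of per-column audit, assuming a hypothetical `clicks` table with illustrative column names (`click_id`, `landing_url`), not the actual schema from the post:

```sql
-- Sketch: per-column cardinality and average byte length.
-- Table and column names are hypothetical; LENGTH() counts bytes, not characters.
SELECT
  COUNT(*)                    AS total_rows,
  COUNT(DISTINCT click_id)    AS distinct_click_ids,
  AVG(LENGTH(click_id))       AS avg_click_id_bytes,
  COUNT(DISTINCT landing_url) AS distinct_landing_urls,
  AVG(LENGTH(landing_url))    AS avg_landing_url_bytes
FROM clicks;
```

A column whose distinct count approaches the row count is a poor normalisation candidate: the dimension table ends up nearly as large as the source.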
The biggest single column by raw bytes was a high-cardinality, near-unique click identifier from ad platforms. Normalising it would have moved the bytes around without removing them. InnoDB's ROW_FORMAT=COMPRESSED, by contrast, compresses whole data and index pages, so it squeezes out repeated byte patterns within and across strings and is particularly effective on indexed URL-shaped columns. It also ships as a one-line ALTER TABLE with ALGORITHM=INPLACE, LOCK=NONE: no application code changes, and fully reversible.
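For reference, the change looks roughly like this. A sketch using a hypothetical table name; KEY_BLOCK_SIZE=8 is a common starting point, not a value taken from the post, and compressed row format requires file-per-table (or general) tablespaces:

```sql
-- Sketch: switch a hypothetical `clicks` table to compressed pages, online.
-- Rebuilds the table in place while permitting concurrent DML.
ALTER TABLE clicks
  ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8,
  ALGORITHM=INPLACE, LOCK=NONE;

-- Reverting is the same shape; DYNAMIC is the usual uncompressed row format,
-- and KEY_BLOCK_SIZE=0 clears the block-size option.
ALTER TABLE clicks
  ROW_FORMAT=DYNAMIC KEY_BLOCK_SIZE=0,
  ALGORITHM=INPLACE, LOCK=NONE;
```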
Result on the production table: 8.18 GB → 3.63 GB, with the indexes shrinking by 59%. The full post covers the cardinality measurements, why "won't reads get slower?" usually goes the right way for analytics workloads, and when normalisation is actually the right tool.
Originally published at andreasbergstrom.dev — read the full post there.