Flora Brandão for Upsun

How we caught a silent IO storm before it hit production 🌩️

Migrating to Debian 12 should have been a routine update, but we were met with a silent IO storm that slammed our disks with over 500 MB/s of writes. MariaDB stalled and Redis began complaining about slow fsync as our storage ground to a halt.

  • The kernel changed how it divides dirty page budgets across containers.
  • Moving from cgroup v1 to v2 altered the writeback path in unexpected ways.
  • A tiny 0.6 MB limit in the budget became a total production catastrophe.
  • We had to dig into how dirty page writeback actually works to find the fix.

It was a stressful way to learn about cgroup v2 and kernel defaults. If you are planning a migration or seeing weird IO behavior, this might save your production environment.
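
If you want a quick look at your own hosts before a migration, here is a minimal sketch (not the tooling from the article) that reports whether the unified cgroup v2 hierarchy is mounted and what the global dirty page knobs are set to. It assumes the standard Linux locations `/sys/fs/cgroup` and `/proc/sys/vm`; the function names `cgroup_version` and `dirty_settings` are made up for illustration.

```python
from pathlib import Path


def cgroup_version(cgroup_root="/sys/fs/cgroup"):
    """Return 2 if the unified (v2) hierarchy is mounted at cgroup_root, else 1.

    Assumption: "1" here also covers hybrid setups, where v2 is mounted
    somewhere other than the root of cgroup_root.
    """
    # cgroup v2 exposes cgroup.controllers at the root of the unified hierarchy.
    return 2 if (Path(cgroup_root) / "cgroup.controllers").exists() else 1


def dirty_settings(vm_root="/proc/sys/vm"):
    """Read the global dirty page writeback knobs as integers (None if unreadable)."""
    names = (
        "dirty_ratio",
        "dirty_background_ratio",
        "dirty_bytes",
        "dirty_background_bytes",
    )
    settings = {}
    for name in names:
        try:
            settings[name] = int((Path(vm_root) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file, permission problem, or unexpected content.
            settings[name] = None
    return settings


if __name__ == "__main__":
    print("cgroup version:", cgroup_version())
    for key, value in dirty_settings().items():
        print(f"{key}: {value}")
```

Comparing this output between a Debian 11 host and a Debian 12 host is one cheap way to spot which defaults changed under you. It only shows the global settings, not the per-container budget the kernel derives from them.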

Read the full article here:

Instant data cloning was a bet. AI agents are the payoff. - Upsun Developer

How Upsun's data-first architecture, built a decade ago for CMS and e-commerce needs, turns out to be the exact infrastructure AI agents need: instant, isolated, production-identical environments with real data.

developer.upsun.com