DEV Community

Discussion on: Cascade of doom: JIT, and how a Postgres update led to 70% failure on a critical national service

Collapse
 
xenatisch profile image
Pouria Hadjibagheri • Edited

We have all that in place, and it didn't help. It's a bit more complicated when you get 50-60 million requests a day, of which 90% succeed. The issue was noticeable at release time. There were alerts regarding some outliers, but nothing out of the ordinary. You don't get the whole back-pressure issue until you sustain heavy traffic.

Also remember that most services were fine for a few days... until we had an increase in demand... and the increase was going from 45 million a day to 50 million a day.

We also did load testing, and it didn't catch it. As I said, it's a bit more complicated than it might appear.