DEV Community

Discussion on: Cascade of doom: JIT, and how a Postgres update led to 70% failure on a critical national service

 
xenatisch profile image
Pouria Hadjibagheri • Edited

I don't disagree with you at all. All I'm saying is that there were mitigating circumstances. We had issues that we couldn't resolve in the previous version, so it was a catch-22, and a risk that on balance, I considered to be reasonable. For what it is worth, I stand by my decision to do the upgrade. Despite the issue, we have have significant improvements and we have managed to reduce the running cost quite substantially.

We have million of test cases - literally. But not matter how much we try, there would always be cases that we couldn't anticipate. That's why major services with a whole lot more resource than that which we can imagine break every once in a while.

Unfortunately, things happen, and all we can do is learn from them.

Thank you again.