A good reminder that we need to be ready for, and aware of, possible chaos. I read about that recently in the DevOps Handbook. It really makes sense if you have many microservices running.



See, I recently got Kafka/ZooKeeper running on K8s and things went smoothly. Then I played with that chaos idea and went ahead and deleted random ZooKeeper and Kafka pods. Things were good until I found out about the wait times, rebalancing, etc. Now imagine introducing a lag of 1-2 min in prod with, say, 500 pods. Disaster. :)
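
For anyone who wants to try the same kind of experiment, here is a rough sketch of the "delete random pods" step. It only prints the `kubectl delete pod` commands instead of running them. The pod names, the `kafka` namespace, and the helper names `pick_victims` and `delete_commands` are all made up for illustration, not what I actually ran.

```python
import random


def pick_victims(pods, count, seed=None):
    """Pick up to `count` distinct pod names at random (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(pods, min(count, len(pods)))


def delete_commands(victims, namespace="kafka"):
    """Build kubectl commands as strings; nothing is executed (dry run)."""
    return [f"kubectl delete pod {name} -n {namespace}" for name in victims]


# Assumed pod names and namespace, for illustration only.
pods = ["zk-0", "zk-1", "zk-2", "kafka-0", "kafka-1", "kafka-2"]

for cmd in delete_commands(pick_victims(pods, 2)):
    print(cmd)
```

Printing first makes it easy to check which pods would go away before doing it for real, ideally on a test cluster and not in prod.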

