A list of my resiliency-related blog posts.
Series on Resilient Architecture
Resilient systems embrace the idea that failures are typical, and that it’s entirely OK to run applications in what we call partially failing mode. While not suitable for life-critical applications, running in a partially failing mode is a viable option for most web applications. Of course, I’m not saying it doesn’t matter if your system fails. It does, and it might result in lost revenue, but it’s probably not life-critical.
Building resilient architectures has had its ups and downs, some 1 a.m. wake-up calls, some Christmases spent debugging, some “I’m done, I quit” … but most of all, it’s been an incredible learning experience and journey.
This blog post is a collection of tips and tricks that have served me well throughout this journey, and I hope they will serve you well too.
Part 1: Embracing failure at scale
In part 1 of this series, I focus on the infrastructure layer, redundancy, immutability, and the concept of infrastructure as code.
Patterns for Resilient Architecture — Part 1
Part 2 — Avoiding Cascading Failures
In part 2, I focus on cascading failure prevention. Cascading failures happen when one part of a system experiences a local failure and takes down the entire system through interconnections and failure propagation.
Patterns for Resilient Architecture — Part 2
Part 3 — Preventing Service Failures with Health Check
In part 3, I discuss the importance and the challenge of health checks — striking a balance between failure detection and reaction.
Patterns for Resilient Architecture — Part 3
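To make the detection-versus-reaction balance from part 3 a little more concrete, here is a minimal sketch, not taken from the post itself. It assumes a hypothetical `HealthTracker` helper that only changes a target's state after several consecutive check results disagree with the current state, so a single flaky check does not pull a healthy instance out of rotation. The class name and the threshold values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Hypothetical helper: flips state only after N consecutive contrary results."""

    unhealthy_threshold: int = 3  # assumed: failures in a row before marking unhealthy
    healthy_threshold: int = 2    # assumed: successes in a row before marking healthy
    healthy: bool = True
    _streak: int = 0              # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the resulting state."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so any streak is broken.
            self._streak = 0
            return self.healthy

        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            # Enough consecutive contrary results: react by flipping the state.
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy
```

Lower thresholds detect failures faster but react to noise; higher thresholds are calmer but leave a failing instance in service for longer.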
Part 4 — Caching for Resiliency
In part 4, I talk about caching. While caching is often associated with accelerating content delivery, it is also essential from a resiliency standpoint.
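As a rough illustration of caching used for resiliency rather than speed, the sketch below keeps the last known value and serves it when the backend call fails. This is an assumption about one way to apply the idea, not the approach described in part 4; `FallbackCache`, `fetch`, and `ttl_seconds` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class FallbackCache:
    """Hypothetical cache that serves stale entries when the backend fails."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0):
        self._fetch = fetch          # assumed backend call; may raise on failure
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, value)

    def get(self, key: str) -> str:
        entry = self._store.get(key)
        now = time.monotonic()

        # Fresh entry: answer from the cache without touching the backend.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]

        try:
            value = self._fetch(key)
        except Exception:
            # Backend is failing: degrade gracefully with the stale value if any.
            if entry is not None:
                return entry[1]
            raise  # nothing cached, so the failure has to surface

        self._store[key] = (now, value)
        return value
```

Serving stale data is a trade-off: it keeps the application in a partially failing mode instead of a fully failing one, at the cost of freshness.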