The Resilient Architecture Collection

#aws #sre #computerscience #devops

A list of my resiliency-related blog posts.

Series on Resilient Architecture

Resilient systems embrace the idea that failures are normal and that it’s entirely OK to run applications in what we call partially failing mode. That mode isn’t suitable for life-critical applications, but it is a viable option for most web applications. Of course, I’m not saying it doesn’t matter if your system fails. It does, and it might cost you revenue. But it’s probably not life-critical.

Building resilient architectures has had its ups and downs: some 1 a.m. wake-up calls, some Christmases spent debugging, some “I’m done, I quit” moments … but most of all, it’s been an incredible learning experience and journey.

This blog post is a collection of tips and tricks that have served me well throughout this journey, and I hope they will serve you well too.

Part 1: Embracing failure at scale

Part 2: Avoiding Cascading Failures

Part 3: Preventing Service Failures with Health Check

Part 4: Caching for Resiliency
