From Bedroom Disasters to Cloud Resilience: Explaining AWS DR Strategies To Anyone

#aws #resilience #cloud #beginners

Disclaimer: While this article simplifies disaster recovery (DR) strategies, it's important to remember that DR is a complex topic with many nuances.

In recent years, I've come to realise that concepts many of us in information technology take for granted aren't always common knowledge, even among people working in the technology industry. There is an ongoing need to explain complex ideas clearly to broad audiences with little or no exposure to the ever-evolving technology world.

I have recently started writing articles that explain complex ideas in a way anyone can understand. In this article, I aim to explain Amazon Web Services (AWS) disaster recovery strategies through a household analogy. By relating cloud computing concepts to familiar scenarios, I hope to bridge knowledge gaps and demystify cloud technology for non-technical readers.

To begin with, here are the different DR strategies that can be implemented on AWS:

Backup and Restore
Pilot Light
Warm Standby
Multi-site Active/Active
Multi AZ deployment


Now we will go through the house analogy to explain them individually. We will start with the Multi AZ deployment strategy.

Single AWS Region - Multi AZ Deployment Strategy

If you are wondering what a region is, I have written an article called "AWS Is A Zoo: Anyone Can Navigate the Cloud Jungle!" It explains what a region is using the Zoo analogy.

Back to our DR strategy. Imagine you have a house with three bedrooms. You and your partner occupy one room, while your two kids each have their own room. Unfortunately, while you are all out for dinner, the roof of one of the children's bedrooms collapses (as shown in the image above). Luckily, because your house has three bedrooms, a few adjustments such as moving a bed or using an air mattress let your children share a room temporarily while the damaged room is repaired. You won't have to change much to solve the problem of where your child will sleep.

This scenario resembles having a single AWS Region with multiple Availability Zones (AZs). An AZ is made up of one or more physical data centres. If a disaster such as a natural calamity or a technical failure takes out one AZ, having your workload distributed across several Availability Zones within the same AWS Region helps you weather the storm: your workload can continue operating in the remaining AZs.
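For readers who like to see the idea in code, here is a minimal sketch of it in plain Python. It does not call AWS at all; it only simulates spreading requests across zones. The zone names (`az-a`, `az-b`, `az-c`) and the `route_requests` helper are made up for illustration.

```python
from itertools import cycle

# Hypothetical zone names for illustration; real names depend on the Region.
ZONES = ["az-a", "az-b", "az-c"]


def route_requests(requests, healthy_zones):
    """Spread requests round-robin across the zones that are still healthy."""
    if not healthy_zones:
        raise RuntimeError("No healthy Availability Zone left in this Region")
    zone_picker = cycle(healthy_zones)
    return {request: next(zone_picker) for request in requests}


requests = ["req-1", "req-2", "req-3", "req-4"]

# Normal day: all three zones share the work.
normal = route_requests(requests, ZONES)

# "Roof collapse": az-b is down, so the other two zones absorb its share.
degraded = route_requests(requests, [zone for zone in ZONES if zone != "az-b"])

print(normal)
print(degraded)
```

With `az-b` out of action, every request still lands on `az-a` or `az-c`, just as the children still have a bedroom to sleep in.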

Multi AWS Region Strategy

Let's imagine you have two homes: a primary residence and a vacation home. These will be used to explain the remaining AWS disaster recovery strategies.

Backup and Restore Strategy

In your primary house, you have a fireproof vault where you regularly keep copies of your important documents, such as your birth certificate, driving licence and passport. In the event of a fire, you can retrieve these documents to identify yourself or use them for other purposes, such as home insurance claims.

Similarly, on AWS, the backup and restore DR strategy involves regularly backing up your data to the cloud. In the event of a disaster, you can use these backups to restore lost data and applications. This strategy can be applied to both single-region and multi-region implementations. For a single region, you can copy the data to a different account, while for multi-region, you can copy the data into your secondary region and retrieve it when needed.
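The vault idea can be sketched in a few lines of plain Python. This is a toy model, not an AWS API: the `BackupVault` class and the sample documents are invented for illustration. It also shows one consequence worth knowing: anything changed after the last backup is not in the vault.

```python
import copy


class BackupVault:
    """A toy stand-in for backups stored away from the primary system."""

    def __init__(self):
        self._snapshots = []  # oldest first

    def back_up(self, data):
        # Deep copy so later changes to the live data do not alter the backup.
        self._snapshots.append(copy.deepcopy(data))

    def restore_latest(self):
        if not self._snapshots:
            raise LookupError("No backup has been taken yet")
        return copy.deepcopy(self._snapshots[-1])


live_data = {"passport": "v1", "licence": "v1"}
vault = BackupVault()
vault.back_up(live_data)        # the regular, scheduled backup

live_data["passport"] = "v2"    # a change made AFTER the last backup
live_data = None                # disaster: the primary copy is gone

restored = vault.restore_latest()
print(restored)
```

The restored copy still holds the older passport entry, because the change was made after the last backup. How often you back up therefore decides how much recent work you could lose.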

Pilot Light Strategy

Imagine a vacation home that you plan to use only occasionally. It's not fully furnished or equipped for daily living; it has only a basic structure in place, just like in the picture above. You could make it habitable with some effort, but it's not ready for immediate occupancy.

This analogy closely resembles the pilot light disaster recovery (DR) strategy on AWS. The infrastructure in the recovery region is minimalistic, similar to the basic structure of a vacation home. It provides the core components needed to run your applications but lacks the fully configured environment required for production workloads. When a disaster occurs, you must provision additional resources, configure applications, and perform other manual tasks before the recovery region can fully handle production traffic.

Warm Standby Strategy

With the warm standby approach, your vacation home is fully built and has the essentials in place, but it is not furnished and stocked to the same level as your primary residence. It is ready for immediate use, though it may not have all the amenities and personal touches of your primary residence.

Similarly, the warm standby DR strategy on AWS involves maintaining a scaled-down, but fully functional, copy of your production environment in the recovery region. This replica is kept up to date with data changes, but it may not be running at full capacity or serving all application components. When a disaster occurs, you can quickly fail over to the recovery region, minimising downtime.
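To make the difference between pilot light and warm standby concrete, here is a small sketch in plain Python. It is a deliberate simplification with made-up numbers: I assume a production fleet of 10 servers, a pilot light recovery region with no application servers running, and a warm standby region with 2 already running. The `RecoveryRegion` class is hypothetical and not part of any AWS library.

```python
from dataclasses import dataclass

PRODUCTION_SERVERS = 10  # hypothetical size of the full production fleet


@dataclass
class RecoveryRegion:
    name: str
    running_servers: int  # servers already switched on before the disaster

    def can_serve_traffic_now(self):
        """True if at least one server is already running in this region."""
        return self.running_servers > 0

    def servers_to_add_on_failover(self):
        """How many servers must be started to reach production size."""
        return PRODUCTION_SERVERS - self.running_servers


# Assumed values for illustration only.
pilot_light = RecoveryRegion("pilot light", running_servers=0)
warm_standby = RecoveryRegion("warm standby", running_servers=2)

for region in (pilot_light, warm_standby):
    print(
        region.name,
        "| can serve now:", region.can_serve_traffic_now(),
        "| servers to add:", region.servers_to_add_on_failover(),
    )
```

In this sketch the warm standby region can take some traffic straight away and only needs scaling up, while the pilot light region has to start its servers first. That is the "move in tonight" versus "furnish it first" difference from the vacation home analogy.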

Multi-Site Active/Active Strategy

With the active/active approach, both your primary residence and vacation home are fully equipped and ready for daily use. You can comfortably switch between them depending on your needs or preferences. You can have your family living in your vacation home while you reside in your primary home or vice versa.

The multi-site active/active DR strategy on AWS mirrors this concept. You have two production environments running simultaneously in different regions, each serving a portion of your application traffic. This strategy offers the highest availability and the lowest recovery time objective (the target time for getting back up and running after a disaster), but it comes with the highest cost and complexity.
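As a rough illustration of "each serving a portion of your application traffic", the sketch below splits requests between two regions by weight, again in plain Python with no AWS calls. The region names, the weights and the `split_traffic` helper are all assumptions made for this example, and the numbers are chosen so that they divide evenly.

```python
def split_traffic(total_requests, region_weights):
    """Divide requests between regions in proportion to their weights.

    A weight of 0 means the region is unavailable and receives nothing.
    Integer division is used, so pick totals that divide evenly.
    """
    active = {region: w for region, w in region_weights.items() if w > 0}
    if not active:
        raise RuntimeError("No region is available to serve traffic")
    total_weight = sum(active.values())
    return {
        region: total_requests * w // total_weight
        for region, w in active.items()
    }


# Normal day: both homes are lived in, each taking half the traffic.
normal = split_traffic(1000, {"region-1": 50, "region-2": 50})

# Disaster in region-1: its weight drops to 0 and region-2 takes everything.
after_failure = split_traffic(1000, {"region-1": 0, "region-2": 50})

print(normal)
print(after_failure)
```

Because the second region is already running and serving users, nothing has to be started or furnished when the first one fails; traffic simply shifts across.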

Closing Thoughts

In this article, I aimed to explain key AWS disaster recovery concepts in an easy-to-understand manner. By comparing cloud computing strategies to everyday household scenarios, my goal was to make complex technical ideas simple and accessible. Though concepts such as Availability Zones and recovery time objectives may be familiar to those with a technology background, they can be challenging for those without one.

I hope that these simplified explanations shed light on AWS disaster recovery in a way that resonates with everyone. If this piece has helped demystify even one core concept for you, then it has achieved its purpose.