Mark Sta Ana

AWS DevOps Pro Certification Blog Post Series: High Availability, Fault Tolerance and Disaster Recovery

Photo by Emiel Molenaar on Unsplash

What does the exam guide say?

To pass this domain, you'll need to know the following:

  • Determine appropriate use of multi-AZ versus multi-region architectures
  • Determine how to implement high availability, scalability, and fault tolerance
  • Determine the right services based on business needs (e.g., RTO/RPO, cost)
  • Determine how to design and automate disaster recovery strategies
  • Evaluate a deployment for points of failure

This domain is 16% of the overall mark for the exam.

What whitepapers are relevant?

According to the AWS Whitepapers page we should look at the following documents:

What services and products are covered in this domain?

  • AWS Single Sign-On is Amazon's managed SSO service that allows your users to sign in to AWS and other connected services using their existing Microsoft Active Directory (AD) credentials.
  • Amazon CloudFront is a managed Content Delivery Network (CDN) service.
  • Auto scaling resources - Amazon has two offerings: AWS Auto Scaling and Amazon EC2 Auto Scaling.
  • Amazon Route 53 is a managed Domain Name System (DNS) service.
  • Databases
    • Amazon RDS is a managed relational database service with a large choice of engines: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database and SQL Server.
    • Amazon Aurora is part of the RDS offering but is unique in that it is compatible with the MySQL and PostgreSQL engines whilst, according to AWS, delivering considerably higher throughput (up to 5x for MySQL and up to 3x for PostgreSQL).
    • Amazon DynamoDB is a managed NoSQL (non-relational) database service that can be used for storing key-value pairs or document-based records.

What about other types of documentation?

If you have the time, by all means, read the User Guides, but they are usually a couple of hundred pages.

Alternatively, get familiar with the services using the FAQs:

You're also expected to know the APIs

Before you panic, you'll start to spot a pattern with the API verbs.
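
To make the pattern concrete, here is a minimal sketch that groups a handful of action names by their leading verb. The action names are a small hand-picked sample from the EC2 Auto Scaling, RDS, DynamoDB and Route 53 APIs, not a complete list, and `group_by_verb` is a hypothetical helper written purely for illustration.

```python
import re
from collections import defaultdict

# A hand-picked sample of API action names, not an exhaustive list.
ACTIONS = [
    "CreateAutoScalingGroup",
    "DescribeAutoScalingGroups",
    "UpdateAutoScalingGroup",
    "DeleteAutoScalingGroup",
    "CreateDBInstance",
    "DescribeDBInstances",
    "DeleteDBInstance",
    "CreateTable",
    "DescribeTable",
    "UpdateTable",
    "DeleteTable",
    "ListTables",
    "ListHostedZones",
]


def group_by_verb(actions):
    """Group CamelCase action names by their first word (the verb)."""
    groups = defaultdict(list)
    for action in actions:
        # The verb is the first capitalised word, e.g. "Describe".
        verb = re.match(r"[A-Z][a-z]+", action).group(0)
        groups[verb].append(action)
    return dict(groups)


if __name__ == "__main__":
    for verb, names in sorted(group_by_verb(ACTIONS).items()):
        print(f"{verb}: {', '.join(names)}")
```

Most of what you'll meet falls into a small set of verbs (Create, Describe, List, Update, Delete), so learning the verbs once pays off across services.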

And the CLI commands

As with the API, there are patterns to the commands.
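
One pattern worth noticing is that many CLI subcommands are simply the API action name written in lowercase with dashes. The sketch below is a rough heuristic for that conversion, not how the AWS CLI itself derives its command names; `action_to_cli` is a hypothetical helper, and there are exceptions, so check `aws <service> help` when in doubt.

```python
import re


def action_to_cli(action):
    """Convert a CamelCase API action to a kebab-case CLI-style name."""
    # Insert a dash at lower->upper boundaries and before the last
    # capital of an acronym run (e.g. "DBInstance" -> "DB-Instance").
    dashed = re.sub(
        r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "-", action
    )
    return dashed.lower()


if __name__ == "__main__":
    for name in ("DescribeAutoScalingGroups", "CreateDBInstance", "ListHostedZones"):
        print(f"{name} -> {action_to_cli(name)}")
```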

High Availability, Fault Tolerance and Disaster Recovery, oh my!

Let's get the basics out of the way and discuss the core concepts around this domain.

I'm going to use an excellent example provided by Patrick Benson in his blog post: The Difference Between Fault Tolerance, High Availability, & Disaster Recovery

An airplane has multiple engines and can operate with the loss of one or more of them. The design of the airplane makes it resilient to falling out of the sky because of engine failure. This design is fault tolerant.

In terms of infrastructure, this is likely to be a managed service like RDS, where under the hood the database engine has multiple disks and CPUs to cope with catastrophic failure.

A spare tire in a car, by contrast, isn't fault tolerant, i.e. you have to stop and change the tire, but having the spare tire in the first place means the car is still highly available. In terms of infrastructure, this is any type of technology like an autoscaling group.

It's very common for a solution to implement a system that is both fault tolerant (resilient) and highly available (scalable).
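
To put some rough numbers on why redundancy helps, here is a back-of-the-envelope sketch. It assumes component failures are independent, which real systems only approximate, and the 99% figure is purely illustrative and not an AWS SLA. Both function names are hypothetical.

```python
def parallel_availability(single, copies):
    """Availability of `copies` redundant components, any one of which
    is enough to serve traffic, assuming independent failures."""
    return 1 - (1 - single) ** copies


def serial_availability(*components):
    """Availability of a chain where every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


if __name__ == "__main__":
    # Two redundant instances, each assumed to be 99% available.
    print(f"redundant pair: {parallel_availability(0.99, 2):.4%}")
    # Two components in a chain, both required, each 99% available.
    print(f"serial chain:   {serial_availability(0.99, 0.99):.4%}")
```

Under these assumptions the redundant pair is more available than either instance alone, while a chain of required components is less available than its weakest link. That is the reasoning behind hunting for single points of failure.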

Finally, ejector seats in fighter aircraft are a disaster recovery (DR) measure. The goal is to preserve the pilot, or in our case, the service after all other measures (fault tolerance and HA) have failed.

In terms of infrastructure, this is often a standby environment or database replica in a different AWS region, with Route 53 used to point traffic at the standby infrastructure. Whilst it's still common for DR strategies to be manual, for this domain we'll be expected to provide an automated solution.
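
As a minimal sketch of what the automated version might look like, the function below builds a Route 53 change batch with a PRIMARY and a SECONDARY failover record. The domain name, IP addresses and health check ID are all hypothetical placeholders, and `failover_change_batch` is an illustrative helper rather than part of any AWS library. It only builds the request body; nothing is sent to AWS.

```python
import json


def failover_change_batch(name, primary_ip, standby_ip, health_check_id, ttl=60):
    """Build a ChangeBatch with PRIMARY and SECONDARY failover A records."""

    def record(role, ip, set_id, check_id=None):
        record_set = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_id,
            "Failover": role,
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        # The primary record carries the health check that triggers failover.
        if check_id is not None:
            record_set["HealthCheckId"] = check_id
        return {"Action": "UPSERT", "ResourceRecordSet": record_set}

    return {
        "Comment": "Failover routing to a standby region",
        "Changes": [
            record("PRIMARY", primary_ip, "primary", health_check_id),
            record("SECONDARY", standby_ip, "standby"),
        ],
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    batch = failover_change_batch(
        "app.example.com.", "192.0.2.10", "198.51.100.10",
        "hypothetical-health-check-id",
    )
    print(json.dumps(batch, indent=2))
    # With boto3 installed and credentials configured, this could be applied via
    # boto3.client("route53").change_resource_record_sets(
    #     HostedZoneId="<your hosted zone id>", ChangeBatch=batch)
```

When the health check attached to the primary record fails, Route 53 starts answering queries with the secondary record, so no one has to edit DNS by hand during an outage.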

Unsplash path (what terms I used to get to the cover image): airplane
