Rajan Prasad

Posted on Oct 5, 2020

5 Pillars of Well-Architected Framework

#aws #bestpractises #architecture

Creating a software system is a lot like constructing a building. If the foundation is not solid, structural problems can undermine the integrity and function of the building.
In this article, we're going to talk about the design principles we can follow to build future-proof, large-scale software. The concepts come from the AWS Well-Architected Framework whitepaper, which describes architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. It provides a way to consistently measure your architectures against best practices and identify areas for improvement. I'll try to summarize the whitepaper here.

So let's first quickly sum up the Guiding Design Principles:

Stop guessing capacity needs: Scale up & down as required
Automate everything: Automated systems ensure consistency & reliability
Test at scale: Test an accurate replica of production on-demand
Adapt & Evolve: Adapt the architecture as needed to meet new challenges
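
To make "stop guessing capacity needs" concrete, here is a minimal sketch of a proportional scaling rule. It is not taken from the whitepaper; the function name, the utilization metric, and the bounds are all assumptions chosen for illustration.

```python
import math


def desired_capacity(current_instances: int, current_utilization: float,
                     target_utilization: float,
                     min_instances: int = 1, max_instances: int = 20) -> int:
    """Return the instance count that would bring utilization back to the target."""
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    # Proportional rule: if utilization is twice the target, run twice the instances.
    raw = math.ceil(current_instances * current_utilization / target_utilization)
    # Clamp to the configured bounds (hypothetical defaults) so the fleet
    # never drops below the minimum or grows without limit.
    return max(min_instances, min(max_instances, raw))


print(desired_capacity(4, 90.0, 60.0))  # busy fleet: scale out
print(desired_capacity(4, 15.0, 60.0))  # idle fleet: scale in
```

The same function handles both directions, which is the point of the principle: capacity follows measured demand instead of an up-front estimate.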

The framework is based on 5 pillars:

1). Operational Excellence
2). Cost Optimization
3). Reliability
4). Performance Efficiency
5). Security

Operational Excellence

The main emphasis of this pillar is: Does your architecture work? Will it continue to?
Let's look at this pillar's specific principles:

All operations are code
Documentation is updated automatically
Make smaller changes you can roll back
Iterate...a lot
Expect things to go sideways
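
The "smaller changes you can roll back" principle can be sketched in a few lines. This is only an illustration under assumed names: `apply_change`, the config keys, and the health check are hypothetical, and a real rollback would act on deployed infrastructure rather than on a dictionary.

```python
from typing import Callable, Dict

Config = Dict[str, str]


def apply_change(live: Config, change: Config,
                 is_healthy: Callable[[Config], bool]) -> Config:
    """Apply one small change; return the new config, or the old one if the check fails."""
    candidate = {**live, **change}  # build a copy so the live config is never mutated
    if is_healthy(candidate):
        return candidate
    return live  # roll back by keeping the previous version


def healthy(config: Config) -> bool:
    # Hypothetical check: the timeout must be a positive whole number.
    value = config.get("timeout_seconds", "")
    return value.isdigit() and int(value) > 0


live = {"timeout_seconds": "30", "log_level": "info"}
live = apply_change(live, {"log_level": "debug"}, healthy)     # accepted
live = apply_change(live, {"timeout_seconds": "-5"}, healthy)  # rejected, rolled back
print(live)
```

Because each change touches one setting, a failed check discards only that change and everything accepted earlier stays in place.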

Cost Optimization

Emphasis: Spend only what you have to
Pillar-specific principles:

Consumption based pricing
Measure efficiency constantly
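
A small calculation shows why these two principles go together. The sketch below compares paying for peak capacity around the clock with paying per unit-hour consumed. The demand profile and the price are made up for illustration, and real pricing models have more moving parts.

```python
from typing import Sequence


def fixed_cost(peak_units: int, price_per_unit_hour: float, hours: int) -> float:
    """Cost of provisioning for peak demand around the clock."""
    return peak_units * price_per_unit_hour * hours


def consumption_cost(hourly_demand: Sequence[int], price_per_unit_hour: float) -> float:
    """Cost of paying only for the units actually used in each hour."""
    return sum(hourly_demand) * price_per_unit_hour


def utilization(hourly_demand: Sequence[int], peak_units: int) -> float:
    """Share of provisioned capacity that did useful work (0.0 to 1.0)."""
    return sum(hourly_demand) / (peak_units * len(hourly_demand))


# Hypothetical day: quiet at night, busy during working hours, moderate in the evening.
demand = [2] * 8 + [10] * 8 + [4] * 8
price = 0.10  # made-up price per unit-hour

print(fixed_cost(max(demand), price, len(demand)))
print(consumption_cost(demand, price))
print(utilization(demand, max(demand)))
```

The utilization figure is the efficiency measurement: the lower it is, the more a fixed fleet is paying for capacity nobody uses.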

Reliability

Emphasis: Will this system work consistently & recover quickly?
Pillar-specific principles:

Recover from issues automatically
Scale horizontally first for resilience
Reduce idle resources
Manage change through automation
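
One common way to "recover from issues automatically" is to retry transient failures with exponential backoff. The sketch below is a generic illustration, not something prescribed by the whitepaper; `call_with_retry`, its defaults, and the flaky dependency are assumptions.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retry(operation: Callable[[], T], max_attempts: int = 4,
                    base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying failures with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(base_delay * 2 ** (attempt - 1))  # base, 2x base, 4x base, ...
    raise ValueError("max_attempts must be at least 1")


calls = {"count": 0}


def flaky() -> str:
    # Hypothetical dependency that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays: List[float] = []
result = call_with_retry(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; in production the default `time.sleep` would be used.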

Performance Efficiency

Emphasis: Remove bottlenecks, reduce waste
Pillar-specific principles:

Reduce latency
Use serverless architectures

Security

Emphasis: Does this system work only as intended?
Pillar-specific principles:

Automate security tasks
Encrypt data in transit and at rest
Know who did what when
Identities have the least privileges required
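
The last two principles reinforce each other: an audit trail of who did what and when is also the evidence for trimming privileges. Here is a minimal sketch of that idea. The identities, action names, and log entries are hypothetical, and a real review would read from the platform's audit service.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Set


class AuditEvent(NamedTuple):
    who: str        # identity that acted
    action: str     # what it did
    when: datetime  # when it happened


def used_actions(events: Iterable[AuditEvent], identity: str) -> Set[str]:
    """Actions the identity actually performed, according to the audit log."""
    return {event.action for event in events if event.who == identity}


def unused_grants(granted: Set[str], events: Iterable[AuditEvent],
                  identity: str) -> Set[str]:
    """Granted actions that never appear in the log: candidates for removal."""
    return granted - used_actions(events, identity)


# Hypothetical log and permission set.
log = [
    AuditEvent("report-job", "reports:read", datetime(2020, 10, 1, 9, 0)),
    AuditEvent("report-job", "reports:write", datetime(2020, 10, 1, 9, 5)),
    AuditEvent("admin", "users:delete", datetime(2020, 10, 2, 14, 30)),
]
granted = {"reports:read", "reports:write", "users:delete"}

print(unused_grants(granted, log, "report-job"))
```

Anything the function returns is a permission the identity holds but has not needed during the logged period, so it is a natural place to start tightening access.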

Operational Excellence In Depth

Operational excellence is the ability to run systems and gain insights into their operations in order to deliver business value, and to continuously improve supporting processes and procedures.

The 3 Phases of Operational Excellence

Prepare-Prioritize: Prioritize to align with business priorities

What is the business goal?
What are the critical pieces needed to meet that goal?
Are there any compliance restrictions or requirements?
What are the dependencies between services?

Design your architecture to support business priorities

Is the design observable?
Are your logs & observations actionable?

Is your workload ready to go live?

Are your processes consistent?
Is operational code properly managed?
Are tests in place?
Have you anticipated failure?
Ensure your workload is actually working

Shit happens. Be ready.

Anticipate planned & unplanned events
Respond in code
Connect observations with 3rd party tools as needed

Evolve

Learn from success & failure
Post-event, have runbooks changed?
Test assumptions
Experiment early and often to find better solutions

Cost Optimization

Use the appropriate resources & configurations
Provision to current needs with an eye to future
Right size to lowest resource that meets needs
Use data to choose purchase options
Optimize by geography
Optimize data transfer
Know how much you're spending and where
Continuously work to maximize value delivered
Align utilization with requirements
Report and validate findings
Evaluate new services for value

Awareness of spend is key to maximizing value
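
"Right size to lowest resource that meets needs" can be expressed as a small selection problem. The sketch below picks the cheapest option that satisfies the requirements. The catalog, its names, and its prices are invented for illustration and do not correspond to real instance types.

```python
from typing import Optional, Sequence, NamedTuple


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    memory_gib: float
    hourly_price: float


def right_size(options: Sequence[InstanceType], needed_vcpus: int,
               needed_memory_gib: float) -> Optional[InstanceType]:
    """Return the cheapest option that meets both requirements, or None if none does."""
    candidates = [
        option for option in options
        if option.vcpus >= needed_vcpus and option.memory_gib >= needed_memory_gib
    ]
    return min(candidates, key=lambda option: option.hourly_price, default=None)


# Hypothetical catalog with made-up prices.
catalog = [
    InstanceType("small", 2, 4.0, 0.05),
    InstanceType("medium", 4, 8.0, 0.10),
    InstanceType("large", 8, 16.0, 0.20),
]

print(right_size(catalog, needed_vcpus=2, needed_memory_gib=6.0))
```

Here the workload needs only 2 vCPUs, but its memory requirement rules out the smallest option, so the middle one is chosen. Measured usage data, not guesses, should supply the two requirement values.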

Reliability

Reliability is the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions.

Scale horizontally first for resilience
Reduce idle resources
Manage change through automation

Limits: Understand default & requested resource limits
Networking: Understand topology, bandwidth & latency
Availability: Ensure your application is ready for business use

Ensure your application is ready for business use

Can users access your application?
Can you deploy without issue?
Can you push issues to planned downtime?
Can your application withstand partial outages?
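
Of the reliability checks above, resource limits are among the easiest to automate. The following is a rough sketch that flags resources approaching their limit. The resource names, numbers, and the 80% threshold are assumptions; actual limits would come from your provider's quota tooling.

```python
from typing import Dict


def limits_at_risk(usage: Dict[str, int], limits: Dict[str, int],
                   threshold: float = 0.8) -> Dict[str, float]:
    """Return resources whose usage is at or above the given share of their limit."""
    at_risk: Dict[str, float] = {}
    for resource, limit in limits.items():
        if limit <= 0:
            continue  # no usable limit recorded; nothing to compare against
        ratio = usage.get(resource, 0) / limit
        if ratio >= threshold:
            at_risk[resource] = ratio
    return at_risk


# Hypothetical numbers for illustration only.
limits = {"instances": 20, "load_balancers": 50, "static_ips": 5}
usage = {"instances": 18, "load_balancers": 10, "static_ips": 5}

print(limits_at_risk(usage, limits))
```

Running a check like this on a schedule turns "understand your limits" into an early warning, so a limit increase can be requested before a scaling event hits the ceiling.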

Performance Efficiency

Selection:

Is this the optimal solution for this workload?
What type of compute suits it best?
Which data store is ideal for this workload?
Does your network design complement compute & data store choices?

Review:

Continuously ensure choices work for your workload
Is infrastructure stored as code?
Are deployments simple & automated?
Can benchmarks be taken automatically?
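
As one possible answer to the last question, a benchmark can be a short script that runs on every build and compares the result with a budget. This sketch uses Python's standard `timeit` module; the `workload` function and the budget value are hypothetical stand-ins.

```python
import timeit
from typing import Callable, Dict


def benchmark(func: Callable[[], object], budget_seconds: float,
              repeat: int = 5, number: int = 1000) -> Dict[str, object]:
    """Time func and report whether its best per-call time stays within the budget."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    best_per_call = min(runs) / number  # the minimum is least affected by background noise
    return {
        "best_seconds": best_per_call,
        "within_budget": best_per_call <= budget_seconds,
    }


def workload() -> int:
    # Hypothetical stand-in for the code path being measured.
    return sum(range(1000))


report = benchmark(workload, budget_seconds=0.01)
print(report)
```

A pipeline step could fail the build when `within_budget` is false, which makes a performance regression visible at review time rather than in production.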

Monitoring:

Use active & passive monitoring where appropriate
Understand the five phases of monitoring (Generation, Aggregation, Real-time Processing, Storage, Analysis)
Create actionable metrics
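
The five phases can be walked through with a toy example. This is a simplified sketch under assumed names and numbers: the samples, the 60-second window, the 300 ms threshold, and the alert wording are all invented, and a real system would use a monitoring service rather than in-memory dictionaries.

```python
from statistics import mean
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix_second, latency_ms)


def aggregate(samples: List[Sample], window_seconds: int = 60) -> Dict[int, float]:
    """Aggregation: average latency per time window, keyed by window start."""
    buckets: Dict[int, List[float]] = {}
    for timestamp, latency in samples:
        start = timestamp - timestamp % window_seconds
        buckets.setdefault(start, []).append(latency)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def alerts(averages: Dict[int, float], threshold_ms: float) -> List[str]:
    """Real-time processing: turn a breached threshold into an actionable message."""
    return [
        f"window {start}: average latency {avg:.0f} ms is above {threshold_ms:.0f} ms"
        for start, avg in averages.items()
        if avg > threshold_ms
    ]


samples = [(0, 100.0), (30, 200.0), (60, 500.0)]  # generation
averages = aggregate(samples)                      # aggregation
messages = alerts(averages, threshold_ms=300.0)    # real-time processing
history: Dict[int, float] = {}
history.update(averages)                           # storage
worst_window = max(history, key=history.get)       # analysis

print(messages)
print(worst_window)
```

The alert names the window and the threshold it crossed, which is what makes the metric actionable: whoever receives it knows where to start looking.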

Trade-offs: You can't have it all
