Sauveer Ketan

Posted on • Originally published at Medium

Building Better on AWS: A Practical Guide to the Well-Architected Framework

AWS is huge: hundreds of services and thousands of features. So many services, so many possibilities, and hence so many ways to mess things up.

A misconfigured web application firewall in front of an AWS workload. That's what let an attacker reach S3 data belonging to roughly 100 million Capital One customers in 2019. What if someone had asked the right questions about their architecture?

What if there were a free tool to systematically review your entire environment and workloads against AWS best practices? To top it off, what if AWS paid you to fix the risks it finds?

How do we build better on AWS? That's where the AWS Well-Architected Framework comes in.

What Actually Is This Framework?

Think of the AWS Well-Architected Framework as your architectural guardrails: a set of battle-tested best practices that AWS has compiled from working with thousands of customers. It's not a rigid rulebook that you have to follow to the letter. Instead, it's more like having a seasoned architect sitting next to you, asking the right questions about your workloads and environment, and giving you feedback and an action plan.

The framework is built on six pillars, each addressing a crucial aspect of your cloud architecture. Each pillar contains design principles, questions, and best practices. Design principles are high-level guidelines; best practices are the concrete recommendations.

Operational Excellence is about deploying, running and monitoring systems to deliver business value. The operational excellence pillar contains best practices for organizing your team, deploying your workload, operating it at scale, and evolving it over time. You will see DevOps and ITIL processes here.

Security covers protecting data, systems, and environments. This isn't just about checking compliance boxes; it's about understanding your entire security spectrum, both preventive and detective. In one recent case, a Palo Alto EC2 server was down and no one knew why. AWS had sent a health notification asking for a stop and start because of degraded hardware, but no one was receiving those emails. They were going to a single person who had left the organization!

Reliability ensures your workload performs its intended function correctly and consistently. This includes the ability to operate and test the workload through its total lifecycle with resilience. The October 2025 AWS outage must have made everyone realize the importance of baking reliability into critical workloads.

Performance Efficiency is the ability to use cloud resources efficiently to meet performance requirements, and to maintain that efficiency as demand changes and technologies evolve. A normally performant system can hit bottlenecks during peak demand if capacity is not planned properly. Pre-warming is one way to handle this.

Cost Optimization means avoiding unnecessary costs. This pillar has saved organizations literal thousands of dollars. We have found EC2 instances from a proof-of-concept that had been running, forgotten, for eight months. We have found DMS instances still running two years after the migration completed. We have found thousands of unattached EBS volumes and unnecessary snapshots. These are just a few examples. As the current wisdom goes, cost should always be treated as a non-functional requirement when architecting workloads.
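
To make the unattached-volume hunt a bit more concrete, here is a minimal sketch in Python. It assumes you have already fetched volume records using the field names of the EC2 DescribeVolumes response (VolumeId, State, Size, CreateTime); the sample data, the 30-day threshold, and the function name are all made up for illustration. Note that CreateTime tells you when a volume was created, not when it was detached, so treat the output as a list of candidates to investigate, not a delete list.

```python
from datetime import datetime, timedelta, timezone


def find_unattached_volumes(volumes, min_age_days=30, now=None):
    """Return unattached volumes created more than min_age_days ago, largest first.

    `volumes` is a list of dicts using the field names of the EC2
    DescribeVolumes response (VolumeId, State, Size, CreateTime).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    stale = [
        v for v in volumes
        # "available" is the state of a volume that is not attached to any instance.
        if v["State"] == "available" and v["CreateTime"] <= cutoff
    ]
    return sorted(stale, key=lambda v: v["Size"], reverse=True)


# Hypothetical sample data. In a real account this list would come from
# something like boto3's ec2.describe_volumes()["Volumes"] (with pagination).
SAMPLE_NOW = datetime(2025, 11, 1, tzinfo=timezone.utc)
SAMPLE_VOLUMES = [
    {"VolumeId": "vol-poc-1", "State": "available", "Size": 500,
     "CreateTime": datetime(2025, 3, 1, tzinfo=timezone.utc)},
    {"VolumeId": "vol-app-1", "State": "in-use", "Size": 100,
     "CreateTime": datetime(2024, 1, 15, tzinfo=timezone.utc)},
    {"VolumeId": "vol-new-1", "State": "available", "Size": 50,
     "CreateTime": datetime(2025, 10, 25, tzinfo=timezone.utc)},
    {"VolumeId": "vol-old-1", "State": "available", "Size": 1000,
     "CreateTime": datetime(2023, 6, 1, tzinfo=timezone.utc)},
]

candidates = find_unattached_volumes(SAMPLE_VOLUMES, now=SAMPLE_NOW)
for v in candidates:
    print(f'{v["VolumeId"]}: {v["Size"]} GiB, unattached')
print("Total GiB to review:", sum(v["Size"] for v in candidates))
```

Sorting by size puts the most expensive candidates at the top, which is usually where the conversation with the owning team should start.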

The Sustainability pillar focuses on minimizing the environmental impact of your cloud workloads, especially energy consumption and efficiency. It's about being smart with resource usage, not just for the planet but for your wallet too. For example, AWS Graviton-based Amazon EC2 instances use up to 60% less energy than comparable EC2 instances for the same performance. According to AWS, they also provide the best price performance for cloud workloads running on Amazon EC2. Over 70,000 customers have used AWS Graviton to build efficient and performant workloads as of 2025.

AWS has provided an excellent mind-map for this, where different entities are clickable and lead to relevant documentation.

Check it out at Map of the AWS Well-Architected Framework.

Enter the AWS Well-Architected Tool

Now, you might be thinking, "This all sounds great, but how do I actually apply this to my architecture?" There are 6 pillars, 57 different questions, and multiple best practices for each of these questions. That's where the AWS Well-Architected Tool (WA Tool) becomes your best friend.

The WA Tool is a free service available in the AWS console that helps you review your workloads against these pillars. It's basically an interactive questionnaire that walks you through each pillar, asking you specific questions about your architecture.

How It Actually Works

Here's what the experience looks like in practice:

You start by defining a workload. This could be anything — a microservice, an entire application, or even a data pipeline. The tool then presents you with a series of questions for each pillar. These aren't yes/no questions; they're thoughtful, sometimes challenging questions that make you really think about your design decisions.

For example, under Security, you might get asked: "How do you protect your network resources?" The tool then offers multiple-choice answers based on best practices, and you select the ones that apply to your workload.

What I really appreciate is that for each question, there's context. The tool explains why the question matters and what the implications are of different approaches. It's educational.

Important Features of WA Tool

Consolidated Reports: You can generate reports not only for individual workloads but also across all of them if you manage several. This is invaluable for getting an organizational view of your cloud architecture health.

Risk Identification: After you complete the review, you can generate a report that shows high-risk issues (HRIs) and medium-risk issues (MRIs). These aren't generic warnings; they're specific to what you told the tool about your architecture. The dashboard also visualizes risks across all your workloads. Seeing those red flags visualized really helps prioritize what to tackle first.
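
If you want the same "worst first" view outside the console, a rough sketch like the one below can help. It assumes workload summaries shaped like the entries the WA Tool API returns when listing workloads (a WorkloadName plus a RiskCounts map keyed by HIGH, MEDIUM, and so on); the workload names and numbers are invented, and the function name is mine.

```python
def rank_workloads_by_risk(workload_summaries):
    """Sort workloads so the most high-risk issues (HRIs) come first.

    Medium-risk issue (MRI) counts break ties between workloads
    that have the same number of HRIs.
    """
    def risk_key(workload):
        counts = workload.get("RiskCounts", {})
        return (counts.get("HIGH", 0), counts.get("MEDIUM", 0))

    return sorted(workload_summaries, key=risk_key, reverse=True)


# Hypothetical summaries; real ones would come from the WA Tool
# (for example boto3's wellarchitected list_workloads call).
SAMPLE_WORKLOADS = [
    {"WorkloadName": "billing-api", "RiskCounts": {"HIGH": 2, "MEDIUM": 5}},
    {"WorkloadName": "data-pipeline", "RiskCounts": {"HIGH": 7, "MEDIUM": 1}},
    {"WorkloadName": "internal-wiki", "RiskCounts": {"MEDIUM": 3}},
    {"WorkloadName": "checkout", "RiskCounts": {"HIGH": 2, "MEDIUM": 9}},
]

ranked = rank_workloads_by_risk(SAMPLE_WORKLOADS)
for w in ranked:
    counts = w["RiskCounts"]
    print(f'{w["WorkloadName"]}: {counts.get("HIGH", 0)} HRIs, '
          f'{counts.get("MEDIUM", 0)} MRIs')
```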

Improvement Plans: The tool doesn't just point out problems; it suggests remediation steps. Each identified risk comes with links to documentation, whitepapers, and specific AWS services that can help address the issue.

Milestones: You can save snapshots of your reviews over time. This is fantastic for tracking improvements. For example, with quarterly reviews you can show executives how you've systematically reduced high-risk items from 12 to 2 over the last quarter. That makes the investment in improvements really tangible.
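
The milestone comparison is simple arithmetic, but it is worth scripting if you report it every quarter. The sketch below assumes you have the risk counts recorded at two milestones as plain dictionaries keyed by risk level. Only the drop in high-risk items from 12 to 2 mirrors the example above; the medium-risk numbers are invented, and the function names are mine.

```python
def risk_delta(earlier, later, levels=("HIGH", "MEDIUM")):
    """Change in issue count per risk level between two milestones."""
    return {level: later.get(level, 0) - earlier.get(level, 0) for level in levels}


def format_trend(earlier, later, levels=("HIGH", "MEDIUM")):
    """Human-readable lines such as 'HIGH: 12 -> 2 (-10)'."""
    delta = risk_delta(earlier, later, levels)
    return [
        f"{level}: {earlier.get(level, 0)} -> {later.get(level, 0)} ({delta[level]:+d})"
        for level in levels
    ]


# Hypothetical risk counts captured at two quarterly milestones.
previous_quarter = {"HIGH": 12, "MEDIUM": 9}
current_quarter = {"HIGH": 2, "MEDIUM": 7}

trend_lines = format_trend(previous_quarter, current_quarter)
for line in trend_lines:
    print(line)
```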

Lenses: Beyond the six pillars covered by the default Well-Architected Framework lens, AWS offers specialized lenses for specific use cases. There's a DevOps lens, a Serverless lens, a SaaS lens, and several others. There are industry-specific lenses such as Healthcare and Financial Services, which are very helpful given the compliance requirements of these industries. Of course, there is now a Generative AI lens too, for the hottest buzzword in the IT industry. Each lens includes relevant questions and best practices for its area.

Custom Lenses: If your organization has specific standards or requirements, you can create custom lenses. For example, enterprises can use this to encode their security policies or compliance requirements directly into the review process. These custom lenses can also be shared with other accounts or your entire AWS organization.

Review Templates: These help in standardization. You can create review templates in AWS WA Tool that contain pre-filled answers for Well-Architected Framework and custom lens best practice questions. Well-Architected review templates reduce the need to manually fill in the same answers for best practices that are common across multiple workloads when performing a Well-Architected review, and they help drive consistency and standardization of best practices across teams and workloads. You can create a review template to answer common best practice questions or create notes, which can be shared with another IAM user or account, or an organization or organizational unit in the same AWS Region. You can define a workload from a review template, which helps scale common best practices and reduce redundancy across your workloads.

Profiles: You can create profiles to provide your business context, and identify goals you'd like to accomplish when performing a Well-Architected review. AWS Well-Architected Tool uses the information gathered from your profile to help you focus on a prioritized list of questions that are relevant to your business during the workload review. Attaching a profile to your workload also helps you see which risks are prioritized for you to address with your improvement plan.

AWS Funding: AWS pays you for fixing your issues! As of this writing, businesses can receive $5,000 in AWS credits to offset the cost of remediating issues identified during an AWS Well-Architected Framework Review. To qualify, you must partner with a certified Well-Architected Partner to conduct the review. Check with your AWS partner or AWS TAM on this.

Real-World Insights

Let me share some practical wisdom from actually using this framework:

Start Small: Don't try to review your entire infrastructure in one sitting. Pick one critical workload and go through the exercise thoroughly. To get the hang of it, start with a non-production environment, or with your newest project, where you can actually implement changes quickly.

The First Review Is Always Humbling: Even systems designed by experienced architects will have gaps. That's okay; that's the point. The framework represents the collective wisdom of thousands of AWS architects. It's supposed to teach you something. Also, the cloud is always evolving and bringing in better ways to do things.

Make It a Team Activity: Running through the questions with your team is way more valuable than having one person fill it out alone. For example, the discussion around "How do you test reliability?" might reveal that the developers thought they had comprehensive testing, but ops was manually verifying deployments. This insight alone can prevent a future incident.

High-Risk Doesn't Always Mean "Drop Everything": Context matters. For example, the tool can flag a development environment for not having multi-region failover. Technically a risk, but for a dev environment? Not worth the complexity. Use your judgment.

The 80/20 Rule Applies: The Pareto principle comes into play here, as it does almost everywhere else. About 20% of the recommendations typically address 80% of your actual risk. Focus on the high-risk items first, especially around security and reliability. You can optimize costs and performance iteratively.
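
Here is one way that ordering could be sketched in Python: high-risk items first, and within the same risk level, security and reliability ahead of everything else. The records mimic the improvement items the WA Tool lists (QuestionTitle, PillarId, Risk), and the pillar identifiers are written as I understand the default lens names them, so treat both as assumptions. The sample items are illustrative only.

```python
RISK_ORDER = {"HIGH": 0, "MEDIUM": 1}
# Pillars not listed here share the lowest priority (2).
PILLAR_ORDER = {"security": 0, "reliability": 1}


def prioritize(improvements):
    """Keep only HIGH/MEDIUM items and order them for a remediation backlog."""
    actionable = [i for i in improvements if i["Risk"] in RISK_ORDER]
    # sorted() is stable, so items with equal keys keep their original order.
    return sorted(
        actionable,
        key=lambda i: (RISK_ORDER[i["Risk"]], PILLAR_ORDER.get(i["PillarId"], 2)),
    )


# Illustrative items only; real ones would come from the improvement plan.
SAMPLE_ITEMS = [
    {"QuestionTitle": "How do you decommission resources?",
     "PillarId": "costOptimization", "Risk": "MEDIUM"},
    {"QuestionTitle": "How do you test reliability?",
     "PillarId": "reliability", "Risk": "HIGH"},
    {"QuestionTitle": "How do you protect your network resources?",
     "PillarId": "security", "Risk": "HIGH"},
    {"QuestionTitle": "How do you select your compute solution?",
     "PillarId": "performance", "Risk": "NONE"},
    {"QuestionTitle": "How do you monitor usage and cost?",
     "PillarId": "costOptimization", "Risk": "HIGH"},
]

plan = prioritize(SAMPLE_ITEMS)
for position, item in enumerate(plan, start=1):
    print(f'{position}. [{item["Risk"]}] {item["PillarId"]}: {item["QuestionTitle"]}')
```

The ordering is deliberately mechanical. As the previous point says, you still need judgment about which flagged items actually matter for a given environment.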

Revisit Regularly: Your architecture isn't static, and neither should your Well-Architected reviews be. It is recommended to conduct quarterly reviews for production workloads. New features get added, traffic patterns change, and AWS releases new services that might better address your needs. For example, security groups can be shared across VPCs and accounts now, making their centralized management possible.

Use It for New Projects: Here's a pro tip — run through the relevant questions before you build something new. Use the Well-Architected questions as a checklist during design phases. It's way easier to build security in from the start than to retrofit it later.

A Real Example

Let me tell you about a project from a while ago. A client had migrated one of their data centers to the cloud three years earlier, but there was no proper cloud team and very little was properly configured. For example, they did not even have default EBS encryption enabled, which takes seconds to turn on at the account level (it is a per-Region setting). Their monthly AWS bill was creeping up, and they couldn't figure out why. I was engaged mainly for cost optimization, with secondary emphasis on everything else.
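
As an aside, the default EBS encryption fix really is that small. The sketch below assumes you pass in one EC2 client per Region (for example, created with boto3.client("ec2", region_name=...)) and uses the two EC2 API calls for reading and enabling the setting. The FakeEc2Client class is a made-up stand-in so the example runs without an AWS account; it is not part of any AWS library.

```python
def ensure_default_ebs_encryption(ec2_clients_by_region):
    """Enable EBS encryption by default wherever it is off.

    Returns the list of Regions where the setting was changed.
    """
    changed = []
    for region, ec2 in ec2_clients_by_region.items():
        status = ec2.get_ebs_encryption_by_default()
        if not status["EbsEncryptionByDefault"]:
            ec2.enable_ebs_encryption_by_default()
            changed.append(region)
    return changed


class FakeEc2Client:
    """Hypothetical offline stand-in that mimics the two calls used above."""

    def __init__(self, enabled):
        self.enabled = enabled

    def get_ebs_encryption_by_default(self):
        return {"EbsEncryptionByDefault": self.enabled}

    def enable_ebs_encryption_by_default(self):
        self.enabled = True
        return {"EbsEncryptionByDefault": True}


# With real credentials, the values would be boto3 EC2 clients instead.
clients = {"us-east-1": FakeEc2Client(False), "eu-west-1": FakeEc2Client(True)}
changed_regions = ensure_default_ebs_encryption(clients)
print("Enabled default encryption in:", changed_regions)
```

Enabling the setting only affects volumes created afterwards; existing unencrypted volumes still have to be handled separately.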

We ran a Well-Architected review focusing heavily on the Cost Optimization pillar first. The review revealed several issues:

  • Multiple instances had been sitting in a stopped state for months or years.
  • Their RDS instances were over-provisioned for average load.
  • Hundreds of EBS volumes were unattached.
  • Backups were being retained forever (20 TB of snapshots), although no compliance requirement called for that.

The Cost Optimization pillar helped us identify these issues systematically. Within two months, we had:

  • Decommissioned unused servers
  • Right-sized EC2 and RDS instances
  • Deleted unattached EBS volumes
  • Modified backup retention policy to 3 weeks
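
To illustrate the retention change, here is a small sketch that flags snapshots older than a three-week window. It assumes snapshot records with the field names of the EC2 DescribeSnapshots response (SnapshotId, StartTime, VolumeSize); the sample data and function name are made up. In practice you would let a backup lifecycle policy enforce retention, and VolumeSize is the size of the source volume, not the storage the incremental snapshot actually consumes, so the total is only a rough upper bound.

```python
from datetime import datetime, timedelta, timezone


def snapshots_past_retention(snapshots, retention_days=21, now=None):
    """Return snapshots that started before the retention window began."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [s for s in snapshots if s["StartTime"] < cutoff]


# Hypothetical sample data shaped like DescribeSnapshots entries.
AUDIT_TIME = datetime(2025, 11, 1, tzinfo=timezone.utc)
SAMPLE_SNAPSHOTS = [
    {"SnapshotId": "snap-a", "VolumeSize": 100,
     "StartTime": datetime(2025, 10, 30, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-b", "VolumeSize": 100,
     "StartTime": datetime(2025, 10, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-c", "VolumeSize": 500,
     "StartTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
]

expired = snapshots_past_retention(SAMPLE_SNAPSHOTS, now=AUDIT_TIME)
for s in expired:
    print(f'{s["SnapshotId"]} is past the 21-day retention window')
print("Source volume GiB behind expired snapshots:",
      sum(s["VolumeSize"] for s in expired))
```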

The result? Their monthly bill showed good improvement. More importantly, systems and a regular rhythm were put in place. During this process, we enabled Compute Optimizer, Trusted Advisor, and Cost Optimization Hub. We set up budget alerts and cost anomaly detection alerts. Teams started receiving notifications that gave them visibility into their bills. A bi-weekly call was set up to review the findings and assign action items.

After this, we reviewed the other pillars and created a comprehensive action plan. The review was tailored so that, instead of individual workloads, we focused on the entire landing zone. Now they have a well-architected landing zone, a lower bill, and fewer incidents.

Getting Started Today

If you want to try this out, here's what you should do:

  1. Log into the AWS Console and search for "Well-Architected Tool"
  2. Click "Define workload" and pick something meaningful but manageable
  3. Set aside 2–3 hours with your team
  4. Go through one or two pillars thoroughly rather than rushing through all six
  5. Focus on understanding the "why" behind each question
  6. Generate your report and prioritize the high-risk items
  7. Create tickets or action items for addressing the gaps
  8. Schedule your next review in 3–6 months

The framework isn't magic — it won't automatically fix your architecture. But it will give you a systematic way to think about your systems, identify blind spots, and continuously improve. And that's exactly what separates good cloud architectures from great ones.

Have you used the Well-Architected Framework? Have you found any surprising issues in your WA reviews? I'd love to hear about it in the comments.
