After several years of working with AWS services across multiple projects and roles, from developer and operator to architect, I rely on two approaches when designing or evaluating a solution: the AWS DevOps model and the Well-Architected Framework. Both are very useful in real work and in all AWS certification exams (especially AWS Solutions Architect Professional and DevOps Engineer Professional).
Development and operation processes differ between organizations, but they typically contain plan, design, development, deployment and operation stages. Leveraging AWS services in each of these stages should be considered seriously to maximize benefits.
Plan stage: In this stage, we often collect and evaluate requirements, then select appropriate AWS services. This requires basic knowledge of and experience with many AWS services across domains, from networking, computation, storage, management and security to developer tools. If the target is a production environment, PaaS should be preferred over IaaS, such as Aurora over MySQL installed on EC2. However, do not limit yourself to one vendor: sometimes a combination with another SaaS product (e.g., Slack) or an open-source solution (e.g., Kubernetes) can create a great solution while avoiding lock-in.
Design stage: In this stage, we often configure services and decide how they communicate with each other. Multi-tier, serverless and microservices are very common architectures that an architect should be familiar with. There are three common types of design task: new solutions, improvement of existing solutions, and migration. For the first, designing a new solution, we start from zero but enjoy flexibility in our decisions; we should determine requirements from multiple views, including security & control, scalability & availability & reliability, and performance & cost. For the second, continuous improvement of an existing solution, we often troubleshoot and determine an improvement strategy based on the Well-Architected Framework. For the last, a migration plan, we may start with a legacy on-premises application, and such tasks are sometimes not easy due to a lot of unmaintained resources. We need to measure current workloads, select migration tools & services, and implement the 6R strategies (rehosting, replatforming, repurchasing, refactoring / re-architecting, retiring and retaining).
Development stage: In this stage, functional requirements and designs are translated into application code and configuration. To develop and debug cloud-based applications, we need knowledge at the application level and must follow the best practices of each service to leverage the power of the cloud. Besides, we will find that the AWS CLI and SDKs (e.g. boto3) are very useful for our work.
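One best practice AWS documents for SDK callers is retrying throttled API calls with exponential backoff and jitter. A minimal sketch in pure Python (the stubbed `flaky` call and error message are illustrative, not a real AWS API):

```python
import random
import time


def with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=20.0):
    """Retry fn with exponential backoff and full jitter, as AWS
    recommends for throttled API calls. This sketch retries on any
    exception; real code would catch only retryable errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))


# Usage: wrap a flaky call; here a stub that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Throttled")
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
```

In real code, the same wrapper would go around a boto3 client call, catching only the service's throttling exceptions.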
Deployment stage: The deployment/release stage should be performed quickly, with minimal downtime and human interaction. AWS provides a set of flexible services to realize DevOps best practices such as Infrastructure as Code (IaC), Continuous Integration (CI) and Continuous Deployment/Delivery (CD). Depending on the task, we can use other tools (e.g. Jenkins) to support complex scenarios. To minimize downtime, we should also be familiar with different deployment strategies, such as rolling updates, blue-green and canary releases.
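To make the canary idea concrete, a release can be expressed as a schedule of traffic weights shifting from the old version to the new one. The step count and starting percentage below are assumptions for illustration, not defaults of any AWS service:

```python
def canary_schedule(steps, start_pct=5):
    """Return a list of (old_version_pct, new_version_pct) pairs that
    start with a small canary slice and end at full cutover."""
    weights = [start_pct]
    # Grow the canary share linearly to 100% over the remaining steps.
    increment = (100 - start_pct) / (steps - 1)
    for i in range(1, steps):
        weights.append(round(start_pct + i * increment))
    return [(100 - w, w) for w in weights]


schedule = canary_schedule(5)
# First step sends only 5% of traffic to the new version; the last sends 100%.
```

Between each step, a real pipeline would watch error-rate and latency alarms and roll back (reverse the weights) if they fire.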
Operation stage: In this stage, we try to reduce operational complexity, optimize cost, and increase monitoring insight. Especially in a large organization, we should design scalable operations with cross-account authentication, authorization, networking, monitoring, encryption and logging. Suppose we have thousands of resources per business project and hundreds of projects across hundreds of accounts; we may be unaware of unnecessary resources and waste a lot of money. We should define a cost-effective pricing model (e.g. on-demand, reserved, spot) based on requirements, cost-reduction suggestions, budget notifications and bill analysis to ensure cost optimization. Besides, in large applications a monitoring system with remediation and health-recovery strategies should be designed carefully.
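The pricing-model decision can be reduced to a simple break-even calculation: a reservation is billed for every hour, so it only pays off above a certain utilization. The hourly rates below are hypothetical, not actual AWS prices:

```python
def breakeven_utilization(on_demand_hourly, reserved_hourly):
    """Fraction of hours an instance must run for a reservation
    (billed for every hour) to be cheaper than paying on-demand."""
    return reserved_hourly / on_demand_hourly


# Hypothetical rates: $0.10/h on-demand vs an effective $0.06/h reserved rate.
threshold = breakeven_utilization(0.10, 0.06)
# Above ~60% utilization, reserving is cheaper; below it, stay on-demand.
```

The same reasoning, fed with real bill data per instance family, is the basis of most reservation-coverage recommendations.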
AWS published the Well-Architected Framework, which describes key concepts, design principles, and architectural best practices for designing and running workloads in the cloud. It is a valuable resource, especially for an AWS solutions architect, for designing architectures, evaluating workloads and identifying risks. The design principles are summarized below.
Operational Excellence Pillar: the ability to support development and run workloads effectively, gain insight into their operations, and to continuously improve supporting processes and procedures to deliver business value.
There are 5 design principles for operational excellence in the cloud.
- Perform operations as code: implement infrastructure, middleware, application configuration and operational procedures as code, and automate their execution with event triggers.
- Make frequent, small, reversible changes: release frequently in small increments (CI/CD) and keep changes reversible (blue-green, canary).
- Refine operations procedures frequently: regularly review, validate and evolve procedures as the workload evolves.
- Anticipate failure: simulate and test failure scenarios (e.g. Chaos Monkey), validate their impacts, and prepare responses (notification, diagnosis and auto-recovery).
- Learn from all operational failures: learn from all operational events and share knowledge across teams.
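"Perform operations as code" means infrastructure is described declaratively and generated, reviewed and versioned like any other code. A minimal sketch that renders a CloudFormation-style template for an S3 bucket (the logical ID is illustrative):

```python
import json


def s3_bucket_template(logical_id, versioned=True):
    """Build a CloudFormation-style template as plain data, so it can
    be code-reviewed, version-controlled and deployed automatically."""
    props = {}
    if versioned:
        props["VersioningConfiguration"] = {"Status": "Enabled"}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": props,
            }
        },
    }


template = s3_bucket_template("ArtifactBucket")
rendered = json.dumps(template, indent=2)  # ready to hand to a deploy tool
```

Generating templates from code like this is essentially what tools such as the AWS CDK do at a much larger scale.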
Security Pillar: the ability to protect data, systems and assets while taking advantage of cloud technologies to improve your security.
There are 7 design principles for security in the cloud.
- Implement a strong identity foundation: implement least privilege and separation of duties (authorization) with centralized identity management; eliminate long-term static credentials.
- Enable traceability: monitor, alert on, and audit actions and changes in real time. Integrate log and metric collection to automatically analyze and take action.
- Apply security at all layers: apply defense in depth, from network, compute, storage and OS to code and application.
- Automate security best practices
- Protect data in transit and at rest: classify data into sensitivity levels and use mechanisms, such as encryption, tokenization, and access control.
- Keep people away from data: reduce or eliminate the need for direct access or manual processing of data.
- Prepare for security events: prepare incident management, investigation processes, and mitigation-recovery actions.
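Least privilege can itself be expressed as code: grant exactly one action on exactly the resources a use case needs. A sketch that builds an IAM policy document for read-only access to a single bucket (the bucket ARN is hypothetical):

```python
def read_only_policy(bucket_arn):
    """IAM policy allowing only s3:GetObject on objects in one bucket --
    the narrowest grant that still serves a read-only consumer."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Object-level permissions apply to the objects, hence the /*.
                "Resource": [f"{bucket_arn}/*"],
            }
        ],
    }


policy = read_only_policy("arn:aws:s3:::example-bucket")
```

Starting from a deny-by-default posture and adding statements like this one is easier to audit than trimming down a broad `s3:*` grant later.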
Reliability Pillar: the ability of a workload to perform its intended function correctly and consistently when it’s expected to. Furthermore, the ability to operate and test the workload through its total lifecycle.
There are 5 design principles for reliability in the cloud.
- Automatically recover from failure: trigger automatic notification, failure tracking and recovery by monitoring KPIs that measure the business value (not technical aspects) of services.
- Test recovery procedures: simulate different failures, then test and validate recovery strategies.
- Scale horizontally to increase aggregate workload availability: distribute requests across multiple and smaller resources to reduce the impact of a single failure.
- Stop guessing capacity: automatically add or remove resources to maintain the optimal level to satisfy demand without over- or under-provisioning.
- Manage change in automation: manage changes with version control, and apply them using automation mechanisms.
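"Stop guessing capacity" in practice usually means target-tracking scaling: size the fleet so a per-instance metric returns to its target. A sketch of that calculation (the metric values and bounds are made-up examples):

```python
import math


def desired_capacity(current_capacity, current_metric, target_metric,
                     min_cap=1, max_cap=100):
    """Target-tracking style scaling: scale the fleet proportionally so
    the per-instance metric (e.g. average CPU %) returns to its target,
    clamped to configured bounds."""
    desired = math.ceil(current_capacity * current_metric / target_metric)
    return max(min_cap, min(max_cap, desired))


# 4 instances at 80% average CPU against a 50% target -> scale out.
new_size = desired_capacity(4, 80, 50)
```

The same formula scales in when the metric drops below target, which is why demand, not a forecast, ends up driving capacity.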
Performance Efficiency Pillar: the ability to use resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
There are 5 design principles for performance efficiency in the cloud.
- Democratize advanced technologies: delegate complex tasks to your cloud vendor, and consider consuming the technology as a service.
- Go global in minutes
- Use serverless architectures: remove the operational burden of managing physical servers and lower transactional costs.
- Experiment more often
- Consider mechanical sympathy: use the technology approach that aligns best with your workload goals
Cost Optimization Pillar: the ability to run systems to deliver business value at the lowest price point.
There are 5 design principles for cost optimization in the cloud.
- Implement cloud financial management
- Adopt a consumption model: pay only for the computing resources that you require and increase or decrease usage depending on business requirements, not by using elaborate forecasting
- Measure overall efficiency: measure the business output of the workload and the costs associated with delivering it
- Stop spending money on undifferentiated heavy lifting
- Analyze and attribute expenditure: identify the usage and cost of systems, which allows transparent attribution of IT costs to individual workload owners.
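Attributing expenditure typically relies on resource tags: roll cost line items up by an owner tag and surface untagged spend separately. A sketch over made-up sample records (the `Project` tag key is an assumption):

```python
from collections import defaultdict


def cost_by_tag(records, tag_key="Project"):
    """Aggregate cost line items by a tag value so each workload owner
    sees their share; untagged spend is reported under its own bucket."""
    totals = defaultdict(float)
    for rec in records:
        owner = rec.get("tags", {}).get(tag_key, "(untagged)")
        totals[owner] += rec["cost"]
    return dict(totals)


sample = [
    {"cost": 12.5, "tags": {"Project": "web"}},
    {"cost": 7.5, "tags": {"Project": "web"}},
    {"cost": 3.0, "tags": {}},
]
totals = cost_by_tag(sample)
```

The "(untagged)" bucket is often the most useful output: it shows exactly how much spend cannot yet be attributed to any owner.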