Khaing Zin

Posted on May 27

Building a Self-Healing and Scalable Infrastructure on AWS

#architecture #aws #devops #infrastructure

When building applications in the cloud, one of the biggest challenges is handling unpredictable traffic. An application may work perfectly fine with a few users, but what happens when thousands of users suddenly visit at the same time?

This is a common real-world problem for startups, e-commerce websites, SaaS platforms, and modern web applications.

Imagine running a simple web application on AWS using only one EC2 instance. On normal days, everything works smoothly. Pages load fast, the server responds quickly, and users have a good experience.

But during a promotion campaign, a viral social media post, or a seasonal sale, traffic suddenly increases. The single EC2 instance starts receiving more requests than it can handle: CPU usage spikes, memory usage climbs, response times grow, and eventually the server may crash completely.

This is exactly where AWS Auto Scaling becomes powerful.

According to the slides, the purpose of Auto Scaling is to automatically create additional EC2 instances whenever system load increases and remove unnecessary instances when traffic decreases. This allows the system to remain stable even during sudden traffic spikes.

Why High Availability Matters

In modern cloud infrastructure, downtime is expensive.

If an application goes offline:

Customers cannot access services
Businesses lose revenue
User trust decreases
Performance becomes unreliable

High availability means designing systems that continue running even when failures happen.

Instead of depending on a single server, AWS encourages distributing workloads across multiple servers and multiple Availability Zones. This creates fault tolerance and improves reliability.

The architecture presented in the slides follows this exact principle.

Understanding the Architecture

The architecture starts with users sending requests to the application.

However, instead of directly connecting users to an EC2 instance, requests first go through an Application Load Balancer (ALB).

The Load Balancer acts like an intelligent traffic controller. Its responsibility is to:

Receive incoming requests
Check which servers are healthy
Distribute traffic evenly across EC2 instances
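To make "distribute traffic evenly" concrete: an ALB target group uses round robin by default, and other routing algorithms (such as least outstanding requests) can be selected instead. The sketch below is a toy model, not ALB's implementation, and the instance IDs are hypothetical. It only shows the core idea of skipping unhealthy targets and rotating through the rest.

```python
from typing import Dict, List


def route_requests(targets: Dict[str, bool], n_requests: int) -> List[str]:
    """Toy round-robin router: assign each request to the next healthy target.

    `targets` maps a (hypothetical) instance ID to its current health status.
    """
    healthy = sorted(tid for tid, ok in targets.items() if ok)
    if not healthy:
        raise RuntimeError("no healthy targets to route to")
    # Rotate through healthy targets only; unhealthy ones never receive traffic.
    return [healthy[i % len(healthy)] for i in range(n_requests)]


print(route_requests({"i-aaa": True, "i-bbb": False, "i-ccc": True}, 4))
# ['i-aaa', 'i-ccc', 'i-aaa', 'i-ccc']
```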

Behind the Load Balancer, EC2 instances are managed by an Auto Scaling Group (ASG).

The Auto Scaling Group adjusts capacity to match demand. When traffic increases, its scaling policies launch new EC2 instances automatically; when traffic decreases, unnecessary instances are terminated to save cost.

This creates an infrastructure that can dynamically adapt to changing workloads without requiring manual intervention.

What makes this architecture powerful is its ability to remain operational even when failures occur. If one instance becomes unhealthy, traffic is redirected to healthy servers automatically.

This is one of the core ideas behind cloud-native infrastructure design.

Implementation

The entire setup process can be divided into several stages:

Creating a Launch Template
Configuring User Data
Creating a Target Group
Creating an Application Load Balancer
Configuring the Auto Scaling Group
Testing the Infrastructure

Let’s explore each step in detail.

Creating a Launch Template

The first step is creating a Launch Template.

A Launch Template works like a reusable blueprint for EC2 instances. Instead of manually configuring every new server, AWS uses the template whenever Auto Scaling launches a new instance.

The template defines:

Which AMI to use
Instance type
Security Groups
Storage settings
Startup scripts

In the slides:

Amazon Linux 2023 is selected as the AMI
t2.micro is used as the instance type
HTTP traffic on port 80 is allowed through the Security Group
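As a sketch, the same template settings could be written as the keyword arguments for boto3's `ec2.create_launch_template` call. The helper name, the template name, the AMI and security group IDs, and the 8 GiB gp3 root volume are assumptions for illustration; only the AMI family and `t2.micro` come from the slides. Note that `UserData` inside a launch template must already be base64-encoded.

```python
from typing import Any, Dict, List


def launch_template_request(
    name: str,
    ami_id: str,
    security_group_ids: List[str],
    user_data_b64: str,
    instance_type: str = "t2.micro",
) -> Dict[str, Any]:
    """Build kwargs for EC2 CreateLaunchTemplate (hypothetical helper)."""
    return {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {
            "ImageId": ami_id,  # an Amazon Linux 2023 AMI ID for your region
            "InstanceType": instance_type,  # t2.micro, as in the slides
            "SecurityGroupIds": security_group_ids,  # must allow HTTP on port 80
            "UserData": user_data_b64,  # startup script, already base64-encoded
            # Assumed storage settings: 8 GiB gp3 root volume.
            "BlockDeviceMappings": [
                {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 8, "VolumeType": "gp3"}}
            ],
        },
    }


# Placeholder IDs; "IyEvYmluL2Jhc2gK" is base64 for "#!/bin/bash\n".
params = launch_template_request(
    "web-template", "ami-0123456789abcdef0", ["sg-0123456789abcdef0"], "IyEvYmluL2Jhc2gK"
)
# With credentials configured: boto3.client("ec2").create_launch_template(**params)
```

To find the current Amazon Linux 2023 AMI ID, AWS publishes public SSM parameters such as `/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64`.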

Allowing port 80 is important because the application will serve web traffic using Nginx.
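The port 80 rule could be expressed as arguments for boto3's `ec2.authorize_security_group_ingress`. This is a hedged sketch with placeholder IDs and a hypothetical helper name. The slides only say HTTP on port 80 is allowed; the optional `source_sg_id` branch is a common hardening step (an assumption here, not part of the original setup) that accepts traffic only from the load balancer's security group instead of the whole internet.

```python
from typing import Any, Dict, Optional


def http_ingress_request(group_id: str, source_sg_id: Optional[str] = None) -> Dict[str, Any]:
    """Build kwargs for AuthorizeSecurityGroupIngress opening TCP port 80."""
    permission: Dict[str, Any] = {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80}
    if source_sg_id is not None:
        # Only members of the ALB's security group may reach the instances.
        permission["UserIdGroupPairs"] = [{"GroupId": source_sg_id}]
    else:
        # Open to any IPv4 address, matching a simple "allow HTTP" rule.
        permission["IpRanges"] = [{"CidrIp": "0.0.0.0/0", "Description": "HTTP"}]
    return {"GroupId": group_id, "IpPermissions": [permission]}
```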

This approach introduces one of the biggest advantages of cloud infrastructure: standardization.

Every EC2 instance launched from the template will have the exact same configuration, reducing human error and making infrastructure predictable.

Automating Server Setup with User Data

One of the most interesting parts of the implementation is the use of User Data.

User Data lets you run a script automatically when a new EC2 instance first boots. In this setup, that script is what turns each instance into an Nginx web server.

This means:

Every new instance automatically becomes a working web server
No manual configuration is required
Scaling becomes fully automated
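The script itself isn't reproduced here. Since the test later shows the default Nginx page, a minimal User Data script could look like the one below, assuming Amazon Linux 2023 (which installs packages with `dnf`) and the `nginx` package from its repositories. Launch templates expect User Data as a base64-encoded string, so the sketch also encodes it; the variable and function names are hypothetical.

```python
import base64

# Hypothetical User Data script. Shell scripts in User Data run as root
# on the instance's first boot. Assumes Amazon Linux 2023 (dnf).
USER_DATA_SCRIPT = """#!/bin/bash
set -euo pipefail
dnf install -y nginx
systemctl enable --now nginx
"""


def encode_user_data(script: str) -> str:
    """Return the script base64-encoded, as launch templates require."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


encoded = encode_user_data(USER_DATA_SCRIPT)
```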

This is a fundamental DevOps concept called Infrastructure Automation.

Instead of manually logging into servers and configuring them one by one, infrastructure becomes programmable and repeatable.

In real-world production environments, User Data scripts are often used to:

Install application dependencies
Pull application code
Configure monitoring tools
Start Docker containers
Connect servers to databases

Automation like this significantly reduces operational overhead.

Creating the Target Group

After the Launch Template is ready, the next step is creating a Target Group.

A Target Group defines where the Load Balancer should forward traffic.

When users access the application:

Requests arrive at the Load Balancer
The Load Balancer checks which targets are healthy
Traffic is distributed to healthy EC2 instances

This creates intelligent traffic routing inside the infrastructure.

Without a Target Group, the Load Balancer would not know where to send incoming requests.
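A hedged sketch of the Target Group as kwargs for boto3's `elbv2.create_target_group`. The helper name, target group name, VPC ID, and health-check path `/` are placeholders. The interval, timeout, and threshold values are intended to mirror the usual console defaults for HTTP targets, but they are assumptions worth checking against the current documentation.

```python
from typing import Any, Dict


def target_group_request(name: str, vpc_id: str, health_check_path: str = "/") -> Dict[str, Any]:
    """Build kwargs for ELBv2 CreateTargetGroup (HTTP on port 80, instance targets)."""
    if not 1 <= len(name) <= 32:
        raise ValueError("target group names must be 1-32 characters")
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": 80,  # Nginx listens on port 80 on each instance
        "VpcId": vpc_id,
        "TargetType": "instance",  # targets are EC2 instances
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": health_check_path,
        "HealthCheckIntervalSeconds": 30,
        "HealthCheckTimeoutSeconds": 5,
        "HealthyThresholdCount": 5,  # consecutive successes to become healthy
        "UnhealthyThresholdCount": 2,  # consecutive failures to become unhealthy
        "Matcher": {"HttpCode": "200"},
    }
```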

Building the Load Balancer

Next comes the Application Load Balancer (ALB).

The ALB acts as the public entry point for the application.

During setup:

The default VPC is selected
Subnets in at least two Availability Zones are selected (an ALB requires this)
An HTTP listener on port 80 is added
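These setup choices could be sketched as kwargs for two boto3 `elbv2` calls, `create_load_balancer` and `create_listener`. The helper names, the load balancer name, and the subnet and security group IDs are placeholders, and the ARNs would come from earlier API responses. The two-subnet check reflects the ALB requirement of subnets in at least two Availability Zones; this sketch only counts subnets and assumes they sit in different zones.

```python
from typing import Any, Dict, List


def load_balancer_request(name: str, subnet_ids: List[str], alb_sg_id: str) -> Dict[str, Any]:
    """Build kwargs for ELBv2 CreateLoadBalancer (internet-facing ALB)."""
    if len(subnet_ids) < 2:
        # An ALB needs subnets in at least two AZs; we assume these differ.
        raise ValueError("provide subnets from at least two Availability Zones")
    return {
        "Name": name,
        "Subnets": subnet_ids,
        "SecurityGroups": [alb_sg_id],
        "Scheme": "internet-facing",
        "Type": "application",
        "IpAddressType": "ipv4",
    }


def http_listener_request(load_balancer_arn: str, target_group_arn: str) -> Dict[str, Any]:
    """Build kwargs for ELBv2 CreateListener: HTTP on port 80, forwarding to the target group."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "HTTP",
        "Port": 80,
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }
```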

One of the most important concepts introduced here is the Multi-AZ setup.

AWS Regions contain multiple Availability Zones (AZs). Each AZ is an isolated location made up of one or more physically separate data centers.

If one Availability Zone experiences failure:

The application continues running from other zones
Downtime is minimized
Infrastructure becomes fault tolerant

This is a key principle behind designing resilient cloud systems.

Instead of placing everything inside one server or one data center, workloads are distributed across multiple zones.
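A toy model of why spreading matters: the functions below place instances round-robin across zones (a simplification of how an Auto Scaling Group balances instances across its enabled Availability Zones) and count how many survive if one zone fails. The zone names and function names are illustrative. Note that with the desired capacity of 1 used in the slides there is only one instance, so a zone failure still causes an interruption until a replacement launches in a healthy zone.

```python
from typing import Dict, List


def spread_across_azs(instance_count: int, azs: List[str]) -> Dict[str, int]:
    """Place instances round-robin across AZs (a simplification of ASG balancing)."""
    counts = {az: 0 for az in azs}
    for i in range(instance_count):
        counts[azs[i % len(azs)]] += 1
    return counts


def surviving_instances(counts: Dict[str, int], failed_az: str) -> int:
    """Instances still running after one whole AZ goes down."""
    return sum(n for az, n in counts.items() if az != failed_az)


layout = spread_across_azs(4, ["us-east-1a", "us-east-1b"])
print(layout, surviving_instances(layout, "us-east-1a"))
# {'us-east-1a': 2, 'us-east-1b': 2} 2
```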

Configuring the Auto Scaling Group

Once the Load Balancer is ready, the Auto Scaling Group can be configured.

In the slides:

Minimum capacity is set to 1
Desired capacity is set to 1
Maximum capacity is set to 2

This means:

At least one instance will always remain active
The infrastructure initially starts with one server
AWS can automatically add a second instance (up to the maximum of 2) if demand increases
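The capacity settings above could be sketched as kwargs for boto3's `autoscaling.create_auto_scaling_group`. The helper name, group and template names, subnet IDs, and target group ARN are placeholders. Two settings are assumptions not stated in the slides: `HealthCheckType="ELB"`, which lets the group replace instances that the load balancer reports as unhealthy (the default, `EC2`, relies only on EC2 status checks), and a 300-second grace period so new instances have time to finish their User Data script before being judged.

```python
from typing import Any, Dict, List


def asg_request(
    name: str,
    launch_template_name: str,
    subnet_ids: List[str],
    target_group_arn: str,
    min_size: int = 1,
    desired: int = 1,
    max_size: int = 2,
) -> Dict[str, Any]:
    """Build kwargs for CreateAutoScalingGroup (capacity defaults from the slides)."""
    if not min_size <= desired <= max_size:
        raise ValueError("capacity must satisfy min <= desired <= max")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": launch_template_name, "Version": "$Latest"},
        "MinSize": min_size,
        "DesiredCapacity": desired,
        "MaxSize": max_size,
        "VPCZoneIdentifier": ",".join(subnet_ids),  # subnets in several AZs
        "TargetGroupARNs": [target_group_arn],  # instances register with the ALB's target group
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 300,
    }
```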

Using scaling policies backed by Amazon CloudWatch metrics, the Auto Scaling Group can react to signals such as:

CPU utilization
Network traffic
Request count

When a metric crosses the threshold or target defined in a scaling policy, the Auto Scaling Group launches additional EC2 instances, and it removes them again when demand falls.

This allows the infrastructure to grow dynamically depending on workload.
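Which scaling policy was used isn't stated, so as an assumption, here is a target tracking policy that aims to keep average CPU near 50%, expressed as kwargs for boto3's `autoscaling.put_scaling_policy`. Other predefined metric types cover the rest of the list above (`ASGAverageNetworkIn`/`ASGAverageNetworkOut`, and `ALBRequestCountPerTarget`, which also needs a `ResourceLabel`). The `estimate_capacity` helper is a deliberately simplified proportional approximation of the idea behind target tracking, not AWS's exact algorithm; both function names are hypothetical.

```python
import math
from typing import Any, Dict


def cpu_target_tracking_request(asg_name: str, target_cpu: float = 50.0) -> Dict[str, Any]:
    """Build kwargs for PutScalingPolicy: keep average CPU near target_cpu."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
            "TargetValue": target_cpu,
        },
    }


def estimate_capacity(current: int, avg_cpu: float, target_cpu: float, min_size: int, max_size: int) -> int:
    """Rough proportional estimate of the instance count needed to hit target_cpu."""
    needed = math.ceil(current * avg_cpu / target_cpu)
    return max(min_size, min(max_size, needed))  # never leave the min..max range


print(estimate_capacity(1, 90.0, 50.0, 1, 2))  # 2: one busy instance -> scale out
print(estimate_capacity(2, 20.0, 50.0, 1, 2))  # 1: two idle instances -> scale in
```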

Self-Healing Infrastructure with Health Checks

Another critical feature highlighted in the slides is Health Checks.

The Load Balancer periodically sends health check requests to each registered instance.

If an instance fails:

Traffic is no longer routed to it
The Auto Scaling Group detects the unhealthy state (when it is configured to use ELB health checks)
A replacement instance is launched automatically

This creates a self-healing system.
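The health-check loop can be modelled as a small state machine. This is a simplified sketch, not AWS's implementation: it assumes a target flips to unhealthy after `unhealthy_threshold` consecutive failed checks and back to healthy after `healthy_threshold` consecutive successes (the values 2 and 5 are assumed defaults), and that the Auto Scaling Group replaces unhealthy instances until it is back at its desired capacity. The class, function, and instance IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class TargetHealth:
    """Tracks one target's health from consecutive check results."""

    healthy_threshold: int = 5
    unhealthy_threshold: int = 2
    healthy: bool = True
    streak: int = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        if check_passed == self.healthy:
            self.streak = 0  # result agrees with the current state: reset
        else:
            self.streak += 1
            limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self.streak >= limit:
                self.healthy = not self.healthy
                self.streak = 0
        return self.healthy


def replace_unhealthy(fleet: Dict[str, bool], desired: int, new_ids: Iterator[str]) -> Dict[str, bool]:
    """Drop unhealthy instances and launch replacements up to `desired`."""
    survivors = {iid: True for iid, ok in fleet.items() if ok}
    while len(survivors) < desired:
        survivors[next(new_ids)] = True  # assume the new instance passes its checks
    return survivors
```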

One of the biggest advantages of cloud computing is that infrastructure can recover automatically without human intervention.

In traditional infrastructure environments, engineers often had to manually replace failed servers. Cloud-native systems automate this entire process.

Testing the Infrastructure

After completing the setup, the Load Balancer provides a DNS endpoint.

Opening the DNS address in a browser displays the default Nginx page, confirming that the web server is running successfully.

The most interesting part comes during load testing.

When traffic increases:

CPU utilization rises
Auto Scaling policies trigger
New EC2 instances launch automatically
Traffic gets distributed across multiple servers
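For the load-testing step, a minimal concurrent load generator using only the Python standard library might look like this. It is a hypothetical helper, not the tool used in the original test; the URL would be the ALB's DNS name. Serving a static Nginx page is cheap, so this alone may not push CPU high enough to trigger scaling on a small instance; a heavier request path or a dedicated load-testing tool may be needed.

```python
import concurrent.futures
import urllib.error
import urllib.request
from collections import Counter


def fire_requests(url: str, total: int, concurrency: int = 10, timeout: float = 5.0) -> Counter:
    """Send `total` GET requests using `concurrency` threads; tally the outcomes."""

    def one(_: int) -> str:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()
                return str(resp.status)
        except urllib.error.HTTPError as err:  # 4xx/5xx, e.g. 502/503 from the load balancer
            return str(err.code)
        except OSError:  # refused connection, DNS failure, timeout (URLError is an OSError)
            return "error"

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(one, range(total)))


# Usage with a placeholder DNS name:
# print(fire_requests("http://my-alb-123456.us-east-1.elb.amazonaws.com/", 500, 50))
```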

This demonstrates the true power of Auto Scaling.

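The infrastructure adapts to workload changes automatically, without manual intervention, although each new instance still needs a few minutes to boot and pass its health checks.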

Real-World Impact of Auto Scaling

Auto Scaling is used everywhere in modern cloud systems.

E-Commerce Platforms

Traffic spikes heavily during flash sales and seasonal promotions.

Streaming Services

User activity fluctuates depending on time and trending content.

Gaming Platforms

Player traffic can suddenly increase during events or updates.

SaaS Applications

Businesses require consistent uptime and performance for customers worldwide.

Without scalable infrastructure, these platforms would struggle during high traffic periods.

Final Thoughts

This implementation demonstrates one of the most important concepts in cloud computing: building scalable and resilient infrastructure.

By combining:

EC2
Application Load Balancer
Auto Scaling Group
Multi-AZ deployment

AWS enables applications to automatically adapt to changing workloads while maintaining high availability.

Instead of manually managing servers, engineers can focus on building reliable systems that scale automatically.

For anyone learning Cloud Computing, DevOps, or Solutions Architecture, understanding Auto Scaling is an essential foundational skill, because modern applications are expected to stay fast, available, and reliable regardless of traffic conditions.

DEV Community