Sheersh Sinha
My DevOps Journey: Part 11 - Building Scalable and Cost-Effective AWS Infrastructure (Real-World Problem DevOps Solution)

Every DevOps journey reaches that point where the system starts talking back to you - not in words, but in CPU spikes, slow response times, and growing user traffic.

For me, that moment came one Friday afternoon.

My personal project, hosted on a single EC2 instance, suddenly started slowing down during peak hours.

Logs looked fine. Network was stable. But users were waiting.

That's when I realized - it's not about fixing bugs anymore; it's about designing for growth.

The Real Problem - When One Server Isn't Enough

My single EC2 instance was doing everything:

  • Serving frontend + backend
  • Handling API requests
  • Logging activities
  • Running cron jobs

It was the perfect "one-man army" - until it wasn't.

Then came the symptoms:

  • Latency spiked above 700ms
  • CPU utilization touched 90%
  • One deployment crash took down the entire site

That's when I understood - in DevOps, resilience > perfection.

I didn't need a stronger machine.

I needed multiple lightweight servers working together, scaling up and down as traffic demanded.

Step 1 - Architecting a Scalable System

I started small - with what a real company would do in this scenario:

"Scale horizontally, not vertically."

Here's the system design I implemented (diagram omitted): an Application Load Balancer routing traffic to an Auto Scaling Group of EC2 instances, with CloudWatch keeping watch over everything.

This architecture allowed:

  • Traffic distribution using an Application Load Balancer (ALB).
  • Automated scaling based on CPU or memory usage via Auto Scaling Groups (ASG).
  • Health monitoring through CloudWatch Metrics.

Step 2 - Implementation Journey & Challenges

Creating a Launch Template

A launch template is your blueprint for EC2 - defining AMI, instance type, and startup configuration.

It's what ensures every new instance behaves exactly as you intend.

Lesson learned: Templates remove the human error of "I forgot to install X on that instance."
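For reference, here's roughly how the template can be created from the CLI - the template name, AMI ID, key pair, and security group below are placeholders, not my actual values:

```bash
# Create a launch template that every ASG instance will be built from.
# webapp-template, ami-0abcdef1234567890, my-key, and sg-... are placeholders.
aws ec2 create-launch-template \
  --launch-template-name webapp-template \
  --launch-template-data '{
    "ImageId": "ami-0abcdef1234567890",
    "InstanceType": "t2.micro",
    "KeyName": "my-key",
    "SecurityGroupIds": ["sg-0123456789abcdef0"]
  }'
```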

Creating a Load Balancer

Once the ALB was configured (internet-facing on port 80), I connected it to my Target Group where all EC2 instances register automatically.

AWS started routing requests evenly.

No more single-server bottlenecks.

No more downtime during deployment.

"If one instance fails, the others don't even notice - that's the DevOps way."

Setting Up Auto Scaling

Then came automation.

Instead of guessing how many servers I'd need, I let AWS decide.

Scaling Policy:

  • Minimum Instances → 1
  • Desired → 2
  • Maximum → 4
  • Scale out when CPU > 70%
```bash
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name webapp-asg \
  --policy-name scale-out \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":70.0}'
```
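For completeness, here's a sketch of creating the Auto Scaling Group itself - the thing the policy above attaches to. Names, ARNs, and subnet IDs are placeholders for my setup:

```bash
# Create the ASG from the launch template, spanning two subnets,
# registered with the ALB target group, using ELB health checks.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name webapp-asg \
  --launch-template LaunchTemplateName=webapp-template,Version='$Latest' \
  --min-size 1 --desired-capacity 2 --max-size 4 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222" \
  --target-group-arns <target-group-arn> \
  --health-check-type ELB \
  --health-check-grace-period 300
```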

Now, new instances appeared automatically during load peaks and shut down when traffic cooled off.

That's the moment I realized - true DevOps is not manual control, it's intelligent automation.

Step 3 - The Cost-Effectiveness Perspective

I learned early that "scale" doesn't mean "spend more."

To stay cost-efficient:

  • Used t2.micro (free-tier) for tests.
  • Configured scaling cooldowns to prevent unnecessary spin-ups.
  • Added CloudWatch alarms to stop idle EC2s (sketched below).
  • Used Spot Instances for low-priority workloads.
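Here's a rough sketch of that idle-shutdown alarm - the instance ID and region are placeholders, and the thresholds are just the values I experimented with:

```bash
# Stop an instance if its average CPU stays under 5% for two hours.
# i-0123456789abcdef0 and us-east-1 are placeholders.
aws cloudwatch put-metric-alarm \
  --alarm-name idle-ec2-stop \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --statistic Average \
  --period 3600 \
  --evaluation-periods 2 \
  --threshold 5 \
  --comparison-operator LessThanThreshold \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --alarm-actions arn:aws:automate:us-east-1:ec2:stop
```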

Result:

System scaled automatically but stayed 40% cheaper than running fixed instances.

That's how you build a production-like system on a student budget.

Step 4 - Remedy & Prevention

When things go wrong, DevOps thinking kicks in.

Here's what I learned to always keep ready:

| Problem | Remedy | Preventive Measure |
| --- | --- | --- |
| High CPU utilization | Scale horizontally using ASG | Add CloudWatch alarms for CPU > 70% |
| Instance failure | Health checks via ALB | Enable ELB-based instance replacement |
| Configuration drift | Use Launch Templates | Version control templates via Git |
| Cost surge | Review billing dashboard weekly | Set AWS Budgets & alerts |
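The last row is easy to automate too. A minimal sketch of a monthly budget with an email alert - the account ID, amount, and address are placeholders:

```bash
# Create a $10/month cost budget and send an email at 80% of it.
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-cap",
    "BudgetType": "COST",
    "TimeUnit": "MONTHLY",
    "BudgetLimit": {"Amount": "10", "Unit": "USD"}
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "me@example.com"}]
  }]'
```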

Step 5 - Real-World Architecture (VPC Layer Included)

Here's what my updated system design looked like, now with a full VPC layer (diagram omitted):

Each subnet, route, and NAT was designed to keep public entry restricted and internal services isolated - something real organizations practice every day.
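A small taste of what that isolation looks like in practice - the private subnet's route table sends all outbound traffic through a NAT gateway instead of exposing the instances directly (IDs are placeholders; much more on this in Day 12):

```bash
# Private instances reach the internet only via the NAT gateway.
# rtb-... and nat-... are placeholder IDs.
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0123456789abcdef0
```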

Step 6 - My Vision as a DevOps Engineer

This experience shifted my perspective forever.

I stopped seeing AWS as a set of services - and started seeing it as an ecosystem.

Every design decision - from choosing an instance type to writing a cron job - impacts cost, scalability, and security.

"DevOps isn't about tools; it's about foresight - knowing what breaks tomorrow, and fixing it today."

This is what I strive for in my journey - not just deploying applications, but designing systems that can handle the unknown.

What's Next (Day 12 - AWS Networking & VPC Deep Dive)

Now that I've learned how to distribute and scale workloads with load balancers and auto scaling groups, it's time to look under the hood - the network layer that powers it all.

In Day 12, I'll explore:

  • Virtual Private Cloud (VPC) - the foundation of your private AWS network
  • Route Tables - directing traffic inside your cloud
  • Security Groups and Network Access Control Lists (NACLs) - understanding inbound/outbound control
  • Subnets (Public and Private) - isolating workloads securely
  • Internet Gateway (IGW) - bridging private cloud to the public internet
  • NAT Gateway - controlled outbound access for private instances
  • Elastic IP (EIP) - fixed IPs for stable external communication

"If Load Balancing is about distributing requests, Networking is about defining where those requests can even go."
