DEV Community

Deepanshu

Building a Complete AWS VPC with Load Balancer: A Step-by-Step Journey

A comprehensive guide to creating a production-ready AWS infrastructure, including all the roadblocks I hit and how I solved them

Introduction

Hey there, fellow cloud explorer! Have you ever tried setting up AWS infrastructure and found yourself scratching your head, wondering, “Why isn’t this working?!” I’ve been there too! In this down-to-earth guide, I’ll walk you through building a full AWS Virtual Private Cloud (VPC) with an Application Load Balancer (ALB), EC2 instances, and more, while sharing every stumble I made along the way and how I fixed it.

This isn’t your typical polished tutorial. It’s the real deal, complete with those “uh-oh” moments and their practical solutions. By the end, you’ll have a production-ready setup and the know-how to troubleshoot like a pro. Let’s dive in!

What We’re Building

Imagine we’re setting up “Dublin Delights,” a small online store selling Irish goodies. Our final architecture will include:

  • A VPC with public and private subnets
  • An Internet Gateway for public access
  • A NAT Gateway for private subnet internet access
  • An Application Load Balancer to distribute traffic
  • 3 EC2 instances (1 public, 2 private)
  • Tight security groups and routing
  • Working web servers behind the load balancer

Ready? Let’s get started!

Phase 1: Setting Up the VPC Foundation

Creating the VPC

First, let’s build the foundation: our Virtual Private Cloud.

  1. Head to the AWS Console > Search “VPC” > Click “VPC” > “Create VPC”.
  2. Choose “VPC and more” (this sets up everything in one go).
  3. Configuration:
    • Name: dublin-delights-vpc
    • IPv4 CIDR: 10.0.0.0/16 (gives us 65,536 IP addresses)
    • Number of Availability Zones (AZs): 2 (e.g., us-east-1a, us-east-1b)
    • Public subnets: 1 (e.g., 10.0.1.0/24)
    • Private subnets: 2 (e.g., 10.0.2.0/24, 10.0.3.0/24)
    • NAT gateways: 1 (in the public subnet)
    • VPC endpoints: None
  4. Hit “Create VPC” and wait a minute for it to spin up.
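If you prefer the AWS CLI, here is a minimal sketch of the VPC and subnet part of the wizard. It only covers the VPC and subnets (the wizard also creates the Internet Gateway, NAT Gateway and route tables). The function names and the AWS_CLI override (set AWS_CLI=echo to print the commands instead of running them) are my own hypothetical helpers, not part of the AWS CLI.

```bash
#!/usr/bin/env bash
# Sketch: CLI equivalent of the VPC + subnet part of "VPC and more".
# AWS_CLI is a hypothetical override: AWS_CLI=echo prints commands instead of running them.

create_vpc() {
  # $1 = Name tag; prints the new VPC ID
  "${AWS_CLI:-aws}" ec2 create-vpc \
    --cidr-block 10.0.0.0/16 \
    --tag-specifications "ResourceType=vpc,Tags=[{Key=Name,Value=$1}]" \
    --query Vpc.VpcId --output text
}

create_subnet() {
  # $1 = VPC ID, $2 = CIDR, $3 = Availability Zone, $4 = Name tag; prints the subnet ID
  "${AWS_CLI:-aws}" ec2 create-subnet \
    --vpc-id "$1" --cidr-block "$2" --availability-zone "$3" \
    --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$4}]" \
    --query Subnet.SubnetId --output text
}

# Example (AZ placement here is an assumption):
#   vpc_id=$(create_vpc dublin-delights-vpc)
#   create_subnet "$vpc_id" 10.0.1.0/24 us-east-1a public-subnet-1
#   create_subnet "$vpc_id" 10.0.2.0/24 us-east-1a private-subnet-1
#   create_subnet "$vpc_id" 10.0.3.0/24 us-east-1b private-subnet-2
```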

What This Creates:

  • A VPC with the 10.0.0.0/16 CIDR block
  • 1 public subnet (10.0.1.0/24)
  • 2 private subnets (10.0.2.0/24, 10.0.3.0/24)
  • An Internet Gateway
  • A NAT Gateway in the public subnet
  • Route tables (we’ll tweak these later)
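To see where the 65,536 figure comes from, and what a /24 actually leaves you, here is a small bash sketch of the address arithmetic. The 5-address deduction reflects AWS reserving the first four and the last address of every subnet; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Address arithmetic behind the CIDR choices above.

cidr_size() {
  # Total addresses in a prefix of length $1: 2^(32 - prefix)
  echo $(( 1 << (32 - $1) ))
}

aws_usable() {
  # AWS reserves 5 addresses per subnet: network address, VPC router,
  # DNS, one reserved for future use, and the last (broadcast) address.
  echo $(( $(cidr_size "$1") - 5 ))
}

echo "VPC /16: $(cidr_size 16) addresses"
echo "Subnet /24: $(cidr_size 24) addresses, $(aws_usable 24) usable for instances"
```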
The First Problem: DNS Resolution Woes

After creating the VPC, I launched instances but hit a snag: DNS wasn’t working! Package updates failed with “Temporary failure resolving” errors. Here’s the fix:

  1. VPC Console > “Your VPCs” > Select dublin-delights-vpc.
  2. “Actions” > “Edit VPC settings”.
  3. Enable “DNS hostnames” and “DNS resolution” (both checkboxes).
  4. Save changes.

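Why This Matters: With DNS resolution disabled, instances can’t resolve domain names through the Amazon-provided DNS server, so commands like sudo apt-get update fail. DNS hostnames additionally gives instances DNS names, and AWS only lets you enable it while DNS resolution is on. This little toggle saved my day!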
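The same toggles from the CLI, as a minimal sketch (enable_vpc_dns, show_vpc_dns and the AWS_CLI dry-run override are hypothetical helper names). AWS requires a separate modify-vpc-attribute call per attribute, and DNS support has to be on before DNS hostnames can be enabled, so the order matters.

```bash
#!/usr/bin/env bash
# Sketch: enable both VPC DNS attributes, then read them back.
# AWS_CLI is a hypothetical override (AWS_CLI=echo for a dry run).

enable_vpc_dns() {
  local vpc_id="$1"
  # One attribute per call; DNS support first, then DNS hostnames.
  "${AWS_CLI:-aws}" ec2 modify-vpc-attribute --vpc-id "$vpc_id" --enable-dns-support '{"Value":true}'
  "${AWS_CLI:-aws}" ec2 modify-vpc-attribute --vpc-id "$vpc_id" --enable-dns-hostnames '{"Value":true}'
}

show_vpc_dns() {
  local vpc_id="$1" attr
  for attr in enableDnsSupport enableDnsHostnames; do
    "${AWS_CLI:-aws}" ec2 describe-vpc-attribute --vpc-id "$vpc_id" --attribute "$attr"
  done
}

# On an instance, check resolution directly:
#   getent hosts archive.ubuntu.com || echo "DNS resolution still failing"
```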

Phase 2: Launching EC2 Instances

Creating the Instances

Let’s launch our three servers:

  • Public Instance (public-web-1):

    1. EC2 Console > “Launch Instance”.
    2. Configuration:
      • Name: public-web-1
      • AMI: Ubuntu Server 22.04 LTS
      • Instance type: t2.micro (Free Tier eligible)
      • Key pair: Select or create devops-key (download .pem file)
      • VPC: dublin-delights-vpc
      • Subnet: public-subnet-1
      • Auto-assign Public IP: Enable
      • Security group: Create new “public-sg”
    3. Launch and wait for “Running”.
  • Private Instances (private-web-2, private-web-3):

    • Repeat the process, but:
    • Name: private-web-2 and private-web-3
    • Subnet: private-subnet-1 and private-subnet-2
    • Auto-assign Public IP: Disable
    • Security group: Create new “private-sg”
    • Launch both.
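A scripted version of the launches, assuming you already have the Ubuntu AMI, subnet and security group IDs at hand. launch_instance and the AWS_CLI override are hypothetical helpers; devops-key is the key pair from the steps above.

```bash
#!/usr/bin/env bash
# Sketch: launch one t2.micro into a given subnet, public or private.
# AWS_CLI is a hypothetical override (AWS_CLI=echo for a dry run).

launch_instance() {
  # $1 = Name tag, $2 = AMI ID, $3 = subnet ID, $4 = security group ID,
  # $5 = "public" (auto-assign a public IP) or anything else (no public IP)
  local name="$1" ami="$2" subnet="$3" sg="$4" visibility="$5"
  local ip_flag="--no-associate-public-ip-address"
  [[ "$visibility" == public ]] && ip_flag="--associate-public-ip-address"

  "${AWS_CLI:-aws}" ec2 run-instances \
    --image-id "$ami" \
    --instance-type t2.micro \
    --key-name devops-key \
    --subnet-id "$subnet" \
    --security-group-ids "$sg" \
    "$ip_flag" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$name}]" \
    --query 'Instances[0].InstanceId' --output text
}

# Example (variable names hypothetical):
#   launch_instance public-web-1  "$UBUNTU_AMI" "$PUBLIC_SUBNET"    "$PUBLIC_SG"  public
#   launch_instance private-web-2 "$UBUNTU_AMI" "$PRIVATE_SUBNET_1" "$PRIVATE_SG" private
```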
The Second Problem: Connection Timeouts

I couldn’t SSH into my instances; I got “ssh: connect to host 52.23.157.242 port 22: Connection timed out.” Ouch! The culprit? Security groups.

  • Solution - Fix Security Groups:
    • public-sg:
      • Inbound: SSH (22) from “My IP”, HTTP (80) from 0.0.0.0/0
      • Outbound: All traffic to 0.0.0.0/0
    • private-sg:
      • Inbound: SSH (22) from public-sg, HTTP (80) from the ALB security group (added later)
      • Outbound: All traffic to 0.0.0.0/0
    • Apply and test SSH: ssh -i devops-key.pem ubuntu@<public-ip>
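Here is a sketch of those inbound rules using authorize-security-group-ingress. The helper names, the AWS_CLI override and the variable names in the example wiring are hypothetical. The point worth noticing is that private-sg references other security groups instead of IP ranges, so the rules keep working when instance IPs change.

```bash
#!/usr/bin/env bash
# Sketch: the inbound rules above via the CLI.
# AWS_CLI is a hypothetical override (AWS_CLI=echo for a dry run).

allow_ssh_from_ip() {
  # $1 = security group receiving the rule, $2 = your public IP (only that /32 is allowed)
  "${AWS_CLI:-aws}" ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port 22 --cidr "$2/32"
}

allow_port_from_group() {
  # $1 = security group receiving the rule, $2 = port, $3 = source security group
  # A group reference means "any network interface that has that group attached".
  "${AWS_CLI:-aws}" ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port "$2" --source-group "$3"
}

# Example wiring (IDs hypothetical):
#   allow_ssh_from_ip     "$PUBLIC_SG"  "$(curl -s https://checkip.amazonaws.com)"
#   allow_port_from_group "$PRIVATE_SG" 22 "$PUBLIC_SG"
#   allow_port_from_group "$PRIVATE_SG" 80 "$ALB_SG"
# ssh ignores private keys that other users can read, so lock the key down first:
#   chmod 400 devops-key.pem
#   ssh -i devops-key.pem ubuntu@<public-ip>
```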
The Third Problem: Session Manager Not Working

Next, I tried AWS Session Manager and got “SSM Agent is not online.” The route table was the issue: my “public” subnet wasn’t really public!

  • Solution - Fix Route Tables:
    1. VPC Console > “Route Tables” > Find the one for public-subnet-1.
    2. “Routes” tab > “Edit routes” > Add: Destination 0.0.0.0/0, Target “Internet Gateway” (dublin-igw).
    3. Save.
  • Result: SSH, Session Manager, and internet access worked!
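The route fix from the CLI, sketched under the same assumptions (hypothetical helper names, the AWS_CLI dry-run override, and your own rtb-, igw- and subnet- IDs):

```bash
#!/usr/bin/env bash
# Sketch: add the default route to the Internet Gateway, then inspect a subnet's routes.
# AWS_CLI is a hypothetical override (AWS_CLI=echo for a dry run).

add_igw_default_route() {
  # $1 = route table ID, $2 = Internet Gateway ID
  "${AWS_CLI:-aws}" ec2 create-route \
    --route-table-id "$1" --destination-cidr-block 0.0.0.0/0 --gateway-id "$2"
}

show_subnet_routes() {
  # $1 = subnet ID. Only finds explicitly associated route tables; a subnet
  # without an explicit association uses the VPC's main route table instead.
  "${AWS_CLI:-aws}" ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1" \
    --query 'RouteTables[].Routes[]' --output table
}
```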

Phase 3: Setting Up the Application Load Balancer

Creating the ALB
  1. EC2 Console > “Load Balancers” > “Create Load Balancer” > “Application Load Balancer”.
  2. Configuration:
    • Name: dublin-alb
    • Scheme: Internet-facing
    • IP address type: IPv4
    • VPC: dublin-delights-vpc
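    • Availability Zones: Select both AZs and a public subnet in each (an ALB needs subnets in at least two AZs, so create a second public subnet if public-subnet-1 is your only one)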
    • Security group: Create new “alb-sg”
  3. Create and wait 5 minutes.
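A CLI sketch of the ALB creation. An Application Load Balancer has to be placed in subnets from at least two Availability Zones, so the hypothetical create_alb helper below refuses to run with fewer than two subnet IDs (it only counts them; AWS itself rejects two subnets from the same AZ). AWS_CLI is the same hypothetical dry-run override.

```bash
#!/usr/bin/env bash
# Sketch: create the internet-facing ALB.
# AWS_CLI is a hypothetical override (AWS_CLI=echo for a dry run).

create_alb() {
  # $1 = alb-sg ID, remaining args = public subnet IDs (one per AZ, two or more)
  local sg="$1"; shift
  if (( $# < 2 )); then
    echo "create_alb: need subnets in at least two Availability Zones" >&2
    return 1
  fi
  "${AWS_CLI:-aws}" elbv2 create-load-balancer \
    --name dublin-alb --type application --scheme internet-facing \
    --ip-address-type ipv4 --security-groups "$sg" --subnets "$@" \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text
}
```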
Creating the Target Group
  1. EC2 Console > “Target Groups” > “Create target group”.
  2. Configuration:
    • Target type: Instances
    • Name: web-targets
    • Protocol: HTTP
    • Port: 80
    • VPC: dublin-delights-vpc
    • Health check path: /
  3. Register targets: Add all 3 instances.
  4. Create.
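In the console the listener is set up inside the ALB wizard; from the CLI it is a separate call that forwards port 80 to the target group. A minimal sketch, with hypothetical helper names and the same AWS_CLI dry-run override:

```bash
#!/usr/bin/env bash
# Sketch: target group, target registration, and the HTTP :80 listener.
# AWS_CLI is a hypothetical override (AWS_CLI=echo for a dry run).

create_web_targets() {
  # $1 = VPC ID; prints the target group ARN
  "${AWS_CLI:-aws}" elbv2 create-target-group \
    --name web-targets --protocol HTTP --port 80 --vpc-id "$1" \
    --target-type instance --health-check-path / \
    --query 'TargetGroups[0].TargetGroupArn' --output text
}

register_instances() {
  # $1 = target group ARN, remaining args = instance IDs
  local tg="$1"; shift
  local targets=() id
  for id in "$@"; do targets+=("Id=$id"); done
  "${AWS_CLI:-aws}" elbv2 register-targets --target-group-arn "$tg" --targets "${targets[@]}"
}

create_http_listener() {
  # $1 = load balancer ARN, $2 = target group ARN
  "${AWS_CLI:-aws}" elbv2 create-listener \
    --load-balancer-arn "$1" --protocol HTTP --port 80 \
    --default-actions "Type=forward,TargetGroupArn=$2"
}
```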
The Fourth Problem: 504 Gateway Timeout

Accessing the ALB returned “504 Gateway Time-out,” and the target group showed every target as “Unhealthy” with reason Target.Timeout.

  • Root Cause #1: No Web Servers

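    • Solution: Install Apache on each instance (the private ones download packages through the NAT Gateway):
    # Refresh package lists and install Apache
    sudo apt update
    sudo apt install -y apache2
    # Start Apache now and on every boot
    sudo systemctl start apache2
    sudo systemctl enable apache2
    # Serve a page naming this instance, so targets can be told apart behind the ALB
    echo "<h1>Web Server - $(hostname)</h1>" | sudo tee /var/www/html/index.html
    # Verify locally before testing through the load balancer
    curl http://localhost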
    
  • Root Cause #2: ALB Security Group Outbound Rules Missing

    • Even with the web servers running, the 504 persisted. The ALB’s security group had no outbound rules!
    • Critical Fix:
      • EC2 Console > “Security Groups” > alb-sg.
      • “Outbound rules” > “Edit outbound rules” > Add: Type “All traffic”, Destination 0.0.0.0/0.
      • Save.
  • Why This Matters: The ALB opens new connections to every target for both health checks and forwarded requests, and those connections leave through alb-sg, so alb-sg needs an outbound rule that allows them.
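If you hit the same 504, this sketch restores the egress rule and then polls target health until every target reports healthy. The helper names and the AWS_CLI override are hypothetical; note that authorize-security-group-egress fails with a duplicate-rule error if the all-traffic rule already exists.

```bash
#!/usr/bin/env bash
# Sketch: restore alb-sg egress, then wait for healthy targets.
# AWS_CLI is a hypothetical override (AWS_CLI=echo for a dry run).

allow_alb_egress_all() {
  # $1 = alb-sg ID. Same rule as the console fix: all protocols (-1) to 0.0.0.0/0.
  "${AWS_CLI:-aws}" ec2 authorize-security-group-egress --group-id "$1" \
    --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'
}

target_states() {
  # $1 = target group ARN; prints "instance-id state reason" per target
  "${AWS_CLI:-aws}" elbv2 describe-target-health --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
    --output text
}

all_healthy() {
  # stdin: target_states output. Succeeds only if there is at least one
  # target and every state column reads "healthy".
  awk 'NF { n++; if ($2 != "healthy") bad++ }
       END { exit ((n > 0 && bad == 0) ? 0 : 1) }'
}

# Poll every 15 seconds (ARN variable hypothetical):
#   until target_states "$TG_ARN" | all_healthy; do sleep 15; done
```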

Phase 4: Fine-Tuning and Testing

Proper Security Group Configuration
  • alb-sg:
    • Inbound: HTTP (80) from 0.0.0.0/0
    • Outbound: All traffic to 0.0.0.0/0
  • public-sg:
    • Inbound: SSH (22) from “My IP”, HTTP (80) from alb-sg
    • Outbound: All traffic to 0.0.0.0/0
  • private-sg:
    • Inbound: SSH (22) from public-sg, HTTP (80) from alb-sg
    • Outbound: All traffic to 0.0.0.0/0
Testing the Complete Setup
  • Test 1: Individual Instance Access
    • From the public instance: curl http://10.0.2.135 and curl http://10.0.3.150 (substitute your private instances’ IPs).
  • Test 2: Load Balancer
    • curl http://dublin-alb-689675767.eu-west-1.elb.amazonaws.com (use your own ALB’s DNS name)
    • Loop: for i in {1..5}; do curl http://dublin-alb-689675767.eu-west-1.elb.amazonaws.com; done (the hostname in the response should change as the ALB rotates between instances).
  • Test 3: NAT Gateway Functionality
    • From the private instances: curl ifconfig.me (should print the NAT Gateway’s Elastic IP) and sudo apt update (should succeed).
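To check the rotation more systematically than eyeballing five responses, this sketch counts how many responses each backend served. It assumes every backend serves the “Web Server - hostname” page from the Apache step; summarize_backends and alb_rotation are hypothetical names. With the ALB’s default round-robin routing, the counts should come out roughly even across healthy targets.

```bash
#!/usr/bin/env bash
# Sketch: count which backend answered each of N requests.

summarize_backends() {
  # stdin: response bodies; prints "count hostname" per backend, busiest first
  sed -n 's|.*<h1>Web Server - \(.*\)</h1>.*|\1|p' | sort | uniq -c | sort -rn
}

alb_rotation() {
  # $1 = ALB DNS name, $2 = number of requests (default 10)
  local url="http://$1/" n="${2:-10}" i
  for (( i = 0; i < n; i++ )); do
    curl -s --max-time 5 "$url"
    echo   # keep responses on separate lines even if a body lacks a newline
  done | summarize_backends
}

# Usage: alb_rotation dublin-alb-689675767.eu-west-1.elb.amazonaws.com 10
```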

Phase 5: Common Issues and Solutions

  • Issue 1: “Instance is not in public subnet”
    • Fix: Add 0.0.0.0/0 → Internet Gateway in route table.
  • Issue 2: “Referenced group id for existing IPv4 CIDR rule”
    • Fix: Add new rules instead of editing existing ones.
  • Issue 3: Private instances can’t reach internet
    • Fix: Route 0.0.0.0/0 → NAT Gateway in private route table.
  • Issue 4: Health checks failing with “Target.Timeout”
    • Fix: Install a web server, fix the security groups, and make sure /var/www/html/index.html exists (the health check path / must return HTTP 200).
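Issues 1 and 3 both come down to where a subnet’s 0.0.0.0/0 route points. This sketch prints that target and classifies the subnet accordingly; the helper names and the AWS_CLI override are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: classify a subnet by the target of its default route.
# AWS_CLI is a hypothetical override (AWS_CLI=echo for a dry run).

default_route_target() {
  # $1 = subnet ID; prints GatewayId and NatGatewayId of its 0.0.0.0/0 route.
  # Caveat: subnets with no explicit association use the main route table,
  # which this filter does not return.
  "${AWS_CLI:-aws}" ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1" \
    --query "RouteTables[].Routes[?DestinationCidrBlock=='0.0.0.0/0'].[GatewayId,NatGatewayId]" \
    --output text
}

classify_subnet() {
  # stdin: output of default_route_target
  local targets
  targets="$(cat)"
  case "$targets" in
    *igw-*) echo public ;;            # Issue 1 is fixed: default route -> Internet Gateway
    *nat-*) echo private-with-nat ;;  # Issue 3 is fixed: default route -> NAT Gateway
    *)      echo no-internet-route ;;
  esac
}

# Usage: default_route_target subnet-0abc... | classify_subnet
```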

Final Architecture Overview

                                  🌐 INTERNET
                                       |
                                  ┌────▼────┐
                                  │   IGW   │ dublin-igw
                                  └────┬────┘
                                       |
                            ┌──────────▼──────────┐
                            │   Application LB    │ dublin-alb
                            │      (alb-sg)       │
                            └──────────┬──────────┘
                                       |
             ┌─────────────────────────┼─────────────────────────┐
             |                         |                         |
  ┌──────────▼───────────┐  ┌──────────▼───────────┐  ┌──────────▼───────────┐
  │ AZ: us-east-1a       │  │ AZ: us-east-1b       │  │ AZ: us-east-1c       │
  │ ┌──────────────────┐ │  │ ┌──────────────────┐ │  │ ┌──────────────────┐ │
  │ │ Public Subnet    │ │  │ │ Private Subnet 1 │ │  │ │ Private Subnet 2 │ │
  │ │ 10.0.1.0/24      │ │  │ │ 10.0.2.0/24      │ │  │ │ 10.0.3.0/24      │ │
  │ │ ┌──────────────┐ │ │  │ │ ┌──────────────┐ │ │  │ │ ┌──────────────┐ │ │
  │ │ │ public-web-1 │ │ │  │ │ │ private-web-2│ │ │  │ │ │ private-web-3│ │ │
  │ │ │ (public-sg)  │ │ │  │ │ │ (private-sg) │ │ │  │ │ │ (private-sg) │ │ │
  │ │ └──────────────┘ │ │  │ │ └──────────────┘ │ │  │ │ └──────────────┘ │ │
  │ │ ┌──────────────┐ │ │  │ │                  │ │  │ │                  │ │
  │ │ │ NAT GW       │ │ │  │ │                  │ │  │ │                  │ │
  │ │ └──────────────┘ │ │  │ │                  │ │  │ │                  │ │
  │ └──────────────────┘ │  │ └──────────────────┘ │  │ └──────────────────┘ │
  └──────────┬───────────┘  └──────────┬───────────┘  └──────────┬───────────┘
             |                         |                         |
             └─────────────────────────┼─────────────────────────┘
                                       |
                                  ┌────▼────┐
                                  │   IGW   │ (for NAT traffic)
                                  └────┬────┘
                                       |
                                  🌐 INTERNET

  • Network Flow: Inbound: Internet → IGW → ALB → Instances. Outbound: Public via IGW, Private via NAT.
  • Security: SSH to the public instance only from your IP, SSH to the private instances only from public-sg, and the ALB open to the internet on port 80 only.

Key Lessons Learned

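  1. Security Groups Are Stateful: Return traffic for an allowed connection is permitted automatically, but every connection an instance or the ALB initiates needs a matching outbound rule.
  2. “Public” Subnet Needs Routing: The name doesn’t matter; a 0.0.0.0/0 route to the Internet Gateway is what makes a subnet public.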
  3. ALB Needs Outbound Rules: Missing rules cause 504 errors.
  4. NAT Gateway Placement: Must be in a public subnet.
  5. Health Checks Are Critical: Unhealthy targets stop traffic.

Cost Optimization Tips

  • Costs: 3 t2.micro (~$25/month), NAT Gateway (~$32/month), ALB (~$16/month), plus variable data charges. Total: ~$73/month.
  • Savings: Use a NAT instance (far cheaper than a NAT Gateway, but you manage it yourself), consider a Network Load Balancer, terminate unused instances, or buy Reserved Instances (30-60% off).
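The estimate above is simply hourly price × roughly 730 hours per month. This sketch reproduces it with assumed us-east-1 on-demand rates (t2.micro $0.0116/h, NAT Gateway $0.045/h, ALB $0.0225/h); prices change and differ by region, so plug in current numbers for yours. It excludes data processing, LCU and transfer charges, and monthly_cost is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: monthly cost from hourly rates (730 hours/month).
# Rates are assumptions; check current pricing for your region.

monthly_cost() {
  # $1 = instance count, $2 = EC2 $/h, $3 = NAT Gateway $/h, $4 = ALB $/h
  awk -v n="$1" -v ec2="$2" -v nat="$3" -v alb="$4" \
    'BEGIN { h = 730; printf "%.2f\n", n * ec2 * h + nat * h + alb * h }'
}

monthly_cost 3 0.0116 0.045 0.0225   # prints 74.68
```

The result lands close to the ~$73 above; the gap comes from rounding each line item down.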

Production Readiness Checklist

  • Security: Restrict SSH, enable VPC Flow Logs, CloudTrail, IAM roles, encryption.
  • Monitoring: CloudWatch alarms, ALB logs, target health, SNS notifications.
  • High Availability: Multi-AZ, Auto Scaling, health checks, disaster recovery.
  • Performance: Right-size instances, ALB stickiness, CloudFront CDN.

Troubleshooting Commands Reference

  • Connectivity Issues:
    • telnet <ip> <port>
    • aws ec2 describe-security-groups --group-ids sg-xxxxxxxxx
    • nslookup google.com
    • ip route show
  • Web Server Issues:
    • sudo systemctl status apache2
    • sudo ss -tlnp | grep :80 (or sudo netstat -tlnp | grep :80 if net-tools is installed)
    • curl http://localhost
    • sudo tail -f /var/log/apache2/error.log
  • ALB Issues:
    • curl -I http://your-alb-dns-name.elb.amazonaws.com
    • aws elbv2 describe-target-health --target-group-arn <arn>
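If telnet isn’t installed on the instance, bash can do the same TCP check by itself through its /dev/tcp redirection. A minimal sketch; port_open is a hypothetical helper, and it relies on the coreutils timeout command.

```bash
#!/usr/bin/env bash
# Sketch: telnet-free TCP port check using bash's /dev/tcp redirection.

port_open() {
  # $1 = host/IP, $2 = port, $3 = timeout in seconds (default 3).
  # Exits 0 only if a TCP connection succeeds within the timeout.
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example:
#   port_open 10.0.2.135 80 && echo "port 80 reachable" || echo "blocked: check SGs and routes"
```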

Conclusion

Building AWS infrastructure is like assembling a puzzle: every piece (VPC, subnets, security groups, ALB) must fit. My mistakes taught me to start with basics, test incrementally, and read error messages closely. Each “why isn’t this working?” turned into a “now I get it!” moment.

What’s Next?
Explore Auto Scaling Groups, RDS, CloudFront, Route 53, ACM, or Infrastructure as Code (CloudFormation/Terraform). Share your AWS adventures in the comments; let’s learn together!

