<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shivam</title>
    <description>The latest articles on DEV Community by Shivam (@sysdiver).</description>
    <link>https://dev.to/sysdiver</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3350411%2F79b87207-268b-4d88-9061-95215d411e9e.png</url>
      <title>DEV Community: Shivam</title>
      <link>https://dev.to/sysdiver</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sysdiver"/>
    <language>en</language>
    <item>
      <title>Building AWS Infrastructure the Sane Way: A Terraform VPC Guide</title>
      <dc:creator>Shivam</dc:creator>
      <pubDate>Thu, 04 Sep 2025 20:26:04 +0000</pubDate>
      <link>https://dev.to/sysdiver/building-aws-infrastructure-the-sane-way-a-terraform-vpc-guide-1ic2</link>
      <guid>https://dev.to/sysdiver/building-aws-infrastructure-the-sane-way-a-terraform-vpc-guide-1ic2</guid>
      <description>&lt;p&gt;So, terraform is a way to build the infra without having to go on the aws website and click buttons. The primary reason? This automates the provisioning process, allowing you to reliably and repeatedly create complex infrastructure without the risk of manual error.&lt;/p&gt;

&lt;p&gt;Imagine how amazing this is - instead of navigating the sometimes confusing AWS Management Console and configuring things across different pages, you just write a set of resource configs, run terraform apply, and suddenly the whole infrastructure starts to build. It reminds me of the Township game, where everybody goes off to their job when I tap on the buildings. Well, let's move on.&lt;/p&gt;

&lt;p&gt;How Terraform Works&lt;br&gt;
So how does Terraform work? In a very straightforward way, actually - you choose a resource type, specify the configs that are absolutely necessary (plus any you want to override), and write it all down in a document.&lt;/p&gt;

&lt;p&gt;Now we usually follow a 3 step process:&lt;/p&gt;

&lt;p&gt;Terraform Validate - checks the configuration for syntax errors&lt;/p&gt;

&lt;p&gt;Terraform Plan - shows how the whole infra is going to be built on the field. We can check what the defaults have to say, and if we feel like changing something, we can do it at that point itself&lt;/p&gt;

&lt;p&gt;Terraform Apply - we finally put it all together and apply it. This single command starts building the whole plan&lt;/p&gt;
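&lt;p&gt;On the command line, that three-step flow (plus the one-time terraform init that downloads the provider plugins) typically looks like this - a sketch of the usual sequence, not tied to any particular project:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# one-time setup: download the AWS provider plugins
terraform init

# the 3-step process
terraform validate   # check the configs for syntax errors
terraform plan       # preview what will be built (and what defaults apply)
terraform apply      # build the whole plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;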

&lt;p&gt;Enter VPC: Your Private Digital Land&lt;br&gt;
So how do we tell terraform where to build what? Here comes VPC.&lt;/p&gt;

&lt;p&gt;A Virtual Private Cloud is a separate, isolated slice of the cloud provider's servers that we can ask for - our own private field. Okay, let's take the analogy of an empty plot of ground. We first ask the cloud provider for a VPC; here we are taking it from AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2441bzq0r866inefbidw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2441bzq0r866inefbidw.png" alt=" " width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some Concepts Before Moving to VPC&lt;br&gt;
CIDR - Okay, so CIDR (Classless Inter-Domain Routing - nobody cares about the full form, I promise) is how we state our ask for IP addresses. It defines the block of IPs allocated for routing purposes; a bit of calculation goes into it, and we will try it before we move ahead.&lt;/p&gt;

&lt;p&gt;For a standard VPC we use 10.0.0.0/16. The four numbers before the slash are the base IP address, while the 16 after the slash (the prefix length) states how big our ask for IPs is.&lt;/p&gt;

&lt;p&gt;The CIDR block 10.0.0.0/16 provides our VPC with a total of 65,536 private IP addresses. The /16 suffix means the first 16 bits of the address (the 10.0 part) are fixed, leaving the remaining 16 bits for our internal network. Each of the last two octets can range from 0 to 255, so there are 256 × 256 = 65,536 combinations - 65,536 IPs. That is actually a lot, and we use this for our VPC.&lt;/p&gt;

&lt;p&gt;Another example - for smaller parts of the VPC we ask for something like 10.0.0.0/24. This means the first 3 octets of the IP are fixed and only the last octet is free to vary - that's 256 IPs, which is still plenty for each subnet in the VPC.&lt;/p&gt;
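&lt;p&gt;You can check this CIDR arithmetic yourself with Terraform's built-in functions in terraform console - the 10.0.0.0/16 block here is just the example from above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ terraform console
&gt; cidrsubnet("10.0.0.0/16", 8, 1)   # carve subnet #1 out of the /16 by adding 8 bits (16+8 = /24)
"10.0.1.0/24"
&gt; cidrhost("10.0.1.0/24", 5)        # the 5th host address inside that subnet
"10.0.1.5"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;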

&lt;p&gt;Wait, Subnet?&lt;/p&gt;

&lt;p&gt;Subnet - Okay, so we have to divide the field into parts. This is done for better management - it involves a bit of philosophy where we rant against monoliths, with a lot of fault tolerance in mind. We divide the 65,536 IPs into small parts. Usually we use /24, as it gives us 256 IPs per subnet - that's enough.&lt;/p&gt;

&lt;p&gt;Building Our VPC&lt;br&gt;
Okay so we will first build our VPC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "cloudbudget-vpc"
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the general format of a Terraform resource block: you can see the two strings beside the resource keyword - the first is the type of resource we need, the second is the local name we give it so we can refer to it elsewhere in the code.&lt;/p&gt;

&lt;p&gt;In the resource block, we have first and foremost mentioned the CIDR block - think of it like being told to take a plot of land on the ground: the first thing you do is put up fencing, right? That's what the CIDR block does here - it makes the fence and says we need at least this much land.&lt;/p&gt;

&lt;p&gt;The tags block is metadata that shows up wherever the resource is displayed - the AWS console, for example, shows you the Name tag.&lt;/p&gt;

&lt;p&gt;Dividing the Land&lt;br&gt;
We need to divide the land into parts so that the next time there is an issue in one part, we can move to another parcel and continue with our lives. It's also for better security - public and private subnets live in separate parcels, and each tier is spread across the eu-north-1a and eu-north-1b availability zones. The VPC's CIDR gives us 65k IPs, but we need to divide it, so we give each subnet 256 IPs.&lt;/p&gt;

&lt;p&gt;Creating the Public Subnets&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_subnet" "public_a" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "eu-north-1a"
  map_public_ip_on_launch = true

  tags = {
    Name = "cloudbudget-public-subnet-a"
  }
}

resource "aws_subnet" "public_b" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.4.0/24"
  availability_zone       = "eu-north-1b"
  map_public_ip_on_launch = true

  tags = {
    Name = "cloudbudget-public-subnet-b"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, you can see the general way of building a subnet. We mention the availability zone in which we want to keep each subnet - we do this so different subnets sit in different zones, and when there is a disaster in one zone, the subnets in the other keep working.&lt;/p&gt;

&lt;p&gt;There is one interesting part of this block: map_public_ip_on_launch = true. This tells AWS to automatically assign a public IP to any instance launched in the subnet, which is what makes it reachable from the world. We do this for the public subnets, but it won't always be the case.&lt;/p&gt;

&lt;p&gt;Creating the Private Subnet&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
resource "aws_subnet" "private_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "eu-north-1a"

  tags = {
    Name = "cloudbudget-private-subnet-a"
  }
}

resource "aws_subnet" "private_b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.3.0/24"
  availability_zone = "eu-north-1b"

  tags = {
    Name = "cloudbudget-private-subnet-b"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Say there is a room in the field where we keep the gold - do we leave the doors open? No, we make sure it's all locked. So when we create a private subnet specifically for certain infra like databases, we make sure map_public_ip_on_launch is false (which is the default), so that nothing can reach it until we explicitly allow certain components to talk to each other.&lt;/p&gt;

&lt;p&gt;Building the Roads&lt;br&gt;
Okay, so now we have a fenced field and we have marked out the land for different infra. What's left? The roads. The field is big - and the VPC is bigger - so we need a very well-defined way to lay out these routes.&lt;/p&gt;

&lt;p&gt;How do we do that? We first need to understand why a subnet exists:&lt;/p&gt;

&lt;p&gt;For example: the public subnet exists so that we can put the frontend on the internet. If this subnet needs to talk to the world, we need to build a road from the public subnet to the front gate of the field - the gate that exposes the subnet to the internet. In technical terms, we call this front gate the Internet Gateway.&lt;/p&gt;

&lt;p&gt;Internet Gateway&lt;br&gt;
We build the IGW using resources too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "cloudbudget-igw"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We specify the vpc_id - which field the front gate should be installed on.&lt;/p&gt;

&lt;p&gt;Now, we need to build roads/routes. How do we do that?&lt;/p&gt;

&lt;p&gt;Route Tables!&lt;br&gt;
So the land is divided, now we need roads to reach the other parts of the land or the front door of the whole field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "cloudbudget-public-rt"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We attach the route table to the VPC, then add a route sending 0.0.0.0/0 (all traffic not destined for the VPC itself) to aws_internet_gateway.main.id - so that after walking out of the front door, you see the whole internet.&lt;/p&gt;

&lt;p&gt;But there is a missing piece now - we have installed the gate, there is a subnet (the land parcel), there is a road to reach the gate too, but how do we tell the server to go from subnet to the front door?&lt;/p&gt;

&lt;p&gt;The three pieces exist: The road, the land parcel, the gate. But how do we attach the road to the land parcel? With a walkway that joins the home to the road.&lt;/p&gt;

&lt;p&gt;Route Table Association&lt;br&gt;
We need to associate a route with a subnet and this is exactly what we do with this block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "public_b" {
  subnet_id      = aws_subnet.public_b.id
  route_table_id = aws_route_table.public.id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This resource has two major components: the subnet_id and the route_table_id.&lt;/p&gt;

&lt;p&gt;Yay! So we have our public subnet connected to the world.&lt;/p&gt;

&lt;p&gt;The Private Side: NAT Gateway Setup&lt;br&gt;
Do we really want all subnets talking to the world? No - some we have to protect, so we need a way for private subnets to reach out without exposing themselves to the world.&lt;/p&gt;

&lt;p&gt;For this, take this analogy: there's a company with workers in a secure back office who shouldn't reveal their identity. One of them wants coffee, but she cannot just go out and ask for it. So she calls the receptionist and asks her to order the coffee; the receptionist places the order with the outside world (that's the internet), receives the coffee, and passes it back to the worker. This way the worker's identity stays hidden. The receptionist is the NAT - Network Address Translation.&lt;/p&gt;

&lt;p&gt;NAT Gateways (Yes, Multiple Ones!)&lt;br&gt;
Each NAT gateway needs an EIP (Elastic IP) - a static public IP address that we fully control and that the NAT uses to talk to the world. But here's the thing - if we only hire one receptionist and she gets sick, nobody in the back office can order coffee! So we put receptionists in both buildings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_eip" "nat_a" {
  domain = "vpc"

  tags = {
    Name = "cloudbudget-nat-eip-a"
  }
}

resource "aws_eip" "nat_b" {
  domain = "vpc"

  tags = {
    Name = "cloudbudget-nat-eip-b"
  }
}

resource "aws_nat_gateway" "main_a" {
  allocation_id = aws_eip.nat_a.id
  subnet_id     = aws_subnet.public_a.id

  tags = {
    Name = "cloudbudget-nat-gw-a"
  }
}

resource "aws_nat_gateway" "main_b" {
  allocation_id = aws_eip.nat_b.id
  subnet_id     = aws_subnet.public_b.id

  tags = {
    Name = "cloudbudget-nat-gw-b"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Private Route Tables&lt;br&gt;
We already have private subnets in the field; now we need the roads and the markings on them. But this time they shouldn't talk to the world directly - only through the NAT, right? So we give each private subnet its own route to its nearest receptionist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_route_table" "private_a" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main_a.id
  }

  tags = {
    Name = "cloudbudget-private-rt-a"
  }
}

resource "aws_route_table" "private_b" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main_b.id
  }

  tags = {
    Name = "cloudbudget-private-rt-b"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can go again and make the walkways to connect our private subnets to these route tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_route_table_association" "private_a" {
  subnet_id      = aws_subnet.private_a.id
  route_table_id = aws_route_table.private_a.id
}

resource "aws_route_table_association" "private_b" {
  subnet_id      = aws_subnet.private_b.id
  route_table_id = aws_route_table.private_b.id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Security Guards for Your Subnets&lt;br&gt;
But with the whole world outside the field, how does this stay secure? We need to put security guards on these subnets so that unwanted requests don't get to enter. And guess what? AWS gives us these security guards too, and of course, Terraform makes a resource out of them so we can secure our infra through code.&lt;/p&gt;

&lt;p&gt;Here we build two security groups - one for the db and one for the app:&lt;/p&gt;

&lt;p&gt;Database Security Group&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_security_group" "db" {
  name        = "cloudbudget-db-sg"
  description = "Allow traffic only from the app"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  tags = {
    Name = "cloudbudget-db-sg"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The db has to talk only to the app and won't take orders from the world, so the ingress block accepts requests only on port 5432 (the PostgreSQL port) and only from resources carrying the app security group's badge (aws_security_group.app.id).&lt;/p&gt;

&lt;p&gt;App Security Group&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_security_group" "app" {
  name        = "cloudbudget-app-sg"
  description = "Allow inbound HTTP and all outbound traffic"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "cloudbudget-app-sg"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The app security group is different - the app has to accept requests from the whole world, so the ingress rule allows traffic on port 80 (the web server port) from everywhere (0.0.0.0/0). For outbound traffic, the app has to talk to the db as well as the world, so the egress rule uses ports 0 and protocol "-1", meaning any port and any protocol, to any destination - the app can talk to anybody.&lt;/p&gt;

&lt;p&gt;It's important to note that AWS Security Groups are stateful. This means if you allow an inbound request (like on port 80 for our app), the corresponding outbound response is automatically allowed, regardless of the egress rules. Our permissive egress rule 0.0.0.0/0 is for connections initiated from within the app, such as API calls or database queries.&lt;/p&gt;

&lt;p&gt;Putting Infrastructure in the Subnets&lt;br&gt;
Alright! So we have our land marked with the subnet, the routes, the associations, the gateway! But what goes in these subnets? We put our infra in them - for example, we put our RDS database in the private subnets.&lt;/p&gt;

&lt;p&gt;Database Subnet Group&lt;br&gt;
We spread the database across different subnets for better fault tolerance. Since we can no longer point at one room and say "put the db here", we group the subnets and hand the group to RDS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
resource "aws_db_subnet_group" "main" {
  name       = "cloudbudget-db-subnet-group"
  subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]

  tags = {
    Name = "cloudbudget-db-subnet-group"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The subnet_ids array takes all the subnet IDs which have the databases.&lt;/p&gt;

&lt;p&gt;RDS Database&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_db_instance" "main" {
  identifier               = "database-1"
  engine                   = "postgres"
  instance_class           = "db.t3.micro"
  allocated_storage        = 20
  skip_final_snapshot      = true
  publicly_accessible      = false
  storage_encrypted        = true
  availability_zone        = "eu-north-1a"
  vpc_security_group_ids   = [aws_security_group.db.id]
  db_subnet_group_name     = aws_db_subnet_group.main.name
  username                 = "postgres"
  password                 = "YourSecurePassword123"
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The database will have the identifier database-1, and it will use the PostgreSQL engine. The instance type chosen is db.t3.micro, which is a small and cost-efficient option. It will have 20 GB of allocated storage.&lt;/p&gt;

&lt;p&gt;The setting skip_final_snapshot = true means that if the database is deleted, no backup snapshot will be created. The database is set to not be publicly accessible, so it can only be accessed from within the VPC. Storage is encrypted for security.&lt;/p&gt;

&lt;p&gt;With vpc_security_group_ids we tell it that this server room should be guarded by the db security group, and with db_subnet_group_name we place it in the subnet group (aws_db_subnet_group.main.name) we created above.&lt;/p&gt;

&lt;p&gt;Also: in a production environment, NEVER hardcode passwords. Use a secrets manager like AWS Secrets Manager or HashiCorp Vault. At a minimum, use a variable marked as sensitive and set password = var.db_password.&lt;/p&gt;
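&lt;p&gt;As a minimal sketch (the variable name db_password is just an example), the sensitive variable could look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;variable "db_password" {
  description = "Master password for the RDS instance"
  type        = string
  sensitive   = true  # Terraform redacts the value in plan/apply output
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You would then supply the value at apply time - for example through the TF_VAR_db_password environment variable - instead of committing it to version control.&lt;/p&gt;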

&lt;p&gt;What We've Built&lt;br&gt;
In summary, here's what we created:&lt;/p&gt;

&lt;p&gt;We created a VPC (our private field)&lt;br&gt;
We divided the VPC into subnets (land parcels)&lt;br&gt;
We added an IGW (front gate to the internet)&lt;br&gt;
We built route tables for connectivity (roads)&lt;br&gt;
Route table associations to connect subnets to route tables (walkways)&lt;br&gt;
NAT gateways for private subnet internet access (the receptionists)&lt;br&gt;
Security groups (security guards)&lt;br&gt;
An RDS database in the private subnets (the gold vault)&lt;/p&gt;

&lt;p&gt;With that, the VPC setup is complete. This gives you a solid foundation for hosting applications where your frontend can be publicly accessible while your database remains secure in the private subnets, only accessible through your application servers.&lt;/p&gt;

&lt;p&gt;The beauty of this setup is that once you have this Terraform configuration, you can spin up this entire infrastructure in any AWS region with just a few commands. No more clicking through endless AWS console pages - just code, version control, and automation.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>vpc</category>
      <category>devops</category>
      <category>infrastructureascode</category>
    </item>
    <item>
      <title>Why Was My Localhost SSH Taking 3 Seconds? A Deep Dive.</title>
      <dc:creator>Shivam</dc:creator>
      <pubDate>Thu, 24 Jul 2025 18:55:27 +0000</pubDate>
      <link>https://dev.to/sysdiver/how-a-simple-ssh-to-localhost-taught-me-real-network-debugging-4p50</link>
      <guid>https://dev.to/sysdiver/how-a-simple-ssh-to-localhost-taught-me-real-network-debugging-4p50</guid>
      <description>&lt;p&gt;It was one of those moments for me when a simple task gets you into a 4-hour rabbit hole that teaches you more than months of reading. I was just trying to SSH into my own machine (yep, localhost), and suddenly, ofc nothing worked right.&lt;/p&gt;

&lt;p&gt;What began as "why can't I SSH into my own computer?" turned into an unexpected masterclass in network debugging. Here's how I built my framework for debugging these issues.&lt;/p&gt;

&lt;p&gt;When a Simple Task Goes Wrong  &lt;br&gt;
I was setting up a local development environment with Docker. One of my containers needed to create an SSH tunnel back to a database running on my host machine. It's a common setup. The command inside the container would look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# From inside the container, tunnel to host services
ssh -L 5432:localhost:5432 user@host.docker.internal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before involving the container, I wanted to test the connection on my host machine first. A simple SSH to myself should happen quickly, right?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh shivam@localhost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It worked, but it felt slow. Really slow. A connection that should be instant took several seconds. If my containers were going to use this for database connections, that lag would hurt performance. Something was off, and I needed to find out why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Checking the Basics (The Network Itself)&lt;/strong&gt;  &lt;br&gt;
The first rule of troubleshooting is to check the obvious. Is my machine even communicating properly with itself?&lt;/p&gt;

&lt;p&gt;I started with the most basic tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ping localhost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.078 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.067 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.052 ms

--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2075ms
rtt min/avg/max/mdev = 0.052/0.065/0.078/0.010 ms


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response was immediate, with times around 0.05 ms. So, basic connectivity was perfect. Next, I checked the route.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;traceroute localhost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
traceroute to localhost (127.0.0.1), 30 hops max, 60 byte packets
 1  localhost (127.0.0.1)  0.395 ms  0.337 ms  0.318 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It showed a single hop, as expected. This indicated that the problem wasn't at the basic network layer. The pipes were clear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson #1:&lt;/strong&gt; Always check the basics first. &lt;code&gt;ping&lt;/code&gt; and &lt;code&gt;traceroute&lt;/code&gt; can quickly tell you if you have a real network routing issue or something else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: The Port Detective and a Confusing Detour&lt;/strong&gt;  &lt;br&gt;
Okay, the network is fine. What about the SSH service itself? Is it listening on the correct port? I used &lt;code&gt;ss&lt;/code&gt; to check.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ss -ltn | grep :22
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
LISTEN 0      4096               *:22               *:*          
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output confirmed that a service was listening on port 22. Good. I remembered that I had played with the config files and felt I might have messed something up, so I checked the SSH config file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo grep -i port /etc/ssh/sshd_config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
 # configuration must be re-generated after changing Port, AddressFamily, or
#Port 2222
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And yeah, just as I suspected, the running service was on port 22, but the config file said it should be on port 2222. This was a classic distraction. After restarting the SSH service (&lt;code&gt;sudo systemctl restart ssh&lt;/code&gt;) and seeing it still running on port 22, I realized the Port 2222 line was commented out and had never taken effect. The running service was the actual source of truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson #2:&lt;/strong&gt; Config files can be misleading. Always verify what the service is actually doing, not just what the config says it should do.&lt;/p&gt;
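&lt;p&gt;For SSH specifically (assuming OpenSSH), a handy way to verify is to ask the daemon to print its effective configuration instead of reading the file yourself:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# dump the effective sshd config and check which port it actually uses
sudo sshd -T | grep -i '^port'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;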

&lt;p&gt;&lt;strong&gt;Step 3: Finding the real issue&lt;/strong&gt;&lt;br&gt;
I set aside the port confusion and re-focused on the original issue: the slowness.&lt;/p&gt;

&lt;p&gt;The connection worked; it was just slow. This usually indicates that the problem isn't with the network connection itself but with the application-level processes on top of it. To see what was happening during the connection, I ran the SSH command with the verbose flag.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -v shivam@localhost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As I watched the output scroll by, I saw it. The delay was occurring during the security checks - specifically, host key verification. SSH was going through its full security handshake, which is unnecessary for a trusted localhost connection.&lt;/p&gt;

&lt;p&gt;The solution was to tell SSH to skip these checks for this specific case.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;time ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null shivam@localhost "echo 'test'"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it was a success - the connection was now instant. The slowness was never a network issue; it was an SSH application feature. For my Docker tunnel, I could now use an optimized command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -N -L 5432:localhost:5432 shivam@localhost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lesson #3:&lt;/strong&gt; Application protocols can have their own overhead. A perfect network connection can still feel slow if the application on top is doing extra work.&lt;/p&gt;

&lt;p&gt;The Real Lesson: A Method to the Madness  &lt;br&gt;
I fixed the problem that night. But the real gain wasn't just the solution; it was learning a systematic way to think. Instead of randomly trying commands, I worked my way through the layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network Layer: Is there a connection? (&lt;code&gt;ping&lt;/code&gt;, &lt;code&gt;traceroute&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Transport Layer: Is the port open and listening? (&lt;code&gt;ss&lt;/code&gt;, &lt;code&gt;netstat&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Application Layer: Is the service configured correctly, and what is it doing? (&lt;code&gt;ssh -v&lt;/code&gt;, config files)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This structured approach is what separates guessing from true debugging. The next time something breaks, I can apply the same method to systematically find the root cause.&lt;/p&gt;

&lt;p&gt;My Go-To Network Debugging Checklist  &lt;br&gt;
Here's the simple playbook I now use for any connection issue.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Check Reachability: Can I see the machine?  &lt;br&gt;
  &lt;code&gt;ping $hostname&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check the Path: Is the network route clear?  &lt;br&gt;
  &lt;code&gt;mtr $hostname&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check DNS: Is the name resolving to the correct IP?  &lt;br&gt;
  &lt;code&gt;dig $hostname&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check the Port: Is the service listening?  &lt;br&gt;
  &lt;code&gt;ss -ltn | grep $port&lt;/code&gt; or &lt;code&gt;netstat -tulnp | grep $port&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check the Application: Can I connect, and what is the app doing?  &lt;br&gt;
  &lt;code&gt;curl -v $protocol://$host:$port&lt;/code&gt; or &lt;code&gt;ssh -v $user@$host&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go Deeper (If Needed): Look at the raw packets.  &lt;br&gt;
  &lt;code&gt;tcpdump -i any host $hostname -n&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Network debugging isn't magic. It's a process of elimination. That frustrating evening spent on a "simple" localhost issue gave me a solid framework that I now use to tackle complex production problems.&lt;/p&gt;

&lt;p&gt;Sometimes, the best lessons come from problems that seem too small to matter.&lt;/p&gt;

</description>
      <category>networking</category>
      <category>sre</category>
      <category>devops</category>
      <category>computerscience</category>
    </item>
  </channel>
</rss>
