Atul Vishwakarma

Posted on Jun 19

Building a Production-Grade 3-Tier AWS Architecture with Terraform: Design Decisions, Trade-offs, and Lessons Learned

#terraform #aws #devops #cloud

Repo: https://github.com/vatul16/terratier — full Terraform source, module docs, and architecture diagram.

When I set out to build this project, I didn't want another "deploy a VM and call it infrastructure" tutorial repo. I wanted something that would force me to think through the same questions a platform team actually argues about: how many subnet tiers do you really need, where do secrets live, how do you let engineers SSH in without handing out keys forever, and what's the cheapest way to stay highly available without going broke on NAT Gateway bills.

The result is TerraTier — a small Go/Node.js goal-tracking app, deployed across a fully isolated, auto-scaling, four-tier network on AWS, provisioned entirely through modular Terraform. The app itself is deliberately boring (it's a CRUD list of goals). The infrastructure underneath it is the actual point of the project, and this article walks through why it looks the way it does.

The problem with most "3-tier Terraform" examples

Search for AWS 3-tier Terraform examples and you'll find a lot of repositories that split a VPC into public, private, and database subnets, drop a web server in private, and call it done. That's a reasonable starting point, but it collapses two very different concerns into one "private" tier: the stateless web/API layer that talks to the internet (indirectly, via a load balancer) and the application layer that's allowed to talk to the database. If your web tier gets compromised, in that model, it's sitting in the same subnet — and often the same security group — as anything that can reach your data.

I wanted the network topology itself to enforce a stricter rule: nothing can reach the database except the backend tier, and nothing can reach the backend tier except the frontend tier and the internal load balancer. So the VPC here has four subnet tiers instead of three, each duplicated across two Availability Zones:

Public — Internet Gateway route, NAT Gateways, the public-facing ALB, and the bastion host.
Frontend private — the Node.js Express tier, reachable only from the public ALB.
Backend private — the Go API tier, reachable only from an internal ALB that the frontend talks to.
Database isolated — RDS PostgreSQL, with no route to the internet at all, reachable only from the backend's security group.

That extra split is a small addition in Terraform — one more aws_subnet resource block, one more security group, one more ALB — but it changes the blast radius of a compromised frontend instance from "can reach the database" to "can reach exactly one internal load balancer on one port."
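A minimal sketch of how that chain of "only the tier above may call me" rules could be expressed with security group references. The resource names, the internal ALB listener port, and the use of the default PostgreSQL port are assumptions for illustration, not the repo's actual code.

```hcl
# Sketch only: names and the listener port are assumptions.
# Egress rules are omitted for brevity; a real stack needs them too.
variable "vpc_id" {
  type = string
}

resource "aws_security_group" "frontend" {
  name_prefix = "frontend-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "internal_alb" {
  name_prefix = "internal-alb-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "backend" {
  name_prefix = "backend-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "database" {
  name_prefix = "database-"
  vpc_id      = var.vpc_id
}

# Frontend instances may reach the internal ALB listener and nothing else downstream.
resource "aws_vpc_security_group_ingress_rule" "internal_alb_from_frontend" {
  security_group_id            = aws_security_group.internal_alb.id
  referenced_security_group_id = aws_security_group.frontend.id
  ip_protocol                  = "tcp"
  from_port                    = 8080 # assumed listener port
  to_port                      = 8080
}

# Only the internal ALB may reach backend instances on the API port.
resource "aws_vpc_security_group_ingress_rule" "backend_from_internal_alb" {
  security_group_id            = aws_security_group.backend.id
  referenced_security_group_id = aws_security_group.internal_alb.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}

# Only backend instances may reach PostgreSQL (default port assumed).
resource "aws_vpc_security_group_ingress_rule" "database_from_backend" {
  security_group_id            = aws_security_group.database.id
  referenced_security_group_id = aws_security_group.backend.id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
}
```

Because each rule references a security group rather than a CIDR range, instances can be replaced or rescheduled across subnets without touching the rules.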

Two ALBs instead of one

This is probably the single decision in the repo that most resembles a real production pattern rather than a tutorial shortcut. The public ALB load-balances browser traffic across the frontend Auto Scaling Group on port 3000. The frontend, in turn, doesn't call backend instances directly — it calls a second, internal-only ALB, which load-balances across the backend Auto Scaling Group on port 8080.

The alternative — having the frontend call backend instances directly via private IPs, or through a Cloud Map service registry — would save the cost of a second ALB (roughly $16–20/month plus LCU charges). I chose the internal ALB anyway, for a few reasons that matter more once you have more than one backend instance: it gives the backend tier the same health-checked, load-balanced semantics as the frontend tier; it means backend instances can scale, fail, and get replaced without the frontend needing to know anything about individual instance IPs; and it gives me a single, consistent mental model — "every tier that has more than one instance sits behind an ALB" — instead of two different patterns for two tiers that conceptually do the same kind of horizontal scaling.
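For illustration, an internal ALB in front of the backend tier might look roughly like the following. The variable names, the resource names, the listener port, and the choice to place the ALB in the backend subnets are all assumptions; only the `internal = true` flag and the backend port 8080 come from the description above.

```hcl
# Sketch only: all names here are hypothetical.
variable "vpc_id" {
  type = string
}

variable "backend_subnet_ids" {
  type = list(string) # one private subnet per AZ (assumed placement)
}

variable "internal_alb_sg_id" {
  type = string
}

resource "aws_lb" "internal" {
  name               = "backend-internal"
  internal           = true # no public IPs; resolvable only inside the VPC
  load_balancer_type = "application"
  subnets            = var.backend_subnet_ids
  security_groups    = [var.internal_alb_sg_id]
}

resource "aws_lb_target_group" "backend" {
  name     = "backend-8080"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_listener" "internal_http" {
  load_balancer_arn = aws_lb.internal.arn
  port              = 8080 # assumed listener port
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.backend.arn
  }
}

# The frontend only ever needs this one stable hostname.
output "internal_alb_dns_name" {
  value = aws_lb.internal.dns_name
}
```

The output is the point of the pattern: the frontend is configured with a single DNS name, and the Auto Scaling Group can add or replace backend instances behind it freely.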

Secrets Manager, not environment variables baked into an AMI

The RDS master password is generated once, at apply time, with Terraform's random_password resource — 16 characters, with a curated set of special characters that won't break a Postgres connection string. It's written to a single Secrets Manager secret ({environment}-{project}-db-credentials) as a JSON blob containing the username, password, host, port, and database name together, so the backend only ever needs one secret ARN, not five separate values to wire through.

At boot, the backend's user-data script calls aws secretsmanager get-secret-value, parses the result with jq, and passes the individual fields into the Docker container as environment variables. The instance's IAM role grants exactly two Secrets Manager actions, GetSecretValue and DescribeSecret, scoped to that one secret's ARN and nothing else. No password ever gets written to a Dockerfile, a Docker image layer, or a .env file checked into git.
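The credential flow described above could be sketched like this. The variable names, the JSON key names, the policy name, and the exact special-character set are assumptions; the 16-character length, the secret naming scheme, and the two IAM actions follow the text.

```hcl
# Sketch only: variable names, JSON keys, and the character set are assumptions.
variable "environment" {
  type = string
}

variable "project" {
  type = string
}

variable "db_username" {
  type = string
}

variable "db_host" {
  type = string
}

variable "db_name" {
  type = string
}

variable "instance_role_name" {
  type = string
}

resource "random_password" "db" {
  length           = 16
  special          = true
  override_special = "-_!~" # assumed set: avoids characters such as @ / : ? # that break connection URLs
}

resource "aws_secretsmanager_secret" "db" {
  name = "${var.environment}-${var.project}-db-credentials"
}

# One JSON blob, so the backend needs a single secret ARN.
resource "aws_secretsmanager_secret_version" "db" {
  secret_id = aws_secretsmanager_secret.db.id
  secret_string = jsonencode({
    username = var.db_username
    password = random_password.db.result
    host     = var.db_host
    port     = 5432
    dbname   = var.db_name
  })
}

# Read access to this one secret and nothing else.
data "aws_iam_policy_document" "read_db_secret" {
  statement {
    actions   = ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"]
    resources = [aws_secretsmanager_secret.db.arn]
  }
}

resource "aws_iam_role_policy" "read_db_secret" {
  name   = "read-db-secret"
  role   = var.instance_role_name
  policy = data.aws_iam_policy_document.read_db_secret.json
}
```

Note that `random_password.db.result` is stored in Terraform state like any other attribute, so the state file itself needs to be treated as a secret.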

I'll be upfront about the limitation here, because it's the kind of thing an interviewer will probe and you should be ready to discuss it honestly: the password still flows through a Terraform variable (var.db_password), which means it is stored in plaintext in the state file and in any saved plan file. Marking the variable sensitive = true only redacts it from CLI output. The cleaner pattern is to have RDS generate and manage its own master password natively (manage_master_user_password = true, an RDS feature that creates and rotates the secret for you, with Terraform never touching the plaintext at all). I built it the way I did first because I wanted to understand the full credential lifecycle by hand before reaching for the feature that hides it.
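As a rough sketch of that cleaner pattern, the RDS instance could be declared without any password argument at all. The identifier, instance size, database name, username, and variable names below are hypothetical.

```hcl
# Sketch only: identifier, sizing, and names are hypothetical.
variable "db_subnet_group_name" {
  type = string
}

variable "database_sg_id" {
  type = string
}

resource "aws_db_instance" "postgres" {
  identifier        = "example-terratier-db"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  db_name           = "goals"
  username          = "appuser"

  # RDS generates the master password and keeps it in a Secrets Manager
  # secret that it manages. This cannot be combined with `password`.
  manage_master_user_password = true

  db_subnet_group_name   = var.db_subnet_group_name
  vpc_security_group_ids = [var.database_sg_id]
  skip_final_snapshot    = true # acceptable for a demo, not for production
}

# The ARN the backend's IAM policy and user-data script would point at.
output "db_master_secret_arn" {
  value = aws_db_instance.postgres.master_user_secret[0].secret_arn
}
```

One trade-off to be aware of: as far as I know, the RDS-managed secret holds only the username and password, so the host, port, and database name would have to reach the backend some other way, for example as plain (non-secret) template variables.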

Bastion host and SSM, deliberately redundant

Every EC2 instance in this stack — bastion, frontend, backend — gets the same IAM instance profile, which includes the AmazonSSMManagedInstanceCore managed policy. That alone is enough to aws ssm start-session --target <instance-id> into any instance with no SSH key, no open port 22 from the internet, and a full audit trail in CloudTrail of who connected and when.

So why keep the bastion at all? Two reasons. First, pragmatically: SSM Session Manager occasionally has friction in CI environments, narrow corporate proxy setups, or when you specifically need to forward a local port (aws ssm start-session ... --document-name AWS-StartPortForwardingSession) and just want a plain ssh -L tunnel instead of remembering the SSM syntax. Second, for this project specifically: a bastion host is the pattern most reviewers and interviewers will recognize immediately, and I wanted the repo to demonstrate both the "traditional" approach and the modern, keyless approach side by side, with the security trade-offs of each visible in the Terraform itself (the bastion's security group only allows SSH from var.allowed_ssh_cidrs, which defaults to "change this" rather than 0.0.0.0/0).
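The two access paths can sit side by side in a few lines of Terraform. This is a sketch under assumptions: the variable and resource names are hypothetical, and the `validation` block is an extra guard added here for illustration rather than something the repo is known to contain.

```hcl
# Sketch only: names are hypothetical; the validation block is an illustrative addition.
variable "vpc_id" {
  type = string
}

variable "instance_role_name" {
  type = string
}

variable "allowed_ssh_cidrs" {
  type        = list(string)
  description = "CIDR blocks allowed to SSH to the bastion."

  validation {
    condition     = !contains(var.allowed_ssh_cidrs, "0.0.0.0/0")
    error_message = "Refusing to open SSH to the whole internet; list specific CIDRs."
  }
}

# Traditional path: SSH to the bastion, restricted to known networks.
resource "aws_security_group" "bastion" {
  name_prefix = "bastion-"
  vpc_id      = var.vpc_id

  ingress {
    description = "SSH from approved networks only"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = var.allowed_ssh_cidrs
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Keyless path: Session Manager access through the shared instance role.
resource "aws_iam_role_policy_attachment" "ssm_core" {
  role       = var.instance_role_name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
```

Leaving `allowed_ssh_cidrs` without a usable default forces whoever applies the stack to make a deliberate choice about who can reach port 22.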

Picking the cheaper failure mode: single NAT Gateway

NAT Gateways are billed per-hour and per-GB processed, and they're one of the easiest places for a demo environment's AWS bill to quietly balloon. The VPC module supports both single_nat_gateway = true (one NAT Gateway, shared by both AZs' private subnets) and false (one NAT Gateway per AZ, fully redundant). The dev environment defaults to true.

That default is an explicit cost/availability trade-off, not an oversight: if the AZ hosting the single NAT Gateway has an outage, outbound internet access from the other AZ's private subnets breaks too, even though those subnets' EC2 instances are otherwise healthy. For a portfolio project that's torn down between demos, that's an acceptable risk for roughly half the hourly NAT Gateway charge (the per-GB processing charge is the same either way). For an actual production workload, flipping the flag to false is a one-line terraform.tfvars change, because the module was written to support both from day one rather than hardcoding the cheap option.
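One common way to implement such a toggle is to derive the NAT Gateway count from the flag. The sketch below assumes one public subnet and one private route table per AZ, passed in the same order; those input names are hypothetical, and only `single_nat_gateway` comes from the text.

```hcl
# Sketch only: input names other than single_nat_gateway are hypothetical.
variable "single_nat_gateway" {
  type    = bool
  default = true
}

variable "public_subnet_ids" {
  type = list(string) # one per AZ
}

variable "private_route_table_ids" {
  type = list(string) # one per AZ, same order as public_subnet_ids
}

locals {
  nat_count = var.single_nat_gateway ? 1 : length(var.public_subnet_ids)
}

resource "aws_eip" "nat" {
  count  = local.nat_count
  domain = "vpc" # AWS provider v5+; older providers use `vpc = true`
}

resource "aws_nat_gateway" "this" {
  count         = local.nat_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = var.public_subnet_ids[count.index]
}

# With a single NAT Gateway every private route table points at index 0;
# otherwise each AZ routes through the gateway in its own AZ.
resource "aws_route" "private_default" {
  count                  = length(var.private_route_table_ids)
  route_table_id         = var.private_route_table_ids[count.index]
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this[var.single_nat_gateway ? 0 : count.index].id
}
```

Switching environments is then a matter of setting `single_nat_gateway = false` in that environment's terraform.tfvars.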

What happens when an instance boots

The launch templates for both ASGs run a user-data script that does the same rough sequence of things, and getting this script right was where I spent most of my actual debugging time on this project:

1. Install Docker and the AWS CLI v2.
2. (Backend only) Pull database credentials from Secrets Manager, with retry logic. The very first time an ASG instance boots, the RDS endpoint and the DNS records the backend depends on might genuinely not be resolvable yet, and a script that fails fast on a transient DNS hiccup will throw the instance into a CrashLoopBackOff-style loop of ASG churn.
3. (Frontend only) Poll the internal ALB's hostname and port with nc -z in a retry loop before starting the frontend container, so the frontend doesn't come up, fail its first few requests to a backend that isn't ready yet, and confuse anyone watching the ALB's health checks.
4. Pull the application's Docker image and run it with docker run --restart unless-stopped.
5. Install and configure the CloudWatch Agent to ship the user-data log and basic CPU/memory metrics.
6. Drop a cron entry that independently re-checks the container's health every 5 minutes and restarts Docker/the container if it's unhealthy. This is a cheap, ASG-independent self-healing layer on top of the ALB's own health checks.

The retry loops in steps 2 and 3 are the unglamorous but important part. The first version of this script didn't have them, and the very first terraform apply after a from-scratch deploy failed about a third of the time, simply because RDS or the internal ALB's DNS hadn't fully propagated by the time the EC2 instances finished booting — a classic race condition in any "spin up dependent infrastructure simultaneously" deployment. Adding bounded retry loops (with logging at every attempt, so you can actually see what happened in CloudWatch Logs afterward) turned that into a non-issue.
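A bounded retry loop of the kind described for the frontend can be embedded in the launch template's user data. The following is a sketch, not the repo's script: the attempt count, sleep interval, port, instance type, and all variable names are assumptions. Inside the heredoc, Terraform interpolates `${...}`, while plain `$var` and `$(...)` pass through to bash untouched.

```hcl
# Sketch only: retry limits, port, and names are assumptions.
variable "internal_alb_dns_name" {
  type = string
}

variable "ami_id" {
  type = string
}

locals {
  frontend_user_data = <<-EOT
    #!/bin/bash
    host="${var.internal_alb_dns_name}"
    port=8080
    ready=0

    # Bounded retry: up to 30 attempts, 10 seconds apart, logging each one.
    for attempt in $(seq 1 30); do
      if nc -z -w 5 "$host" "$port"; then
        echo "attempt $attempt: $host:$port is reachable"
        ready=1
        break
      fi
      echo "attempt $attempt: $host:$port not reachable yet, retrying in 10s"
      sleep 10
    done

    if [ "$ready" -ne 1 ]; then
      echo "backend never became reachable, giving up" >&2
      exit 1
    fi

    # ...start the frontend container here...
  EOT
}

resource "aws_launch_template" "frontend" {
  name_prefix   = "frontend-"
  image_id      = var.ami_id
  instance_type = "t3.micro"
  user_data     = base64encode(local.frontend_user_data)
}
```

The explicit failure branch matters as much as the loop: without it, a script that exhausts its retries would carry on and start a container against a backend that never came up.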

Observability: metrics, logs, and three layers of health checking

The Go backend exposes Prometheus-format metrics at /metrics — request counters labeled by path, and dedicated counters for goal-add and goal-remove operations — using the official prometheus/client_golang library. That's not wired up to a Prometheus server in this repo (there's no managed Prometheus or Grafana here yet), but the endpoint exists and is ready to be scraped, which matters more than it sounds: instrumenting an application for metrics is a decision you make in the application's code, and it's far easier to do it from the start than to retrofit it later.

Health checking happens at three independent layers, deliberately overlapping rather than relying on a single mechanism: the ALB target group's own health check (GET /health, every 30 seconds, 2 successes to mark healthy / 3 failures to mark unhealthy); a cron-based self-check on each instance every 5 minutes that restarts the container if it's failing locally; and CloudWatch Alarms watching aggregate CPU utilization and unhealthy target counts, wired to an alarm_actions list that's empty by default but ready to point at an SNS topic.
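The first and third layers map onto Terraform fairly directly. In this sketch the health check path, interval, and thresholds follow the values above, while the resource names, alarm period, evaluation settings, and input variables are assumptions.

```hcl
# Sketch only: names and alarm timing are assumptions.
variable "vpc_id" {
  type = string
}

variable "alb_arn_suffix" {
  type = string # e.g. the arn_suffix attribute of the ALB in front of this group
}

variable "alarm_actions" {
  type    = list(string)
  default = [] # empty by default; add an SNS topic ARN to get notified
}

resource "aws_lb_target_group" "backend" {
  name_prefix = "be-"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id

  health_check {
    path                = "/health"
    interval            = 30
    healthy_threshold   = 2
    unhealthy_threshold = 3
    matcher             = "200"
  }
}

resource "aws_cloudwatch_metric_alarm" "unhealthy_targets" {
  alarm_name          = "backend-unhealthy-targets"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "UnHealthyHostCount"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 2
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  alarm_actions       = var.alarm_actions

  dimensions = {
    TargetGroup  = aws_lb_target_group.backend.arn_suffix
    LoadBalancer = var.alb_arn_suffix
  }
}
```

With a 30-second interval and an unhealthy threshold of 3, a failing target needs three consecutive failed checks, so on the order of 90 seconds, before the ALB stops routing to it. That gap is one reason the slower local cron check and the aggregate alarm are useful as separate layers.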

What I'd build next

A few things didn't make it into v1, on purpose — I'd rather ship something complete at a smaller scope than something half-finished at a larger one. In rough priority order:

A CI/CD pipeline is the most obvious gap. Right now, deploying a new image means running build_and_push.sh and then manually triggering (or waiting for) an ASG instance refresh. A GitHub Actions workflow that runs terraform plan on every pull request, builds and pushes images on merge, and triggers a rolling instance refresh would turn this from "infrastructure I deploy by hand" into "infrastructure that deploys itself," which is really the whole point of the discipline.

Moving from Docker Hub to Amazon ECR removes both the anonymous-pull rate limiting that public Docker Hub images are subject to and the need to pass Docker Hub credentials into instance user-data at all — ECR authentication can ride entirely on the existing IAM instance profile.
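A possible starting point for that migration, with a hypothetical repository name and role variable: create the repository and attach the AWS-managed read-only ECR policy to the existing instance role.

```hcl
# Sketch only: repository name and variable name are hypothetical.
variable "instance_role_name" {
  type = string
}

resource "aws_ecr_repository" "backend" {
  name = "example-terratier-backend"
}

# Lets instances authenticate to ECR and pull images using their instance profile.
resource "aws_iam_role_policy_attachment" "ecr_pull" {
  role       = var.instance_role_name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

output "backend_repository_url" {
  value = aws_ecr_repository.backend.repository_url
}
```

The managed policy grants read access to every repository in the account; a tighter version would use a custom policy scoped to the specific repository ARNs.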

And finally, remote state. The S3 backend block is already scaffolded and commented out in provider.tf, because local state is fine for solo development but becomes a real liability the moment more than one person — or one CI pipeline plus one person — needs to run terraform apply against the same environment.
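For reference, an S3 backend block typically looks something like the following. The bucket, key, region, and lock table names are placeholders, not the values scaffolded in the repo.

```hcl
# Sketch only: all values are placeholders.
terraform {
  backend "s3" {
    bucket         = "example-terratier-tfstate"
    key            = "dev/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "example-terraform-locks" # state locking; newer Terraform releases also offer S3-native locking
  }
}
```

The bucket and lock table have to exist before `terraform init` can use them, so they are usually created once in a separate bootstrap configuration.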

Closing thoughts

None of the individual pieces here are exotic — VPCs, ALBs, ASGs, RDS, Secrets Manager, and IAM are about as standard an AWS toolkit as exists. What I think is actually worth showing in an interview isn't any single resource block; it's the reasoning behind where the boundaries are drawn — which tier can talk to which, where a secret lives versus where it's read, what happens in the 90 seconds between "instance is running" and "instance is actually ready to serve traffic" — and being able to articulate the trade-off in each decision rather than just the decision itself.

The full code is on GitHub at https://github.com/vatul16/terratier, along with a deeper architectural breakdown in ARCHITECTURE.md and auto-generated input/output documentation for every Terraform module. I'm currently looking for Cloud/DevOps Engineer roles — feel free to reach out on LinkedIn if you'd like to talk through any part of this in more depth.
