augusthottie
I Built a 3-Tier AWS Architecture with Terraform Modules, ECS Fargate, RDS, and ElastiCache

My last project was a CI/CD pipeline with blue/green deployments. It taught me CodeDeploy, CodePipeline, and a lot about IAM. But it ran on EC2 instances in a default VPC, no custom networking, no containers, no database tier.

This time I wanted to build what companies actually run in production: a 3-tier architecture with proper network isolation, serverless containers, a managed database, and an in-memory cache. All codified in Terraform modules.

What I Built

A Node.js API running on ECS Fargate that talks to PostgreSQL (RDS) and Redis (ElastiCache), deployed inside a custom VPC with public and private subnets:

Internet → ALB (public subnets)
              ↓ :3000
         ECS Fargate (private subnets)
         Bun + Express API
              ↓ :5432          ↓ :6379
         RDS PostgreSQL    ElastiCache Redis
         (private subnets) (private subnets)

The ALB is the only thing exposed to the internet. ECS, RDS, and Redis all sit in private subnets with no public IP addresses. Each tier's security group only allows traffic from the tier above it. The entire infrastructure is defined in 6 Terraform modules — 37 resources created with one command.

Why This Architecture Matters

If you're interviewing for DevOps or cloud engineering roles, "I deployed an app to EC2" doesn't differentiate you. Interviewers want to know:

  • Can you design a VPC from scratch with proper subnet segmentation?
  • Do you understand why databases belong in private subnets?
  • Can you explain the difference between an Internet Gateway and a NAT Gateway?
  • Have you actually worked with ECS Fargate, not just read about it?

This project answers all of those with working code.

The Network Layer

This was the foundation everything else depended on. I created a VPC with 10.0.0.0/16 split across two availability zones:

| Subnet | CIDR | Tier | Internet Access |
|---|---|---|---|
| Public 1a | 10.0.0.0/20 | ALB, NAT Gateway | Direct via IGW |
| Public 1b | 10.0.16.0/20 | ALB (multi-AZ) | Direct via IGW |
| Private 1a | 10.0.32.0/20 | ECS, RDS | Outbound only via NAT |
| Private 1b | 10.0.48.0/20 | ECS, ElastiCache | Outbound only via NAT |

*(Diagram: VPC traffic flow)*

The key design decision: everything except the ALB goes in private subnets. The ECS tasks need outbound internet access (to pull images from ECR), so they route through a NAT Gateway in the public subnet. But nothing on the internet can reach them directly.

Each Terraform module is self-contained. The VPC module outputs subnet IDs and the VPC ID. Other modules consume those outputs without knowing anything about how the network is built.
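A minimal sketch of what that looks like inside the VPC module. Resource names here are illustrative, not the actual module internals:

```hcl
# NAT Gateway lives in a public subnet; private subnets route through it
resource "aws_nat_gateway" "this" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.this.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.this.id # outbound only, no inbound path
  }
}

# Outputs consumed by the other modules
output "vpc_id" { value = aws_vpc.this.id }

output "private_subnet_ids" { value = aws_subnet.private[*].id }
```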

Security Group Boundaries

This is the part that makes this a real 3-tier architecture, not just "three things in the same VPC." Each tier has its own security group, and the rules enforce strict boundaries:

| Security Group | Port | Allows Inbound From |
|---|---|---|
| alb-sg | TCP 80 | 0.0.0.0/0 (the internet) |
| ecs-sg | TCP 3000 | alb-sg only |
| rds-sg | TCP 5432 | ecs-sg only |
| redis-sg | TCP 6379 | ecs-sg only |

No security group references a CIDR block except the ALB. Everything else references another security group. This means if an ECS task gets compromised, it can only reach the database and cache, not the internet, not other subnets, not other services.
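In Terraform, "references another security group" means using source_security_group_id instead of cidr_blocks. A sketch of the RDS rule, with illustrative resource names:

```hcl
# rds-sg: PostgreSQL reachable only from the ECS tier's security group
resource "aws_security_group_rule" "rds_from_ecs" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds.id
  source_security_group_id = aws_security_group.ecs.id # SG reference, not a CIDR
}
```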

This is how production environments are designed, and explaining it in an interview immediately signals you understand network security beyond "I opened port 22."

ECS Fargate (Containers Without Servers)

I used Fargate instead of EC2 for the compute layer. No instances to patch, no AMIs to maintain, no Auto Scaling Groups to configure. You define a task (CPU, memory, container image, environment variables) and Fargate runs it.

The task definition connects the app to both RDS and Redis through environment variables:

DB_HOST     → RDS endpoint (injected by Terraform)
DB_PASSWORD → Secrets Manager ARN (resolved at task launch by ECS)
REDIS_HOST  → ElastiCache endpoint (injected by Terraform)

The database password never touches Terraform state as plaintext and never appears in environment variable logs. ECS resolves it from Secrets Manager at runtime using the task execution role's IAM permissions.
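Roughly how that looks in the task definition. This is a sketch: the variable names mirror the module inputs shown later, but the container name is assumed. The split matters — entries under environment are visible in the console and logs, entries under secrets are resolved by ECS at launch:

```hcl
# Plain values go in "environment"; the password goes in "secrets",
# so only the Secrets Manager ARN ever appears in Terraform state.
container_definitions = jsonencode([{
  name  = "app"
  image = var.container_image
  environment = [
    { name = "DB_HOST", value = var.db_host },
    { name = "REDIS_HOST", value = var.redis_host },
  ]
  secrets = [
    { name = "DB_PASSWORD", valueFrom = var.db_secret_arn },
  ]
}])
```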

One thing I enabled that's worth mentioning: the deployment circuit breaker with rollback. If a new task definition fails to start (bad image, crash loop, health check failure), ECS automatically stops the deployment and rolls back to the last working version. Same concept as the CodeDeploy auto-rollback from my first project, but built into ECS.
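In Terraform this is a small block on the ECS service resource (a sketch, with the service name assumed):

```hcl
resource "aws_ecs_service" "app" {
  # ... cluster, task definition, networking ...

  deployment_circuit_breaker {
    enable   = true
    rollback = true # roll back to the last working task definition on failure
  }
}
```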

The Application (Proving the Architecture Works)

I built a fresh Express API specifically designed to demonstrate all three tiers working together. The key endpoint is GET /items:

First request: queries PostgreSQL, caches the result in Redis for 30 seconds, returns "source": "database".

Second request (within 30s): returns the cached data from Redis, "source": "cache", with 1ms latency.

Any write operation (POST, PUT, DELETE) invalidates the Redis cache so the next read gets fresh data from PostgreSQL. This is a standard cache-aside pattern used in production systems.

The /health endpoint checks both database and cache connectivity. If either is down, it returns a 503, which the ALB detects and stops routing traffic to that task.

{
  "status": "healthy",
  "services": {
    "database": { "connected": true, "time": "2026-03-09T14:50:37.136Z" },
    "cache": { "connected": true, "latency": 1 }
  }
}

Terraform Modules (Reusable Infrastructure)

Instead of one giant Terraform file, I split everything into 6 modules:

modules/
├── vpc/             # Network foundation
├── security-groups/ # Tier boundaries
├── alb/             # Load balancing
├── ecs/             # Container orchestration
├── rds/             # Database
└── elasticache/     # Caching

Each module has its own variables.tf, main.tf, and outputs.tf. The root main.tf wires them together:

module "ecs" {
  source             = "./modules/ecs"
  private_subnet_ids = module.vpc.private_subnet_ids
  security_group_id  = module.security_groups.ecs_sg_id
  target_group_arn   = module.alb.target_group_arn
  db_host            = module.rds.endpoint
  redis_host         = module.elasticache.endpoint
  db_secret_arn      = module.rds.secret_arn
  container_image    = "${aws_ecr_repository.app.repository_url}:latest"
}

The advantage of modules: you can reuse the VPC module for a completely different project, or create dev/staging/prod environments by calling the same modules with different variables.

The Problems I Hit

exec format error

I built the Docker image on my Mac (Apple Silicon = ARM) and pushed it to ECR. Fargate runs x86_64. The container started and immediately crashed with exec format error, no other context.

The fix: docker build --platform linux/amd64. Always specify the platform when building for Fargate.

no pg_hba.conf entry

RDS PostgreSQL requires SSL by default. My app was connecting without it. The error message is a PostgreSQL internals reference that doesn't mention SSL at all.

The fix: add ssl: { rejectUnauthorized: false } to the connection pool config. That disables certificate verification, which is acceptable for a demo; in production you'd verify against the RDS CA bundle instead.

CannotPullContainerError

I deployed the ECS service before pushing the Docker image to ECR. Fargate couldn't find the image, retried 7 times, and tripped the circuit breaker. After pushing the correct image, new deployments still failed because the breaker was already tripped.

The fix: aws ecs update-service --force-new-deployment resets the circuit breaker and triggers a fresh deployment.

Target type: ip vs instance

Fargate requires target_type = "ip" on the ALB target group. EC2-based services use "instance". Using the wrong one causes silent registration failures where ECS reports the task as running but the ALB never sees it.
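A sketch of a target group that works for Fargate, with the port and health check path taken from earlier sections (resource and variable names are illustrative):

```hcl
resource "aws_lb_target_group" "app" {
  port        = 3000
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # required for Fargate (awsvpc mode); EC2 services use "instance"

  health_check {
    path    = "/health" # the endpoint that checks DB and cache connectivity
    matcher = "200"
  }
}
```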

Cost Breakdown

For anyone worried about the AWS bill:

| Resource | Monthly Cost |
|---|---|
| NAT Gateway | ~$32 |
| ALB | ~$16 |
| ElastiCache | ~$12 |
| ECS Fargate | ~$9 |
| RDS db.t3.micro | Free tier |
| ECR + Secrets Manager | Minimal |
| **Total** | **~$70/month** |

The NAT Gateway is the biggest surprise: it costs more than the ALB. In production you'd need it, but for learning, running terraform destroy when you're not working saves real money.

What I'd Do Differently

Add HTTPS from the start. ACM + Route53 would make this production-ready. HTTP-only is fine for a demo but wouldn't pass a security review.

Use Terraform workspaces for multi-environment. Right now it's a single environment. The module structure supports dev/staging/prod, just pass different variables. That's the next iteration.

Auto Scaling for ECS. One task is fine for a demo, but production needs scaling policies based on CPU and request count.

CI/CD integration. This project deploys manually with docker push and ecs update-service. Connecting it to CodePipeline (from Project 1) would complete the picture.

What This Proves on a Resume

This project covers territory that most junior/mid-level candidates don't demonstrate:

  • Custom VPC design: proper public/private subnet segmentation
  • ECS Fargate: serverless containers, not just EC2
  • Multi-tier security: security groups referencing other security groups, not CIDRs
  • Managed data services: RDS + ElastiCache with proper secret handling
  • Terraform modules: reusable, composable infrastructure, not flat files
  • Real debugging: ARM vs x86, SSL requirements, circuit breakers, NAT Gateway necessity

If an interviewer asks "tell me about a complex AWS architecture you've built," this project gives you 20 minutes of material.

Links


Building my DevOps portfolio ahead of the AWS DevOps Professional certification. Connect with me on LinkedIn; I'd love to hear what you're building.
