Adil Khan

How I Built a Production-Grade Multi-Tier Application on AWS ECS Fargate (A Complete Case Study)

I recently completed a full end-to-end deployment of a multi-tier application on AWS ECS Fargate. What began as a simple “let’s deploy a React app and a Node.js API” turned into a complete production-style cloud architecture that tested everything I’ve learned about DevOps, AWS networking, and container orchestration.

This article is a complete technical breakdown of the project: how the architecture works, the services involved, what went wrong, what I fixed, and how the final system now behaves like a real microservices deployment running inside a production VPC.

I’m sharing this as a learning milestone and a reference for others trying to move from theory to real-world cloud builds.

Project architecture diagram

Project Summary

The system is a simple two-service architecture:

• React frontend served by Nginx
• Node.js backend API (/api/message)
• Public ALB for frontend
• Internal ALB for backend
• Two ECS Fargate services with separate task definitions
• ECR repositories for image storage
• 4-subnet VPC (2 public, 2 private)
• SG-to-SG communication for isolation
• CloudWatch logging for both tasks

The result: a fully private backend and a publicly accessible frontend communicating securely inside the VPC.

High-Level Architecture

The architecture below mirrors what many real production microservices deployments use:

• Public ALB → Receives internet traffic
• Frontend Fargate tasks in public subnets → Serve the UI
• Internal ALB → Receives API calls only from frontend
• Backend Fargate tasks in private subnets → Serve API
• SG chaining → Only frontend → backend allowed
• ECR → Stores container images
• IAM Execution Role → Grants ECS permission to pull images
• CloudWatch Logs → Task logs + debugging
• VPC Endpoints (optional) → To avoid NAT costs

The backend has zero exposure to the public internet. All calls go through the internal ALB.

Network Design

VPC
10.0.0.0/16

Subnets
• Public Subnets (2) → ALB + frontend tasks
• Private Subnets (2) → Backend tasks

Route Tables
• Public subnets → Internet Gateway
• Private subnets → NAT Gateway / VPC Endpoints

Security Groups
• Public ALB SG: Allows HTTP from anywhere
• Frontend SG: Allows ALB → port 80
• Internal ALB SG: Allows frontend SG → port 80
• Backend SG: Allows internal ALB → port 5001

Traffic path:
Internet → Public ALB → Frontend → Internal ALB → Backend
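
As a rough sketch of the SG-to-SG chaining with the AWS CLI (all security group IDs below are placeholders):

```bash
# Placeholder security group IDs; substitute your own.
# Backend SG: only the internal ALB SG may reach port 5001
aws ec2 authorize-security-group-ingress \
  --group-id sg-0backend00000000000 \
  --protocol tcp --port 5001 \
  --source-group sg-0internalalb0000000

# Internal ALB SG: only the frontend SG may reach port 80
aws ec2 authorize-security-group-ingress \
  --group-id sg-0internalalb0000000 \
  --protocol tcp --port 80 \
  --source-group sg-0frontend0000000000
```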

Containers & Dockerfiles

Frontend (Nginx multi-stage build)
• Build React app
• Serve via Nginx
• Expose port 80

Backend (Node.js)
• Express server
• /api/message endpoint
• Expose port 5001
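
The backend is essentially a minimal Express app, roughly like this (a sketch; the file name and response payload in the repo may differ):

```js
// server.js (illustrative sketch; the actual payload may differ)
const express = require('express');
const app = express();

// The only endpoint the internal ALB health-checks and the frontend calls
app.get('/api/message', (req, res) => {
  res.json({ message: 'Hello from the backend!' });
});

app.listen(5001, () => console.log('Backend listening on port 5001'));
```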

Both built locally → pushed to ECR.
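
The two Dockerfiles look roughly along these lines (a sketch; Node 18 and the CRA build output path are assumptions):

```dockerfile
# Frontend: multi-stage build, served by Nginx on port 80
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM nginx:alpine
# CRA outputs to build/; adjust if your tooling outputs dist/
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
```

```dockerfile
# Backend: Node.js API on port 5001
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 5001
CMD ["node", "server.js"]
```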

ECR + IAM Setup

Two repositories: frontend, backend.

The ECS task execution role had permissions for:
• ecr:GetAuthorizationToken
• ecr:BatchGetImage
• ecr:BatchCheckLayerAvailability
• logs:CreateLogStream
• logs:PutLogEvents

VPC Endpoints were added for:
• ECR API
• ECR DKR
• S3
• CloudWatch Logs

This fixed image pull timeouts in private subnets.
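
A sketch of that endpoint setup (region assumed to be us-east-1; all VPC, subnet, route table, and security group IDs are placeholders):

```bash
# Interface endpoints for ECR API, ECR DKR, and CloudWatch Logs (placeholder IDs)
for svc in ecr.api ecr.dkr logs; do
  aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.us-east-1.$svc \
    --subnet-ids subnet-private-a subnet-private-b \
    --security-group-ids sg-0endpoints000000000 \
    --private-dns-enabled
done

# Gateway endpoint for S3 (ECR image layers are pulled from S3)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0private0000000000
```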

ECS Design

Cluster
1 ECS cluster (Fargate only).

Task Definitions
One for frontend, one for backend.
Each specifies CPU/memory, port mappings, log configuration, awsvpc network mode, and IAM roles.
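
A minimal sketch of the backend task definition and how it gets registered (CPU/memory, account ID, region, and log group name are illustrative values):

```bash
# Register the backend task definition (all IDs/values below are placeholders)
cat > backend-taskdef.json <<'EOF'
{
  "family": "backend",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "backend",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/backend:latest",
      "portMappings": [{ "containerPort": 5001, "protocol": "tcp" }],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/backend",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
EOF
aws ecs register-task-definition --cli-input-json file://backend-taskdef.json
```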

Services
• frontend-service
• backend-service
Desired count: 2 each.

Rolling deployments were used for updates.
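
Creating the backend service looks roughly like this (cluster name, subnet/security group IDs, and the target group ARN are placeholders):

```bash
# assignPublicIp=DISABLED keeps backend tasks fully private
aws ecs create-service \
  --cluster app-cluster \
  --service-name backend-service \
  --task-definition backend:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-private-a,subnet-private-b],securityGroups=[sg-0backend00000000000],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/backend-tg/0123456789abcdef,containerName=backend,containerPort=5001"
```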

Load Balancing & Routing

Frontend
• Public ALB
• Listener: HTTP 80
• Target group: frontend-tg (port 80)
• Health check: /

Backend
• Internal ALB
• Listener: HTTP 80
• Target group: backend-tg (port 5001)
• Health check: /api/message
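
A sketch of the backend target group; for Fargate tasks in awsvpc mode the target type must be ip, and the health check points at the API path (VPC ID is a placeholder):

```bash
# Target type must be "ip" for Fargate (awsvpc) tasks
aws elbv2 create-target-group \
  --name backend-tg \
  --protocol HTTP \
  --port 5001 \
  --vpc-id vpc-0123456789abcdef0 \
  --target-type ip \
  --health-check-path /api/message
```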

Frontend → Backend
The frontend uses the internal ALB's DNS name for API calls rather than hardcoded task IPs.
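
Since the internal ALB is not reachable from a user's browser, one common way to wire this up (a sketch, not necessarily the exact config in the repo) is to have the frontend's Nginx proxy /api requests to it; the ALB DNS name below is a placeholder:

```nginx
# nginx.conf snippet inside the frontend container (sketch)
server {
    listen 80;

    # Serve the built React app
    location / {
        root /usr/share/nginx/html;
        try_files $uri /index.html;
    }

    # Forward API calls to the internal ALB (listener on port 80)
    location /api/ {
        proxy_pass http://internal-backend-alb-1234567890.us-east-1.elb.amazonaws.com;
        proxy_set_header Host $host;
    }
}
```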

Rolling Deployments

Flow for new image rollout:
1. Push new image
2. Create new task definition revision
3. ECS launches new tasks
4. ALB health checks them
5. Traffic shifts
6. Old tasks drain and stop

I tested multiple updates to see real ENI provisioning, ALB registration, logs, and draining behavior.
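
Each rollout boiled down to a short command sequence roughly like this (account ID, region, paths, and names are placeholders):

```bash
REGION=us-east-1
ACCOUNT=123456789012
REPO=$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/frontend

# 1. Build and push the new image (./frontend is a placeholder path)
aws ecr get-login-password --region $REGION | \
  docker login --username AWS --password-stdin $ACCOUNT.dkr.ecr.$REGION.amazonaws.com
docker build -t $REPO:v2 ./frontend
docker push $REPO:v2

# 2. Register a new task definition revision (see the earlier sketch), then
#    kick off the rolling deployment; ECS handles launch, health checks, and draining
aws ecs update-service \
  --cluster app-cluster \
  --service frontend-service \
  --task-definition frontend:2 \
  --force-new-deployment
```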

Key Metrics From the Deployment

• 0 public IPs on ECS tasks
• Backend remained fully private
• <15ms frontend → backend latency
• 3-minute build → push → deploy cycle
• Zero downtime rolling deployments
• 100% successful health checks
• Multiple revisions without breakage

Challenges & Fixes

1. Target group 404
Cause: health check pointed at the wrong path
Fix: point it at /api/message

2. ECR pull timeout
Cause: tasks in private subnets had no route to ECR
Fix: add VPC endpoints for ECR, S3, and CloudWatch Logs

3. Frontend couldn’t reach backend
Cause: hardcoded backend IP
Fix: use the internal ALB DNS name

4. Rolling update issues
Cause: invalid deployment settings
Fix: set minimumHealthyPercent and maximumPercent correctly (sketch below)
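
For issue 4, the fix was to set sane deployment settings, for example (cluster and service names are placeholders):

```bash
# Keep full capacity during rollout while allowing extra tasks to start
aws ecs update-service \
  --cluster app-cluster \
  --service backend-service \
  --deployment-configuration "minimumHealthyPercent=100,maximumPercent=200"
```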

What I Learned

• How Fargate attaches ENIs inside private subnets
• How ALB target groups determine readiness
• How internal ALBs handle microservice communication
• How IAM least privilege affects ECR/ECS
• How routing works in multi-tier VPCs
• How rolling deployments behave in real time
• How containers, networking, IAM, and load balancing combine to form real systems

Why This Project Mattered

This wasn’t just a deployment. It was a deep dive into how real cloud systems work — with failures, debugging, routing decisions, IAM restrictions, and architecture redesigns.

It brought together:
• VPC networking
• IAM
• ECS
• ECR
• Docker
• Load balancing
• Rolling deployments
• Private service-to-service communication
• Monitoring & logging

GitHub Repository
https://github.com/adil-khan-723/node-app-jenkins1.git

Conclusion

Anyone learning DevOps or AWS should attempt a project like this. It forces you to think like an engineer designing real systems, not just someone running commands. It also builds confidence that you can architect and debug production-style systems from scratch.

If you’re working on similar projects or want to discuss cloud architectures, feel free to reach out.
