End-to-End Application Deployment Using GitHub, Docker, Terraform, AWS, ECS, CI/CD, Monitoring, and Security
When you used GitHub Pages (github.io), you did not deploy containers, Kubernetes, or ECS.
The workflow was much simpler:
Student
|
V
GitHub Repository
|
V
GitHub Pages
|
V
Public Website
What happened:
- Student created website files
index.html
style.css
app.js
- Student pushed code to GitHub.
git add .
git commit -m "website"
git push
- GitHub Pages detected the files.
- GitHub Pages copied the static files to GitHub's web hosting infrastructure.
- GitHub served those files directly to visitors.
When someone visits:
https://username.github.io
GitHub simply returns:
index.html
style.css
app.js
to the browser.
Is Docker involved?
Possibly internally inside GitHub's infrastructure, but students do not use Docker.
Students never run:
docker build
docker run
Is Kubernetes involved?
Possibly internally inside GitHub's infrastructure, but students do not use Kubernetes.
Students never run:
kubectl apply
kubectl get pods
What is GitHub Pages good for?
Static websites:
HTML
CSS
JavaScript
Images
Examples:
- Portfolio
- School website
- Documentation
- Landing page
What GitHub Pages cannot do
It cannot run server-side code or services:
Node.js
Python
Java
.NET
Databases
APIs
For example:
Frontend
Backend API
PostgreSQL
cannot run on GitHub Pages.
If you want to learn real DevOps
The next step after GitHub Pages is:
GitHub
|
GitHub Actions/Jenkins
|
Docker Build
|
ECR
|
ECS Fargate
|
ALB
|
Route53
Then you will see:
- Where containers come from
- Why Docker is needed
- Why ECS/Kubernetes is needed
- How load balancers work
- How production deployments happen
The actual lab starts here:
DevOps Lab 1: Containerize Your GitHub Website with Docker
Goal
A few days ago, each of you created a website using Jules, merged the code into GitHub, and viewed it using GitHub Pages.
Today, we will take the same website source code and put it inside a Docker container.
This is the first step from a simple website to real production deployment.
Part 1: What We Are Building
Current workflow
Developer writes code
|
V
GitHub Repository
|
V
GitHub Pages
|
V
Website in browser
GitHub Pages is good for simple static websites.
But in real companies, applications are usually deployed like this:
Developer writes code
|
V
GitHub Repository
|
V
Docker Image
|
V
Container
|
V
Cloud Platform
|
V
Production Website
Today we are doing this part:
GitHub Repository
|
V
Docker Image
|
V
Docker Container
|
V
Website in browser
Part 2: Why DevOps Engineers Use Docker
A DevOps engineer does not only write or test code. A DevOps engineer helps move application code from a developer's laptop to production safely and repeatably.
Docker helps us package the application.
Without Docker, we may have this problem:
It works on my laptop, but it does not work on the server.
With Docker, we package:
Application code
Web server
Runtime environment
Configuration
into one Docker image.
Then the same image can run on:
Student laptop
EC2
ECS
Kubernetes
Production server
This is why enterprise companies use Docker.
Part 3: What Goes Inside Docker?
For this lab, most students have a simple static website.
Example repository:
my-website/
├── index.html
├── style.css
├── script.js
├── images/
└── README.md
Inside Docker, we need the files required to run the website:
index.html
style.css
script.js
images/
We do not need to copy unnecessary files like:
.git/
README.md
notes.txt
But for beginner practice, copying the full project is acceptable.
Part 4: Important Note About Jules Agent
Some students created the website with Jules.
Jules helped generate the code, but Jules does not need to go inside the Docker container.
Docker only needs the final website files.
If the repository has files like:
index.html
style.css
script.js
then we containerize those files.
If the repository has files like:
package.json
src/
app/
next.config.js
then it may be a React or Next.js application, and the Dockerfile will be different.
For today, we will start with the simple static website version.
Part 5: Prerequisites
Each student must have:
- Git installed
- Docker Desktop installed and running
- GitHub repository with their website
- VS Code installed
- Terminal access
Check Docker:
docker --version
Check Git:
git --version
Part 6: Clone Your GitHub Repository
Go to GitHub.
Open your website repository.
Click:
Code → HTTPS → Copy URL
Example:
https://github.com/username/my-website.git
Now open terminal.
Run:
cd Desktop
Clone your repository:
git clone https://github.com/username/my-website.git
Go inside the project folder:
cd my-website
Check files:
ls
You should see something like:
index.html
style.css
script.js
Part 7: Open Project in VS Code
Run:
code .
If code . does not work, open VS Code manually and open the project folder.
Part 8: Create Dockerfile
Inside the root of your project, create a new file named:
Dockerfile
Important:
The file name must be exactly:
Dockerfile
Not:
dockerfile
Dockerfile.txt
docker-file
Add this content:
FROM nginx:alpine
COPY . /usr/share/nginx/html
EXPOSE 80
Explanation:
FROM nginx:alpine
This means we are using Nginx as the web server.
Nginx will serve our website files to the browser.
COPY . /usr/share/nginx/html
This copies our website files from the current folder into the Nginx web folder inside the container.
EXPOSE 80
This documents that the container listens on port 80.
Port 80 is the default HTTP web port.
Docker’s COPY instruction is used to copy files into an image, and this is the key step that places the website source files into the container image.
Part 9: Create .dockerignore
Create another file:
.dockerignore
Add this:
.git
.github
README.md
node_modules
.DS_Store
.env
Dockerfile
.dockerignore
Why do we need .dockerignore?
It prevents unnecessary files from going into the Docker image.
In enterprise companies, this is important because Docker images should be:
Small
Clean
Secure
Fast to build
Easy to scan
Do not put secrets inside Docker images.
Never put these inside Docker:
AWS keys
Passwords
.env files
Private tokens
SSH keys
Part 10: Build Docker Image
In terminal, make sure you are inside your project folder.
Run:
pwd
Then build the Docker image:
docker build -t my-website:v1 .
Explanation:
docker build
Builds a Docker image.
-t my-website:v1
Gives the image a name and tag.
.
Means Docker should use the current folder as the build context.
Check image:
docker images
You should see:
my-website v1
Part 11: Run Docker Container
Run:
docker run -d -p 8080:80 --name my-website-container my-website:v1
Explanation:
-d
Run container in the background.
-p 8080:80
Maps your laptop port 8080 to container port 80.
--name my-website-container
Gives the container a readable name.
my-website:v1
The image we created.
Part 12: Open Website in Browser
Open:
http://localhost:8080
You should see your website.
This means:
Your GitHub source code
|
V
Docker image
|
V
Docker container
|
V
Browser
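The browser check can also be done from the terminal. This is a minimal sketch, assuming curl is installed. The function name wait_for_site and the default of 10 attempts are illustrative choices, not part of the lab:

```shell
# Hypothetical helper: poll a URL until it answers with a successful HTTP
# status, or give up after a number of attempts.
wait_for_site() {
  url="$1"
  attempts="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors (4xx/5xx) as failure, -s: quiet, -o: discard body
    if curl -fs -o /dev/null "$url"; then
      echo "OK: $url is responding"
      return 0
    fi
    # Pause between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "FAILED: $url did not respond after $attempts attempts" >&2
  return 1
}
```

Usage: wait_for_site http://localhost:8080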
Part 13: Check Running Containers
Run:
docker ps
You should see your running container.
Example:
CONTAINER ID IMAGE PORTS
abc123 my-website:v1 0.0.0.0:8080->80/tcp
Part 14: Stop Container
Run:
docker stop my-website-container
Check again:
docker ps
The container should not appear.
Part 15: Start Container Again
Run:
docker start my-website-container
Open again:
http://localhost:8080
Part 16: Remove Container
Stop it first:
docker stop my-website-container
Remove it:
docker rm my-website-container
Part 17: Remove Image
If you want to remove the image:
docker rmi my-website:v1
Only do this after you finish the lab.
Part 18: Common Errors and Fixes
Error 1: Docker is not running
Error:
Cannot connect to the Docker daemon
Fix:
Open Docker Desktop and wait until it says Docker is running.
Error 2: Port already in use
Error:
port is already allocated
Fix:
Use another port:
docker run -d -p 8081:80 --name my-website-container my-website:v1
Open:
http://localhost:8081
Error 3: Container name already exists
Error:
container name is already in use
Fix:
Remove old container:
docker rm my-website-container
If it is running:
docker stop my-website-container
docker rm my-website-container
Error 4: Website does not show correctly
Check if index.html is in the root folder.
Correct:
my-website/
├── index.html
├── style.css
└── script.js
Possible problem:
my-website/
└── website/
    ├── index.html
    ├── style.css
    └── script.js
If your files are inside a subfolder called website, change Dockerfile:
FROM nginx:alpine
COPY website/ /usr/share/nginx/html
EXPOSE 80
Part 19: Push Dockerfile to GitHub
Now save the Dockerfile and .dockerignore in your repository.
Run:
git status
You should see:
Dockerfile
.dockerignore
Add files:
git add Dockerfile .dockerignore
Commit:
git commit -m "Add Dockerfile for website containerization"
Push:
git push
Now your GitHub repository contains:
Website source code
Dockerfile
.dockerignore
This means another DevOps engineer can clone your repo and build the same image.
Part 20: Enterprise Explanation
In a real company, developers do not manually copy files to servers.
They follow a workflow:
Developer writes code
|
V
Pull Request
|
V
Code Review
|
V
Merge to main
|
V
CI/CD Pipeline
|
V
Docker Image Build
|
V
Image Scan
|
V
Push to Registry
|
V
Deploy to ECS or Kubernetes
|
V
Monitor Logs and Metrics
A DevOps engineer is responsible for making this process:
Automated
Repeatable
Secure
Observable
Reliable
Scalable
Part 21: What Students Must Pay Attention To
1. Repository structure
Make sure the application files are organized.
Bad:
my-website/
├── final-final-index.html
├── copy-style.css
└── old-script.js
Good:
my-website/
├── index.html
├── style.css
├── script.js
├── images/
├── Dockerfile
└── .dockerignore
2. Do not expose secrets
Never commit:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
OpenAI API key
Database password
Private key
Secrets must be stored in:
AWS Secrets Manager
GitHub Secrets
SSM Parameter Store
Kubernetes Secrets
3. Use tags properly
Bad:
docker build -t my-website .
Better:
docker build -t my-website:v1 .
Best in enterprise:
docker build -t my-website:git-commit-id .
Why?
Because in production, we need to know exactly which version is deployed.
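One way to get a commit-based tag is to ask Git for the short commit hash. This is a sketch under assumptions: the helper names below are made up for illustration, and the fallback tag "dev" is an arbitrary choice for folders that are not Git repositories:

```shell
# Hypothetical helpers for tagging an image with the current Git commit.

# Print the short commit hash, or "dev" when Git cannot resolve HEAD.
current_tag() {
  if tag="$(git rev-parse --short HEAD 2>/dev/null)"; then
    echo "$tag"
  else
    echo "dev"
  fi
}

# Join an image name and a tag into name:tag.
image_ref() {
  printf '%s:%s\n' "$1" "$2"
}

# Build the image in the current folder, tagged with the commit hash.
build_tagged_image() {
  ref="$(image_ref my-website "$(current_tag)")"
  docker build -t "$ref" . && echo "Built $ref"
}
```

Usage: run build_tagged_image inside the project folder. The resulting tag (for example my-website:abc1234) points back to one exact commit.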
4. Keep images small
Use:
FROM nginx:alpine
instead of:
FROM nginx:latest
Alpine-based images are usually smaller.
Smaller images are:
Faster to build
Faster to push
Faster to pull
Easier to scan
5. Test locally before pushing
Before sending the image to cloud, always run locally:
docker build -t my-website:v1 .
docker run -d -p 8080:80 my-website:v1
If it does not work locally, it will not magically work in AWS.
6. Understand the port
Inside container:
80
On laptop:
8080
This command:
docker run -p 8080:80 my-website:v1
means:
Laptop port 8080 → Container port 80
7. Understand image vs container
Docker image:
Template / package / blueprint
Docker container:
Running instance of the image
Example:
Image = class
Container = object
or:
Image = cake recipe
Container = actual cake
Part 22: Student Deliverables
Each student must submit:
- GitHub repository link
- Screenshot of Dockerfile
- Screenshot of successful docker build
- Screenshot of docker ps
- Screenshot of website running at:
http://localhost:8080
- Short explanation:
What is Docker?
What is Docker image?
What is Docker container?
Why do DevOps engineers use Docker?
Part 23: Final Architecture After This Lab
Student GitHub Repo
|
V
Dockerfile
|
V
Docker Image
|
V
Docker Container
|
V
Website running locally
This is the foundation for the next lab.
Part 24: Next Lab Preview
Next lab will be:
Docker Image
|
V
Amazon ECR
|
V
Amazon ECS Fargate
|
V
Application Load Balancer
|
V
Public Production Website
Amazon ECR is used to store Docker images in AWS, and before pushing an image, Docker must authenticate to the target ECR registry. AWS notes that ECR authentication tokens are temporary and valid for 12 hours.
In ECS, a task definition works like the blueprint for the application. It tells ECS which Docker image to use, how much CPU and memory to allocate, and how the container should run.
An Application Load Balancer can be used with ECS services to distribute traffic across running tasks, which is important in production when more than one container is running.
DevOps Lab 2: Deploy Dockerized Website to AWS ECS Fargate
Goal
In Lab 1, you built:
GitHub Repository
|
V
Docker Image
|
V
Docker Container
|
V
Website running on localhost
In this lab, we move the application to AWS.
At the end of this lab:
GitHub Repository
|
V
Docker Image
|
V
Amazon ECR
|
V
Amazon ECS Fargate
|
V
Application Load Balancer
|
V
Public Website
You will access your website using a public AWS URL.
Learning Objectives
Students will learn:
- What ECR is
- What ECS is
- What Fargate is
- What a Task Definition is
- What a Service is
- What a Load Balancer is
- How production applications are deployed
Enterprise Perspective
Many students think:
Docker = Production
Wrong.
Docker only creates the package.
Production requires:
Docker Image
+
Image Registry
+
Container Orchestration
+
Networking
+
Load Balancer
+
Monitoring
This is where ECS enters.
Step 1: Architecture Overview
Today we build:
Browser
|
V
Application Load Balancer
|
V
ECS Service
|
V
ECS Task
|
V
Docker Container
|
V
Website
Step 2: Create AWS ECR Repository
Search:
Elastic Container Registry
Open:
Amazon ECR
Click:
Create Repository
Repository name:
student-website
Visibility:
Private
Click:
Create Repository
Why ECR Exists
Without ECR:
Laptop
|
V
Container
AWS cannot access your laptop.
We need a central image registry.
Laptop
|
V
ECR
|
V
ECS
Think of ECR as GitHub for Docker images.
Step 3: Authenticate Docker to AWS
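Open Terminal on the laptop where you built the image. (In CloudShell you would first have to rebuild the image, because it exists only on your laptop.)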
Run:
aws configure
Enter:
Access Key
Secret Key
Region
Login:
aws ecr get-login-password \
--region us-east-1 \
| docker login \
--username AWS \
--password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
Expected:
Login Succeeded
Step 4: Tag Docker Image
Check image:
docker images
Example (the image you built in Lab 1):
my-website:v1
Tag image:
docker tag my-website:v1 \
ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/student-website:v1
Why Tag?
Locally:
my-website:v1
AWS requires:
ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/student-website:v1
Now Docker knows which registry and repository to push the image to.
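The ECR name always follows the same pattern, so it can be assembled from its parts. This is a small sketch: the function names are hypothetical, and 123456789012 in the usage line is only a sample account ID:

```shell
# Hypothetical helper: assemble the full ECR image reference.
# Arguments: AWS account ID, region, repository name, tag.
ecr_image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Tag a local image with its ECR name and push it.
# Assumes "docker login" to the registry has already succeeded (Step 3).
tag_and_push() {
  local_image="$1"
  remote_image="$2"
  docker tag "$local_image" "$remote_image" &&
    docker push "$remote_image"
}
```

Usage: tag_and_push my-website:v1 "$(ecr_image_uri 123456789012 us-east-1 student-website v1)"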
Step 5: Push Image to ECR
Run:
docker push \
ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/student-website:v1
Wait for upload.
Refresh ECR.
You should see:
student-website:v1
Step 6: Create ECS Cluster
Search:
Elastic Container Service
Open:
Amazon ECS
Click:
Create Cluster
Cluster name:
student-cluster
Infrastructure:
AWS Fargate
Click:
Create
Why ECS?
Imagine:
100 containers
Questions:
Which server runs them?
Which one failed?
How many copies?
Who restarts failed containers?
ECS manages all of this.
Step 7: Why Fargate?
Without Fargate:
You manage EC2 servers.
You must:
Patch Linux
Upgrade OS
Replace failed servers
Manage capacity
With Fargate:
AWS manages servers.
You only deploy containers.
Step 8: Create Task Definition
Open:
Task Definitions
Click:
Create New Task Definition
Launch Type:
Fargate
Task name:
student-task
CPU:
0.5 vCPU
Memory:
1 GB
Container Section
Container Name:
website
Image URI:
ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/student-website:v1
Container Port:
80
Click:
Create
Why Task Definitions Matter
Task Definition is:
Docker Image
CPU
Memory
Port
Environment Variables
combined together.
Think:
Task Definition = Blueprint
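The same blueprint can be written as a JSON file instead of console clicks. This is a sketch, not the exact file the console generates: the function name is hypothetical, and the execution role ARN is an assumption you must supply yourself (Fargate needs such a role to pull a private image from ECR). "512" CPU units equal 0.5 vCPU and "1024" is memory in MiB (1 GB):

```shell
# Print a Fargate task definition as JSON.
# Arguments: image URI, execution role ARN (both placeholders you supply).
render_task_definition() {
  image="$1"
  execution_role_arn="$2"
  cat <<EOF
{
  "family": "student-task",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "$execution_role_arn",
  "containerDefinitions": [
    {
      "name": "website",
      "image": "$image",
      "essential": true,
      "portMappings": [
        { "containerPort": 80, "protocol": "tcp" }
      ]
    }
  ]
}
EOF
}
```

Usage: save the output with render_task_definition IMAGE_URI ROLE_ARN > task-definition.json, then register it with aws ecs register-task-definition --cli-input-json file://task-definition.json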
Step 9: Create Service
Open cluster.
Click:
Create Service
Choose:
student-task
Launch Type:
Fargate
Service name:
student-service
Desired Tasks:
1
What Is a Service?
Task:
One running copy of a task definition (here, one container)
Service:
Keeps the desired number of tasks running
If task crashes:
ECS starts another one
Automatically.
Step 10: Networking
Choose:
Default VPC
Subnets:
Select all public subnets
Assign Public IP:
Enabled
Security Group
Create:
student-web-sg
Inbound:
HTTP 80
Anywhere
Why Security Groups Matter
Security Groups are AWS firewalls.
Bad:
All ports open
Good:
Only required ports
Enterprise rule:
Least Privilege
Step 11: Create Application Load Balancer
Name:
student-alb
Scheme:
Internet-facing
Listener:
HTTP 80
Target Type:
IP
Why ALB Exists
Without ALB:
User
|
Container
One container failure = outage.
With ALB:
User
|
ALB
|
Multiple Containers
Traffic distributes automatically.
Step 12: Connect Service to ALB
Target Group:
student-target-group
Container:
website
Container Port:
80
Create Service.
Wait:
2-5 minutes
Step 13: Verify Deployment
Open:
ECS Cluster
Check:
Tasks = Running
Status:
Healthy
Step 14: Test Website
Open:
Load Balancer DNS Name
Example:
http://student-alb-123456.us-east-1.elb.amazonaws.com
Website should load.
Congratulations.
Your website is now running in AWS.
What Just Happened?
You moved from:
Laptop
to:
AWS Production Environment
Architecture:
GitHub
|
Docker Build
|
ECR
|
ECS Task Definition
|
ECS Service
|
ALB
|
Users
DevOps Engineer Responsibilities
When deploying production applications:
Verify Image
Make sure:
Correct version
Correct tag
No vulnerabilities
Verify Networking
Check:
Security Groups
Subnets
Ports
Verify Health Checks
Make sure:
Container starts
Container stays healthy
Verify Logs
Check:
CloudWatch Logs
Verify Scaling
Can application survive:
10 users?
100 users?
1000 users?
Common Production Issues
Wrong Port
Container:
80
ALB:
8080
Result:
Application unreachable
Wrong Image
Task Definition:
v1
Expected:
v2
Result:
Old application deployed
Security Group Blocked
No inbound:
80
Result:
Website inaccessible
Health Check Failure
ALB health check:
/
Container returns:
500
Result:
Target unhealthy
Student Deliverables
Each student submits:
- ECR Repository screenshot
- Successful Docker Push screenshot
- ECS Cluster screenshot
- ECS Service screenshot
- Running Task screenshot
- ALB screenshot
- Website URL
- Screenshot of website running from ALB DNS
Final Architecture
Developer
|
GitHub
|
Docker Build
|
Amazon ECR
|
Amazon ECS Fargate
|
Application Load Balancer
|
Internet
|
Users
This is a deployment model widely used for containerized enterprise applications on AWS.
Lab 3 Preview
In Lab 3 we will automate everything.
Instead of manually:
Build
Push
Deploy
students will create:
GitHub Push
|
V
GitHub Actions
|
V
Docker Build
|
V
ECR Push
|
V
ECS Deployment
This is where you will begin learning real CI/CD, see how enterprise DevOps teams deploy applications automatically, and start feeling like DevOps engineers.
Lab 1: Website → Docker
Lab 2: Docker → ECS
Lab 3: GitHub Push → Automatic Deployment
The goal is:
Developer changes code
|
V
Git Push
|
V
GitHub Actions
|
V
Docker Build
|
V
ECR
|
V
ECS Update
|
V
Production Updated
Before Lab 3:
Student manually:
docker build
docker push
update ECS
After Lab 3:
git push
Everything else happens automatically
LAB 3
CI/CD Pipeline with GitHub Actions
Business Problem
Imagine 50 developers.
Every day:
Developer A pushes code
Developer B pushes code
Developer C pushes code
Developer D pushes code
Should DevOps manually run:
docker build
docker push
update ecs
50 times per day?
No.
This is why CI/CD exists.
What is CI?
Continuous Integration
Whenever code changes:
Compile
Test
Validate
automatically.
What is CD?
Continuous Delivery / Deployment
Whenever code passes tests:
Build
Deploy
automatically.
Final Architecture
Developer
|
V
GitHub
|
V
GitHub Actions
|
+----------------+
| Build Image |
| Security Scan |
| Push To ECR |
+----------------+
|
V
Amazon ECR
|
V
Amazon ECS
|
V
Application Load Balancer
|
V
Users
Step 1
Create IAM User For GitHub
Search:
IAM
Create user:
github-actions-user
Permissions:
AmazonEC2ContainerRegistryFullAccess
AmazonECS_FullAccess
For this bootcamp, these broad permissions are okay.
Later we will reduce permissions.
Why?
GitHub must authenticate to AWS.
GitHub needs:
Access Key
Secret Key
to:
Push images
Update ECS
Step 2
Create Access Keys
IAM User
Create:
Access Key
Secret Key
Save them.
Students will use them in GitHub Secrets.
Step 3
Configure GitHub Secrets
Open Repository.
Settings
Secrets and Variables
Actions
New Repository Secret
Create:
AWS_ACCESS_KEY_ID
Create:
AWS_SECRET_ACCESS_KEY
Create:
AWS_REGION
Example:
us-east-1
Create:
AWS_ACCOUNT_ID
Example:
123456789012
Why Secrets?
Never do this:
AWS_SECRET_ACCESS_KEY: abc123
inside source code.
If somebody gets access to the repository:
AWS account compromised
Secrets protect credentials.
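A quick check before committing can catch the most obvious mistake. This is a rough sketch that assumes GNU or BSD grep. It only looks for strings shaped like long-term AWS access key IDs ("AKIA" plus 16 uppercase letters or digits), so it will not find secret keys, passwords, or other tokens. The function name is hypothetical:

```shell
# Hypothetical pre-commit check: search a folder for strings that look like
# AWS access key IDs. The .git folder is skipped.
scan_for_aws_keys() {
  dir="${1:-.}"
  if grep -rEn --exclude-dir=.git 'AKIA[0-9A-Z]{16}' "$dir"; then
    echo "Possible AWS access key found. Do not commit." >&2
    return 1
  fi
  echo "No AWS access key IDs found."
}
```

Usage: run scan_for_aws_keys . in the repository root before git add.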
Step 4
Create GitHub Actions Folder
Inside repository:
.github/
└── workflows/
Create:
deploy.yml
Why?
GitHub automatically reads:
.github/workflows
and executes workflows.
Step 5
Create Workflow
deploy.yml
name: Deploy Website
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Configure AWS
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}
      - name: Login ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build Image
        run: |
          docker build -t website .
          docker tag website:latest \
            ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com/student-website:latest
      - name: Push Image
        run: |
          docker push \
            ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com/student-website:latest
What Does This Pipeline Do?
When student runs:
git push
GitHub:
Downloads code
Logs into AWS
Builds Docker image
Pushes image to ECR
automatically.
Step 6
Commit Workflow
git add .
git commit -m "Add CI/CD"
git push
Step 7
Watch Pipeline
GitHub
Actions
Students will see:
Workflow Running
Then:
Success
What Just Happened?
Nobody manually executed:
docker build
docker push
GitHub did it.
This is CI/CD.
Enterprise Discussion
Students should understand:
A DevOps engineer is NOT paid to click buttons.
A DevOps engineer is paid to automate.
Bad DevOps:
Build manually
Deploy manually
Good DevOps:
Push code
Everything automatic
Step 8
Deploy New Version
Modify:
<h1>Version 2</h1>
Commit:
git add .
git commit -m "version2"
git push
Pipeline runs.
New image created.
Enterprise Problem
ECR now contains:
latest
But ECS still runs:
old container
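Why?
ECS only starts new containers when told.
There is a second catch: the task definition from Lab 2 points at student-website:v1, but this pipeline pushes student-website:latest. Create a new task definition revision that uses the :latest image URI and update the service to it. Otherwise a restarted task pulls v1 again.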
Step 9
Force New Deployment
Pipeline:
      - name: Deploy ECS
        run: |
          aws ecs update-service \
            --cluster student-cluster \
            --service student-service \
            --force-new-deployment
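The update-service call returns as soon as ECS accepts the request, not when the new task is healthy. A possible extension, sketched here with a hypothetical function name, is to wait for the service to settle so the pipeline fails when the rollout does not complete:

```shell
# Hypothetical deploy helper: force a new deployment, then block until the
# service reports a steady state. Names default to the ones used in this lab.
deploy_and_wait() {
  cluster="${1:-student-cluster}"
  service="${2:-student-service}"
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --force-new-deployment >/dev/null || return 1
  # "services-stable" polls until the deployment settles; it exits non-zero
  # if that does not happen within the waiter's time limit.
  aws ecs wait services-stable \
    --cluster "$cluster" \
    --services "$service" || return 1
  echo "Deployment of $service is stable"
}
```

Usage: deploy_and_wait student-cluster student-service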
Now pipeline:
Build Image
Push Image
Restart ECS Service
What Happens?
ECS:
Old Task
↓
New Task
↓
Pull Latest Image
↓
Run New Version
Production Deployment Flow
Student changes:
index.html
Pushes:
git push
Automatically:
GitHub Actions
|
Build Docker
|
Push ECR
|
Restart ECS
|
Pull Latest Image
|
Deploy
No AWS Console.
No manual work.
What DevOps Engineers Monitor
Students must learn:
Pipeline Status
Success
Failed
Build Logs
Docker build errors
AWS Authentication
Credential failures
ECS Deployment
New task healthy?
Website
Application accessible?
Common Failures
Wrong Secret
Invalid AWS credentials
Pipeline fails.
Dockerfile Broken
docker build failed
Pipeline fails.
ECR Permission Missing
access denied
Pipeline fails.
ECS Service Name Wrong
service not found
Deployment fails.
Student Deliverables
Each student submits:
- GitHub Actions workflow file
- Successful workflow screenshot
- ECR image screenshot
- ECS deployment screenshot
- Production URL
- Screenshot showing Version 2 deployed automatically
End Result
Students now understand:
Developer
|
GitHub
|
GitHub Actions
|
Docker
|
ECR
|
ECS
|
Load Balancer
|
Users
This is the first complete production-grade CI/CD pipeline.
In Lab 4 we move to Terraform: students stop creating ECR, ECS, ALB, Security Groups, and networking manually and instead create the entire AWS infrastructure from code. That is usually the point where students start thinking like infrastructure engineers rather than application deployers.
Lab 4 is where students stop being "people who deploy applications" and start becoming Infrastructure Engineers / DevOps Engineers.
Up to now:
Lab 1
Website -> Docker
Lab 2
Docker -> ECS
Lab 3
GitHub -> CI/CD -> ECS
But there is a huge problem.
Imagine the company says:
We need 50 environments.
Dev
QA
UAT
Stage
Production
Will DevOps engineers manually click:
Create VPC
Create ALB
Create ECS
Create Security Groups
Create Target Groups
Create ECR
50 times?
No.
This is why Terraform exists.
LAB 4
Infrastructure as Code (Terraform)
Goal
Current situation:
Developer
|
GitHub
|
GitHub Actions
|
ECR
|
ECS
|
ALB
But everything was created manually.
Goal:
Terraform
|
+-- VPC
+-- Subnets
+-- Security Groups
+-- ALB
+-- ECS
+-- ECR
Everything created from code.
Enterprise Problem
Imagine:
Production breaks.
You must rebuild everything.
Without Terraform:
Nobody remembers:
- SG rules
- ECS settings
- ALB config
- Subnets
Disaster.
With Terraform:
terraform apply
Everything recreated.
Architecture
Students will build:
Terraform
|
V
VPC
|
Subnets
|
ALB
|
ECS
|
ECR
Step 1
Create Repository
Create new repo:
terraform-infrastructure
Structure:
terraform/
├── provider.tf
├── variables.tf
├── main.tf
├── outputs.tf
└── terraform.tfvars
Why Separate Repo?
Application repo:
Website Code
Infrastructure repo:
AWS Resources
Enterprise companies usually separate them.
Step 2
Install Terraform
Verify:
terraform version
Expected:
Terraform v1.x.x
Step 3
Create Provider
provider.tf
provider "aws" {
  region = "us-east-1"
}
Why Provider?
Terraform supports:
- AWS
- Azure
- GCP
- GitHub
- Kubernetes
Provider tells Terraform:
Talk to AWS
Step 4
Configure Credentials
Never hardcode:
access_key="xxxx"
secret_key="xxxx"
Use:
aws configure
Verify:
aws sts get-caller-identity
Step 5
Create ECR
main.tf
resource "aws_ecr_repository" "website" {
  name = "student-website"
}
Why?
Before:
Student manually created ECR.
Now:
Terraform creates ECR.
Step 6
Create VPC
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "student-vpc"
  }
}
Enterprise Explanation
Everything lives inside a VPC.
Think:
AWS Datacenter
|
V
VPC
|
V
Your Private Network
Step 7
Create Public Subnets
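resource "aws_subnet" "public1" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
}
resource "aws_subnet" "public2" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "us-east-1b"
}
# A subnet is only public when its route table sends internet traffic to an internet gateway.
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}
resource "aws_route" "internet" {
  route_table_id         = aws_vpc.main.main_route_table_id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.main.id
}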
Why Two Subnets?
Enterprise applications require:
High Availability
If one AZ dies:
Other AZ survives
Step 8
Create Security Group
resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = aws_vpc.main.id
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
Why Security Groups?
Security Groups:
AWS Firewall
Protect resources.
Step 9
Create Application Load Balancer
resource "aws_lb" "website" {
  name               = "student-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups = [
    aws_security_group.web.id
  ]
  subnets = [
    aws_subnet.public1.id,
    aws_subnet.public2.id
  ]
}
Why ALB?
ALB distributes traffic:
Users
|
ALB
|
Containers
Step 10
Create ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "student-cluster"
}
Why ECS Cluster?
Think:
ECS Cluster
as a parking lot.
Containers park inside.
Step 11
Terraform Init
Run:
terraform init
What Happens?
Terraform downloads:
AWS Provider
Plugins
Dependencies
Step 12
Terraform Validate
terraform validate
Expected:
Success! The configuration is valid.
Step 13
Terraform Plan
terraform plan
Example:
+ create VPC
+ create ALB
+ create ECS
+ create ECR
Why Plan?
Plan allows engineers to review:
What will change?
Before touching production.
Step 14
Terraform Apply
terraform apply
Type:
yes
Terraform creates:
VPC
Subnets
Security Group
ALB
ECS
ECR
Enterprise Rule
Never do:
terraform apply -auto-approve
against production.
Always review.
Step 15
Verify
AWS Console:
Check:
VPC
Subnets
Security Groups
ALB
ECS
ECR
Everything should exist.
Step 16
Destroy Environment
This is the coolest part.
Run:
terraform destroy
Confirm:
yes
Terraform removes:
ALB
ECS
ECR
Subnets
VPC
Why Destroy?
Cloud costs money.
DevOps engineers often create:
Temporary environments
for:
Developers
QA
Testing
Training
Destroying saves money.
Important DevOps Concepts
Desired State
Terraform says:
I want:
1 VPC
2 Subnets
1 ECS Cluster
1 ALB
Terraform compares:
Desired State
vs
Current State
and fixes differences.
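The comparison itself can be observed without changing anything. With the -detailed-exitcode flag, terraform plan exits with 0 when nothing needs to change, 1 on error, and 2 when changes are pending. The wrapper below is a sketch; the function name and messages are illustrative:

```shell
# Hypothetical drift check built on "terraform plan -detailed-exitcode":
#   exit 0 = no changes, exit 1 = error, exit 2 = changes are pending.
check_drift() {
  status=0
  terraform plan -detailed-exitcode -input=false >/dev/null 2>&1 || status=$?
  case "$status" in
    0) echo "in sync: current state matches desired state" ;;
    2) echo "changes pending: terraform apply would modify infrastructure" ;;
    *) echo "error: terraform plan failed" >&2; return 1 ;;
  esac
}
```

Usage: run check_drift inside the Terraform folder after terraform init.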
State File
Terraform creates:
terraform.tfstate
Think:
Database of infrastructure
Never delete it.
Enterprise companies store state in:
Amazon S3
and lock it using:
Amazon DynamoDB
What Students Must Understand
Without Terraform:
Click
Click
Click
Click
Click
No documentation.
No repeatability.
No automation.
With Terraform:
Code
Commit
Review
Apply
Infrastructure becomes:
Version Controlled
Auditable
Repeatable
Recoverable
Student Deliverables
Each student submits:
- GitHub Repository with Terraform code
- Screenshot of terraform init
- Screenshot of terraform plan
- Screenshot of terraform apply
- Screenshot showing:
- VPC
- Subnets
- Security Group
- ECS Cluster
- ALB
- ECR
- Screenshot of terraform destroy
Lab 5 Preview
Lab 5 is where everything becomes enterprise-grade.
Students will create:
Terraform
|
GitHub
|
Pull Request
|
GitHub Actions
|
Terraform Plan
|
Manager Approval
|
Terraform Apply
|
AWS Infrastructure
At that point they will have:
Infrastructure as Code
+
CI/CD
+
Containerization
+
Cloud Deployment
which is very close to how modern DevOps teams operate in production.
Lab 5 is where students stop being "people who know tools" and start understanding how enterprise DevOps teams actually work.
Up to now they learned:
Lab 1 Website -> Docker
Lab 2 Docker -> ECS
Lab 3 GitHub Actions -> CI/CD
Lab 4 Terraform -> Infrastructure as Code
But there is still a huge problem.
Everything is being done by one person.
Real companies do not work like that.
LAB 5
Enterprise DevOps Workflow
Goal
Students will simulate a real company.
Instead of:
Developer
|
Production
they will build:
Developer
|
Pull Request
|
Code Review
|
Terraform Plan
|
Approval
|
Terraform Apply
|
Production
Business Problem
Imagine:
A developer accidentally changes:
resource "aws_db_instance"
and deletes production database.
Who should stop this?
Terraform?
No.
GitHub?
No.
DevOps Process.
This is why enterprise companies use:
Pull Requests
Approvals
Code Reviews
Change Control
Architecture
Students will build:
Developer
|
Feature Branch
|
Pull Request
|
GitHub Actions
|
Terraform Plan
|
Approval
|
Merge
|
Terraform Apply
|
AWS
Learning Objectives
Students learn:
- Git workflow
- Pull Requests
- Branching strategy
- Terraform Plan
- Terraform Apply
- Approvals
- Production deployment
- Change management
Step 1
Create Branch
Never work directly on:
main
Create:
git checkout -b feature/new-alb
Verify:
git branch
Output:
* feature/new-alb
  main
Enterprise Rule
Bad:
Developer changes production directly
Good:
Developer
|
Feature Branch
Step 2
Modify Terraform
Example:
Add new security group.
Example:
resource "aws_security_group" "web" {
...
}
Commit:
git add .
git commit -m "Add web security group"
Push:
git push origin feature/new-alb
Step 3
Create Pull Request
GitHub:
Compare & Pull Request
Create PR.
Title:
Add Security Group For Web Layer
Why Pull Requests?
Pull Requests allow:
Review
Discussion
Approval
Audit Trail
Enterprise Example
Developer writes:
cidr_blocks = ["0.0.0.0/0"]
Reviewer asks:
Why open to the internet?
Bug found before production.
Step 4
Terraform Plan in GitHub Actions
Create:
.github/workflows/terraform-plan.yml
Example:
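name: Terraform Plan
on:
  pull_request:
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Terraform needs AWS credentials to plan. This assumes the secrets
      # from Lab 3 are also configured in this repository.
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan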
What Happens?
Developer opens PR.
Automatically:
Terraform Init
Terraform Validate
Terraform Plan
runs.
Why?
Before merge:
Everyone sees:
What will Terraform change?
Example Output
+ Create Security Group
+ Create ALB
~ Modify ECS Service
Management can review.
Step 5
Add Validation
Add:
- run: terraform fmt -check
- run: terraform validate
Why?
Checks:
Syntax
Formatting
Errors
before deployment.
Enterprise Rule
Bad:
Merge first
Fix later
Good:
Validate first
Merge later
Step 6
Branch Protection
GitHub
Settings
Branches
Add Rule
Protect:
main
Require:
Pull Request
Review
Successful Checks
Why?
Nobody can directly push:
git push origin main
Enterprise Example
Without protection:
Developer deletes VPC
Pushes code
Production outage
With protection:
Review required
Mistake caught.
Step 7
Reviewer Approval
Student A creates PR.
Student B reviews.
Checklist:
Terraform valid?
Naming correct?
Resources required?
Security issues?
Approve.
Why?
Four eyes principle.
At least two people see changes.
Step 8
Merge PR
After approval:
Merge Pull Request
Now code reaches:
main
Step 9
Production Pipeline
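Create:
.github/workflows/terraform-apply.yml
Trigger:
on:
  push:
    branches:
      - main
Pipeline:
terraform init
terraform validate
terraform apply -auto-approve
The -auto-approve flag is required because nobody can type "yes" inside a pipeline. The review already happened in the Pull Request.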
What Happens?
Developer merges.
Automatically:
Terraform Apply
runs.
Infrastructure updates.
Enterprise Workflow
Feature Branch
|
Pull Request
|
Terraform Plan
|
Approval
|
Merge
|
Terraform Apply
|
AWS Updated
Step 10
Remote State
Current:
terraform.tfstate
stored locally.
Dangerous.
Laptop lost:
State lost
Create S3 Backend
terraform {
  backend "s3" {
    bucket = "student-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}
Why?
State shared by team.
Step 11
State Locking
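Create a DynamoDB table with a string partition key named LockID.
Terraform (inside the backend "s3" block):
dynamodb_table = "terraform-locks"
Recent Terraform versions can also lock through the S3 bucket itself (use_lockfile = true) without DynamoDB.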
Why?
Prevent:
Engineer A
Engineer B
running:
terraform apply
at same time.
Enterprise Problem
Without locking:
Corrupted state
Broken infrastructure
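The idea behind locking can be shown with a toy in-memory model. This is only a sketch of the concept (first caller wins, second caller is refused), not how Terraform or DynamoDB is implemented. The class and method names are made up.

```javascript
// Toy model of a lock table: one lock per state file, first caller wins.
class LockTable {
  constructor() {
    this.locks = new Map(); // lockId -> owner
  }
  // Returns true if the lock was acquired, false if someone else holds it.
  acquire(lockId, owner) {
    if (this.locks.has(lockId)) return false;
    this.locks.set(lockId, owner);
    return true;
  }
  // Only the current owner may release the lock.
  release(lockId, owner) {
    if (this.locks.get(lockId) !== owner) return false;
    this.locks.delete(lockId);
    return true;
  }
}

const table = new LockTable();
const stateKey = "prod/terraform.tfstate";
console.log(table.acquire(stateKey, "engineer-a")); // true
console.log(table.acquire(stateKey, "engineer-b")); // false
table.release(stateKey, "engineer-a");
console.log(table.acquire(stateKey, "engineer-b")); // true
```

Engineer B is refused while Engineer A holds the lock, so only one apply can change the state at a time.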
Step 12
Environment Strategy
Current:
One Environment
Enterprise:
Dev
QA
Stage
Production
Example
terraform/dev
terraform/qa
terraform/prod
or
terraform workspace
Why?
Never test directly in production.
Step 13
Approval Gates
Production pipeline:
Terraform Plan
|
Manager Approval
|
Terraform Apply
GitHub Environments:
Production
Require:
Manual Approval
before apply.
Enterprise Example
Black Friday.
Someone changes:
Load Balancer
Management wants review.
Approval gate prevents accidents.
DevOps Engineer Responsibilities
Students must understand:
A DevOps engineer is not paid for:
Creating EC2
Creating ECS
Running Terraform
They are paid for:
Preventing outages
Building automation
Reducing risk
Creating repeatable deployments
What Students Must Pay Attention To
Security
Never:
0.0.0.0/0 everywhere
Cost
Always review:
terraform plan
before apply.
Naming
Use:
dev-web-alb
qa-web-alb
prod-web-alb
not:
alb1
alb2
alb3
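A naming rule is easier to follow when a helper enforces it. A minimal sketch, assuming an environment-tier-resource pattern and the environment list from Step 12. The function names are hypothetical.

```javascript
// Hypothetical naming helper following the env-tier-resource pattern above.
const ENVIRONMENTS = ["dev", "qa", "stage", "prod"];

function resourceName(env, tier, kind) {
  if (!ENVIRONMENTS.includes(env)) {
    throw new Error(`unknown environment: ${env}`);
  }
  return `${env}-${tier}-${kind}`;
}

// Checks that an existing name follows the same pattern.
function isValidName(name) {
  return /^(dev|qa|stage|prod)-[a-z0-9]+-[a-z0-9]+$/.test(name);
}

console.log(resourceName("prod", "web", "alb")); // prod-web-alb
console.log(isValidName("alb1")); // false
```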
State
Never manually edit:
terraform.tfstate
Student Deliverables
Each student submits:
Git
- Feature branch screenshot
- Pull Request screenshot
GitHub Actions
- Terraform Plan successful
Review
- PR approved by another student
Terraform
- Successful Apply
AWS
Screenshots showing:
VPC
Subnets
ALB
Security Groups
ECS
ECR
Architecture Diagram
Student must draw:
Developer
|
Feature Branch
|
Pull Request
|
GitHub Actions
|
Terraform Plan
|
Approval
|
Terraform Apply
|
AWS
Lab 6 Preview
Lab 6 is where the application becomes a true enterprise application.
Students will split their website into:
Frontend
Backend API
Database
and deploy:
React Frontend
|
NodeJS API
|
PostgreSQL RDS
using:
Terraform
+
GitHub Actions
+
ECS Fargate
+
ALB
+
Route53
This is usually the first lab where students see a complete production architecture similar to what many companies run today.
This is the lab where students finally understand why microservices exist.
Up to Lab 5 they deployed a simple website.
The problem is:
Everything is one application.
In enterprise companies, applications are usually separated.
For example:
Netflix
Frontend
Backend APIs
Authentication
Payments
Recommendations
Database
Monitoring
all separate services.
LAB 6
Transform Website into a Microservices Application
Business Problem
Current architecture:
Browser
|
Website
Problem:
If one piece fails:
Entire application fails
Problem:
One developer changes website.
Another changes API.
They interfere with each other.
Problem:
Cannot scale independently.
Goal
Convert student website into:
Browser
|
Frontend
|
Backend API
|
Database
What Students Will Build
Architecture:
Internet
|
Application Load Balancer
|
Frontend Container
|
Backend Container
|
RDS PostgreSQL
Real Enterprise Example
Think about Amazon.
Frontend:
amazon.com
Backend:
Products API
Orders API
Users API
Database:
PostgreSQL
Aurora
DynamoDB
Separate systems.
Step 1
Create New Repositories
Current:
student-website
Split into:
frontend
backend
terraform-infra
Why?
Different teams may own:
Frontend Team
Backend Team
Infrastructure Team
Step 2
Frontend Service
Students keep:
index.html
style.css
app.js
Convert into:
frontend/
|
├── index.html
├── style.css
├── app.js
├── Dockerfile
Container:
FROM nginx:alpine
COPY . /usr/share/nginx/html
Purpose
Frontend should only:
Display information
Collect user input
Call APIs
Step 3
Create Backend API
Folder:
backend/
|
├── app.js
├── package.json
├── Dockerfile
Simple API:
const express = require("express");
const app = express();
app.get("/students", (req, res) => {
  res.json([
    {
      name: "John"
    }
  ]);
});
app.listen(5000);
Why Backend?
Frontend should not contain business logic.
Bad:
HTML
|
Database
Good:
Frontend
|
Backend API
|
Database
Step 4
Backend Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 5000
CMD ["node","app.js"]
What Students Learn
One application now has:
Frontend Image
Backend Image
instead of:
One Giant Image
Step 5
Build Images
Frontend:
docker build -t frontend:v1 .
Backend:
docker build -t backend:v1 .
Verify:
docker images
Step 6
Run Locally
Backend:
docker run -d -p 5000:5000 backend:v1
Frontend:
docker run -d -p 8080:80 frontend:v1
Verify API
Open:
http://localhost:5000/students
Should return:
[
{
"name":"John"
}
]
Step 7
Frontend Calls API
JavaScript:
fetch("http://localhost:5000/students")
Now:
Frontend
|
Backend
communicate.
Enterprise Concept
This is called:
Service Communication
Every modern application does this.
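One detail matters when testing this locally: the page comes from http://localhost:8080 and the API from http://localhost:5000, so the browser treats the call as cross-origin and blocks the response unless the API sends an Access-Control-Allow-Origin header. A minimal sketch, written against Node's built-in http module rather than Express. The allowed-origin list and function names are assumptions.

```javascript
// Origins allowed to call the API from a browser (assumption for local testing).
const ALLOWED_ORIGINS = ["http://localhost:8080"];

// Returns the CORS headers to attach for a given request origin.
function corsHeaders(origin) {
  if (ALLOWED_ORIGINS.includes(origin)) {
    return { "Access-Control-Allow-Origin": origin, Vary: "Origin" };
  }
  return {}; // unknown origin: no CORS headers, so the browser blocks the read
}

// Request handler in the style of Node's built-in http module.
function handleStudents(req, res) {
  const headers = { "Content-Type": "application/json", ...corsHeaders(req.headers.origin) };
  res.writeHead(200, headers);
  res.end(JSON.stringify([{ name: "John" }]));
}

// To serve it: require("http").createServer(handleStudents).listen(5000);
```

In an Express application the same header can be set inside the route handler or with a CORS middleware.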
Step 8
Create RDS Database
Terraform:
resource "aws_db_instance" "postgres" {
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "postgres"
  password          = "ChangeMe123!" # lab placeholder only; never commit a real password
}
Why Database?
Current API:
Hardcoded Data
Bad.
Need:
Persistent Storage
Step 9
Store Data in Database
Backend:
GET /students
POST /students
DELETE /students
Data stored in:
PostgreSQL
Enterprise Concept
Application should never store data:
Inside Container
Containers die.
Data must survive.
Step 10
ECS Architecture
Current:
One ECS Service
New:
Frontend ECS Service
Backend ECS Service
Architecture
ALB
|
|--- Frontend Service
|
|--- Backend Service
Step 11
ALB Routing
Frontend:
/
Backend:
/api/*
Example:
/
goes to:
Frontend
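Example:
/api/students
goes to:
Backend
By default the ALB forwards the path unchanged, so the backend must serve /api/students (not only /students) for this rule to work.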
Why?
Users see:
https://website.com
But ALB routes traffic internally.
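Path-based routing can be pictured as an ordered list of rules where the first match wins. The sketch below is a simplified model of that idea, not the ALB's actual matching engine (real ALB patterns support more wildcard forms). The rule list and function names are made up.

```javascript
// Listener rules, checked in priority order; a trailing "*" matches any suffix.
const rules = [
  { pattern: "/api/*", target: "backend" },
  { pattern: "/*", target: "frontend" }, // default: everything else
];

function matches(pattern, path) {
  if (pattern.endsWith("*")) {
    return path.startsWith(pattern.slice(0, -1));
  }
  return path === pattern;
}

// Returns the target of the first rule that matches, or null if none does.
function routeFor(path) {
  const rule = rules.find((r) => matches(r.pattern, path));
  return rule ? rule.target : null;
}

console.log(routeFor("/")); // frontend
console.log(routeFor("/api/students")); // backend
```

Order matters: if the catch-all rule came first, every request would go to the frontend.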
Step 12
Push Images to ECR
Students now have:
frontend:v1
backend:v1
Push both.
Repositories:
frontend
backend
inside ECR.
Step 13
GitHub Actions
Frontend pipeline:
Build Frontend
Push ECR
Deploy ECS
Backend pipeline:
Build Backend
Push ECR
Deploy ECS
Separate pipelines.
Enterprise Concept
Frontend deployment should not break backend deployment.
Teams deploy independently.
Step 14
Environment Variables
Never hardcode:
password="123"
Use:
DATABASE_HOST
DATABASE_USER
DATABASE_PASSWORD
from ECS environment variables.
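A small sketch of how the backend might read these variables at startup and fail fast when one is missing. The function name loadDbConfig and the sample values are hypothetical.

```javascript
// Reads database settings from environment variables and fails fast if any is missing.
function loadDbConfig(env) {
  const required = ["DATABASE_HOST", "DATABASE_USER", "DATABASE_PASSWORD"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`missing environment variables: ${missing.join(", ")}`);
  }
  return {
    host: env.DATABASE_HOST,
    user: env.DATABASE_USER,
    password: env.DATABASE_PASSWORD,
  };
}

// In the backend this would be called once at startup: loadDbConfig(process.env)
const exampleConfig = loadDbConfig({
  DATABASE_HOST: "db.example.internal", // placeholder values
  DATABASE_USER: "postgres",
  DATABASE_PASSWORD: "set-by-ecs-task-definition",
});
console.log(exampleConfig.host);
```

Failing at startup is usually easier to debug than a connection error on the first request.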
Enterprise Security
Even better:
AWS Secrets Manager
Step 15
Monitoring
Add:
CloudWatch Logs
For:
Frontend Logs
Backend Logs
What DevOps Engineers Check
Frontend:
Is website loading?
Backend:
Are APIs responding?
Database:
Is RDS healthy?
Enterprise Scaling Example
Current traffic:
100 users
Need:
5000 users
Scale:
Backend x10
Keep:
Frontend x2
Microservices allow independent scaling.
Common Problems
Frontend Cannot Reach API
Bad:
localhost
inside ECS.
Must use:
Internal ALB
Service Discovery
DNS
Database Security Group
Backend cannot connect.
Need:
5432 allowed
from backend SG.
Hardcoded Secrets
Bad:
GitHub repository
Good:
Secrets Manager
Student Deliverables
Frontend
- GitHub Repository
- Dockerfile
- Running ECS Service
Backend
- GitHub Repository
- Dockerfile
- Running ECS Service
Database
- RDS Screenshot
Architecture Diagram
Browser
|
ALB
|
Frontend ECS
|
Backend ECS
|
PostgreSQL RDS
GitHub Actions
- Frontend pipeline
- Backend pipeline
What Students Learn
At the end of Lab 6 they understand:
Frontend
Backend
Database
Containers
ECS
ALB
Terraform
GitHub Actions
CI/CD
RDS
This is the first architecture that starts looking like a real enterprise application.
Lab 7 Preview
Lab 7 will introduce:
Prometheus
Grafana
CloudWatch
Loki
Alerting
Students will learn:
How to know something is broken
How to collect metrics
How to troubleshoot production issues
How DevOps engineers monitor applications
Because a production application is not finished when it is deployed. A production application is finished only when it can be monitored, troubleshot, and recovered when something goes wrong.
LAB 7 – Monitoring, Logging, Alerting & Troubleshooting
This is one of the most important DevOps labs.
Many junior engineers think:
Deploy = Job Done
In reality:
Deploy = Beginning of Operations
The first question management asks after deployment is:
"How do we know if the application is healthy?"
If your answer is:
I open browser and check.
You are not operating at enterprise level.
Business Scenario
Your architecture now looks like:
Internet
|
ALB
|
Frontend ECS
|
Backend ECS
|
RDS PostgreSQL
Imagine:
Website suddenly slow
Management asks:
Why?
Can students answer?
No.
Because they have:
No Metrics
No Logs
No Alerts
This lab fixes that.
Goal
Students will build:
Frontend ECS
|
Backend ECS
|
CloudWatch Logs
|
Prometheus
|
Grafana
|
Alerting
Learning Objectives
Students will understand:
- What monitoring is
- What logging is
- What metrics are
- What alerts are
- Difference between CloudWatch and Prometheus
- Difference between logs and metrics
- Root cause analysis
Enterprise Architecture
Users
|
ALB
|
Frontend ECS
|
Backend ECS
|
RDS
|
--------------------
Monitoring Stack
--------------------
CloudWatch
Prometheus
Grafana
Alerting
Part 1
Understanding Logs
Example:
Backend error:
Database Connection Failed
How do we know?
Logs.
Example log:
2026-08-15 10:15:22 ERROR Database Connection Failed
Logs tell:
What happened
When
Where
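A log line in that shape can be produced with a few lines of code. A minimal sketch, assuming timestamps in UTC. The function name formatLog is hypothetical.

```javascript
// Builds a log line in the "timestamp LEVEL message" shape shown above.
function formatLog(level, message, date = new Date()) {
  // toISOString() looks like 2026-08-15T10:15:22.000Z; trim it to "YYYY-MM-DD HH:MM:SS" (UTC).
  const timestamp = date.toISOString().replace("T", " ").slice(0, 19);
  return `${timestamp} ${level.toUpperCase()} ${message}`;
}

const when = new Date(Date.UTC(2026, 7, 15, 10, 15, 22)); // months are zero-based: 7 = August
console.log(formatLog("error", "Database Connection Failed", when));
// 2026-08-15 10:15:22 ERROR Database Connection Failed
```

Writing such lines to standard output is enough inside a container, because the container log driver collects whatever the process prints.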
What DevOps Engineers Use Logs For
Examples:
Application crash
Database failure
Memory issue
Security event
Without logs:
Guessing
With logs:
Evidence
Part 2
CloudWatch Logs
Open ECS Service.
Enable:
CloudWatch Logging
Container definition:
logConfiguration
Example:
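{
  "logDriver": "awslogs",
  "options": { "awslogs-group": "/backend", "awslogs-region": "us-east-1", "awslogs-stream-prefix": "ecs" }
}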
Why CloudWatch?
AWS automatically stores:
Container Logs
Application Logs
System Logs
Verify Logs
Open:
CloudWatch
Navigate:
Log Groups
Students should see:
/frontend
/backend
Exercise
Break API intentionally.
Example:
throw new Error("Database Failed");
Deploy.
Find error inside CloudWatch.
Lesson
Production engineers spend enormous amounts of time reading logs.
Part 3
Metrics
Logs tell:
What happened
Metrics tell:
How healthy system is
Examples:
CPU
Memory
Requests
Latency
Errors
Example
Website loads slowly.
Metrics show:
CPU = 98%
Problem found.
Part 4
Install Prometheus
Create ECS Service:
Prometheus
Container image:
prom/prometheus
Port:
9090
What Prometheus Does
Prometheus collects:
CPU
Memory
Requests
Response Time
Errors
every few seconds.
Think:
CloudWatch = AWS monitoring
Prometheus = Application monitoring
Architecture
Backend
|
Prometheus
Prometheus continuously collects metrics.
Part 5
Expose Application Metrics
Backend application:
Install:
npm install prom-client
Add endpoint:
app.get("/metrics", ...)
Example:
http://backend:5000/metrics
What Happens?
Prometheus visits:
/metrics
at its configured scrape interval (often every 15 seconds).
Collects:
Request Count
Errors
Response Time
Enterprise Concept
This is called:
Instrumentation
Applications must expose metrics.
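To show what a /metrics response looks like, here is a hand-rolled sketch that renders two counters in the Prometheus text format. It deliberately does not use prom-client, and the metric names are assumptions chosen for illustration. In a real backend the library would normally do this work.

```javascript
// Hand-rolled counters rendered in the Prometheus text exposition format.
const counters = {
  http_requests_total: { help: "Total HTTP requests handled", value: 0 },
  http_errors_total: { help: "Total HTTP requests that failed", value: 0 },
};

function increment(name) {
  counters[name].value += 1;
}

// Produces the plain-text body that a /metrics endpoint would return.
function renderMetrics() {
  const lines = [];
  for (const [name, c] of Object.entries(counters)) {
    lines.push(`# HELP ${name} ${c.help}`);
    lines.push(`# TYPE ${name} counter`);
    lines.push(`${name} ${c.value}`);
  }
  return lines.join("\n") + "\n";
}

increment("http_requests_total");
increment("http_requests_total");
increment("http_errors_total");
console.log(renderMetrics());
```

Each request handler would call increment, and the /metrics route would return renderMetrics() as plain text.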
Part 6
Install Grafana
Create ECS Service:
grafana/grafana
Port:
3000
Open:
http://grafana:3000
Default:
admin
admin
What Grafana Does
Prometheus stores data.
Grafana visualizes data.
Example:
CPU Usage Chart
Memory Usage Chart
Request Count
Error Count
Architecture
Application
|
Prometheus
|
Grafana
Exercise
Create dashboard:
CPU
Memory
Requests
Errors
What DevOps Engineers Watch Daily
Infrastructure
CPU
Memory
Disk
Network
Application
Response Time
Errors
Requests
Availability
Database
Connections
Latency
Storage
Part 7
Alerting
Management does not want:
Engineer watching dashboard 24 hours
Need:
Automatic alerts
Example Alert
If:
CPU > 80%
for 5 minutes
Send:
Email
Slack
Teams
PagerDuty
Create Alert
Grafana Alert:
CPU > 80%
Trigger:
Email Notification
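The "for 5 minutes" part of the rule is what separates a real problem from a short spike. A simplified sketch of that logic follows. It is not how Grafana evaluates alerts internally, and the sample data is made up.

```javascript
// Returns true when every sample in the most recent window is above the threshold.
// samples: [{ time: <ms>, cpu: <percent> }, ...] sorted oldest first.
function shouldAlert(samples, threshold = 80, windowMs = 5 * 60 * 1000) {
  if (samples.length === 0) return false;
  const latest = samples[samples.length - 1].time;
  const earliest = samples[0].time;
  // Not enough history yet to cover the whole window.
  if (latest - earliest < windowMs) return false;
  const recent = samples.filter((s) => s.time >= latest - windowMs);
  return recent.every((s) => s.cpu > threshold);
}

const minute = 60 * 1000;
// Six samples, one per minute, all at 90% CPU.
const sustained = [0, 1, 2, 3, 4, 5].map((m) => ({ time: m * minute, cpu: 90 }));
// Same data, but CPU dropped to 40% at minute 3.
const interrupted = sustained.map((s, i) => (i === 3 ? { ...s, cpu: 40 } : s));

console.log(shouldAlert(sustained)); // true
console.log(shouldAlert(interrupted)); // false
```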
Enterprise Example
2 AM:
Application crashes
Alert fires.
Engineer wakes up.
Problem fixed.
Part 8
CloudWatch Alarms
Create:
ECS CPU Alarm
Condition:
CPU > 75%
Action:
SNS Notification
Why CloudWatch Alarms?
AWS infrastructure monitoring.
Examples:
ECS
ALB
RDS
Lambda
Part 9
Load Testing
Install:
ApacheBench
or
k6
Generate:
100 requests
1000 requests
Observe:
CPU
Memory
Response Time
inside Grafana.
What Students Learn
Applications behave differently under load.
Part 10
Troubleshooting Exercise
Instructor intentionally breaks:
Scenario 1
Backend stopped.
Students must identify:
CloudWatch Logs
Grafana
Prometheus
Scenario 2
Database unavailable.
Students identify:
Connection failures
inside logs.
Scenario 3
High CPU.
Students identify:
Metric spike
inside Grafana.
Root Cause Analysis
DevOps engineers do not say:
System down.
They explain:
Backend container restarted.
Database connections exhausted.
CPU reached 95%.
Response time increased.
Service unavailable.
Student Deliverables
Monitoring
Screenshot:
Prometheus Targets
Grafana
Dashboard showing:
CPU
Memory
Requests
Errors
CloudWatch
Log Group screenshot.
Alerting
Alert rule screenshot.
Troubleshooting
Student documents:
Problem
Root Cause
Fix
What Students Have Learned So Far
After Lab 7:
GitHub
Docker
ECR
ECS
ALB
Terraform
GitHub Actions
Frontend
Backend
RDS
CloudWatch
Prometheus
Grafana
Alerting
At this point they understand how a real production application is built and monitored.
LAB 8 Preview
Lab 8 is usually where I introduce:
Security
Secrets Manager
IAM
WAF
SSL/TLS
Vulnerability Scanning
Trivy
Least Privilege
because the next question management asks is:
"The application works. Is it secure?"
That is where students begin learning DevSecOps.
LAB 8 – DevSecOps: Securing the Production Environment
This lab changes the mindset of students.
Up to now they learned:
GitHub
Docker
ECR
ECS
Terraform
RDS
GitHub Actions
Prometheus
Grafana
CloudWatch
The application works.
Management asks:
"What happens if we get hacked tomorrow?"
Most junior engineers focus on deployment.
Senior DevOps engineers focus on:
Availability
Reliability
Security
Compliance
Business Scenario
Current architecture:
Internet
|
ALB
|
Frontend ECS
|
Backend ECS
|
RDS
Potential problems:
Hardcoded passwords
Open Security Groups
Exposed API Keys
Vulnerable Docker Images
No SSL
No WAF
Overprivileged IAM Roles
Goal
Secure the entire application.
Final architecture:
Internet
|
AWS WAF
|
HTTPS (SSL)
|
ALB
|
Frontend ECS
|
Backend ECS
|
Secrets Manager
|
RDS
Learning Objectives
Students will learn:
- IAM Least Privilege
- Secrets Manager
- SSL/TLS
- AWS WAF
- Container Security
- Trivy Scanning
- Image Hardening
- Security Groups
- Compliance Concepts
- Security in CI/CD
Part 1
Principle of Least Privilege
Most beginners create:
AdministratorAccess
for everything.
Bad practice.
Example
Developer needs:
Push Docker Images
Permission:
ECR Access
Only.
Not:
AdministratorAccess
Exercise
Current:
GitHub Actions User
Permissions:
AdministratorAccess
Students replace with:
AmazonEC2ContainerRegistryFullAccess
AmazonECS_FullAccess
Or even more restrictive custom policies.
Enterprise Rule
Always ask:
What is minimum access required?
Part 2
Secrets Manager
Current backend:
DB_PASSWORD="mypassword"
inside source code.
Very bad.
Problem
Repository leaked.
Now attacker knows:
Database Password
API Keys
Tokens
Solution
Store secrets in:
AWS Secrets Manager
Create Secret
Store:
DB_USERNAME
DB_PASSWORD
OPENAI_API_KEY
Backend Reads Secret
Instead of:
password="mypassword"
Use:
process.env.DB_PASSWORD
Enterprise Concept
Never hardcode:
Passwords
Tokens
Keys
Secrets
inside:
GitHub
Docker Images
Terraform Files
Part 3
Security Groups Review
Current:
0.0.0.0/0
everywhere.
Exercise
Review:
ALB
Allow:
80
443
from internet.
ECS
Allow:
80
5000
only from ALB SG.
RDS
Allow:
5432
only from Backend SG.
Enterprise Concept
Never expose database directly.
Bad:
Internet
|
PostgreSQL
Good:
Internet
|
ALB
|
Backend
|
Database
Part 4
Enable HTTPS
Current:
http://
Not encrypted.
Problem
Attacker can capture:
Passwords
Session IDs
Personal Data
Create Certificate
Use:
AWS Certificate Manager
Request certificate:
studentdomain.com
Attach To ALB
ALB Listener:
443 HTTPS
Result
Traffic becomes:
Encrypted
Enterprise Concept
Never expose login pages over HTTP.
Part 5
AWS WAF
What if someone sends:
1 million requests
or SQL Injection attempts?
Add WAF
Create:
AWS WAF
Attach to ALB.
Enable managed rules:
SQL Injection
Cross Site Scripting
Known Bad Inputs
Bot Protection
Enterprise Concept
WAF protects before requests reach application.
Part 6
Container Vulnerability Scanning
Current image:
FROM node:20-alpine
May contain vulnerabilities.
Install Trivy
Students install:
brew install trivy
or
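sudo apt install trivy
(on Debian/Ubuntu, add the Trivy apt repository first; the package is not in the default repositories)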
Scan Image
trivy image backend:v1
Output:
CRITICAL
HIGH
MEDIUM
LOW
Exercise
Students identify:
High Vulnerabilities
and document findings.
Enterprise Concept
Every production image should be scanned before deployment.
Part 7
Security in GitHub Actions
Current:
Build
Push
Deploy
Add Security Stage
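Pipeline:
Checkout
|
Build
|
Trivy Scan
|
Push
|
Deploy
The scan runs after Build because trivy image needs a built image to inspect.
Rule
If vulnerabilities found:
Pipeline Fails
Trivy exits with code 0 by default, so the step must ask for a failing exit code, for example on HIGH and CRITICAL findings:
trivy image --exit-code 1 --severity HIGH,CRITICAL backend:v1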
Why?
Stop insecure software before production.
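The pass/fail decision can also be made from a JSON report. A minimal sketch, assuming a report shaped like { Results: [{ Vulnerabilities: [{ Severity }] }] }. The function name and the sample report are hypothetical, and the IDs are placeholders rather than real findings.

```javascript
// Decide whether a pipeline should fail, given a parsed JSON scan report.
function shouldFailPipeline(report, blocking = ["CRITICAL", "HIGH"]) {
  const counts = {};
  for (const result of report.Results || []) {
    for (const vuln of result.Vulnerabilities || []) {
      counts[vuln.Severity] = (counts[vuln.Severity] || 0) + 1;
    }
  }
  const fail = blocking.some((severity) => (counts[severity] || 0) > 0);
  return { fail, counts };
}

// Made-up report for illustration only.
const sampleReport = {
  Results: [
    {
      Target: "backend:v1",
      Vulnerabilities: [
        { VulnerabilityID: "EXAMPLE-0001", Severity: "HIGH" },
        { VulnerabilityID: "EXAMPLE-0002", Severity: "LOW" },
      ],
    },
  ],
};

console.log(shouldFailPipeline(sampleReport));
```

Keeping the blocking severities as a parameter lets a team start strict on CRITICAL only and tighten the rule later.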
Part 8
Docker Image Hardening
Bad:
FROM ubuntu
Huge image.
Better:
FROM node:20-alpine
Benefits
Smaller
Faster
Less Attack Surface
Enterprise Rule
Use smallest possible image.
Part 9
IAM Roles for ECS
Current:
Access Keys
inside containers.
Bad.
Solution
Create:
Task Role
Attach permissions.
Example:
Read Secrets Manager
Benefits
No access keys stored.
Enterprise Concept
Containers should use IAM Roles, not credentials.
Part 10
Logging Security Events
CloudWatch Logs.
Log:
Failed Login
Unauthorized Access
API Errors
Suspicious Requests
Exercise
Students create:
Security Log Group
Part 11
Compliance Discussion
Introduce:
SOC2
HIPAA
PCI DSS
ISO 27001
Example
Healthcare Application:
Need:
Encryption
Audit Logs
Access Controls
Banking Application
Need:
Least Privilege
Monitoring
Incident Response
Part 12
Security Incident Simulation
Scenario:
GitHub repository leaked
Students answer:
What secrets exposed?
What rotates?
What logs reviewed?
What systems affected?
Scenario 2
Database exposed.
Students answer:
How discovered?
How isolated?
How fixed?
Scenario 3
Critical Trivy finding.
Students answer:
Can deployment continue?
Who approves exception?
What DevOps Engineers Must Check Daily
IAM
Unused Users
Unused Roles
Excessive Permissions
Containers
Vulnerabilities
Outdated Images
Infrastructure
Open Ports
Public Resources
Secrets
Rotation
Expiration
Access
Student Deliverables
Security
Screenshots:
Secrets Manager
IAM Role
Security Groups
HTTPS Listener
WAF Rules
Container Security
Trivy Scan Report
CI/CD
Pipeline with:
Security Scan Stage
Documentation
Student writes:
Top 5 security risks
How risks were mitigated
What Students Have Learned After Lab 8
GitHub
Docker
ECR
ECS
Terraform
GitHub Actions
Frontend
Backend
RDS
CloudWatch
Prometheus
Grafana
Alerting
IAM
Secrets Manager
WAF
SSL/TLS
Trivy
DevSecOps
At this point students can build, deploy, monitor, and secure a production application.
LAB 9 Preview
Lab 9 is usually the capstone project:
High Availability
Auto Scaling
Blue/Green Deployment
Disaster Recovery
Multi-AZ
Backup Strategy
Route53
Production Readiness Review
This is where students learn how large companies keep applications available even when servers, containers, databases, or entire availability zones fail.
LAB 9 – Production Readiness, High Availability, Auto Scaling & Disaster Recovery
This is the capstone project.
Up to now students built:
Frontend
Backend
RDS
Terraform
GitHub Actions
Docker
ECS
Monitoring
Security
The application works.
The application is secure.
The application is monitored.
Management now asks:
"What happens if an Availability Zone fails?"
"What happens if traffic increases 100x?"
"What happens if a deployment breaks production?"
"What happens if somebody deletes the database?"
This is where real DevOps engineering begins.
Goal
Students will transform:
Good Application
into
Production-Ready Application
Final Architecture
Users
|
Route53
|
ALB
|
-------------------
AZ-A AZ-B
-------------------
Frontend Frontend
Backend Backend
-------------------
|
RDS Multi-AZ
|
Backups
|
Monitoring
Learning Objectives
Students will learn:
- High Availability
- Multi-AZ Design
- Auto Scaling
- Blue/Green Deployment
- Disaster Recovery
- Backup Strategy
- Route53
- Production Readiness Reviews
- SLA / SLO / Error Budgets
Part 1
High Availability
Current:
ALB
|
Frontend
|
Backend
Problem:
One container dies
Application unavailable.
Solution
Run multiple tasks.
Frontend:
Desired Tasks = 2
Backend:
Desired Tasks = 2
Architecture:
ALB
|
|------ Frontend 1
|
|------ Frontend 2
|
|------ Backend 1
|
|------ Backend 2
Enterprise Concept
Never deploy:
1 container
for production.
Minimum:
2 containers
across multiple AZs.
Exercise
Students stop:
Frontend Task 1
Verify:
Website still works
Part 2
Multi-AZ Deployment
Current:
One Availability Zone
Problem:
AWS AZ outage.
Entire application unavailable.
Solution
Deploy ECS into:
us-east-1a
us-east-1b
ALB:
AZ-A
AZ-B
Frontend:
AZ-A
AZ-B
Backend:
AZ-A
AZ-B
Enterprise Rule
Production workloads should span:
Multiple Availability Zones
Exercise
Students diagram:
AZ-A Failure
and explain:
Why application survives
Part 3
Auto Scaling
Business Problem
Traffic:
100 users
becomes:
10,000 users
One backend container cannot handle load.
ECS Auto Scaling
Create policy:
CPU > 70%
Scale:
2 Tasks -> 4 Tasks
Example
Current:
Backend x2
Traffic spike:
Backend x4
Automatically.
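The scaling decision itself is simple arithmetic. The toy rule below illustrates it. It is not the actual ECS Service Auto Scaling algorithm, and the scale-in threshold, step size, and min/max limits are assumptions.

```javascript
// Toy scaling rule: add tasks when average CPU is above the threshold,
// remove one when it is well below, and always stay within min/max.
function desiredTasks(current, avgCpu, opts = {}) {
  const { scaleOutAbove = 70, scaleInBelow = 30, step = 2, min = 2, max = 10 } = opts;
  let desired = current;
  if (avgCpu > scaleOutAbove) desired = current + step;
  else if (avgCpu < scaleInBelow) desired = current - 1;
  return Math.min(max, Math.max(min, desired));
}

console.log(desiredTasks(2, 85)); // 4
console.log(desiredTasks(4, 50)); // 4
console.log(desiredTasks(4, 10)); // 3
console.log(desiredTasks(2, 10)); // 2
```

The minimum of 2 keeps the service highly available even when traffic is low.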
Enterprise Concept
Scaling should happen:
Automatically
not:
Engineer manually clicking buttons
Exercise
Generate traffic using:
k6
or
ApacheBench
Observe:
Scaling Event
inside ECS.
Part 4
Route53
Current:
alb-123.us-east-1.elb.amazonaws.com
Not professional.
Create Domain
Example:
studentproject.com
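Route53:
A Record (Alias)
pointing to:
ALB
An ALB has no fixed IP address, so the record must be an alias to the load balancer, not a plain A record with an IP.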
Result
Users visit:
https://studentproject.com
instead of:
https://alb-123.us-east-1.elb.amazonaws.com
Enterprise Concept
Customers never see AWS resource names.
Part 5
Blue/Green Deployment
Current Deployment
Version 1
Replace with:
Version 2
Risk:
Deployment breaks
Production down.
Blue Environment
Current Production
Green Environment
New Version
Architecture:
ALB
|
Blue
|
Version 1
ALB
|
Green
|
Version 2
Test Green
Verify:
Frontend
Backend
Database
working.
Switch Traffic
100%
moves to Green.
Rollback
If broken:
Traffic back to Blue
within seconds.
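The switch and the rollback are the same operation in opposite directions. A toy model of that idea follows. It is a sketch of the concept only, and the class and method names are made up.

```javascript
// Toy model of a load balancer that sends all traffic to one of two environments.
class BlueGreenRouter {
  constructor() {
    this.versions = { blue: "v1", green: null };
    this.live = "blue";
  }
  // Deploy a new version to the idle environment; live traffic is untouched.
  deployToIdle(version) {
    const idle = this.live === "blue" ? "green" : "blue";
    this.versions[idle] = version;
    return idle;
  }
  // Move 100% of traffic to the other environment (also used for rollback).
  switchTraffic() {
    const target = this.live === "blue" ? "green" : "blue";
    if (this.versions[target] === null) {
      throw new Error(`${target} has nothing deployed`);
    }
    this.live = target;
    return this.live;
  }
  liveVersion() {
    return this.versions[this.live];
  }
}

const router = new BlueGreenRouter();
router.deployToIdle("v2");
router.switchTraffic();
console.log(router.liveVersion()); // v2
router.switchTraffic(); // rollback
console.log(router.liveVersion()); // v1
```

Rollback is fast because Blue keeps running until Green has proven itself.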
Enterprise Concept
Many large companies deploy this way.
Part 6
Database Backups
Question:
Database deleted
Now what?
RDS Automated Backups
Enable:
7 Day Retention
or
30 Day Retention
Create Snapshot
Manual Snapshot:
Pre-Release Backup
before deployment.
Exercise
Student documents:
Restore Procedure
Enterprise Rule
Every deployment should have:
Rollback Plan
Part 7
Disaster Recovery
Scenario:
Entire Region Fails
Example:
us-east-1 unavailable
Discussion
Recovery Options
Backup Restore
Hours
Pilot Light
Minimal Environment
running elsewhere.
Warm Standby
Reduced Environment
already running.
Multi-Region Active
Full Environment
in two regions.
Enterprise Concept
Recovery costs money.
Management chooses:
Cost
vs
Recovery Time
Part 8
SLA / SLO / Error Budget
Students learn:
SLA
Customer contract.
Example:
99.9%
availability.
SLO
Internal goal.
Example:
99.95%
availability.
Error Budget
Allowed downtime.
Example:
43 minutes/month
for 99.9%.
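The 43-minute figure comes from simple arithmetic: a 30-day month has 43,200 minutes, and 0.1% of that is 43.2 minutes. A small sketch, assuming a 30-day month. The function name is hypothetical.

```javascript
// Allowed downtime in minutes for an availability target over a period of days.
function allowedDowntimeMinutes(availabilityPercent, days = 30) {
  const totalMinutes = days * 24 * 60;
  const minutes = (totalMinutes * (100 - availabilityPercent)) / 100;
  return Math.round(minutes * 100) / 100; // round to 2 decimals
}

console.log(allowedDowntimeMinutes(99.9)); // 43.2
```

The same function works for any target and period, for example a 365-day year.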
Exercise
Calculate:
99.9%
99.95%
99.99%
allowed downtime.
Part 9
Production Readiness Review
Before deployment students answer:
Architecture
Multi-AZ?
Monitoring
Prometheus?
Grafana?
Alerts?
Security
IAM?
Secrets Manager?
HTTPS?
WAF?
Backups
Snapshots?
Retention?
Scaling
Auto Scaling?
Disaster Recovery
Recovery Plan?
Part 10
Failure Simulation Day
Instructor intentionally breaks:
Scenario 1
Stop:
Backend Task
Students verify:
Application survives
Scenario 2
Deploy broken version.
Students:
Rollback
using Blue/Green.
Scenario 3
Database issue.
Students:
Restore Snapshot
Scenario 4
Traffic spike.
Students verify:
Auto Scaling
triggered.
What DevOps Engineers Must Think About
Junior Engineer:
Can I deploy?
Senior Engineer:
Can I recover?
Student Deliverables
Architecture Diagram
Route53
|
ALB
|
Frontend x2
|
Backend x2
|
RDS Multi-AZ
Auto Scaling
Screenshot:
Scaling Policy
Route53
Domain working.
Blue/Green
Deployment demo.
Backup
Snapshot screenshot.
Disaster Recovery
Written recovery plan.
Production Readiness Report
Students submit:
Architecture
Security
Monitoring
Scaling
Backups
Recovery
Risks
Final Result After Labs 1–9
Students have built:
GitHub
|
GitHub Actions
|
Terraform
|
Docker
|
ECR
|
ECS Fargate
|
ALB
|
Route53
|
Frontend
|
Backend
|
RDS
|
CloudWatch
Prometheus
Grafana
|
Secrets Manager
IAM
WAF
TLS
|
Auto Scaling
Blue/Green
Backups
Disaster Recovery
This sequence gives students something most bootcamps miss:
They don't just learn Docker, Terraform, AWS, and GitHub separately—they see exactly how a developer's code becomes a production application and how DevOps engineers keep it running, secure, scalable, and recoverable.