DEV Community

Aisalkyn Aidarova

Project #4: Complete DevOps Engineering Project

End-to-End Application Deployment Using GitHub, Docker, Terraform, AWS, ECS, CI/CD, Monitoring, and Security

When you used GitHub Pages (github.io), you did not deploy containers, Kubernetes, or ECS.

The workflow was much simpler:

Student
   |
   V
GitHub Repository
   |
   V
GitHub Pages
   |
   V
Public Website

What happened:

  1. Student created website files:
index.html
style.css
app.js
  2. Student pushed code to GitHub:
git add .
git commit -m "website"
git push
  3. GitHub Pages detected the files.

  4. GitHub Pages copied the static files to GitHub's web hosting infrastructure.

  5. GitHub served those files directly to visitors.

When someone visits:

https://username.github.io

GitHub simply returns:

index.html
style.css
app.js

to the browser.


Is Docker involved?

Possibly internally inside GitHub's infrastructure, but students do not use Docker.

Students never run:

docker build
docker run

Is Kubernetes involved?

Possibly internally inside GitHub's infrastructure, but students do not use Kubernetes.

Students never run:

kubectl apply
kubectl get pods

What is GitHub Pages good for?

Static websites:

HTML
CSS
JavaScript
Images

Examples:

  • Portfolio
  • School website
  • Documentation
  • Landing page

What GitHub Pages cannot do

It cannot run:

Node.js
Python
Java
.NET
Databases
APIs

For example:

Frontend
Backend API
PostgreSQL

cannot run on GitHub Pages.


If you want to learn real DevOps

The next step after GitHub Pages is:

GitHub
   |
GitHub Actions/Jenkins
   |
Docker Build
   |
ECR
   |
ECS Fargate
   |
ALB
   |
Route53

Then you will see:

  • Where containers come from
  • Why Docker is needed
  • Why ECS/Kubernetes is needed
  • How load balancers work
  • How production deployments happen

The actual lab starts here:

DevOps Lab 1: Containerize Your GitHub Website with Docker

Goal

A few days ago, each of you created a website using Jules, merged the code into GitHub, and viewed it using GitHub Pages.

Today, we will take the same website source code and put it inside a Docker container.

This is the first step from a simple website to real production deployment.


Part 1: What We Are Building

Current workflow

Developer writes code
        |
        V
GitHub Repository
        |
        V
GitHub Pages
        |
        V
Website in browser

GitHub Pages is good for simple static websites.

But in real companies, applications are usually deployed like this:

Developer writes code
        |
        V
GitHub Repository
        |
        V
Docker Image
        |
        V
Container
        |
        V
Cloud Platform
        |
        V
Production Website

Today we are doing this part:

GitHub Repository
        |
        V
Docker Image
        |
        V
Docker Container
        |
        V
Website in browser

Part 2: Why DevOps Engineers Use Docker

A DevOps engineer does not only write code or test code. A DevOps engineer helps move application code from a developer's laptop to production safely and repeatably.

Docker helps us package the application.

Without Docker, we may have this problem:

It works on my laptop, but it does not work on the server.

With Docker, we package:

Application code
Web server
Runtime environment
Configuration

into one Docker image.

Then the same image can run on:

Student laptop
EC2
ECS
Kubernetes
Production server

This is why enterprise companies use Docker.


Part 3: What Goes Inside Docker?

For this lab, most students have a simple static website.

Example repository:

my-website/
├── index.html
├── style.css
├── script.js
├── images/
└── README.md

Inside Docker, we need the files required to run the website:

index.html
style.css
script.js
images/

We do not need to copy unnecessary files like:

.git/
README.md
notes.txt

But for beginner practice, copying the full project is acceptable.


Part 4: Important Note About Jules Agent

Some students created the website with Jules.

Jules helped generate the code, but Jules does not need to go inside the Docker container.

Docker only needs the final website files.

If the repository has files like:

index.html
style.css
script.js

then we containerize those files.

If the repository has files like:

package.json
src/
app/
next.config.js

then it may be a React or Next.js application, and the Dockerfile will be different.

For today, we will start with the simple static website version.


Part 5: Prerequisites

Each student must have:

  1. Git installed
  2. Docker Desktop installed and running
  3. GitHub repository with their website
  4. VS Code installed
  5. Terminal access

Check Docker:

docker --version

Check Git:

git --version

Part 6: Clone Your GitHub Repository

Go to GitHub.

Open your website repository.

Click:

Code → HTTPS → Copy URL

Example:

https://github.com/username/my-website.git

Now open terminal.

Run:

cd Desktop

Clone your repository:

git clone https://github.com/username/my-website.git

Go inside the project folder:

cd my-website

Check files:

ls

You should see something like:

index.html
style.css
script.js

Part 7: Open Project in VS Code

Run:

code .

If code . does not work, open VS Code manually and open the project folder.


Part 8: Create Dockerfile

Inside the root of your project, create a new file named:

Dockerfile

Important:

The file name must be exactly:

Dockerfile

Not:

dockerfile
Dockerfile.txt
docker-file

Add this content:

FROM nginx:alpine

COPY . /usr/share/nginx/html

EXPOSE 80

Explanation:

FROM nginx:alpine

This means we are using Nginx as the web server.

Nginx will serve our website files to the browser.

COPY . /usr/share/nginx/html

This copies our website files from the current folder into the Nginx web folder inside the container.

EXPOSE 80

This documents that the container listens on port 80.

Port 80 is the default HTTP web port.

Docker’s COPY instruction is used to copy files into an image, and this is the key step that places the website source files into the container image.


Part 9: Create .dockerignore

Create another file:

.dockerignore

Add this:

.git
.github
README.md
node_modules
.DS_Store

Why do we need .dockerignore?

It prevents unnecessary files from going into the Docker image.

In enterprise companies, this is important because Docker images should be:

Small
Clean
Secure
Fast to build
Easy to scan

Do not put secrets inside Docker images.

Never put these inside Docker:

AWS keys
Passwords
.env files
Private tokens
SSH keys
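
One way to act on this rule is to look for secret-like files in the project folder before running docker build. The following is a minimal sketch, not a complete scanner: the function name find_secret_like_files and the list of file name patterns are assumptions chosen for illustration.

```bash
# Sketch: list files whose names suggest they may contain secrets.
# The name patterns below are illustrative assumptions, not a complete list.
find_secret_like_files() {
  local context_dir="${1:-.}"
  find "$context_dir" -type f \
    \( -name '.env' -o -name '.env.*' -o -name '*.pem' \
       -o -name 'id_rsa' -o -name 'credentials' \) \
    ! -path '*/.git/*' | sort
}

# Usage (from the project folder):
#   find_secret_like_files .
```

Anything this prints should be removed from the project folder or listed in .dockerignore before the image is built.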

Part 10: Build Docker Image

In terminal, make sure you are inside your project folder.

Run:

pwd

Then build the Docker image:

docker build -t my-website:v1 .

Explanation:

docker build

Builds a Docker image.

-t my-website:v1

Gives the image a name and tag.

.

Means Docker should use the current folder as the build context.

Check image:

docker images

You should see:

my-website    v1

Part 11: Run Docker Container

Run:

docker run -d -p 8080:80 --name my-website-container my-website:v1

Explanation:

-d

Run container in the background.

-p 8080:80

Maps your laptop port 8080 to container port 80.

--name my-website-container

Gives the container a readable name.

my-website:v1

The image we created.


Part 12: Open Website in Browser

Open:

http://localhost:8080

You should see your website.

This means:

Your GitHub source code
        |
        V
Docker image
        |
        V
Docker container
        |
        V
Browser

Part 13: Check Running Containers

Run:

docker ps

You should see your running container.

Example:

CONTAINER ID   IMAGE           PORTS
abc123         my-website:v1   0.0.0.0:8080->80/tcp

Part 14: Stop Container

Run:

docker stop my-website-container

Check again:

docker ps

The container should not appear.


Part 15: Start Container Again

Run:

docker start my-website-container

Open again:

http://localhost:8080

Part 16: Remove Container

Stop it first:

docker stop my-website-container

Remove it:

docker rm my-website-container

Part 17: Remove Image

If you want to remove the image:

docker rmi my-website:v1

Only do this after you finish the lab.


Part 18: Common Errors and Fixes

Error 1: Docker is not running

Error:

Cannot connect to the Docker daemon

Fix:

Open Docker Desktop and wait until it says Docker is running.


Error 2: Port already in use

Error:

port is already allocated

Fix:

The failed docker run usually still leaves a created container with the same name behind. Remove it, then use another port:

docker rm my-website-container
docker run -d -p 8081:80 --name my-website-container my-website:v1

Open:

http://localhost:8081

Error 3: Container name already exists

Error:

container name is already in use

Fix:

Remove old container:

docker rm my-website-container

If it is running:

docker stop my-website-container
docker rm my-website-container

Error 4: Website does not show correctly

Check if index.html is in the root folder.

Correct:

my-website/
├── index.html
├── style.css
└── script.js

Possible problem:

my-website/
└── website/
    ├── index.html
    ├── style.css
    └── script.js

If your files are inside a subfolder called website, change Dockerfile:

FROM nginx:alpine

COPY website/ /usr/share/nginx/html

EXPOSE 80
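
To find out which folder belongs in the COPY instruction, you can ask the shell where index.html lives. This is a small sketch: the function name site_root is made up for this example, and it assumes the project has one main index.html.

```bash
# Sketch: print the folder (relative to the project) that contains index.html.
# Prints "." when index.html is in the project root.
site_root() {
  local project_dir="${1:-.}" first_index
  first_index=$(cd "$project_dir" && find . -name index.html ! -path './.git/*' | sort | head -n 1)
  if [ -z "$first_index" ]; then
    echo "no index.html found" >&2
    return 1
  fi
  dirname "$first_index"
}

# Usage (from the project folder):
#   site_root .
```

If it prints ".", keep COPY . /usr/share/nginx/html. If it prints "./website", use COPY website/ /usr/share/nginx/html as shown above.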

Part 19: Push Dockerfile to GitHub

Now save the Dockerfile and .dockerignore in your repository.

Run:

git status

You should see:

Dockerfile
.dockerignore

Add files:

git add Dockerfile .dockerignore

Commit:

git commit -m "Add Dockerfile for website containerization"

Push:

git push

Now your GitHub repository contains:

Website source code
Dockerfile
.dockerignore

This means another DevOps engineer can clone your repo and build the same image.


Part 20: Enterprise Explanation

In a real company, developers do not manually copy files to servers.

They follow a workflow:

Developer writes code
        |
        V
Pull Request
        |
        V
Code Review
        |
        V
Merge to main
        |
        V
CI/CD Pipeline
        |
        V
Docker Image Build
        |
        V
Image Scan
        |
        V
Push to Registry
        |
        V
Deploy to ECS or Kubernetes
        |
        V
Monitor Logs and Metrics

A DevOps engineer is responsible for making this process:

Automated
Repeatable
Secure
Observable
Reliable
Scalable

Part 21: What Students Must Pay Attention To

1. Repository structure

Make sure the application files are organized.

Bad:

my-website/
├── final-final-index.html
├── copy-style.css
├── old-script.js

Good:

my-website/
├── index.html
├── style.css
├── script.js
├── images/
├── Dockerfile
└── .dockerignore

2. Do not expose secrets

Never commit:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
OpenAI API key
Database password
Private key

Secrets must be stored in:

AWS Secrets Manager
GitHub Secrets
SSM Parameter Store
Kubernetes Secrets

3. Use tags properly

Bad:

docker build -t my-website .

Better:

docker build -t my-website:v1 .

Best in enterprise:

docker build -t my-website:git-commit-id .

Why?

Because in production, we need to know exactly which version is deployed.
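
In the command above, git-commit-id is a placeholder. A minimal sketch of how the real tag could be produced from a commit hash is shown here; the function name image_ref_for_commit and the 7-character short hash are assumptions for this example, and the hash in the demo call is made up.

```bash
# Sketch: build an image reference such as my-website:3f2a1bc from a commit hash.
image_ref_for_commit() {
  local image_name="$1" commit_hash="$2" short_hash
  short_hash=$(printf '%s' "$commit_hash" | cut -c1-7)
  printf '%s:%s\n' "$image_name" "$short_hash"
}

# Demo with a made-up commit hash.
image_ref_for_commit my-website 3f2a1bc9d8e7f6a5b4c3d2e1f0a9b8c7d6e5f4a3   # prints: my-website:3f2a1bc

# Inside a real repository, the hash would come from Git:
#   commit_hash=$(git rev-parse HEAD)
#   docker build -t "$(image_ref_for_commit my-website "$commit_hash")" .
```

With a tag like this, anyone can match a running image back to the exact commit it was built from.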


4. Keep images small

Use:

FROM nginx:alpine

instead of:

FROM nginx:latest

Alpine-based images are usually smaller.

Smaller images are:

Faster to build
Faster to push
Faster to pull
Easier to scan

5. Test locally before pushing

Before sending the image to the cloud, always run it locally:

docker build -t my-website:v1 .
docker run -d -p 8080:80 my-website:v1

If it does not work locally, it will not magically work in AWS.


6. Understand the port

Inside container:

80

On laptop:

8080

This command:

docker run -p 8080:80 my-website:v1

means:

Laptop port 8080 → Container port 80
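
To practice reading the -p value, here is a tiny sketch that splits a mapping into its two halves. The function name describe_port_mapping is hypothetical, and the sketch only handles the simple HOST:CONTAINER form used in this lab.

```bash
# Sketch: explain a simple HOST:CONTAINER port mapping such as 8080:80.
describe_port_mapping() {
  local mapping="$1"
  local host_port="${mapping%%:*}"       # text before the first colon
  local container_port="${mapping##*:}"  # text after the last colon
  echo "laptop port ${host_port} -> container port ${container_port}"
}

describe_port_mapping "8080:80"   # prints: laptop port 8080 -> container port 80
```

The left number is always the one you type in the browser; the right number is the one Nginx listens on inside the container.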

7. Understand image vs container

Docker image:

Template / package / blueprint

Docker container:

Running instance of the image

Example:

Image = class
Container = object

or:

Image = cake recipe
Container = actual cake

Part 22: Student Deliverables

Each student must submit:

  1. GitHub repository link
  2. Screenshot of Dockerfile
  3. Screenshot of successful docker build
  4. Screenshot of docker ps
  5. Screenshot of website running at:
http://localhost:8080
  6. Short explanation:
What is Docker?
What is Docker image?
What is Docker container?
Why do DevOps engineers use Docker?

Part 23: Final Architecture After This Lab

Student GitHub Repo
        |
        V
Dockerfile
        |
        V
Docker Image
        |
        V
Docker Container
        |
        V
Website running locally

This is the foundation for the next lab.


Part 24: Next Lab Preview

The next lab will be:

Docker Image
        |
        V
Amazon ECR
        |
        V
Amazon ECS Fargate
        |
        V
Application Load Balancer
        |
        V
Public Production Website

Amazon ECR is used to store Docker images in AWS, and before pushing an image, Docker must authenticate to the target ECR registry. AWS notes that ECR authentication tokens are temporary and valid for 12 hours.

In ECS, a task definition works like the blueprint for the application. It tells ECS which Docker image to use, how much CPU and memory to allocate, and how the container should run.

An Application Load Balancer can be used with ECS services to distribute traffic across running tasks, which is important in production when more than one container is running.

DevOps Lab 2: Deploy Dockerized Website to AWS ECS Fargate

Goal

In Lab 1, you built:

GitHub Repository
        |
        V
Docker Image
        |
        V
Docker Container
        |
        V
Website running on localhost

In this lab, we move the application to AWS.

At the end of this lab:

GitHub Repository
        |
        V
Docker Image
        |
        V
Amazon ECR
        |
        V
Amazon ECS Fargate
        |
        V
Application Load Balancer
        |
        V
Public Website

You will access your website using a public AWS URL.


Learning Objectives

Students will learn:

  • What ECR is
  • What ECS is
  • What Fargate is
  • What a Task Definition is
  • What a Service is
  • What a Load Balancer is
  • How production applications are deployed

Enterprise Perspective

Many students think:

Docker = Production

Wrong.

Docker only creates the package.

Production requires:

Docker Image
+
Image Registry
+
Container Orchestration
+
Networking
+
Load Balancer
+
Monitoring

This is where ECS enters.


Step 1: Architecture Overview

Today we build:

Browser
   |
   V
Application Load Balancer
   |
   V
ECS Service
   |
   V
ECS Task
   |
   V
Docker Container
   |
   V
Website

Step 2: Create AWS ECR Repository

Search:

Elastic Container Registry

Open:

Amazon ECR

Click:

Create Repository

Repository name:

student-website

Visibility:

Private

Click:

Create Repository

Why ECR Exists

Without ECR:

Laptop
   |
   V
Container

AWS cannot access your laptop.

We need a central image registry.

Laptop
   |
   V
ECR
   |
   V
ECS

Think of ECR as GitHub for Docker images.


Step 3: Authenticate Docker to AWS

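Open Terminal on the laptop where you built the image in Lab 1. (AWS CloudShell is already signed in to your account, but the image you built is on your laptop, not in CloudShell.)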

Run:

aws configure

Enter:

Access Key
Secret Key
Region

Login:

aws ecr get-login-password \
--region us-east-1 \
| docker login \
--username AWS \
--password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com

Expected:

Login Succeeded

Step 4: Tag Docker Image

Check image:

docker images

You should see the image you built in Lab 1:

my-website:v1

Tag it with the ECR repository address:

docker tag my-website:v1 \
ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/student-website:v1

Why Tag?

Locally, the image is named:

my-website:v1

To push it to ECR, the name must include the registry and the repository:

ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/student-website:v1

Now Docker knows which registry and repository to push the image to.
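
The full image name follows a fixed pattern, so it can be assembled from its parts. This is a small sketch; the function name ecr_image_uri is an assumption for this example, and 123456789012 is only a sample account ID.

```bash
# Sketch: assemble an ECR image URI from account ID, region, repository, and tag.
ecr_image_uri() {
  local account_id="$1" region="$2" repository="$3" tag="$4"
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$account_id" "$region" "$repository" "$tag"
}

ecr_image_uri 123456789012 us-east-1 student-website v1
# prints: 123456789012.dkr.ecr.us-east-1.amazonaws.com/student-website:v1
```

Storing the result in a variable helps avoid typing mistakes when the same long name is needed for docker tag, docker push, and the ECS task definition.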


Step 5: Push Image to ECR

Run:

docker push \
ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/student-website:v1

Wait for upload.

Refresh ECR.

You should see:

student-website:v1

Step 6: Create ECS Cluster

Search:

Elastic Container Service

Open:

Amazon ECS

Click:

Create Cluster

Cluster name:

student-cluster

Infrastructure:

AWS Fargate

Click:

Create

Why ECS?

Imagine:

100 containers

Questions:

Which server runs them?
Which one failed?
How many copies?
Who restarts failed containers?

ECS manages all of this.


Step 7: Why Fargate?

Without Fargate:

You manage EC2 servers.

You must:

Patch Linux
Upgrade OS
Replace failed servers
Manage capacity

With Fargate:

AWS manages servers.

You only deploy containers.


Step 8: Create Task Definition

Open:

Task Definitions

Click:

Create New Task Definition

Launch Type:

Fargate

Task name:

student-task

CPU:

0.5 vCPU

Memory:

1 GB

Container Section

Container Name:

website

Image URI:

ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/student-website:v1

Container Port:

80

Click:

Create

Why Task Definitions Matter

Task Definition is:

Docker Image
CPU
Memory
Port
Environment Variables

combined together.

Think:

Task Definition = Blueprint
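
The console form above ends up as a JSON document. The sketch below prints a minimal version of it, using the values from this step (0.5 vCPU is written as "512" CPU units and 1 GB as "1024" MiB). The function name render_task_definition is hypothetical, and the execution role name ecsTaskExecutionRole is an assumption; use whichever task execution role exists in your account.

```bash
# Sketch: print a minimal Fargate task definition as JSON.
render_task_definition() {
  local image_uri="$1" execution_role_arn="$2"
  cat <<EOF
{
  "family": "student-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "${execution_role_arn}",
  "containerDefinitions": [
    {
      "name": "website",
      "image": "${image_uri}",
      "essential": true,
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }]
    }
  ]
}
EOF
}

# ACCOUNT_ID is a placeholder, as in the rest of this lab.
render_task_definition \
  "ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/student-website:v1" \
  "arn:aws:iam::ACCOUNT_ID:role/ecsTaskExecutionRole"
```

If the output is saved to a file such as task-def.json, it can be registered from the command line with aws ecs register-task-definition --cli-input-json file://task-def.json instead of using the console.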

Step 9: Create Service

Open cluster.

Click:

Create Service

Choose:

student-task

Launch Type:

Fargate

Desired Tasks:

1

What Is a Service?

Task:

One running copy of the task definition (one or more containers)

Service:

Keeps the desired number of tasks running

If a task crashes:

ECS starts another one

Automatically.
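
The idea behind a service can be pictured as a simple comparison between the desired count and the running count. The sketch below is only a mental model, not how ECS is implemented; the function name tasks_to_start is made up for this example.

```bash
# Mental model only: how many new tasks are needed to reach the desired count?
tasks_to_start() {
  local desired_count="$1" running_count="$2"
  if [ "$running_count" -lt "$desired_count" ]; then
    echo $((desired_count - running_count))
  else
    echo 0
  fi
}

tasks_to_start 1 0   # prints: 1  (the only task crashed, start a replacement)
tasks_to_start 1 1   # prints: 0  (nothing to do)
```

With Desired Tasks set to 1, a crashed task leaves the running count at 0, so one replacement is started.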


Step 10: Networking

Choose:

Default VPC

Subnets:

Select all public subnets

Assign Public IP:

Enabled

Security Group

Create:

student-web-sg

Inbound:

HTTP 80
Anywhere

Why Security Groups Matter

Security groups act as virtual firewalls for AWS resources such as ECS tasks and load balancers.

Bad:

All ports open

Good:

Only required ports

Enterprise rule:

Least Privilege

Step 11: Create Application Load Balancer

Name:

student-alb

Scheme:

Internet-facing

Listener:

HTTP 80

Target Type:

IP

Why ALB Exists

Without ALB:

User
   |
Container

One container failure = outage.

With ALB:

User
   |
ALB
   |
Multiple Containers

Traffic distributes automatically.


Step 12: Connect Service to ALB

Target Group:

student-target-group

Container:

website

Container Port:

80

Create Service.

Wait:

2-5 minutes

Step 13: Verify Deployment

Open:

ECS Cluster

Check:

Tasks = Running

Status:

Healthy

Step 14: Test Website

Open:

Load Balancer DNS Name

Example:

http://student-alb-123456.us-east-1.elb.amazonaws.com

Website should load.

Congratulations.

Your website is now running in AWS.


What Just Happened?

You moved from:

Laptop

to:

AWS Production Environment

Architecture:

GitHub
   |
Docker Build
   |
ECR
   |
ECS Task Definition
   |
ECS Service
   |
ALB
   |
Users

DevOps Engineer Responsibilities

When deploying production applications:

Verify Image

Make sure:

Correct version
Correct tag
No vulnerabilities

Verify Networking

Check:

Security Groups
Subnets
Ports

Verify Health Checks

Make sure:

Container starts
Container stays healthy

Verify Logs

Check:

CloudWatch Logs

Verify Scaling

Can the application survive:

10 users?
100 users?
1000 users?

Common Production Issues

Wrong Port

Container:

80

ALB target group:

8080

Result:

Application unreachable

Wrong Image

Task Definition:

v1

Expected:

v2

Result:

Old application deployed

Security Group Blocked

No inbound:

80

Result:

Website inaccessible

Health Check Failure

ALB health check:

/

Container returns:

500

Result:

Target unhealthy
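
The health check decision can be sketched as a comparison between the status code the container returns and the code the load balancer expects. In this sketch, the function name health_from_status is hypothetical, and treating 200 as the only healthy status is an assumption; a target group can be configured with other success codes.

```bash
# Sketch: decide whether a target counts as healthy from its HTTP status code.
health_from_status() {
  local status_code="$1" expected_code="${2:-200}"
  if [ "$status_code" = "$expected_code" ]; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}

# A real status code could be fetched like this (ALB_DNS_NAME is a placeholder):
#   status_code=$(curl -s -o /dev/null -w '%{http_code}' "http://ALB_DNS_NAME/")
health_from_status 200   # prints: healthy
health_from_status 500   # prints: unhealthy
```

When a target is reported as unhealthy, checking the status code of the health check path by hand is often the quickest first step.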

Student Deliverables

Each student submits:

  1. ECR Repository screenshot
  2. Successful Docker Push screenshot
  3. ECS Cluster screenshot
  4. ECS Service screenshot
  5. Running Task screenshot
  6. ALB screenshot
  7. Website URL
  8. Screenshot of website running from ALB DNS

Final Architecture

Developer
    |
GitHub
    |
Docker Build
    |
Amazon ECR
    |
Amazon ECS Fargate
    |
Application Load Balancer
    |
Internet
    |
Users

This deployment model is widely used for production applications.


Lab 3 Preview

In Lab 3 we will automate everything.

Instead of manually:

Build
Push
Deploy

students will create:

GitHub Push
      |
      V
GitHub Actions
      |
      V
Docker Build
      |
      V
ECR Push
      |
      V
ECS Deployment

This is where you will begin learning real CI/CD and how enterprise DevOps teams deploy applications automatically.

This is where students finally start feeling like DevOps engineers.

Lab 1: Website → Docker

Lab 2: Docker → ECS

Lab 3: GitHub Push → Automatic Deployment

The goal is:

Developer changes code
       |
       V
Git Push
       |
       V
GitHub Actions
       |
       V
Docker Build
       |
       V
ECR
       |
       V
ECS Update
       |
       V
Production Updated

Before Lab 3:

Student manually:
docker build
docker push
update ECS

After Lab 3:

git push

Everything else happens automatically

DevOps Lab 3: CI/CD Pipeline with GitHub Actions

Business Problem

Imagine 50 developers.

Every day:

Developer A pushes code
Developer B pushes code
Developer C pushes code
Developer D pushes code

Should DevOps manually run:

docker build
docker push
update ecs

50 times per day?

No.

This is why CI/CD exists.


What is CI?

Continuous Integration

Whenever code changes:

Compile
Test
Validate

automatically.


What is CD?

Continuous Delivery / Deployment

Whenever code passes tests:

Build
Deploy

automatically.


Final Architecture

Developer
    |
    V
GitHub
    |
    V
GitHub Actions
    |
    +----------------+
    | Build Image    |
    | Security Scan  |
    | Push To ECR    |
    +----------------+
    |
    V
Amazon ECR
    |
    V
Amazon ECS
    |
    V
Application Load Balancer
    |
    V
Users

Step 1: Create IAM User for GitHub

Search:

IAM

Create user:

github-actions-user

Permissions:

AmazonEC2ContainerRegistryFullAccess

AmazonECS_FullAccess

For bootcamp this is okay.

Later we will reduce permissions.


Why?

GitHub must authenticate to AWS.

GitHub needs:

Access Key
Secret Key

to:

Push images
Update ECS

Step 2: Create Access Keys

Open the IAM user and create:

Access Key
Secret Key

Save them.

Students will use them in GitHub Secrets.


Step 3: Configure GitHub Secrets

Open Repository.

Settings → Secrets and variables → Actions → New repository secret

Create:

AWS_ACCESS_KEY_ID

Create:

AWS_SECRET_ACCESS_KEY

Create:

AWS_REGION

Example:

us-east-1

Create:

AWS_ACCOUNT_ID

Example:

123456789012

Why Secrets?

Never do this:

AWS_SECRET_ACCESS_KEY: abc123

inside source code.

If somebody gains access to the repository:

AWS account compromised

Secrets protect credentials.
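
Before pushing, it can help to search the repository for anything that looks like a hard-coded access key. The following is a rough sketch, not a replacement for a real secret scanner: the function name scan_for_aws_keys is hypothetical, and the pattern (AKIA followed by 16 uppercase letters or digits) is an assumption based on the common shape of long-term AWS access key IDs.

```bash
# Sketch: search a folder for strings shaped like AWS access key IDs.
# Prints matching lines as file:line:text; exits non-zero when nothing matches.
scan_for_aws_keys() {
  local repo_dir="${1:-.}"
  grep -rEn --exclude-dir=.git 'AKIA[0-9A-Z]{16}' "$repo_dir"
}

# Usage (from the repository root):
#   scan_for_aws_keys . && echo "Possible access key found. Do not push."
```

If a real key was ever committed, deleting the line is not enough, because the key stays in the Git history; the safe response is to deactivate that key in IAM and create a new one.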


Step 4: Create GitHub Actions Folder

Inside repository:

.github/
└── workflows/

Create:

deploy.yml

Why?

GitHub automatically reads:

.github/workflows

and executes workflows.


Step 5

Create Workflow

deploy.yml

name: Deploy Website

on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      # Download the repository contents onto the runner
      - name: Checkout
        uses: actions/checkout@v4

      # Load the AWS credentials stored in GitHub Secrets (Step 3)
      - name: Configure AWS
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      # Authenticate Docker against the account's ECR registry
      - name: Login ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      # Build the image and tag it with the full ECR repository URI
      - name: Build Image
        run: |
          docker build -t website .
          docker tag website:latest \
            ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com/student-website:latest

      # Upload the tagged image to ECR
      - name: Push Image
        run: |
          docker push \
            ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.${{ secrets.AWS_REGION }}.amazonaws.com/student-website:latest

What Does This Pipeline Do?

When a student runs:

git push

GitHub:

Downloads code
Logs into AWS
Builds Docker image
Pushes image to ECR

automatically.
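The image reference used in the Build and Push steps follows a fixed pattern: account ID, region, repository name, and tag. The sketch below assembles it in JavaScript only to make the parts visible. `ecrImageUri` is a hypothetical helper, not part of any AWS SDK, and the values are the placeholder examples from Step 3.

```javascript
// Hypothetical helper: builds the ECR image reference the workflow tags and pushes.
// Pattern: <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
function ecrImageUri({ accountId, region, repository, tag = "latest" }) {
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error("AWS account IDs are 12 digits");
  }
  return `${accountId}.dkr.ecr.${region}.amazonaws.com/${repository}:${tag}`;
}

const uri = ecrImageUri({
  accountId: "123456789012",
  region: "us-east-1",
  repository: "student-website",
});
console.log(uri);
// 123456789012.dkr.ecr.us-east-1.amazonaws.com/student-website:latest
```

Passing a commit SHA as `tag` instead of relying on `latest` is a common way to make each build traceable, although this lab keeps `latest` for simplicity.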


Step 6

Commit Workflow

git add .

git commit -m "Add CI/CD"

git push

Step 7

Watch Pipeline

GitHub

Actions

Students will see:

Workflow Running

Then:

Success

What Just Happened?

Nobody manually executed:

docker build
docker push

GitHub did it.

This is CI/CD.


Enterprise Discussion

Students should understand:

A DevOps engineer is NOT paid to click buttons.

A DevOps engineer is paid to automate.

Bad DevOps:

Build manually
Deploy manually

Good DevOps:

Push code

Everything automatic

Step 8

Deploy New Version

Modify:

<h1>Version 2</h1>

Commit:

git add .

git commit -m "version2"

git push

Pipeline runs.

New image created.


Enterprise Problem

ECR now contains:

latest

But ECS still runs:

old container

Why?

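ECS does not watch ECR. Running tasks keep the image they pulled at startup, and ECS starts new containers only when a new deployment is triggered.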


Step 9

Force New Deployment

Pipeline:

- name: Deploy ECS

  run: |

    aws ecs update-service \
      --cluster student-cluster \
      --service student-service \
      --force-new-deployment

Now the pipeline:

Build Image
Push Image
Restart ECS Service

What Happens?

ECS:

Old Task
      ↓
New Task
      ↓
Pull Latest Image
      ↓
Run New Version

Production Deployment Flow

Student changes:

index.html

Pushes:

git push

Automatically:

GitHub Actions
      |
Build Docker
      |
Push ECR
      |
Restart ECS
      |
Pull Latest Image
      |
Deploy

No AWS Console.

No manual work.


What DevOps Engineers Monitor

Students must learn:

Pipeline Status

Success
Failed

Build Logs

Docker build errors

AWS Authentication

Credential failures

ECS Deployment

New task healthy?

Website

Application accessible?

Common Failures

Wrong Secret

Invalid AWS credentials

Pipeline fails.


Dockerfile Broken

docker build failed

Pipeline fails.


ECR Permission Missing

access denied

Pipeline fails.


ECS Service Name Wrong

service not found

Deployment fails.
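The four failures above can be kept as a small lookup table that maps a log line to a likely cause and a first thing to check. This is only a sketch: the patterns are the example messages from this section, real log wording differs between tools and versions, and `triage` is a hypothetical helper name.

```javascript
// Patterns are the example messages from this section; real log wording varies.
const KNOWN_FAILURES = [
  { pattern: /invalid aws credentials/i, cause: "Wrong secret", check: "Re-enter the AWS keys in GitHub Secrets" },
  { pattern: /docker build failed/i, cause: "Dockerfile broken", check: "Run docker build locally and read the failing step" },
  { pattern: /access denied/i, cause: "ECR permission missing", check: "Review the IAM policies attached to github-actions-user" },
  { pattern: /service not found/i, cause: "ECS service name wrong", check: "Compare --cluster and --service with the names in AWS" },
];

// Returns the first matching known failure, or a generic fallback.
function triage(logLine) {
  const hit = KNOWN_FAILURES.find(({ pattern }) => pattern.test(logLine));
  return hit
    ? { cause: hit.cause, check: hit.check }
    : { cause: "Unknown", check: "Read the full step log in the Actions tab" };
}

console.log(triage("Error: access denied").cause); // ECR permission missing
console.log(triage("service not found").cause);    // ECS service name wrong
```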


Student Deliverables

Each student submits:

  1. GitHub Actions workflow file
  2. Successful workflow screenshot
  3. ECR image screenshot
  4. ECS deployment screenshot
  5. Production URL
  6. Screenshot showing Version 2 deployed automatically

End Result

Students now understand:

Developer
     |
GitHub
     |
GitHub Actions
     |
Docker
     |
ECR
     |
ECS
     |
Load Balancer
     |
Users

This is the students' first complete end-to-end CI/CD pipeline.

Lab 4 covers Terraform. Students stop creating ECR, ECS, ALB, Security Groups, and networking manually and instead create the entire AWS infrastructure from code. This is where they stop being "people who deploy applications" and start thinking like Infrastructure Engineers / DevOps Engineers.

Up to now:

Lab 1

Website -> Docker

Lab 2

Docker -> ECS

Lab 3

GitHub -> CI/CD -> ECS

But there is a huge problem.

Imagine the company says:

We need 50 environments.

Dev
QA
UAT
Stage
Production

Will DevOps engineers manually click:

Create VPC
Create ALB
Create ECS
Create Security Groups
Create Target Groups
Create ECR

50 times?

No.

This is why Terraform exists.


LAB 4

Infrastructure as Code (Terraform)

Goal

Current situation:

Developer
    |
GitHub
    |
GitHub Actions
    |
ECR
    |
ECS
    |
ALB

But everything was created manually.

Goal:

Terraform
    |
    +-- VPC
    +-- Subnets
    +-- Security Groups
    +-- ALB
    +-- ECS
    +-- ECR

Everything created from code.


Enterprise Problem

Imagine:

Production breaks.

You must rebuild everything.

Without Terraform:

Nobody remembers:
- SG rules
- ECS settings
- ALB config
- Subnets

Disaster.

With Terraform:

terraform apply

Everything recreated.


Architecture

Students will build:

Terraform
    |
    V
VPC
    |
Subnets
    |
ALB
    |
ECS
    |
ECR

Step 1

Create Repository

Create new repo:

terraform-infrastructure

Structure:

terraform/
|
├── provider.tf
├── variables.tf
├── main.tf
├── outputs.tf
├── terraform.tfvars

Why Separate Repo?

Application repo:

Website Code

Infrastructure repo:

AWS Resources

Enterprise companies usually separate them.


Step 2

Install Terraform

Verify:

terraform version

Expected:

Terraform v1.x.x

Step 3

Create Provider

provider.tf

provider "aws" {
 region = "us-east-1"
}

Why Provider?

Terraform supports:

  • AWS
  • Azure
  • GCP
  • GitHub
  • Kubernetes

Provider tells Terraform:

Talk to AWS

Step 4

Configure Credentials

Never hardcode:

access_key="xxxx"
secret_key="xxxx"

Use:

aws configure

Verify:

aws sts get-caller-identity

Step 5

Create ECR

main.tf

resource "aws_ecr_repository" "website" {

  name = "student-website"
}

Why?

Before:

Student manually created ECR.

Now:

Terraform creates ECR.

Step 6

Create VPC

resource "aws_vpc" "main" {

  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "student-vpc"
  }
}

Enterprise Explanation

Everything lives inside a VPC.

Think:

AWS Datacenter
        |
        V
VPC
        |
        V
Your Private Network

Step 7

Create Public Subnets

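resource "aws_subnet" "public1" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
}

resource "aws_subnet" "public2" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "us-east-1b"
}

# A subnet is only public when its route table sends internet traffic to an
# internet gateway. Without one, the internet-facing ALB in Step 9 cannot be created.
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "public1" {
  subnet_id      = aws_subnet.public1.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "public2" {
  subnet_id      = aws_subnet.public2.id
  route_table_id = aws_route_table.public.id
}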

Why Two Subnets?

Enterprise applications require:

High Availability

If one AZ dies:

Other AZ survives
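Each subnet's CIDR block must sit inside the VPC's CIDR block, and the two subnets must not reuse the same range. The arithmetic behind that can be sketched in a few lines of JavaScript. The helper names below are illustrative and have nothing to do with Terraform itself; they only check the numbers used in Steps 6 and 7.

```javascript
// Convert dotted IPv4 notation to a plain number (0 .. 2^32 - 1).
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Parse "a.b.c.d/n" into its first address and its total address count.
function parseCidr(cidr) {
  const [ip, prefix] = cidr.split("/");
  const size = 2 ** (32 - Number(prefix));
  const start = Math.floor(ipToInt(ip) / size) * size; // align to the block boundary
  return { start, size };
}

// True when every address of `inner` falls inside `outer`.
function cidrContains(outer, inner) {
  const o = parseCidr(outer);
  const i = parseCidr(inner);
  return i.start >= o.start && i.start + i.size <= o.start + o.size;
}

console.log(parseCidr("10.0.0.0/16").size);                  // 65536
console.log(parseCidr("10.0.1.0/24").size);                  // 256
console.log(cidrContains("10.0.0.0/16", "10.0.1.0/24"));     // true
console.log(cidrContains("10.0.0.0/16", "10.1.0.0/24"));     // false
```

So the /16 VPC has room for 256 subnets of size /24. AWS reserves five addresses in every subnet, which leaves 251 usable addresses in each /24.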

Step 8

Create Security Group

resource "aws_security_group" "web" {

 name = "web-sg"

 vpc_id = aws_vpc.main.id

 ingress {

   from_port = 80
   to_port = 80

   protocol = "tcp"

   cidr_blocks = ["0.0.0.0/0"]
 }

 egress {

   from_port = 0
   to_port = 0

   protocol = "-1"

   cidr_blocks = ["0.0.0.0/0"]
 }
}

Why Security Groups?

Security Groups:

AWS Firewall

Protect resources.


Step 9

Create Application Load Balancer

resource "aws_lb" "website" {

 name = "student-alb"

 internal = false

 load_balancer_type = "application"

 security_groups = [
   aws_security_group.web.id
 ]

 subnets = [
   aws_subnet.public1.id,
   aws_subnet.public2.id
 ]
}

Why ALB?

ALB distributes traffic:

Users
  |
ALB
  |
Containers

Step 10

Create ECS Cluster

resource "aws_ecs_cluster" "main" {

 name = "student-cluster"
}

Why ECS Cluster?

Think:

ECS Cluster

as a parking lot.

Containers park inside.


Step 11

Terraform Init

Run:

terraform init

What Happens?

Terraform downloads:

AWS Provider
Plugins
Dependencies

Step 12

Terraform Validate

terraform validate

Expected:

Success! The configuration is valid.

Step 13

Terraform Plan

terraform plan

Example (simplified; real output lists every resource address and attribute):

+ create VPC
+ create ALB
+ create ECS
+ create ECR

Why Plan?

Plan allows engineers to review:

What will change?

Before touching production.


Step 14

Terraform Apply

terraform apply

Type:

yes

Terraform creates:

VPC
Subnets
Security Group
ALB
ECS
ECR

Enterprise Rule

Never do:

terraform apply -auto-approve

against production.

Always review.


Step 15

Verify

AWS Console:

Check:

VPC
Subnets
Security Groups
ALB
ECS
ECR

Everything should exist.


Step 16

Destroy Environment

This is the coolest part.

Run:

terraform destroy

Confirm:

yes

Terraform removes:

ALB
ECS
ECR
Security Group
Subnets
VPC

Why Destroy?

Cloud costs money.

DevOps engineers often create:

Temporary environments

for:

Developers
QA
Testing
Training

Destroying saves money.


Important DevOps Concepts

Desired State

Terraform says:

I want:
1 VPC
2 Subnets
1 ECS Cluster
1 ALB

Terraform compares:

Desired State
vs
Current State

and fixes differences.
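The comparison of desired state against current state can be pictured as a simple diff. The JavaScript below is a toy model of that idea, not how Terraform is implemented: the real tool also builds a dependency graph and sometimes has to replace a resource instead of updating it. The resource addresses reuse names from this lab, and `aws_ecr_repository.old` is invented for the example.

```javascript
// Toy model of a plan: compare desired resources with what currently exists.
// Keys are resource addresses; values are the attributes being tracked.
function plan(desired, current) {
  const actions = [];
  for (const [address, attrs] of Object.entries(desired)) {
    if (!(address in current)) {
      actions.push({ action: "create", address });
    } else if (JSON.stringify(attrs) !== JSON.stringify(current[address])) {
      // Simplistic comparison: assumes attributes are listed in the same order.
      actions.push({ action: "update", address });
    }
  }
  for (const address of Object.keys(current)) {
    if (!(address in desired)) {
      actions.push({ action: "delete", address });
    }
  }
  return actions;
}

const desired = {
  "aws_vpc.main": { cidr_block: "10.0.0.0/16" },
  "aws_subnet.public1": { cidr_block: "10.0.1.0/24" },
  "aws_subnet.public2": { cidr_block: "10.0.2.0/24" },
  "aws_ecs_cluster.main": { name: "student-cluster" },
};
const current = {
  "aws_vpc.main": { cidr_block: "10.0.0.0/16" },
  "aws_subnet.public1": { cidr_block: "10.0.9.0/24" }, // changed by hand in the console
  "aws_ecr_repository.old": { name: "legacy" },         // no longer present in the code
};

console.log(plan(desired, current));
```

The result lists one update, two creates, and one delete, which is the same kind of summary `terraform plan` gives before anything is touched.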


State File

Terraform creates:

terraform.tfstate

Think:

Database of infrastructure

Never delete it.

Enterprise companies store state in:

Amazon S3

and lock it using:

Amazon DynamoDB


What Students Must Understand

Without Terraform:

Click
Click
Click
Click
Click

No documentation.

No repeatability.

No automation.


With Terraform:

Code
Commit
Review
Apply

Infrastructure becomes:

Version Controlled
Auditable
Repeatable
Recoverable

Student Deliverables

Each student submits:

  1. GitHub Repository with Terraform code
  2. Screenshot of terraform init
  3. Screenshot of terraform plan
  4. Screenshot of terraform apply
  5. Screenshot showing:
     • VPC
     • Subnets
     • Security Group
     • ECS Cluster
     • ALB
     • ECR
  6. Screenshot of terraform destroy

Lab 5 Preview

Lab 5 is where everything becomes enterprise-grade.

Students will create:

Terraform
      |
GitHub
      |
Pull Request
      |
GitHub Actions
      |
Terraform Plan
      |
Manager Approval
      |
Terraform Apply
      |
AWS Infrastructure

At that point they will have:

Infrastructure as Code
+
CI/CD
+
Containerization
+
Cloud Deployment

which is very close to how modern DevOps teams operate in production.

Lab 5 is where students stop being "people who know tools" and start understanding how enterprise DevOps teams actually work.

Up to now they learned:

Lab 1  Website -> Docker
Lab 2  Docker -> ECS
Lab 3  GitHub Actions -> CI/CD
Lab 4  Terraform -> Infrastructure as Code

But there is still a huge problem.

Everything is being done by one person.

Real companies do not work like that.


LAB 5

Enterprise DevOps Workflow

Goal

Students will simulate a real company.

Instead of:

Developer
    |
Production

they will build:

Developer
    |
Pull Request
    |
Code Review
    |
Terraform Plan
    |
Approval
    |
Terraform Apply
    |
Production

Business Problem

Imagine:

A developer accidentally changes:

resource "aws_db_instance"

and deletes the production database.

Who should stop this?

Terraform?

No.

GitHub?

No.

DevOps Process.

This is why enterprise companies use:

Pull Requests
Approvals
Code Reviews
Change Control

Architecture

Students will build:

Developer
    |
Feature Branch
    |
Pull Request
    |
GitHub Actions
    |
Terraform Plan
    |
Approval
    |
Merge
    |
Terraform Apply
    |
AWS

Learning Objectives

Students learn:

  • Git workflow
  • Pull Requests
  • Branching strategy
  • Terraform Plan
  • Terraform Apply
  • Approvals
  • Production deployment
  • Change management

Step 1

Create Branch

Never work directly on:

main

Create:

git checkout -b feature/new-alb

Verify:

git branch

Output:

main
* feature/new-alb

Enterprise Rule

Bad:

Developer changes production directly

Good:

Developer
   |
Feature Branch

Step 2

Modify Terraform

Example: add a new security group.

resource "aws_security_group" "web" {
 ...
}

Commit:

git add .

git commit -m "Add web security group"

Push:

git push origin feature/new-alb

Step 3

Create Pull Request

GitHub:

Compare & Pull Request

Create PR.

Title:

Add Security Group For Web Layer

Why Pull Requests?

Pull Requests allow:

Review
Discussion
Approval
Audit Trail

Enterprise Example

Developer writes:

cidr_blocks = ["0.0.0.0/0"]

Reviewer asks:

Why open to the internet?

Bug found before production.


Step 4

Terraform Plan in GitHub Actions

Create:

.github/workflows/terraform-plan.yml

Example:

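name: Terraform Plan

on:
  pull_request:

jobs:
  plan:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      # Terraform needs AWS credentials to read state and query resources.
      # Assumes the same secrets as in Lab 3 have been added to this repository.
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - uses: hashicorp/setup-terraform@v3

      # If the .tf files live in a subfolder, set working-directory on these steps.
      - run: terraform init

      - run: terraform plan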

What Happens?

Developer opens PR.

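Automatically:

Terraform Init
Terraform Plan

run. Terraform Validate joins them in Step 5.

Until the remote state from Step 10 is in place, the runner starts without a state file, so the plan lists every resource as new.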


Why?

Before merge:

Everyone sees:

What will Terraform change?

Example Output

+ Create Security Group

+ Create ALB

~ Modify ECS Service

Management can review.
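Reviewers care most about changes that destroy something. One way to surface those is to scan the machine-readable plan for delete actions. The sketch below assumes the JSON printed by `terraform show -json` for a saved plan file, where each entry of `resource_changes` carries an `address` and a `change.actions` list. The sample object is hand-written for illustration and is not real Terraform output; `destructiveChanges` is a hypothetical helper name.

```javascript
// Returns the addresses of all resources the plan would destroy.
// A replacement shows up as ["delete", "create"], so it is included too.
function destructiveChanges(planJson) {
  return (planJson.resource_changes || [])
    .filter((rc) => rc.change.actions.includes("delete"))
    .map((rc) => rc.address);
}

// Hand-written sample, trimmed to the fields this function reads.
const samplePlan = {
  resource_changes: [
    { address: "aws_security_group.web", change: { actions: ["create"] } },
    { address: "aws_lb.website", change: { actions: ["update"] } },
    { address: "aws_db_instance.postgres", change: { actions: ["delete", "create"] } },
  ],
};

const risky = destructiveChanges(samplePlan);
if (risky.length > 0) {
  console.log("Needs extra review, these resources would be destroyed:", risky);
}
```

A check like this does not replace the human review. It only makes sure a destroyed database is hard to overlook in a long plan.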


Step 5

Add Validation

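Add these steps to the plan workflow, after terraform init (terraform validate needs an initialized directory):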

- run: terraform fmt -check

- run: terraform validate

Why?

Checks:

Syntax
Formatting
Errors

before deployment.


Enterprise Rule

Bad:

Merge first
Fix later

Good:

Validate first
Merge later

Step 6

Branch Protection

GitHub

Settings

Branches

Add Rule

Protect:

main

Require:

Pull Request
Review
Successful Checks

Why?

Nobody can directly push:

git push origin main

Enterprise Example

Without protection:

Developer deletes VPC
Pushes code
Production outage

With protection:

Review required

Mistake caught.


Step 7

Reviewer Approval

Student A creates PR.

Student B reviews.

Checklist:

Terraform valid?
Naming correct?
Resources required?
Security issues?

Approve.


Why?

Four eyes principle.

At least two people see changes.
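The merge rule from Steps 6 and 7 can be written down as a small predicate. This is a toy model to make the rule precise, not the GitHub API: the `pr` object shape and the `canMerge` name are assumptions made for the example.

```javascript
// A pull request may merge only when someone other than the author approved it
// and the automated checks (fmt, validate, plan) passed.
function canMerge(pr) {
  const independentApprovals = pr.approvals.filter((reviewer) => reviewer !== pr.author);
  return independentApprovals.length >= 1 && pr.checksPassed;
}

console.log(canMerge({ author: "student-a", approvals: ["student-b"], checksPassed: true }));  // true
console.log(canMerge({ author: "student-a", approvals: ["student-a"], checksPassed: true }));  // false
console.log(canMerge({ author: "student-a", approvals: ["student-b"], checksPassed: false })); // false
```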


Step 8

Merge PR

After approval:

Merge Pull Request

Now code reaches:

main

Step 9

Production Pipeline

Create:

terraform-apply.yml

Trigger:

on:

 push:

   branches:

     - main

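Pipeline:

terraform init

terraform validate

terraform plan -out=tfplan

terraform apply tfplan

Applying a saved plan file does not prompt for confirmation, which is what a non-interactive runner needs. The human review already happened in the pull request.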

What Happens?

Developer merges.

Automatically:

Terraform Apply

runs.

Infrastructure updates.


Enterprise Workflow

Feature Branch
      |
Pull Request
      |
Terraform Plan
      |
Approval
      |
Merge
      |
Terraform Apply
      |
AWS Updated

Step 10

Remote State

Current:

terraform.tfstate

stored locally.

Dangerous.

Laptop lost:

State lost

Create S3 Backend

terraform {

 backend "s3" {

   bucket = "student-terraform-state"

   key = "prod/terraform.tfstate"

   region = "us-east-1"
 }
}

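Why?

State shared by team.

The bucket must already exist, and S3 bucket names are globally unique, so each student should pick their own name. After adding the backend block, run terraform init again so Terraform can move the local state into the bucket.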


Step 11

State Locking

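Create a DynamoDB table whose partition key is a string attribute named LockID.

Then add this line to the backend "s3" block:

dynamodb_table = "terraform-locks"

Recent Terraform releases can also lock directly in S3 with use_lockfile = true, so check the backend documentation for the version in use.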

Why?

Prevent:

Engineer A
Engineer B

running:

terraform apply

at the same time.


Enterprise Problem

Without locking:

Corrupted state
Broken infrastructure
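What the lock table does can be modelled in a few lines: a lock entry is written only if none exists for that state file, so the second engineer is refused until the first one finishes. This is a toy in-memory sketch of the idea. The `LockTable` class is hypothetical and does not talk to DynamoDB.

```javascript
// Toy lock table: one entry per state file, written only if no entry exists yet.
class LockTable {
  constructor() {
    this.locks = new Map();
  }

  // Succeeds only when nobody holds the lock for this state file.
  acquire(stateKey, owner) {
    if (this.locks.has(stateKey)) {
      return { ok: false, heldBy: this.locks.get(stateKey) };
    }
    this.locks.set(stateKey, owner);
    return { ok: true };
  }

  // Only the current holder can release the lock.
  release(stateKey, owner) {
    if (this.locks.get(stateKey) === owner) {
      this.locks.delete(stateKey);
    }
  }
}

const table = new LockTable();
const key = "prod/terraform.tfstate";
console.log(table.acquire(key, "engineer-a").ok); // true
console.log(table.acquire(key, "engineer-b").ok); // false, engineer-a still holds it
table.release(key, "engineer-a");
console.log(table.acquire(key, "engineer-b").ok); // true
```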

Step 12

Environment Strategy

Current:

One Environment

Enterprise:

Dev
QA
Stage
Production

Example

terraform/dev

terraform/qa

terraform/prod

or

terraform workspace

Why?

Never test directly in production.


Step 13

Approval Gates

Production pipeline:

Terraform Plan
      |
Manager Approval
      |
Terraform Apply

GitHub Environments:

Production

Require:

Manual Approval

before apply.


Enterprise Example

Black Friday.

Someone changes:

Load Balancer

Management wants review.

Approval gate prevents accidents.


DevOps Engineer Responsibilities

Students must understand:

A DevOps engineer is not paid for:

Creating EC2
Creating ECS
Running Terraform

They are paid for:

Preventing outages
Building automation
Reducing risk
Creating repeatable deployments

What Students Must Pay Attention To

Security

Never:

0.0.0.0/0 everywhere

Cost

Always review:

terraform plan

before apply.

Naming

Use:

dev-web-alb

qa-web-alb

prod-web-alb

not:

alb1
alb2
alb3
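The naming rule is easier to follow when names are generated rather than typed. A minimal sketch, assuming the pattern environment, component, resource type as in the examples above. `resourceName` and the environment list are illustrative; in Terraform the same effect is usually achieved with a variable inside the name string.

```javascript
const ENVIRONMENTS = ["dev", "qa", "stage", "prod"];

// Hypothetical helper: <environment>-<component>-<resource type>, e.g. prod-web-alb.
function resourceName(environment, component, resourceType) {
  if (!ENVIRONMENTS.includes(environment)) {
    throw new Error(`Unknown environment: ${environment}`);
  }
  return [environment, component, resourceType].join("-").toLowerCase();
}

console.log(resourceName("dev", "web", "alb"));  // dev-web-alb
console.log(resourceName("prod", "web", "alb")); // prod-web-alb
```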

State

Never manually edit:

terraform.tfstate

Student Deliverables

Each student submits:

Git

  • Feature branch screenshot
  • Pull Request screenshot

GitHub Actions

  • Terraform Plan successful

Review

  • PR approved by another student

Terraform

  • Successful Apply

AWS

Screenshots showing:

VPC
Subnets
ALB
Security Groups
ECS
ECR

Architecture Diagram

Student must draw:

Developer
    |
Feature Branch
    |
Pull Request
    |
GitHub Actions
    |
Terraform Plan
    |
Approval
    |
Terraform Apply
    |
AWS

Lab 6 Preview

Lab 6 is where the application becomes a true enterprise application.

Students will split their website into:

Frontend
Backend API
Database

and deploy:

React Frontend
     |
NodeJS API
     |
PostgreSQL RDS

using:

Terraform
+
GitHub Actions
+
ECS Fargate
+
ALB
+
Route53

This is usually the first lab where students see a complete production architecture similar to what many companies run today.

This is the lab where students finally understand why microservices exist.

Up to Lab 5 they deployed a simple website.

The problem is:

Everything is one application.

In enterprise companies, applications are usually separated.

For example:

Netflix

Frontend
Backend APIs
Authentication
Payments
Recommendations
Database
Monitoring

all separate services.


LAB 6

Transform Website into a Microservices Application

Business Problem

Current architecture:

Browser
   |
Website

Problem:

If one piece fails:

Entire application fails

Problem:

One developer changes website.

Another changes API.

They interfere with each other.

Problem:

Cannot scale independently.


Goal

Convert student website into:

Browser
   |
Frontend
   |
Backend API
   |
Database

What Students Will Build

Architecture:

Internet
    |
Application Load Balancer
    |
Frontend Container
    |
Backend Container
    |
RDS PostgreSQL

Real Enterprise Example

Think about Amazon.

Frontend:

amazon.com

Backend:

Products API
Orders API
Users API

Database:

PostgreSQL
Aurora
DynamoDB

Separate systems.


Step 1

Create New Repositories

Current:

student-website

Split into:

frontend

backend

terraform-infra

Why?

Different teams may own:

Frontend Team

Backend Team

Infrastructure Team

Step 2

Frontend Service

Students keep:

index.html

style.css

app.js

Convert into:

frontend/
|
├── index.html
├── style.css
├── app.js
├── Dockerfile

Container:

FROM nginx:alpine

COPY . /usr/share/nginx/html

Purpose

Frontend should only:

Display information
Collect user input
Call APIs

Step 3

Create Backend API

Folder:

backend/
|
├── app.js
├── package.json
├── Dockerfile

Simple API:

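const express = require("express");

const app = express();

// Allow the frontend on http://localhost:8080 to read responses from this API.
// "*" is acceptable for a local lab only; restrict the origin in production.
app.use((req, res, next) => {
  res.set("Access-Control-Allow-Origin", "*");
  next();
});

// Hardcoded sample data; replaced by PostgreSQL in Step 9.
app.get("/students", (req, res) => {
  res.json([{ name: "John" }]);
});

app.listen(5000, () => console.log("API listening on port 5000"));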

Why Backend?

Frontend should not contain business logic.

Bad:

HTML
|
Database

Good:

Frontend
|
Backend API
|
Database

Step 4

Backend Dockerfile

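FROM node:20-alpine

WORKDIR /app

# Copy the dependency manifest first so the npm install layer is cached
# until package.json changes.
COPY package*.json ./

RUN npm install

# Copy the application source.
COPY . .

EXPOSE 5000

CMD ["node","app.js"]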

What Students Learn

One application now has:

Frontend Image

Backend Image

instead of:

One Giant Image

Step 5

Build Images

Frontend:

docker build -t frontend:v1 .

Backend:

docker build -t backend:v1 .

Verify:

docker images

Step 6

Run Locally

Backend:

docker run -d -p 5000:5000 backend:v1

Frontend:

docker run -d -p 8080:80 frontend:v1

Verify API

Open:

http://localhost:5000/students

Should return:

[
 {
  "name":"John"
 }
]

Step 7

Frontend Calls API

JavaScript:

fetch("http://localhost:5000/students")

Now:

Frontend
     |
Backend

communicate.
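A bare fetch call only starts the request. A slightly fuller sketch of the frontend side follows, with error handling and the base URL passed in rather than hardcoded. `loadStudents` and the injectable `fetchImpl` parameter are assumptions made for this example, so the function can be tried without a running server.

```javascript
// Fetches the student list and returns just the names.
// `fetchImpl` defaults to the built-in fetch; a fake can be passed for testing.
async function loadStudents(baseUrl, fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/students`);
  if (!response.ok) {
    throw new Error(`API returned HTTP ${response.status}`);
  }
  const students = await response.json();
  return students.map((student) => student.name);
}

// Browser usage (assumes an element with id "students" exists in index.html):
// loadStudents("http://localhost:5000").then((names) => {
//   document.getElementById("students").textContent = names.join(", ");
// });
```

Keeping the base URL as a parameter matters later: once the application runs behind the load balancer, the same function can be called with a different address.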


Enterprise Concept

This is called:

Service Communication

Every modern application does this.


Step 8

Create RDS Database

Terraform:

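variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "postgres" {
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "postgres"

  # Supplied at apply time (for example via TF_VAR_db_password), never committed.
  password = var.db_password

  # Lab setting so terraform destroy works without a final snapshot.
  skip_final_snapshot = true
}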

Why Database?

Current API:

Hardcoded Data

Bad.

Need:

Persistent Storage

Step 9

Store Data in Database

Backend:

GET /students

POST /students

DELETE /students

Data stored in:

PostgreSQL

Enterprise Concept

Application should never store data:

Inside Container

Containers die.

Data must survive.


Step 10

ECS Architecture

Current:

One ECS Service

New:

Frontend ECS Service

Backend ECS Service

Architecture

ALB
|
|--- Frontend Service
|
|--- Backend Service

Step 11

ALB Routing

Frontend:

/

Backend:

/api/*

Example:

/

goes to:

Frontend

Example:

/api/students

goes to:

Backend

Why?

Users see:

https://website.com

But ALB routes traffic internally.

By default the ALB forwards the request path unchanged, so the backend must answer on /api/students rather than /students.
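The routing decision can be modelled as a rule list with a default. The sketch below is a toy version of how listener rules are evaluated: rules are checked in priority order, lowest number first, and the default target applies when nothing matches. The priority value and the `route` helper are made up for the example, and only a trailing `*` wildcard is modelled.

```javascript
// One path-based rule plus a default target, as in Step 11.
const rules = [
  { priority: 10, pathPattern: "/api/*", target: "backend" },
];
const defaultTarget = "frontend";

// Only the trailing "*" wildcard is modelled here.
function matches(pattern, path) {
  return pattern.endsWith("*")
    ? path.startsWith(pattern.slice(0, -1))
    : path === pattern;
}

// Pick the first matching rule by ascending priority, else the default.
function route(path) {
  const sorted = [...rules].sort((a, b) => a.priority - b.priority);
  const rule = sorted.find((r) => matches(r.pathPattern, path));
  return rule ? rule.target : defaultTarget;
}

console.log(route("/"));             // frontend
console.log(route("/style.css"));    // frontend
console.log(route("/api/students")); // backend
```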


Step 12

Push Images to ECR

Students now have:

frontend:v1

backend:v1

Push both.

Repositories:

frontend

backend

inside ECR.


Step 13

GitHub Actions

Frontend pipeline:

Build Frontend

Push ECR

Deploy ECS

Backend pipeline:

Build Backend

Push ECR

Deploy ECS

Separate pipelines.


Enterprise Concept

Frontend deployment should not break backend deployment.

Teams deploy independently.


Step 14

Environment Variables

Never hardcode:

password="123"

Use:

DATABASE_HOST

DATABASE_USER

DATABASE_PASSWORD

from ECS environment variables.
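On the backend side, reading those variables could look like the sketch below. It fails fast when a variable is missing, so a misconfigured task stops at startup instead of failing on the first database call. `loadDbConfig` is a hypothetical helper, and `DATABASE_PORT` is an extra, assumed variable name that is not part of the list above.

```javascript
// Reads database settings from the environment and fails fast when one is missing.
function loadDbConfig(env = process.env) {
  const required = ["DATABASE_HOST", "DATABASE_USER", "DATABASE_PASSWORD"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    host: env.DATABASE_HOST,
    user: env.DATABASE_USER,
    password: env.DATABASE_PASSWORD,
    port: Number(env.DATABASE_PORT || 5432), // optional; 5432 is the PostgreSQL default
  };
}

// Example with explicit placeholder values; inside the container,
// call loadDbConfig() with no argument to read the real environment.
const config = loadDbConfig({
  DATABASE_HOST: "db.example.internal",
  DATABASE_USER: "postgres",
  DATABASE_PASSWORD: "from-secret-store",
});
console.log(config.host, config.port); // db.example.internal 5432
```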


Enterprise Security

Even better:

AWS Secrets Manager

Step 15

Monitoring

Add:

CloudWatch Logs

For:

Frontend Logs

Backend Logs

What DevOps Engineers Check

Frontend:

Is website loading?

Backend:

Are APIs responding?

Database:

Is RDS healthy?

Enterprise Scaling Example

Current traffic:

100 users

Need:

5000 users

Scale:

Backend x10

Keep:

Frontend x2

Microservices allow independent scaling.
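The numbers in this example follow from simple arithmetic: tasks needed equals users divided by what one task can serve, rounded up. The per-task capacities below are invented so that the result matches the example; real values have to come from load testing.

```javascript
// Tasks needed = ceil(users / users one task can serve), never below a floor.
function tasksNeeded(users, usersPerTask, minimum = 2) {
  return Math.max(minimum, Math.ceil(users / usersPerTask));
}

// Assumed capacities for illustration only; measure real ones with load tests.
const capacity = { frontend: 2500, backend: 500 };

for (const users of [100, 5000]) {
  console.log(users, {
    frontend: tasksNeeded(users, capacity.frontend),
    backend: tasksNeeded(users, capacity.backend),
  });
}
// 100 users:  frontend 2, backend 2
// 5000 users: frontend 2, backend 10
```

The frontend stays at its minimum of two tasks because serving static files is cheap, while the backend grows to ten. With a single combined service, both would have to scale together.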


Common Problems

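Frontend Cannot Reach API

Bad:

localhost

in the deployed frontend.

The frontend JavaScript runs in the visitor's browser, so localhost points at the visitor's own machine, not at the backend container.

Browser calls must go through the public ALB, for example with a relative path:

/api/students

Calls between containers (server to server) should use:

Internal ALB

Service Discovery

DNS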

Database Security Group

Backend cannot connect.

Need:

5432 allowed

from backend SG.


Hardcoded Secrets

Bad:

GitHub repository

Good:

Secrets Manager

Student Deliverables

Frontend

  • GitHub Repository
  • Dockerfile
  • Running ECS Service

Backend

  • GitHub Repository
  • Dockerfile
  • Running ECS Service

Database

  • RDS Screenshot

Architecture Diagram

Browser
    |
ALB
    |
Frontend ECS
    |
Backend ECS
    |
PostgreSQL RDS

GitHub Actions

  • Frontend pipeline
  • Backend pipeline

What Students Learn

At the end of Lab 6 they understand:

Frontend
Backend
Database
Containers
ECS
ALB
Terraform
GitHub Actions
CI/CD
RDS

This is the first architecture that starts looking like a real enterprise application.


Lab 7 Preview

Lab 7 will introduce:

Prometheus
Grafana
CloudWatch
Loki
Alerting

Students will learn:

How to know something is broken
How to collect metrics
How to troubleshoot production issues
How DevOps engineers monitor applications

Because a production application is not finished when it is deployed. A production application is finished only when it can be monitored, troubleshot, and recovered when something goes wrong.

LAB 7 – Monitoring, Logging, Alerting & Troubleshooting

This is one of the most important DevOps labs.

Many junior engineers think:

Deploy = Job Done

In reality:

Deploy = Beginning of Operations

The first question management asks after deployment is:

"How do we know if the application is healthy?"

If your answer is:

I open browser and check.

You are not operating at enterprise level.


Business Scenario

Your architecture now looks like:

Internet
    |
ALB
    |
Frontend ECS
    |
Backend ECS
    |
RDS PostgreSQL

Imagine:

Website suddenly slow

Management asks:

Why?

Can students answer?

No.

Because they have:

No Metrics
No Logs
No Alerts

This lab fixes that.


Goal

Students will build:

Frontend ECS
        |
Backend ECS
        |
CloudWatch Logs
        |
Prometheus
        |
Grafana
        |
Alerting

Learning Objectives

Students will understand:

  • What monitoring is
  • What logging is
  • What metrics are
  • What alerts are
  • Difference between CloudWatch and Prometheus
  • Difference between logs and metrics
  • Root cause analysis

Enterprise Architecture

Users
   |
ALB
   |
Frontend ECS
   |
Backend ECS
   |
RDS
   |
--------------------
Monitoring Stack
--------------------
CloudWatch
Prometheus
Grafana
Alerting

Part 1

Understanding Logs

Example:

Backend error:

Database Connection Failed

How do we know?

Logs.

Example log:

2026-08-15 10:15:22 ERROR Database Connection Failed

Logs tell:

What happened
When
Where
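A small sketch of how a backend could produce a line in the same shape as the example log above. The helper `formatLogLine` is hypothetical; real projects usually rely on a logging library, and anything written to stdout is what a container log driver picks up.

```javascript
// Sketch: build a log line in the "timestamp LEVEL message" shape shown above.
// formatLogLine is a hypothetical helper, not a library function.
function formatLogLine(date, level, message) {
  // toISOString() gives "2026-08-15T10:15:22.000Z"; keep date and time only.
  const timestamp = date.toISOString().slice(0, 19).replace("T", " ");
  return `${timestamp} ${level.toUpperCase()} ${message}`;
}

const line = formatLogLine(
  new Date(Date.UTC(2026, 7, 15, 10, 15, 22)), // months are zero-based: 7 = August
  "error",
  "Database Connection Failed"
);
console.log(line); // 2026-08-15 10:15:22 ERROR Database Connection Failed
```

The timestamp answers "when", the level and message answer "what", and the log group or stream the line lands in answers "where".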

What DevOps Engineers Use Logs For

Examples:

Application crash
Database failure
Memory issue
Security event

Without logs:

Guessing

With logs:

Evidence

Part 2

CloudWatch Logs

Open ECS Service.

Enable:

CloudWatch Logging

Container definition:

logConfiguration

Example:

{
 "logDriver":"awslogs"
}
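In practice the awslogs driver also needs to know where to send the logs. A sketch of a fuller `logConfiguration`, built as a JavaScript object and printed as JSON; the group name, region, and stream prefix are placeholder values, not taken from this project.

```javascript
// Sketch: a fuller logConfiguration for one container definition.
// Group name, region, and prefix are placeholders chosen for illustration.
const logConfiguration = {
  logDriver: "awslogs",
  options: {
    "awslogs-group": "/backend",         // CloudWatch log group (assumed name)
    "awslogs-region": "us-east-1",       // region where the log group lives
    "awslogs-stream-prefix": "backend",  // prefix for each task's log stream
  },
};

// Print as JSON, the format used inside an ECS task definition.
console.log(JSON.stringify({ logConfiguration }, null, 2));
```

The log group must already exist unless the task is allowed to create it, so a missing group is a common reason for a task that fails to start.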

Why CloudWatch?

With the awslogs log driver, AWS automatically stores whatever each container writes to stdout and stderr:

Container Logs
Application Logs

Verify Logs

Open:

CloudWatch

Navigate:

Log Groups

Students should see one log group per service (the exact names depend on the log group configured in each task definition), for example:

/frontend

/backend

Exercise

Break API intentionally.

Example:

throw new Error("Database Failed");

Deploy.

Find error inside CloudWatch.


Lesson

Production engineers spend enormous amounts of time reading logs.


Part 3

Metrics

Logs tell:

What happened

Metrics tell:

How healthy system is

Examples:

CPU
Memory
Requests
Latency
Errors

Example

Website loads slowly.

Metrics show:

CPU = 98%

Problem found.


Part 4

Install Prometheus

Create ECS Service:

Prometheus

Container image:

prom/prometheus

Port:

9090

What Prometheus Does

Prometheus collects:

CPU
Memory
Requests
Response Time
Errors

at a fixed scrape interval (commonly set to 15 seconds; Prometheus's built-in default is 1 minute).

Think:

CloudWatch = AWS monitoring

Prometheus = Application monitoring

Architecture

Backend
   |
Prometheus

Prometheus continuously collects metrics.


Part 5

Expose Application Metrics

Backend application:

Install:

npm install prom-client

Add endpoint:

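// Assumes an Express app and `const client = require("prom-client")`
app.get("/metrics", async (req, res) => res.set("Content-Type", client.register.contentType).send(await client.register.metrics()));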

Example:

http://backend:5000/metrics

What Happens?

Prometheus visits:

/metrics

once per scrape interval.

Collects:

Request Count
Errors
Response Time

Enterprise Concept

This is called:

Instrumentation

Applications must expose metrics.
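To make instrumentation concrete, here is a sketch of what a `/metrics` response body looks like in the Prometheus text format. It is hand-rolled on purpose so it runs without dependencies; in the real backend prom-client generates this output. The metric name `http_requests_total` and the helper names are assumptions for illustration.

```javascript
// Sketch: the Prometheus text exposition format, produced by hand.
// In a real backend prom-client does this; names here are illustrative.
const counters = new Map(); // key: "METHOD status", value: request count

function recordRequest(method, status) {
  const key = `${method} ${status}`;
  counters.set(key, (counters.get(key) || 0) + 1);
}

function renderMetrics() {
  const lines = [
    "# HELP http_requests_total Total HTTP requests handled.",
    "# TYPE http_requests_total counter",
  ];
  for (const [key, value] of counters) {
    const [method, status] = key.split(" ");
    lines.push(`http_requests_total{method="${method}",status="${status}"} ${value}`);
  }
  return lines.join("\n") + "\n";
}

// Simulate three requests, then print what Prometheus would scrape.
recordRequest("GET", 200);
recordRequest("GET", 200);
recordRequest("POST", 500);
console.log(renderMetrics());
```

Each scrape reads the current counter values; Prometheus derives request rates and error rates from how those counters change between scrapes.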


Part 6

Install Grafana

Create ECS Service with container image:

grafana/grafana

Port:

3000

Open:

http://grafana:3000

Default login (Grafana asks you to set a new password on first sign-in):

Username: admin
Password: admin

What Grafana Does

Prometheus stores data.

Grafana visualizes data.

Example:

CPU Usage Chart
Memory Usage Chart
Request Count
Error Count

Architecture

Application
     |
Prometheus
     |
Grafana

Exercise

Create dashboard:

CPU
Memory
Requests
Errors

What DevOps Engineers Watch Daily

Infrastructure

CPU
Memory
Disk
Network

Application

Response Time
Errors
Requests
Availability

Database

Connections
Latency
Storage

Part 7

Alerting

Management does not want:

Engineer watching dashboard 24 hours

Need:

Automatic alerts

Example Alert

If:

CPU > 80%
for 5 minutes

Send:

Email
Slack
Teams
PagerDuty
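The "for 5 minutes" part matters: a single spike should not page anyone. A sketch of that rule, assuming one CPU sample per minute; `shouldAlert` is a hypothetical helper, and Grafana or CloudWatch implement this logic for you.

```javascript
// Sketch: "CPU > 80% for 5 minutes" means every one of the last 5
// one-minute samples is above the threshold. shouldAlert is hypothetical.
function shouldAlert(cpuSamples, threshold, minutes) {
  if (cpuSamples.length < minutes) return false; // not enough history yet
  return cpuSamples.slice(-minutes).every((cpu) => cpu > threshold);
}

// Sustained load: the last five samples are all above 80.
console.log(shouldAlert([60, 85, 90, 88, 92, 95], 80, 5)); // true

// Brief dip to 70 inside the window: no alert.
console.log(shouldAlert([85, 90, 70, 92, 95], 80, 5)); // false
```

Requiring the whole window to breach the threshold trades a few minutes of detection delay for far fewer false alarms.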

Create Alert

Grafana Alert:

CPU > 80%

Trigger:

Email Notification

Enterprise Example

2 AM:

Application crashes

Alert fires.

Engineer wakes up.

Problem fixed.


Part 8

CloudWatch Alarms

Create:

ECS CPU Alarm

Condition:

CPU > 75%

Action:

SNS Notification
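The same alarm can be described as data. A sketch of the parameters CloudWatch's PutMetricAlarm operation accepts for an ECS service CPU alarm; the alarm name, cluster, service, account ID, and SNS topic are placeholders, and the 5-period evaluation window is an assumption added for illustration.

```javascript
// Sketch: parameters for a CloudWatch alarm on ECS service CPU.
// Names, account ID, and the SNS topic ARN are placeholders.
const alarmParams = {
  AlarmName: "backend-cpu-high",
  Namespace: "AWS/ECS",
  MetricName: "CPUUtilization",
  Dimensions: [
    { Name: "ClusterName", Value: "student-cluster" },
    { Name: "ServiceName", Value: "backend" },
  ],
  Statistic: "Average",
  Period: 60,              // seconds per data point
  EvaluationPeriods: 5,    // assumption: 5 consecutive breaching periods
  Threshold: 75,           // CPU > 75%
  ComparisonOperator: "GreaterThanThreshold",
  AlarmActions: ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
};

console.log(JSON.stringify(alarmParams, null, 2));
```

Keeping the alarm as data makes it easy to move into Terraform later, so the alarm is created with the rest of the infrastructure instead of by hand.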

Why CloudWatch Alarms?

AWS infrastructure monitoring.

Examples:

ECS
ALB
RDS
Lambda

Part 9

Load Testing

Install:

ApacheBench

or

k6

Generate:

100 requests
1000 requests

Observe:

CPU
Memory
Response Time

inside Grafana.


What Students Learn

Applications behave differently under load.


Part 10

Troubleshooting Exercise

Instructor intentionally breaks:

Scenario 1

Backend stopped.

Students must identify the failure using:

CloudWatch Logs
Grafana
Prometheus

Scenario 2

Database unavailable.

Students identify:

Connection failures

inside logs.


Scenario 3

High CPU.

Students identify:

Metric spike

inside Grafana.


Root Cause Analysis

DevOps engineers do not say:

System down.

They explain:

Backend container restarted.

Database connections exhausted.

CPU reached 95%.

Response time increased.

Service unavailable.

Student Deliverables

Monitoring

Screenshot:

Prometheus Targets

Grafana

Dashboard showing:

CPU
Memory
Requests
Errors

CloudWatch

Log Group screenshot.


Alerting

Alert rule screenshot.


Troubleshooting

Student documents:

Problem
Root Cause
Fix

What Students Have Learned So Far

After Lab 7:

GitHub
Docker
ECR
ECS
ALB
Terraform
GitHub Actions
Frontend
Backend
RDS
CloudWatch
Prometheus
Grafana
Alerting

At this point they understand how a real production application is built and monitored.


LAB 8 Preview

Lab 8 is usually where I introduce:

Security
Secrets Manager
IAM
WAF
SSL/TLS
Vulnerability Scanning
Trivy
Least Privilege

because the next question management asks is:

"The application works. Is it secure?"

That is where students begin learning DevSecOps.

LAB 8 – DevSecOps: Securing the Production Environment

This lab changes the mindset of students.

Up to now they learned:

GitHub
Docker
ECR
ECS
Terraform
RDS
GitHub Actions
Prometheus
Grafana
CloudWatch

The application works.

Management asks:

"What happens if we get hacked tomorrow?"

Most junior engineers focus on deployment.

Senior DevOps engineers focus on:

Availability
Reliability
Security
Compliance

Business Scenario

Current architecture:

Internet
   |
ALB
   |
Frontend ECS
   |
Backend ECS
   |
RDS

Potential problems:

Hardcoded passwords
Open Security Groups
Exposed API Keys
Vulnerable Docker Images
No SSL
No WAF
Overprivileged IAM Roles

Goal

Secure the entire application.

Final architecture:

Internet
   |
AWS WAF
   |
HTTPS (SSL)
   |
ALB
   |
Frontend ECS
   |
Backend ECS
   |
Secrets Manager
   |
RDS

Learning Objectives

Students will learn:

  • IAM Least Privilege
  • Secrets Manager
  • SSL/TLS
  • AWS WAF
  • Container Security
  • Trivy Scanning
  • Image Hardening
  • Security Groups
  • Compliance Concepts
  • Security in CI/CD

Part 1

Principle of Least Privilege

Most beginners attach:

AdministratorAccess

for everything.

Bad practice.


Example

Developer needs:

Push Docker Images

Permission:

ECR Access

Only.

Not:

AdministratorAccess

Exercise

Current:

GitHub Actions User

Permissions:

AdministratorAccess

Students replace with:

AmazonEC2ContainerRegistryFullAccess

AmazonECS_FullAccess

Or even more restrictive custom policies.
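A sketch of what a more restrictive custom policy could look like for the "push Docker images" case: it allows pushing to a single ECR repository and nothing else. The account ID, region, and repository name are placeholders, and a real pipeline would need a second, similarly narrow statement for the ECS deployment step.

```javascript
// Sketch: an IAM policy document that only allows pushing images to one
// ECR repository. Account ID, region, and repository name are placeholders.
const ecrPushPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Sid: "GetAuthToken",
      Effect: "Allow",
      Action: "ecr:GetAuthorizationToken",
      Resource: "*", // this action cannot be limited to one repository
    },
    {
      Sid: "PushToOneRepository",
      Effect: "Allow",
      Action: [
        "ecr:BatchCheckLayerAvailability",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage",
      ],
      Resource: "arn:aws:ecr:us-east-1:123456789012:repository/backend",
    },
  ],
};

console.log(JSON.stringify(ecrPushPolicy, null, 2));
```

Compared with a managed full-access policy, a leaked credential with this policy can push to one repository but cannot delete repositories or touch any other service.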


Enterprise Rule

Always ask:

What is minimum access required?

Part 2

Secrets Manager

Current backend:

DB_PASSWORD="mypassword"

inside source code.

Very bad.


Problem

Repository leaked.

Now attacker knows:

Database Password
API Keys
Tokens

Solution

Store secrets in:

AWS Secrets Manager


Create Secret

Store:

DB_USERNAME
DB_PASSWORD
OPENAI_API_KEY

Backend Reads Secret

Instead of:

password="mypassword"

Use:

process.env.DB_PASSWORD
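One way the value can reach `process.env.DB_PASSWORD` on ECS is the `secrets` field of the container definition, which tells ECS to fetch the secret at task start and inject it as an environment variable. A sketch, with a placeholder image URI and secret ARN (the secret name and its random suffix are invented for illustration):

```javascript
// Sketch: a container definition fragment that injects a Secrets Manager
// value as an environment variable. Image URI and ARN are placeholders.
const containerDefinition = {
  name: "backend",
  image: "123456789012.dkr.ecr.us-east-1.amazonaws.com/backend:v1",
  secrets: [
    {
      name: "DB_PASSWORD", // becomes process.env.DB_PASSWORD in the container
      // Format: secret ARN, then ":json-key:version-stage:version-id"
      valueFrom:
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:backend-db-AbCdEf:DB_PASSWORD::",
    },
  ],
};

console.log(JSON.stringify(containerDefinition, null, 2));
```

With this approach the password never appears in source code, the Docker image, or the task definition itself; the role ECS uses to start the task needs permission to read that one secret.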

Enterprise Concept

Never hardcode:

Passwords
Tokens
Keys
Secrets

inside:

GitHub
Docker Images
Terraform Files

Part 3

Security Groups Review

Current:

0.0.0.0/0

everywhere.


Exercise

Review:

ALB

Allow:

80
443

from internet.


ECS

Allow:

80
5000

only from ALB SG.


RDS

Allow:

5432

only from Backend SG.
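The review above can be sketched as a small check: list every inbound rule, then flag any rule that is open to the internet on something other than the ALB. The rule shape and the `findPublicRules` helper are hypothetical and only model the idea; they are not an AWS API.

```javascript
// Sketch: flag inbound rules open to 0.0.0.0/0 on anything except the ALB.
// The rule objects are a simplified, made-up model of security group rules.
const inboundRules = [
  { group: "alb", port: 443, source: "0.0.0.0/0" },
  { group: "ecs", port: 5000, source: "sg-alb" },
  { group: "rds", port: 5432, source: "0.0.0.0/0" }, // misconfigured on purpose
];

function findPublicRules(rules, publicGroups = ["alb"]) {
  return rules.filter(
    (rule) => rule.source === "0.0.0.0/0" && !publicGroups.includes(rule.group)
  );
}

for (const rule of findPublicRules(inboundRules)) {
  console.log(`Open to the internet: ${rule.group} port ${rule.port}`);
}
```

Only the database rule is reported here, because the ALB is the one component that is supposed to accept traffic from anywhere.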


Enterprise Concept

Never expose database directly.

Bad:

Internet
     |
PostgreSQL

Good:

Internet
     |
ALB
     |
Backend
     |
Database

Part 4

Enable HTTPS

Current:

http://

Not encrypted.


Problem

Attacker can capture:

Passwords
Session IDs
Personal Data

Create Certificate

Use:

AWS Certificate Manager

Request certificate:

studentdomain.com

Attach To ALB

ALB Listener:

443 HTTPS

Result

Traffic becomes:

Encrypted

Enterprise Concept

Never expose login pages over HTTP.


Part 5

AWS WAF

What if someone sends:

1 million requests

or SQL Injection attempts?


Add WAF

Create:

AWS WAF

Attach to ALB.

Enable managed rules:

SQL Injection

Cross Site Scripting

Known Bad Inputs

Bot Protection

Enterprise Concept

WAF protects before requests reach application.


Part 6

Container Vulnerability Scanning

Current image:

FROM node:20

May contain vulnerabilities.


Install Trivy

Students install:

brew install trivy

or, on Debian/Ubuntu, after adding the Trivy apt repository described in the Trivy installation guide:

sudo apt install trivy

Scan Image

trivy image backend:v1

Output lists each finding with its severity:

CRITICAL
HIGH
MEDIUM
LOW

Exercise

Students identify:

High Vulnerabilities

and document findings.


Enterprise Concept

Every production image should be scanned before deployment.


Part 7

Security in GitHub Actions

Current:

Build
Push
Deploy

Add Security Stage

Pipeline (the image scan runs after the build, because Trivy needs a built image to scan, and before the push, so a failing image never reaches ECR):

Checkout
     |
Build
     |
Trivy Scan
     |
Push
     |
Deploy

Rule

If vulnerabilities at or above the agreed severity (for example HIGH or CRITICAL) are found:

Pipeline Fails

Why?

Stop insecure software before production.
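A sketch of the gate as code, assuming the scan was run with JSON output (`trivy image --format json`) and that the report has a `Results` array whose entries carry a `Vulnerabilities` list with a `Severity` field. The helper name and the sample report are made up for illustration.

```javascript
// Sketch: decide whether a Trivy JSON report should fail the pipeline.
// shouldFailPipeline is hypothetical; the sample report is invented.
function shouldFailPipeline(report, blocking = ["CRITICAL", "HIGH"]) {
  const results = report.Results || [];
  return results.some((result) =>
    (result.Vulnerabilities || []).some((v) => blocking.includes(v.Severity))
  );
}

const sampleReport = {
  Results: [
    {
      Target: "backend:v1",
      Vulnerabilities: [
        { VulnerabilityID: "CVE-0000-0001", Severity: "MEDIUM" }, // placeholder ID
        { VulnerabilityID: "CVE-0000-0002", Severity: "HIGH" },   // placeholder ID
      ],
    },
  ],
};

if (shouldFailPipeline(sampleReport)) {
  console.log("Blocking vulnerabilities found: fail the pipeline");
}
```

Trivy can also enforce this itself through its `--severity` and `--exit-code` options, which is usually simpler inside GitHub Actions; the function above only makes the rule explicit.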


Part 8

Docker Image Hardening

Bad:

FROM node:20

Large Debian-based image with many packages the application never uses.


Better:

FROM node:20-alpine

Benefits

Smaller image
Faster pulls and deployments
Smaller attack surface

Enterprise Rule

Use the smallest base image that still runs the application.


Part 9

IAM Roles for ECS

Current:

Access Keys

inside containers.

Bad.


Solution

Create:

Task Role

Attach permissions.

Example:

Read Secrets Manager

Benefits

No access keys stored.


Enterprise Concept

Containers should get AWS access through IAM roles, not stored access keys.


Part 10

Logging Security Events

CloudWatch Logs.

Log:

Failed Login

Unauthorized Access

API Errors

Suspicious Requests

Exercise

Students create:

Security Log Group

Part 11

Compliance Discussion

Introduce:

SOC2
HIPAA
PCI DSS
ISO 27001

Example

Healthcare Application:

Need:

Encryption
Audit Logs
Access Controls

Banking Application

Need:

Least Privilege
Monitoring
Incident Response

Part 12

Security Incident Simulation

Scenario:

GitHub repository leaked

Students answer:

Which secrets were exposed?

What must be rotated?

Which logs should be reviewed?

Which systems are affected?

Scenario 2

Database exposed.

Students answer:

How was it discovered?

How was it isolated?

How was it fixed?

Scenario 3

Critical Trivy finding.

Students answer:

Can the deployment continue?

Who approves an exception?

What DevOps Engineers Must Check Daily

IAM

Unused Users
Unused Roles
Excessive Permissions

Containers

Vulnerabilities
Outdated Images

Infrastructure

Open Ports
Public Resources

Secrets

Rotation
Expiration
Access

Student Deliverables

Security

Screenshots:

Secrets Manager

IAM Role

Security Groups

HTTPS Listener

WAF Rules

Container Security

Trivy Scan Report

CI/CD

Pipeline with:

Security Scan Stage

Documentation

Student writes:

Top 5 security risks

How risks were mitigated

What Students Have Learned After Lab 8

GitHub
Docker
ECR
ECS
Terraform
GitHub Actions
Frontend
Backend
RDS
CloudWatch
Prometheus
Grafana
Alerting
IAM
Secrets Manager
WAF
SSL/TLS
Trivy
DevSecOps

At this point students can build, deploy, monitor, and secure a production application.


LAB 9 Preview

Lab 9 is usually the capstone project:

High Availability
Auto Scaling
Blue/Green Deployment
Disaster Recovery
Multi-AZ
Backup Strategy
Route53
Production Readiness Review

This is where students learn how large companies keep applications available even when servers, containers, databases, or entire availability zones fail.

LAB 9 – Production Readiness, High Availability, Auto Scaling & Disaster Recovery

This is the capstone project.

Up to now students built:

Frontend
Backend
RDS
Terraform
GitHub Actions
Docker
ECS
Monitoring
Security

The application works.

The application is secure.

The application is monitored.

Management now asks:

"What happens if an Availability Zone fails?"

"What happens if traffic increases 100x?"

"What happens if a deployment breaks production?"

"What happens if somebody deletes the database?"

This is where real DevOps engineering begins.


Goal

Students will transform:

Good Application

into

Production-Ready Application

Final Architecture

Users
   |
Route53
   |
ALB
   |
-------------------
AZ-A      AZ-B
-------------------
Frontend  Frontend
Backend   Backend
-------------------
       |
RDS Multi-AZ
       |
Backups
       |
Monitoring

Learning Objectives

Students will learn:

  • High Availability
  • Multi-AZ Design
  • Auto Scaling
  • Blue/Green Deployment
  • Disaster Recovery
  • Backup Strategy
  • Route53
  • Production Readiness Reviews
  • SLA / SLO / Error Budgets

Part 1

High Availability

Current:

ALB
 |
Frontend
 |
Backend

Problem:

One container dies

Application unavailable.


Solution

Run multiple tasks.

Frontend:

Desired Tasks = 2

Backend:

Desired Tasks = 2

Architecture:

ALB
 |
 |------ Frontend 1
 |
 |------ Frontend 2

 |
 |------ Backend 1
 |
 |------ Backend 2

Enterprise Concept

Never deploy:

1 container

for production.

Minimum:

2 containers

across multiple AZs.


Exercise

Students stop:

Frontend Task 1

Verify:

Website still works

Part 2

Multi-AZ Deployment

Current:

One Availability Zone

Problem:

AWS AZ outage.

Entire application unavailable.


Solution

Deploy ECS into:

us-east-1a

us-east-1b

ALB:

AZ-A

AZ-B

Frontend:

AZ-A

AZ-B

Backend:

AZ-A

AZ-B

Enterprise Rule

Production workloads should span:

Multiple Availability Zones

Exercise

Students diagram:

AZ-A Failure

and explain:

Why application survives

Part 3

Auto Scaling

Business Problem

Traffic:

100 users

becomes:

10,000 users

One backend container cannot handle load.


ECS Auto Scaling

Create policy:

CPU > 70%

Scale:

2 Tasks -> 4 Tasks

Example

Current:

Backend x2

Traffic spike:

Backend x4

Automatically.
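A sketch of the two pieces Application Auto Scaling needs for an ECS service: a scalable target (the allowed range of tasks) and a scaling policy. This example uses target tracking, which keeps average CPU near 70% rather than reacting to a single "CPU > 70%" threshold; the cluster, service, and policy names and the cooldown values are placeholders.

```javascript
// Sketch: inputs for Application Auto Scaling on an ECS service.
// Cluster name, service name, policy name, and cooldowns are placeholders.
const resourceId = "service/student-cluster/backend";

// 1. Register the service's task count as scalable between 2 and 4 tasks.
const scalableTarget = {
  ServiceNamespace: "ecs",
  ResourceId: resourceId,
  ScalableDimension: "ecs:service:DesiredCount",
  MinCapacity: 2,
  MaxCapacity: 4,
};

// 2. Target tracking: add or remove tasks to hold average CPU near 70%.
const scalingPolicy = {
  PolicyName: "backend-cpu-70",
  ServiceNamespace: "ecs",
  ResourceId: resourceId,
  ScalableDimension: "ecs:service:DesiredCount",
  PolicyType: "TargetTrackingScaling",
  TargetTrackingScalingPolicyConfiguration: {
    TargetValue: 70,
    PredefinedMetricSpecification: {
      PredefinedMetricType: "ECSServiceAverageCPUUtilization",
    },
    ScaleOutCooldown: 60, // seconds to wait after adding tasks
    ScaleInCooldown: 120, // seconds to wait after removing tasks
  },
};

console.log(JSON.stringify({ scalableTarget, scalingPolicy }, null, 2));
```

The minimum of 2 preserves the high-availability rule from Part 1, and the maximum of 4 caps cost if traffic keeps growing.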


Enterprise Concept

Scaling should happen:

Automatically

not:

Engineer manually clicking buttons

Exercise

Generate traffic using:

k6

or

ApacheBench

Observe:

Scaling Event

inside ECS.


Part 4

Route53

Current:

alb-123.us-east-1.elb.amazonaws.com

Not professional.


Create Domain

Example:

studentproject.com

Route53:

A Record (Alias)

pointing to:

ALB
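Because an ALB has a DNS name rather than a fixed IP address, the record is an alias that points at that DNS name. A sketch of the change batch Route53's ChangeResourceRecordSets operation accepts; the hosted zone ID is a placeholder, and the ALB DNS name is the example one from this section.

```javascript
// Sketch: a Route53 change batch that points the domain at the ALB.
// The AliasTarget hosted zone ID below is a placeholder, not a real value.
const changeBatch = {
  Changes: [
    {
      Action: "UPSERT", // create the record, or update it if it exists
      ResourceRecordSet: {
        Name: "studentproject.com",
        Type: "A",
        AliasTarget: {
          HostedZoneId: "ZEXAMPLE123", // the load balancer's own hosted zone ID
          DNSName: "alb-123.us-east-1.elb.amazonaws.com",
          EvaluateTargetHealth: true,
        },
      },
    },
  ],
};

console.log(JSON.stringify(changeBatch, null, 2));
```

The same record is usually declared in Terraform next to the ALB, so the domain follows the load balancer if it is ever recreated.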

Result

Users visit:

https://studentproject.com

instead of:

https://alb-123.us-east-1.elb.amazonaws.com

Enterprise Concept

Customers never see AWS resource names.


Part 5

Blue/Green Deployment

Current Deployment

Version 1

Replace with:

Version 2

Risk:

Deployment breaks

Production down.


Blue Environment

Current Production

Green Environment

New Version

Architecture:

ALB
 |
Blue
 |
Version 1

ALB
 |
Green
 |
Version 2

Test Green

Verify:

Frontend
Backend
Database

working.


Switch Traffic

100%

moves to Green.


Rollback

If broken:

Traffic back to Blue

within seconds.
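The switch and the rollback are the same operation with different numbers. A sketch, assuming Blue and Green are two ALB target groups behind one listener and that traffic is split with a weighted forward action; `forwardAction` is a hypothetical helper and the ARNs are shortened placeholders.

```javascript
// Sketch: build an ALB "forward" action that splits traffic between the
// Blue and Green target groups. forwardAction is a hypothetical helper.
function forwardAction(blueArn, greenArn, greenPercent) {
  if (!Number.isInteger(greenPercent) || greenPercent < 0 || greenPercent > 100) {
    throw new RangeError("greenPercent must be an integer from 0 to 100");
  }
  return {
    Type: "forward",
    ForwardConfig: {
      TargetGroups: [
        { TargetGroupArn: blueArn, Weight: 100 - greenPercent },
        { TargetGroupArn: greenArn, Weight: greenPercent },
      ],
    },
  };
}

const BLUE = "arn:placeholder:targetgroup/blue";   // placeholder ARN
const GREEN = "arn:placeholder:targetgroup/green"; // placeholder ARN

const cutover = forwardAction(BLUE, GREEN, 100); // all traffic to Green
const rollback = forwardAction(BLUE, GREEN, 0);  // all traffic back to Blue
console.log(JSON.stringify(cutover.ForwardConfig.TargetGroups));
console.log(JSON.stringify(rollback.ForwardConfig.TargetGroups));
```

Rollback is fast because Blue is still running; nothing has to be rebuilt or redeployed, only the listener weights change.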


Enterprise Concept

Many large companies deploy this way.


Part 6

Database Backups

Question:

Database deleted

Now what?


RDS Automated Backups

Enable:

7 Day Retention

or

30 Day Retention

Create Snapshot

Manual Snapshot:

Pre-Release Backup

before deployment.


Exercise

Student documents:

Restore Procedure

Enterprise Rule

Every deployment should have:

Rollback Plan

Part 7

Disaster Recovery

Scenario:

Entire Region Fails

Example:

us-east-1 unavailable

Discussion

Recovery Options

Backup Restore

Recovery takes hours.

Pilot Light

Minimal Environment

running elsewhere.

Warm Standby

Reduced Environment

already running.

Multi-Region Active

Full Environment

in two regions.


Enterprise Concept

Recovery costs money.

Management chooses:

Cost
vs
Recovery Time

Part 8

SLA / SLO / Error Budget

Students learn:

SLA

Customer contract.

Example:

99.9%

availability.


SLO

Internal goal.

Example:

99.95%

availability.


Error Budget

Allowed downtime.

Example:

About 43 minutes per 30-day month

for 99.9%.
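The 43-minute figure comes from simple arithmetic: a 30-day month has 30 x 24 x 60 = 43,200 minutes, and 0.1% of that is 43.2 minutes. A sketch of the calculation; `allowedDowntimeMinutes` is a hypothetical helper, and the 30-day month is an assumption.

```javascript
// Sketch: allowed downtime per month for a given availability target.
// Assumes a 30-day month; allowedDowntimeMinutes is a hypothetical helper.
function allowedDowntimeMinutes(availabilityPercent, days = 30) {
  const totalMinutes = days * 24 * 60; // 43,200 minutes in 30 days
  const budget = ((100 - availabilityPercent) / 100) * totalMinutes;
  return Math.round(budget * 100) / 100; // round to two decimals
}

console.log(allowedDowntimeMinutes(99.9)); // 43.2
```

Every extra "nine" cuts the budget by a factor of ten, which is why higher availability targets get expensive quickly.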


Exercise

Calculate:

99.9%
99.95%
99.99%

allowed downtime.


Part 9

Production Readiness Review

Before deployment students answer:

Architecture

Multi-AZ?

Monitoring

Prometheus?
Grafana?
Alerts?

Security

IAM?
Secrets Manager?
HTTPS?
WAF?

Backups

Snapshots?
Retention?

Scaling

Auto Scaling?

Disaster Recovery

Recovery Plan?

Part 10

Failure Simulation Day

Instructor intentionally breaks:

Scenario 1

Stop:

Backend Task

Students verify:

Application survives

Scenario 2

Deploy broken version.

Students:

Rollback

using Blue/Green.


Scenario 3

Database issue.

Students:

Restore Snapshot

Scenario 4

Traffic spike.

Students verify:

Auto Scaling

triggered.


What DevOps Engineers Must Think About

Junior Engineer:

Can I deploy?

Senior Engineer:

Can I recover?

Student Deliverables

Architecture Diagram

Route53
   |
ALB
   |
Frontend x2
   |
Backend x2
   |
RDS Multi-AZ

Auto Scaling

Screenshot:

Scaling Policy

Route53

Domain working.


Blue/Green

Deployment demo.


Backup

Snapshot screenshot.


Disaster Recovery

Written recovery plan.


Production Readiness Report

Students submit:

Architecture
Security
Monitoring
Scaling
Backups
Recovery
Risks

Final Result After Labs 1–9

Students have built:

GitHub
     |
GitHub Actions
     |
Terraform
     |
Docker
     |
ECR
     |
ECS Fargate
     |
ALB
     |
Route53
     |
Frontend
     |
Backend
     |
RDS
     |
CloudWatch
Prometheus
Grafana
     |
Secrets Manager
IAM
WAF
TLS
     |
Auto Scaling
Blue/Green
Backups
Disaster Recovery

This sequence gives students something most bootcamps miss:

They don't just learn Docker, Terraform, AWS, and GitHub separately; they see exactly how a developer's code becomes a production application and how DevOps engineers keep it running, secure, scalable, and recoverable.
