The Big Picture
What we're building:
A production-ready TODO application with:
- Multiple microservices (5 different programming languages!)
- Automated infrastructure provisioning (Terraform)
- Automated configuration management (Ansible)
- CI/CD pipelines with drift detection
- Automatic HTTPS with Traefik
- Zero-trust security model
Architecture:
┌──────────────┐
│ Traefik │
│ (HTTPS/Proxy)│
└───────┬──────┘
│
┌──────────────────────┼──────────────────────┐
│ │ │
┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐
│ Frontend │ │ Auth API │ │ Todos API │
│ (Vue.js) │ │ (Go) │ │ (Node.js) │
└────────────────┘ └────────────────┘ └────────────────┘
│
┌──────────────┼──────────────┐
│ │ │
┌───────▼────────┐ ┌──▼────┐ ┌─────▼──────┐
│ Users API │ │ Redis │ │ Log │
│ (Java Spring) │ │ Queue │ │ Processor │
└────────────────┘ └───────┘ │ (Python) │
└────────────┘
Understanding the Application
The Services
1. Frontend (Vue.js)
- User interface
- Login page → TODO dashboard
- Communicates with backend APIs
- Port: 80/443 (via Traefik)
2. Auth API (Go)
- Handles user authentication
- Issues JWT tokens
- Endpoint:
/api/auth
3. Todos API (Node.js)
- Manages TODO items
- CRUD operations
- Requires valid JWT token
- Endpoint:
/api/todos
4. Users API (Java Spring Boot)
- User management
- Profile operations
- Endpoint:
/api/users
5. Log Processor (Python)
- Processes background tasks
- Consumes from Redis queue
- Writes audit logs
6. Redis Queue
- Message broker
- Task queue for async operations
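Before containerizing anything, it helps to see how these services talk to each other. Here is a hedged sketch of the request flow — the login route, JSON payload, and accessToken field below are illustrative assumptions, not taken from the actual APIs (jq is used to extract the token):
# 1. Authenticate and capture a JWT (route and payload are assumptions)
TOKEN=$(curl -s -X POST https://your-domain.com/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"johnd","password":"foobar"}' | jq -r '.accessToken')
# 2. Call the Todos API with the token; without it, expect a 401
curl -s https://your-domain.com/api/todos \
  -H "Authorization: Bearer $TOKEN"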
Phase 1: Containerization
Let's containerize each service. The key is understanding that each language has its own best practices.
Frontend Dockerfile (Vue.js)
# Multi-stage build for optimized production image
# Stage 1: Build the application
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install ALL dependencies — the build step needs devDependencies
# (build tooling), so don't use --omit=dev in the builder stage
RUN npm ci
# Copy source code
COPY . .
# Build for production
RUN npm run build
# Stage 2: Serve with nginx
FROM nginx:alpine
# Copy built assets from builder stage
COPY --from=builder /app/dist /usr/share/nginx/html
# Copy nginx configuration
COPY nginx.conf /etc/nginx/conf.d/default.conf
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget --quiet --tries=1 --spider http://localhost/ || exit 1
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Why multi-stage builds?
- Builder stage: 800MB (includes build tools)
- Final stage: 25MB (only nginx + static files)
- 97% size reduction!
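You can verify the size difference yourself by building each stage separately (the image tags here are illustrative):
# Build only the builder stage, then the full multi-stage image
docker build --target builder -t todo-frontend:builder ./frontend
docker build -t todo-frontend:latest ./frontend
# Compare sizes — the final image should be a fraction of the builder
docker images | grep todo-frontend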
Frontend nginx config:
server {
listen 80;
root /usr/share/nginx/html;
index index.html;
# SPA routing - send all requests to index.html
location / {
try_files $uri $uri/ /index.html;
}
# Cache static assets
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg)$ {
expires 1y;
add_header Cache-Control "public, immutable";
}
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
}
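Once the container is running, a quick way to confirm the caching and security headers behave as configured (the asset filename is illustrative):
# Static assets should carry long-lived cache headers
curl -sI http://localhost/assets/index.js | grep -iE 'cache-control|expires'
# Every response should include the security headers
curl -sI http://localhost/ | grep -iE 'x-frame-options|x-content-type-options'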
Auth API Dockerfile (Go)
# Multi-stage build for Go
# Stage 1: Build the binary
FROM golang:1.21-alpine AS builder
WORKDIR /app
# Copy go mod files
COPY go.mod go.sum ./
# Download dependencies
RUN go mod download
# Copy source code
COPY . .
# Build the binary
# CGO_ENABLED=0 creates a static binary (no external dependencies);
# the old -a and -installsuffix flags are unnecessary on modern Go
RUN CGO_ENABLED=0 GOOS=linux go build -o main .
# Stage 2: Create minimal runtime image
FROM alpine:latest
# Add ca-certificates for HTTPS calls
RUN apk --no-cache add ca-certificates
WORKDIR /root/
# Copy the binary from builder
COPY --from=builder /app/main .
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget --quiet --tries=1 --spider http://localhost:8080/health || exit 1
EXPOSE 8080
CMD ["./main"]
Why this approach?
- Builder stage: 400MB
- Final stage: 15MB (just Alpine + binary)
- Static binary = no runtime dependencies
- Faster startup, smaller attack surface
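If you want to confirm the binary really is static, inspect it inside the builder stage (exact output wording varies by libc):
# Run inside the builder image (e.g. docker run --rm -it <builder> sh)
file main   # expect: "statically linked"
ldd main    # expect: "not a dynamic executable" or similar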
Todos API Dockerfile (Node.js)
FROM node:18-alpine
WORKDIR /app
# Install dependencies first (better caching)
COPY package*.json ./
# --omit=dev is the modern replacement for the deprecated --only=production
RUN npm ci --omit=dev
# Copy application code
COPY . .
# Create non-root user for security
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001 && \
chown -R nodejs:nodejs /app
USER nodejs
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD node healthcheck.js || exit 1
EXPOSE 3000
CMD ["node", "server.js"]
Security note: Running as a non-root user limits the damage if the container is compromised.
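You can confirm the container actually drops root (the image tag is illustrative):
docker build -t todos-api ./todos-api
docker run --rm todos-api id
# expected: uid=1001(nodejs) gid=1001(nodejs) — not uid=0(root)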
Users API Dockerfile (Java Spring Boot)
# Multi-stage build for Java
# Stage 1: Build with Maven
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
# Copy pom.xml first (dependency caching)
COPY pom.xml ./
RUN mvn dependency:go-offline
# Copy source and build
COPY src ./src
RUN mvn clean package -DskipTests
# Stage 2: Runtime
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
# Copy JAR from builder
COPY --from=builder /app/target/*.jar app.jar
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget --quiet --tries=1 --spider http://localhost:8080/actuator/health || exit 1
EXPOSE 8080
# Use exec form to ensure proper signal handling
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
Java-specific optimizations:
# Production optimization flags
ENTRYPOINT ["java", \
"-XX:+UseContainerSupport", \
"-XX:MaxRAMPercentage=75.0", \
"-XX:+ExitOnOutOfMemoryError", \
"-jar", "/app/app.jar"]
Log Processor Dockerfile (Python)
FROM python:3.11-slim
WORKDIR /app
# Install procps (provides pgrep for the health check) and Python deps
RUN apt-get update && apt-get install -y --no-install-recommends procps && \
rm -rf /var/lib/apt/lists/*
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Create non-root user
RUN useradd -m -u 1001 processor && \
chown -R processor:processor /app
USER processor
# Health check (verify the worker process is alive)
# Note: "ps aux | grep" would always succeed because grep matches
# its own process — pgrep avoids that trap
HEALTHCHECK --interval=30s --timeout=3s \
CMD pgrep -f processor.py || exit 1
CMD ["python", "processor.py"]
Docker Compose - Orchestrating Everything
Now let's tie it all together with docker-compose.yml:
version: '3.8'
services:
# Traefik reverse proxy
traefik:
image: traefik:v2.10
container_name: traefik
command:
# API and dashboard
# WARNING: insecure mode serves the dashboard without authentication;
# disable it or put auth in front of it in production
- "--api.dashboard=true"
- "--api.insecure=true"
# Docker provider
- "--providers.docker=true"
- "--providers.docker.exposedbydefault=false"
# Entrypoints
- "--entrypoints.web.address=:80"
- "--entrypoints.websecure.address=:443"
# HTTP to HTTPS redirect
- "--entrypoints.web.http.redirections.entrypoint.to=websecure"
- "--entrypoints.web.http.redirections.entrypoint.scheme=https"
# Let's Encrypt
- "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
- "--certificatesresolvers.letsencrypt.acme.email=${ACME_EMAIL}"
- "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
ports:
- "80:80"
- "443:443"
- "8080:8080" # Dashboard
volumes:
- "/var/run/docker.sock:/var/run/docker.sock:ro"
- "./letsencrypt:/letsencrypt"
networks:
- web
restart: unless-stopped
# Frontend
frontend:
build:
context: ./frontend
dockerfile: Dockerfile
container_name: frontend
labels:
- "traefik.enable=true"
- "traefik.http.routers.frontend.rule=Host(`${DOMAIN}`)"
- "traefik.http.routers.frontend.entrypoints=websecure"
- "traefik.http.routers.frontend.tls.certresolver=letsencrypt"
- "traefik.http.services.frontend.loadbalancer.server.port=80"
networks:
- web
restart: unless-stopped
# Auth API
auth:
build:
context: ./auth-api
dockerfile: Dockerfile
container_name: auth-api
environment:
- DB_HOST=postgres
- DB_PORT=5432
- DB_NAME=${DB_NAME}
- DB_USER=${DB_USER}
- DB_PASSWORD=${DB_PASSWORD}
- JWT_SECRET=${JWT_SECRET}
- REDIS_URL=redis://redis:6379
labels:
- "traefik.enable=true"
- "traefik.http.routers.auth.rule=Host(`${DOMAIN}`) && PathPrefix(`/api/auth`)"
- "traefik.http.routers.auth.entrypoints=websecure"
- "traefik.http.routers.auth.tls.certresolver=letsencrypt"
- "traefik.http.services.auth.loadbalancer.server.port=8080"
depends_on:
- postgres
- redis
networks:
- web
- backend
restart: unless-stopped
# Todos API
todos:
build:
context: ./todos-api
dockerfile: Dockerfile
container_name: todos-api
environment:
- DB_HOST=postgres
- DB_PORT=5432
- DB_NAME=${DB_NAME}
- DB_USER=${DB_USER}
- DB_PASSWORD=${DB_PASSWORD}
- REDIS_URL=redis://redis:6379
labels:
- "traefik.enable=true"
- "traefik.http.routers.todos.rule=Host(`${DOMAIN}`) && PathPrefix(`/api/todos`)"
- "traefik.http.routers.todos.entrypoints=websecure"
- "traefik.http.routers.todos.tls.certresolver=letsencrypt"
- "traefik.http.services.todos.loadbalancer.server.port=3000"
depends_on:
- postgres
- redis
networks:
- web
- backend
restart: unless-stopped
# Users API
users:
build:
context: ./users-api
dockerfile: Dockerfile
container_name: users-api
environment:
- SPRING_DATASOURCE_URL=jdbc:postgresql://postgres:5432/${DB_NAME}
- SPRING_DATASOURCE_USERNAME=${DB_USER}
- SPRING_DATASOURCE_PASSWORD=${DB_PASSWORD}
- SPRING_REDIS_HOST=redis
- SPRING_REDIS_PORT=6379
labels:
- "traefik.enable=true"
- "traefik.http.routers.users.rule=Host(`${DOMAIN}`) && PathPrefix(`/api/users`)"
- "traefik.http.routers.users.entrypoints=websecure"
- "traefik.http.routers.users.tls.certresolver=letsencrypt"
- "traefik.http.services.users.loadbalancer.server.port=8080"
depends_on:
- postgres
- redis
networks:
- web
- backend
restart: unless-stopped
# Log Processor
log-processor:
build:
context: ./log-processor
dockerfile: Dockerfile
container_name: log-processor
environment:
- REDIS_URL=redis://redis:6379
- LOG_PATH=/logs
volumes:
- ./logs:/logs
depends_on:
- redis
networks:
- backend
restart: unless-stopped
# Redis
redis:
image: redis:7-alpine
container_name: redis
command: redis-server --appendonly yes
volumes:
- redis-data:/data
networks:
- backend
restart: unless-stopped
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 3s
retries: 3
# PostgreSQL
postgres:
image: postgres:15-alpine
container_name: postgres
environment:
- POSTGRES_DB=${DB_NAME}
- POSTGRES_USER=${DB_USER}
- POSTGRES_PASSWORD=${DB_PASSWORD}
volumes:
- postgres-data:/var/lib/postgresql/data
networks:
- backend
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${DB_USER}"]
interval: 10s
timeout: 3s
retries: 3
networks:
web:
driver: bridge
backend:
driver: bridge
volumes:
postgres-data:
redis-data:
Key concepts in this compose file:
1. Networks:
networks:
web: # Public-facing services
backend: # Internal services only
- Frontend, APIs → the web network (accessible via Traefik)
- Database, Redis → the backend network only (isolated)
- This provides network-level security (you can verify the isolation with the sketch at the end of this section)
2. Traefik Labels:
labels:
- "traefik.enable=true"
- "traefik.http.routers.auth.rule=Host(`${DOMAIN}`) && PathPrefix(`/api/auth`)"
- "traefik.http.routers.auth.tls.certresolver=letsencrypt"
These labels tell Traefik how to route traffic:
- Route requests for yourdomain.com/api/auth → auth service
- Automatically get an SSL certificate from Let's Encrypt
- Handle HTTPS termination
3. Environment Variables:
environment:
- DB_HOST=postgres
- JWT_SECRET=${JWT_SECRET}
Secrets come from .env file (never committed to git!).
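To verify the network isolation described above, inspect which containers sit on each network. A quick sketch — the network names assume Compose's default project-name prefix (here todo-app, derived from the directory name):
docker network inspect todo-app_backend \
  --format '{{range .Containers}}{{.Name}} {{end}}'
# expect postgres and redis here, unreachable from outside Traefik
docker network inspect todo-app_web \
  --format '{{range .Containers}}{{.Name}} {{end}}'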
Environment Configuration
Create .env file:
# Domain configuration
DOMAIN=your-domain.com
ACME_EMAIL=your-email@example.com
# Database
DB_NAME=todoapp
DB_USER=todouser
DB_PASSWORD=change-this-strong-password
# Security
JWT_SECRET=change-this-to-random-string-min-32-chars
# Optional: Docker registry
DOCKER_REGISTRY=ghcr.io/yourusername
Security checklist for .env:
- [ ] Never commit .env to git
- [ ] Add .env to .gitignore
- [ ] Use strong passwords (20+ characters)
- [ ] Use different passwords for each service
- [ ] Rotate secrets regularly
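The first three items are easy to automate — a small sketch:
# Keep .env out of version control
echo ".env" >> .gitignore
# Generate a strong JWT secret: 48 random bytes, well over 32 characters
openssl rand -base64 48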
Phase 2: Infrastructure as Code with Terraform
Now let's provision the cloud infrastructure automatically.
Project Structure
infra/
├── terraform/
│ ├── main.tf # Main configuration
│ ├── variables.tf # Input variables
│ ├── outputs.tf # Output values
│ ├── provider.tf # Provider configuration
│ └── backend.tf # Remote state configuration
├── ansible/
│ ├── inventory/ # Dynamic inventory
│ ├── roles/
│ │ ├── dependencies/ # Install Docker, etc.
│ │ └── deploy/ # Deploy application
│ ├── playbook.yml # Main playbook
│ └── ansible.cfg # Ansible configuration
└── scripts/
├── deploy.sh # Deployment orchestration
└── drift-check.sh # Drift detection
Terraform Configuration
provider.tf:
# Provider configuration
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
local = {
source = "hashicorp/local"
version = "~> 2.0"
}
null = {
source = "hashicorp/null"
version = "~> 3.0"
}
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = "todo-app"
Environment = var.environment
ManagedBy = "terraform"
}
}
}
backend.tf:
# Remote state storage - crucial for team collaboration
terraform {
backend "s3" {
bucket = "your-terraform-state-bucket"
key = "todo-app/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
}
Why remote state?
- Team collaboration - everyone sees same state
- State locking - prevents concurrent modifications
- Backup - state is backed up in S3
- Encryption - sensitive data encrypted at rest
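The bucket and lock table must exist before terraform init. A one-time bootstrap sketch — the bucket name is a placeholder (S3 bucket names are globally unique):
aws s3api create-bucket --bucket your-terraform-state-bucket --region us-east-1
aws s3api put-bucket-versioning --bucket your-terraform-state-bucket \
  --versioning-configuration Status=Enabled
# Terraform's S3 backend requires the lock table's hash key to be "LockID"
aws dynamodb create-table --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST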
variables.tf:
variable "aws_region" {
description = "AWS region to deploy resources"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name"
type = string
default = "production"
}
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.medium"
}
variable "ssh_public_key" {
description = "SSH public key for access"
type = string
}
variable "domain_name" {
description = "Domain name for the application"
type = string
}
variable "alert_email" {
description = "Email for drift detection alerts"
type = string
}
variable "app_port" {
description = "Application port"
type = number
default = 80
}
main.tf:
# VPC for network isolation
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "todo-app-vpc"
}
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "todo-app-igw"
}
}
# Public Subnet
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = "${var.aws_region}a"
map_public_ip_on_launch = true
tags = {
Name = "todo-app-public-subnet"
}
}
# Route Table
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "todo-app-public-rt"
}
}
# Route Table Association
resource "aws_route_table_association" "public" {
subnet_id = aws_subnet.public.id
route_table_id = aws_route_table.public.id
}
# Security Group
resource "aws_security_group" "app" {
name = "todo-app-sg"
description = "Security group for TODO application"
vpc_id = aws_vpc.main.id
# SSH
# NOTE: open to the world here for simplicity; in production,
# restrict cidr_blocks to your own IP or a bastion host
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "SSH access"
}
# HTTP
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "HTTP access"
}
# HTTPS
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "HTTPS access"
}
# Outbound - allow all
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
description = "Allow all outbound"
}
tags = {
Name = "todo-app-sg"
}
}
# SSH Key Pair
resource "aws_key_pair" "deployer" {
key_name = "todo-app-deployer"
public_key = var.ssh_public_key
}
# Latest Ubuntu AMI
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
# EC2 Instance
resource "aws_instance" "app" {
ami = data.aws_ami.ubuntu.id
instance_type = var.instance_type
key_name = aws_key_pair.deployer.key_name
subnet_id = aws_subnet.public.id
vpc_security_group_ids = [aws_security_group.app.id]
root_block_device {
volume_size = 30
volume_type = "gp3"
encrypted = true
}
user_data = <<-EOF
#!/bin/bash
apt-get update
apt-get install -y python3 python3-pip
EOF
tags = {
Name = "todo-app-server"
}
# Lifecycle rule for idempotency
lifecycle {
ignore_changes = [
user_data, # Don't recreate if user_data changes
ami, # Don't recreate on AMI updates unless forced
]
}
}
# Elastic IP for stable public IP
resource "aws_eip" "app" {
instance = aws_instance.app.id
domain = "vpc"
tags = {
Name = "todo-app-eip"
}
}
# Generate Ansible inventory
resource "local_file" "ansible_inventory" {
content = templatefile("${path.module}/templates/inventory.tpl", {
app_server_ip = aws_eip.app.public_ip
ssh_key_path = "~/.ssh/id_rsa"
ssh_user = "ubuntu"
})
filename = "${path.module}/../ansible/inventory/hosts"
# Write the new inventory before removing the old one
lifecycle {
create_before_destroy = true
}
}
# Trigger Ansible after provisioning
resource "null_resource" "ansible_provisioner" {
# The timestamp trigger forces this (and the Ansible run) to execute
# on every apply, not just when the instance changes
triggers = {
instance_id = aws_instance.app.id
timestamp = timestamp()
}
# Wait for instance to be ready
provisioner "local-exec" {
command = <<-EOT
echo "Waiting for SSH to be ready..."
until ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 ubuntu@${aws_eip.app.public_ip} echo "SSH Ready"; do
sleep 5
done
echo "Running Ansible playbook..."
cd ${path.module}/../ansible
ansible-playbook -i inventory/hosts playbook.yml
EOT
}
depends_on = [
local_file.ansible_inventory,
aws_eip.app
]
}
templates/inventory.tpl:
[app_servers]
todo-app ansible_host=${app_server_ip} ansible_user=${ssh_user} ansible_ssh_private_key_file=${ssh_key_path}
[app_servers:vars]
ansible_python_interpreter=/usr/bin/python3
outputs.tf:
output "instance_public_ip" {
description = "Public IP of the application server"
value = aws_eip.app.public_ip
}
output "instance_id" {
description = "ID of the EC2 instance"
value = aws_instance.app.id
}
output "domain_name" {
description = "Domain name for the application"
value = var.domain_name
}
output "ssh_command" {
description = "SSH command to connect to the server"
value = "ssh ubuntu@${aws_eip.app.public_ip}"
}
Understanding Terraform Idempotency
What is idempotency?
Running the same Terraform code multiple times produces the same result without creating duplicates.
Example - Non-idempotent (bad):
resource "aws_instance" "app" {
ami = "ami-12345"
instance_type = "t3.medium"
# timestamp() changes every run, so every apply shows a diff —
# the plan is never clean!
tags = {
Timestamp = timestamp()
}
}
Idempotent (good):
resource "aws_instance" "app" {
ami = "ami-12345"
instance_type = "t3.medium"
tags = {
Name = "todo-app-server"
}
lifecycle {
ignore_changes = [
tags["Timestamp"],
user_data
]
}
}
Drift Detection
What is drift?
Drift occurs when actual infrastructure differs from Terraform state (manual changes, external tools, etc.).
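For example, a teammate opening a port by hand in the AWS console (or via the CLI, as below — the security group ID is illustrative) would surface as drift on the next plan:
# Manual change outside Terraform = drift
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 8443 --cidr 0.0.0.0/0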
drift-check.sh:
#!/bin/bash
set -e
EXIT_CODE=0 # must be initialized, or the check below breaks when plan succeeds
echo "Checking for infrastructure drift..."
# Run terraform plan and capture output
PLAN_OUTPUT=$(terraform plan -detailed-exitcode -no-color 2>&1) || EXIT_CODE=$?
# Exit codes:
# 0 = no changes
# 1 = error
# 2 = changes detected (drift!)
if [ $EXIT_CODE -eq 0 ]; then
echo "✅ No drift detected - infrastructure matches desired state"
exit 0
elif [ $EXIT_CODE -eq 2 ]; then
echo "⚠️ DRIFT DETECTED - infrastructure has changed!"
echo ""
echo "$PLAN_OUTPUT"
echo ""
# Send email alert
./send-drift-alert.sh "$PLAN_OUTPUT"
# In CI/CD, pause for manual approval
if [ "$CI" = "true" ]; then
echo "Pausing for manual approval..."
# GitHub Actions, GitLab CI, etc. have approval mechanisms
exit 2
fi
else
echo "❌ Error running terraform plan"
echo "$PLAN_OUTPUT"
exit 1
fi
send-drift-alert.sh:
#!/bin/bash
DRIFT_DETAILS="$1"
ALERT_EMAIL="${ALERT_EMAIL:-admin@example.com}"
# Using AWS SES
aws ses send-email \
--from "terraform@example.com" \
--to "$ALERT_EMAIL" \
--subject "⚠️ Terraform Drift Detected" \
--text "$DRIFT_DETAILS"
# Alternative: curl with Mailgun, SendGrid, etc. — kept commented out
# so only one mail provider actually runs
# curl -s --user "api:$MAILGUN_API_KEY" \
#   https://api.mailgun.net/v3/$MAILGUN_DOMAIN/messages \
#   -F from="terraform@example.com" \
#   -F to="$ALERT_EMAIL" \
#   -F subject="⚠️ Terraform Drift Detected" \
#   -F text="$DRIFT_DETAILS"
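Outside CI, scheduling the check is a one-liner with cron (paths are illustrative):
# Run the drift check nightly at 02:00, appending output to a log
# (add via: crontab -e)
0 2 * * * cd /opt/todo-app/infra/terraform && ../scripts/drift-check.sh >> /var/log/drift-check.log 2>&1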
Phase 3: Configuration Management with Ansible
Terraform provisions infrastructure, Ansible configures it.
Ansible Project Structure
ansible/
├── inventory/
│ └── hosts # Generated by Terraform
├── roles/
│ ├── dependencies/
│ │ ├── tasks/
│ │ │ └── main.yml
│ │ └── handlers/
│ │ └── main.yml
│ └── deploy/
│ ├── tasks/
│ │ └── main.yml
│ ├── templates/
│ │ └── .env.j2
│ └── handlers/
│ └── main.yml
├── playbook.yml
└── ansible.cfg
ansible.cfg
[defaults]
inventory = inventory/hosts
remote_user = ubuntu
private_key_file = ~/.ssh/id_rsa
host_key_checking = False
retry_files_enabled = False
# Faster execution
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 3600
# Better output
stdout_callback = yaml
bin_ansible_callbacks = True
[ssh_connection]
pipelining = True
roles/dependencies/tasks/main.yml
---
# Install required system dependencies
- name: Update apt cache
apt:
update_cache: yes
cache_valid_time: 3600
become: yes
- name: Install required packages
apt:
name:
- apt-transport-https
- ca-certificates
- curl
- gnupg
- lsb-release
- python3-pip
- git
- ufw
state: present
become: yes
- name: Add Docker GPG key
apt_key:
url: https://download.docker.com/linux/ubuntu/gpg
state: present
become: yes
- name: Add Docker repository
apt_repository:
repo: "deb [arch=amd64] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
state: present
become: yes
- name: Install Docker
apt:
name:
- docker-ce
- docker-ce-cli
- containerd.io
- docker-buildx-plugin
- docker-compose-plugin
state: present
become: yes
notify: Restart Docker
- name: Add user to docker group
user:
name: "{{ ansible_user }}"
groups: docker
append: yes
become: yes
- name: Install Docker Compose (standalone)
get_url:
url: "https://github.com/docker/compose/releases/download/v2.23.0/docker-compose-linux-x86_64"
dest: /usr/local/bin/docker-compose
mode: '0755'
become: yes
- name: Configure UFW firewall
ufw:
rule: "{{ item.rule }}"
port: "{{ item.port }}"
proto: "{{ item.proto }}"
loop:
- { rule: 'allow', port: '22', proto: 'tcp' }
- { rule: 'allow', port: '80', proto: 'tcp' }
- { rule: 'allow', port: '443', proto: 'tcp' }
become: yes
- name: Enable UFW
ufw:
state: enabled
become: yes
roles/dependencies/handlers/main.yml
---
- name: Restart Docker
systemd:
name: docker
state: restarted
enabled: yes
become: yes
roles/deploy/tasks/main.yml
---
# Deploy the application
- name: Create application directory
file:
path: /opt/todo-app
state: directory
owner: "{{ ansible_user }}"
group: "{{ ansible_user }}"
mode: '0755'
become: yes
- name: Clone application repository
git:
repo: "{{ app_repo_url }}"
dest: /opt/todo-app
version: "{{ app_branch | default('main') }}"
force: yes
register: git_clone
- name: Create environment file from template
template:
src: .env.j2
dest: /opt/todo-app/.env
owner: "{{ ansible_user }}"
mode: '0600'
no_log: yes # Don't log sensitive env vars
- name: Create letsencrypt directory
file:
path: /opt/todo-app/letsencrypt
state: directory
mode: '0755'
- name: Pull latest Docker images
community.docker.docker_compose:
project_src: /opt/todo-app
pull: yes
when: git_clone.changed
- name: Start application with Docker Compose
community.docker.docker_compose:
project_src: /opt/todo-app
state: present
restarted: "{{ git_clone.changed }}"
register: compose_output
- name: Wait for application to be healthy
uri:
url: "https://{{ domain_name }}/health"
status_code: 200
validate_certs: no
retries: 10
delay: 10
register: health_check
until: health_check.status == 200
- name: Display deployment status
debug:
msg: "Application deployed successfully at https://{{ domain_name }}"
roles/deploy/templates/.env.j2
# Auto-generated by Ansible - DO NOT EDIT MANUALLY
# Domain configuration
DOMAIN={{ domain_name }}
ACME_EMAIL={{ acme_email }}
# Database
DB_NAME={{ db_name }}
DB_USER={{ db_user }}
DB_PASSWORD={{ db_password }}
# Security
JWT_SECRET={{ jwt_secret }}
# Application
NODE_ENV=production
LOG_LEVEL=info
playbook.yml
---
- name: Deploy TODO Application
hosts: app_servers
become: no
vars:
app_repo_url: "https://github.com/yourusername/todo-app.git"
app_branch: "main"
domain_name: "{{ lookup('env', 'DOMAIN') }}"
acme_email: "{{ lookup('env', 'ACME_EMAIL') }}"
db_name: "{{ lookup('env', 'DB_NAME') }}"
db_user: "{{ lookup('env', 'DB_USER') }}"
db_password: "{{ lookup('env', 'DB_PASSWORD') }}"
jwt_secret: "{{ lookup('env', 'JWT_SECRET') }}"
roles:
- dependencies
- deploy
post_tasks:
- name: Verify deployment
uri:
url: "https://{{ domain_name }}"
status_code: 200
validate_certs: yes
delegate_to: localhost
- name: Display application URL
debug:
msg: "Application is live at https://{{ domain_name }}"
Phase 4: CI/CD Pipeline
Now let's automate everything with GitHub Actions.
.github/workflows/infrastructure.yml
name: Infrastructure Deployment
on:
push:
branches: [main]
paths:
- 'infra/terraform/**'
- 'infra/ansible/**'
- '.github/workflows/infrastructure.yml'
workflow_dispatch: # Manual trigger
env:
TF_VERSION: '1.6.0'
AWS_REGION: 'us-east-1'
jobs:
terraform-plan:
name: Terraform Plan & Drift Detection
runs-on: ubuntu-latest
outputs:
has_changes: ${{ steps.plan.outputs.has_changes }}
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Terraform Init
run: |
cd infra/terraform
terraform init
- name: Terraform Plan
id: plan
run: |
cd infra/terraform
EXIT_CODE=0
terraform plan -detailed-exitcode -out=tfplan || EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "has_changes=false" >> $GITHUB_OUTPUT
echo "✅ No infrastructure changes detected"
elif [ $EXIT_CODE -eq 2 ]; then
echo "has_changes=true" >> $GITHUB_OUTPUT
echo "⚠️ Infrastructure drift detected!"
else
echo "❌ Terraform plan failed"
exit 1
fi
- name: Save plan
if: steps.plan.outputs.has_changes == 'true'
uses: actions/upload-artifact@v3
with:
name: tfplan
path: infra/terraform/tfplan
- name: Send drift alert email
if: steps.plan.outputs.has_changes == 'true'
uses: dawidd6/action-send-mail@v3
with:
server_address: smtp.gmail.com
server_port: 465
username: ${{ secrets.MAIL_USERNAME }}
password: ${{ secrets.MAIL_PASSWORD }}
subject: ⚠️ Terraform Drift Detected - TODO App
to: ${{ secrets.ALERT_EMAIL }}
from: Terraform CI/CD
body: |
Infrastructure drift has been detected!
Review the changes and approve the workflow to apply:
${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
terraform-apply:
name: Terraform Apply
runs-on: ubuntu-latest
needs: terraform-plan
if: needs.terraform-plan.outputs.has_changes == 'true'
environment: production # Requires manual approval
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Download plan
uses: actions/download-artifact@v3
with:
name: tfplan
path: infra/terraform/
- name: Terraform Init
run: |
cd infra/terraform
terraform init
- name: Terraform Apply
run: |
cd infra/terraform
terraform apply tfplan
- name: Save outputs
run: |
cd infra/terraform
terraform output -json > outputs.json
- name: Upload outputs
uses: actions/upload-artifact@v3
with:
name: terraform-outputs
path: infra/terraform/outputs.json
ansible-deploy:
name: Ansible Deployment
runs-on: ubuntu-latest
needs: terraform-apply
if: always() && (needs.terraform-apply.result == 'success' || needs.terraform-plan.outputs.has_changes == 'false')
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install Ansible
run: |
pip install ansible
- name: Setup SSH key
run: |
mkdir -p ~/.ssh
echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
ssh-keyscan -H ${{ secrets.SERVER_IP }} >> ~/.ssh/known_hosts
- name: Run Ansible playbook
env:
DOMAIN: ${{ secrets.DOMAIN }}
ACME_EMAIL: ${{ secrets.ACME_EMAIL }}
DB_NAME: ${{ secrets.DB_NAME }}
DB_USER: ${{ secrets.DB_USER }}
DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
JWT_SECRET: ${{ secrets.JWT_SECRET }}
run: |
cd infra/ansible
ansible-playbook -i inventory/hosts playbook.yml
- name: Verify deployment
run: |
sleep 30 # Wait for services to stabilize
curl -f https://${{ secrets.DOMAIN }}/health || exit 1
echo "✅ Deployment verified!"
.github/workflows/application.yml
name: Application Deployment
on:
push:
branches: [main]
paths:
- 'frontend/**'
- 'auth-api/**'
- 'todos-api/**'
- 'users-api/**'
- 'log-processor/**'
- 'docker-compose.yml'
workflow_dispatch:
jobs:
deploy:
name: Deploy Application
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup SSH key
run: |
mkdir -p ~/.ssh
echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
ssh-keyscan -H ${{ secrets.SERVER_IP }} >> ~/.ssh/known_hosts
- name: Deploy to server
run: |
ssh ubuntu@${{ secrets.SERVER_IP }} << 'EOF'
cd /opt/todo-app
git pull origin main
docker-compose pull
docker-compose up -d --build
EOF
- name: Wait for deployment
run: sleep 30
- name: Health check
run: |
curl -f https://${{ secrets.DOMAIN }}/health || exit 1
echo "✅ Application deployed successfully!"
Understanding the CI/CD Flow
Infrastructure changes (Terraform/Ansible):
1. Push to main
↓
2. Run terraform plan
↓
3. Detect drift? → Send email
↓
4. Pause for manual approval (GitHub Environment protection)
↓
5. Apply changes
↓
6. Run Ansible
↓
7. Verify deployment
Application changes:
1. Push to main
↓
2. SSH to server
↓
3. Git pull
↓
4. docker-compose pull
↓
5. docker-compose up
↓
6. Health check
Testing the Complete Setup
Local Testing
1. Test containers locally:
# Start everything
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f
# Test frontend
curl http://localhost
# Test APIs
curl http://localhost/api/auth/health
curl http://localhost/api/todos/health
curl http://localhost/api/users/health
# Stop everything
docker-compose down
2. Test Terraform:
cd infra/terraform
# Initialize
terraform init
# Validate
terraform validate
# Plan (dry run)
terraform plan
# Apply (create infrastructure)
terraform apply
# Show outputs
terraform output
# Destroy (cleanup)
terraform destroy
3. Test Ansible:
cd infra/ansible
# Test connection
ansible all -m ping
# Check syntax
ansible-playbook playbook.yml --syntax-check
# Dry run
ansible-playbook playbook.yml --check
# Run for real
ansible-playbook playbook.yml
# Run specific role
ansible-playbook playbook.yml --tags deploy
Production Deployment
Complete deployment from scratch:
# 1. Clone the repository
git clone https://github.com/yourusername/todo-app.git
cd todo-app
# 2. Configure secrets
cp .env.example .env
# Edit .env with your values
# 3. Initialize Terraform
cd infra/terraform
terraform init
# 4. Create infrastructure
terraform plan
terraform apply
# Wait for Ansible to complete (triggered automatically)
# 5. Configure DNS
# Point your domain to the Elastic IP shown in terraform outputs
# 6. Verify deployment
curl https://your-domain.com
Expected result:
- Login page loads at https://your-domain.com
- HTTPS works (automatic certificate from Let's Encrypt)
- APIs respond at /api/auth, /api/todos, /api/users
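A small smoke-test loop covering all of the above (the /health paths match the ones used in local testing):
for path in / /api/auth/health /api/todos/health /api/users/health; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "https://your-domain.com$path")
  echo "$path -> $code"   # expect 200 for each
done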
Troubleshooting
Issue: Terraform fails with "state locked"
# Release a stale lock — the LOCK_ID appears in the error message
terraform force-unlock <LOCK_ID>
# Or wait for other operation to complete
Issue: Ansible can't connect to server
# Test SSH manually
ssh -i ~/.ssh/id_rsa ubuntu@<SERVER_IP>
# Check inventory
ansible-inventory --list -i inventory/hosts
# Verbose output
ansible-playbook playbook.yml -vvv
Issue: Containers won't start
# Check logs
docker-compose logs <service-name>
# Check disk space
df -h
# Check memory
free -h
# Restart specific service
docker-compose restart <service-name>
Issue: HTTPS not working
# Check Traefik logs
docker logs traefik
# Verify DNS points to server
dig your-domain.com
# Check certificate
docker exec traefik cat /letsencrypt/acme.json
# Force certificate renewal
docker-compose down
rm -rf letsencrypt/acme.json
docker-compose up -d