The Big Picture
What we're building:
A production-ready TODO application with:
- Multiple microservices (5 different programming languages!)
- Automated infrastructure provisioning (Terraform)
- Automated configuration management (Ansible)
- CI/CD pipelines with drift detection
- Automatic HTTPS with Traefik
- Zero-trust security model
Architecture:
┌──────────────┐
│ Traefik │
│ (HTTPS/Proxy)│
└───────┬──────┘
│
┌──────────────────────┼──────────────────────┐
│ │ │
┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐
│ Frontend │ │ Auth API │ │ Todos API │
│ (Vue.js) │ │ (Go) │ │ (Node.js) │
└────────────────┘ └────────────────┘ └────────────────┘
│
┌──────────────┼──────────────┐
│ │ │
┌───────▼────────┐ ┌──▼────┐ ┌─────▼──────┐
│ Users API │ │ Redis │ │ Log │
│ (Java Spring) │ │ Queue │ │ Processor │
└────────────────┘ └───────┘ │ (Python) │
└────────────┘
Understanding the Application
The Services
1. Frontend (Vue.js)
- User interface
- Login page → TODO dashboard
- Communicates with backend APIs
- Port: 80/443 (via Traefik)
2. Auth API (Go)
- Handles user authentication
- Issues JWT tokens
- Endpoint:
/api/auth
3. Todos API (Node.js)
- Manages TODO items
- CRUD operations
- Requires valid JWT token
- Endpoint:
/api/todos
4. Users API (Java Spring Boot)
- User management
- Profile operations
- Endpoint:
/api/users
5. Log Processor (Python)
- Processes background tasks
- Consumes from Redis queue
- Writes audit logs
6. Redis Queue
- Message broker
- Task queue for async operations
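Before containerizing anything, it helps to see how these services talk to each other. Here is a hedged sketch of the request flow — the login route, JSON payload, and accessToken field below are illustrative assumptions, not taken from the actual APIs (jq is used to extract the token):
# 1. Authenticate and capture a JWT (route and payload are assumptions)
TOKEN=$(curl -s -X POST https://your-domain.com/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"johnd","password":"foobar"}' | jq -r '.accessToken')
# 2. Call the Todos API with the token; without it, expect a 401
curl -s https://your-domain.com/api/todos \
  -H "Authorization: Bearer $TOKEN"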
Phase 1: Containerization
Let's containerize each service. The key is understanding that each language has its own best practices.
Frontend Dockerfile (Vue.js)
# Multi-stage build for optimized production image
# Stage 1: Build the application
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install ALL dependencies — the build step needs devDependencies
# (build tooling), so don't use --omit=dev in the builder stage
RUN npm ci
# Copy source code
COPY . .
# Build for production
RUN npm run build
# Stage 2: Serve with nginx
FROM nginx:alpine
# Copy built assets from builder stage
COPY --from=builder /app/dist /usr/share/nginx/html
# Copy nginx configuration
COPY nginx.conf /etc/nginx/conf.d/default.conf
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget --quiet --tries=1 --spider http://localhost/ || exit 1
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Why multi-stage builds?
- Builder stage: 800MB (includes build tools)
- Final stage: 25MB (only nginx + static files)
- 97% size reduction!
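You can verify the size difference yourself by building each stage separately (the image tags here are illustrative):
# Build only the builder stage, then the full multi-stage image
docker build --target builder -t todo-frontend:builder ./frontend
docker build -t todo-frontend:latest ./frontend
# Compare sizes — the final image should be a fraction of the builder
docker images | grep todo-frontend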
Frontend nginx config:
server {
listen 80;
root /usr/share/nginx/html;
index index.html;
# SPA routing - send all requests to index.html
location / {
try_files $uri $uri/ /index.html;
}
# Cache static assets
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg)$ {
expires 1y;
add_header Cache-Control "public, immutable";
}
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
}
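Once the container is running, a quick way to confirm the caching and security headers behave as configured (the asset filename is illustrative):
# Static assets should carry long-lived cache headers
curl -sI http://localhost/assets/index.js | grep -iE 'cache-control|expires'
# Every response should include the security headers
curl -sI http://localhost/ | grep -iE 'x-frame-options|x-content-type-options'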
Auth API Dockerfile (Go)
# Multi-stage build for Go
# Stage 1: Build the binary
FROM golang:1.21-alpine AS builder
WORKDIR /app
# Copy go mod files
COPY go.mod go.sum ./
# Download dependencies
RUN go mod download
# Copy source code
COPY . .
# Build the binary
# CGO_ENABLED=0 creates a static binary (no external dependencies);
# the old -a and -installsuffix flags are unnecessary on modern Go
RUN CGO_ENABLED=0 GOOS=linux go build -o main .
# Stage 2: Create minimal runtime image
FROM alpine:latest
# Add ca-certificates for HTTPS calls
RUN apk --no-cache add ca-certificates
WORKDIR /root/
# Copy the binary from builder
COPY --from=builder /app/main .
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget --quiet --tries=1 --spider http://localhost:8080/health || exit 1
EXPOSE 8080
CMD ["./main"]
Why this approach?
- Builder stage: 400MB
- Final stage: 15MB (just Alpine + binary)
- Static binary = no runtime dependencies
- Faster startup, smaller attack surface
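If you want to confirm the binary really is static, inspect it inside the builder stage (exact output wording varies by libc):
# Run inside the builder image (e.g. docker run --rm -it <builder> sh)
file main   # expect: "statically linked"
ldd main    # expect: "not a dynamic executable" or similar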
Todos API Dockerfile (Node.js)
FROM node:18-alpine
WORKDIR /app
# Install dependencies first (better caching)
COPY package*.json ./
# --omit=dev is the modern replacement for the deprecated --only=production
RUN npm ci --omit=dev
# Copy application code
COPY . .
# Create non-root user for security
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001 && \
chown -R nodejs:nodejs /app
USER nodejs
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD node healthcheck.js || exit 1
EXPOSE 3000
CMD ["node", "server.js"]
Security note: Running as a non-root user limits the damage if the container is compromised.
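You can confirm the container actually drops root (the image tag is illustrative):
docker build -t todos-api ./todos-api
docker run --rm todos-api id
# expected: uid=1001(nodejs) gid=1001(nodejs) — not uid=0(root)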
Users API Dockerfile (Java Spring Boot)
# Multi-stage build for Java
# Stage 1: Build with Maven
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
# Copy pom.xml first (dependency caching)
COPY pom.xml ./
RUN mvn dependency:go-offline
# Copy source and build
COPY src ./src
RUN mvn clean package -DskipTests
# Stage 2: Runtime
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
# Copy JAR from builder
COPY --from=builder /app/target/*.jar app.jar
# Health check
HEALTHCHECK --interval=30s --timeout=3s \
CMD wget --quiet --tries=1 --spider http://localhost:8080/actuator/health || exit 1
EXPOSE 8080
# Use exec form to ensure proper signal handling
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
Java-specific optimizations:
# Production optimization flags
ENTRYPOINT ["java", \
"-XX:+UseContainerSupport", \
"-XX:MaxRAMPercentage=75.0", \
"-XX:+ExitOnOutOfMemoryError", \
"-jar", "/app/app.jar"]
Log Processor Dockerfile (Python)
FROM python:3.11-slim
WORKDIR /app
# Install procps (provides pgrep for the health check) and Python deps
RUN apt-get update && apt-get install -y --no-install-recommends procps && \
rm -rf /var/lib/apt/lists/*
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Create non-root user
RUN useradd -m -u 1001 processor && \
chown -R processor:processor /app
USER processor
# Health check (verify the worker process is alive)
# Note: "ps aux | grep" would always succeed because grep matches
# its own process — pgrep avoids that trap
HEALTHCHECK --interval=30s --timeout=3s \
CMD pgrep -f processor.py || exit 1
CMD ["python", "processor.py"]
Docker Compose - Orchestrating Everything
Now let's tie it all together with docker-compose.yml:
version: '3.8'
services:
# Traefik reverse proxy
traefik:
image: traefik:v2.10
container_name: traefik
command:
# API and dashboard
# WARNING: insecure mode serves the dashboard without authentication;
# disable it or put auth in front of it in production
- "--api.dashboard=true"
- "--api.insecure=true"
# Docker provider
- "--providers.docker=true"
- "--providers.docker.exposedbydefault=false"
# Entrypoints
- "--entrypoints.web.address=:80"
- "--entrypoints.websecure.address=:443"
# HTTP to HTTPS redirect
- "--entrypoints.web.http.redirections.entrypoint.to=websecure"
- "--entrypoints.web.http.redirections.entrypoint.scheme=https"
# Let's Encrypt
- "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
- "--certificatesresolvers.letsencrypt.acme.email=${ACME_EMAIL}"
- "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
ports:
- "80:80"
- "443:443"
- "8080:8080" # Dashboard
volumes:
- "/var/run/docker.sock:/var/run/docker.sock:ro"
- "./letsencrypt:/letsencrypt"
networks:
- web
restart: unless-stopped
# Frontend
frontend:
build:
context: ./frontend
dockerfile: Dockerfile
container_name: frontend
labels:
- "traefik.enable=true"
- "traefik.http.routers.frontend.rule=Host(`${DOMAIN}`)"
- "traefik.http.routers.frontend.entrypoints=websecure"
- "traefik.http.routers.frontend.tls.certresolver=letsencrypt"
- "traefik.http.services.frontend.loadbalancer.server.port=80"
networks:
- web
restart: unless-stopped
# Auth API
auth:
build:
context: ./auth-api
dockerfile: Dockerfile
container_name: auth-api
environment:
- DB_HOST=postgres
- DB_PORT=5432
- DB_NAME=${DB_NAME}
- DB_USER=${DB_USER}
- DB_PASSWORD=${DB_PASSWORD}
- JWT_SECRET=${JWT_SECRET}
- REDIS_URL=redis://redis:6379
labels:
- "traefik.enable=true"
- "traefik.http.routers.auth.rule=Host(`${DOMAIN}`) && PathPrefix(`/api/auth`)"
- "traefik.http.routers.auth.entrypoints=websecure"
- "traefik.http.routers.auth.tls.certresolver=letsencrypt"
- "traefik.http.services.auth.loadbalancer.server.port=8080"
depends_on:
- postgres
- redis
networks:
- web
- backend
restart: unless-stopped
# Todos API
todos:
build:
context: ./todos-api
dockerfile: Dockerfile
container_name: todos-api
environment:
- DB_HOST=postgres
- DB_PORT=5432
- DB_NAME=${DB_NAME}
- DB_USER=${DB_USER}
- DB_PASSWORD=${DB_PASSWORD}
- REDIS_URL=redis://redis:6379
labels:
- "traefik.enable=true"
- "traefik.http.routers.todos.rule=Host(`${DOMAIN}`) && PathPrefix(`/api/todos`)"
- "traefik.http.routers.todos.entrypoints=websecure"
- "traefik.http.routers.todos.tls.certresolver=letsencrypt"
- "traefik.http.services.todos.loadbalancer.server.port=3000"
depends_on:
- postgres
- redis
networks:
- web
- backend
restart: unless-stopped
# Users API
users:
build:
context: ./users-api
dockerfile: Dockerfile
container_name: users-api
environment:
- SPRING_DATASOURCE_URL=jdbc:postgresql://postgres:5432/${DB_NAME}
- SPRING_DATASOURCE_USERNAME=${DB_USER}
- SPRING_DATASOURCE_PASSWORD=${DB_PASSWORD}
- SPRING_REDIS_HOST=redis
- SPRING_REDIS_PORT=6379
labels:
- "traefik.enable=true"
- "traefik.http.routers.users.rule=Host(`${DOMAIN}`) && PathPrefix(`/api/users`)"
- "traefik.http.routers.users.entrypoints=websecure"
- "traefik.http.routers.users.tls.certresolver=letsencrypt"
- "traefik.http.services.users.loadbalancer.server.port=8080"
depends_on:
- postgres
- redis
networks:
- web
- backend
restart: unless-stopped
# Log Processor
log-processor:
build:
context: ./log-processor
dockerfile: Dockerfile
container_name: log-processor
environment:
- REDIS_URL=redis://redis:6379
- LOG_PATH=/logs
volumes:
- ./logs:/logs
depends_on:
- redis
networks:
- backend
restart: unless-stopped
# Redis
redis:
image: redis:7-alpine
container_name: redis
command: redis-server --appendonly yes
volumes:
- redis-data:/data
networks:
- backend
restart: unless-stopped
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 3s
retries: 3
# PostgreSQL
postgres:
image: postgres:15-alpine
container_name: postgres
environment:
- POSTGRES_DB=${DB_NAME}
- POSTGRES_USER=${DB_USER}
- POSTGRES_PASSWORD=${DB_PASSWORD}
volumes:
- postgres-data:/var/lib/postgresql/data
networks:
- backend
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${DB_USER}"]
interval: 10s
timeout: 3s
retries: 3
networks:
web:
driver: bridge
backend:
driver: bridge
volumes:
postgres-data:
redis-data:
Key concepts in this compose file:
1. Networks:
networks:
web: # Public-facing services
backend: # Internal services only
- Frontend, APIs → the web network (accessible via Traefik)
- Database, Redis → the backend network only (isolated)
- This provides network-level security (you can verify the isolation with the sketch at the end of this section)
2. Traefik Labels:
labels:
- "traefik.enable=true"
- "traefik.http.routers.auth.rule=Host(`${DOMAIN}`) && PathPrefix(`/api/auth`)"
- "traefik.http.routers.auth.tls.certresolver=letsencrypt"
These labels tell Traefik how to route traffic:
- Route requests for yourdomain.com/api/auth → auth service
- Automatically get an SSL certificate from Let's Encrypt
- Handle HTTPS termination
3. Environment Variables:
environment:
- DB_HOST=postgres
- JWT_SECRET=${JWT_SECRET}
Secrets come from .env file (never committed to git!).
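To verify the network isolation described above, inspect which containers sit on each network. A quick sketch — the network names assume Compose's default project-name prefix (here todo-app, derived from the directory name):
docker network inspect todo-app_backend \
  --format '{{range .Containers}}{{.Name}} {{end}}'
# expect postgres and redis here, unreachable from outside Traefik
docker network inspect todo-app_web \
  --format '{{range .Containers}}{{.Name}} {{end}}'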
Environment Configuration
Create .env file:
# Domain configuration
DOMAIN=your-domain.com
ACME_EMAIL=your-email@example.com
# Database
DB_NAME=todoapp
DB_USER=todouser
DB_PASSWORD=change-this-strong-password
# Security
JWT_SECRET=change-this-to-random-string-min-32-chars
# Optional: Docker registry
DOCKER_REGISTRY=ghcr.io/yourusername
Security checklist for .env:
- [ ] Never commit .env to git
- [ ] Add .env to .gitignore
- [ ] Use strong passwords (20+ characters)
- [ ] Use different passwords for each service
- [ ] Rotate secrets regularly
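The first three items are easy to automate — a small sketch:
# Keep .env out of version control
echo ".env" >> .gitignore
# Generate a strong JWT secret: 48 random bytes, well over 32 characters
openssl rand -base64 48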
Phase 2: Infrastructure as Code with Terraform
Now let's provision the cloud infrastructure automatically.
Project Structure
infra/
├── terraform/
│ ├── main.tf # Main configuration
│ ├── variables.tf # Input variables
│ ├── outputs.tf # Output values
│ ├── provider.tf # Provider configuration
│ └── backend.tf # Remote state configuration
├── ansible/
│ ├── inventory/ # Dynamic inventory
│ ├── roles/
│ │ ├── dependencies/ # Install Docker, etc.
│ │ └── deploy/ # Deploy application
│ ├── playbook.yml # Main playbook
│ └── ansible.cfg # Ansible configuration
└── scripts/
├── deploy.sh # Deployment orchestration
└── drift-check.sh # Drift detection
Terraform Configuration
provider.tf:
# Provider configuration
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
local = {
source = "hashicorp/local"
version = "~> 2.0"
}
null = {
source = "hashicorp/null"
version = "~> 3.0"
}
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = "todo-app"
Environment = var.environment
ManagedBy = "terraform"
}
}
}
backend.tf:
# Remote state storage - crucial for team collaboration
terraform {
backend "s3" {
bucket = "your-terraform-state-bucket"
key = "todo-app/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
}
Why remote state?
- Team collaboration - everyone sees same state
- State locking - prevents concurrent modifications
- Backup - state is backed up in S3
- Encryption - sensitive data encrypted at rest
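The bucket and lock table must exist before terraform init. A one-time bootstrap sketch — the bucket name is a placeholder (S3 bucket names are globally unique):
aws s3api create-bucket --bucket your-terraform-state-bucket --region us-east-1
aws s3api put-bucket-versioning --bucket your-terraform-state-bucket \
  --versioning-configuration Status=Enabled
# Terraform's S3 backend requires the lock table's hash key to be "LockID"
aws dynamodb create-table --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST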
variables.tf:
variable "aws_region" {
description = "AWS region to deploy resources"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name"
type = string
default = "production"
}
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.medium"
}
variable "ssh_public_key" {
description = "SSH public key for access"
type = string
}
variable "domain_name" {
description = "Domain name for the application"
type = string
}
variable "alert_email" {
description = "Email for drift detection alerts"
type = string
}
variable "app_port" {
description = "Application port"
type = number
default = 80
}
main.tf:
# VPC for network isolation
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "todo-app-vpc"
}
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "todo-app-igw"
}
}
# Public Subnet
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = "${var.aws_region}a"
map_public_ip_on_launch = true
tags = {
Name = "todo-app-public-subnet"
}
}
# Route Table
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "todo-app-public-rt"
}
}
# Route Table Association
resource "aws_route_table_association" "public" {
subnet_id = aws_subnet.public.id
route_table_id = aws_route_table.public.id
}
# Security Group
resource "aws_security_group" "app" {
name = "todo-app-sg"
description = "Security group for TODO application"
vpc_id = aws_vpc.main.id
# SSH
# NOTE: open to the world here for simplicity; in production,
# restrict cidr_blocks to your own IP or a bastion host
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "SSH access"
}
# HTTP
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "HTTP access"
}
# HTTPS
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "HTTPS access"
}
# Outbound - allow all
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
description = "Allow all outbound"
}
tags = {
Name = "todo-app-sg"
}
}
# SSH Key Pair
resource "aws_key_pair" "deployer" {
key_name = "todo-app-deployer"
public_key = var.ssh_public_key
}
# Latest Ubuntu AMI
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
# EC2 Instance
resource "aws_instance" "app" {
ami = data.aws_ami.ubuntu.id
instance_type = var.instance_type
key_name = aws_key_pair.deployer.key_name
subnet_id = aws_subnet.public.id
vpc_security_group_ids = [aws_security_group.app.id]
root_block_device {
volume_size = 30
volume_type = "gp3"
encrypted = true
}
user_data = <<-EOF
#!/bin/bash
apt-get update
apt-get install -y python3 python3-pip
EOF
tags = {
Name = "todo-app-server"
}
# Lifecycle rule for idempotency
lifecycle {
ignore_changes = [
user_data, # Don't recreate if user_data changes
ami, # Don't recreate on AMI updates unless forced
]
}
}
# Elastic IP for stable public IP
resource "aws_eip" "app" {
instance = aws_instance.app.id
domain = "vpc"
tags = {
Name = "todo-app-eip"
}
}
# Generate Ansible inventory
resource "local_file" "ansible_inventory" {
content = templatefile("${path.module}/templates/inventory.tpl", {
app_server_ip = aws_eip.app.public_ip
ssh_key_path = "~/.ssh/id_rsa"
ssh_user = "ubuntu"
})
filename = "${path.module}/../ansible/inventory/hosts"
# Write the new inventory before removing the old one
lifecycle {
create_before_destroy = true
}
}
# Trigger Ansible after provisioning
resource "null_resource" "ansible_provisioner" {
# The timestamp trigger forces this (and the Ansible run) to execute
# on every apply, not just when the instance changes
triggers = {
instance_id = aws_instance.app.id
timestamp = timestamp()
}
# Wait for instance to be ready
provisioner "local-exec" {
command = <<-EOT
echo "Waiting for SSH to be ready..."
until ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 ubuntu@${aws_eip.app.public_ip} echo "SSH Ready"; do
sleep 5
done
echo "Running Ansible playbook..."
cd ${path.module}/../ansible
ansible-playbook -i inventory/hosts playbook.yml
EOT
}
depends_on = [
local_file.ansible_inventory,
aws_eip.app
]
}
templates/inventory.tpl:
[app_servers]
todo-app ansible_host=${app_server_ip} ansible_user=${ssh_user} ansible_ssh_private_key_file=${ssh_key_path}
[app_servers:vars]
ansible_python_interpreter=/usr/bin/python3
outputs.tf:
output "instance_public_ip" {
description = "Public IP of the application server"
value = aws_eip.app.public_ip
}
output "instance_id" {
description = "ID of the EC2 instance"
value = aws_instance.app.id
}
output "domain_name" {
description = "Domain name for the application"
value = var.domain_name
}
output "ssh_command" {
description = "SSH command to connect to the server"
value = "ssh ubuntu@${aws_eip.app.public_ip}"
}
Understanding Terraform Idempotency
What is idempotency?
Running the same Terraform code multiple times produces the same result without creating duplicates.
Example - Non-idempotent (bad):
resource "aws_instance" "app" {
ami = "ami-12345"
instance_type = "t3.medium"
# timestamp() changes every run, so every apply shows a diff —
# the plan is never clean!
tags = {
Timestamp = timestamp()
}
}
Idempotent (good):
resource "aws_instance" "app" {
ami = "ami-12345"
instance_type = "t3.medium"
tags = {
Name = "todo-app-server"
}
lifecycle {
ignore_changes = [
tags["Timestamp"],
user_data
]
}
}
Drift Detection
What is drift?
Drift occurs when actual infrastructure differs from Terraform state (manual changes, external tools, etc.).
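For example, a teammate opening a port by hand in the AWS console (or via the CLI, as below — the security group ID is illustrative) would surface as drift on the next plan:
# Manual change outside Terraform = drift
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 8443 --cidr 0.0.0.0/0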
drift-check.sh:
#!/bin/bash
set -e
EXIT_CODE=0 # must be initialized, or the check below breaks when plan succeeds
echo "Checking for infrastructure drift..."
# Run terraform plan and capture output
PLAN_OUTPUT=$(terraform plan -detailed-exitcode -no-color 2>&1) || EXIT_CODE=$?
# Exit codes:
# 0 = no changes
# 1 = error
# 2 = changes detected (drift!)
if [ $EXIT_CODE -eq 0 ]; then
echo "✅ No drift detected - infrastructure matches desired state"
exit 0
elif [ $EXIT_CODE -eq 2 ]; then
echo "⚠️ DRIFT DETECTED - infrastructure has changed!"
echo ""
echo "$PLAN_OUTPUT"
echo ""
# Send email alert
./send-drift-alert.sh "$PLAN_OUTPUT"
# In CI/CD, pause for manual approval
if [ "$CI" = "true" ]; then
echo "Pausing for manual approval..."
# GitHub Actions, GitLab CI, etc. have approval mechanisms
exit 2
fi
else
echo "❌ Error running terraform plan"
echo "$PLAN_OUTPUT"
exit 1
fi
send-drift-alert.sh:
#!/bin/bash
DRIFT_DETAILS="$1"
ALERT_EMAIL="${ALERT_EMAIL:-admin@example.com}"
# Using AWS SES
aws ses send-email \
--from "terraform@example.com" \
--to "$ALERT_EMAIL" \
--subject "⚠️ Terraform Drift Detected" \
--text "$DRIFT_DETAILS"
# Alternative: curl with Mailgun, SendGrid, etc. — kept commented out
# so only one mail provider actually runs
# curl -s --user "api:$MAILGUN_API_KEY" \
#   https://api.mailgun.net/v3/$MAILGUN_DOMAIN/messages \
#   -F from="terraform@example.com" \
#   -F to="$ALERT_EMAIL" \
#   -F subject="⚠️ Terraform Drift Detected" \
#   -F text="$DRIFT_DETAILS"
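Outside CI, scheduling the check is a one-liner with cron (paths are illustrative):
# Run the drift check nightly at 02:00, appending output to a log
# (add via: crontab -e)
0 2 * * * cd /opt/todo-app/infra/terraform && ../scripts/drift-check.sh >> /var/log/drift-check.log 2>&1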
Phase 3: Configuration Management with Ansible
Terraform provisions infrastructure, Ansible configures it.
Ansible Project Structure
ansible/
├── inventory/
│ └── hosts # Generated by Terraform
├── roles/
│ ├── dependencies/
│ │ ├── tasks/
│ │ │ └── main.yml
│ │ └── handlers/
│ │ └── main.yml
│ └── deploy/
│ ├── tasks/
│ │ └── main.yml
│ ├── templates/
│ │ └── .env.j2
│ └── handlers/
│ └── main.yml
├── playbook.yml
└── ansible.cfg
ansible.cfg
[defaults]
inventory = inventory/hosts
remote_user = ubuntu
private_key_file = ~/.ssh/id_rsa
host_key_checking = False
retry_files_enabled = False
# Faster execution
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 3600
# Better output
stdout_callback = yaml
bin_ansible_callbacks = True
[ssh_connection]
pipelining = True
roles/dependencies/tasks/main.yml
---
# Install required system dependencies
- name: Update apt cache
apt:
update_cache: yes
cache_valid_time: 3600
become: yes
- name: Install required packages
apt:
name:
- apt-transport-https
- ca-certificates
- curl
- gnupg
- lsb-release
- python3-pip
- git
- ufw
state: present
become: yes
- name: Add Docker GPG key
apt_key:
url: https://download.docker.com/linux/ubuntu/gpg
state: present
become: yes
- name: Add Docker repository
apt_repository:
repo: "deb [arch=amd64] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
state: present
become: yes
- name: Install Docker
apt:
name:
- docker-ce
- docker-ce-cli
- containerd.io
- docker-buildx-plugin
- docker-compose-plugin
state: present
become: yes
notify: Restart Docker
- name: Add user to docker group
user:
name: "{{ ansible_user }}"
groups: docker
append: yes
become: yes
- name: Install Docker Compose (standalone)
get_url:
url: "https://github.com/docker/compose/releases/download/v2.23.0/docker-compose-linux-x86_64"
dest: /usr/local/bin/docker-compose
mode: '0755'
become: yes
- name: Configure UFW firewall
ufw:
rule: "{{ item.rule }}"
port: "{{ item.port }}"
proto: "{{ item.proto }}"
loop:
- { rule: 'allow', port: '22', proto: 'tcp' }
- { rule: 'allow', port: '80', proto: 'tcp' }
- { rule: 'allow', port: '443', proto: 'tcp' }
become: yes
- name: Enable UFW
ufw:
state: enabled
become: yes
roles/dependencies/handlers/main.yml
---
- name: Restart Docker
systemd:
name: docker
state: restarted
enabled: yes
become: yes
roles/deploy/tasks/main.yml
---
# Deploy the application
- name: Create application directory
file:
path: /opt/todo-app
state: directory
owner: "{{ ansible_user }}"
group: "{{ ansible_user }}"
mode: '0755'
become: yes
- name: Clone application repository
git:
repo: "{{ app_repo_url }}"
dest: /opt/todo-app
version: "{{ app_branch | default('main') }}"
force: yes
register: git_clone
- name: Create environment file from template
template:
src: .env.j2
dest: /opt/todo-app/.env
owner: "{{ ansible_user }}"
mode: '0600'
no_log: yes # Don't log sensitive env vars
- name: Create letsencrypt directory
file:
path: /opt/todo-app/letsencrypt
state: directory
mode: '0755'
- name: Pull latest Docker images
community.docker.docker_compose:
project_src: /opt/todo-app
pull: yes
when: git_clone.changed
- name: Start application with Docker Compose
community.docker.docker_compose:
project_src: /opt/todo-app
state: present
restarted: "{{ git_clone.changed }}"
register: compose_output
- name: Wait for application to be healthy
uri:
url: "https://{{ domain_name }}/health"
status_code: 200
validate_certs: no
retries: 10
delay: 10
register: health_check
until: health_check.status == 200
- name: Display deployment status
debug:
msg: "Application deployed successfully at https://{{ domain_name }}"
roles/deploy/templates/.env.j2
# Auto-generated by Ansible - DO NOT EDIT MANUALLY
# Domain configuration
DOMAIN={{ domain_name }}
ACME_EMAIL={{ acme_email }}
# Database
DB_NAME={{ db_name }}
DB_USER={{ db_user }}
DB_PASSWORD={{ db_password }}
# Security
JWT_SECRET={{ jwt_secret }}
# Application
NODE_ENV=production
LOG_LEVEL=info
playbook.yml
---
- name: Deploy TODO Application
hosts: app_servers
become: no
vars:
app_repo_url: "https://github.com/yourusername/todo-app.git"
app_branch: "main"
domain_name: "{{ lookup('env', 'DOMAIN') }}"
acme_email: "{{ lookup('env', 'ACME_EMAIL') }}"
db_name: "{{ lookup('env', 'DB_NAME') }}"
db_user: "{{ lookup('env', 'DB_USER') }}"
db_password: "{{ lookup('env', 'DB_PASSWORD') }}"
jwt_secret: "{{ lookup('env', 'JWT_SECRET') }}"
roles:
- dependencies
- deploy
post_tasks:
- name: Verify deployment
uri:
url: "https://{{ domain_name }}"
status_code: 200
validate_certs: yes
delegate_to: localhost
- name: Display application URL
debug:
msg: "Application is live at https://{{ domain_name }}"
Phase 4: CI/CD Pipeline
Now let's automate everything with GitHub Actions.
.github/workflows/infrastructure.yml
name: Infrastructure Deployment
on:
push:
branches: [main]
paths:
- 'infra/terraform/**'
- 'infra/ansible/**'
- '.github/workflows/infrastructure.yml'
workflow_dispatch: # Manual trigger
env:
TF_VERSION: '1.6.0'
AWS_REGION: 'us-east-1'
jobs:
terraform-plan:
name: Terraform Plan & Drift Detection
runs-on: ubuntu-latest
outputs:
has_changes: ${{ steps.plan.outputs.has_changes }}
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Terraform Init
run: |
cd infra/terraform
terraform init
- name: Terraform Plan
id: plan
run: |
cd infra/terraform
EXIT_CODE=0
terraform plan -detailed-exitcode -out=tfplan || EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "has_changes=false" >> $GITHUB_OUTPUT
echo "✅ No infrastructure changes detected"
elif [ $EXIT_CODE -eq 2 ]; then
echo "has_changes=true" >> $GITHUB_OUTPUT
echo "⚠️ Infrastructure drift detected!"
else
echo "❌ Terraform plan failed"
exit 1
fi
- name: Save plan
if: steps.plan.outputs.has_changes == 'true'
uses: actions/upload-artifact@v3
with:
name: tfplan
path: infra/terraform/tfplan
- name: Send drift alert email
if: steps.plan.outputs.has_changes == 'true'
uses: dawidd6/action-send-mail@v3
with:
server_address: smtp.gmail.com
server_port: 465
username: ${{ secrets.MAIL_USERNAME }}
password: ${{ secrets.MAIL_PASSWORD }}
subject: ⚠️ Terraform Drift Detected - TODO App
to: ${{ secrets.ALERT_EMAIL }}
from: Terraform CI/CD
body: |
Infrastructure drift has been detected!
Review the changes and approve the workflow to apply:
${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
terraform-apply:
name: Terraform Apply
runs-on: ubuntu-latest
needs: terraform-plan
if: needs.terraform-plan.outputs.has_changes == 'true'
environment: production # Requires manual approval
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: ${{ env.TF_VERSION }}
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v2
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Download plan
uses: actions/download-artifact@v3
with:
name: tfplan
path: infra/terraform/
- name: Terraform Init
run: |
cd infra/terraform
terraform init
- name: Terraform Apply
run: |
cd infra/terraform
terraform apply tfplan
- name: Save outputs
run: |
cd infra/terraform
terraform output -json > outputs.json
- name: Upload outputs
uses: actions/upload-artifact@v3
with:
name: terraform-outputs
path: infra/terraform/outputs.json
ansible-deploy:
name: Ansible Deployment
runs-on: ubuntu-latest
needs: terraform-apply
if: always() && (needs.terraform-apply.result == 'success' || needs.terraform-plan.outputs.has_changes == 'false')
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install Ansible
run: |
pip install ansible
- name: Setup SSH key
run: |
mkdir -p ~/.ssh
echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
ssh-keyscan -H ${{ secrets.SERVER_IP }} >> ~/.ssh/known_hosts
- name: Run Ansible playbook
env:
DOMAIN: ${{ secrets.DOMAIN }}
ACME_EMAIL: ${{ secrets.ACME_EMAIL }}
DB_NAME: ${{ secrets.DB_NAME }}
DB_USER: ${{ secrets.DB_USER }}
DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
JWT_SECRET: ${{ secrets.JWT_SECRET }}
run: |
cd infra/ansible
ansible-playbook -i inventory/hosts playbook.yml
- name: Verify deployment
run: |
sleep 30 # Wait for services to stabilize
curl -f https://${{ secrets.DOMAIN }}/health || exit 1
echo "✅ Deployment verified!"
.github/workflows/application.yml
name: Application Deployment
on:
push:
branches: [main]
paths:
- 'frontend/**'
- 'auth-api/**'
- 'todos-api/**'
- 'users-api/**'
- 'log-processor/**'
- 'docker-compose.yml'
workflow_dispatch:
jobs:
deploy:
name: Deploy Application
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup SSH key
run: |
mkdir -p ~/.ssh
echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
ssh-keyscan -H ${{ secrets.SERVER_IP }} >> ~/.ssh/known_hosts
- name: Deploy to server
run: |
ssh ubuntu@${{ secrets.SERVER_IP }} << 'EOF'
cd /opt/todo-app
git pull origin main
docker-compose pull
docker-compose up -d --build
EOF
- name: Wait for deployment
run: sleep 30
- name: Health check
run: |
curl -f https://${{ secrets.DOMAIN }}/health || exit 1
echo "✅ Application deployed successfully!"
Understanding the CI/CD Flow
Infrastructure changes (Terraform/Ansible):
1. Push to main
↓
2. Run terraform plan
↓
3. Detect drift? → Send email
↓
4. Pause for manual approval (GitHub Environment protection)
↓
5. Apply changes
↓
6. Run Ansible
↓
7. Verify deployment
Application changes:
1. Push to main
↓
2. SSH to server
↓
3. Git pull
↓
4. docker-compose pull
↓
5. docker-compose up
↓
6. Health check
Testing the Complete Setup
Local Testing
1. Test containers locally:
# Start everything
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f
# Test frontend
curl http://localhost
# Test APIs
curl http://localhost/api/auth/health
curl http://localhost/api/todos/health
curl http://localhost/api/users/health
# Stop everything
docker-compose down
2. Test Terraform:
cd infra/terraform
# Initialize
terraform init
# Validate
terraform validate
# Plan (dry run)
terraform plan
# Apply (create infrastructure)
terraform apply
# Show outputs
terraform output
# Destroy (cleanup)
terraform destroy
3. Test Ansible:
cd infra/ansible
# Test connection
ansible all -m ping
# Check syntax
ansible-playbook playbook.yml --syntax-check
# Dry run
ansible-playbook playbook.yml --check
# Run for real
ansible-playbook playbook.yml
# Run specific role
ansible-playbook playbook.yml --tags deploy
Production Deployment
Complete deployment from scratch:
# 1. Clone the repository
git clone https://github.com/yourusername/todo-app.git
cd todo-app
# 2. Configure secrets
cp .env.example .env
# Edit .env with your values
# 3. Initialize Terraform
cd infra/terraform
terraform init
# 4. Create infrastructure
terraform plan
terraform apply
# Wait for Ansible to complete (triggered automatically)
# 5. Configure DNS
# Point your domain to the Elastic IP shown in terraform outputs
# 6. Verify deployment
curl https://your-domain.com
Expected result:
- Login page loads at https://your-domain.com
- HTTPS works (automatic certificate from Let's Encrypt)
- APIs respond at /api/auth, /api/todos, /api/users
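A small smoke-test loop covering all of the above (the /health paths match the ones used in local testing):
for path in / /api/auth/health /api/todos/health /api/users/health; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "https://your-domain.com$path")
  echo "$path -> $code"   # expect 200 for each
done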
Troubleshooting
Issue: Terraform fails with "state locked"
# Release a stale lock — the LOCK_ID appears in the error message
terraform force-unlock <LOCK_ID>
# Or wait for other operation to complete
Issue: Ansible can't connect to server
# Test SSH manually
ssh -i ~/.ssh/id_rsa ubuntu@<SERVER_IP>
# Check inventory
ansible-inventory --list -i inventory/hosts
# Verbose output
ansible-playbook playbook.yml -vvv
Issue: Containers won't start
# Check logs
docker-compose logs <service-name>
# Check disk space
df -h
# Check memory
free -h
# Restart specific service
docker-compose restart <service-name>
Issue: HTTPS not working
# Check Traefik logs
docker logs traefik
# Verify DNS points to server
dig your-domain.com
# Check certificate
docker exec traefik cat /letsencrypt/acme.json
# Force certificate renewal
docker-compose down
rm -rf letsencrypt/acme.json
docker-compose up -d