Introduction: Why This Week Changed Everything
Picture this: You're pushing code to GitHub. Minutes later, it's running in production. No manual SSH sessions. No "it works on my machine" moments. No 2 AM deployments gone wrong. That's not fantasy; that's Azure DevOps CI/CD Pipelines in action.
This week, I went from basic pipeline awareness to orchestrating three interconnected projects that took my DevOps journey from theory to battle-tested, production-grade practice. We're talking infrastructure provisioning, configuration management, multi-stage automation, real error debugging, and end-to-end deployment workflows. This isn't "hello world" territory anymore.
Let me walk you through what I discovered, the walls I hit, and how I demolished them.
Understanding the Week 9 Journey: Three Projects, One Cohesive Story
Before diving into the deep end, let's map the landscape. Week 9 consists of three projects that build on each other like a DevOps Jenga tower:
- Project 1: React App CI/CD Pipeline to Nginx (foundational multi-stage automation)
- Project 2: Book Review App on AWS Infrastructure (full-stack IaC + configuration management)
- Project 3: Advanced Troubleshooting & Production Hardening (real-world debugging patterns)
The beauty? They're not isolated. Each project adds a layer. Project 1 teaches you pipeline mechanics. Project 2 teaches you infrastructure-application orchestration. Project 3 teaches you how to survive when things break, which is 80% of DevOps.
Part 1: The Foundation – What is Azure DevOps, Really?
Question I Asked Myself: "Why Do We Need Azure DevOps When GitHub Exists?"
Good question. Here's the honest answer:
GitHub is version control. Azure DevOps is the entire delivery orchestration platform. It's like comparing a calendar to a full project management system.
What Azure DevOps Actually Solves
Imagine deploying code manually:
Developer writes code
  ↓
Developer emails "deploy please"
  ↓
DevOps engineer SSHs into the server
  ↓
DevOps engineer runs scripts
  ↓
DevOps engineer tests manually
  ↓
App is live (hopefully)
Azure DevOps eliminates this:
Developer pushes to the main branch
  ↓
Azure Pipeline triggers automatically
  → Build stage (compile, package)
  → Test stage (run unit tests)
  → Publish stage (create artifact)
  → Deploy stage (push to production)
  ↓
App is live (verified by tests)
No human intervention. No email chains. No 2 AM scrambles. Just velocity + reliability.
Core Components You MUST Understand
- Pipelines: Automated workflows that execute when you push code
- Stages: Logical groupings (Build → Test → Deploy)
- Jobs: Collections of tasks within a stage
- Tasks: Individual actions (compile, test, copy files)
- Artifacts: Build outputs (compiled binaries, Docker images, static files)
- Service Connections: Secure authentication to external systems (AWS, GitHub, SSH servers)
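To make that hierarchy concrete, here is a minimal skeleton of an azure-pipelines.yml. The names and the hosted `ubuntu-latest` image are illustrative, not this week's actual pipeline:

```yaml
trigger:
  branches:
    include:
      - main                      # the pipeline fires on every push to main

stages:
  - stage: Build                  # a stage groups related jobs
    jobs:
      - job: BuildJob             # a job runs on a single agent
        pool:
          vmImage: 'ubuntu-latest'
        steps:                    # steps/tasks are the individual actions
          - script: echo "compile and package here"
          - publish: $(Build.ArtifactStagingDirectory)
            artifact: drop        # the artifact is the build output handed to later stages

  - stage: Deploy
    dependsOn: Build              # Deploy only runs if Build succeeded
    jobs:
      - job: DeployJob
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - download: current     # pull the artifact produced earlier in this run
            artifact: drop
          - script: echo "deploy the artifact here"
```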
Part 2: Project 1 Assignment – The Gateway Drug to CI/CD
The Challenge: Deploy a React App via Multi-Stage Pipeline
The task:
Build a 4-stage Azure DevOps pipeline that:
- Builds a React app (`npm install` + `npm run build`)
- Tests the app (`npm test`)
- Publishes build artifacts
- Deploys to an Nginx server via SSH
Why this matters: This is the canonical CI/CD pattern. Every enterprise uses this exact structure.
The Errors That Made Me Question Everything
Error #1: YAML Syntax – The AND vs. and() Confusion
I wrote this and Azure DevOps exploded:
condition: succeeded('Build') AND succeeded('Test')
Error message:
Unexpected symbol AND. Located at position 20 within expression succeeded('Build') AND succeeded('Test')
What I didn't know: Azure DevOps uses function syntax, not C-style operators.
The fix:
condition: and(succeeded('Build'), succeeded('Test'))
Learning: This one error taught me that DevOps is very particular about syntax. One typo, one wrong operator, and the entire pipeline fails. There's no fuzzy matching.
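A few related condition patterns that became easier once I internalized the function syntax. These are standard Azure DevOps expression functions, though the branch and trigger checks below are illustrative rather than taken from my pipeline:

```yaml
# Both upstream stages succeeded AND we are on main
condition: and(succeeded('Build'), succeeded('Test'), eq(variables['Build.SourceBranch'], 'refs/heads/main'))

# Run a cleanup/notification step even when something failed or was canceled
condition: or(failed(), canceled())

# Skip a stage for pull request builds
condition: ne(variables['Build.Reason'], 'PullRequest')
```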
Error #2: Permission Denied on Nginx Root (/var/www/html)
After the pipeline deployed files, the app returned 403 Forbidden. Why?
Because permissions were broken.
The flow was:
- The ubuntu user copies the files (it has write permission)
- Nginx runs as the www-data user (which doesn't have read permission)
- Result: Nginx can't read the files → 403 error
The three-step permission fix I discovered:
# BEFORE copying files
sudo chown -R ubuntu:ubuntu /var/www/html
sudo chmod 755 /var/www/html
# Copy files via Azure Pipeline
# AFTER copying files
sudo chown -R www-data:www-data /var/www/html
sudo find /var/www/html -type d -exec chmod 755 {} \;   # directories need execute to be traversable
sudo find /var/www/html -type f -exec chmod 644 {} \;   # files only need to be readable
Why this matters: This permission handoff pattern is everywhere in DevOps. Different users, different roles, different permissions. You must think in layers.
The Solution: Multi-Stage YAML Pipeline
Here's the working 4-stage pipeline:
trigger:
branches:
include:
- main
variables:
sshEndpoint: 'ubuntu-nginx-ssh'
artifactName: 'react_build'
webRoot: '/var/www/html'
stages:
# Stage 1: Build
- stage: Build
displayName: 'Build React App'
jobs:
- job: BuildJob
pool:
name: 'SelfHostedPool'  # self-hosted agent pools are referenced by name, not vmImage
steps:
- checkout: self
- task: NodeTool@0
inputs:
versionSpec: '18.x'
- script: npm install
displayName: 'Install Dependencies'
- script: npm run build
displayName: 'Build React App'
- publish: $(Build.SourcesDirectory)/build
artifact: $(artifactName)
# Stage 2: Test
- stage: Test
displayName: 'Test React App'
dependsOn: Build
condition: succeeded('Build')
jobs:
- job: TestJob
pool:
name: 'SelfHostedPool'
steps:
- checkout: self
- task: NodeTool@0
inputs:
versionSpec: '18.x'
- script: npm install
- script: npm test -- --watchAll=false
displayName: 'Run Tests'
# Stage 3: Publish
- stage: Publish
displayName: 'Publish Artifact'
dependsOn: Test
condition: succeeded('Test')
jobs:
- job: PublishJob
pool:
name: 'SelfHostedPool'
steps:
- download: current
artifact: $(artifactName)
- script: echo "Artifact published and ready for deployment"
# Stage 4: Deploy
- stage: Deploy
displayName: 'Deploy to Nginx'
dependsOn:
- Build
- Test
condition: and(succeeded('Build'), succeeded('Test'))
jobs:
- job: DeployJob
pool:
name: 'SelfHostedPool'
steps:
- download: current
artifact: $(artifactName)
# FIX 1: Set permissions for upload
- task: SSH@0
displayName: 'Set Write Permissions'
inputs:
sshEndpoint: $(sshEndpoint)
runOptions: 'inline'
inline: |
sudo chown -R ubuntu:ubuntu $(webRoot)
sudo chmod 755 $(webRoot)
# Copy files
- task: CopyFilesOverSSH@0
displayName: 'Copy React Build'
inputs:
sshEndpoint: $(sshEndpoint)
sourceFolder: '$(Pipeline.Workspace)/$(artifactName)'
contents: '**'
targetFolder: $(webRoot)
overwrite: true
# FIX 2: Set permissions for Nginx
- task: SSH@0
displayName: 'Set Nginx Permissions'
inputs:
sshEndpoint: $(sshEndpoint)
runOptions: 'inline'
inline: |
sudo chown -R www-data:www-data $(webRoot)
sudo find $(webRoot) -type d -exec chmod 755 {} \;
sudo find $(webRoot) -type f -exec chmod 644 {} \;
# Restart Nginx
- task: SSH@0
displayName: 'Restart Nginx'
inputs:
sshEndpoint: $(sshEndpoint)
runOptions: 'inline'
inline: |
sudo systemctl restart nginx
sudo systemctl status nginx
Key insight: Each stage depends on the previous one. If Build fails, Test never runs. This is atomic safety: you can't deploy broken code.
Part 3: Project 2 – Infrastructure as Code + Application Deployment
The Challenge: Full 3-Tier Architecture on AWS
What was tasked:
Deploy a Book Review Application using:
- Terraform to provision infrastructure (VPC, EC2 instances, RDS database)
- Ansible to configure servers (install Nginx, Flask, MySQL)
- Azure DevOps to orchestrate both
This is where things got seriously complex.
Question: "Why Separate Infrastructure and Application?"
In enterprise environments, infrastructure and application teams have different responsibilities:
- Infrastructure Team (mine for this project):
  - Provisions cloud resources (VMs, databases, networking)
  - Manages security groups and firewalls
  - Ensures infrastructure stability
  - Changes infrequently (quarterly updates)
- Application Team:
  - Writes code, runs tests
  - Deploys applications
  - Changes frequently (multiple times per day)
Separating them means:
- ✅ One team doesn't block the other
- ✅ Each team owns its concerns
- ✅ Easier rollback if something breaks
- ✅ Clear accountability
The Infrastructure Challenges
Challenge #1: Database Version Mismatch Hell
I started with RDS (Relational Database Service) thinking it was the "easy" option.
Error:
Error: Creating DB instance: DBParameterGroupNotFound
Cannot find MySQL version 8.0.28 in region ap-south-1
Why: AWS region compatibility is finicky. Not all database engine versions are available in every region.
The pivot: Instead of RDS, I deployed MySQL on an EC2 instance. Same database, no version drama. Plus, it's free-tier compatible.
Lesson learned: Sometimes the "managed" solution (RDS) creates more problems than a simple "unmanaged" solution (MySQL on EC2). Know when to keep it simple.
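For reference, here's a rough sketch of what that pivot looks like in Terraform. It's not my exact code: the AMI, key pair, subnet, and security group names match the Terraform shown later in this post, and the user_data is only a minimal bootstrap (Ansible finishes the real configuration):

```hcl
# MySQL on a plain EC2 instance instead of RDS (sketch)
resource "aws_instance" "database" {
  ami                    = "ami-02b8269d5e85954ef"       # Ubuntu 24.04, same AMI as the app servers
  instance_type          = "t2.micro"                    # free-tier eligible
  subnet_id              = aws_subnet.private_2.id       # keep the database in a private subnet
  vpc_security_group_ids = [aws_security_group.rds.id]   # reuse the port-3306 security group
  key_name               = aws_key_pair.deployer.key_name

  # First-boot bootstrap; configuration management takes over afterwards
  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y mysql-server
    systemctl enable --now mysql
  EOF

  tags = {
    Name = "book-review-database"
  }
}
```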
Challenge #2: Security Group Configuration
Three resources needed to communicate:
- Frontend EC2 (public subnet) β needs HTTP from internet
- Backend API (private subnet) β needs Flask port 5000 from frontend
- Database (private subnet) β needs MySQL port 3306 from backend
If any security group rule was wrong, nothing would talk to anything.
Example issue:
Backend EC2 can't connect to MySQL database
Root cause: RDS security group didn't have an inbound rule allowing port 3306 from backend EC2's security group.
The fix:
resource "aws_security_group_rule" "rds_from_backend" {
type = "ingress"
from_port = 3306
to_port = 3306
protocol = "tcp"
source_security_group_id = aws_security_group.backend.id # Allow from backend SG
security_group_id = aws_security_group.rds.id
}
Key concept: Security groups are stateful firewalls. Think of them as "who can talk to whom".
The Three-Tier Architecture I Built
+------------------------+---------------------------+
|              AWS VPC 10.0.0.0/16                   |
+------------------------+---------------------------+
| Public Subnet          | Private Subnets           |
| 10.0.1.0/24            | 10.0.2.0/24 & 10.0.3.0/24 |
+------------------------+---------------------------+
| Frontend EC2           | Backend EC2               |
| Nginx + PHP            | Flask API                 |
| Public IP              | Private IP only           |
| Port 80 open to        | Port 5000 (internal)      |
| the internet           |                           |
+------------------------+---------------------------+
                                     |
                                     v
                         Database EC2 / RDS
                         MySQL 5.7.44
                         Port 3306 (internal)
Why this layout:
- Frontend must be public (users visit it)
- Backend must be private (only frontend talks to it)
- Database must be private (only backend talks to it)
- Security groups enforce these rules
The Terraform Infrastructure Code
I'll spare you 500 lines, but here's the conceptual structure:
# Define VPC
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
}
# Public subnet for frontend
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
map_public_ip_on_launch = true
}
# Private subnets for backend and database
resource "aws_subnet" "private_1" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.2.0/24"
}
resource "aws_subnet" "private_2" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.3.0/24"
}
# Internet Gateway for public subnet
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
}
# Route public subnet traffic to internet gateway
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
}
# Security groups
resource "aws_security_group" "frontend" {
name_prefix = "frontend-"
vpc_id = aws_vpc.main.id
# Allow HTTP from anywhere
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "backend" {
name_prefix = "backend-"
vpc_id = aws_vpc.main.id
# Allow Flask port from frontend SG only
ingress {
from_port = 5000
to_port = 5000
protocol = "tcp"
security_groups = [aws_security_group.frontend.id]
}
}
resource "aws_security_group" "rds" {
name_prefix = "rds-"
vpc_id = aws_vpc.main.id
# Allow MySQL from backend SG only
ingress {
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.backend.id]
}
}
# EC2 instances
resource "aws_instance" "frontend" {
ami = "ami-02b8269d5e85954ef" # Ubuntu 24.04
instance_type = "t2.micro"
subnet_id = aws_subnet.public.id
vpc_security_group_ids = [aws_security_group.frontend.id]
key_name = aws_key_pair.deployer.key_name
}
resource "aws_instance" "backend" {
ami = "ami-02b8269d5e85954ef" # Ubuntu 24.04
instance_type = "t2.micro"
subnet_id = aws_subnet.private_1.id
vpc_security_group_ids = [aws_security_group.backend.id]
key_name = aws_key_pair.deployer.key_name
}
# RDS Database
resource "aws_db_instance" "mysql" {
identifier = "book-review-mysql"
engine = "mysql"
instance_class = "db.t2.micro"
allocated_storage = 20
username = "admin"
password = var.db_password
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
skip_final_snapshot = true
}
# Outputs for Ansible
output "frontend_public_ip" {
value = aws_instance.frontend.public_ip
}
output "backend_private_ip" {
value = aws_instance.backend.private_ip
}
output "rds_endpoint" {
value = aws_db_instance.mysql.endpoint
}
This is the power of IaC: One file, version-controlled, reproducible infrastructure. Want to tear it all down? terraform destroy. Want to recreate it? terraform apply. No clicking in the AWS console for 30 minutes.
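If you're following along, the day-to-day loop looks roughly like this (all standard Terraform CLI commands):

```bash
terraform init               # download providers and configure the backend
terraform fmt -check         # keep formatting consistent
terraform validate           # catch syntax and reference errors before touching AWS
terraform plan -out=tfplan   # preview exactly what will change
terraform apply tfplan       # apply the reviewed plan
terraform output             # grab IPs and endpoints to feed into Ansible
terraform destroy            # tear everything down when you're done experimenting
```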
Ansible: Configuration Management
Once infrastructure exists, you need to configure it. That's Ansible's job.
Question I asked: "Why not just SSH in and install everything manually?"
Because:
- Manual steps aren't reproducible
- Humans make mistakes
- There's no audit trail
- You can't scale (do this 100 times? Good luck)
Ansible solution:
---
- name: Common setup
hosts: all
become: yes
tasks:
- name: Update apt cache
apt:
update_cache: yes
- name: Install system packages
apt:
name:
- git
- curl
- wget
- python3-pip
state: present
- name: Setup Database
hosts: database
become: yes
tasks:
- name: Install MySQL
apt:
name: mysql-server
state: present
- name: Start MySQL
systemd:
name: mysql
state: started
enabled: yes
- name: Setup Backend API
hosts: backend
become: yes
tasks:
- name: Install Python packages
pip:
name:
- flask
- flask-cors
- pymysql
- gunicorn
- name: Clone backend repo
git:
repo: https://github.com/pravinmishra/book-review-app.git
dest: /app/backend
version: main
- name: Create systemd service for API
copy:
dest: /etc/systemd/system/api.service
content: |
[Unit]
Description=Book Review API
After=network.target
[Service]
Type=simple
User=ubuntu
WorkingDirectory=/app/backend
ExecStart=/usr/bin/python3 -m gunicorn -w 4 -b 0.0.0.0:5000 app:app
Restart=always
[Install]
WantedBy=multi-user.target
- name: Start API service
systemd:
name: api
state: started
enabled: yes
- name: Setup Frontend Web Server
hosts: frontend
become: yes
tasks:
- name: Install Nginx
apt:
name: nginx
state: present
- name: Clone frontend repo
git:
repo: https://github.com/pravinmishra/book-review-app.git
dest: /var/www/html
version: main
- name: Create Nginx config
copy:
dest: /etc/nginx/sites-available/default
content: |
server {
listen 80 default_server;
root /var/www/html;
index index.php index.html;
location ~ \.php$ {
include snippets/fastcgi-php.conf;
fastcgi_pass unix:/var/run/php/php-fpm.sock;
}
}
- name: Restart Nginx
systemd:
name: nginx
state: restarted
enabled: yes
Key concept: Ansible is idempotent. Running the playbook once, twice, or 100 times produces the same result. It checks "is MySQL installed?" before installing.
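An easy way to see that idempotency for yourself, using standard ansible-playbook flags (the inventory and playbook names are assumed from this project):

```bash
# First run: many tasks report "changed"
ansible-playbook -i inventory.ini site.yml

# Second run: everything should report "ok" and nothing "changed"
ansible-playbook -i inventory.ini site.yml

# Dry run: show what WOULD change without touching the servers
ansible-playbook -i inventory.ini site.yml --check --diff
```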
Part 4: Project 3 – Debugging and Production Hardening
The Reality: Things Break
From this project, I realized real DevOps is debugging.
Real Errors I Encountered
Error: Terraform State Corruption
After multiple terraform apply failures, the state file got confused:
Error: resource already exists
Root cause: State file tracked resources that no longer existed in AWS
The fix:
terraform state list # See what Terraform thinks exists
terraform state show module.database.aws_db_instance.mysql # Check one resource
terraform state rm module.database.aws_db_instance.mysql # Remove it from state
terraform refresh # Sync state with actual AWS
terraform apply # Re-create the resource
Lesson: Terraform state is the source of truth. If it gets corrupted, you manually fix it. This is why production uses remote state backends (S3 with locking).
Error: Pipeline Hangs on Long-Running Deployment
MySQL took 4+ minutes to deploy. Azure DevOps timeout configuration became critical.
The fix:
- task: TerraformTaskV4@4
  inputs:
    provider: aws
    command: apply
    commandOptions: -auto-approve
  timeoutInMinutes: 20   # task-level timeout: give long-running creates (MySQL took 4+ minutes) headroom
Error: Permissions Nightmare in Multi-Layer Deployment
When Terraform created resources, Ansible couldn't SSH into them because:
- Terraform created the key pair but didn't output the private key path correctly
- Ansible inventory had the wrong private key file location
The fix: hardcode the key path in the Ansible inventory:
[all]
backend ansible_host=10.0.2.15 ansible_user=ubuntu ansible_private_key_file=~/.ssh/book-review-key.pem
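The hardcoded IP works, but it breaks the next time Terraform recreates the instance. A sketch of the approach I'd use next time: render the inventory from Terraform outputs (the output names match the outputs block earlier; the file names are illustrative):

```bash
# Generate the Ansible inventory from Terraform outputs (sketch)
BACKEND_IP=$(terraform output -raw backend_private_ip)
FRONTEND_IP=$(terraform output -raw frontend_public_ip)

cat > inventory.ini <<EOF
[frontend]
${FRONTEND_IP} ansible_user=ubuntu ansible_private_key_file=~/.ssh/book-review-key.pem

[backend]
${BACKEND_IP} ansible_user=ubuntu ansible_private_key_file=~/.ssh/book-review-key.pem
EOF
```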
Production Hardening Checklist
After this project assignment, I created a hardening checklist:
Security:
- ✅ Security groups follow least privilege (only necessary ports open)
- ✅ Private databases not exposed to the internet
- ✅ SSH keys stored securely in Azure DevOps Secure Files
- ✅ Credentials never in code (use environment variables)

Infrastructure:
- ✅ Terraform state in a remote backend (S3) with locking (DynamoDB)
- ✅ Multiple availability zones for high availability
- ✅ Database backups enabled
- ✅ Monitoring and alerts configured

CI/CD:
- ✅ All stages have dependencies (prevent bad deployments)
- ✅ Manual approval gates before production (sketched below)
- ✅ Artifact versioning for rollback capability
- ✅ Comprehensive logging for debugging
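For the manual approval item, the pattern I'd reach for is a deployment job targeting an Azure DevOps Environment. The approval itself is configured on the environment in the UI, so treat this YAML as a sketch rather than a complete setup:

```yaml
- stage: DeployProd
  dependsOn: Test
  jobs:
    - deployment: DeployProdJob
      environment: 'production'     # approvals and checks attached to this environment gate the run
      pool:
        name: 'SelfHostedPool'
      strategy:
        runOnce:
          deploy:
            steps:
              - download: current
                artifact: react_build
              - script: echo "deploy the approved artifact here"
```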
Part 5: Deep Dive – Concepts I Mastered
Concept 1: Idempotency
Definition: Running an operation multiple times produces the same result.
Example – Not Idempotent:
# Each run adds 1 to a counter persisted on disk
count=$(cat counter.txt 2>/dev/null || echo 0)
count=$((count + 1))
echo "$count" | tee counter.txt   # First run: 1, second run: 2
Example – Idempotent:
# If run twice, ensures state is "installed"
if ! command -v terraform &> /dev/null; then
install_terraform
fi
# First run: installs, Second run: skips
Why this matters: Ansible operations are idempotent. Running a playbook twice is safe. You can re-run deployments without fear of double-applying changes.
Concept 2: Infrastructure as Code State Files
Terraform maintains a terraform.tfstate file:
{
"version": 4,
"terraform_version": "1.5.0",
"resources": [
{
"type": "aws_instance",
"name": "frontend",
"instances": [
{
"id": "i-0f5b3a1997b955765",
"attributes": {
"instance_type": "t2.micro",
"public_ip": "13.234.56.78"
}
}
]
}
]
}
This file is sacred. It maps Terraform code to actual AWS resources. If you lose it, Terraform thinks resources don't exist and creates duplicates.
Production solution: Store state in remote backend:
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "book-review/terraform.tfstate"
region = "ap-south-1"
dynamodb_table = "terraform-locks" # Prevents concurrent applies
encrypt = true
}
}
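The bucket and lock table have to exist before terraform init can use them. A one-time bootstrap with the AWS CLI, assuming the names from the backend block above:

```bash
# One-time bootstrap for remote state in ap-south-1
aws s3api create-bucket \
  --bucket my-terraform-state \
  --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1

aws s3api put-bucket-versioning \
  --bucket my-terraform-state \
  --versioning-configuration Status=Enabled    # keep a history of state files

aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region ap-south-1
```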
Concept 3: Artifact-Driven Deployment
Instead of deploying raw code, deploy built artifacts.
Pattern:
Code (source files)
  ↓
Build Stage: compile to an artifact
  ↓
Artifact (binary, Docker image, or static files)
  ↓
Publish Stage: store the artifact in a repository
  ↓
Deploy Stage: download and run the artifact (no recompilation)
Benefits:
- ✅ Compile once, deploy many times
- ✅ Smaller deployment size
- ✅ Faster deployments
- ✅ Reproducible (same artifact = same behavior)
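One small habit that supports rollback: stamp each published artifact with the run's build ID so any previous version is easy to identify and redeploy. A hedged one-liner (the naming convention is mine, not from the original pipeline):

```yaml
- publish: $(Build.SourcesDirectory)/build
  artifact: 'react_build_$(Build.BuildId)'   # e.g. react_build_1234, unique per pipeline run
```

The download step in the deploy stage then references the same expression, so both stages stay in sync within a run.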
Concept 4: Service Connections – The Bridge Between Systems
Azure DevOps needs to authenticate to external systems. Service Connections are how.
Example: Deploying to an Nginx server requires SSH authentication.
- task: SSH@0
displayName: 'Deploy to Server'
inputs:
sshEndpoint: 'ubuntu-nginx-ssh' # Service connection name
runOptions: 'inline'
inline: |
# Commands execute on remote server
Behind the scenes:
- Azure DevOps stores SSH credentials securely
- When task runs, Azure DevOps provides credentials to agent
- Agent connects to server using stored credentials
- Commands execute on remote server
- Credentials are never logged or visible
Part 6: Real-World Impact – Time, Cost, Quality
Time Savings
Before Pipeline (Manual Deployment):
- Developer finishes code: 1 hour
- Waits for DevOps engineer: 2-4 hours
- DevOps SSH into server: 5 minutes
- Manually copy files: 5 minutes
- Restart services: 3 minutes
- Test in browser: 5 minutes
- Total: 2-4+ hours
With Pipeline (Automated Deployment):
- Developer finishes code: 1 hour
- Push to main branch: 30 seconds
- Pipeline runs automatically: 8 minutes
- Build: 3 minutes
- Test: 2 minutes
- Deploy: 3 minutes
- Total: 1 hour 9 minutes
Savings: 1-3 hours per deployment
If your team deploys 5 times per day: 5-15 hours saved per day
Cost Optimization
AWS Free Tier: 12 months of free usage
| Component | Original Cost | Free Tier Cost |
|---|---|---|
| Frontend EC2 (t2.micro) | $10/month | $0 |
| Backend EC2 (t2.micro) | $10/month | $0 |
| Database (MySQL on EC2) | $0 (free tier) | $0 |
| Networking (VPC, IGW, NAT) | ~$5/month | $0 |
| Total | ~$25/month | $0/month |
Post-free tier: ~$25/month for the same infrastructure
Manual approach (no IaC):
- Senior DevOps engineer time to build: 40 hours @ $150/hour = $6,000
- Configuration drift, manual fixes, rebuilds: $2,000/year
- Total cost: $8,000+ one-time, plus ongoing maintenance
Infrastructure as Code approach:
- Initial Terraform code: 8 hours @ $100/hour = $800
- Reusable for unlimited deployments: $0 additional
- Version-controlled, fully reproducible: $0 maintenance
- Total cost: $800 one-time, plus $25/month infrastructure
ROI: Infrastructure setup cost drops by roughly 90% ($8,000+ down to $800). Time to deploy drops by roughly 90%. And every environment becomes exactly reproducible.
Quality Improvements
Automated Testing Prevents Bugs:
- Pipeline catches unit test failures before deployment
- No broken code reaches production
- Confidence increases (tests pass = app works)
Consistent Deployments:
- Same steps every time = same result every time
- No "it works for me" scenarios
- Rollback is one button click (re-run previous artifact version)
Part 7: Critical Debugging Patterns I Learned
Pattern 1: Isolate the Layer
When something breaks:
Is it the code? (Test locally: npm test)
  No → Is it the build? (Check npm run build output)
  No → Is it the deployment? (SSH to the server, check the files)
  No → Is it permissions? (ls -l, check ownership)
  No → Is it the service? (systemctl status)
Each layer can be debugged independently.
Pattern 2: Check Logs Everywhere
Azure DevOps pipeline logs
  ↓
SSH to the server, check systemctl status
  ↓
Application logs (/var/log/app.log)
  ↓
Nginx logs (/var/log/nginx/error.log)
  ↓
Database logs (/var/log/mysql/error.log)
The bug is always in the last place you look. So check everywhere systematically.
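The concrete commands I keep handy for each layer (paths and unit names follow the Ubuntu/Nginx/systemd setup in this project; yours may differ):

```bash
# Service layer: is the unit running, and what did it log?
sudo systemctl status nginx api
sudo journalctl -u api -n 100          # last 100 lines from the Flask/Gunicorn service

# Web server layer
sudo tail -f /var/log/nginx/error.log
sudo nginx -t                          # validate the config before restarting

# Database layer (default MySQL log location on Ubuntu)
sudo tail -f /var/log/mysql/error.log
```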
Pattern 3: Reproduce Locally First
Before running in pipeline:
# Reproduce the failure locally
npm run build # Does it succeed?
npm test # Do tests pass?
ssh ubuntu@13.234.56.78 # Can I connect?
curl http://13.234.56.78 # Does the app respond?
If it fails locally, fix it locally. Don't debug via pipeline.
Part 8: What Week 9 Taught Me – The Big Picture
Key Insight 1: DevOps is About Removing Friction
Infrastructure → Code (Terraform)
Configuration → Code (Ansible)
Deployment → Code (Azure DevOps YAML)
Everything is code. Everything is automated. Everything is reproducible.
Key Insight 2: Separation of Concerns Matters
Infrastructure team ≠ Application team
They're independent. They're scalable. Their responsibilities are clear.
This mirrors how large organizations actually work. You're learning enterprise patterns.
Key Insight 3: Production is Different From Development
Testing locally ≠ working in production
Why?
- Different permissions
- Different network topology
- Different resource constraints
- Different failure modes
Understanding this gap is the difference between a developer and a DevOps engineer.
Key Insight 4: Debugging is 80% of the Job
Code works. Architecture makes sense. But something's wrong.
This is real DevOps. Being able to systematically debug, isolate, and fix production issues is worth more than knowing 10 cloud platforms.
Part 9: Practical Tips For Your Journey
Tip 1: Use Meaningful Names
# Bad
resource "aws_instance" "server" {
instance_type = "t2.micro"
}
# Good
resource "aws_instance" "book_review_frontend" {
instance_type = "t2.micro"
tags = {
Name = "book-review-frontend"
Role = "web-server"
}
}
Future you will thank present you.
Tip 2: Default to Secure
# Bad: Allow all IPs
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # Dangerous!
}
# Good: Allow only what's needed
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
security_groups = [aws_security_group.bastion.id] # Only bastion
}
Tip 3: Version Everything
# README with versions used
- Terraform: 1.5.0
- Ansible: 2.14.1
- Python: 3.11
- Node: 18.x
# .gitignore to exclude state files
*.tfstate
*.tfstate.*
*.tfvars
!*.tfvars.example
Tip 4: Document Assumptions
# In variables.tf
variable "ssh_public_key_path" {
description = "Path to SSH public key"
type = string
default = "~/.ssh/book-review-key.pub"
# NOTE: Must match the key used to launch EC2 instances
}
When someone runs this code next year, they'll know exactly what's expected.
Part 10: Next Steps – The Path Forward
Where This Leads
Week 9 took you from "I know CI/CD exists" to "I can build production infrastructure and automate its deployment."
Next natural steps:
- Containerization (Docker):
  - Package applications instead of deploying raw code
  - Consistent environments across dev → prod
- Orchestration (Kubernetes):
  - Run containers at scale
  - Automatic scaling, self-healing
- Monitoring and Logging:
  - CloudWatch, Prometheus, ELK Stack
  - Know what's happening in production
- Security Hardening:
  - Secrets management (HashiCorp Vault, AWS Secrets Manager)
  - Infrastructure security scanning
  - Compliance automation
- Advanced Patterns:
  - Blue-green deployments (zero downtime)
  - Canary releases (gradual rollout)
  - GitOps (Git as the source of truth for infrastructure)
Conclusion: Week 9 Reflection
This week was transformational. I went from clicking buttons in consoles to writing code that provisions, configures, and deploys entire systems.
The three projects built a coherent narrative:
- Project 1: Learn pipeline mechanics
- Project 2: Apply them to real infrastructure
- Project 3: Debug like a real DevOps engineer
I've learned that DevOps isn't magic. It's:
- Clear thinking (what's the desired state?)
- Systematic debugging (where's the failure?)
- Continuous improvement (how can we do this better?)
Most importantly, I've learned that automation is a mindset. If you're doing it manually twice, automate it. Your future self will be grateful.
Week 9 Learning Outcomes
Hashtags
#AzureDevOps #CICD #InfrastructureAsCode #Terraform #Ansible #AWS #DevOps #CloudArchitecture #Automation #Pipeline #Production #AzurePipelines #YAML #GitOps #MultiStageDeployment #LearningJourney #TechCommunity
This is week 9 of 12 of a free DevOps cohort, in continuation of Ansible Roles Unleashed: From Ad-Hoc Automation to Production-Grade Cloud Deployments [Week-8].
Your learning doesn't end here; it accelerates. The patterns you've learned this week are the foundation for everything that comes next. Keep shipping. Keep learning.