Suvrajeet Banerjee
πŸš€ From Chaos to Orchestration: Mastering Azure DevOps CI/CD Pipelines [Week-9] βš™οΈ

πŸ“Œ Introduction: Why This Week Changed Everything


Picture this: You're pushing code to GitHub. Minutes later, it's running in production. No manual SSH sessions. No "it works on my machine" moments. No 2 AM deployments gone wrong. That's not fantasyβ€”that's Azure DevOps CI/CD Pipelines in action.

This week, I went from basic pipeline awareness to orchestrating three interconnected projects that took my DevOps journey from theoretical to production-grade. We're talking infrastructure provisioning, configuration management, multi-stage automation, real error debugging, and end-to-end deployment workflows. This isn't "hello world" territory anymore.

Let me walk you through what I discovered, the walls I hit, and how I demolished them.


🎯 Understanding the Week 9 Journey: Three Projects, One Cohesive Story

Before diving into the deep end, let's map the landscape. Week 9 consists of three projects that build on each other like a DevOps Jenga tower:

  • Project 1: React App CI/CD Pipeline to Nginx (foundational multi-stage automation)
  • Project 2: Book Review App on AWS Infrastructure (full-stack IaC + configuration management)
  • Project 3: Advanced Troubleshooting & Production Hardening (real-world debugging patterns)

The beauty? They're not isolated. Each project adds a layer. Project 1 teaches you pipeline mechanics. Project 2 teaches you infrastructure-application orchestration. Project 3 teaches you how to survive when things breakβ€”which is 80% of DevOps.


πŸ“š Part 1: The Foundation – What is Azure DevOps, Really?


πŸ€” Question I Asked Myself: "Why Do We Need Azure DevOps When GitHub Exists?"

Good question. Here's the honest answer:

GitHub is version control. Azure DevOps is the entire delivery orchestration platform. It's like comparing a calendar to a full project management system.

What Azure DevOps Actually Solves

Imagine deploying code manually:

Developer writes code
  ↓
Developer emails "deploy please"
  ↓
DevOps engineer SSHs into server
  ↓
DevOps engineer runs scripts
  ↓
DevOps engineer tests manually
  ↓
App is live (hopefully)

Azure DevOps eliminates this:

Developer pushes to main branch
  ↓
Azure Pipeline triggers automatically
  ↓ Build stage (compile, package)
  ↓ Test stage (run unit tests)
  ↓ Publish stage (create artifact)
  ↓ Deploy stage (push to production)
  ↓ App is live (verified by tests)

No human intervention. No email chains. No 2 AM scrambles. Just velocity + reliability.

Core Components You MUST Understand

πŸ”· Pipelines: Automated workflows that execute when you push code
πŸ”· Stages: Logical groupings (Build β†’ Test β†’ Deploy)
πŸ”· Jobs: Collections of tasks within a stage
πŸ”· Tasks: Individual actions (compile, test, copy files)
πŸ”· Artifacts: Build outputs (compiled binaries, Docker images, static files)
πŸ”· Service Connections: Secure authentication to external systems (AWS, GitHub, SSH servers)
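
To make the hierarchy concrete, here's a minimal sketch of how those pieces nest in an `azure-pipelines.yml`. The stage, job, and artifact names are placeholders of my own, not from the course:

```yaml
# Hypothetical minimal azure-pipelines.yml showing how the pieces nest
trigger:
  branches:
    include:
      - main                  # Pipeline: fires on a push to main

stages:
- stage: Build                # Stage: a logical grouping
  jobs:
  - job: BuildJob             # Job: a set of tasks run on one agent
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - script: echo "Task: one individual action"
    - publish: '$(Build.SourcesDirectory)/out'   # Artifact: the build output
      artifact: drop
```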


πŸ—οΈ Part 2: Project 1 Assignment – The Gateway Drug to CI/CD


The Challenge: Deploy a React App via Multi-Stage Pipeline

What was tasked:
Build a 4-stage Azure DevOps pipeline that:

  1. Builds a React app (npm install + npm run build)
  2. Tests the app (npm test)
  3. Publishes build artifacts
  4. Deploys to an Nginx server via SSH

Why this matters: This is the canonical CI/CD pattern. Every enterprise uses this exact structure.

πŸ”΄ The Errors That Made Me Question Everything

Error #1: YAML Syntax – The AND vs and() Confusion

I wrote this and Azure DevOps exploded:

condition: succeeded('Build') AND succeeded('Test')

Error message:

Unexpected symbol AND. Located at position 20 within expression succeeded('Build') AND succeeded('Test')

What I didn't know: Azure DevOps uses function syntax, not C-style operators.

The fix:

condition: and(succeeded('Build'), succeeded('Test'))

Learning: This one error taught me that DevOps is very particular about syntax. One typo, one wrong operator, and the entire pipeline fails. There's no fuzzy matching.
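
For reference, the same function syntax covers the other boolean operators. A few patterns that come up constantly (the variables referenced are Azure DevOps built-ins):

```yaml
# and/or/not are functions; comparisons use eq/ne
condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))

# Run a cleanup step when something upstream broke or was cancelled
condition: or(failed(), canceled())

# Skip a step for pull request builds
condition: not(eq(variables['Build.Reason'], 'PullRequest'))
```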

Error #2: Permission Denied on Nginx Root (/var/www/html)

After the pipeline deployed files, the app returned 403 Forbidden. Why?

Because permissions were broken.

The flow was:

  • Ubuntu user copies files (has write permission)
  • Nginx runs as www-data user (doesn't have read permission)
  • Result: Nginx can't read files β†’ 403 error

The three-step permission fix I discovered:

# BEFORE copying files
sudo chown -R ubuntu:ubuntu /var/www/html
sudo chmod 755 /var/www/html

# Copy files via Azure Pipeline

# AFTER copying files
sudo chown -R www-data:www-data /var/www/html
sudo find /var/www/html -type d -exec chmod 755 {} +
sudo find /var/www/html -type f -exec chmod 644 {} +

Why this matters: This permission handoff pattern is everywhere in DevOps. Different users, different roles, different permissions. You must think in layers.
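
One way to package that handoff: a small `find`-based helper that treats files and directories differently, since directories need the execute bit (755) to be traversable while files only need to be readable (644). This is my own sketch, not part of the course pipeline:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a web root readable by Nginx without breaking
# directory traversal. Directories get 755, regular files get 644.
fix_web_perms() {
  local web_root="$1"
  find "$web_root" -type d -exec chmod 755 {} +
  find "$web_root" -type f -exec chmod 644 {} +
}
```

Run it as the last deploy step (with sudo, after the `chown` to `www-data`), e.g. `fix_web_perms /var/www/html`.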

βœ… The Solution: Multi-Stage YAML Pipeline

Here's the working 4-stage pipeline:

trigger:
  branches:
    include:
      - main

variables:
  sshEndpoint: 'ubuntu-nginx-ssh'
  artifactName: 'react_build'
  webRoot: '/var/www/html'

stages:

# Stage 1: Build
- stage: Build
  displayName: 'πŸ”¨ Build React App'
  jobs:
    - job: BuildJob
      pool:
        name: 'SelfHostedPool'
      steps:
        - checkout: self
        - task: NodeTool@0
          inputs:
            versionSpec: '18.x'
        - script: npm install
          displayName: 'Install Dependencies'
        - script: npm run build
          displayName: 'Build React App'
        - publish: $(Build.SourcesDirectory)/build
          artifact: $(artifactName)

# Stage 2: Test
- stage: Test
  displayName: 'βœ… Test React App'
  dependsOn: Build
  condition: succeeded('Build')
  jobs:
    - job: TestJob
      pool:
        name: 'SelfHostedPool'
      steps:
        - checkout: self
        - task: NodeTool@0
          inputs:
            versionSpec: '18.x'
        - script: npm install
        - script: npm test -- --watchAll=false
          displayName: 'Run Tests'

# Stage 3: Publish
- stage: Publish
  displayName: 'πŸ“¦ Publish Artifact'
  dependsOn: Test
  condition: succeeded('Test')
  jobs:
    - job: PublishJob
      pool:
        name: 'SelfHostedPool'
      steps:
        - download: current
          artifact: $(artifactName)
        - script: echo "Artifact published and ready for deployment"

# Stage 4: Deploy
- stage: Deploy
  displayName: 'πŸš€ Deploy to Nginx'
  dependsOn: 
    - Build
    - Test
  condition: and(succeeded('Build'), succeeded('Test'))
  jobs:
    - job: DeployJob
      pool:
        name: 'SelfHostedPool'
      steps:
        - download: current
          artifact: $(artifactName)

        # FIX 1: Set permissions for upload
        - task: SSH@0
          displayName: 'Set Write Permissions'
          inputs:
            sshEndpoint: $(sshEndpoint)
            runOptions: 'inline'
            inline: |
              sudo chown -R ubuntu:ubuntu $(webRoot)
              sudo chmod 755 $(webRoot)

        # Copy files
        - task: CopyFilesOverSSH@0
          displayName: 'Copy React Build'
          inputs:
            sshEndpoint: $(sshEndpoint)
            sourceFolder: '$(Pipeline.Workspace)/$(artifactName)'
            contents: '**'
            targetFolder: $(webRoot)
            overwrite: true

        # FIX 2: Set permissions for Nginx
        - task: SSH@0
          displayName: 'Set Nginx Permissions'
          inputs:
            sshEndpoint: $(sshEndpoint)
            runOptions: 'inline'
            inline: |
              sudo chown -R www-data:www-data $(webRoot)
              sudo find $(webRoot) -type d -exec chmod 755 {} +
              sudo find $(webRoot) -type f -exec chmod 644 {} +

        # Restart Nginx
        - task: SSH@0
          displayName: 'Restart Nginx'
          inputs:
            sshEndpoint: $(sshEndpoint)
            runOptions: 'inline'
            inline: |
              sudo systemctl restart nginx
              sudo systemctl status nginx

Key insight: Each stage depends on the previous one. If Build fails, Test never runs. This is atomic safetyβ€”you can't deploy broken code.


πŸ›οΈ Part 3: Project 2 – Infrastructure as Code + Application Deployment


The Challenge: Full 3-Tier Architecture on AWS

What was tasked:
Deploy a Book Review Application using:

  • Terraform to provision infrastructure (VPC, EC2 instances, RDS database)
  • Ansible to configure servers (install Nginx, Flask, MySQL)
  • Azure DevOps to orchestrate both

This is where things got seriously complex.

πŸ€” Question: "Why Separate Infrastructure and Application?"

In enterprise environments, infrastructure and application teams have different responsibilities:

  • Infrastructure Team (mine for this project):

    • Provisions cloud resources (VMs, databases, networking)
    • Manages security groups, firewalls
    • Ensures infrastructure stability
    • Changes infrequently (quarterly updates)
  • Application Team:

    • Writes code, runs tests
    • Deploys applications
    • Changes frequently (multiple times per day)

Separating them means:
βœ… One team doesn't block the other
βœ… Each team owns their concerns
βœ… Easier rollback if something breaks
βœ… Clear accountability

πŸ”΄ The Infrastructure Challenges

Challenge #1: Database Version Mismatch Hell

I started with RDS (Relational Database Service) thinking it was the "easy" option.

Error:

Error: Creating DB instance: DBParameterGroupNotFound
Cannot find MySQL version 8.0.28 in region ap-south-1

Why: AWS region compatibility is finicky. Not all database engine versions are available in all regions.

The pivot: Instead of RDS, I deployed MySQL on an EC2 instance. Same database, no version drama. Plus, it's free-tier compatible.

Lesson learned: Sometimes the "managed" solution (RDS) creates more problems than a simple "unmanaged" solution (MySQL on EC2). Know when to keep it simple.
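
A sketch of what that pivot can look like in Terraform: MySQL bootstrapped via `user_data` on a plain EC2 instance. The resource reuses the AMI, subnet, and security group names from the main config below; the bootstrap script itself is my assumption, not the course's:

```hcl
# Hypothetical sketch: MySQL on a plain EC2 instance instead of RDS
resource "aws_instance" "database" {
  ami                    = "ami-02b8269d5e85954ef"  # same Ubuntu AMI as the other tiers
  instance_type          = "t2.micro"               # free-tier eligible
  subnet_id              = aws_subnet.private_2.id
  vpc_security_group_ids = [aws_security_group.rds.id]

  # Install MySQL at first boot instead of relying on a managed service
  user_data = <<-EOF
    #!/bin/bash
    apt-get update
    apt-get install -y mysql-server
    systemctl enable --now mysql
  EOF
}
```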

Challenge #2: Security Group Configuration

Three resources needed to communicate:

  • Frontend EC2 (public subnet) β†’ needs HTTP from internet
  • Backend API (private subnet) β†’ needs Flask port 5000 from frontend
  • Database (private subnet) β†’ needs MySQL port 3306 from backend

If any security group rule was wrong, nothing would talk to anything.

Example issue:

Backend EC2 can't connect to MySQL database

Root cause: RDS security group didn't have an inbound rule allowing port 3306 from backend EC2's security group.

The fix:

resource "aws_security_group_rule" "rds_from_backend" {
  type                     = "ingress"
  from_port                = 3306
  to_port                  = 3306
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.backend.id  # Allow from backend SG
  security_group_id        = aws_security_group.rds.id
}

Key concept: Security groups are stateful firewalls. Think of them as "who can talk to whom".

πŸ”· The Three-Tier Architecture I Built

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚             AWS VPC 10.0.0.0/16             β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Public Subnet       β”‚  Private Subnets     β”‚
β”‚  10.0.1.0/24         β”‚  10.0.2.0/24 & ...   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Frontend EC2        β”‚  Backend EC2         β”‚
β”‚  Nginx + PHP         β”‚  Flask API           β”‚
β”‚  Public IP βœ“         β”‚  Private IP only     β”‚
β”‚  Port 80 open        β”‚  Port 5000 (internal)β”‚
β”‚  to internet         β”‚                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
                       β–Ό
             Database EC2 / RDS
             MySQL 5.7.44
             Port 3306 (internal)

Why this layout:

  • Frontend must be public (users visit it)
  • Backend must be private (only frontend talks to it)
  • Database must be private (only backend talks to it)
  • Security groups enforce these rules

🟒 The Terraform Infrastructure Code

I'll spare you 500 lines, but here's the conceptual structure:

# Define VPC
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Public subnet for frontend
resource "aws_subnet" "public" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  map_public_ip_on_launch = true
}

# Private subnets for backend and database
resource "aws_subnet" "private_1" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.2.0/24"
}

resource "aws_subnet" "private_2" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.3.0/24"
}

# Internet Gateway for public subnet
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

# Route public subnet traffic to internet gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

# Security groups
resource "aws_security_group" "frontend" {
  name_prefix = "frontend-"
  vpc_id      = aws_vpc.main.id

  # Allow HTTP from anywhere
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "backend" {
  name_prefix = "backend-"
  vpc_id      = aws_vpc.main.id

  # Allow Flask port from frontend SG only
  ingress {
    from_port       = 5000
    to_port         = 5000
    protocol        = "tcp"
    security_groups = [aws_security_group.frontend.id]
  }
}

resource "aws_security_group" "rds" {
  name_prefix = "rds-"
  vpc_id      = aws_vpc.main.id

  # Allow MySQL from backend SG only
  ingress {
    from_port       = 3306
    to_port         = 3306
    protocol        = "tcp"
    security_groups = [aws_security_group.backend.id]
  }
}

# EC2 instances
resource "aws_instance" "frontend" {
  ami           = "ami-02b8269d5e85954ef"  # Ubuntu 24.04
  instance_type = "t2.micro"
  subnet_id     = aws_subnet.public.id
  vpc_security_group_ids = [aws_security_group.frontend.id]
  key_name      = aws_key_pair.deployer.key_name
}

resource "aws_instance" "backend" {
  ami           = "ami-02b8269d5e85954ef"  # Ubuntu 24.04
  instance_type = "t2.micro"
  subnet_id     = aws_subnet.private_1.id
  vpc_security_group_ids = [aws_security_group.backend.id]
  key_name      = aws_key_pair.deployer.key_name
}

# RDS Database
resource "aws_db_instance" "mysql" {
  identifier       = "book-review-mysql"
  engine           = "mysql"
  instance_class   = "db.t2.micro"
  allocated_storage = 20
  username         = "admin"
  password         = var.db_password
  db_subnet_group_name = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  skip_final_snapshot = true
}

# Outputs for Ansible
output "frontend_public_ip" {
  value = aws_instance.frontend.public_ip
}

output "backend_private_ip" {
  value = aws_instance.backend.private_ip
}

output "rds_endpoint" {
  value = aws_db_instance.mysql.endpoint
}

This is the power of IaC: One file, version-controlled, reproducible infrastructure. Want to tear it all down? terraform destroy. Want to recreate it? terraform apply. No clicking in the AWS console for 30 minutes.

🎭 Ansible: Configuration Management

Once infrastructure exists, you need to configure it. That's Ansible's job.

Question I asked: "Why not just SSH in and install everything manually?"

Because:

  • Manual steps aren't reproducible
  • Humans make mistakes
  • There's no audit trail
  • You can't scale (do this 100 times? Good luck)

Ansible solution:

---
- name: Common setup
  hosts: all
  become: yes
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
    - name: Install system packages
      apt:
        name:
          - git
          - curl
          - wget
          - python3-pip
        state: present

- name: Setup Database
  hosts: database
  become: yes
  tasks:
    - name: Install MySQL
      apt:
        name: mysql-server
        state: present
    - name: Start MySQL
      systemd:
        name: mysql
        state: started
        enabled: yes

- name: Setup Backend API
  hosts: backend
  become: yes
  tasks:
    - name: Install Python packages
      pip:
        name:
          - flask
          - flask-cors
          - pymysql
          - gunicorn
    - name: Clone backend repo
      git:
        repo: https://github.com/pravinmishra/book-review-app.git
        dest: /app/backend
        version: main
    - name: Create systemd service for API
      copy:
        dest: /etc/systemd/system/api.service
        content: |
          [Unit]
          Description=Book Review API
          After=network.target

          [Service]
          Type=simple
          User=ubuntu
          WorkingDirectory=/app/backend
          ExecStart=/usr/bin/python3 -m gunicorn -w 4 -b 0.0.0.0:5000 app:app
          Restart=always

          [Install]
          WantedBy=multi-user.target
    - name: Start API service
      systemd:
        name: api
        state: started
        enabled: yes

- name: Setup Frontend Web Server
  hosts: frontend
  become: yes
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
    - name: Clone frontend repo
      git:
        repo: https://github.com/pravinmishra/book-review-app.git
        dest: /var/www/html
        version: main
    - name: Create Nginx config
      copy:
        dest: /etc/nginx/sites-available/default
        content: |
          server {
            listen 80 default_server;
            root /var/www/html;
            index index.php index.html;

            location ~ \.php$ {
              include snippets/fastcgi-php.conf;
              fastcgi_pass unix:/var/run/php/php-fpm.sock;
            }
          }
    - name: Restart Nginx
      systemd:
        name: nginx
        state: restarted
        enabled: yes

Key concept: Ansible is idempotent. Running the playbook once, twice, or 100 times produces the same result. It checks "is MySQL installed?" before installing.
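
A concrete way to see the difference: the `shell` module blindly re-runs its command every time, while a state-aware module like `lineinfile` converges on a desired state and reports "ok" on reruns. The host entry below is illustrative, not from the playbook:

```yaml
# NOT idempotent: appends a duplicate line on every run
- name: Add backend host entry (avoid this)
  shell: echo "10.0.2.15 backend" >> /etc/hosts

# Idempotent: ensures the line exists exactly once
- name: Add backend host entry
  lineinfile:
    path: /etc/hosts
    line: "10.0.2.15 backend"
```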


⚠️ Part 4: Project 3 – Debugging and Production Hardening


The Reality: Things Break

From this project, I realized real DevOps is debugging.

πŸ”΄ Real Errors I Encountered

Error: Terraform State Corruption

After multiple terraform apply failures, the state file got confused:

Error: resource already exists

Root cause: State file tracked resources that no longer existed in AWS

The fix:

terraform state list  # See what Terraform thinks exists
terraform state show module.database.aws_db_instance.mysql  # Check one resource
terraform state rm module.database.aws_db_instance.mysql  # Remove it from state
terraform refresh  # Sync state with actual AWS (newer Terraform: terraform apply -refresh-only)
terraform apply  # Re-create the resource

Lesson: Terraform state is the source of truth. If it gets corrupted, you manually fix it. This is why production uses remote state backends (S3 with locking).

Error: Pipeline Hangs on Long-Running Deployment

MySQL took 4+ minutes to deploy. Azure DevOps timeout configuration became critical.

The fix:

- task: TerraformTaskV4@4
  inputs:
    provider: aws
    command: apply
    commandOptions: -auto-approve
  timeoutInMinutes: 20  # Give this task up to 20 minutes instead of letting a slow apply get cancelled

Error: Permissions Nightmare in Multi-Layer Deployment

When Terraform created resources, Ansible couldn't SSH into them because:

  • Terraform created the key pair but didn't output the private key path correctly
  • Ansible inventory had the wrong private key file location

The fix: Hardcoded the path in Ansible config:

[all]
backend ansible_host=10.0.2.15 ansible_user=ubuntu ansible_private_key_file=~/.ssh/book-review-key.pem
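
Hardcoding works, but a small script can render that inventory from Terraform's outputs instead, so the IP and key path never drift from what was actually provisioned. This helper and its file layout are my own sketch:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: generate the Ansible inventory from values that
# `terraform output` provides, instead of editing it by hand.
write_inventory() {
  local backend_ip="$1" key_file="$2" out_file="$3"
  cat > "$out_file" <<EOF
[all]
backend ansible_host=${backend_ip} ansible_user=ubuntu ansible_private_key_file=${key_file}
EOF
}
```

Wire it up after provisioning with something like `write_inventory "$(terraform output -raw backend_private_ip)" ~/.ssh/book-review-key.pem inventory.ini`.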

βœ… Production Hardening Checklist

After this project assignment, I created a hardening checklist:

πŸ” Security:

  • βœ… Security groups follow least-privilege (only necessary ports open)
  • βœ… Private databases not exposed to internet
  • βœ… SSH keys stored securely in Azure DevOps Secure Files
  • βœ… Credentials never in code (use environment variables)

πŸ”§ Infrastructure:

  • βœ… Terraform state in remote backend (S3) with locking (DynamoDB)
  • βœ… Multiple availability zones for high availability
  • βœ… Database backups enabled
  • βœ… Monitoring and alerts configured

πŸ“‹ CI/CD:

  • βœ… All stages have dependencies (prevent bad deployments)
  • βœ… Manual approval gates before production
  • βœ… Artifact versioning for rollback capability
  • βœ… Comprehensive logging for debugging

🧠 Part 5: Deep Dive – Concepts I Mastered


Concept 1: Idempotency

Definition: Running an operation multiple times produces the same result.

Example – Not Idempotent:

# If run twice, increments counter twice
count=0
count=$((count + 1))
echo $count  # First run: 1, Second run: 2

Example – Idempotent:

# If run twice, ensures state is "installed"
if ! command -v terraform &> /dev/null; then
    install_terraform
fi
# First run: installs, Second run: skips

Why this matters: Ansible operations are idempotent. Running a playbook twice is safe. You can re-run deployments without fear of double-applying changes.

Concept 2: Infrastructure as Code State Files

Terraform maintains a terraform.tfstate file:

{
  "version": 4,
  "terraform_version": "1.5.0",
  "resources": [
    {
      "type": "aws_instance",
      "name": "frontend",
      "instances": [
        {
          "id": "i-0f5b3a1997b955765",
          "attributes": {
            "instance_type": "t2.micro",
            "public_ip": "13.234.56.78"
          }
        }
      ]
    }
  ]
}

This file is sacred. It maps Terraform code to actual AWS resources. If you lose it, Terraform thinks resources don't exist and creates duplicates.

Production solution: Store state in remote backend:

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "book-review/terraform.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-locks"  # Prevents concurrent applies
    encrypt        = true
  }
}

Concept 3: Artifact-Driven Deployment

Instead of deploying raw code, deploy built artifacts.

Pattern:

Code (source files)
  ↓
Build Stage: Compile to artifact
  ↓
Artifact (binary, Docker image, or static files)
  ↓
Publish Stage: Store artifact in repository
  ↓
Deploy Stage: Download and run artifact (no recompilation)

Benefits:

  • βœ… Compile once, deploy many times
  • βœ… Smaller deployment size
  • βœ… Faster deployments
  • βœ… Reproducible (same artifact = same behavior)

Concept 4: Service Connections – The Bridge Between Systems

Azure DevOps needs to authenticate to external systems. Service Connections are how.

Example: Deploying to an Nginx server requires SSH authentication.

- task: SSH@0
  displayName: 'Deploy to Server'
  inputs:
    sshEndpoint: 'ubuntu-nginx-ssh'  # Service connection name
    runOptions: 'inline'
    inline: |
      # Commands execute on remote server

Behind the scenes:

  1. Azure DevOps stores SSH credentials securely
  2. When task runs, Azure DevOps provides credentials to agent
  3. Agent connects to server using stored credentials
  4. Commands execute on remote server
  5. Credentials are never logged or visible

πŸ“Š Part 6: Real-World Impact – Time, Cost, Quality

⏱️ Time Savings

Before Pipeline (Manual Deployment):

  • Developer finishes code: 1 hour
  • Waits for DevOps engineer: 2-4 hours
  • DevOps SSH into server: 5 minutes
  • Manually copy files: 5 minutes
  • Restart services: 3 minutes
  • Test in browser: 5 minutes
  • Total: 2-4+ hours

With Pipeline (Automated Deployment):

  • Developer finishes code: 1 hour
  • Push to main branch: 30 seconds
  • Pipeline runs automatically: 8 minutes
    • Build: 3 minutes
    • Test: 2 minutes
    • Deploy: 3 minutes
  • Total: 1 hour 9 minutes

Savings: 1-3 hours per deployment

If your team deploys 5 times per day: 5-15 hours saved per day
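
The arithmetic behind that claim, using midpoint numbers from the comparison above:

```shell
# Back-of-envelope: minutes saved per day by the pipeline
manual_minutes=180        # ~3h midpoint of the 2-4+ hour manual path
pipeline_minutes=9        # 30s push + ~8 min automated run, rounded up
deploys_per_day=5

saved_per_deploy=$(( manual_minutes - pipeline_minutes ))
saved_per_day=$(( saved_per_deploy * deploys_per_day ))
echo "${saved_per_day} minutes (~$(( saved_per_day / 60 )) hours) saved per day"
```

At 5 deploys a day that lands around 14 hours saved, comfortably inside the 5-15 hour range above.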

πŸ’° Cost Optimization

AWS Free Tier: 12 months of free usage

Component                     Original Cost    Free Tier Cost
Frontend EC2 (t2.micro)       $10/month        $0
Backend EC2 (t2.micro)        $10/month        $0
Database (MySQL on EC2)       $0 (free tier)   $0
Networking (VPC, IGW, NAT)    ~$5/month        $0
Total                         ~$25/month       $0/month

Post-free tier: ~$25/month for the same infrastructure

Manual approach (no IaC):

  • Senior DevOps engineer time to build: 40 hours @ $150/hour = $6,000
  • Configuration drift, manual fixes, rebuilds: $2,000/year
  • Total cost: $8,000+ one-time, plus ongoing maintenance

Infrastructure as Code approach:

  • Initial Terraform code: 8 hours @ $100/hour = $800
  • Reusable for unlimited deployments: $0 additional
  • Version-controlled, fully reproducible: $0 maintenance
  • Total cost: $800 one-time, plus $25/month infrastructure

ROI: Setup cost drops roughly 90% ($8,000+ down to $800). Time to deploy drops ~90%. And every environment becomes reproducible on demand.

🎯 Quality Improvements

Automated Testing Prevents Bugs:

  • Pipeline catches unit test failures before deployment
  • No broken code reaches production
  • Confidence increases (tests pass = app works)

Consistent Deployments:

  • Same steps every time = same result every time
  • No "it works for me" scenarios
  • Rollback is one button click (re-run previous artifact version)

🚨 Part 7: Critical Debugging Patterns I Learned

Pattern 1: Isolate the Layer

When something breaks:

Is it code? (Test locally: npm test)
  No β†’ Is it build? (Check npm run build output)
    No β†’ Is it deployment? (SSH to server, check files)
      No β†’ Is it permissions? (ls -l, check ownership)
        No β†’ Is it service? (systemctl status)

Each layer can be debugged independently.

Pattern 2: Check Logs Everywhere

Azure DevOps pipeline logs
  ↓
SSH to server, check systemctl status
  ↓
Application logs (/var/log/app.log)
  ↓
Nginx logs (/var/log/nginx/error.log)
  ↓
Database logs (/var/log/mysql/error.log)

The bug is always in the last place you look. So check everywhere systematically.

Pattern 3: Reproduce Locally First

Before running in pipeline:

# Reproduce the failure locally
npm run build  # Does it succeed?
npm test  # Do tests pass?
ssh ubuntu@13.234.56.78  # Can I connect?
curl http://13.234.56.78  # Does the app respond?

If it fails locally, fix it locally. Don't debug via pipeline.


πŸŽ“ Part 8: What Week 9 Taught Me – The Big Picture


πŸ”‘ Key Insight 1: DevOps is About Removing Friction

Infrastructure β†’ Code (Terraform)
Configuration β†’ Code (Ansible)
Deployment β†’ Code (Azure DevOps YAML)

Everything is code. Everything is automated. Everything is reproducible.

πŸ”‘ Key Insight 2: Separation of Concerns Matters

Infrastructure team β‰  Application team
They're independent. They're scalable. They're clear.

This mirrors how large organizations actually work. You're learning enterprise patterns.

πŸ”‘ Key Insight 3: Production is Different From Development

Testing locally β‰  Works in production

Why?

  • Different permissions
  • Different network topology
  • Different resource constraints
  • Different failure modes

Understanding this gap is the difference between a developer and a DevOps engineer.

πŸ”‘ Key Insight 4: Debugging is 80% of the Job

Code works. Architecture makes sense. But something's wrong.

This is real DevOps. Being able to systematically debug, isolate, and fix production issues is worth more than knowing 10 cloud platforms.


🎯 Part 9: Practical Tips For Your Journey

πŸ’‘ Tip 1: Use Meaningful Names

# Bad
resource "aws_instance" "server" {
  instance_type = "t2.micro"
}

# Good
resource "aws_instance" "book_review_frontend" {
  instance_type = "t2.micro"
  tags = {
    Name = "book-review-frontend"
    Role = "web-server"
  }
}

Future you will thank present you.

πŸ’‘ Tip 2: Default to Secure

# Bad: Allow all IPs
ingress {
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]  # Dangerous!
}

# Good: Allow only what's needed
ingress {
  from_port       = 22
  to_port         = 22
  protocol        = "tcp"
  security_groups = [aws_security_group.bastion.id]  # Only bastion
}

πŸ’‘ Tip 3: Version Everything

# README with versions used
- Terraform: 1.5.0
- Ansible: 2.14.1
- Python: 3.11
- Node: 18.x

# .gitignore to exclude state files and secrets
*.tfstate
*.tfstate.*
*.tfvars
!*.tfvars.example

πŸ’‘ Tip 4: Document Assumptions

# In variables.tf
variable "ssh_public_key_path" {
  description = "Path to SSH public key"
  type        = string
  default     = "~/.ssh/book-review-key.pub"
  # NOTE: Must match the key used to launch EC2 instances
}

When someone runs this code next year, they'll know exactly what's expected.


πŸ“ˆ Part 10: Next Steps – The Path Forward

πŸš€ Where This Leads

Week 9 took you from "I know CI/CD exists" to "I can build production infrastructure and automate its deployment."

Next natural steps:

  1. Containerization (Docker):

    • Package applications instead of deploying raw code
    • Consistent environment across dev β†’ prod
  2. Orchestration (Kubernetes):

    • Run containers at scale
    • Automatic scaling, self-healing
  3. Monitoring and Logging:

    • CloudWatch, Prometheus, ELK Stack
    • Know what's happening in production
  4. Security Hardening:

    • Secrets management (HashiCorp Vault, AWS Secrets Manager)
    • Infrastructure security scanning
    • Compliance automation
  5. Advanced Patterns:

    • Blue-green deployments (zero downtime)
    • Canary releases (gradual rollout)
    • GitOps (Git as source of truth for infrastructure)

🎬 Conclusion: Week 9 Reflection

This week was transformational. I went from clicking buttons in consoles to writing code that provisions, configures, and deploys entire systems.

The three projects built a coherent narrative:

  • Project 1: Learn pipeline mechanics
  • Project 2: Apply them to real infrastructure
  • Project 3: Debug like a real DevOps engineer

I've learned that DevOps isn't magic. It's:

  • Clear thinking (what's the desired state?)
  • Systematic debugging (where's the failure?)
  • Continuous improvement (how can we do this better?)

Most importantly, I've learned that automation is a mindset. If you're doing it manually twice, automate it. Your future self will be grateful.


πŸ™ Week 9 Learning Outcomes



πŸ“š Hashtags

#AzureDevOps #CICD #InfrastructureAsCode #Terraform #Ansible #AWS #DevOps #CloudArchitecture #Automation #Pipeline #Production #AzurePipelines #YAML #GitOps #MultiStageDeployment #LearningJourney #TechCommunity


This is week 9 of 12 of a free DevOps cohort. In continuation of 🧩Ansible Roles Unleashed: From Ad-Hoc Automation to Production-Grade Cloud Deployments [Week-8] πŸš€

Your learning doesn't end hereβ€”it accelerates. The patterns you've learned this week are the foundation for everything that comes next. Keep shipping. Keep learning.

