Introduction: Why This Week Changed Everything
Picture this: You're pushing code to GitHub. Minutes later, it's running in production. No manual SSH sessions. No "it works on my machine" moments. No 2 AM deployments gone wrong. That's not fantasy; that's Azure DevOps CI/CD Pipelines in action.
This week, I went from basic pipeline awareness to orchestrating three interconnected projects that took my DevOps journey from theory to battle-tested, production-grade practice. We're talking infrastructure provisioning, configuration management, multi-stage automation, real error debugging, and end-to-end deployment workflows. This isn't "hello world" territory anymore.
Let me walk you through what I discovered, the walls I hit, and how I demolished them.
Understanding the Week 9 Journey: Three Projects, One Cohesive Story
Before diving into the deep end, let's map the landscape. Week 9 consists of three projects that build on each other like a DevOps Jenga tower:
- Project 1: React App CI/CD Pipeline to Nginx (foundational multi-stage automation)
- Project 2: Book Review App on AWS Infrastructure (full-stack IaC + configuration management)
- Project 3: Advanced Troubleshooting & Production Hardening (real-world debugging patterns)
The beauty? They're not isolated. Each project adds a layer. Project 1 teaches you pipeline mechanics. Project 2 teaches you infrastructure-application orchestration. Project 3 teaches you how to survive when things break, which is 80% of DevOps.
Part 1: The Foundation – What is Azure DevOps, Really?
Question I Asked Myself: "Why Do We Need Azure DevOps When GitHub Exists?"
Good question. Here's the honest answer:
GitHub is version control. Azure DevOps is the entire delivery orchestration platform. It's like comparing a calendar to a full project management system.
What Azure DevOps Actually Solves
Imagine deploying code manually:
Developer writes code
  ↓
Developer emails "deploy please"
  ↓
DevOps engineer SSHs into the server
  ↓
DevOps engineer runs scripts
  ↓
DevOps engineer tests manually
  ↓
App is live (hopefully)
Azure DevOps eliminates this:
Developer pushes to the main branch
  ↓
Azure Pipeline triggers automatically
  → Build stage (compile, package)
  → Test stage (run unit tests)
  → Publish stage (create artifact)
  → Deploy stage (push to production)
  ↓
App is live (verified by tests)
No human intervention. No email chains. No 2 AM scrambles. Just velocity + reliability.
Core Components You MUST Understand
- Pipelines: Automated workflows that execute when you push code
- Stages: Logical groupings (Build → Test → Deploy)
- Jobs: Collections of tasks within a stage
- Tasks: Individual actions (compile, test, copy files)
- Artifacts: Build outputs (compiled binaries, Docker images, static files)
- Service Connections: Secure authentication to external systems (AWS, GitHub, SSH servers)
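To make that hierarchy concrete, here is a minimal skeleton of an azure-pipelines.yml. The names and the hosted `ubuntu-latest` image are illustrative, not this week's actual pipeline:

```yaml
trigger:
  branches:
    include:
      - main                      # the pipeline fires on every push to main

stages:
  - stage: Build                  # a stage groups related jobs
    jobs:
      - job: BuildJob             # a job runs on a single agent
        pool:
          vmImage: 'ubuntu-latest'
        steps:                    # steps/tasks are the individual actions
          - script: echo "compile and package here"
          - publish: $(Build.ArtifactStagingDirectory)
            artifact: drop        # the artifact is the build output handed to later stages

  - stage: Deploy
    dependsOn: Build              # Deploy only runs if Build succeeded
    jobs:
      - job: DeployJob
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - download: current     # pull the artifact produced earlier in this run
            artifact: drop
          - script: echo "deploy the artifact here"
```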
Part 2: Project 1 Assignment – The Gateway Drug to CI/CD
The Challenge: Deploy a React App via Multi-Stage Pipeline
The task:
Build a 4-stage Azure DevOps pipeline that:
- Builds a React app (`npm install` + `npm run build`)
- Tests the app (`npm test`)
- Publishes build artifacts
- Deploys to an Nginx server via SSH
Why this matters: This is the canonical CI/CD pattern. Every enterprise uses this exact structure.
The Errors That Made Me Question Everything
Error #1: YAML Syntax – The AND vs. and() Confusion
I wrote this and Azure DevOps exploded:
condition: succeeded('Build') AND succeeded('Test')
Error message:
Unexpected symbol AND. Located at position 20 within expression succeeded('Build') AND succeeded('Test')
What I didn't know: Azure DevOps uses function syntax, not C-style operators.
The fix:
condition: and(succeeded('Build'), succeeded('Test'))
Learning: This one error taught me that DevOps is very particular about syntax. One typo, one wrong operator, and the entire pipeline fails. There's no fuzzy matching.
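A few related condition patterns that became easier once I internalized the function syntax. These are standard Azure DevOps expression functions, though the branch and trigger checks below are illustrative rather than taken from my pipeline:

```yaml
# Both upstream stages succeeded AND we are on main
condition: and(succeeded('Build'), succeeded('Test'), eq(variables['Build.SourceBranch'], 'refs/heads/main'))

# Run a cleanup/notification step even when something failed or was canceled
condition: or(failed(), canceled())

# Skip a stage for pull request builds
condition: ne(variables['Build.Reason'], 'PullRequest')
```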
Error #2: Permission Denied on Nginx Root (/var/www/html)
After the pipeline deployed files, the app returned 403 Forbidden. Why?
Because permissions were broken.
The flow was:
- The ubuntu user copies the files (it has write permission)
- Nginx runs as the www-data user (which doesn't have read permission)
- Result: Nginx can't read the files → 403 error
The three-step permission fix I discovered:
# BEFORE copying files
sudo chown -R ubuntu:ubuntu /var/www/html
sudo chmod 755 /var/www/html
# Copy files via Azure Pipeline
# AFTER copying files
sudo chown -R www-data:www-data /var/www/html
sudo find /var/www/html -type d -exec chmod 755 {} \;   # directories need execute to be traversable
sudo find /var/www/html -type f -exec chmod 644 {} \;   # files only need to be readable
Why this matters: This permission handoff pattern is everywhere in DevOps. Different users, different roles, different permissions. You must think in layers.
The Solution: Multi-Stage YAML Pipeline
Here's the working 4-stage pipeline:
trigger:
branches:
include:
- main
variables:
sshEndpoint: 'ubuntu-nginx-ssh'
artifactName: 'react_build'
webRoot: '/var/www/html'
stages:
# Stage 1: Build
- stage: Build
displayName: 'Build React App'
jobs:
- job: BuildJob
pool:
name: 'SelfHostedPool'  # self-hosted agent pools are referenced by name, not vmImage
steps:
- checkout: self
- task: NodeTool@0
inputs:
versionSpec: '18.x'
- script: npm install
displayName: 'Install Dependencies'
- script: npm run build
displayName: 'Build React App'
- publish: $(Build.SourcesDirectory)/build
artifact: $(artifactName)
# Stage 2: Test
- stage: Test
displayName: 'Test React App'
dependsOn: Build
condition: succeeded('Build')
jobs:
- job: TestJob
pool:
name: 'SelfHostedPool'
steps:
- checkout: self
- task: NodeTool@0
inputs:
versionSpec: '18.x'
- script: npm install
- script: npm test -- --watchAll=false
displayName: 'Run Tests'
# Stage 3: Publish
- stage: Publish
displayName: 'Publish Artifact'
dependsOn: Test
condition: succeeded('Test')
jobs:
- job: PublishJob
pool:
name: 'SelfHostedPool'
steps:
- download: current
artifact: $(artifactName)
- script: echo "Artifact published and ready for deployment"
# Stage 4: Deploy
- stage: Deploy
displayName: 'Deploy to Nginx'
dependsOn:
- Build
- Test
condition: and(succeeded('Build'), succeeded('Test'))
jobs:
- job: DeployJob
pool:
name: 'SelfHostedPool'
steps:
- download: current
artifact: $(artifactName)
# FIX 1: Set permissions for upload
- task: SSH@0
displayName: 'Set Write Permissions'
inputs:
sshEndpoint: $(sshEndpoint)
runOptions: 'inline'
inline: |
sudo chown -R ubuntu:ubuntu $(webRoot)
sudo chmod 755 $(webRoot)
# Copy files
- task: CopyFilesOverSSH@0
displayName: 'Copy React Build'
inputs:
sshEndpoint: $(sshEndpoint)
sourceFolder: '$(Pipeline.Workspace)/$(artifactName)'
contents: '**'
targetFolder: $(webRoot)
overwrite: true
# FIX 2: Set permissions for Nginx
- task: SSH@0
displayName: 'Set Nginx Permissions'
inputs:
sshEndpoint: $(sshEndpoint)
runOptions: 'inline'
inline: |
sudo chown -R www-data:www-data $(webRoot)
sudo find $(webRoot) -type d -exec chmod 755 {} \;
sudo find $(webRoot) -type f -exec chmod 644 {} \;
# Restart Nginx
- task: SSH@0
displayName: 'Restart Nginx'
inputs:
sshEndpoint: $(sshEndpoint)
runOptions: 'inline'
inline: |
sudo systemctl restart nginx
sudo systemctl status nginx
Key insight: Each stage depends on the previous one. If Build fails, Test never runs. This is atomic safety: you can't deploy broken code.
Part 3: Project 2 – Infrastructure as Code + Application Deployment
The Challenge: Full 3-Tier Architecture on AWS
What was tasked:
Deploy a Book Review Application using:
- Terraform to provision infrastructure (VPC, EC2 instances, RDS database)
- Ansible to configure servers (install Nginx, Flask, MySQL)
- Azure DevOps to orchestrate both
This is where things got seriously complex.
Question: "Why Separate Infrastructure and Application?"
In enterprise environments, infrastructure and application teams have different responsibilities:
- Infrastructure Team (mine for this project):
  - Provisions cloud resources (VMs, databases, networking)
  - Manages security groups and firewalls
  - Ensures infrastructure stability
  - Changes infrequently (quarterly updates)
- Application Team:
  - Writes code, runs tests
  - Deploys applications
  - Changes frequently (multiple times per day)
Separating them means:
- ✅ One team doesn't block the other
- ✅ Each team owns its concerns
- ✅ Easier rollback if something breaks
- ✅ Clear accountability
The Infrastructure Challenges
Challenge #1: Database Version Mismatch Hell
I started with RDS (Relational Database Service) thinking it was the "easy" option.
Error:
Error: Creating DB instance: DBParameterGroupNotFound
Cannot find MySQL version 8.0.28 in region ap-south-1
Why: AWS region compatibility is finicky. Not all database engine versions are available in every region.
The pivot: Instead of RDS, I deployed MySQL on an EC2 instance. Same database, no version drama. Plus, it's free-tier compatible.
Lesson learned: Sometimes the "managed" solution (RDS) creates more problems than a simple "unmanaged" solution (MySQL on EC2). Know when to keep it simple.
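For reference, here's a rough sketch of what that pivot looks like in Terraform. It's not my exact code: the AMI, key pair, subnet, and security group names match the Terraform shown later in this post, and the user_data is only a minimal bootstrap (Ansible finishes the real configuration):

```hcl
# MySQL on a plain EC2 instance instead of RDS (sketch)
resource "aws_instance" "database" {
  ami                    = "ami-02b8269d5e85954ef"       # Ubuntu 24.04, same AMI as the app servers
  instance_type          = "t2.micro"                    # free-tier eligible
  subnet_id              = aws_subnet.private_2.id       # keep the database in a private subnet
  vpc_security_group_ids = [aws_security_group.rds.id]   # reuse the port-3306 security group
  key_name               = aws_key_pair.deployer.key_name

  # First-boot bootstrap; configuration management takes over afterwards
  user_data = <<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y mysql-server
    systemctl enable --now mysql
  EOF

  tags = {
    Name = "book-review-database"
  }
}
```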
Challenge #2: Security Group Configuration
Three resources needed to communicate:
- Frontend EC2 (public subnet) β needs HTTP from internet
- Backend API (private subnet) β needs Flask port 5000 from frontend
- Database (private subnet) β needs MySQL port 3306 from backend
If any security group rule was wrong, nothing would talk to anything.
Example issue:
Backend EC2 can't connect to MySQL database
Root cause: RDS security group didn't have an inbound rule allowing port 3306 from backend EC2's security group.
The fix:
resource "aws_security_group_rule" "rds_from_backend" {
type = "ingress"
from_port = 3306
to_port = 3306
protocol = "tcp"
source_security_group_id = aws_security_group.backend.id # Allow from backend SG
security_group_id = aws_security_group.rds.id
}
Key concept: Security groups are stateful firewalls. Think of them as "who can talk to whom".
The Three-Tier Architecture I Built
+------------------------+---------------------------+
|              AWS VPC 10.0.0.0/16                   |
+------------------------+---------------------------+
| Public Subnet          | Private Subnets           |
| 10.0.1.0/24            | 10.0.2.0/24 & 10.0.3.0/24 |
+------------------------+---------------------------+
| Frontend EC2           | Backend EC2               |
| Nginx + PHP            | Flask API                 |
| Public IP              | Private IP only           |
| Port 80 open to        | Port 5000 (internal)      |
| the internet           |                           |
+------------------------+---------------------------+
                                     |
                                     v
                         Database EC2 / RDS
                         MySQL 5.7.44
                         Port 3306 (internal)
Why this layout:
- Frontend must be public (users visit it)
- Backend must be private (only frontend talks to it)
- Database must be private (only backend talks to it)
- Security groups enforce these rules
The Terraform Infrastructure Code
I'll spare you 500 lines, but here's the conceptual structure:
# Define VPC
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
}
# Public subnet for frontend
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
map_public_ip_on_launch = true
}
# Private subnets for backend and database
resource "aws_subnet" "private_1" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.2.0/24"
}
resource "aws_subnet" "private_2" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.3.0/24"
}
# Internet Gateway for public subnet
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
}
# Route public subnet traffic to internet gateway
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
}
# Security groups
resource "aws_security_group" "frontend" {
name_prefix = "frontend-"
vpc_id = aws_vpc.main.id
# Allow HTTP from anywhere
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "backend" {
name_prefix = "backend-"
vpc_id = aws_vpc.main.id
# Allow Flask port from frontend SG only
ingress {
from_port = 5000
to_port = 5000
protocol = "tcp"
security_groups = [aws_security_group.frontend.id]
}
}
resource "aws_security_group" "rds" {
name_prefix = "rds-"
vpc_id = aws_vpc.main.id
# Allow MySQL from backend SG only
ingress {
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.backend.id]
}
}
# EC2 instances
resource "aws_instance" "frontend" {
ami = "ami-02b8269d5e85954ef" # Ubuntu 24.04
instance_type = "t2.micro"
subnet_id = aws_subnet.public.id
vpc_security_group_ids = [aws_security_group.frontend.id]
key_name = aws_key_pair.deployer.key_name
}
resource "aws_instance" "backend" {
ami = "ami-02b8269d5e85954ef" # Ubuntu 24.04
instance_type = "t2.micro"
subnet_id = aws_subnet.private_1.id
vpc_security_group_ids = [aws_security_group.backend.id]
key_name = aws_key_pair.deployer.key_name
}
# RDS Database
resource "aws_db_instance" "mysql" {
identifier = "book-review-mysql"
engine = "mysql"
instance_class = "db.t2.micro"
allocated_storage = 20
username = "admin"
password = var.db_password
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.rds.id]
skip_final_snapshot = true
}
# Outputs for Ansible
output "frontend_public_ip" {
value = aws_instance.frontend.public_ip
}
output "backend_private_ip" {
value = aws_instance.backend.private_ip
}
output "rds_endpoint" {
value = aws_db_instance.mysql.endpoint
}
This is the power of IaC: One file, version-controlled, reproducible infrastructure. Want to tear it all down? terraform destroy. Want to recreate it? terraform apply. No clicking in the AWS console for 30 minutes.
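If you're following along, the day-to-day loop looks roughly like this (all standard Terraform CLI commands):

```bash
terraform init               # download providers and configure the backend
terraform fmt -check         # keep formatting consistent
terraform validate           # catch syntax and reference errors before touching AWS
terraform plan -out=tfplan   # preview exactly what will change
terraform apply tfplan       # apply the reviewed plan
terraform output             # grab IPs and endpoints to feed into Ansible
terraform destroy            # tear everything down when you're done experimenting
```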
Ansible: Configuration Management
Once infrastructure exists, you need to configure it. That's Ansible's job.
Question I asked: "Why not just SSH in and install everything manually?"
Because:
- Manual steps aren't reproducible
- Humans make mistakes
- There's no audit trail
- You can't scale (do this 100 times? Good luck)
Ansible solution:
---
- name: Common setup
hosts: all
become: yes
tasks:
- name: Update apt cache
apt:
update_cache: yes
- name: Install system packages
apt:
name:
- git
- curl
- wget
- python3-pip
state: present
- name: Setup Database
hosts: database
become: yes
tasks:
- name: Install MySQL
apt:
name: mysql-server
state: present
- name: Start MySQL
systemd:
name: mysql
state: started
enabled: yes
- name: Setup Backend API
hosts: backend
become: yes
tasks:
- name: Install Python packages
pip:
name:
- flask
- flask-cors
- pymysql
- gunicorn
- name: Clone backend repo
git:
repo: https://github.com/pravinmishra/book-review-app.git
dest: /app/backend
version: main
- name: Create systemd service for API
copy:
dest: /etc/systemd/system/api.service
content: |
[Unit]
Description=Book Review API
After=network.target
[Service]
Type=simple
User=ubuntu
WorkingDirectory=/app/backend
ExecStart=/usr/bin/python3 -m gunicorn -w 4 -b 0.0.0.0:5000 app:app
Restart=always
[Install]
WantedBy=multi-user.target
- name: Start API service
systemd:
name: api
state: started
enabled: yes
- name: Setup Frontend Web Server
hosts: frontend
become: yes
tasks:
- name: Install Nginx
apt:
name: nginx
state: present
- name: Clone frontend repo
git:
repo: https://github.com/pravinmishra/book-review-app.git
dest: /var/www/html
version: main
- name: Create Nginx config
copy:
dest: /etc/nginx/sites-available/default
content: |
server {
listen 80 default_server;
root /var/www/html;
index index.php index.html;
location ~ \.php$ {
include snippets/fastcgi-php.conf;
fastcgi_pass unix:/var/run/php/php-fpm.sock;
}
}
- name: Restart Nginx
systemd:
name: nginx
state: restarted
enabled: yes
Key concept: Ansible is idempotent. Running the playbook once, twice, or 100 times produces the same result. It checks "is MySQL installed?" before installing.
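An easy way to see that idempotency for yourself, using standard ansible-playbook flags (the inventory and playbook names are assumed from this project):

```bash
# First run: many tasks report "changed"
ansible-playbook -i inventory.ini site.yml

# Second run: everything should report "ok" and nothing "changed"
ansible-playbook -i inventory.ini site.yml

# Dry run: show what WOULD change without touching the servers
ansible-playbook -i inventory.ini site.yml --check --diff
```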
Part 4: Project 3 – Debugging and Production Hardening
The Reality: Things Break
From this project, I realized real DevOps is debugging.
Real Errors I Encountered
Error: Terraform State Corruption
After multiple terraform apply failures, the state file got confused:
Error: resource already exists
Root cause: State file tracked resources that no longer existed in AWS
The fix:
terraform state list # See what Terraform thinks exists
terraform state show module.database.aws_db_instance.mysql # Check one resource
terraform state rm module.database.aws_db_instance.mysql # Remove it from state
terraform refresh # Sync state with actual AWS
terraform apply # Re-create the resource
Lesson: Terraform state is the source of truth. If it gets corrupted, you manually fix it. This is why production uses remote state backends (S3 with locking).
Error: Pipeline Hangs on Long-Running Deployment
MySQL took 4+ minutes to deploy. Azure DevOps timeout configuration became critical.
The fix:
- task: TerraformTaskV4@4
  inputs:
    provider: aws
    command: apply
    commandOptions: -auto-approve
  timeoutInMinutes: 20   # task-level timeout: give long-running creates (MySQL took 4+ minutes) headroom
Error: Permissions Nightmare in Multi-Layer Deployment
When Terraform created resources, Ansible couldn't SSH into them because:
- Terraform created the key pair but didn't output the private key path correctly
- Ansible inventory had the wrong private key file location
The fix: hardcode the key path in the Ansible inventory:
[all]
backend ansible_host=10.0.2.15 ansible_user=ubuntu ansible_private_key_file=~/.ssh/book-review-key.pem
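The hardcoded IP works, but it breaks the next time Terraform recreates the instance. A sketch of the approach I'd use next time: render the inventory from Terraform outputs (the output names match the outputs block earlier; the file names are illustrative):

```bash
# Generate the Ansible inventory from Terraform outputs (sketch)
BACKEND_IP=$(terraform output -raw backend_private_ip)
FRONTEND_IP=$(terraform output -raw frontend_public_ip)

cat > inventory.ini <<EOF
[frontend]
${FRONTEND_IP} ansible_user=ubuntu ansible_private_key_file=~/.ssh/book-review-key.pem

[backend]
${BACKEND_IP} ansible_user=ubuntu ansible_private_key_file=~/.ssh/book-review-key.pem
EOF
```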
Production Hardening Checklist
After this project assignment, I created a hardening checklist:
Security:
- ✅ Security groups follow least privilege (only necessary ports open)
- ✅ Private databases not exposed to the internet
- ✅ SSH keys stored securely in Azure DevOps Secure Files
- ✅ Credentials never in code (use environment variables)

Infrastructure:
- ✅ Terraform state in a remote backend (S3) with locking (DynamoDB)
- ✅ Multiple availability zones for high availability
- ✅ Database backups enabled
- ✅ Monitoring and alerts configured

CI/CD:
- ✅ All stages have dependencies (prevent bad deployments)
- ✅ Manual approval gates before production (sketched below)
- ✅ Artifact versioning for rollback capability
- ✅ Comprehensive logging for debugging
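For the manual approval item, the pattern I'd reach for is a deployment job targeting an Azure DevOps Environment. The approval itself is configured on the environment in the UI, so treat this YAML as a sketch rather than a complete setup:

```yaml
- stage: DeployProd
  dependsOn: Test
  jobs:
    - deployment: DeployProdJob
      environment: 'production'     # approvals and checks attached to this environment gate the run
      pool:
        name: 'SelfHostedPool'
      strategy:
        runOnce:
          deploy:
            steps:
              - download: current
                artifact: react_build
              - script: echo "deploy the approved artifact here"
```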
Part 5: Deep Dive – Concepts I Mastered
Concept 1: Idempotency
Definition: Running an operation multiple times produces the same result.
Example – Not Idempotent:
# Each run adds 1 to a counter persisted on disk
count=$(cat counter.txt 2>/dev/null || echo 0)
count=$((count + 1))
echo "$count" | tee counter.txt   # First run: 1, second run: 2
Example – Idempotent:
# If run twice, ensures state is "installed"
if ! command -v terraform &> /dev/null; then
install_terraform
fi
# First run: installs, Second run: skips
Why this matters: Ansible operations are idempotent. Running a playbook twice is safe. You can re-run deployments without fear of double-applying changes.
Concept 2: Infrastructure as Code State Files
Terraform maintains a terraform.tfstate file:
{
"version": 4,
"terraform_version": "1.5.0",
"resources": [
{
"type": "aws_instance",
"name": "frontend",
"instances": [
{
"id": "i-0f5b3a1997b955765",
"attributes": {
"instance_type": "t2.micro",
"public_ip": "13.234.56.78"
}
}
]
}
]
}
This file is sacred. It maps Terraform code to actual AWS resources. If you lose it, Terraform thinks resources don't exist and creates duplicates.
Production solution: Store state in remote backend:
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "book-review/terraform.tfstate"
region = "ap-south-1"
dynamodb_table = "terraform-locks" # Prevents concurrent applies
encrypt = true
}
}
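The bucket and lock table have to exist before terraform init can use them. A one-time bootstrap with the AWS CLI, assuming the names from the backend block above:

```bash
# One-time bootstrap for remote state in ap-south-1
aws s3api create-bucket \
  --bucket my-terraform-state \
  --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1

aws s3api put-bucket-versioning \
  --bucket my-terraform-state \
  --versioning-configuration Status=Enabled    # keep a history of state files

aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region ap-south-1
```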
Concept 3: Artifact-Driven Deployment
Instead of deploying raw code, deploy built artifacts.
Pattern:
Code (source files)
  ↓
Build Stage: compile to an artifact
  ↓
Artifact (binary, Docker image, or static files)
  ↓
Publish Stage: store the artifact in a repository
  ↓
Deploy Stage: download and run the artifact (no recompilation)
Benefits:
- ✅ Compile once, deploy many times
- ✅ Smaller deployment size
- ✅ Faster deployments
- ✅ Reproducible (same artifact = same behavior)
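One small habit that supports rollback: stamp each published artifact with the run's build ID so any previous version is easy to identify and redeploy. A hedged one-liner (the naming convention is mine, not from the original pipeline):

```yaml
- publish: $(Build.SourcesDirectory)/build
  artifact: 'react_build_$(Build.BuildId)'   # e.g. react_build_1234, unique per pipeline run
```

The download step in the deploy stage then references the same expression, so both stages stay in sync within a run.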
Concept 4: Service Connections – The Bridge Between Systems
Azure DevOps needs to authenticate to external systems. Service Connections are how.
Example: Deploying to an Nginx server requires SSH authentication.
- task: SSH@0
displayName: 'Deploy to Server'
inputs:
sshEndpoint: 'ubuntu-nginx-ssh' # Service connection name
runOptions: 'inline'
inline: |
# Commands execute on remote server
Behind the scenes:
- Azure DevOps stores SSH credentials securely
- When task runs, Azure DevOps provides credentials to agent
- Agent connects to server using stored credentials
- Commands execute on remote server
- Credentials are never logged or visible
Part 6: Real-World Impact – Time, Cost, Quality
Time Savings
Before Pipeline (Manual Deployment):
- Developer finishes code: 1 hour
- Waits for DevOps engineer: 2-4 hours
- DevOps SSH into server: 5 minutes
- Manually copy files: 5 minutes
- Restart services: 3 minutes
- Test in browser: 5 minutes
- Total: 2-4+ hours
With Pipeline (Automated Deployment):
- Developer finishes code: 1 hour
- Push to main branch: 30 seconds
- Pipeline runs automatically: 8 minutes
- Build: 3 minutes
- Test: 2 minutes
- Deploy: 3 minutes
- Total: 1 hour 9 minutes
Savings: 1-3 hours per deployment
If your team deploys 5 times per day: 5-15 hours saved per day
Cost Optimization
AWS Free Tier: 12 months of free usage
| Component | Original Cost | Free Tier Cost |
|---|---|---|
| Frontend EC2 (t2.micro) | $10/month | $0 |
| Backend EC2 (t2.micro) | $10/month | $0 |
| Database (MySQL on EC2) | $0 (free tier) | $0 |
| Networking (VPC, IGW, NAT) | ~$5/month | $0 |
| Total | ~$25/month | $0/month |
Post-free tier: ~$25/month for the same infrastructure
Manual approach (no IaC):
- Senior DevOps engineer time to build: 40 hours @ $150/hour = $6,000
- Configuration drift, manual fixes, rebuilds: $2,000/year
- Total cost: $8,000+ one-time, plus ongoing maintenance
Infrastructure as Code approach:
- Initial Terraform code: 8 hours @ $100/hour = $800
- Reusable for unlimited deployments: $0 additional
- Version-controlled, fully reproducible: $0 maintenance
- Total cost: $800 one-time, plus $25/month infrastructure
ROI: Infrastructure setup cost drops by roughly 90% ($8,000+ down to $800). Time to deploy drops by roughly 90%. And every environment becomes exactly reproducible.
Quality Improvements
Automated Testing Prevents Bugs:
- Pipeline catches unit test failures before deployment
- No broken code reaches production
- Confidence increases (tests pass = app works)
Consistent Deployments:
- Same steps every time = same result every time
- No "it works for me" scenarios
- Rollback is one button click (re-run previous artifact version)
Part 7: Critical Debugging Patterns I Learned
Pattern 1: Isolate the Layer
When something breaks:
Is it the code? (Test locally: npm test)
  No → Is it the build? (Check npm run build output)
  No → Is it the deployment? (SSH to the server, check the files)
  No → Is it permissions? (ls -l, check ownership)
  No → Is it the service? (systemctl status)
Each layer can be debugged independently.
Pattern 2: Check Logs Everywhere
Azure DevOps pipeline logs
  ↓
SSH to the server, check systemctl status
  ↓
Application logs (/var/log/app.log)
  ↓
Nginx logs (/var/log/nginx/error.log)
  ↓
Database logs (/var/log/mysql/error.log)
The bug is always in the last place you look. So check everywhere systematically.
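The concrete commands I keep handy for each layer (paths and unit names follow the Ubuntu/Nginx/systemd setup in this project; yours may differ):

```bash
# Service layer: is the unit running, and what did it log?
sudo systemctl status nginx api
sudo journalctl -u api -n 100          # last 100 lines from the Flask/Gunicorn service

# Web server layer
sudo tail -f /var/log/nginx/error.log
sudo nginx -t                          # validate the config before restarting

# Database layer (default MySQL log location on Ubuntu)
sudo tail -f /var/log/mysql/error.log
```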
Pattern 3: Reproduce Locally First
Before running in pipeline:
# Reproduce the failure locally
npm run build # Does it succeed?
npm test # Do tests pass?
ssh ubuntu@13.234.56.78 # Can I connect?
curl http://13.234.56.78 # Does the app respond?
If it fails locally, fix it locally. Don't debug via pipeline.
Part 8: What Week 9 Taught Me – The Big Picture
Key Insight 1: DevOps is About Removing Friction
Infrastructure → Code (Terraform)
Configuration → Code (Ansible)
Deployment → Code (Azure DevOps YAML)
Everything is code. Everything is automated. Everything is reproducible.
Key Insight 2: Separation of Concerns Matters
Infrastructure team ≠ Application team
They're independent. They're scalable. Their responsibilities are clear.
This mirrors how large organizations actually work. You're learning enterprise patterns.
Key Insight 3: Production is Different From Development
Testing locally ≠ working in production
Why?
- Different permissions
- Different network topology
- Different resource constraints
- Different failure modes
Understanding this gap is the difference between a developer and a DevOps engineer.
Key Insight 4: Debugging is 80% of the Job
Code works. Architecture makes sense. But something's wrong.
This is real DevOps. Being able to systematically debug, isolate, and fix production issues is worth more than knowing 10 cloud platforms.
Part 9: Practical Tips For Your Journey
Tip 1: Use Meaningful Names
# Bad
resource "aws_instance" "server" {
instance_type = "t2.micro"
}
# Good
resource "aws_instance" "book_review_frontend" {
instance_type = "t2.micro"
tags = {
Name = "book-review-frontend"
Role = "web-server"
}
}
Future you will thank present you.
Tip 2: Default to Secure
# Bad: Allow all IPs
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # Dangerous!
}
# Good: Allow only what's needed
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
security_groups = [aws_security_group.bastion.id] # Only bastion
}
Tip 3: Version Everything
# README with versions used
- Terraform: 1.5.0
- Ansible: 2.14.1
- Python: 3.11
- Node: 18.x
# .gitignore to exclude state files
*.tfstate
*.tfstate.*
*.tfvars
!*.tfvars.example
Tip 4: Document Assumptions
# In variables.tf
variable "ssh_public_key_path" {
description = "Path to SSH public key"
type = string
default = "~/.ssh/book-review-key.pub"
# NOTE: Must match the key used to launch EC2 instances
}
When someone runs this code next year, they'll know exactly what's expected.
Part 10: Next Steps – The Path Forward
Where This Leads
Week 9 took you from "I know CI/CD exists" to "I can build production infrastructure and automate its deployment."
Next natural steps:
- Containerization (Docker):
  - Package applications instead of deploying raw code
  - Consistent environments across dev → prod
- Orchestration (Kubernetes):
  - Run containers at scale
  - Automatic scaling, self-healing
- Monitoring and Logging:
  - CloudWatch, Prometheus, ELK Stack
  - Know what's happening in production
- Security Hardening:
  - Secrets management (HashiCorp Vault, AWS Secrets Manager)
  - Infrastructure security scanning
  - Compliance automation
- Advanced Patterns:
  - Blue-green deployments (zero downtime)
  - Canary releases (gradual rollout)
  - GitOps (Git as the source of truth for infrastructure)
Conclusion: Week 9 Reflection
This week was transformational. I went from clicking buttons in consoles to writing code that provisions, configures, and deploys entire systems.
The three projects built a coherent narrative:
- Project 1: Learn pipeline mechanics
- Project 2: Apply them to real infrastructure
- Project 3: Debug like a real DevOps engineer
I've learned that DevOps isn't magic. It's:
- Clear thinking (what's the desired state?)
- Systematic debugging (where's the failure?)
- Continuous improvement (how can we do this better?)
Most importantly, I've learned that automation is a mindset. If you're doing it manually twice, automate it. Your future self will be grateful.
Week 9 Learning Outcomes
Hashtags
#AzureDevOps #CICD #InfrastructureAsCode #Terraform #Ansible #AWS #DevOps #CloudArchitecture #Automation #Pipeline #Production #AzurePipelines #YAML #GitOps #MultiStageDeployment #LearningJourney #TechCommunity
This is week 9 of 12 of a free DevOps cohort, in continuation of Ansible Roles Unleashed: From Ad-Hoc Automation to Production-Grade Cloud Deployments [Week-8].
Your learning doesn't end here; it accelerates. The patterns you've learned this week are the foundation for everything that comes next. Keep shipping. Keep learning.