DEV Community

Suvrajeet Banerjee

⚙️ Ansible Roles Unleashed: From Ad-Hoc Automation to Production-Grade Cloud Deployments [Week-8] 🚀

📖 Introduction: The Tale of Two Revolutions

When you first provision a cloud VM, it's like moving into a bare house. The structure is there (thanks, Terraform), but it's empty. Someone still needs to paint the walls, install the plumbing, and wire the electricity. That someone, in the DevOps world, is Ansible.

This post marks Week 8 of our 12-week DevOps Micro Internship Cohort, where I worked through hands-on assignment projects that collectively taught me how to go from clicking buttons in the Azure Portal to orchestrating thousands of infrastructure deployments with a few lines of YAML.

Let me ask myself the critical questions first, then unfold the answers:

❓ What's the real difference between Terraform and Ansible?

❓ Why do we need ad-hoc commands when we have playbooks?

❓ How do Ansible roles turn chaos into reusable components?

❓ What causes 70% of real-world DevOps failures?

By the end of this post, you'll have answers to all of these questions β€”backed by real errors I faced, how I debugged, and what lessons I learned from 'em.


🏛️ PART 1: The Foundation - Infrastructure as Code with Terraform

Section 1.1: From Portal Clicks to Code

Question: Why does clicking buttons in Azure Portal feel like the enemy?

Answer: Because it doesn't scale. Imagine deploying 100 identical servers. Clicking a button 100 times is human; clicking it 1 time to provision 100 servers is DevOps.

🛒 Provision 4 Azure VMs using Terraform.

The naive approach would be writing 4 separate azurerm_linux_virtual_machine resource blocks. The smarter approach? Use the count meta-argument:

resource "azurerm_public_ip" "vms" {
  count               = var.vm_count
  name                = "pip-vm${count.index}-${var.environment}"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_linux_virtual_machine" "vms" {
  count = var.vm_count
  name  = "vm-${var.environment}-${count.index}"
  # ... rest of config
}

This single block, repeated via count, creates N VMs dynamically. Change vm_count from 4 to 3, and Terraform automatically adds/removes resources while maintaining state. That's not just convenience; that's reproducibility at scale.

Section 1.2: The Azure Quota Lesson

The Error I Hit:

Error: "PublicIPCountLimitReached": Cannot create more than 3 public IP addresses 
for this subscription in this region.

What Happened:
I provisioned 4 VMs, each with a public IP. Azure Free Tier allows only 3 public IPs per region. My infrastructure demand exceeded the platform's constraints.

The Debugging Process:

# Step 1: Validate quota
az network list-usages --location centralindia \
  --query "[?localName=='Public IP Addresses']"

# Output showed:
# "name": { "value": "PublicIPAddresses" },
# "limit": 3,
# "currentValue": 3

The Fix:
Edited variables.tf:

variable "vm_count" {
  default = 3  # Changed from 4 to 3
}

Reapplied:

terraform plan
terraform apply

💡 Key Learning: Infrastructure constraints are real. They're not bugs; they're limits. Always read your platform's quotas before coding. Free tiers teach discipline.


🔑 PART 2: Connecting Machines - The SSH Key Saga

Section 2.1: Why Passwordless SSH Matters

Question: Why do we obsess over SSH keys in DevOps?

Answer: Because they're the gateway to everything. If you can SSH to a machine, you can deploy, configure, or destroy it. Passwords are dinosaurs; keys are present-day security.

🛒 Set up passwordless SSH to 4 Azure VMs.

The naive approach: Generate a key, hope it works.

The professional approach: Generate a key, validate the format, ensure the VM accepts it, test the connection.

Section 2.2: The SSH Key Format Error

The Error I Hit:

Error: "admin_ssh_key.0.public_key" is not a complete SSH2 Public Key

Root Cause:
I passed the file path to Terraform instead of the file contents.

# WRONG:
terraform apply -var="ssh_public_key=~/.ssh/id_rsa.pub"

# RIGHT:
terraform apply -var="ssh_public_key=$(cat ~/.ssh/id_rsa.pub)"

The Debugging Process:

# Step 1: Inspect what Terraform received
terraform console
> var.ssh_public_key
=> "~/.ssh/id_rsa.pub"  # <- TILDE NOT EXPANDED!

# Step 2: Generate the actual key content
cat ~/.ssh/id_rsa.pub
# Output: ssh-rsa AAAAB3NzaC1yc2E... user@machine

# Step 3: Use $(...) to inject the content
terraform apply -var="ssh_public_key=$(cat ~/.ssh/id_rsa.pub)"

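💡 Key Learning: Variables are strings until evaluated. Inside double quotes the shell does not expand ~, and Terraform treats the value as a literal string, not as a path to read (in configuration, file(pathexpand("~/.ssh/id_rsa.pub")) would read the key instead). Always inspect your variable values in terraform console.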

Section 2.3: Testing SSH Connectivity

Once deployed, I verified access:

# Get public IP
PUBLIC_IP=$(terraform output -raw public_ip)

# Test SSH (no password!)
ssh azureuser@$PUBLIC_IP hostname

# Output: vm-demo-0

💊 This single command confirms: VM is reachable, SSH key is accepted, and we have command execution. It's the foundation for all automation that follows.


📋 PART 3: Ad-Hoc Automation - The Speed Advantage

Section 3.1: What Are Ad-Hoc Commands?

Question: Why write a playbook when you can run a single command?

Answer: Sometimes you don't. Ad-hoc commands are your rapid-response team for:

  • Quick checks (is nginx running?)
  • Emergency fixes (restart a service)
  • One-time deployments (install a package)

🛒 Ansible ad-hoc commands on 3 VMs:

# Refresh the apt package cache on every host in the web group
ansible web -i inventory.ini -m ansible.builtin.apt \
  -a "update_cache=yes" --become

# Output:
# [OK] vm-demo-0
# [OK] vm-demo-1
# [OK] vm-demo-2

In 5 seconds, I updated package caches on 3 servers. Try doing that manually. I'll wait. 😊

Section 3.2: Building inventory.ini for Dynamic Hosts

The magic link between Terraform and Ansible is inventory.ini:

[web]
52.172.1.10
52.172.1.11
52.172.1.12

[all:vars]
ansible_user=azureuser
ansible_ssh_private_key_file=~/.ssh/id_rsa
ansible_ssh_common_args='-o StrictHostKeyChecking=accept-new'

This file tells Ansible:

  • Which servers to target ([web] group)
  • How to connect (SSH with this user and key)
  • Trust new hosts automatically (first connection safety)

I generated this dynamically:

# Get IPs from Terraform output
PUBLIC_IPS=$(terraform output -json public_ips)

# Build inventory
cat > inventory.ini << EOF
[web]
$(echo "$PUBLIC_IPS" | jq -r '.[]')

[all:vars]
ansible_user=azureuser
ansible_ssh_private_key_file=~/.ssh/id_rsa
EOF

💡 Key Learning: Repeatable automation starts with repeatable data. Inventory files are that data. Generate them programmatically; never hardcode IPs.
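
For comparison, the static inventory from the start of this section can also be written in Ansible's YAML inventory format. This is only a sketch: the file name inventory.yml is an assumption, and the addresses are the same placeholder IPs used above.

```yaml
# inventory.yml (hypothetical file name): YAML form of inventory.ini
all:
  vars:
    ansible_user: azureuser
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
  children:
    web:
      hosts:
        52.172.1.10:
        52.172.1.11:
        52.172.1.12:
```

It is passed the same way as the INI file, for example: ansible web -i inventory.yml -m ansible.builtin.ping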

Section 3.3: Ad-Hoc Commands for System Discovery

I used ad-hoc commands to validate my infrastructure:

# Check uptime
ansible web -i inventory.ini -m ansible.builtin.command -a "uptime"

# Check disk usage
ansible web -i inventory.ini -m ansible.builtin.command -a "df -h"

# Check listening ports
ansible web -i inventory.ini -m ansible.builtin.command -a "netstat -tlnp"

💊 These one-liners became my infrastructure audit. Before writing any playbooks, I knew:

  • All servers are online
  • They have enough disk space
  • No unexpected services are running

🎭 PART 4: Multi-Play Orchestration β€” The Playbook Revolution

Section 4.1: Why Playbooks Beat Ad-Hoc Commands

Question: If ad-hoc is faster, why do playbooks exist?

Answer: Reusability. A playbook written today runs identically tomorrow, on 3 VMs or 3,000 VMs, with zero drift.

🛒 Multi-play playbooks for static web deployment:

---
# Play 1: Install and configure Nginx
- name: "Install Nginx"
  hosts: web
  become: true
  tasks:
    - name: Update package lists
      ansible.builtin.apt:
        update_cache: true

    - name: Install Nginx
      ansible.builtin.apt:
        name: nginx
        state: present

# Play 2: Deploy content
- name: "Deploy Static Content"
  hosts: web
  become: true
  tasks:
    - name: Copy index.html
      ansible.builtin.copy:
        src: files/index.html
        dest: /var/www/html/index.html
        owner: www-data
        group: www-data
        mode: '0644'

# Play 3: Verify
- name: "Verify Deployment"
  hosts: localhost
  tasks:
    - name: Test HTTP connectivity
      ansible.builtin.uri:
        url: "http://{{ item }}"
        status_code: 200
      loop: "{{ groups['web'] }}"

Why Three Plays?

  1. Play 1 runs on all web servers, installs nginx
  2. Play 2 runs on all web servers, deploys content
  3. Play 3 runs on localhost (your machine), verifies each server responds with HTTP 200

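💊 If a host fails in Play 1, Ansible drops it from the rest of the run, so Play 2 never deploys content to a server that has no Nginx. If every host fails, the playbook stops. This fail-fast approach prevents half-deployed states. This is reliability.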
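
By default, a failure removes only the failing host from the rest of the run. If the whole deployment should stop as soon as any server fails, a play can opt in with any_errors_fatal. A minimal sketch of Play 1 with that setting, assuming this stricter behavior is wanted:

```yaml
- name: "Install Nginx"
  hosts: web
  become: true
  any_errors_fatal: true  # one failed host stops the run for every host
  tasks:
    - name: Install Nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true
```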

Section 4.2: The strftime Filter Saga

The Error:

Error while resolving value for 'msg': The filter plugin 'ansible.builtin.strftime' 
failed: Invalid value for epoch value (%Y-%m-%d %H:%M:%S)

What Happened:
I tried to format a file's modification time (mtime) using the strftime filter:

- name: Print file stats
  ansible.builtin.debug:
    msg: "Modified: {{ file_stat.stat.mtime | int | strftime('%Y-%m-%d %H:%M:%S') }}"

The issue: Ansible's strftime filter takes the format string as its input and the epoch value as its argument. My expression had the two reversed, so the format string was parsed as the epoch, which is exactly what the error message reports.

The Fix:
Added defensive filtering:

- name: Print file stats
  ansible.builtin.debug:
    msg: |
      ✓ File deployed:
        Path: {{ file_stat.stat.path | default('unknown') }}
        Modified (epoch): {{ file_stat.stat.mtime | default('unknown') }}

Dropping the strftime call removed the failing expression, and default() keeps the task from failing if a stat field is undefined. The output shows the raw epoch (or "unknown") instead of a formatted date, but the playbook continues.

💡 Key Learning: Production code assumes inputs may be invalid. Always handle edge cases gracefully.
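
If the formatted timestamp is still wanted, the filter can be called with the format string on the left and the epoch as the argument, which is the order Ansible's strftime filter expects. A sketch, assuming the file being checked is the index.html deployed in Play 2 and that its stat result is registered as file_stat:

```yaml
- name: Stat the deployed file
  ansible.builtin.stat:
    path: /var/www/html/index.html  # assumed path, taken from Play 2
  register: file_stat

- name: Print formatted modification time
  ansible.builtin.debug:
    # Format string is the filter input; the epoch (mtime) is the argument
    msg: "Modified: {{ '%Y-%m-%d %H:%M:%S' | strftime(file_stat.stat.mtime) }}"
  when: file_stat.stat.exists
```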


🌐 PART 5: Git-Based Deployment β€” The Dynamic Content Era

Section 5.1: Cloning Repositories During Deployment

Question: How do we deploy applications that live in GitHub?

Answer: We let Ansible clone them directly during provisioning.

🛒 Terraform + Ansible for end-to-end deployment:

# Step 1: Provision VM with Terraform
terraform apply

# Step 2: Get IP and build inventory
PUBLIC_IP=$(terraform output -raw public_ip)

# Step 3: Deploy app with Ansible
ansible-playbook -i inventory.ini site.yml

The Ansible playbook included a git clone task:

- name: Clone Mini Finance Repository
  ansible.builtin.git:
    repo: "https://github.com/suvrajeetbanerjee/mini_finance.git"
    dest: /var/www/html
    version: main
    depth: 1
    force: true

Parameters:

  • repo: GitHub URL (public repo, no auth needed)
  • dest: Where to clone on the target VM
  • version: Branch or tag to checkout
  • depth: 1: Shallow clone (latest commit only, faster)
  • force: true: Discard any local modifications in an existing checkout at dest

Section 5.2: The Permission Boundary Problem

The Error:

fatal: [74.225.240.117]: FAILED! => 
{
  "msg": "Unexpected AnsibleActionFail error: Could not find or access '/tmp/epicbook_clone/' 
  on the Ansible Controller."
}

Root Cause:
I was trying to use the copy module to move files from /tmp/epicbook_clone/ (on the remote VM) to /var/www/html/ (also on the remote VM). But copy by default operates controller → remote; it looked for the source on my local machine.

Initial (Wrong) Solution:

- name: Copy App Content
  ansible.builtin.copy:
    src: "/tmp/epicbook_clone/"
    dest: "/var/www/html/"
    owner: www-data
    mode: '0755'
  become: true

This failed because Ansible couldn't find /tmp/epicbook_clone/ locally.

Correct Solution:

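Copy on the remote machine with the shell module. The command module does not pass its arguments through a shell, so the * wildcard would never be expanded:

- name: Copy App Content
  ansible.builtin.shell: >
    cp -r /tmp/epicbook_clone/* {{ epicbook_app_path }}/
  become: true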

- name: Set Permissions Recursively
  ansible.builtin.file:
    path: "{{ epicbook_app_path }}"
    owner: www-data
    group: www-data
    mode: '0755'
    recurse: true

💡 Key Learning: Understand the execution context. Modules such as copy and template read their source files from the controller by default, while command and shell act only on the remote host. Know the difference, and you save hours of debugging.
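
The copy module can also stay in the picture: its remote_src option tells it to read the source on the managed host instead of the controller. A sketch of the failed task rewritten that way, assuming epicbook_app_path is defined as in group_vars/web.yml and mirroring the owner and mode from the original attempt:

```yaml
- name: Copy app content between two paths on the remote host
  ansible.builtin.copy:
    src: /tmp/epicbook_clone/        # trailing slash: copy the directory's contents
    dest: "{{ epicbook_app_path }}/"
    remote_src: true                 # look for src on the remote VM, not the controller
    owner: www-data
    group: www-data
    mode: '0755'
  become: true
```

This avoids shelling out to cp. Whether nested files end up with the intended permissions is worth confirming with the recursive file task from this section.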


🏭 PART 6: Ansible Roles β€” The Modularization Game-Changer

Section 6.1: What Are Ansible Roles?

Question: How do we scale from 1 playbook to 100 playbooks without chaos?

Answer: We decompose them into reusable, self-contained roles.


A role is a collection of:

  • tasks/: What to do
  • handlers/: React to changes
  • templates/: Configuration files with variables
  • files/: Static content to copy
  • vars/: Role variables with high precedence (hard to override)
  • defaults/: Low-precedence defaults that callers can override

💊 Learning to structure deployment as 3 independent roles:

roles/
├── common/
│   └── tasks/main.yml           # System updates, baseline packages
├── nginx/
│   ├── tasks/main.yml           # Install, configure Nginx
│   └── templates/epicbook.conf.j2 # Nginx site config with Jinja2
└── epicbook/
    └── tasks/main.yml           # Clone repo, deploy app

Section 6.2: Role Syntax and Execution

The site.yml that orchestrates all roles:

---
- name: "Play 1 - Common Role"
  hosts: web
  become: true
  roles:
    - common

- name: "Play 2 - Nginx Role"
  hosts: web
  become: true
  roles:
    - nginx

- name: "Play 3 - EpicBook Role"
  hosts: web
  become: true
  roles:
    - epicbook

Execution Flow:

Play 1: Run all tasks in roles/common/tasks/main.yml on [web] hosts
  ↓
Play 2: Run all tasks in roles/nginx/tasks/main.yml on [web] hosts
  ↓
Play 3: Run all tasks in roles/epicbook/tasks/main.yml on [web] hosts

💊 Each role is isolated, testable, and reusable. If I need the "common" role on a database server, I simply list it under roles: in the play that targets that server's group.

Section 6.3: Inside roles/common/tasks/main.yml

---
- name: Update Package Lists
  ansible.builtin.apt:
    update_cache: true
    cache_valid_time: 3600

- name: Install Essential Packages
  ansible.builtin.apt:
    name:
      - curl
      - wget
      - git
      - python3
    state: present

- name: Set Hostname
  ansible.builtin.hostname:
    name: "epicbook-server"

Why separate this into a role?

  • Every infrastructure deployment needs baseline updates
  • Every new server needs curl, git, Python
  • Other projects can reuse this exact role without modification
  • Changes to common baseline propagate everywhere

Section 6.4: Inside roles/nginx/tasks/main.yml

---
- name: Install Nginx
  ansible.builtin.apt:
    name: nginx
    state: present

- name: Create App Directory
  ansible.builtin.file:
    path: "{{ epicbook_app_path }}"
    state: directory
    owner: "{{ nginx_user }}"
    group: "{{ nginx_group }}"
    mode: '0755'

- name: Copy Nginx Configuration
  ansible.builtin.template:
    src: epicbook.conf.j2
    dest: /etc/nginx/sites-available/epicbook
    owner: root
    group: root
    mode: '0644'

- name: Enable Nginx Site
  ansible.builtin.file:
    src: /etc/nginx/sites-available/epicbook
    dest: /etc/nginx/sites-enabled/epicbook
    state: link

- name: Start And Enable Nginx
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true

Notice the use of variables like {{ epicbook_app_path }} and {{ nginx_user }}. These are defined in group_vars/web.yml:

---
epicbook_app_repo: "https://github.com/pravinmishraaws/theepicbook.git"
epicbook_app_path: /var/www/epicbook
nginx_user: www-data
nginx_group: www-data

Why? DRY (Don't Repeat Yourself). If I need to change the app path, I change it once in group_vars, and all 3 roles automatically use the new value.
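
A role can also carry its own fallback values so it still runs when group_vars is absent. A minimal sketch, assuming a roles/nginx/defaults/main.yml file (not part of the project listing above) that reuses the same variable names:

```yaml
# roles/nginx/defaults/main.yml (hypothetical file)
# Lowest-precedence values; anything in group_vars/web.yml overrides them.
---
epicbook_app_path: /var/www/html
nginx_user: www-data
nginx_group: www-data
```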

Section 6.5: The Template Game - Jinja2 in roles/nginx/templates/epicbook.conf.j2

server {
    listen 80 default_server;
    listen [::]:80 default_server;

    server_name _;
    root {{ epicbook_app_path }};
    index index.html index.htm;

    location / {
        try_files $uri $uri/ =404;
    }

    error_page 404 /404.html;
    error_page 500 502 503 504 /50x.html;
}

💡 The {{ epicbook_app_path }} variable is interpolated at deployment time. So if I deploy to /var/www/app1 on server-A and /var/www/app2 on server-B, the template automatically adapts for each server. That's the power of templates.
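
One way to give each server its own path is a host_vars file per host. A sketch, assuming the inventory names the hosts server-a and server-b (hypothetical names; the inventory in this post uses IP addresses):

```yaml
# host_vars/server-a.yml (hypothetical host name)
---
epicbook_app_path: /var/www/app1

# host_vars/server-b.yml (hypothetical host name)
---
epicbook_app_path: /var/www/app2
```

These are two separate files shown together. With them in place, the same epicbook.conf.j2 renders a different root line on each host.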


🎯 PART 7: Handlers β€” The Change-Reactive Automation

Section 7.1: What Are Handlers?

Question: How do we ensure services reload only when configuration changes?

Answer: Handlers. They're tasks that run only when a task that notifies them reports a change.

Example from roles/nginx/tasks/main.yml, with the handler itself defined in roles/nginx/handlers/main.yml:

# roles/nginx/tasks/main.yml
- name: Copy Nginx Configuration
  ansible.builtin.template:
    src: epicbook.conf.j2
    dest: /etc/nginx/sites-available/epicbook
  notify: Reload Nginx

- name: Enable Nginx Site
  ansible.builtin.file:
    src: /etc/nginx/sites-available/epicbook
    dest: /etc/nginx/sites-enabled/epicbook
    state: link
  notify: Reload Nginx

# roles/nginx/handlers/main.yml
- name: Reload Nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded

Flow:

  1. Copy template task runs → config file changes → task reports changed: true
  2. Task sends notify: Reload Nginx
  3. At end of play, Nginx reload handler runs (once, even if notified multiple times)

Idempotency Example:

# First run: config changed, handler fires
CHANGED [web] → Reload Nginx [web]

# Second run: config unchanged, handler doesn't fire
OK [web] → (no reload)

💡 This is intelligent automation: reload only when needed, not on every run.
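
When a later task in the same play depends on the reload having already happened, the pending handlers can be run early with the meta module. A sketch, assuming the Reload Nginx handler from the example above and that Nginx listens on port 80 as in the site template:

```yaml
# roles/nginx/tasks/main.yml (excerpt, sketch)
- name: Copy Nginx Configuration
  ansible.builtin.template:
    src: epicbook.conf.j2
    dest: /etc/nginx/sites-available/epicbook
    mode: '0644'
  notify: Reload Nginx

# Run any notified handlers here instead of at the end of the play
- name: Flush handlers
  ansible.builtin.meta: flush_handlers

# Assumption: the site listens on port 80
- name: Confirm Nginx is listening after the reload
  ansible.builtin.wait_for:
    port: 80
    timeout: 10
```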


❌ PART 8: Real-World Errors and How I Fixed Them

Error 1: Parser Error from YAML Syntax

The Error:

unexpected parameter type in action: <class 'ansible.module_utils._internal._datatag._AnsibleTaggedList'>

Cause:
I accidentally used list syntax where a string was expected:

# WRONG:
state: [link]  # or state: ['link']

# RIGHT:
state: link

Fix:
Reviewed entire role with ansible-lint, corrected all such instances.

ansible-lint roles/nginx/tasks/main.yml
# Errors reported with line numbers; fixed each one

Error 2: Risky File Permissions

The Error:

risky-file-permissions: File permissions unset or incorrect.

Cause:
File operations lacked explicit mode parameter:

# WRONG:
- name: Copy config
  ansible.builtin.template:
    src: config.j2
    dest: /etc/nginx/sites-available/config

# RIGHT:
- name: Copy config
  ansible.builtin.template:
    src: config.j2
    dest: /etc/nginx/sites-available/config
    mode: '0644'

Fix:
Added explicit mode to every file operation.

Error 3: Missing index.html (The 403 Forbidden)

The Error (Browser):

403 Forbidden
nginx/1.18.0 (Ubuntu)

Cause:
Deployment succeeded, but /var/www/epicbook/index.html didn't exist. Nginx had no content to serve.

Debug:

ssh azureuser@VM_IP
ls -l /var/www/epicbook/
# Empty directory!

Fix:
Verified upstream repo structure, ensured git clone captured all files:

ssh azureuser@VM_IP
find /tmp/epicbook_clone -name index.html
# Found index.html in the repo root, so the clone itself had succeeded

Created placeholder for testing:

echo '<h1>EpicBook Deployed!</h1>' | sudo tee /var/www/epicbook/index.html
sudo chown www-data:www-data /var/www/epicbook/index.html
sudo chmod 644 /var/www/epicbook/index.html
sudo systemctl reload nginx

Browser now showed content ✓
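
A check like this can also live in the playbook, so an empty web root fails the run instead of surfacing later as a 403. A minimal sketch using the stat and assert modules, assuming epicbook_app_path is defined as in group_vars/web.yml:

```yaml
- name: Check that index.html was deployed
  ansible.builtin.stat:
    path: "{{ epicbook_app_path }}/index.html"
  register: index_file

- name: Fail early if the web root has no index.html
  ansible.builtin.assert:
    that:
      - index_file.stat.exists
    fail_msg: "index.html is missing from {{ epicbook_app_path }}"
    success_msg: "index.html is present"
```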


🚀 PART 9: Ansible Galaxy and Extending Your Roles

Section 9.1: What Is Ansible Galaxy?

Question: Do I need to write every role from scratch?

Answer: No. Ansible Galaxy is the package manager for roles, like npm for Node or pip for Python.

Accessing Galaxy:

# Search for nginx role
ansible-galaxy search nginx

# Install a community role
ansible-galaxy install geerlingguy.nginx

# Use it in a playbook
- hosts: web
  roles:
    - geerlingguy.nginx  # Uses defaults, or override with vars
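
Galaxy dependencies can also be listed in a file so teammates install the same roles. A minimal sketch of a requirements.yml (the conventional file name), listing the role installed above:

```yaml
# requirements.yml
---
roles:
  - name: geerlingguy.nginx
```

Everything in the file can then be installed with: ansible-galaxy install -r requirements.yml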

Section 9.2: When to Use Galaxy Roles vs. Custom Roles

Use Galaxy roles when:

  • ✓ Community role exactly matches your needs (nginx, docker, postgresql)
  • ✓ Role is well-maintained (100+ GitHub stars, recent updates)
  • ✓ You want to avoid reinventing the wheel

Write custom roles when:

  • ✓ Your deployment is unique (custom app, specific company standards)
  • ✓ You need tight control over every detail
  • ✓ Team knowledge is in-house code

My Approach:
I wrote custom roles because:

  1. This is a learning assignment, so I needed to understand every line
  2. My requirements are specific (EpicBook, not generic nginx)
  3. Custom roles document my infrastructure decisions for future reference

📚 PART 10: Documentation and References

Section 10.2: Ansible Execution End-to-End Flow Diagram


[Control Node]
      ↓
1. Load playbook (site.yml)
      ↓
2. Parse variables (group_vars/web.yml)
      ↓
3. Resolve inventory (inventory.ini)
      ↓
4. For each play:
      ├─ For each host in group:
      │  ├─ Connect via SSH
      │  ├─ Execute tasks sequentially
      │  ├─ If task notifies → trigger handler
      │  ├─ Collect results
      │  └─ Close connection
      ├─ After all hosts complete:
      │  ├─ Execute handlers (once per notified handler)
      │  └─ Move to next play
      ↓
5. Generate play recap (OK/CHANGED/FAILED)
      ↓
[Fully Deployed, Idempotent Infrastructure]

What This Shows:

  • Plays execute sequentially
  • Hosts within a play execute in parallel (faster!)
  • Handlers run after all tasks, not immediately
  • If a task fails on a host, the remaining tasks and plays are skipped for that host (unless failed_when or ignore_errors changes how the failure is treated)
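
How many hosts run at once is bounded by the forks setting (5 by default), and a play can narrow that further with serial. A sketch of a rolling deployment, assuming the nginx role from Part 6:

```yaml
- name: "Rolling Nginx deployment"
  hosts: web
  become: true
  serial: 1  # finish one host completely before starting the next
  roles:
    - nginx
```

With serial: 1, a bad change reaches a single server first instead of all three at once.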

🎓 PART 11: Advanced Concepts I Learned

Section 11.1: Variable Precedence and Scoping

Question: If I define a variable in multiple places, which one wins?

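Answer: Ansible has a strict precedence order. A simplified view (highest to lowest):

  1. Extra vars (--extra-vars)
  2. Task vars
  3. Block vars
  4. Role vars (roles/<role>/vars/main.yml)
  5. Play vars
  6. Discovered host facts
  7. Host vars
  8. Group vars (a specific group beats all)
  9. Role defaults (roles/<role>/defaults/main.yml)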

Example:

# This wins (highest precedence)
ansible-playbook site.yml --extra-vars "epicbook_app_path=/custom/path"

# Falls back to group_vars/web.yml
epicbook_app_path: /var/www/epicbook

# Falls back to role defaults/
epicbook_app_path: /var/www/html
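
A quick way to see which source won is to print the resolved value. A sketch, assuming a throwaway playbook named show_vars.yml (hypothetical file name):

```yaml
# show_vars.yml (hypothetical file name)
- name: Show the resolved value of epicbook_app_path
  hosts: web
  gather_facts: false
  tasks:
    - name: Print the variable as Ansible resolved it
      ansible.builtin.debug:
        var: epicbook_app_path
```

Running it once as-is and once with --extra-vars "epicbook_app_path=/custom/path" shows the printed value change.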

Section 11.2: Conditionals and Loops

When to conditionally run tasks:

- name: Install Nginx on Ubuntu hosts only
  ansible.builtin.apt:
    name: nginx
    state: present
  when: ansible_distribution == "Ubuntu"

When to loop over lists:

- name: Install multiple packages
  ansible.builtin.apt:
    name: "{{ item }}"
    state: present
  loop:
    - curl
    - wget
    - git

Section 11.3: Error Handling and Recovery

- name: Deploy app
  ansible.builtin.command: /opt/deploy.sh
  register: deploy_result
  ignore_errors: true  # Don't stop if this fails

- name: Notify on failure
  ansible.builtin.debug:
    msg: "Deployment failed: {{ deploy_result.stderr }}"
  when: deploy_result.rc != 0
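
Ansible also offers block, rescue, and always for try/catch-style handling, which avoids marking a real failure as ignored. A sketch using the same /opt/deploy.sh script from the example above; the debug messages are placeholders:

```yaml
- name: Deploy app with explicit recovery
  block:
    - name: Run the deploy script
      ansible.builtin.command: /opt/deploy.sh
      register: deploy_result
      changed_when: true  # a deploy is always treated as a change

  rescue:
    # Runs only if a task in the block failed
    - name: Report the failure
      ansible.builtin.debug:
        msg: "Deployment failed: {{ deploy_result.stderr | default('no stderr captured') }}"

  always:
    # Runs whether the block succeeded or failed
    - name: Record that the deploy step finished
      ansible.builtin.debug:
        msg: "Deploy step finished"
```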

🏁 PART 12: Lessons from the Week

Section 12.1: The 70% Rule

My biggest realization: 70% of DevOps is not writing code; it's debugging permissions, networking, and assumptions.

In this week alone:

  • 30% time: Writing Terraform, Ansible, YAML
  • 70% time: Fixing SSH key formats, file permissions, missing files, YAML syntax errors

🎲 This is the craft. Embrace it.

Section 12.2: IaC is About Control, Not Typing

Terraform and Ansible aren't about saving keystrokes. They're about:

  • Reproducibility: Same code → same result, always
  • Auditability: Git history shows who changed what and when
  • Scalability: 1 server or 1,000; same codebase
  • Safety: Mistakes are caught in terraform plan, not in production

Section 12.3: Ansible Roles Are Career Capital

Understanding roles is a superpower because:

  • Most production Ansible deployments use them
  • They're portable across companies, industries, tech stacks
  • Once you master 1 role (nginx), you can architect 50 (docker, mysql, postgresql, etc.)

📊 PART 13: Visual Diagram of Complete Ansible Architecture


┌─────────────────────────────────────────────────────────┐
│           CONTROL NODE (Your Machine)                   │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │  site.yml (Main Playbook)                         │  │
│  │  - Play 1: common role                            │  │
│  │  - Play 2: nginx role                             │  │
│  │  - Play 3: epicbook role                          │  │
│  └───────────────────────────────────────────────────┘  │
│                      ↓                                  │
│  ┌──────────────────┬────────────────────────────────┐  │
│  │  inventory.ini   │   group_vars/web.yml           │  │
│  │  [web]           │   epicbook_app_path: /var..    │  │
│  │  52.172.1.10     │   nginx_user: www-data         │  │
│  │  52.172.1.11     │   epicbook_app_repo: https..   │  │
│  │  52.172.1.12     │                                │  │
│  └──────────────────┴────────────────────────────────┘  │
│                      ↓                                  │
│  ┌───────────────────────────────────────────────────┐  │
│  │  Ansible Parser                                   │  │
│  │  - Load playbook                                  │  │
│  │  - Resolve variables                              │  │
│  │  - Parse roles                                    │  │
│  └───────────────────────────────────────────────────┘  │
│                      ↓                                  │
└─────────────────────────────────────────────────────────┘
            SSH Connections (Port 22)
                ↙       ↓       ↘
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  REMOTE VM1  │ │  REMOTE VM2  │ │  REMOTE VM3  │
│ 52.172.1.10  │ │ 52.172.1.11  │ │ 52.172.1.12  │
│              │ │              │ │              │
│ Execute:     │ │ Execute:     │ │ Execute:     │
│ ├─common     │ │ ├─common     │ │ ├─common     │
│ ├─nginx      │ │ ├─nginx      │ │ ├─nginx      │
│ └─epicbook   │ │ └─epicbook   │ │ └─epicbook   │
│              │ │              │ │              │
│ Result:      │ │ Result:      │ │ Result:      │
│ ✓ App        │ │ ✓ App        │ │ ✓ App        │
│ ✓ Running    │ │ ✓ Running    │ │ ✓ Running    │
└──────────────┘ └──────────────┘ └──────────────┘

🎬 Conclusion: From Learner to Production Engineer

This week transformed my understanding of automation. I didn't just learn tools; I learned the mindset:

Start simple. Ad-hoc commands teach you what's possible.

Scale progressively. Playbooks teach you repeatability.

Decompose ruthlessly. Roles teach you sustainability.

Automate everything. Terraform + Ansible together teach you true IaC.

The errors I faced (SSH key formats, file permissions, missing files) aren't bugs. They're lessons. Every error message is Ansible telling you exactly what went wrong. Read it, fix it, commit the lesson.


📝 Learning Outcomes

By completing these hands-on assignments, I can now:

✅ Provision cloud infrastructure using Terraform with zero manual clicks

✅ Connect to servers via passwordless SSH with validated key formats

✅ Use Ansible ad-hoc commands for rapid system checks and updates

✅ Orchestrate multi-play deployments with handlers and idempotency

✅ Deploy applications from GitHub repositories automatically

✅ Architect reusable Ansible roles for any infrastructure need

✅ Debug Ansible errors methodically and fix them with confidence

✅ Scale from 1 VM to 1,000 VMs with identical, version-controlled code


🙏 Reflection and Next Steps

This journey is week 8 of 12 of our free DevOps Micro Internship Cohort, organized by Pravin Mishra sir 🙏, and continues from 🔐 Terraform Production Battle-Tested: Remote State, Workspaces & Full-Stack AWS Deployment [Week-7, P2] 🚀.

Week 9 will dive into Azure DevOps. The skills built this week (roles, modular architecture, error handling) are the foundation for enterprise deployments.

To anyone reading this: If you can write a 3-role Ansible deployment, you've climbed the steepest hill. Everything else is refinement.



Thank you for reading! Drop your questions and learnings in the comments. Let's automate the world together! 🚀

🏷️ Tags:

#Ansible #Terraform #DevOps #Azure #IaC #ConfigurationManagement #Automation #LearningJourney #AWS


🐙 GitHub Link

🔗 Ansible-HandsOn-Practical-Demonstration-DMI
