Suvrajeet Banerjee

Posted on Nov 8

⚙️ Ansible Roles Unleashed: From Ad-Hoc Automation to Production-Grade Cloud Deployments [Week-8] 🚀

#ansible #automation #devops #aws

📖 Introduction: The Tale of Two Revolutions

When you first provision a cloud VM, it's like moving into a bare house. The structure is there (thanks, Terraform), but it's empty. Someone still needs to paint the walls, install the plumbing, and wire the electricity. That someone, in the DevOps world, is Ansible.

This post marks Week 8 of our 12-week DevOps Micro Internship Cohort. This week's hands-on assignments took me from clicking buttons in the Azure Portal to orchestrating repeatable infrastructure deployments with a few lines of YAML.

Let me ask myself the critical questions first, then unfold the answers:

❓ What's the real difference between Terraform and Ansible?

❓ Why do we need ad-hoc commands when we have playbooks?

❓ How do Ansible roles turn chaos into reusable components?

❓ Where does 70% of real-world DevOps effort actually go?

By the end of this post, you'll have answers to all of these questions, backed by the real errors I hit, how I debugged them, and what I learned from each.

🏛️ PART 1: The Foundation — Infrastructure as Code with Terraform

Section 1.1: From Portal Clicks to Code

Question: Why does clicking buttons in Azure Portal feel like the enemy?

Answer: Because it doesn't scale. Imagine deploying 100 identical servers. Clicking a button 100 times is human; clicking it 1 time to provision 100 servers is DevOps.

🛢 Provision 4 Azure VMs using Terraform.

The naive approach would be writing 4 separate azurerm_linux_virtual_machine resource blocks. The smarter approach? Use the count parameter:

resource "azurerm_public_ip" "vms" {
  count               = var.vm_count
  name                = "pip-vm${count.index}-${var.environment}"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_linux_virtual_machine" "vms" {
  count = var.vm_count
  name  = "vm-${var.environment}-${count.index}"
  # ... rest of config
}

This single block, repeated via count, creates N VMs dynamically. Change vm_count from 4 to 3, and Terraform automatically adds/removes resources while maintaining state. That's not just convenience—that's reproducibility at scale.

Section 1.2: The Azure Quota Lesson

The Error I Hit:

Error: "PublicIPCountLimitReached": Cannot create more than 3 public IP addresses 
for this subscription in this region.

What Happened:
I provisioned 4 VMs, each with a public IP. Azure Free Tier allows only 3 public IPs per region. My infrastructure demand exceeded the platform's constraints.

The Debugging Process:

# Step 1: Validate quota
az network list-usages --location centralindia \
  --query "[?localName=='Public IP Addresses']"

# Output showed:
# "name": { "value": "PublicIPAddresses" },
# "limit": 3,
# "currentValue": 3

The Fix:
Edited variables.tf:

variable "vm_count" {
  default = 3  # Changed from 4 to 3
}

Reapplied:

terraform plan
terraform apply

💡 Key Learning: Infrastructure constraints are real. They're not bugs; they're limits. Always read your platform's quotas before coding. Free tiers teach discipline.

🔑 PART 2: Connecting Machines — The SSH Key Saga

Section 2.1: Why Passwordless SSH Matters

Question: Why do we obsess over SSH keys in DevOps?

Answer: Because they're the gateway to everything. If you can SSH to a machine, you can deploy, configure, or destroy it. Passwords are dinosaurs; keys are present-day security.

🛢 Set up passwordless SSH to 4 Azure VMs.

The naive approach: Generate a key, hope it works.

The professional approach: Generate a key, validate the format, ensure the VM accepts it, test the connection.

Section 2.2: The SSH Key Format Error

The Error I Hit:

Error: "admin_ssh_key.0.public_key" is not a complete SSH2 Public Key

Root Cause:
I passed the file path to Terraform instead of the file contents.

# WRONG:
terraform apply -var="ssh_public_key=~/.ssh/id_rsa.pub"

# RIGHT:
terraform apply -var="ssh_public_key=$(cat ~/.ssh/id_rsa.pub)"

The Debugging Process:

# Step 1: Inspect what Terraform received
terraform console
> var.ssh_public_key
=> "~/.ssh/id_rsa.pub"  # <- TILDE NOT EXPANDED!

# Step 2: Generate the actual key content
cat ~/.ssh/id_rsa.pub
# Output: ssh-rsa AAAAB3NzaC1yc2E... user@machine

# Step 3: Use $(...) to inject the content
terraform apply -var="ssh_public_key=$(cat ~/.ssh/id_rsa.pub)"

💡 Key Learning: A variable holds exactly the string you pass in. The shell does not expand ~ inside double quotes, and Terraform treats the value as literal text rather than a path to read. Always inspect your variable values in terraform console.

Section 2.3: Testing SSH Connectivity

Once deployed, I verified access:

# Get public IP
PUBLIC_IP=$(terraform output -raw public_ip)

# Test SSH (no password!)
ssh azureuser@$PUBLIC_IP hostname

# Output: vm-demo-0

💊 This single command confirms: VM is reachable, SSH key is accepted, and we have command execution. It's the foundation for all automation that follows.

📋 PART 3: Ad-Hoc Automation — The Speed Advantage

Section 3.1: What Are Ad-Hoc Commands?

Question: Why write a playbook when you can run a single command?

Answer: Sometimes you don't. Ad-hoc commands are your rapid-response team for:

Quick checks (is nginx running?)
Emergency fixes (restart a service)
One-time deployments (install a package)

🛢 Ansible ad-hoc commands on 3 VMs:

# Refresh the apt package cache on every host (does not upgrade packages)
ansible web -i inventory.ini -m ansible.builtin.apt \
  -a "update_cache=yes" --become

# Output:
# [OK] vm-demo-0
# [OK] vm-demo-1
# [OK] vm-demo-2

In 5 seconds, I updated package caches on 3 servers. Try doing that manually. I'll wait. 😊

Section 3.2: Building inventory.ini for Dynamic Hosts

The magic link between Terraform and Ansible is inventory.ini:

[web]
52.172.1.10
52.172.1.11
52.172.1.12

[all:vars]
ansible_user=azureuser
ansible_ssh_private_key_file=~/.ssh/id_rsa
ansible_ssh_common_args='-o StrictHostKeyChecking=accept-new'

This file tells Ansible:

Which servers to target ([web] group)
How to connect (SSH with this user and key)
Accept a host's key on first connection, but still refuse a key that has changed

I generated this dynamically:

# Get IPs from Terraform output
PUBLIC_IPS=$(terraform output -json public_ips)

# Build inventory
cat > inventory.ini << EOF
[web]
$(echo "$PUBLIC_IPS" | jq -r '.[]')

[all:vars]
ansible_user=azureuser
ansible_ssh_private_key_file=~/.ssh/id_rsa
EOF

💡 Key Learning: Repeatable automation starts with repeatable data. Inventory files are that data. Generate them programmatically; never hardcode IPs.

Section 3.3: Ad-Hoc Commands for System Discovery

I used ad-hoc commands to validate my infrastructure:

# Check uptime
ansible web -i inventory.ini -m ansible.builtin.command -a "uptime"

# Check disk usage
ansible web -i inventory.ini -m ansible.builtin.command -a "df -h"

# Check listening ports
ansible web -i inventory.ini -m ansible.builtin.command -a "netstat -tlnp"

💊 These one-liners became my infrastructure audit. Before writing any playbooks, I knew:

All servers are online
They have enough disk space
No unexpected services are running

🎭 PART 4: Multi-Play Orchestration — The Playbook Revolution

Section 4.1: Why Playbooks Beat Ad-Hoc Commands

Question: If ad-hoc is faster, why do playbooks exist?

Answer: Reusability. A playbook written today runs identically tomorrow, on 3 VMs or 3,000 VMs, with zero drift.

🛢 Multi-play playbooks for static web deployment:

---
# Play 1: Install and configure Nginx
- name: "Install Nginx"
  hosts: web
  become: true
  tasks:
    - name: Update package lists
      ansible.builtin.apt:
        update_cache: true

    - name: Install Nginx
      ansible.builtin.apt:
        name: nginx
        state: present

# Play 2: Deploy content
- name: "Deploy Static Content"
  hosts: web
  become: true
  tasks:
    - name: Copy index.html
      ansible.builtin.copy:
        src: files/index.html
        dest: /var/www/html/index.html
        owner: www-data
        group: www-data
        mode: '0644'

# Play 3: Verify
- name: "Verify Deployment"
  hosts: localhost
  tasks:
    - name: Test HTTP connectivity
      ansible.builtin.uri:
        url: "http://{{ item }}"
        status_code: 200
      loop: "{{ groups['web'] }}"

Why Three Plays?

Play 1 runs on all web servers, installs nginx
Play 2 runs on all web servers, deploys content
Play 3 runs on localhost (your machine), verifies each server responds with HTTP 200

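💊 If a task fails on a host in Play 1, Ansible drops that host from the rest of the run, so Play 2 never touches it, and if every host fails, the run stops there. This fail-fast approach prevents half-deployed states. This is reliability.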
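
By default a failure only removes the failing host, so the other hosts carry on. If the goal is to stop the entire run as soon as any single host fails, a play-level any_errors_fatal setting is one option. A minimal sketch, reusing the play name and group from the playbook above (the combined apt task is my own simplification, not the original assignment code):

```yaml
---
# Sketch only: abort the whole run as soon as any host fails in this play.
- name: "Install Nginx"
  hosts: web
  become: true
  any_errors_fatal: true   # one failed host ends the play for every host
  tasks:
    - name: Install Nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true
```

With this set, later plays are not attempted on any host once a failure occurs, which is stricter than the default per-host behaviour.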

Section 4.2: The strftime Filter Saga

The Error:

Error while resolving value for 'msg': The filter plugin 'ansible.builtin.strftime' 
failed: Invalid value for epoch value (%Y-%m-%d %H:%M:%S)

What Happened:
I tried to format a file's modification time (mtime) using the strftime filter:

- name: Print file stats
  ansible.builtin.debug:
    msg: "Modified: {{ file_stat.stat.mtime | int | strftime('%Y-%m-%d %H:%M:%S') }}"

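The issue: the arguments were the wrong way round. Ansible's strftime filter takes the format string as its input and the epoch as its argument, so my expression handed the format string to strftime as the epoch value, which is exactly what the error message quotes back.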

The Fix:
Added defensive filtering:

- name: Print file stats
  ansible.builtin.debug:
    msg: |
      ✓ File deployed:
        Path: {{ file_stat.stat.path | default('unknown') }}
        Modified (epoch): {{ file_stat.stat.mtime | default('unknown') }}

Dropping the strftime call removed the failing conversion, and default() keeps the task from failing if path or mtime is undefined. The timestamp now prints as a raw epoch, or "unknown" if the value is missing, and the playbook continues.

💡 Key Learning: Production code assumes inputs may be invalid. Always handle edge cases gracefully.
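
For a human-readable timestamp, the strftime filter expects the format string on the left of the pipe and the epoch value as its argument. A minimal sketch, assuming the stat result is registered as file_stat and that the file path below matches what was deployed (the path is my assumption):

```yaml
- name: Stat the deployed file
  ansible.builtin.stat:
    path: /var/www/html/index.html   # assumed path, adjust to your deployment
  register: file_stat

- name: Print file stats with a readable timestamp
  ansible.builtin.debug:
    # Format string is the filter input; the epoch (mtime) is the argument.
    msg: "Modified: {{ '%Y-%m-%d %H:%M:%S' | strftime(file_stat.stat.mtime) }}"
  when: file_stat.stat.exists
```

The when guard skips the message when the file is absent, so mtime is never read from an empty result.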

🌐 PART 5: Git-Based Deployment — The Dynamic Content Era

Section 5.1: Cloning Repositories During Deployment

Question: How do we deploy applications that live in GitHub?

Answer: We let Ansible clone them directly during provisioning.

🛢 Terraform + Ansible for end-to-end deployment:

# Step 1: Provision VM with Terraform
terraform apply

# Step 2: Get IP and build inventory
PUBLIC_IP=$(terraform output -raw public_ip)

# Step 3: Deploy app with Ansible
ansible-playbook -i inventory.ini site.yml

The Ansible playbook included a git clone task:

- name: Clone Mini Finance Repository
  ansible.builtin.git:
    repo: "https://github.com/suvrajeetbanerjee/mini_finance.git"
    dest: /var/www/html
    version: main
    depth: 1
    force: true

Parameters:

repo: GitHub URL (public repo, no auth needed)
dest: Where to clone on the target VM
version: Branch or tag to checkout
depth: 1: Shallow clone (latest commit only, faster)
force: true: Discard local modifications in an existing checkout at dest

Section 5.2: The Permission Boundary Problem

The Error:

fatal: [74.225.240.117]: FAILED! => 
{
  "msg": "Unexpected AnsibleActionFail error: Could not find or access '/tmp/epicbook_clone/' 
  on the Ansible Controller."
}

Root Cause:
I was trying to use the copy module to move files from /tmp/epicbook_clone/ (on the remote VM) to /var/www/html/ (also on the remote VM). But copy by default operates controller → remote; it looked for the source on my local machine.

Initial (Wrong) Solution:

- name: Copy App Content
  ansible.builtin.copy:
    src: "/tmp/epicbook_clone/"
    dest: "/var/www/html/"
    owner: www-data
    mode: '0755'
  become: true

This failed because Ansible couldn't find /tmp/epicbook_clone/ locally.

Correct Solution:

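Use the command module to copy on the remote machine:

- name: Copy App Content
  # command does not run through a shell, so a "*" glob would not expand;
  # copying "source/." brings the directory's contents across instead.
  ansible.builtin.command: >
    cp -r /tmp/epicbook_clone/. {{ epicbook_app_path }}/
  become: true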

- name: Set Permissions Recursively
  ansible.builtin.file:
    path: "{{ epicbook_app_path }}"
    owner: www-data
    group: www-data
    mode: '0755'
    recurse: true

💡 Key Learning: Understand the execution context. Some modules, such as copy, read their source from the controller by default; others, such as command and file, act entirely on the remote host. Know the difference, and you save hours of debugging.
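
The copy module can also be told that its source already lives on the remote host, which avoids shelling out to cp. A minimal sketch, assuming the same clone directory and the epicbook_app_path variable used elsewhere in this post; this is an alternative I am suggesting, not what the assignment used:

```yaml
- name: Copy App Content (source is already on the remote VM)
  ansible.builtin.copy:
    src: /tmp/epicbook_clone/        # trailing slash: copy the contents, not the directory itself
    dest: "{{ epicbook_app_path }}/"
    remote_src: true                 # look for src on the managed host, not the controller
    owner: www-data
    group: www-data
    mode: '0755'
  become: true
```

Unlike a raw cp, this task reports changed only when files actually differ, so repeat runs stay quiet.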

🏭 PART 6: Ansible Roles — The Modularization Game-Changer

Section 6.1: What Are Ansible Roles?

Question: How do we scale from 1 playbook to 100 playbooks without chaos?

Answer: We decompose them into reusable, self-contained roles.

A role is a collection of:

tasks/ — What to do
handlers/ — React to changes
templates/ — Configuration files with variables
files/ — Static content to copy
vars/ — Role variables (high precedence, not meant to be overridden)
defaults/ — Overridable defaults

💊 Learning to structure deployment as 3 independent roles:

roles/
├── common/
│   └── tasks/main.yml           # System updates, baseline packages
├── nginx/
│   ├── tasks/main.yml           # Install, configure Nginx
│   └── templates/epicbook.conf.j2 # Nginx site config with Jinja2
└── epicbook/
    └── tasks/main.yml           # Clone repo, deploy app

Section 6.2: Role Syntax and Execution

The site.yml that orchestrates all roles:

---
- name: "Play 1 - Common Role"
  hosts: web
  become: true
  roles:
    - common

- name: "Play 2 - Nginx Role"
  hosts: web
  become: true
  roles:
    - nginx

- name: "Play 3 - EpicBook Role"
  hosts: web
  become: true
  roles:
    - epicbook

Execution Flow:

Play 1: Run all tasks in roles/common/tasks/main.yml on [web] hosts
  ↓
Play 2: Run all tasks in roles/nginx/tasks/main.yml on [web] hosts
  ↓
Play 3: Run all tasks in roles/epicbook/tasks/main.yml on [web] hosts

💊 Each role is isolated, testable, and reusable. If I need the "common" role on a database server, I simply list it under roles: in a play that targets that server's group.
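
Reusing a role elsewhere comes down to one more play. A minimal sketch, assuming a hypothetical db group exists in the inventory (this post's inventory only defines web):

```yaml
---
- name: "Baseline for database servers"
  hosts: db          # hypothetical inventory group, not defined in this post
  become: true
  roles:
    - common         # same role, unchanged, applied to a different group
```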

Section 6.3: Inside roles/common/tasks/main.yml

---
- name: Update Package Lists
  ansible.builtin.apt:
    update_cache: true
    cache_valid_time: 3600

- name: Install Essential Packages
  ansible.builtin.apt:
    name:
      - curl
      - wget
      - git
      - python3
    state: present

- name: Set Hostname
  ansible.builtin.hostname:
    name: "epicbook-server"

Why separate this into a role?

Every infrastructure deployment needs baseline updates
Every new server needs curl, git, Python
Other projects can reuse this exact role without modification
Changes to common baseline propagate everywhere

Section 6.4: Inside roles/nginx/tasks/main.yml

---
- name: Install Nginx
  ansible.builtin.apt:
    name: nginx
    state: present

- name: Create App Directory
  ansible.builtin.file:
    path: "{{ epicbook_app_path }}"
    state: directory
    owner: "{{ nginx_user }}"
    group: "{{ nginx_group }}"
    mode: '0755'

- name: Copy Nginx Configuration
  ansible.builtin.template:
    src: epicbook.conf.j2
    dest: /etc/nginx/sites-available/epicbook
    owner: root
    group: root
    mode: '0644'

- name: Enable Nginx Site
  ansible.builtin.file:
    src: /etc/nginx/sites-available/epicbook
    dest: /etc/nginx/sites-enabled/epicbook
    state: link

- name: Start And Enable Nginx
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true

Notice the use of variables like {{ epicbook_app_path }} and {{ nginx_user }}. These are defined in group_vars/web.yml:

---
epicbook_app_repo: "https://github.com/pravinmishraaws/theepicbook.git"
epicbook_app_path: /var/www/epicbook
nginx_user: www-data
nginx_group: www-data

Why? DRY (Don't Repeat Yourself). If I need to change the app path, I change it once in group_vars, and all 3 roles automatically use the new value.

Section 6.5: The Template Game — Jinja2 in roles/nginx/templates/epicbook.conf.j2

server {
    listen 80 default_server;
    listen [::]:80 default_server;

    server_name _;
    root {{ epicbook_app_path }};
    index index.html index.htm;

    location / {
        try_files $uri $uri/ =404;
    }

    error_page 404 /404.html;
    error_page 500 502 503 504 /50x.html;
}

💡 The {{ epicbook_app_path }} variable is interpolated at deployment time. So if I deploy to /var/www/app1 on server-A and /var/www/app2 on server-B, the template automatically adapts for each server. That's the power of templates.
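
Per-server values like these would typically come from host_vars, which take precedence over group_vars for the matching host. A minimal sketch with two hypothetical files, named after inventory hosts from this post and using the example paths from the paragraph above:

```yaml
---
# host_vars/52.172.1.10.yml (hypothetical file for server-A)
epicbook_app_path: /var/www/app1
---
# host_vars/52.172.1.11.yml (hypothetical file for server-B)
epicbook_app_path: /var/www/app2
```

Any host without its own file keeps the value from group_vars/web.yml.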

🎯 PART 7: Handlers — The Change-Reactive Automation

Section 7.1: What Are Handlers?

Question: How do we ensure services reload only when configuration changes?

Answer: Handlers. They're tasks that run only if a prior task reports a change.

Example, split across the nginx role's tasks and handlers files:

# roles/nginx/tasks/main.yml
- name: Copy Nginx Configuration
  ansible.builtin.template:
    src: epicbook.conf.j2
    dest: /etc/nginx/sites-available/epicbook
  notify: Reload Nginx

- name: Enable Nginx Site
  ansible.builtin.file:
    src: /etc/nginx/sites-available/epicbook
    dest: /etc/nginx/sites-enabled/epicbook
    state: link
  notify: Reload Nginx

# roles/nginx/handlers/main.yml
- name: Reload Nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded

Flow:

Copy template task runs → config file changes → task reports changed: true
Task sends notify: Reload Nginx
At end of play, Nginx reload handler runs (once, even if notified multiple times)

Idempotency Example:

# First run: config changed, handler fires
CHANGED [web] → Reload Nginx [web]

# Second run: config unchanged, handler doesn't fire
OK [web] → (no reload)

💡 This is intelligent automation—reload only when needed, not on every run.
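
Because handlers wait until the end of the play, a later task in the same play that depends on the reloaded config can run too early. One way around that is to flush pending handlers explicitly. A minimal sketch, assuming a Reload Nginx handler is defined for the role; the local HTTP check is my own illustration:

```yaml
- name: Copy Nginx Configuration
  ansible.builtin.template:
    src: epicbook.conf.j2
    dest: /etc/nginx/sites-available/epicbook
    mode: '0644'
  notify: Reload Nginx

- name: Run any notified handlers now instead of at the end of the play
  ansible.builtin.meta: flush_handlers

- name: Check that Nginx answers locally after the reload
  ansible.builtin.uri:
    url: http://localhost      # illustrative check, runs on the managed host
    status_code: 200
```

If nothing was notified, flush_handlers does nothing, so the run stays idempotent.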

❌ PART 8: Real-World Errors and How I Fixed Them

Error 1: Parser Error from YAML Syntax

The Error:

unexpected parameter type in action: <class 'ansible.module_utils._internal._datatag._AnsibleTaggedList'>

Cause:
I accidentally used list syntax where a string was expected:

# WRONG:
state: [link]  # or state: ['link']

# RIGHT:
state: link

Fix:
Reviewed entire role with ansible-lint, corrected all such instances.

ansible-lint roles/nginx/tasks/main.yml
# Errors reported with line numbers; fixed each one

Error 2: Risky File Permissions

The Error:

risky-file-permissions: File permissions unset or incorrect.

Cause:
File operations lacked explicit mode parameter:

# WRONG:
- name: Copy config
  ansible.builtin.template:
    src: config.j2
    dest: /etc/nginx/sites-available/config

# RIGHT:
- name: Copy config
  ansible.builtin.template:
    src: config.j2
    dest: /etc/nginx/sites-available/config
    mode: '0644'

Fix:
Added explicit mode to every file operation.

Error 3: Missing index.html (The 403 Forbidden)

The Error (Browser):

403 Forbidden
nginx/1.18.0 (Ubuntu)

Cause:
Deployment succeeded, but /var/www/epicbook/index.html didn't exist. Nginx had no content to serve.

Debug:

ssh azureuser@VM_IP
ls -l /var/www/epicbook/
# Empty directory!

Fix:
Verified upstream repo structure, ensured git clone captured all files:

ssh azureuser@VM_IP
find /tmp/epicbook_clone -name index.html
# Found index.html in the repo root, so the clone itself was complete

Created placeholder for testing:

echo '<h1>EpicBook Deployed!</h1>' | sudo tee /var/www/epicbook/index.html
sudo chown www-data:www-data /var/www/epicbook/index.html
sudo chmod 644 /var/www/epicbook/index.html
sudo systemctl reload nginx

Browser now showed content ✓

🚀 PART 9: Ansible Galaxy and Extending Your Roles

Section 9.1: What Is Ansible Galaxy?

Question: Do I need to write every role from scratch?

Answer: No. Ansible Galaxy is a public hub for roles and collections, and the ansible-galaxy CLI works as its package manager, like npm for Node or pip for Python.

Accessing Galaxy:

# Search for nginx role
ansible-galaxy search nginx

# Install a community role
ansible-galaxy install geerlingguy.nginx

# Use it in a playbook (YAML, e.g. in site.yml)
- hosts: web
  roles:
    - geerlingguy.nginx  # Uses defaults, or override with vars
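
To keep Galaxy dependencies under version control rather than installing them by hand, they can be listed in a requirements file. A minimal sketch, assuming the file is named requirements.yml at the project root:

```yaml
---
# requirements.yml: roles this project pulls from Ansible Galaxy
roles:
  - name: geerlingguy.nginx
    # A "version:" key can pin a specific release; left out here on purpose
    # because I have not verified a version number for this post.
```

Running ansible-galaxy install -r requirements.yml then installs everything listed in the file.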

Section 9.2: When to Use Galaxy Roles vs. Custom Roles

Use Galaxy roles when:

✓ Community role exactly matches your needs (nginx, docker, postgresql)
✓ Role is well-maintained (100+ GitHub stars, recent updates)
✓ You want to avoid reinventing the wheel

Write custom roles when:

✓ Your deployment is unique (custom app, specific company standards)
✓ You need tight control over every detail
✓ Team knowledge is in-house code

My Approach:
I wrote custom roles because:

This is a learning assignment—I needed to understand every line
My requirements are specific (EpicBook, not generic nginx)
Custom roles document my infrastructure decisions for future reference

📚 PART 10: Documentation and References

Section 10.1: Where to Learn Ansible

Official Resources:

Ansible Documentation — Comprehensive module reference
Ansible Galaxy — Community roles and collections
Ansible Best Practices — Production-grade guidelines

For These Hands-On Assignments:

Modules Used: ansible.builtin.apt, ansible.builtin.template, ansible.builtin.file, ansible.builtin.service, ansible.builtin.git
apt Module Docs — Package management
template Module Docs — Jinja2 file generation
file Module Docs — File/directory permissions
service Module Docs — Service lifecycle

Section 10.2: Ansible Execution End-to-End Flow Diagram

[Control Node]
      ↓
1. Load playbook (site.yml)
      ↓
2. Parse variables (group_vars/web.yml)
      ↓
3. Resolve inventory (inventory.ini)
      ↓
4. For each play:
      ├─ For each host in group:
      │  ├─ Connect via SSH
      │  ├─ Execute tasks sequentially
      │  ├─ If task notifies → trigger handler
      │  ├─ Collect results
      │  └─ Close connection
      ├─ After all hosts complete:
      │  ├─ Execute handlers (once per notified handler)
      │  └─ Move to next play
      ↓
5. Generate play recap (OK/CHANGED/FAILED)
      ↓
[Fully Deployed, Idempotent Infrastructure]

What This Shows:

Plays execute sequentially
Within a play, each task runs across hosts in parallel (up to the forks limit, 5 by default)
Handlers run after all tasks, not immediately
If a task fails on a host, Ansible skips the remaining tasks and plays for that host (unless ignore_errors or failed_when says otherwise)

🎓 PART 11: Advanced Concepts I Learned

Section 11.1: Variable Precedence and Scoping

Question: If I define a variable in multiple places, which one wins?

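Answer: Ansible has a strict precedence order. A simplified view, highest to lowest:

Extra vars (--extra-vars)
set_fact and registered vars
Task vars
Block vars
Role vars (roles/<role>/vars/main.yml)
Play vars
Host facts (discovered)
Host vars
Group vars (specific group before all)
Role defaults (roles/<role>/defaults/main.yml)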

Example:

# This wins (highest precedence)
ansible-playbook site.yml --extra-vars "epicbook_app_path=/custom/path"

# Falls back to group_vars/web.yml
epicbook_app_path: /var/www/epicbook

# Falls back to role defaults/
epicbook_app_path: /var/www/html

Section 11.2: Conditionals and Loops

When to conditionally run tasks:

- name: Install Nginx on Ubuntu hosts only
  ansible.builtin.apt:
    name: nginx
    state: present
  when: ansible_distribution == "Ubuntu"

When to loop over lists:

- name: Install multiple packages
  ansible.builtin.apt:
    name: "{{ item }}"
    state: present
  loop:
    - curl
    - wget
    - git

Section 11.3: Error Handling and Recovery

- name: Deploy app
  ansible.builtin.command: /opt/deploy.sh
  register: deploy_result
  ignore_errors: true  # Don't stop if this fails

- name: Notify on failure
  ansible.builtin.debug:
    msg: "Deployment failed: {{ deploy_result.stderr }}"
  when: deploy_result.rc != 0
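
When a failure should trigger cleanup or reporting rather than just be ignored, block, rescue, and always give try/catch-style control. A minimal sketch, reusing the /opt/deploy.sh path from the example above; the messages are placeholders of my own:

```yaml
- name: Deploy app with explicit failure handling
  block:
    - name: Run deploy script
      ansible.builtin.command: /opt/deploy.sh
      changed_when: true       # the script is assumed to always change something
  rescue:
    - name: Report the failure
      ansible.builtin.debug:
        # ansible_failed_result holds the result of the task that failed in the block
        msg: "Deployment failed: {{ ansible_failed_result.stderr | default('no stderr captured') }}"

    - name: Stop the play for this host
      ansible.builtin.fail:
        msg: "Aborting after failed deployment"
  always:
    - name: Note that a deployment was attempted
      ansible.builtin.debug:
        msg: "Deploy attempt finished"
```

Unlike ignore_errors, the host still ends up marked as failed here, so the play recap reflects what really happened.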

🏁 PART 12: Lessons from the Week

Section 12.1: The 70% Rule

My biggest realization: 70% of DevOps is not writing code—it's debugging permissions, networking, and assumptions.

In this week alone:

30% time: Writing Terraform, Ansible, YAML
70% time: Fixing SSH key formats, file permissions, missing files, YAML syntax errors

🎲 This is the craft. Embrace it.

Section 12.2: IaC is About Control, Not Typing

Terraform and Ansible aren't about saving keystrokes. They're about:

Reproducibility: Same code → same result, always
Auditability: Git history shows who changed what and when
Scalability: 1 server or 1,000; same codebase
Safety: Mistakes are caught in terraform plan, not in production

Section 12.3: Ansible Roles Are Career Capital

Understanding roles is a superpower because:

Most production Ansible codebases are built from them
They're portable across companies, industries, tech stacks
Once you master 1 role (nginx), you can architect 50 (docker, mysql, postgresql, etc.)

📊 PART 13: Visual Diagram of Complete Ansible Architecture

┌─────────────────────────────────────────────────────────┐
│           CONTROL NODE (Your Machine)                   │
│                                                         │
│  ┌─────────────────────────────────────────────────┐  │
│  │  site.yml (Main Playbook)                       │  │
│  │  - Play 1: common role                          │  │
│  │  - Play 2: nginx role                           │  │
│  │  - Play 3: epicbook role                        │  │
│  └─────────────────────────────────────────────────┘  │
│                      ↓                                  │
│  ┌──────────────────┬─────────────────────────────┐  │
│  │  inventory.ini   │   group_vars/web.yml        │  │
│  │  [web]           │   epicbook_app_path: /var.. │  │
│  │  52.172.1.10     │   nginx_user: www-data      │  │
│  │  52.172.1.11     │   epicbook_app_repo: https..│  │
│  │  52.172.1.12     │                             │  │
│  └──────────────────┴─────────────────────────────┘  │
│                      ↓                                  │
│  ┌─────────────────────────────────────────────────┐  │
│  │  Ansible Parser                                 │  │
│  │  - Load playbook                                │  │
│  │  - Resolve variables                            │  │
│  │  - Parse roles                                  │  │
│  └─────────────────────────────────────────────────┘  │
│                      ↓                                  │
└─────────────────────────────────────────────────────────┘
            SSH Connections (Port 22)
                ↙       ↓       ↘
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  REMOTE VM1  │ │  REMOTE VM2  │ │  REMOTE VM3  │
│ 52.172.1.10  │ │ 52.172.1.11  │ │ 52.172.1.12  │
│              │ │              │ │              │
│ Execute:     │ │ Execute:     │ │ Execute:     │
│ ├─common     │ │ ├─common     │ │ ├─common     │
│ ├─nginx      │ │ ├─nginx      │ │ ├─nginx      │
│ └─epicbook   │ │ └─epicbook   │ │ └─epicbook   │
│              │ │              │ │              │
│ Result:      │ │ Result:      │ │ Result:      │
│ ✓ App        │ │ ✓ App        │ │ ✓ App        │
│ ✓ Running    │ │ ✓ Running    │ │ ✓ Running    │
└──────────────┘ └──────────────┘ └──────────────┘

🎬 Conclusion: From Learner to Production Engineer

This week transformed my understanding of automation. I didn't just learn tools; I learned the mindset:

Start simple. Ad-hoc commands teach you what's possible.

Scale progressively. Playbooks teach you repeatability.

Decompose ruthlessly. Roles teach you sustainability.

Automate everything. Terraform + Ansible together teach you true IaC.

The errors I faced—SSH key formats, file permissions, missing files—aren't bugs. They're lessons. Every error message is Ansible telling you exactly what went wrong. Read it, fix it, commit the lesson.

📝 Learning Outcomes

By completing these hands-on assignments, I can now:

✅ Provision cloud infrastructure using Terraform with zero manual clicks

✅ Connect to servers via passwordless SSH with validated key formats

✅ Use Ansible ad-hoc commands for rapid system checks and updates

✅ Orchestrate multi-play deployments with handlers and idempotency

✅ Deploy applications from GitHub repositories automatically

✅ Architect reusable Ansible roles for any infrastructure need

✅ Debug Ansible errors methodically and fix them with confidence

✅ Scale from 1 VM to 1,000 VMs with identical, version-controlled code

🙏 Reflection and Next Steps

This journey is week 8 of 12 of our free DevOps Micro Internship Cohort, organized by Pravin Mishra sir 🙏, in continuation of 🔐 Terraform Production Battle-Tested: Remote State, Workspaces & Full-Stack AWS Deployment [Week-7—P2] 🚀.

Week 9 will dive into Azure DevOps. The skills built this week—roles, modular architecture, error handling—are the foundation for enterprise deployments.

To anyone reading this: If you can write a 3-role Ansible deployment, you've climbed the steepest hill. Everything else is refinement.