DEV Community

James Lee

Ansible Deep Dive: Architecture, Playbooks, API, Execution Internals & Performance Tuning

1. Ansible Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Ansible Core                           │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Core Modules │  │Custom Modules│  │     Plugins      │   │
│  │ (built-in)   │  │ (extensions) │  │ (supplementary)  │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
│                                                             │
│  ┌──────────────────────┐   ┌───────────────────────────┐   │
│  │  Playbooks           │   │  Connection Plugins       │   │
│  │  (task config files) │   │  (SSH / local / ZeroMQ)   │   │
│  └──────────────────────┘   └───────────────────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Host Inventory  (static /etc/ansible/hosts          │   │
│  │                  or dynamic via CMDB/Zabbix/Cloud)   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                        │ SSH (default)
            ┌───────────┼───────────┐
            ▼           ▼           ▼
        [Host A]    [Host B]    [Host C]

Key components:

  • Core Modules — built-in modules shipped with Ansible
  • Custom Modules — extend functionality when built-ins are insufficient
  • Plugins — supplement module behavior
  • Playbooks — YAML task configuration files; define multi-step automation
  • Connection Plugins — support SSH (default), local, and ZeroMQ connections
  • Host Inventory — defines managed hosts; supports both static files and dynamic sources

Three Execution Modes

  • ad-hoc — a single module / single command executed across a host group
  • Playbook — multiple tasks combined into a structured YAML file
  • API — programmatic invocation from Python

Static vs Dynamic Inventory

  • Static Inventory — hosts defined in /etc/ansible/hosts
  • Dynamic Inventory — external scripts query CMDB, Zabbix, Cobbler, or cloud platforms and return host lists in Ansible's expected format; ideal for environments where hosts change frequently
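A dynamic inventory source is just an executable that prints JSON in Ansible's expected group format when invoked with --list. A minimal sketch — the group, host names, and variables here are hypothetical placeholders:

```python
#!/usr/bin/env python3
# Minimal dynamic-inventory sketch. Ansible invokes the script with
# --list and expects a JSON group mapping on stdout.
import json
import sys

def inventory():
    # In a real script this data would come from a CMDB, Zabbix,
    # Cobbler, or a cloud provider's API.
    return {
        "webnodes": {
            "hosts": ["web1.example.com", "web2.example.com"],
            "vars": {"ansible_ssh_user": "deploy"},
        },
        "_meta": {"hostvars": {}},
    }

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--list":
        print(json.dumps(inventory(), indent=4))
    else:
        print(json.dumps({}))
```

Saved as an executable file and passed with -i, its output is treated just like a static hosts file.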

2. Playbook Structure

A playbook is a list of one or more plays. Each play maps a group of hosts to a set of tasks. Tasks are the smallest execution unit — each task calls one Ansible module.

Four Core Sections

Playbook
├── Target section    — which hosts to run on
├── Variable section  — variables used during execution
├── Task section      — ordered list of tasks to execute
└── Handler section   — tasks triggered by change events

Directory Layout

my_role/
├── vars/        # variable definitions
├── tasks/       # task definitions
├── handlers/    # event-triggered tasks
├── files/       # static files to transfer
└── templates/   # Jinja2 templates

Hosts and Users

- hosts: webnodes
  tasks:
    - name: test ping connection
      ping:
      remote_user: test
      sudo: yes

remote_user can be set globally per play or overridden per task. sudo / sudo_user enable privilege escalation (Ansible 1.9+ deprecates these in favor of become / become_user).

Task List and Actions

Tasks execute sequentially across all hosts — all hosts complete task 1 before task 2 begins. If an error occurs mid-run, the failed hosts are dropped from the rotation; completed tasks are not rolled back, so fix the playbook and rerun it.

Well-designed tasks are idempotent — safe to run multiple times with the same end state.

tasks:
  - name: make sure apache is running
    service: name=httpd state=started

For command and shell modules, handle non-zero exit codes explicitly:

tasks:
  - name: run this command and ignore the result
    shell: /usr/bin/somecommand || /bin/true

  # or use ignore_errors
  - name: run this command and ignore the result
    shell: /usr/bin/somecommand
    ignore_errors: True

Handlers

Handlers are triggered by notify — they run once at the end of the play, even if notified multiple times:

- name: template configuration file
  template: src=template.j2 dest=/etc/foo.conf
  notify:
    - restart memcached
    - restart apache

handlers:
  - name: restart memcached
    service: name=memcached state=restarted
  - name: restart apache
    service: name=apache state=restarted

⚠️ The notify name must exactly match the handler's name field, or the trigger won't fire.
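The notify semantics — exact name matching, de-duplication, and run-once-at-the-end — can be modeled in a few lines of plain Python (a conceptual sketch, not Ansible code):

```python
# Toy model of handler semantics: notify names must match a handler's
# name exactly, duplicate notifications are de-duplicated, and handlers
# fire once at the end of the play, in definition order.
runs = []
handlers = {
    "restart memcached": lambda: runs.append("memcached"),
    "restart apache":    lambda: runs.append("apache"),
}
notified = []

def notify(name):
    if name not in handlers:
        # a typo in the notify name means the handler never fires
        raise KeyError("no handler named %r" % name)
    if name not in notified:
        notified.append(name)

# several tasks notify the same handlers during the play...
notify("restart apache")
notify("restart apache")
notify("restart memcached")

# ...then each notified handler fires exactly once, after all tasks
for name in handlers:          # dicts preserve definition order
    if name in notified:
        handlers[name]()

print(runs)                    # → ['memcached', 'apache']
```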

Tags

Tags let you selectively run or skip parts of a playbook:

tasks:
  - name: install nginx
    yum: name=nginx state=present
    tags: install
ansible-playbook site.yml --tags install
ansible-playbook site.yml --skip-tags install

Async Execution & Polling

Ansible contacts at most 5 hosts in parallel by default. For long-running tasks that would otherwise block the run or exceed the SSH timeout, use async + poll:

- hosts: all
  tasks:
    - name: Install mlocate
      yum: name=mlocate state=present

    - name: Run updatedb
      command: /usr/bin/updatedb
      async: 300    # maximum runtime (seconds)
      poll: 10      # status-check interval (seconds)
  • poll: 0 — fire and forget (don't wait for completion)
  • async: N — maximum runtime in seconds; the task is aborted if it runs longer
  • Use the async_status module to check a background job's status later

When to use async polling:

  • Task may exceed SSH timeout
  • Running across a large number of hosts
  • Task doesn't need to complete before the next step

Looping with with_items

tasks:
  - name: Secure config files
    file: path=/etc/{{ item }} mode=0600 owner=root group=root
    with_items:
      - my.cnf
      - shadow
      - fstab

Using with_fileglob to upload all matching files:

tasks:
  - name: Make key directory
    file: path=/root/.sshkeys state=directory mode=0700 owner=root group=root

  - name: Upload public keys
    copy: src={{ item }} dest=/root/.sshkeys mode=0600 owner=root group=root
    with_fileglob:
      - keys/*.pub

  - name: Assemble keys into authorized_keys file
    assemble: src=/root/.sshkeys dest=/root/.ssh/authorized_keys mode=0600 owner=root group=root

3. Ansible Python API

The Ansible Python API is powerful and straightforward — module parameters are passed directly in the script. Note that the ansible.runner interface used below is the Ansible 1.x API; Ansible 2.0 removed it in favor of the ansible.executor machinery.

Basic API Example

#!/usr/bin/env python
# coding=utf-8
import ansible.runner
import json

runner = ansible.runner.Runner(
    module_name='ping',
    module_args='',
    pattern='all',
    forks=10
)
datastructure = runner.run()
data = json.dumps(datastructure, indent=4)
print data

Result structure:

{
    "contacted": {
        "web2.example.com": 1
    },
    "dark": {
        "web1.example.com": "failure message"
    }
}
  • contacted — hosts that responded
  • dark — unreachable or failed hosts

Advanced API Example with Result Parsing

import ansible.runner
import sys

results = ansible.runner.Runner(
    pattern='*', forks=10,
    module_name='command', module_args='/usr/bin/uptime',
).run()

if results is None:
    print "No hosts found"
    sys.exit(1)

print "UP ***********"
for (hostname, result) in results['contacted'].items():
    if not 'failed' in result:
        print "%s >>> %s" % (hostname, result['stdout'])

print "FAILED *******"
for (hostname, result) in results['contacted'].items():
    if 'failed' in result:
        print "%s >>> %s" % (hostname, result['msg'])

print "DOWN *********"
for (hostname, result) in results['dark'].items():
    print "%s >>> %s" % (hostname, result)

When to Use the API (vs plain Playbooks)

  • Passing output of task A as input to task B — API (easier to chain results)
  • Custom result formatting/parsing — API (full control over output)
  • Complex inter-playbook dependencies — API (clearer call graph)
  • Integration with other systems — API (the most practical use case)
  • Simple sequential automation — a plain playbook is sufficient
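For the chaining case — feeding task A's output into task B — the glue is plain dictionary plumbing over the contacted/dark structure shown earlier. A sketch using a hand-built results dict in that shape:

```python
# Sketch of result chaining: pull stdout from one run's results and use
# it to build the module_args for a follow-up run. The results dict is
# hand-built in the contacted/dark shape returned by Runner.run().
def successful_stdout(results):
    """Map hostname -> stdout for every host that ran without failure."""
    return {
        host: res["stdout"]
        for host, res in results.get("contacted", {}).items()
        if "failed" not in res
    }

results_a = {
    "contacted": {
        "web1.example.com": {"stdout": "/var/www/current"},
        "web2.example.com": {"failed": True, "msg": "permission denied"},
    },
    "dark": {"web3.example.com": "SSH timeout"},
}

# Feed task A's output into task B's arguments, per host.
next_args = {
    host: "ls -l %s" % path
    for host, path in successful_stdout(results_a).items()
}
print(next_args)   # → {'web1.example.com': 'ls -l /var/www/current'}
```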

4. Ansible Playbook API

import ansible.playbook
from ansible import callbacks
from ansible import utils
import json

stats = callbacks.AggregateStats()
playbook_cb = callbacks.PlaybookCallbacks(verbose=utils.VERBOSITY)
runner_cb = callbacks.PlaybookRunnerCallbacks(stats, verbose=utils.VERBOSITY)

res = ansible.playbook.PlayBook(
    playbook='/etc/ansible/playbooks/user.yml',
    stats=stats,
    callbacks=playbook_cb,
    runner_callbacks=runner_cb
).run()

data = json.dumps(res, indent=4)
print data

Required parameters — omitting any will raise an exception:

  • playbook — path to the YAML playbook file
  • stats — collects and aggregates execution state
  • callbacks — outputs the final playbook results
  • runner_callbacks — outputs per-task execution results

Sample output:

{
    "10.212.52.16": {
        "unreachable": 0,
        "skipped": 0,
        "ok": 1,
        "changed": 1,
        "failures": 0
    }
}

5. Ansible Execution Internals

ansible-playbook site.yml
     │
     ▼
Parse YAML → Playbook object
     │
     ▼
Generate Play objects
     │
     ▼
Generate Task objects (smallest execution unit)
     │
     ▼
Playbook._run_task_internal()
     │
     ▼
Load ActionModule (default: normal)
     │  handles shell→command translation
     ▼
runner._execute_module()
     │
     ▼
Load module file from library/
(e.g. library/commands/command)
     │
     ▼
module_common.py renders the module
(injects command/args into template)
     │
     ▼
Copy rendered file to remote:
~/.ansible/tmp/ansible-<timestamp>/
     │
     ▼
Execute via SSH (subprocess + PIPE)
ssh user@host python ~/.ansible/tmp/ansible-xxx
     │
     ▼
Capture stdout via PIPE → return to Ansible

Key insight: for each task, Ansible:

  1. Renders a Python script with the task's parameters
  2. SFTPs it to the remote host's temp directory
  3. Executes it via SSH
  4. Captures the result via PIPE and returns it
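The four steps can be mimicked with the standard library, with local subprocess execution standing in for the SSH/SFTP transport (a simulation of the flow, not Ansible's actual code):

```python
import os
import subprocess
import sys
import tempfile

# Template for a throwaway "module": a standalone script that reports
# its result as JSON on stdout, just like a real Ansible module.
MODULE_TEMPLATE = """\
import json
print(json.dumps({"changed": False, "stdout": %(args)r}))
"""

def run_task(args):
    # 1. Render a Python script with the task's parameters injected.
    source = MODULE_TEMPLATE % {"args": args}
    # 2. Copy it to a temp path (Ansible SFTPs it to ~/.ansible/tmp/...).
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(source)
    try:
        # 3. Execute it (Ansible does this over SSH) and
        # 4. capture stdout through a pipe.
        return subprocess.run(
            [sys.executable, path], capture_output=True, text=True, check=True
        ).stdout
    finally:
        os.remove(path)

print(run_task("uptime"))
```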

6. Performance Optimization

6.1 SSH Persistent Connections

Ansible relies heavily on SSH. Enabling SSH multiplexing keeps connections alive, avoiding repeated handshakes:

# ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

6.2 Enable Pipelining

Without pipelining, each task requires an SFTP transfer + SSH execution (2 connections). With pipelining, the entire task runs within a single SSH session — no SFTP needed:

# ansible.cfg
[ssh_connection]
pipelining = True

⚠️ Requires requiretty to be disabled in /etc/sudoers on managed hosts.

6.3 Execution Strategy

# ansible.cfg
[defaults]
strategy = free   # default is "linear"
  • linear (default) — all hosts complete task N before any host starts task N+1
  • free — each host moves on to its next task as soon as the current one finishes, without waiting for other hosts

free mode is significantly faster when hosts have uneven performance.

6.4 Increase Fork Count

# Default is 5 — increase for large fleets
ansible-playbook site.yml -f 20
# ansible.cfg
[defaults]
forks = 20

6.5 Facts Caching

By default, Ansible collects facts from every host at the start of every run. For large inventories, this adds significant overhead.

# ansible.cfg
[defaults]
gathering = smart           # only gather if not cached
fact_caching = redis
fact_caching_timeout = 86400
fact_caching_connection = localhost:6379:0

Summary of Optimizations

  • SSH persistent connections — eliminate repeated handshake overhead
  • Pipelining — removes the per-task SFTP transfer
  • strategy = free — faster when host performance is uneven
  • Increased forks (-f) — more parallel execution
  • Facts caching (Redis) — eliminates repeated fact collection

7. Celery + Ansible Integration

Why Celery?

Without a task queue, Ansible tasks invoked via a web interface are tied to the HTTP request lifecycle:

Without Celery:
Browser ──HTTP──▶ Web Server ──▶ ansible-playbook (blocking)
Browser closes / network drops ──▶ task KILLED ❌

With Celery:
Browser ──HTTP──▶ Web Server ──▶ Celery task queue
                                      │
                                      ▼
                                 Worker executes ansible-playbook
                                 (independent of browser connection) ✅

This is critical for long-running tasks (e.g., compiling software, rolling deployments) where mid-task interruption could leave systems in an inconsistent state.
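The decoupling itself is easy to model: the request handler only enqueues a job, and a worker that outlives the request executes it. A toy standard-library sketch of that shape (in production the queue is Celery plus a broker, and the worker shells out to ansible-playbook):

```python
import queue
import threading

# Toy illustration of task-queue decoupling: the "web request" only
# enqueues a job id; a worker thread executes it independently, so the
# caller can disconnect without killing the job.
jobs = queue.Queue()
results = {}

def worker():
    while True:
        job_id, playbook = jobs.get()
        if job_id is None:           # shutdown sentinel
            break
        # stand-in for: subprocess.run(["ansible-playbook", playbook])
        results[job_id] = "ran %s" % playbook
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(job_id, playbook):
    """The 'HTTP handler': returns immediately after enqueueing."""
    jobs.put((job_id, playbook))
    return job_id

submit("job-1", "site.yml")
jobs.join()                          # wait only so the demo can print
print(results["job-1"])              # → ran site.yml
```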

Cancelling a Running Task

from celery.task.control import revoke  # Celery 3.x; Celery 4+ uses app.control.revoke(...)

# Cancel a running task by its task_id
revoke(task_id, terminate=True)

Real-Time Log Streaming Architecture

For long-running Ansible tasks, users need live feedback. The recommended architecture:

Ansible execution
     │
     ▼
Write structured logs + results ──▶ Kafka topic
                                         │
                                         ▼
                            Frontend JS polls every 5s
                                         │
                                         ▼
                            Browser displays scrolling log
                            (auto-scroll when at bottom,
                             freeze when scrolled up)

Goals achieved:

  1. ✅ Show task status as it starts
  2. ✅ Cancel task at any time via revoke()
  3. ✅ Real-time log display
  4. ✅ Auto-scroll log in browser (like browser DevTools console)
