DEV Community

James Lee

Ansible Deep Dive: Architecture, Playbooks, API, Execution Internals & Performance Tuning

1. Ansible Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Ansible Core                           │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Core Modules │  │Custom Modules│  │     Plugins      │   │
│  │ (built-in)   │  │ (extensions) │  │ (supplementary)  │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
│                                                             │
│  ┌──────────────────────┐   ┌───────────────────────────┐   │
│  │  Playbooks           │   │  Connection Plugins       │   │
│  │  (task config files) │   │  (SSH / local / ZeroMQ)   │   │
│  └──────────────────────┘   └───────────────────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Host Inventory  (static /etc/ansible/hosts          │   │
│  │                  or dynamic via CMDB/Zabbix/Cloud)   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                        │ SSH (default)
            ┌───────────┼───────────┐
            ▼           ▼           ▼
        [Host A]    [Host B]    [Host C]

Key components:

  • Core Modules — built-in modules shipped with Ansible
  • Custom Modules — extend functionality when built-ins are insufficient
  • Plugins — supplement module behavior
  • Playbooks — YAML task configuration files; define multi-step automation
  • Connection Plugins — support SSH (default), local, and ZeroMQ connections
  • Host Inventory — defines managed hosts; supports both static files and dynamic sources

Three Execution Modes

  • ad-hoc — a single module / single command executed across a host group
  • Playbook — multiple tasks combined into a structured YAML file
  • API — programmatic invocation from Python

Static vs Dynamic Inventory

  • Static Inventory — hosts defined in /etc/ansible/hosts
  • Dynamic Inventory — external scripts query CMDB, Zabbix, Cobbler, or cloud platforms and return host lists in Ansible's expected format; ideal for environments where hosts change frequently
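A dynamic inventory source is just an executable that prints JSON in Ansible's expected group format when invoked with --list. A minimal sketch — the group, host names, and variables here are hypothetical placeholders:

```python
#!/usr/bin/env python3
# Minimal dynamic-inventory sketch. Ansible invokes the script with
# --list and expects a JSON group mapping on stdout.
import json
import sys

def inventory():
    # In a real script this data would come from a CMDB, Zabbix,
    # Cobbler, or a cloud provider's API.
    return {
        "webnodes": {
            "hosts": ["web1.example.com", "web2.example.com"],
            "vars": {"ansible_ssh_user": "deploy"},
        },
        "_meta": {"hostvars": {}},
    }

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--list":
        print(json.dumps(inventory(), indent=4))
    else:
        print(json.dumps({}))
```

Saved as an executable file and passed with -i, its output is treated just like a static hosts file.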

2. Playbook Structure

A playbook is a list of one or more plays. Each play maps a group of hosts to a set of tasks. Tasks are the smallest execution unit — each task calls one Ansible module.

Four Core Sections

Playbook
├── Target section    — which hosts to run on
├── Variable section  — variables used during execution
├── Task section      — ordered list of tasks to execute
└── Handler section   — tasks triggered by change events

Directory Layout

my_role/
├── vars/        # variable definitions
├── tasks/       # task definitions
├── handlers/    # event-triggered tasks
├── files/       # static files to transfer
└── templates/   # Jinja2 templates

Hosts and Users

- hosts: webnodes
  tasks:
    - name: test ping connection
      ping:
      remote_user: test
      sudo: yes

remote_user can be set globally per play or overridden per task. sudo / sudo_user enable privilege escalation (Ansible 1.9+ deprecates these in favor of become / become_user).

Task List and Actions

Tasks execute sequentially across all hosts — all hosts complete task 1 before task 2 begins. If an error occurs mid-run, the failed hosts are dropped from the rotation; completed tasks are not rolled back, so fix the playbook and rerun it.

Well-designed tasks are idempotent — safe to run multiple times with the same end state.

tasks:
  - name: make sure apache is running
    service: name=httpd state=started

For command and shell modules, handle non-zero exit codes explicitly:

tasks:
  - name: run this command and ignore the result
    shell: /usr/bin/somecommand || /bin/true

  # or use ignore_errors
  - name: run this command and ignore the result
    shell: /usr/bin/somecommand
    ignore_errors: True

Handlers

Handlers are triggered by notify — they run once at the end of the play, even if notified multiple times:

- name: template configuration file
  template: src=template.j2 dest=/etc/foo.conf
  notify:
    - restart memcached
    - restart apache

handlers:
  - name: restart memcached
    service: name=memcached state=restarted
  - name: restart apache
    service: name=apache state=restarted

⚠️ The notify name must exactly match the handler's name field, or the trigger won't fire.
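The notify semantics — exact name matching, de-duplication, and run-once-at-the-end — can be modeled in a few lines of plain Python (a conceptual sketch, not Ansible code):

```python
# Toy model of handler semantics: notify names must match a handler's
# name exactly, duplicate notifications are de-duplicated, and handlers
# fire once at the end of the play, in definition order.
runs = []
handlers = {
    "restart memcached": lambda: runs.append("memcached"),
    "restart apache":    lambda: runs.append("apache"),
}
notified = []

def notify(name):
    if name not in handlers:
        # a typo in the notify name means the handler never fires
        raise KeyError("no handler named %r" % name)
    if name not in notified:
        notified.append(name)

# several tasks notify the same handlers during the play...
notify("restart apache")
notify("restart apache")
notify("restart memcached")

# ...then each notified handler fires exactly once, after all tasks
for name in handlers:          # dicts preserve definition order
    if name in notified:
        handlers[name]()

print(runs)                    # → ['memcached', 'apache']
```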

Tags

Tags let you selectively run or skip parts of a playbook:

tasks:
  - name: install nginx
    yum: name=nginx state=present
    tags: install
ansible-playbook site.yml --tags install
ansible-playbook site.yml --skip-tags install

Async Execution & Polling

Ansible contacts at most 5 hosts in parallel by default. For long-running tasks that would otherwise block the run or exceed the SSH timeout, use async + poll:

- hosts: all
  tasks:
    - name: Install mlocate
      yum: name=mlocate state=present

    - name: Run updatedb
      command: /usr/bin/updatedb
      async: 300    # maximum runtime (seconds)
      poll: 10      # status-check interval (seconds)
  • poll: 0 — fire and forget (don't wait for completion)
  • async: N — maximum runtime in seconds; the task is aborted if it runs longer
  • Use the async_status module to check a background job's status later

When to use async polling:

  • Task may exceed SSH timeout
  • Running across a large number of hosts
  • Task doesn't need to complete before the next step

Looping with with_items

tasks:
  - name: Secure config files
    file: path=/etc/{{ item }} mode=0600 owner=root group=root
    with_items:
      - my.cnf
      - shadow
      - fstab

Using with_fileglob to upload all matching files:

tasks:
  - name: Make key directory
    file: path=/root/.sshkeys state=directory mode=0700 owner=root group=root

  - name: Upload public keys
    copy: src={{ item }} dest=/root/.sshkeys mode=0600 owner=root group=root
    with_fileglob:
      - keys/*.pub

  - name: Assemble keys into authorized_keys file
    assemble: src=/root/.sshkeys dest=/root/.ssh/authorized_keys mode=0600 owner=root group=root

3. Ansible Python API

The Ansible Python API is powerful and straightforward — module parameters are passed directly in the script. Note that the ansible.runner interface used below is the Ansible 1.x API; Ansible 2.0 removed it in favor of the ansible.executor machinery.

Basic API Example

#!/usr/bin/env python
# coding=utf-8
import ansible.runner
import json

runner = ansible.runner.Runner(
    module_name='ping',
    module_args='',
    pattern='all',
    forks=10
)
datastructure = runner.run()
data = json.dumps(datastructure, indent=4)
print data

Result structure:

{
    "contacted": {
        "web2.example.com": 1
    },
    "dark": {
        "web1.example.com": "failure message"
    }
}
  • contacted — hosts that responded
  • dark — unreachable or failed hosts

Advanced API Example with Result Parsing

import ansible.runner
import sys

results = ansible.runner.Runner(
    pattern='*', forks=10,
    module_name='command', module_args='/usr/bin/uptime',
).run()

if results is None:
    print "No hosts found"
    sys.exit(1)

print "UP ***********"
for (hostname, result) in results['contacted'].items():
    if not 'failed' in result:
        print "%s >>> %s" % (hostname, result['stdout'])

print "FAILED *******"
for (hostname, result) in results['contacted'].items():
    if 'failed' in result:
        print "%s >>> %s" % (hostname, result['msg'])

print "DOWN *********"
for (hostname, result) in results['dark'].items():
    print "%s >>> %s" % (hostname, result)

When to Use the API (vs plain Playbooks)

  • Passing output of task A as input to task B — API (easier to chain results)
  • Custom result formatting/parsing — API (full control over output)
  • Complex inter-playbook dependencies — API (clearer call graph)
  • Integration with other systems — API (the most practical use case)
  • Simple sequential automation — a plain playbook is sufficient
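For the chaining case — feeding task A's output into task B — the glue is plain dictionary plumbing over the contacted/dark structure shown earlier. A sketch using a hand-built results dict in that shape:

```python
# Sketch of result chaining: pull stdout from one run's results and use
# it to build the module_args for a follow-up run. The results dict is
# hand-built in the contacted/dark shape returned by Runner.run().
def successful_stdout(results):
    """Map hostname -> stdout for every host that ran without failure."""
    return {
        host: res["stdout"]
        for host, res in results.get("contacted", {}).items()
        if "failed" not in res
    }

results_a = {
    "contacted": {
        "web1.example.com": {"stdout": "/var/www/current"},
        "web2.example.com": {"failed": True, "msg": "permission denied"},
    },
    "dark": {"web3.example.com": "SSH timeout"},
}

# Feed task A's output into task B's arguments, per host.
next_args = {
    host: "ls -l %s" % path
    for host, path in successful_stdout(results_a).items()
}
print(next_args)   # → {'web1.example.com': 'ls -l /var/www/current'}
```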

4. Ansible Playbook API

import ansible.playbook
from ansible import callbacks
from ansible import utils
import json

stats = callbacks.AggregateStats()
playbook_cb = callbacks.PlaybookCallbacks(verbose=utils.VERBOSITY)
runner_cb = callbacks.PlaybookRunnerCallbacks(stats, verbose=utils.VERBOSITY)

res = ansible.playbook.PlayBook(
    playbook='/etc/ansible/playbooks/user.yml',
    stats=stats,
    callbacks=playbook_cb,
    runner_callbacks=runner_cb
).run()

data = json.dumps(res, indent=4)
print data

Required parameters — omitting any will raise an exception:

  • playbook — path to the YAML playbook file
  • stats — collects and aggregates execution state
  • callbacks — outputs the final playbook results
  • runner_callbacks — outputs per-task execution results

Sample output:

{
    "10.212.52.16": {
        "unreachable": 0,
        "skipped": 0,
        "ok": 1,
        "changed": 1,
        "failures": 0
    }
}

5. Ansible Execution Internals

ansible-playbook site.yml
     │
     ▼
Parse YAML → Playbook object
     │
     ▼
Generate Play objects
     │
     ▼
Generate Task objects (smallest execution unit)
     │
     ▼
Playbook._run_task_internal()
     │
     ▼
Load ActionModule (default: normal)
     │  handles shell→command translation
     ▼
runner._execute_module()
     │
     ▼
Load module file from library/
(e.g. library/commands/command)
     │
     ▼
module_common.py renders the module
(injects command/args into template)
     │
     ▼
Copy rendered file to remote:
~/.ansible/tmp/ansible-<timestamp>/
     │
     ▼
Execute via SSH (subprocess + PIPE)
ssh user@host python ~/.ansible/tmp/ansible-xxx
     │
     ▼
Capture stdout via PIPE → return to Ansible

Key insight: for each task, Ansible:

  1. Renders a Python script with the task's parameters
  2. SFTPs it to the remote host's temp directory
  3. Executes it via SSH
  4. Captures the result via PIPE and returns it
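The four steps can be mimicked with the standard library, with local subprocess execution standing in for the SSH/SFTP transport (a simulation of the flow, not Ansible's actual code):

```python
import os
import subprocess
import sys
import tempfile

# Template for a throwaway "module": a standalone script that reports
# its result as JSON on stdout, just like a real Ansible module.
MODULE_TEMPLATE = """\
import json
print(json.dumps({"changed": False, "stdout": %(args)r}))
"""

def run_task(args):
    # 1. Render a Python script with the task's parameters injected.
    source = MODULE_TEMPLATE % {"args": args}
    # 2. Copy it to a temp path (Ansible SFTPs it to ~/.ansible/tmp/...).
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(source)
    try:
        # 3. Execute it (Ansible does this over SSH) and
        # 4. capture stdout through a pipe.
        return subprocess.run(
            [sys.executable, path], capture_output=True, text=True, check=True
        ).stdout
    finally:
        os.remove(path)

print(run_task("uptime"))
```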

6. Performance Optimization

6.1 SSH Persistent Connections

Ansible relies heavily on SSH. Enabling SSH multiplexing keeps connections alive, avoiding repeated handshakes:

# ansible.cfg
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

6.2 Enable Pipelining

Without pipelining, each task requires an SFTP transfer + SSH execution (2 connections). With pipelining, the entire task runs within a single SSH session — no SFTP needed:

# ansible.cfg
[ssh_connection]
pipelining = True

⚠️ Requires requiretty to be disabled in /etc/sudoers on managed hosts.

6.3 Execution Strategy

# ansible.cfg
[defaults]
strategy = free   # default is "linear"
  • linear (default) — all hosts complete task N before any host starts task N+1
  • free — each host moves on to its next task as soon as the current one finishes, without waiting for other hosts

free mode is significantly faster when hosts have uneven performance.

6.4 Increase Fork Count

# Default is 5 — increase for large fleets
ansible-playbook site.yml -f 20
# ansible.cfg
[defaults]
forks = 20

6.5 Facts Caching

By default, Ansible collects facts from every host at the start of every run. For large inventories, this adds significant overhead.

# ansible.cfg
[defaults]
gathering = smart           # only gather if not cached
fact_caching = redis
fact_caching_timeout = 86400
fact_caching_connection = localhost:6379:0

Summary of Optimizations

  • SSH persistent connections — eliminate repeated handshake overhead
  • Pipelining — removes the per-task SFTP transfer
  • strategy = free — faster when host performance is uneven
  • Increased forks (-f) — more parallel execution
  • Facts caching (Redis) — eliminates repeated fact collection

7. Celery + Ansible Integration

Why Celery?

Without a task queue, Ansible tasks invoked via a web interface are tied to the HTTP request lifecycle:

Without Celery:
Browser ──HTTP──▶ Web Server ──▶ ansible-playbook (blocking)
Browser closes / network drops ──▶ task KILLED ❌

With Celery:
Browser ──HTTP──▶ Web Server ──▶ Celery task queue
                                      │
                                      ▼
                                 Worker executes ansible-playbook
                                 (independent of browser connection) ✅

This is critical for long-running tasks (e.g., compiling software, rolling deployments) where mid-task interruption could leave systems in an inconsistent state.
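The decoupling itself is easy to model: the request handler only enqueues a job, and a worker that outlives the request executes it. A toy standard-library sketch of that shape (in production the queue is Celery plus a broker, and the worker shells out to ansible-playbook):

```python
import queue
import threading

# Toy illustration of task-queue decoupling: the "web request" only
# enqueues a job id; a worker thread executes it independently, so the
# caller can disconnect without killing the job.
jobs = queue.Queue()
results = {}

def worker():
    while True:
        job_id, playbook = jobs.get()
        if job_id is None:           # shutdown sentinel
            break
        # stand-in for: subprocess.run(["ansible-playbook", playbook])
        results[job_id] = "ran %s" % playbook
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(job_id, playbook):
    """The 'HTTP handler': returns immediately after enqueueing."""
    jobs.put((job_id, playbook))
    return job_id

submit("job-1", "site.yml")
jobs.join()                          # wait only so the demo can print
print(results["job-1"])              # → ran site.yml
```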

Cancelling a Running Task

from celery.task.control import revoke  # Celery 3.x; Celery 4+ uses app.control.revoke(...)

# Cancel a running task by its task_id
revoke(task_id, terminate=True)

Real-Time Log Streaming Architecture

For long-running Ansible tasks, users need live feedback. The recommended architecture:

Ansible execution
     │
     ▼
Write structured logs + results ──▶ Kafka topic
                                         │
                                         ▼
                            Frontend JS polls every 5s
                                         │
                                         ▼
                            Browser displays scrolling log
                            (auto-scroll when at bottom,
                             freeze when scrolled up)

Goals achieved:

  1. ✅ Show task status as it starts
  2. ✅ Cancel task at any time via revoke()
  3. ✅ Real-time log display
  4. ✅ Auto-scroll log in browser (like browser DevTools console)
