Thesius Code

Posted on Mar 23 • Originally published at datanest-stores.pages.dev

Ansible Playbook Collection: Ansible Best Practices Guide

#devops #docker #ansible #kubernetes

Ansible Best Practices Guide

A comprehensive guide to writing maintainable, idempotent, and secure Ansible playbooks. This document covers project structure, variable management, role design, secrets handling, testing, and production deployment patterns.

Project Structure
Inventory Management
Variable Precedence & Organization
Role Design Patterns
Idempotency
Secrets Management with Ansible Vault
Error Handling & Debugging
Performance Optimization
Testing Ansible Code
Security Considerations
CI/CD Integration
Common Anti-Patterns

Project Structure

A well-organized project structure is the foundation of maintainable Ansible automation. The recommended layout follows Ansible's directory conventions:

ansible-project/
├── ansible.cfg              # Project-level configuration
├── inventory/
│   ├── hosts.yml            # Inventory with groups and host vars
│   ├── group_vars/          # Next to the inventory so Ansible loads it
│   │   ├── all.yml          # Variables for all hosts
│   │   ├── webservers.yml   # Variables for webserver group
│   │   └── dbservers.yml    # Variables for database group
│   └── host_vars/
│       └── web-prod-01.yml  # Variables for a specific host
├── playbooks/
│   ├── server-setup.yml     # Base provisioning
│   ├── docker-install.yml   # Docker setup
│   └── security-hardening.yml
├── roles/
│   └── common/
│       ├── tasks/main.yml
│       ├── handlers/main.yml
│       ├── defaults/main.yml
│       ├── templates/
│       └── files/
└── guides/

Key principles:

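Keep ansible.cfg in the project root and run Ansible from that directory; Ansible reads ansible.cfg from the current working directory
Use group_vars/ and host_vars/ for environment-specific configuration; Ansible only loads them from the directory that holds the inventory or the playbook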
One playbook per concern (server setup, Docker, database, etc.)
Shared logic goes in roles; playbooks orchestrate roles and one-off tasks
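
As a rough sketch of how a playbook might combine a role with a one-off task, the following assumes the common role from the layout above and that ansible.cfg sets roles_path = roles, so that playbooks stored under playbooks/ can find roles in the project root. The /etc/motd task is a made-up illustration, not part of the original collection.

```yaml
# playbooks/server-setup.yml (illustrative sketch)
- name: Base provisioning
  hosts: all
  become: true
  roles:
    # Shared, reusable logic lives in the role
    - role: common
  tasks:
    # A one-off task that does not justify its own role (hypothetical example)
    - name: Ensure the message of the day is present
      ansible.builtin.copy:
        content: "Managed by Ansible\n"
        dest: /etc/motd
        owner: root
        group: root
        mode: "0644"
```

Roles listed under roles: run before the play's tasks:, so the one-off task can rely on whatever the role has already set up.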

Inventory Management

Your inventory defines what you manage. Organize it by environment and role.

Static YAML inventory (recommended for small-to-medium deployments):

all:
  children:
    production:
      children:
        prod_web:
          hosts:
            web-prod-01:
              ansible_host: 10.0.1.10
        prod_db:
          hosts:
            db-prod-01:
              ansible_host: 10.0.2.10
    staging:
      children:
        staging_web:
          hosts:
            web-staging-01:
              ansible_host: 10.1.1.10

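Dynamic inventory for cloud environments: use an inventory plugin such as amazon.aws.aws_ec2, which ships in the amazon.aws collection rather than ansible-core and needs boto3 and botocore on the control node: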

# inventory/aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:Environment: production
keyed_groups:
  - key: tags.Role
    prefix: role

Best practices:

Use meaningful hostnames (not IPs) for readability
Group hosts by both environment (prod/staging) and role (web/db/monitoring)
Create logical groups that span environments for running playbooks across all webservers
Use --limit to target specific environments: ansible-playbook site.yml --limit staging
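
One way to express groups that span environments is to add role-based parent groups whose children are the per-environment groups. The sketch below assumes the prod_web, staging_web, and prod_db groups from the static inventory above; the webservers and dbservers names are chosen here to match the group_vars file names in the project layout.

```yaml
# inventory/hosts.yml (additional entries under all.children, illustrative)
all:
  children:
    webservers:            # every web host, regardless of environment
      children:
        prod_web:
        staging_web:
    dbservers:             # every database host, regardless of environment
      children:
        prod_db:
```

With groups like these, a play can target hosts: webservers, and --limit staging would still narrow the run to the staging hosts inside that group.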

Variable Precedence & Organization

Ansible has 22 levels of variable precedence. Understanding the most common ones prevents surprises:

(lowest priority)
1. role defaults (roles/x/defaults/main.yml)
2. inventory group_vars
3. playbook group_vars
4. inventory host_vars
5. playbook host_vars
6. play vars
7. play vars_prompt
8. play vars_files
9. role vars (roles/x/vars/main.yml)
10. task vars
11. extra vars (-e "key=value")
(highest priority)

Practical guidelines:

Role defaults — safe defaults that always work. Users override these.
group_vars/all.yml — project-wide settings (timezone, deploy user, package lists)
group_vars/production.yml — environment-specific settings (database sizes, replica counts)
host_vars/ — host-specific overrides (IP addresses, disk paths)
Extra vars (-e) — one-time overrides for testing. Never depend on these.

Avoid defining the same value in several files. Set each variable once at the lowest-precedence level that fits, and override it at a higher level only where a group or host genuinely differs.
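
To make the layering concrete, here is a small sketch of one variable defined at three levels. The variable name nginx_worker_connections and its values are hypothetical; the file paths follow the guidelines above.

```yaml
# roles/nginx/defaults/main.yml: lowest precedence, a safe fallback
nginx_worker_connections: 1024
---
# group_vars/production.yml: overrides the role default for production hosts
nginx_worker_connections: 4096
---
# host_vars/web-prod-01.yml: overrides the group value for this one host
nginx_worker_connections: 8192
```

Under these assumptions web-prod-01 ends up with 8192, other production hosts with 4096, and everything else with 1024. Passing -e nginx_worker_connections=512 on the command line would override all three.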

Role Design Patterns

Roles are Ansible's primary reuse mechanism. A well-designed role is self-contained and configurable through variables.

Role structure:

roles/nginx/
├── defaults/main.yml     # Default variables (user overrides these)
├── tasks/main.yml        # Main task list
├── handlers/main.yml     # Service restart/reload handlers
├── templates/             # Jinja2 templates (.j2 files)
├── files/                 # Static files to copy
├── vars/main.yml         # Internal variables (not meant to be overridden)
└── meta/main.yml         # Role dependencies and metadata

Design principles:

Every task should reference its module by fully qualified collection name (FQCN), for example ansible.builtin.copy rather than copy
Use defaults/main.yml for all configurable values — never hardcode
Use handlers for service restarts so they only run once, even if notified multiple times
Keep tasks focused: one logical operation per task
Always add tags: to tasks for selective execution
Use ansible.builtin.assert for pre-flight checks (OS version, required variables)

Avoid monolithic roles. As a rule of thumb, if a role grows beyond roughly 200 lines of tasks, split it into smaller roles or use include_tasks to organize sections.
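
The design principles above could look roughly like this in a small role. This is a sketch, not the collection's actual nginx role: the variable nginx_listen_port and the template site.conf.j2 are hypothetical names, and the tasks assume a Debian-family host.

```yaml
# roles/nginx/tasks/main.yml (illustrative)
- name: Verify required variables before changing anything
  ansible.builtin.assert:
    that:
      - nginx_listen_port is defined
      - nginx_listen_port | int > 0
    fail_msg: "nginx_listen_port must be a positive integer"
  tags: [nginx, preflight]

- name: Install nginx
  ansible.builtin.apt:
    name: nginx
    state: present
    update_cache: true
  tags: [nginx, install]

- name: Render site configuration
  ansible.builtin.template:
    src: site.conf.j2
    dest: /etc/nginx/conf.d/site.conf
    mode: "0644"
  notify: Reload nginx      # handler runs once, at the end of the play
  tags: [nginx, config]
---
# roles/nginx/handlers/main.yml (illustrative)
- name: Reload nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded
```

The default for nginx_listen_port would live in roles/nginx/defaults/main.yml so that users can override it without editing the tasks.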

Idempotency

Every Ansible task should produce the same result whether run once or ten times. This is the most important principle in Ansible.

Idempotent modules (prefer these):

ansible.builtin.apt — installs packages only if not present
ansible.builtin.copy / ansible.builtin.template — writes files only if content changed
ansible.builtin.lineinfile — modifies lines only if they differ
ansible.builtin.systemd — ensures service state, doesn't restart unnecessarily

Non-idempotent modules (use with care):

ansible.builtin.command / ansible.builtin.shell: report "changed" on every run unless you set creates, removes, or changed_when
ansible.builtin.raw: no state tracking

Making command/shell idempotent:

# Bad — runs every time
- name: Create swap file
  ansible.builtin.command: fallocate -l 2G /swapfile

# Good — only runs if file doesn't exist
- name: Check if swap file exists
  ansible.builtin.stat:
    path: /swapfile
  register: swap_file

- name: Create swap file
  ansible.builtin.command: fallocate -l 2G /swapfile
  when: not swap_file.stat.exists

Or use the creates parameter:

- name: Create swap file
  ansible.builtin.command:
    cmd: fallocate -l 2G /swapfile
    creates: /swapfile  # Skip if this file exists
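
For commands that only read state, changed_when is another option. The sketch below assumes /swapfile has already been created, given mode 0600, and formatted with mkswap; those steps are not shown.

```yaml
- name: Read active swap devices
  ansible.builtin.command: swapon --show --noheadings
  register: swap_status
  changed_when: false     # read-only command, so never report a change

- name: Enable the swap file if it is not already active
  ansible.builtin.command: swapon /swapfile
  when: "'/swapfile' not in swap_status.stdout"
```

The first task always runs but never shows as changed, and the second is skipped on later runs once the swap file is active.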

Secrets Management with Ansible Vault

Never store passwords, API keys, or certificates in plain text. Ansible Vault encrypts sensitive data at rest.

Creating encrypted variables:

# Encrypt a single variable
ansible-vault encrypt_string 'my_secret_password' --name 'db_password'

# Encrypt an entire file
ansible-vault encrypt group_vars/production/vault.yml

Using encrypted variables in playbooks:

# group_vars/production/vault.yml (encrypted)
vault_db_password: "encrypted_value_here"
vault_api_key: "encrypted_value_here"

# group_vars/production/main.yml (plain text, references vault)
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"

Running playbooks with vault:

# Prompt for password
ansible-playbook site.yml --ask-vault-pass

# Use a password file (for CI/CD)
ansible-playbook site.yml --vault-password-file ~/.vault_pass

Best practices:

Prefix vault variables with vault_ for clarity
Keep one vault file per environment: group_vars/production/vault.yml
Use no_log: true on tasks that handle secrets to prevent console leakage
Store the vault password file outside the repository; if it must live in the project directory, add it to .gitignore
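
A minimal sketch of no_log on a task that handles a secret, assuming the db_password variable defined above. The destination path /etc/myapp/db.env is a hypothetical example.

```yaml
- name: Write database credentials file
  ansible.builtin.copy:
    content: "DB_PASSWORD={{ db_password }}\n"
    dest: /etc/myapp/db.env   # hypothetical path
    owner: root
    group: root
    mode: "0600"              # readable by root only
  no_log: true                # keep the rendered secret out of task output and logs
```

Because no_log also hides error details, it can help to remove it temporarily on a test host when debugging a failing task.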

Error Handling & Debugging

Graceful error handling:

# Continue on failure for non-critical tasks
- name: Clean temporary files
  ansible.builtin.file:
    path: /tmp/build-artifacts
    state: absent
  ignore_errors: true

# Custom failure conditions
- name: Check application health
  ansible.builtin.uri:
    url: "http://localhost:8000/health"
    return_content: true
  register: health_check
  failed_when: "'healthy' not in health_check.content"
  retries: 5
  delay: 10
  until: health_check.status == 200
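
When a group of tasks needs cleanup or rollback on failure, Ansible's block, rescue, and always keywords offer a try/except/finally-style structure. The sketch below uses hypothetical script paths and a hypothetical lock file purely for illustration.

```yaml
- name: Deploy with rollback on failure
  block:
    - name: Apply the new release
      ansible.builtin.command: /opt/myapp/bin/deploy.sh     # hypothetical script
      changed_when: true
  rescue:
    # Runs only if a task in the block failed
    - name: Roll back to the previous release
      ansible.builtin.command: /opt/myapp/bin/rollback.sh   # hypothetical script
      changed_when: true

    - name: Mark the host as failed after the rollback
      ansible.builtin.fail:
        msg: "Deployment failed and was rolled back"
  always:
    # Runs whether the block succeeded or not
    - name: Remove the deployment lock file
      ansible.builtin.file:
        path: /tmp/myapp-deploy.lock                        # hypothetical path
        state: absent
```

Without the explicit fail task, a successful rescue would leave the host marked as OK, which can hide a failed deployment.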

Debugging:

# Verbose output (1-4 levels)
ansible-playbook site.yml -vvv

# Check mode — dry run without making changes
ansible-playbook site.yml --check --diff

# Step through tasks interactively
ansible-playbook site.yml --step

# Start at a specific task
ansible-playbook site.yml --start-at-task="Install Docker"

Use ansible.builtin.debug for variable inspection:

- name: Show gathered facts
  ansible.builtin.debug:
    var: ansible_distribution

- name: Show custom message
  ansible.builtin.debug:
    msg: "Deploying version {{ app_version }} to {{ inventory_hostname }}"

Performance Optimization

SSH pipelining, often one of the largest performance gains because it cuts the number of SSH operations needed per task:

[ssh_connection]
pipelining = True

When tasks use become with sudo, pipelining requires that requiretty is not set in /etc/sudoers on the managed hosts. Most modern distributions don't set it.

Fact caching — avoid re-gathering facts on every run:

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 3600

Parallel execution:

[defaults]
# Default is 5; increase for large inventories.
# Keep comments on their own line: ansible.cfg treats an inline "#" as part of the value.
forks = 20

Other optimizations:

Use gather_facts: false when you don't need host facts
Use strategy: free to let fast hosts proceed without waiting for slow ones
Minimize ansible.builtin.command/ansible.builtin.shell usage (module-based tasks are faster)
Use async and poll for long-running tasks that don't need sequential execution
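
Several of these options can be combined in a single play. The following is a sketch under assumptions: it targets a hypothetical webservers group on Debian-family hosts, and the timeouts are arbitrary.

```yaml
- name: Upgrade packages without blocking on slow hosts
  hosts: webservers
  become: true
  gather_facts: false     # no facts needed by these tasks
  strategy: free          # each host proceeds at its own pace
  tasks:
    - name: Start the upgrade in the background
      ansible.builtin.apt:
        upgrade: dist
        update_cache: true
      async: 1800         # allow up to 30 minutes
      poll: 0             # do not wait here
      register: upgrade_job

    - name: Wait for the upgrade to finish
      ansible.builtin.async_status:
        jid: "{{ upgrade_job.ansible_job_id }}"
      register: upgrade_result
      until: upgrade_result.finished
      retries: 60
      delay: 30
```

Other tasks could be placed between the two steps so the host does useful work while the upgrade runs.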

Testing Ansible Code

Syntax checking:

ansible-playbook site.yml --syntax-check

Linting with ansible-lint:

pip install ansible-lint
ansible-lint playbooks/*.yml roles/*/tasks/*.yml

Molecule for role testing:

pip install molecule "molecule-plugins[docker]"
cd roles/common
molecule init scenario --driver-name docker
molecule test

With the Docker driver, Molecule creates a container, runs your role against it, verifies the result, and cleans up. It is widely used for testing Ansible roles in CI/CD.
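
The verification step is usually an ordinary playbook that Molecule's Ansible verifier runs after the role has converged. The sketch below assumes the common role installs curl; that package is a hypothetical example, so substitute whatever your role actually manages.

```yaml
# roles/common/molecule/default/verify.yml (illustrative)
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Collect installed package information
      ansible.builtin.package_facts:
        manager: auto

    - name: Assert that the expected package is installed
      ansible.builtin.assert:
        that: "'curl' in ansible_facts.packages"   # hypothetical package
        fail_msg: "curl was not installed by the role"
```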

Check mode for safe verification:

# Show what would change without making changes
ansible-playbook site.yml --check --diff --limit staging

Security Considerations

Least privilege — don't use become: true globally. Set it per-task where needed.
Audit trail — enable logging in ansible.cfg:

   [defaults]
   log_path = /var/log/ansible.log

Restrict SSH access — use dedicated deploy keys, not personal keys
Validate inputs — use ansible.builtin.assert to check variables before applying
Pin versions — specify exact package versions for reproducibility
Use no_log: true — on any task that handles passwords, tokens, or keys
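
As an illustration of least privilege combined with version pinning, the play below escalates only on the task that needs root. The variables nginx_version and app_dir, the template settings.ini.j2, and the webservers group are hypothetical.

```yaml
- name: Deploy the application with least privilege
  hosts: webservers
  become: false                # no global privilege escalation
  tasks:
    - name: Install a pinned nginx version
      ansible.builtin.apt:
        name: "nginx={{ nginx_version }}"   # exact version for reproducibility
        state: present
      become: true             # root is needed only for package installation

    - name: Write application settings as the deploy user
      ansible.builtin.template:
        src: settings.ini.j2
        dest: "{{ app_dir }}/settings.ini"
        mode: "0640"
```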

CI/CD Integration

Run Ansible from CI/CD pipelines for automated provisioning:

# GitHub Actions example
- name: Run Ansible playbook
  run: |
    ansible-playbook \
      -i inventory/hosts.yml \
      --vault-password-file <(echo "${{ secrets.VAULT_PASSWORD }}") \
      --limit production \
      playbooks/deploy.yml

CI/CD checklist:

Store vault password as a CI secret
Use --check --diff in PR pipelines for dry-run verification
Apply changes only on merge to main
Use --limit to target the correct environment
Add ansible-lint to your PR checks
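
A pull-request workflow covering the lint and dry-run items might look like the sketch below. It assumes a VAULT_PASSWORD repository secret, a runner that can reach the staging hosts over SSH, and the playbooks/deploy.yml path from the example above; the workflow file name is arbitrary.

```yaml
# .github/workflows/ansible-pr.yml (illustrative)
name: Ansible PR checks
on:
  pull_request:

jobs:
  lint-and-dry-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install tooling
        run: pip install ansible ansible-lint

      - name: Lint playbooks and roles
        run: ansible-lint playbooks/ roles/

      - name: Dry run against staging
        env:
          VAULT_PASSWORD: ${{ secrets.VAULT_PASSWORD }}
        run: |
          # Write the vault password to a temporary file outside the checkout
          printf '%s' "$VAULT_PASSWORD" > "$RUNNER_TEMP/vault_pass"
          ansible-playbook \
            -i inventory/hosts.yml \
            --vault-password-file "$RUNNER_TEMP/vault_pass" \
            --check --diff \
            --limit staging \
            playbooks/deploy.yml
```

Passing the secret through an environment variable keeps it out of the rendered script text. Note that repository secrets are not exposed to workflows triggered by pull requests from forks.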

Common Anti-Patterns

Anti-Pattern	Why It's Bad	Do This Instead
Using `shell:` for everything	Not idempotent, slow	Use built-in modules
Hardcoded values in tasks	Not reusable	Use `defaults/main.yml` variables
Giant monolithic playbooks	Hard to debug and reuse	Split into roles and focused playbooks
No tags on tasks	Can't run selectively	Add tags to every task block
Storing secrets in plain text	Security risk	Use Ansible Vault
Ignoring `--check` mode	No safe preview	Test with `--check --diff` first
Running as root everywhere	Excessive privileges	Use `become:` per-task
No error handling	Silent failures	Use `failed_when`, `retries`, `assert`

Summary

Writing production-quality Ansible automation comes down to:

Structure — organize by concern, use roles for reuse
Idempotency — every run should be safe to repeat
Variables — configure at the right level, never hardcode
Security — vault for secrets, least privilege, audit logging
Testing — lint, check mode, and Molecule before production

The playbooks in this collection demonstrate these patterns. Start with server-setup.yml on a staging server, verify with --check --diff, then expand to production.

Part of Ansible Playbook Collection by Datanest Digital
