DEV Community

Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Ansible Playbook Collection: Ansible Best Practices Guide

Ansible Best Practices Guide

A comprehensive guide to writing maintainable, idempotent, and secure Ansible playbooks. This document covers project structure, variable management, role design, secrets handling, testing, and production deployment patterns.


Table of Contents

  1. Project Structure
  2. Inventory Management
  3. Variable Precedence & Organization
  4. Role Design Patterns
  5. Idempotency
  6. Secrets Management with Ansible Vault
  7. Error Handling & Debugging
  8. Performance Optimization
  9. Testing Ansible Code
  10. Security Considerations
  11. CI/CD Integration
  12. Common Anti-Patterns

Project Structure

A well-organized project structure is the foundation of maintainable Ansible automation. The recommended layout follows Ansible's directory conventions:

ansible-project/
├── ansible.cfg              # Project-level configuration
├── inventory/
│   ├── hosts.yml            # Inventory with groups and host vars
│   ├── group_vars/          # Loaded relative to the inventory file
│   │   ├── all.yml          # Variables for all hosts
│   │   ├── webservers.yml   # Variables for webserver group
│   │   └── dbservers.yml    # Variables for database group
│   └── host_vars/
│       └── web-prod-01.yml  # Variables for a specific host
├── playbooks/
│   ├── server-setup.yml     # Base provisioning
│   ├── docker-install.yml   # Docker setup
│   └── security-hardening.yml
├── roles/
│   └── common/
│       ├── tasks/main.yml
│       ├── handlers/main.yml
│       ├── defaults/main.yml
│       ├── templates/
│       └── files/
└── guides/

Key principles:

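  • Keep ansible.cfg in the project root and run commands from there; Ansible reads ansible.cfg from the current working directory unless ANSIBLE_CONFIG is set
  • Use group_vars/ and host_vars/ for environment-specific configuration; Ansible loads them from the directory that holds the inventory file or the playbook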
  • One playbook per concern (server setup, Docker, database, etc.)
  • Shared logic goes in roles; playbooks orchestrate roles and one-off tasks
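
As a rough sketch of the last principle, a playbook such as playbooks/server-setup.yml might do little more than apply a role and run a one-off task. The `webservers` group, the tag names, and the /srv/app path are illustrative assumptions rather than contents of the collection. Because the playbook lives in playbooks/ while roles live in the project root, the sketch also assumes `roles_path = roles` is set in ansible.cfg.

```yaml
# playbooks/server-setup.yml (illustrative sketch, not the collection's actual file)
- name: Base provisioning for web servers
  hosts: webservers              # assumed group name
  roles:
    - role: common               # shared logic lives in the role
      tags: [common]
  tasks:
    - name: Ensure the deploy directory exists   # one-off task, hypothetical path
      ansible.builtin.file:
        path: /srv/app
        state: directory
        mode: "0755"
      become: true               # escalate only for the task that needs it
      tags: [deploy]
```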

Inventory Management

Your inventory defines what you manage. Organize it by environment and role.

Static YAML inventory (recommended for small-to-medium deployments):

all:
  children:
    production:
      children:
        prod_web:
          hosts:
            web-prod-01:
              ansible_host: 10.0.1.10
        prod_db:
          hosts:
            db-prod-01:
              ansible_host: 10.0.2.10
    staging:
      children:
        staging_web:
          hosts:
            web-staging-01:
              ansible_host: 10.1.1.10

Dynamic inventory for cloud environments — use Ansible's built-in plugins:

# inventory/aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:Environment: production
keyed_groups:
  - key: tags.Role
    prefix: role

Best practices:

  • Use meaningful hostnames (not IPs) for readability
  • Group hosts by both environment (prod/staging) and role (web/db/monitoring)
  • Create logical groups that span environments for running playbooks across all webservers
  • Use --limit to target specific environments: ansible-playbook site.yml --limit staging
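
A group that spans environments can be sketched by listing existing groups as children of a new one. The group name `webservers` is an assumption; `prod_web` and `staging_web` are the groups from the static inventory above.

```yaml
# inventory/hosts.yml (fragment): a cross-environment group
all:
  children:
    webservers:          # assumed group name
      children:
        prod_web:        # already defined under production
        staging_web:     # already defined under staging
```

With a group like this, `ansible-playbook site.yml --limit 'webservers:&staging'` would restrict a run to hosts that are in both `webservers` and `staging`.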

Variable Precedence & Organization

Ansible has 22 levels of variable precedence. Understanding the most common ones prevents surprises:

(lowest priority)
1. role defaults (roles/x/defaults/main.yml)
2. inventory group_vars
3. playbook group_vars
4. inventory host_vars
5. playbook host_vars
6. play vars
7. play vars_prompt
8. play vars_files
9. role vars (roles/x/vars/main.yml)
10. task vars
11. extra vars (-e "key=value")
(highest priority)

Practical guidelines:

  • Role defaults — safe defaults that always work. Users override these.
  • group_vars/all.yml — project-wide settings (timezone, deploy user, package lists)
  • group_vars/production.yml — environment-specific settings (database sizes, replica counts)
  • host_vars/ — host-specific overrides (IP addresses, disk paths)
  • Extra vars (-e) — one-time overrides for testing. Never depend on these.

Never duplicate variables across multiple files. Define them at the lowest appropriate level and override where needed.
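
To illustrate "define low, override where needed", here is a minimal sketch using a hypothetical variable, `nginx_worker_connections`, shown as two YAML documents that would live in two separate files.

```yaml
---
# roles/nginx/defaults/main.yml: lowest precedence, a safe default
nginx_worker_connections: 1024
---
# group_vars/production.yml: overrides the default for hosts in "production"
nginx_worker_connections: 4096
```

Hosts outside the production group would get 1024, production hosts would get 4096, and a value passed with -e would take precedence over both.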


Role Design Patterns

Roles are Ansible's primary reuse mechanism. A well-designed role is self-contained and configurable through variables.

Role structure:

roles/nginx/
├── defaults/main.yml     # Default variables (user overrides these)
├── tasks/main.yml        # Main task list
├── handlers/main.yml     # Service restart/reload handlers
├── templates/             # Jinja2 templates (.j2 files)
├── files/                 # Static files to copy
├── vars/main.yml         # Internal variables (not meant to be overridden)
└── meta/main.yml         # Role dependencies and metadata

Design principles:

  • Every task should reference modules by fully qualified collection name (FQCN), for example ansible.builtin.copy rather than copy
  • Use defaults/main.yml for all configurable values — never hardcode
  • Use handlers for service restarts so they only run once, even if notified multiple times
  • Keep tasks focused: one logical operation per task
  • Always add tags: to tasks for selective execution
  • Use ansible.builtin.assert for pre-flight checks (OS version, required variables)

Avoid monolithic roles. If a role exceeds 200 lines of tasks, split it into smaller roles or use include_tasks to organize sections.
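
The design principles can be sketched in a short task file. The variable `nginx_listen_port`, the template name, and the tag names are hypothetical and only illustrate the pattern.

```yaml
# roles/nginx/tasks/main.yml (illustrative sketch)
- name: Verify required variables before changing anything
  ansible.builtin.assert:
    that:
      - nginx_listen_port is defined
      - nginx_listen_port | int > 0
    fail_msg: "nginx_listen_port must be set to a positive integer"
  tags: [nginx, preflight]

- name: Install nginx
  ansible.builtin.package:
    name: nginx
    state: present
  tags: [nginx, install]

- name: Render the site configuration
  ansible.builtin.template:
    src: site.conf.j2                  # hypothetical template in templates/
    dest: /etc/nginx/conf.d/site.conf
    mode: "0644"
  notify: Reload nginx                 # handler runs once, after the tasks finish
  tags: [nginx, config]

# roles/nginx/handlers/main.yml would then contain:
# - name: Reload nginx
#   ansible.builtin.service:
#     name: nginx
#     state: reloaded
```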


Idempotency

Every Ansible task should produce the same result whether run once or ten times. This is the most important principle in Ansible.

Idempotent modules (prefer these):

  • ansible.builtin.apt — installs packages only if not present
  • ansible.builtin.copy / ansible.builtin.template — writes files only if content changed
  • ansible.builtin.lineinfile — modifies lines only if they differ
  • ansible.builtin.systemd — ensures service state, doesn't restart unnecessarily

Non-idempotent modules (use with care):

  • ansible.builtin.command / ansible.builtin.shell (report "changed" on every run unless you set changed_when, creates, or removes)
  • ansible.builtin.raw — no state tracking

Making command/shell idempotent:

# Bad — runs every time
- name: Create swap file
  ansible.builtin.command: fallocate -l 2G /swapfile

# Good — only runs if file doesn't exist
- name: Check if swap file exists
  ansible.builtin.stat:
    path: /swapfile
  register: swap_file

- name: Create swap file
  ansible.builtin.command: fallocate -l 2G /swapfile
  when: not swap_file.stat.exists

Or use the creates parameter:

- name: Create swap file
  ansible.builtin.command:
    cmd: fallocate -l 2G /swapfile
    creates: /swapfile  # Skip if this file exists
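
When no file marks completion, changed_when is another option. The sketch below assumes the swap file from the examples above and uses a read-only check to decide whether the second command needs to run.

```yaml
- name: Check whether swap is active
  ansible.builtin.command: swapon --show --noheadings
  register: swap_status
  changed_when: false      # read-only command, so never report "changed"
  check_mode: false        # safe to run in --check mode; keeps swap_status populated

- name: Enable the swap file
  ansible.builtin.command: swapon /swapfile
  when: "'/swapfile' not in swap_status.stdout"
```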

Secrets Management with Ansible Vault

Never store passwords, API keys, or certificates in plain text. Ansible Vault encrypts sensitive data at rest.

Creating encrypted variables:

# Encrypt a single variable
ansible-vault encrypt_string 'my_secret_password' --name 'db_password'

# Encrypt an entire file
ansible-vault encrypt group_vars/production/vault.yml

Using encrypted variables in playbooks:

# group_vars/production/vault.yml (encrypted)
vault_db_password: "encrypted_value_here"
vault_api_key: "encrypted_value_here"

# group_vars/production/main.yml (plain text, references vault)
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"

Running playbooks with vault:

# Prompt for password
ansible-playbook site.yml --ask-vault-pass

# Use a password file (for CI/CD)
ansible-playbook site.yml --vault-password-file ~/.vault_pass

Best practices:

  • Prefix vault variables with vault_ for clarity
  • Keep one vault file per environment: group_vars/production/vault.yml
  • Use no_log: true on tasks that handle secrets to prevent console leakage
  • Keep the vault password file out of version control: store it outside the repository, or add it to .gitignore if it must live in the working tree
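
A minimal sketch of no_log on a task that writes a secret to disk. The template name and destination path are hypothetical; `db_password` is the variable defined in the vault example above.

```yaml
- name: Write the database credentials file
  ansible.builtin.template:
    src: db.env.j2             # hypothetical template that references db_password
    dest: /etc/myapp/db.env    # hypothetical path
    owner: root
    group: root
    mode: "0600"
  become: true
  no_log: true                 # hide task arguments and results from console and logs
```

The trade-off is that no_log also hides error details, so it can help to remove it temporarily while debugging in a safe environment.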

Error Handling & Debugging

Graceful error handling:

# Continue on failure for non-critical tasks
- name: Clean temporary files
  ansible.builtin.file:
    path: /tmp/build-artifacts
    state: absent
  ignore_errors: true

# Custom failure conditions
- name: Check application health
  ansible.builtin.uri:
    url: "http://localhost:8000/health"
    return_content: true
  register: health_check
  failed_when: "'healthy' not in (health_check.content | default(''))"
  retries: 5
  delay: 10
  until: health_check.status == 200
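
For multi-step operations, block, rescue, and always give try/except/finally-style handling. This is a sketch under assumptions: the release directory layout, the `myapp` service name, and the `previous_version` variable are all hypothetical.

```yaml
- name: Deploy with rollback on failure
  block:
    - name: Point the current symlink at the new release
      ansible.builtin.file:
        src: "/srv/app/releases/{{ app_version }}"
        dest: /srv/app/current
        state: link

    - name: Restart the application
      ansible.builtin.service:
        name: myapp
        state: restarted
  rescue:
    - name: Point the symlink back at the previous release
      ansible.builtin.file:
        src: "/srv/app/releases/{{ previous_version }}"
        dest: /srv/app/current
        state: link

    - name: Restart the application on the previous release
      ansible.builtin.service:
        name: myapp
        state: restarted

    - name: Fail the play after rolling back
      ansible.builtin.fail:
        msg: "Deploy of {{ app_version }} failed and was rolled back"
  always:
    - name: Record which release is live
      ansible.builtin.command: readlink /srv/app/current
      register: live_release
      changed_when: false
```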

Debugging:

# Verbose output (1-4 levels)
ansible-playbook site.yml -vvv

# Check mode — dry run without making changes
ansible-playbook site.yml --check --diff

# Step through tasks interactively
ansible-playbook site.yml --step

# Start at a specific task
ansible-playbook site.yml --start-at-task="Install Docker"

Use ansible.builtin.debug for variable inspection:

- name: Show gathered facts
  ansible.builtin.debug:
    var: ansible_distribution

- name: Show custom message
  ansible.builtin.debug:
    msg: "Deploying version {{ app_version }} to {{ inventory_hostname }}"

Performance Optimization

SSH pipelining (often the single biggest performance gain, because it cuts the number of SSH operations per task):

[ssh_connection]
pipelining = True

This requires requiretty to be disabled in /etc/sudoers. Most modern distributions don't set it.

Fact caching — avoid re-gathering facts on every run:

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 3600

Parallel execution:

[defaults]
# Default is 5; increase for large inventories.
# (ansible.cfg does not accept "#" comments on the same line as a value.)
forks = 20

Other optimizations:

  • Use gather_facts: false when you don't need host facts
  • Use strategy: free to let fast hosts proceed without waiting for slow ones
  • Minimize ansible.builtin.command/ansible.builtin.shell usage (module-based tasks are faster)
  • Use async and poll for long-running tasks that don't need sequential execution
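
The async and poll pattern can be sketched as a fire-and-forget task followed by a later status check. The backup script path is hypothetical.

```yaml
- name: Start a long-running backup in the background
  ansible.builtin.command: /usr/local/bin/backup-db.sh   # hypothetical script
  async: 1800          # allow the job up to 30 minutes
  poll: 0              # do not wait; continue with the next task
  register: backup_job

# ... other tasks can run here while the backup proceeds ...

- name: Wait for the backup to finish
  ansible.builtin.async_status:
    jid: "{{ backup_job.ansible_job_id }}"
  register: backup_result
  until: backup_result.finished
  retries: 60          # 60 checks x 30 seconds matches the 30-minute limit
  delay: 30
```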

Testing Ansible Code

Syntax checking:

ansible-playbook site.yml --syntax-check

Linting with ansible-lint:

pip install ansible-lint
ansible-lint playbooks/*.yml roles/*/tasks/*.yml

Molecule for role testing:

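# The Docker driver now ships in molecule-plugins; older setups used the molecule-docker package
pip install molecule "molecule-plugins[docker]"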
cd roles/common
molecule init scenario --driver-name docker
molecule test

With the Docker driver, Molecule creates a container, runs your role against it, verifies the result, and cleans up. It is a widely used way to test Ansible roles in CI/CD.
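
A Molecule scenario is driven by small playbooks in the scenario directory. The sketch below shows what converge.yml and verify.yml might look like for the `common` role, assuming Molecule can resolve the role by its directory name. The check for the `curl` package is purely illustrative, since the role's actual contents are not shown here.

```yaml
# roles/common/molecule/default/converge.yml
- name: Converge
  hosts: all
  tasks:
    - name: Apply the role under test
      ansible.builtin.include_role:
        name: common

# roles/common/molecule/default/verify.yml (a separate file)
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Collect installed packages
      ansible.builtin.package_facts:

    - name: Assert an expected package is present   # hypothetical expectation
      ansible.builtin.assert:
        that:
          - "'curl' in ansible_facts.packages"
```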

Check mode for safe verification:

# Show what would change without making changes
ansible-playbook site.yml --check --diff --limit staging

Security Considerations

  1. Least privilege — don't use become: true globally. Set it per-task where needed.
  2. Audit trail — enable logging in ansible.cfg:
   [defaults]
   log_path = /var/log/ansible.log
  3. Restrict SSH access — use dedicated deploy keys, not personal keys
  4. Validate inputs — use ansible.builtin.assert to check variables before applying
  5. Pin versions — specify exact package versions for reproducibility
  6. Use no_log: true — on any task that handles passwords, tokens, or keys
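
Two of these points, per-task privilege escalation and version pinning, can be sketched together. The `docker_version` variable is hypothetical and would hold an exact apt version string.

```yaml
- name: Read the kernel version (no privilege escalation needed)
  ansible.builtin.command: uname -r
  register: kernel_version
  changed_when: false

- name: Install a pinned Docker version
  ansible.builtin.apt:
    name: "docker-ce={{ docker_version }}"   # hypothetical variable, exact apt version
    state: present
  become: true                               # escalate only on the task that needs it
```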

CI/CD Integration

Run Ansible from CI/CD pipelines for automated provisioning:

# GitHub Actions example
- name: Run Ansible playbook
  run: |
    ansible-playbook \
      -i inventory/hosts.yml \
      --vault-password-file <(echo "${{ secrets.VAULT_PASSWORD }}") \
      --limit production \
      playbooks/deploy.yml

CI/CD checklist:

  • Store vault password as a CI secret
  • Use --check --diff in PR pipelines for dry-run verification
  • Apply changes only on merge to main
  • Use --limit to target the correct environment
  • Add ansible-lint to your PR checks
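
The PR half of this checklist might look like the following GitHub Actions workflow. It is a sketch under assumptions: the workflow file name, the Python version, and the runner's network and SSH access to staging hosts are not from the original, and secrets are generally unavailable to pull requests opened from forks.

```yaml
# .github/workflows/ansible-pr.yml (illustrative sketch)
name: Ansible PR checks
on:
  pull_request:

jobs:
  lint-and-dry-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install Ansible and ansible-lint
        run: pip install ansible ansible-lint
      - name: Lint
        run: ansible-lint
      - name: Dry run against staging
        env:
          VAULT_PASSWORD: ${{ secrets.VAULT_PASSWORD }}   # passed via env, not inlined in the script
        run: |
          printf '%s' "$VAULT_PASSWORD" > "$RUNNER_TEMP/vault_pass"
          ansible-playbook \
            -i inventory/hosts.yml \
            --vault-password-file "$RUNNER_TEMP/vault_pass" \
            --check --diff \
            --limit staging \
            playbooks/deploy.yml
```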

Common Anti-Patterns

| Anti-Pattern | Why It's Bad | Do This Instead |
|---|---|---|
| Using shell: for everything | Not idempotent, slow | Use built-in modules |
| Hardcoded values in tasks | Not reusable | Use defaults/main.yml variables |
| Giant monolithic playbooks | Hard to debug and reuse | Split into roles and focused playbooks |
| No tags on tasks | Can't run selectively | Add tags to every task block |
| Storing secrets in plain text | Security risk | Use Ansible Vault |
| Ignoring --check mode | No safe preview | Test with --check --diff first |
| Running as root everywhere | Excessive privileges | Use become: per-task |
| No error handling | Silent failures | Use failed_when, retries, assert |

Summary

Writing production-quality Ansible automation comes down to:

  1. Structure — organize by concern, use roles for reuse
  2. Idempotency — every run should be safe to repeat
  3. Variables — configure at the right level, never hardcode
  4. Security — vault for secrets, least privilege, audit logging
  5. Testing — lint, check mode, and Molecule before production

The playbooks in this collection demonstrate these patterns. Start with server-setup.yml on a staging server, verify with --check --diff, then expand to production.


Part of Ansible Playbook Collection by Datanest Digital


