Ansible Best Practices Guide
A comprehensive guide to writing maintainable, idempotent, and secure Ansible playbooks. This document covers project structure, variable management, role design, secrets handling, testing, and production deployment patterns.
Table of Contents
- Project Structure
- Inventory Management
- Variable Precedence & Organization
- Role Design Patterns
- Idempotency
- Secrets Management with Ansible Vault
- Error Handling & Debugging
- Performance Optimization
- Testing Ansible Code
- Security Considerations
- CI/CD Integration
- Common Anti-Patterns
Project Structure
A well-organized project structure is the foundation of maintainable Ansible automation. The recommended layout follows Ansible's directory conventions:
ansible-project/
├── ansible.cfg                  # Project-level configuration
├── inventory/
│   └── hosts.yml                # Inventory with groups and host vars
├── group_vars/
│   ├── all.yml                  # Variables for all hosts
│   ├── webservers.yml           # Variables for webserver group
│   └── dbservers.yml            # Variables for database group
├── host_vars/
│   └── web-prod-01.yml          # Variables for a specific host
├── playbooks/
│   ├── server-setup.yml         # Base provisioning
│   ├── docker-install.yml       # Docker setup
│   └── security-hardening.yml
├── roles/
│   └── common/
│       ├── tasks/main.yml
│       ├── handlers/main.yml
│       ├── defaults/main.yml
│       ├── templates/
│       └── files/
└── guides/
Key principles:
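- Keep ansible.cfg in the project root so it is picked up automatically when you run commands from that directory
- Use group_vars/ and host_vars/ for environment-specific configuration. Ansible searches for these directories next to the inventory source or the playbook file, so adjust their location if variables are not being loaded
- One playbook per concern (server setup, Docker, database, etc.)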
- Shared logic goes in roles; playbooks orchestrate roles and one-off tasks
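As a rough illustration of the last principle, a playbook can stay thin by delegating shared logic to a role and keeping only one-off tasks inline. This is a minimal sketch, not the collection's actual server-setup.yml: the deploy_user variable and the user task are hypothetical, and it assumes ansible.cfg sets roles_path so that roles/common is found from playbooks/.

```yaml
# playbooks/server-setup.yml (illustrative sketch)
- name: Base server provisioning
  hosts: all
  roles:
    - role: common                           # shared logic lives in roles/common
  tasks:
    - name: Ensure the deploy user exists    # hypothetical one-off task
      ansible.builtin.user:
        name: "{{ deploy_user | default('deploy') }}"
        state: present
      become: true                           # escalate only where needed
```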
Inventory Management
Your inventory defines what you manage. Organize it by environment and role.
Static YAML inventory (recommended for small-to-medium deployments):
all:
  children:
    production:
      children:
        prod_web:
          hosts:
            web-prod-01:
              ansible_host: 10.0.1.10
        prod_db:
          hosts:
            db-prod-01:
              ansible_host: 10.0.2.10
    staging:
      children:
        staging_web:
          hosts:
            web-staging-01:
              ansible_host: 10.1.1.10
Dynamic inventory for cloud environments — use Ansible's built-in plugins:
# inventory/aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:Environment: production
keyed_groups:
  - key: tags.Role
    prefix: role
Best practices:
- Use meaningful hostnames (not IPs) for readability
- Group hosts by both environment (prod/staging) and role (web/db/monitoring)
- Create logical groups that span environments for running playbooks across all webservers
- Use --limit to target specific environments: ansible-playbook site.yml --limit staging
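A group that spans environments could be sketched like this. The webservers group name is an assumption chosen to match group_vars/webservers.yml in the project layout. The fragment reuses the prod_web and staging_web groups from the static inventory and would sit under all.children alongside production and staging.

```yaml
# inventory/hosts.yml (fragment, illustrative)
all:
  children:
    webservers:            # hypothetical cross-environment group
      children:
        prod_web:          # already defined under production
        staging_web:       # already defined under staging
```

With a group like this, `--limit webservers` would target every web host, while `--limit 'webservers:&staging'` would narrow the run to the staging ones.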
Variable Precedence & Organization
Ansible has 22 levels of variable precedence. Understanding the most common ones prevents surprises:
(lowest priority)
1. role defaults (roles/x/defaults/main.yml)
2. inventory group_vars
3. playbook group_vars
4. inventory host_vars
5. playbook host_vars
6. play vars
7. play vars_prompt
8. play vars_files
9. role vars (roles/x/vars/main.yml)
10. task vars
11. extra vars (-e "key=value")
(highest priority)
Practical guidelines:
- Role defaults: safe defaults that always work. Users override these.
- group_vars/all.yml: project-wide settings (timezone, deploy user, package lists)
- group_vars/production.yml: environment-specific settings (database sizes, replica counts)
- host_vars/: host-specific overrides (IP addresses, disk paths)
- Extra vars (-e): one-time overrides for testing. Never depend on these.
Never duplicate variables across multiple files. Define them at the lowest appropriate level and override where needed.
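To make the layering concrete, here is a small sketch of one value defined once as a role default and overridden at progressively more specific levels. The variable name nginx_worker_connections and the numbers are made up for illustration. The three files are shown as separate YAML documents.

```yaml
# roles/nginx/defaults/main.yml (safe default shipped with the role)
nginx_worker_connections: 1024
---
# group_vars/production.yml (environment-level override)
nginx_worker_connections: 4096
---
# host_vars/web-prod-01.yml (host-specific override, wins for this host)
nginx_worker_connections: 8192
```

Under these assumptions, web-prod-01 would resolve to 8192, other production hosts to 4096, and everything else to the role default of 1024.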
Role Design Patterns
Roles are Ansible's primary reuse mechanism. A well-designed role is self-contained and configurable through variables.
Role structure:
roles/nginx/
├── defaults/main.yml # Default variables (user overrides these)
├── tasks/main.yml # Main task list
├── handlers/main.yml # Service restart/reload handlers
├── templates/ # Jinja2 templates (.j2 files)
├── files/ # Static files to copy
├── vars/main.yml # Internal variables (not meant to be overridden)
└── meta/main.yml # Role dependencies and metadata
Design principles:
- Reference every module by its fully qualified collection name (FQCN), for example ansible.builtin.copy rather than copy
- Use defaults/main.yml for all configurable values; never hardcode
- Use handlers for service restarts so they run only once, even if notified multiple times
- Keep tasks focused: one logical operation per task
- Always add tags: to tasks for selective execution
- Use ansible.builtin.assert for pre-flight checks (OS version, required variables)
Avoid monolithic roles. If a role exceeds 200 lines of tasks, split it into smaller roles or use include_tasks to organize sections.
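The design principles above might look like this in a small role. This is a sketch under assumptions rather than a reference implementation: nginx_listen_port and the nginx.conf.j2 template are hypothetical, and the OS check assumes facts have been gathered.

```yaml
# roles/nginx/tasks/main.yml (sketch)
- name: Verify supported OS and required variables
  ansible.builtin.assert:
    that:
      - ansible_facts['os_family'] == "Debian"
      - nginx_listen_port is defined          # hypothetical required variable
    fail_msg: "The nginx role needs a Debian-family host and nginx_listen_port"
  tags: [nginx, preflight]

- name: Install nginx
  ansible.builtin.apt:
    name: nginx
    state: present
  tags: [nginx, install]

- name: Deploy nginx configuration
  ansible.builtin.template:
    src: nginx.conf.j2                        # hypothetical template in templates/
    dest: /etc/nginx/nginx.conf
    mode: "0644"
  notify: Reload nginx                        # handler runs once, at the end of the play
  tags: [nginx, config]
---
# roles/nginx/handlers/main.yml
- name: Reload nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded
```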
Idempotency
Every Ansible task should produce the same result whether run once or ten times. This is the most important principle in Ansible.
Idempotent modules (prefer these):
- ansible.builtin.apt: installs packages only if not present
- ansible.builtin.copy / ansible.builtin.template: write files only if content changed
- ansible.builtin.lineinfile: modifies lines only if they differ
- ansible.builtin.systemd: ensures service state, doesn't restart unnecessarily
Non-idempotent modules (use with care):
- ansible.builtin.command / ansible.builtin.shell: report "changed" on every run unless you add creates, removes, or changed_when
- ansible.builtin.raw: no state tracking
Making command/shell idempotent:
# Bad: runs every time
- name: Create swap file
  ansible.builtin.command: fallocate -l 2G /swapfile

# Good: only runs if the file doesn't exist
- name: Check if swap file exists
  ansible.builtin.stat:
    path: /swapfile
  register: swap_file

- name: Create swap file
  ansible.builtin.command: fallocate -l 2G /swapfile
  when: not swap_file.stat.exists
Or use the creates parameter:
- name: Create swap file
  ansible.builtin.command:
    cmd: fallocate -l 2G /swapfile
    creates: /swapfile   # Skip if this file exists
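For commands that only read state, creates does not apply, and changed_when is one way to keep the run report honest. The following is a minimal sketch; the choice of sysctl value is arbitrary and only for illustration.

```yaml
- name: Read the current swappiness value
  ansible.builtin.command: sysctl -n vm.swappiness
  register: swappiness
  changed_when: false          # a pure query never changes the host

- name: Show the value
  ansible.builtin.debug:
    var: swappiness.stdout
```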
Secrets Management with Ansible Vault
Never store passwords, API keys, or certificates in plain text. Ansible Vault encrypts sensitive data at rest.
Creating encrypted variables:
# Encrypt a single variable
ansible-vault encrypt_string 'my_secret_password' --name 'db_password'
# Encrypt an entire file
ansible-vault encrypt group_vars/production/vault.yml
Using encrypted variables in playbooks:
# group_vars/production/vault.yml (encrypted)
vault_db_password: "encrypted_value_here"
vault_api_key: "encrypted_value_here"
# group_vars/production/main.yml (plain text, references vault)
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"
Running playbooks with vault:
# Prompt for password
ansible-playbook site.yml --ask-vault-pass
# Use a password file (for CI/CD)
ansible-playbook site.yml --vault-password-file ~/.vault_pass
Best practices:
- Prefix vault variables with vault_ for clarity
- Keep one vault file per environment: group_vars/production/vault.yml
- Use no_log: true on tasks that handle secrets to prevent console leakage
- Keep the vault password file out of version control: store it outside the repository, or add it to .gitignore
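A task that consumes a vaulted value might be protected as follows. This is a sketch only: the /etc/myapp/db.env path is hypothetical, and db_password is the plain-text alias for vault_db_password defined earlier.

```yaml
- name: Write the application database credentials file
  ansible.builtin.copy:
    content: "DB_PASSWORD={{ db_password }}\n"
    dest: /etc/myapp/db.env      # hypothetical path
    owner: root
    group: root
    mode: "0600"
  become: true
  no_log: true                   # keep the rendered secret out of logs and console output
```

Note that no_log also hides the task's error details, so it can help to disable it temporarily when debugging in a safe environment.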
Error Handling & Debugging
Graceful error handling:
# Continue on failure for non-critical tasks
- name: Clean temporary files
  ansible.builtin.file:
    path: /tmp/build-artifacts
    state: absent
  ignore_errors: true

# Custom failure conditions
- name: Check application health
  ansible.builtin.uri:
    url: "http://localhost:8000/health"
    return_content: true
  register: health_check
  failed_when: "'healthy' not in health_check.content"
  retries: 5
  delay: 10
  until: health_check.status == 200
Debugging:
# Verbose output (1-4 levels)
ansible-playbook site.yml -vvv
# Check mode — dry run without making changes
ansible-playbook site.yml --check --diff
# Step through tasks interactively
ansible-playbook site.yml --step
# Start at a specific task
ansible-playbook site.yml --start-at-task="Install Docker"
Use ansible.builtin.debug for variable inspection:
- name: Show gathered facts
  ansible.builtin.debug:
    var: ansible_distribution

- name: Show custom message
  ansible.builtin.debug:
    msg: "Deploying version {{ app_version }} to {{ inventory_hostname }}"
Performance Optimization
SSH pipelining — the single biggest performance gain:
[ssh_connection]
pipelining = True
This requires requiretty to be disabled in /etc/sudoers. Most modern distributions don't set it.
Fact caching — avoid re-gathering facts on every run:
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 3600
Parallel execution:
[defaults]
forks = 20 # Default is 5, increase for large inventories
Other optimizations:
- Use gather_facts: false when you don't need host facts
- Use strategy: free to let fast hosts proceed without waiting for slow ones
- Minimize ansible.builtin.command / ansible.builtin.shell usage; purpose-built modules check state first and skip work that is already done
- Use async and poll for long-running tasks that don't need sequential execution
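Several of these options can be combined in one play. The sketch below assumes a webservers group exists in the inventory and uses a hypothetical /usr/local/bin/rebuild-index script to stand in for any long-running job.

```yaml
- name: Rebuild search indexes on all web servers
  hosts: webservers
  gather_facts: false        # these tasks need no host facts
  strategy: free             # each host proceeds at its own pace
  tasks:
    - name: Start the long-running rebuild in the background
      ansible.builtin.command: /usr/local/bin/rebuild-index   # hypothetical script
      async: 1800            # allow up to 30 minutes
      poll: 0                # do not wait; move on immediately
      register: rebuild_job

    - name: Wait for the rebuild to finish
      ansible.builtin.async_status:
        jid: "{{ rebuild_job.ansible_job_id }}"
      register: rebuild_result
      until: rebuild_result.finished
      retries: 60            # 60 checks x 30 s matches the 1800 s budget
      delay: 30
```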
Testing Ansible Code
Syntax checking:
ansible-playbook site.yml --syntax-check
Linting with ansible-lint:
pip install ansible-lint
ansible-lint playbooks/*.yml roles/*/tasks/*.yml
Molecule for role testing:
pip install molecule molecule-docker  # newer Molecule releases ship the Docker driver in molecule-plugins[docker]
cd roles/common
molecule init scenario --driver-name docker
molecule test
Molecule creates a Docker container, runs your role against it, verifies the result, and cleans up. It's the standard for CI/CD testing of Ansible roles.
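The scenario generated by molecule init is driven by a converge playbook and, optionally, a verify playbook. A minimal sketch of both is shown below as separate YAML documents. The deploy user check is a hypothetical expectation, and it assumes the role resolves by its directory name, common.

```yaml
# roles/common/molecule/default/converge.yml
- name: Converge
  hosts: all
  roles:
    - role: common
---
# roles/common/molecule/default/verify.yml
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Check that the deploy user exists   # hypothetical expectation
      ansible.builtin.getent:
        database: passwd
        key: deploy                             # task fails if the key is missing
```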
Check mode for safe verification:
# Show what would change without making changes
ansible-playbook site.yml --check --diff --limit staging
Security Considerations
- Least privilege: don't use become: true globally. Set it per-task where needed.
- Audit trail: enable logging in ansible.cfg:
  [defaults]
  log_path = /var/log/ansible.log
- Restrict SSH access: use dedicated deploy keys, not personal keys
- Validate inputs: use ansible.builtin.assert to check variables before applying
- Pin versions: specify exact package versions for reproducibility
- Use no_log: true on any task that handles passwords, tokens, or keys
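Input validation, version pinning, and per-task privilege escalation can be sketched together as follows. The nginx_version variable is hypothetical, and the example assumes a Debian-family host where apt accepts the name=version form.

```yaml
- name: Validate inputs before changing anything
  ansible.builtin.assert:
    that:
      - nginx_version is defined          # hypothetical variable
      - nginx_version | length > 0
    fail_msg: "Set nginx_version to an exact package version"

- name: Install the pinned nginx version
  ansible.builtin.apt:
    name: "nginx={{ nginx_version }}"
    state: present
  become: true                            # escalate for this task only

- name: Report the installed version      # runs as the connecting user
  ansible.builtin.command: nginx -v
  register: nginx_v
  changed_when: false
```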
CI/CD Integration
Run Ansible from CI/CD pipelines for automated provisioning:
# GitHub Actions example
- name: Run Ansible playbook
  run: |
    ansible-playbook \
      -i inventory/hosts.yml \
      --vault-password-file <(echo "${{ secrets.VAULT_PASSWORD }}") \
      --limit production \
      playbooks/deploy.yml
CI/CD checklist:
- Store vault password as a CI secret
- Use --check --diff in PR pipelines for dry-run verification
- Apply changes only on merge to main
- Use --limit to target the correct environment
- Add ansible-lint to your PR checks
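A pull-request workflow covering the lint and dry-run items might look like the sketch below. The workflow file name and job name are made up, playbooks/deploy.yml is taken from the example above, and it assumes the runner can reach the staging hosts over SSH, which usually needs extra key setup not shown here.

```yaml
# .github/workflows/ansible-pr.yml (hypothetical file name)
name: Ansible PR checks
on:
  pull_request:
    branches: [main]

jobs:
  lint-and-dry-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Ansible and ansible-lint
        run: pip install ansible ansible-lint

      - name: Lint
        run: ansible-lint

      - name: Dry run against staging
        env:
          VAULT_PASSWORD: ${{ secrets.VAULT_PASSWORD }}
        run: |
          ansible-playbook \
            -i inventory/hosts.yml \
            --vault-password-file <(echo "$VAULT_PASSWORD") \
            --limit staging \
            --check --diff \
            playbooks/deploy.yml
```

Passing the secret through env rather than interpolating it into the script keeps the value out of the rendered command text.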
Common Anti-Patterns
| Anti-Pattern | Why It's Bad | Do This Instead |
|---|---|---|
| Using shell: for everything | Not idempotent, slow | Use built-in modules |
| Hardcoded values in tasks | Not reusable | Use defaults/main.yml variables |
| Giant monolithic playbooks | Hard to debug and reuse | Split into roles and focused playbooks |
| No tags on tasks | Can't run selectively | Add tags to every task block |
| Storing secrets in plain text | Security risk | Use Ansible Vault |
| Ignoring --check mode | No safe preview | Test with --check --diff first |
| Running as root everywhere | Excessive privileges | Use become: per-task |
| No error handling | Silent failures | Use failed_when, retries, assert |
Summary
Writing production-quality Ansible automation comes down to:
- Structure — organize by concern, use roles for reuse
- Idempotency — every run should be safe to repeat
- Variables — configure at the right level, never hardcode
- Security — vault for secrets, least privilege, audit logging
- Testing — lint, check mode, and Molecule before production
The playbooks in this collection demonstrate these patterns. Start with server-setup.yml on a staging server, verify with --check --diff, then expand to production.
Part of Ansible Playbook Collection by Datanest Digital