It was a Monday morning. A routine playbook. One task: state=latest.
Forty-seven minutes later the payments team had a P1 incident, 50 production web servers were running a version of Nginx nobody had approved, and the postmortem had a very uncomfortable finding: Ansible did exactly what it was told to do.
This article covers what happened, how idempotency actually works in production (not how tutorials describe it), and how to install and use Ansible without repeating this.
The Incident: What Actually Happened
The task looked like this:
- name: Ensure nginx is installed
  ansible.builtin.apt:
    name: nginx
    state: latest
    update_cache: yes
Symptom: Users started seeing SSL handshake failed errors immediately after the playbook completed. Payment gateway API calls timed out. Transaction success rate dropped below 2% within 90 seconds of the run completing.
Root cause: state: latest on a Monday morning after a weekend Ubuntu mirror sync pulled Nginx from 1.24 to 1.26 across the entire fleet simultaneously. Nginx 1.26 introduced a TLS configuration change that broke the handshake with the payment processor's aging intermediate certificate chain. Nobody tested it. Nobody saw it coming. The task logged changed — but didn't log what changed.
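The fix: Roll back by installing nginx=1.24.* with state: present (the apt module takes the version as part of the package name, and moving backwards also needs allow_downgrade: yes), then pin the version in apt preferences. Forty-seven minutes of payment outage for one wrong word in a task definition.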
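A rollback along those lines might look like the sketch below. It is an illustration, not the team's actual remediation: the preferences file name, the nginx* glob, and the priority value are assumptions. A priority above 1000 tells apt it may downgrade to the pinned version, and the glob is there so companion packages such as nginx-common follow the same pin.

```yaml
# Sketch only: file name, glob, and priority are assumptions.
- name: Pin nginx packages to the 1.24 series
  ansible.builtin.copy:
    dest: /etc/apt/preferences.d/nginx   # hypothetical file name
    content: |
      Package: nginx*
      Pin: version 1.24.*
      Pin-Priority: 1001
    owner: root
    group: root
    mode: "0644"

- name: Roll nginx back to the pinned series
  ansible.builtin.apt:
    name: "nginx=1.24.*"
    state: present
    allow_downgrade: true   # available in ansible-core 2.12 and newer
    update_cache: true
```

With the pin in place, a later run that still says state: latest would resolve to the newest 1.24 build rather than jumping to 1.26.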
The lesson that matters:
state: latest is not idempotency. It is an instruction to upgrade on every run. Idempotency means reaching a defined state. latest is not a defined state; it's a moving target.
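One way to make "a defined state" concrete is to assert it at the end of a run. The sketch below assumes the 1.24 series is the approved version and that the targets are Debian or Ubuntu hosts where package_facts can read apt data. The task names and messages are illustrative.

```yaml
- name: Gather installed package versions
  ansible.builtin.package_facts:
    manager: auto

- name: Fail if nginx is missing or outside the approved series
  ansible.builtin.assert:
    that:
      - "'nginx' in ansible_facts.packages"
      - "ansible_facts.packages['nginx'][0].version.startswith('1.24.')"
    fail_msg: "nginx is not on the approved 1.24 series"
    success_msg: "nginx matches the approved 1.24 series"
```

A check like this turns silent drift into a failed run, which is usually easier to notice than a task that merely reports changed.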
Installing Ansible Properly (Ubuntu 22.04)
The official PPA gives you a more recent Ansible than the Ubuntu default repos, and it's the right way to set up a control node that won't surprise you six months later.
# Add the Ansible PPA
sudo apt-add-repository ppa:ansible/ansible -y
sudo apt update
# Install
sudo apt install ansible -y
# Verify
ansible --version
You should see output like:
ansible [core 2.17.x]
  config file = /etc/ansible/ansible.cfg
  python version = 3.10.x
Three installation methods compared:
| Method | Ansible version | Use when |
|---|---|---|
| apt (default repo) | Older (2.10.x on Ubuntu 22.04) | You need OS-managed packages |
| PPA (ppa:ansible/ansible) | Recent stable | Most production control nodes |
| pip install ansible | Latest | Testing, dev environments, containers |
For production control nodes, the PPA is the correct choice. Pip installs are fine for dev but create version drift when the Python environment changes.
Your Inventory File — The Part Everyone Configures Wrong
Before you run anything, Ansible needs to know what to connect to. The default inventory is /etc/ansible/hosts. For anything real, use a project-local inventory:
# inventory/production
[web]
web01.example.com
web02.example.com
web03.example.com
[db]
db01.example.com
[web:vars]
ansible_user=deploy
ansible_ssh_private_key_file=~/.ssh/id_ed25519
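The same inventory can also be written in Ansible's YAML inventory format, which some teams find easier to review in diffs. This is an equivalent sketch, and the file name is an assumption:

```yaml
# inventory/production.yml (hypothetical file name)
all:
  children:
    web:
      hosts:
        web01.example.com:
        web02.example.com:
        web03.example.com:
      vars:
        ansible_user: deploy
        ansible_ssh_private_key_file: ~/.ssh/id_ed25519
    db:
      hosts:
        db01.example.com:
```

As in the INI version, the connection variables apply only to the web group. The db host would fall back to Ansible's defaults unless it gets its own vars.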
Test connectivity before writing a single playbook:
ansible all -i inventory/production -m ping
Expected output (abbreviated; every host in the inventory should report SUCCESS):
web01.example.com | SUCCESS => {"ping": "pong"}
web02.example.com | SUCCESS => {"ping": "pong"}
If you see UNREACHABLE — check SSH key permissions (chmod 600), verify the target's ~/.ssh/authorized_keys, and confirm the ansible_user exists on the target.
Ad-Hoc Commands — For When You Need Fast Answers
Ad-hoc commands are the fastest way to use Ansible without writing a playbook. Essential for operational work:
# Check uptime across all web servers
ansible web -i inventory/production -m shell -a "uptime"
# Copy a config file (writing under /etc needs root, hence --become)
ansible web -i inventory/production --become -m copy -a "src=./nginx.conf dest=/etc/nginx/nginx.conf"
# Install a package (note: state=present, not latest)
ansible web -i inventory/production --become -m apt -a "name=nginx state=present update_cache=yes"
# Restart a service
ansible web -i inventory/production --become -m service -a "name=nginx state=restarted"
# Check free disk space
ansible all -i inventory/production -m shell -a "df -h"
The key pattern: -m module_name -a "module_arguments".
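The same pattern maps directly onto playbook tasks: the module name becomes the task's key and the argument string becomes its parameters. As a sketch, the package-install command would read as follows as a play (the play and task names are illustrative):

```yaml
# Equivalent of: -m apt -a "name=nginx state=present update_cache=yes"
- name: Ad-hoc install expressed as a play
  hosts: web
  become: true
  tasks:
    - name: Ensure nginx is installed
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true
```

Once a command is worth running twice, moving it into a play like this puts it under version control and review.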
Your First Playbook — Install Apache the Right Way
The hello-world of Ansible playbooks. Notice the version pinning:
---
- name: Configure web server cluster
  hosts: web
  become: yes
  vars:
    apache_version: "2.4.*"  # Pin to minor version, not latest
  tasks:
    - name: Install Apache (pinned version)
      ansible.builtin.apt:
        name: "apache2={{ apache_version }}"
        state: present  # Never latest
        update_cache: yes

    - name: Deploy site configuration
      ansible.builtin.template:
        src: templates/site.conf.j2
        dest: /etc/apache2/sites-available/mysite.conf
        mode: '0644'
      notify: reload apache  # Handler: runs only if this task changes something

    - name: Enable site
      ansible.builtin.command:
        cmd: a2ensite mysite
        creates: /etc/apache2/sites-enabled/mysite.conf  # Skipped once the symlink exists
      notify: reload apache

  handlers:
    - name: reload apache
      ansible.builtin.service:
        name: apache2
        state: reloaded
Run it:
ansible-playbook -i inventory/production site.yml
Before you run anything in production, always run with --check --diff first:
ansible-playbook -i inventory/production site.yml --check --diff
This shows you what would change without changing anything, at least for modules that support check mode; command and shell tasks are skipped by default. Make it a habit.
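Because command and shell tasks are skipped under --check by default, a dry run can stay silent about them. Below is a sketch of opting a read-only command back in, assuming apache2ctl is on the target's PATH:

```yaml
- name: Validate the Apache configuration currently on disk
  ansible.builtin.command:
    cmd: apache2ctl configtest
  check_mode: false     # run even under --check; safe because it only reads
  changed_when: false   # a syntax check never changes the host
```

Under --check the template task has not written anything, so this validates the configuration that is already deployed, not the one the run would produce.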
The Production Rule That Prevents the state: latest Incident
Never use state: latest for packages that sit in front of external integrations.
Instead, use this pattern:
vars:
  package_versions:
    nginx: "1.24.*"
    openssl: "3.0.*"
    python3: "3.10.*"
tasks:
  - name: Install packages at pinned versions
    ansible.builtin.apt:
      name: "{{ item.key }}={{ item.value }}"
      state: present
    loop: "{{ package_versions | dict2items }}"
When you need to upgrade, the version bump is a deliberate code change that goes through your review process — not an automatic consequence of running a playbook on a Monday morning after a weekend mirror sync.
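A reviewed version bump can still be rolled out in batches, so that a bad package stops at one host instead of fifty. The play below is a sketch under assumptions: the /healthz endpoint is hypothetical, the batch sizes are arbitrary, and a simple HTTPS probe would not necessarily have caught the payment-processor handshake failure described earlier.

```yaml
- name: Roll out a reviewed nginx version bump in batches
  hosts: web
  become: true
  serial:
    - 1        # canary host first
    - "25%"    # then a quarter of the fleet at a time
  max_fail_percentage: 0   # any failure stops the remaining batches
  vars:
    nginx_version: "1.24.*"   # the value changed in the reviewed commit
  tasks:
    - name: Install nginx at the pinned version
      ansible.builtin.apt:
        name: "nginx={{ nginx_version }}"
        state: present
        update_cache: true

    - name: Confirm the host still answers over TLS before the next batch
      ansible.builtin.uri:
        url: "https://{{ inventory_hostname }}/healthz"   # hypothetical endpoint
        status_code: 200
        validate_certs: true
      delegate_to: localhost
      become: false
```

The probe runs from the control node, so it exercises the same TLS path an external client would use. A check against the real integration would be a stronger gate where one is available.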
What the Full Article Covers
This covers the foundations. The full guide at TheCodeForge goes deeper into:
- Ansible vs Terraform — when to use each and the honest production trade-offs
- Handler deduplication — why handlers run once regardless of how many tasks notify them, and when that matters
- Idempotency as a property you build, not something Ansible gives you for free
- The changed_when and failed_when patterns that make playbooks actually trustworthy in CI/CD
- Check mode and diff mode workflows for safe production changes
- A complete production troubleshooting guide with runnable diagnostic commands
Read the full Ansible Basics guide on TheCodeForge →
Written by Naren — 20 years in enterprise IT, production Ansible deployments across banking, insurance, and fintech environments. Founder of TheCodeForge.io — programming tutorials that explain the why before the how.