This is Part 2 of my pre-implementation journey for a feature I am contributing to the open-source Debezium Platform project. In Part 1, I learned SSH from scratch: I generated key pairs, learned how ~/.ssh/config works, fixed permission errors, and built two Docker containers as fake SSH servers on my MacBook.
If you haven't read Part 1, here's the short version: I have two Docker containers (db-server-1 and db-server-2) that accept SSH connections via key-based auth using a config alias. My ~/.ssh/config file looks like this:
Host db-server-1
    HostName 127.0.0.1
    User deploy
    Port 2201
    IdentityFile ~/.ssh/ddd41_practice

Host db-server-2
    HostName 127.0.0.1
    User deploy
    Port 2202
    IdentityFile ~/.ssh/ddd41_practice
Now the question is: what do I actually do with those SSH connections?
The answer is Ansible.
Why Ansible — And Why Not Just Java + SSH?
This was the first question I had to answer before studying Ansible at all.
Think about it: I already have SSH access to my remote servers. So why can't I just open those connections from Java and run shell commands? Well, SSH gives you a tunnel — it doesn't give you an automation engine. If I need to provision multiple remote servers from my main machine, SSH alone becomes incredibly tedious. I'd have to manually handle every step:
- Install Docker
- Start the Docker daemon
- Add the SSH user to the docker group
- Deploy the Host Agent as a systemd service
- Pre-pull the Debezium Server Docker image
My first instinct as a Java developer was: "Can't I just use JSch or Apache MINA-SSHD and do all of this in Java?"
Let me show you why that instinct was wrong:
| Approach | What you'd have to write |
|---|---|
| Pure Java + SSH library | OS detection logic, apt vs yum/dnf branching, error handling, retry logic, idempotency checks, for every single step |
| Ansible + YAML playbook | ~150 lines of YAML. All of the above is handled by built-in Ansible modules. |
Ansible handles OS detection (ansible_os_family), idempotency (modules check current state before acting), error reporting, retries, and parallel execution — for free. The Debezium design document puts it plainly: Java just runs ProcessBuilder to call ansible-playbook. Ansible does the heavy lifting.
Setting Up Ansible
Installing Ansible on macOS is a one-liner:
brew install ansible
Verify the installation:
ansible --version
You should see ansible [core 2.17.x] or similar. Then install the Docker community collection (needed later for pulling images idempotently):
ansible-galaxy collection install community.docker
The Mental Model — Four Things You Need to Understand
Before I ran a single Ansible command, I forced myself to understand four core concepts. Without these, you're just copying commands without knowing why they work.
1. Control Node
Your Mac (or whatever machine you're running Ansible from). This is where Ansible is installed and where you execute ansible-playbook. Here's the important part: Ansible does not need to be installed on the remote machines — only on your control node.
2. Managed Node
The remote host where Ansible will make changes. It only needs Python 3 and SSH access. Both of our Docker containers qualify.
3. Inventory
A list of hosts for Ansible to target. This can be a file (like inventory.ini) or an inline string passed on the command line.
The design uses inline ad-hoc inventory — a comma-separated string passed directly via the -i flag:
ansible-playbook host-setup.yml -i "db-server-1,"
See that trailing comma after the hostname? That is not a typo — it is required. Without it, Ansible interprets the string as a filename and tries to open a file called db-server-1 on your disk. The comma tells the parser: "This is a comma-separated list of hosts that happens to contain exactly one item."
💡 Why not use an inventory file? Because hosts in DDD-41 are dynamic: they come from ~/.ssh/config, not a static file. The Java HostProvisioningService provisions one host at a time and builds the inventory string programmatically.
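Building that inventory string in Java can be sketched in a few lines. This is only an illustration of the trailing-comma rule; the class name InlineInventory and its build method are hypothetical and not taken from the Debezium codebase.

```java
import java.util.List;

class InlineInventory {

    /**
     * Joins host aliases into the value passed to `ansible-playbook -i`.
     * A single host gets a trailing comma so Ansible treats the value as a
     * host list rather than a path to an inventory file.
     */
    static String build(List<String> hostAliases) {
        if (hostAliases.isEmpty()) {
            throw new IllegalArgumentException("at least one host alias is required");
        }
        String joined = String.join(",", hostAliases);
        // "db-server-1" alone would be read as a filename; "db-server-1," is a list.
        return joined.contains(",") ? joined : joined + ",";
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("db-server-1")));
        System.out.println(build(List.of("db-server-1", "db-server-2")));
    }
}
```

Keeping the comma logic in one small method means the rest of the Java code never has to remember the rule.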
And here's the elegant part: when Ansible sees db-server-1 in the inventory, it doesn't need you to explicitly pass an IP address, port, or private key path. It calls your system's native OpenSSH client under the hood, which automatically reads ~/.ssh/config, resolves the alias, and establishes the connection. Zero extra configuration.
4. Playbook
A YAML file describing what to do. It contains plays, each play contains tasks, and each task calls a module. Think of it as a nested structure:
Playbook
└── Play (target: all hosts in inventory)
    ├── Task 1: Bootstrap Python
    ├── Task 2: Install Docker
    ├── Task 3: Start Docker service
    ├── Task 4: Add user to docker group
    ├── Task 5: Deploy Host Agent as systemd service
    └── Task 6: Pre-pull Debezium Server image
How Ansible Reads ~/.ssh/config Automatically
This is the part that genuinely surprised me. I assumed I'd have to pass IP addresses, ports, and key paths directly into the playbook or build some mapping layer in Java.
Nope.
When Ansible connects to db-server-1, it delegates the connection to your system's OpenSSH client. OpenSSH automatically reads ~/.ssh/config. That means Ansible inherits your entire SSH configuration for free:
- ✅ What IP address to connect to
- ✅ What port to use
- ✅ What username to log in with
- ✅ Which private key to authenticate with
The sysadmin maintains ~/.ssh/config — everything else flows from it automatically.
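Since ~/.ssh/config is the single source of truth, the Java side only needs the alias names from it; OpenSSH resolves everything else. Here is a rough sketch of how those aliases could be listed. The class SshConfigAliases is hypothetical, and it assumes the simple "Host alias" form used in this lab (it does not handle Include directives, quoted names, or the "Host=alias" syntax).

```java
import java.util.ArrayList;
import java.util.List;

class SshConfigAliases {

    /** Returns concrete Host aliases from ssh_config text, skipping wildcard patterns. */
    static List<String> parse(String sshConfigText) {
        List<String> aliases = new ArrayList<>();
        for (String rawLine : sshConfigText.split("\\R")) {
            String line = rawLine.trim();
            // Skip blank lines and comments
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] tokens = line.split("\\s+");
            // ssh_config keywords are case-insensitive; only Host lines matter here
            if (!tokens[0].equalsIgnoreCase("Host")) {
                continue;
            }
            for (int i = 1; i < tokens.length; i++) {
                String pattern = tokens[i];
                // Patterns with *, ? or a leading ! match many hosts, not one alias
                if (pattern.contains("*") || pattern.contains("?") || pattern.startsWith("!")) {
                    continue;
                }
                aliases.add(pattern);
            }
        }
        return aliases;
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "Host db-server-1",
                "    HostName 127.0.0.1",
                "    Port 2201",
                "Host db-server-2",
                "    HostName 127.0.0.1",
                "    Port 2202");
        System.out.println(parse(config));
    }
}
```

Note that the sketch never reads HostName, Port, or IdentityFile. That is the point: those details stay in the SSH layer.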
The ansible.cfg File — Avoiding Repetitive Flags
Before running any commands, I created a project-level config file so I wouldn't have to repeat flags every time:
mkdir -p ~/ddd41-lab/ansible
cat > ~/ddd41-lab/ansible/ansible.cfg << 'EOF'
[defaults]
host_key_checking = False
gathering = explicit
timeout = 30
[ssh_connection]
ssh_args = -F ~/.ssh/config -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
pipelining = True
EOF
Here's what these settings do:
- host_key_checking = False: Disables the "Are you sure you want to continue connecting?" prompt. This is fine for a lab environment. Never do this in production.
- gathering = explicit: Tells Ansible not to auto-gather system facts at playbook start. We do it manually after bootstrapping Python, which matters on fresh hosts that might not have Python installed yet.
- ssh_args = -F ~/.ssh/config: Explicitly tells SSH to use our config file. The two -o options on the same line turn off host key verification at the SSH level, again for the lab only.
- pipelining = True: Reduces the number of SSH operations needed per task. Faster execution.
Ansible reads ansible.cfg from the current working directory when you run a command. So as long as I cd ~/ddd41-lab/ansible before running anything, this config is automatically active.
First Test: Ad-Hoc Ping
An ad-hoc command runs a single Ansible module without a playbook — perfect for quick connectivity checks.
cd ~/ddd41-lab/ansible
ansible db-server-1 -i "db-server-1," -m ping
⚠️ Common misconception: The Ansible ping module does NOT send ICMP packets (unlike the ping command in your terminal). It connects via SSH, runs a tiny Python script on the remote host, and verifies that Python is available and working. A successful ping means: SSH works AND Python is installed.
To test both servers at once:
ansible all -i "db-server-1,db-server-2" -m ping
Both servers are alive and responding. Time to write a real playbook.
My First Real Playbook — A Connectivity Check
cat > ~/ddd41-lab/ansible/01-ping.yml << 'EOF'
---
- name: Verify connectivity to all lab hosts
  hosts: all
  gather_facts: false
  tasks:
    - name: Ping the host
      ansible.builtin.ping:

    - name: Print hostname
      ansible.builtin.command: hostname
      register: result

    - name: Show hostname output
      ansible.builtin.debug:
        msg: "Remote hostname is: {{ result.stdout }}"
EOF
Run it:
ansible-playbook 01-ping.yml -i "db-server-1,db-server-2"
Understanding Task Status Colors
Ansible uses colour-coded output so you can read results at a glance:
| Status | Color | What It Means |
|---|---|---|
| ok | 🟢 Green | Task ran successfully, nothing needed to change |
| changed | 🟡 Yellow | Task ran and made a modification |
| skipped | 🔵 Cyan | Task was skipped (condition not met) |
| failed | 🔴 Red | Task failed |
| unreachable | 🔴 Red | Could not connect to the host at all |
YAML Basics — Because Playbooks Are All YAML
Before writing the full playbook, I needed to make sure my YAML fundamentals were solid. One wrong indentation and the entire playbook breaks. Here are the rules:
- Spaces only, never tabs. Ansible recommends 2 spaces for indentation.
- Lists start with "- " (dash + space).
- Dictionaries use ": " (colon + space).
- Comments start with #.
- Strings with special characters need quotes.
🫠 If you have tabs in a YAML file, Ansible throws the most cryptic error messages you've ever seen. I learned this the hard way when I copy-pasted a snippet from a web page that had invisible tab characters. Spent 20 minutes debugging a syntax error that was literally invisible.
Building the DDD-41 Provisioning Playbook
Now we get to the core of the project. Our playbook needs to execute six specific provisioning steps to transform a clean server into a fully managed Debezium host.
Fair warning: if you copy the standard textbook templates for these steps, your pipeline will crash. Below are the real-world errors I hit, and how I actually fixed them.
Step 1: Bootstrap Python (The "Permission Denied" Trap)
Ansible is agentless, but most of its modules still need Python on the remote host to execute tasks. If a server is completely fresh, Python might not be installed yet. The ansible.builtin.raw module solves this chicken-and-egg problem: it sends raw shell commands over SSH directly, bypassing the Python requirement entirely.
The textbook version:
# ❌ WILL FAIL: Permission denied
- name: Bootstrap - install Python3 on Debian/Ubuntu
  ansible.builtin.raw: |
    apt-get update -qq && apt-get install -y python3 python3-pip
  changed_when: false
What actually happened: The playbook instantly crashed with a wall of red:
"E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)"
The fix: When Ansible logs into the server, it uses the standard user from your SSH config — in our case, the deploy user. A regular user doesn't have permission to install system packages. We need to tell Ansible to escalate privileges using sudo:
# ✅ CORRECT: Explicitly escalates privileges via sudo
- name: Bootstrap - install Python3 on Debian/Ubuntu
  ansible.builtin.raw: |
    apt-get update -qq && apt-get install -y python3 python3-pip
  changed_when: false
  become: true
That single line — become: true — maps directly to running the command with sudo. It works seamlessly here because our Docker container's base setup configures the deploy user with passwordless sudo access in the /etc/sudoers file (which we set up in Part 1).
Step 2: Gather Facts (The Deprecation Warning)
Once Python is bootstrapped, we can safely collect system information. Ansible handles this through the setup module:
- name: Gather facts
  ansible.builtin.setup:
This populates an internal inventory of the server — CPU architecture, RAM, Linux distribution, OS family, and more. We need this data for the next step where we conditionally install Docker based on the OS.
The gotcha: Older guides and tutorials reference top-level variables like when: ansible_os_family == "Debian". Modern versions of Ansible flag this with a bright yellow Deprecation Warning. To future-proof your playbooks, access facts through the formal dictionary:
# ❌ Old way (triggers deprecation warning)
when: ansible_os_family == "Debian"
# ✅ Modern way
when: ansible_facts['os_family'] == "Debian"
Step 3: Install Docker (The Package Name Mismatch)
The playbook needs to conditionally install Docker based on whether the target is running Debian/Ubuntu or RHEL/CentOS.
The textbook version:
# ❌ WILL FAIL: Package not found
- name: Install Docker (Debian/Ubuntu)
  ansible.builtin.apt:
    name:
      - docker.io
      - docker-compose-plugin
    state: present
    update_cache: yes
  become: true
  when: ansible_facts['os_family'] == "Debian"
What actually happened:
"msg": "No package matching 'docker-compose-plugin' is available"
The fix: The textbook assumed our servers had Docker's official apt repository pre-configured. Our lab containers use vanilla Ubuntu repositories, where the package is simply called docker-compose — not docker-compose-plugin:
# ✅ CORRECT: Uses package names from default Ubuntu repositories
- name: Install Docker (Debian/Ubuntu)
  ansible.builtin.apt:
    name:
      - docker.io
      - docker-compose
    state: present
    update_cache: yes
  become: true
  when: ansible_facts['os_family'] == "Debian"
📝 Lesson learned: Always verify package names against the actual repositories available on your target host. apt-cache search docker is your best friend.
Step 4: Start Docker Daemon (The Container Limitation)
Once Docker is installed, we tell the host to start the daemon and enable it on boot:
# ⚠️ EXPECTED TO FAIL IN LAB: No systemd inside Docker containers
- name: Start and enable Docker service
  ansible.builtin.service:
    name: docker
    state: started
    enabled: yes
  become: true
  ignore_errors: yes
What actually happened:
"System has not been booted with systemd as init system (PID 1)."
Why this is expected: Our "servers" are lightweight Docker containers, not real VMs. They don't run systemd as PID 1 — they lack a full init system. Since our goal here is to validate the Ansible playbook logic and Java ProcessBuilder orchestration (not to actually run Docker-in-Docker on a laptop), we add ignore_errors: yes. Ansible logs the failure, shrugs, and moves on to the next task.
On a real bare-metal server or cloud VM, this task would succeed without issues.
Step 5: Add User to Docker Group (The Undefined Variable)
To let our deploy user run Docker commands without sudo, we add it to the docker group:
# ⚠️ WILL FAIL: Variable not defined
- name: Add deploy user to docker group
  ansible.builtin.user:
    name: "{{ ansible_user }}"
    groups: docker
    append: yes
  become: true
🛡️ The append: yes flag is critical. Without it, Ansible's user module replaces all existing group memberships. With append: yes, it adds docker to the user's existing groups without removing anything. This is idempotency in action: if the user is already in the docker group, Ansible does nothing.
What actually happened:
"msg": "Error while resolving value for 'name': 'ansible_user' is undefined"
The fix: Standard Ansible setups define ansible_user in static inventory files (like hosts.ini). Since we use inline ad-hoc inventory (-i "db-server-1,"), there's no file where this variable is declared. Ansible can't resolve it.
The solution is to explicitly define it in a vars block at the top of the playbook:
vars:
  ansible_user: "deploy"
This ensures the template variable {{ ansible_user }} resolves correctly everywhere in the playbook.
Step 6: Deploy the Host Agent as a Systemd Service
The Java orchestrator needs a persistent remote process to communicate with. Since we haven't compiled the real agent yet, the playbook deploys a mock agent — a lightweight shell script running a sleep loop — to validate that our directory structures, file permissions, and service configurations are structurally correct.
- name: Create agent directory
  ansible.builtin.file:
    path: /opt/debezium-agent
    state: directory
    owner: "{{ ansible_user }}"
    mode: '0755'
  become: true

- name: Create mock agent script (for lab testing)
  ansible.builtin.copy:
    dest: /opt/debezium-agent/agent.sh
    content: |
      #!/bin/bash
      echo "Debezium Host Agent starting on port {{ agent_port }}..."
      echo "Token: {{ agent_token | default('NO_TOKEN_PROVIDED') }}"
      while true; do sleep 3600; done
    owner: "{{ ansible_user }}"
    mode: '0755'
  become: true

- name: Create systemd service for Host Agent
  ansible.builtin.copy:
    dest: /etc/systemd/system/debezium-agent.service
    content: |
      [Unit]
      Description=Debezium Host Agent
      After=network.target docker.service
      Requires=docker.service

      [Service]
      Type=simple
      User={{ ansible_user }}
      ExecStart=/opt/debezium-agent/agent.sh
      Restart=always
      RestartSec=5
      Environment="AGENT_PORT={{ agent_port }}"
      Environment="AGENT_TOKEN={{ agent_token | default('test-token') }}"

      [Install]
      WantedBy=multi-user.target
    mode: '0644'
  become: true

- name: Reload systemd and start agent (expected to fail in lab)
  ansible.builtin.systemd:
    name: debezium-agent
    state: started
    enabled: yes
    daemon_reload: yes
  become: true
  ignore_errors: yes
The {{ agent_port }} and {{ agent_token }} placeholders use Jinja2 templating — Ansible's template engine. These values get injected at runtime through the -e (extra vars) flag when we run the playbook. And just like Step 4, the final systemd reload uses ignore_errors: yes because our Docker containers don't have a real init system.
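On the Java side, the value handed to -e has to be assembled from whatever the platform knows about the host. A minimal sketch of that step follows, assuming the simple space-separated key=value form. The class name ExtraVars is hypothetical, and rejecting values that contain whitespace is my own simplification for the sketch, not something the design document prescribes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class ExtraVars {

    /** Formats variables as the space-separated key=value string accepted by -e. */
    static String format(Map<String, String> vars) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : vars.entrySet()) {
            String value = entry.getValue();
            // key=value pairs are separated by whitespace, so a value containing
            // whitespace (or an empty one) would be misread; reject it up front
            if (value.isEmpty() || value.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException("unsupported value for " + entry.getKey());
            }
            joiner.add(entry.getKey() + "=" + value);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is predictable
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("agent_port", "8090");
        vars.put("agent_token", "test-bearer-token-abc123");
        System.out.println(format(vars));
    }
}
```

One thing to keep in mind: values passed in key=value form arrive in Ansible as strings, so agent_port is the string "8090" inside the playbook. That is harmless here because it is only interpolated into text.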
The Complete Playbook
Here's the full host-setup.yml with all six steps assembled:
---
################################################################################
# DDD-41 Host Provisioning Playbook
# Provisions a bare-metal or VM host to run Debezium Server containers.
# Usage: ansible-playbook host-setup.yml -i "<ssh-alias>,"
################################################################################
- name: Bootstrap and provision Debezium host
  hosts: all
  gather_facts: false
  vars:
    agent_port: 8090
    agent_version: "1.0.0"
    debezium_image: "quay.io/debezium/server:latest"
    ansible_user: "deploy"
  tasks:
    ############################################################
    # Step 1: Bootstrap Python using raw module
    ############################################################
    - name: Bootstrap - install Python3 on Debian/Ubuntu
      ansible.builtin.raw: |
        apt-get update -qq && apt-get install -y python3 python3-pip
      changed_when: false
      become: true

    - name: Gather facts
      ansible.builtin.setup:

    ############################################################
    # Step 2: Install Docker
    ############################################################
    - name: Install Docker (Debian/Ubuntu)
      ansible.builtin.apt:
        name:
          - docker.io
          - docker-compose
        state: present
        update_cache: yes
      become: true
      when: ansible_facts['os_family'] == "Debian"

    - name: Install Docker (RHEL/CentOS)
      ansible.builtin.yum:
        name: docker
        state: present
      become: true
      when: ansible_facts['os_family'] == "RedHat"

    ############################################################
    # Step 3: Start and enable Docker daemon
    ############################################################
    - name: Start and enable Docker service
      ansible.builtin.service:
        name: docker
        state: started
        enabled: yes
      become: true
      ignore_errors: yes

    ############################################################
    # Step 4: Add SSH user to docker group
    ############################################################
    - name: Add deploy user to docker group
      ansible.builtin.user:
        name: "{{ ansible_user }}"
        groups: docker
        append: yes
      become: true

    ############################################################
    # Step 5: Pre-pull Debezium Server image
    ############################################################
    - name: Pre-pull Debezium Server Docker image
      ansible.builtin.shell: |
        docker pull {{ debezium_image }}
      become: true
      register: pull_result
      changed_when: "'Pull complete' in pull_result.stdout or 'Downloaded' in pull_result.stdout"
      ignore_errors: yes

    ############################################################
    # Step 6: Deploy the Host Agent as a systemd service
    ############################################################
    - name: Create agent directory
      ansible.builtin.file:
        path: /opt/debezium-agent
        state: directory
        owner: "{{ ansible_user }}"
        mode: '0755'
      become: true

    - name: Create mock agent script (for lab testing)
      ansible.builtin.copy:
        dest: /opt/debezium-agent/agent.sh
        content: |
          #!/bin/bash
          echo "Debezium Host Agent starting on port {{ agent_port }}..."
          echo "Token: {{ agent_token | default('NO_TOKEN_PROVIDED') }}"
          while true; do sleep 3600; done
        owner: "{{ ansible_user }}"
        mode: '0755'
      become: true

    - name: Create systemd service for Host Agent
      ansible.builtin.copy:
        dest: /etc/systemd/system/debezium-agent.service
        content: |
          [Unit]
          Description=Debezium Host Agent
          After=network.target docker.service
          Requires=docker.service

          [Service]
          Type=simple
          User={{ ansible_user }}
          ExecStart=/opt/debezium-agent/agent.sh
          Restart=always
          RestartSec=5
          Environment="AGENT_PORT={{ agent_port }}"
          Environment="AGENT_TOKEN={{ agent_token | default('test-token') }}"

          [Install]
          WantedBy=multi-user.target
        mode: '0644'
      become: true

    - name: Reload systemd and start agent (expected to fail in lab)
      ansible.builtin.systemd:
        name: debezium-agent
        state: started
        enabled: yes
        daemon_reload: yes
      become: true
      ignore_errors: yes

    - name: Report provisioning complete
      ansible.builtin.debug:
        msg: |
          ✅ Host {{ inventory_hostname }} provisioned successfully!
          Docker: installed and running
          Agent: deployed on port {{ agent_port }}
Running the Full Playbook
cd ~/ddd41-lab/ansible
ansible-playbook host-setup.yml \
-i "db-server-1," \
-e "agent_port=8090 agent_token=test-bearer-token-abc123"
And then... you hold your breath and watch the terminal scroll.
Reading the Output — What Those Errors Actually Mean
When the playbook finishes, you'll see some red text. Don't panic. Let me walk you through each piece.
1. The Docker Pull Error
failed to connect to the docker API... no such file or directory
Why this happened: Remember Step 3 of the complete playbook, where we tried to start the Docker daemon but it failed because our Docker containers can't run systemd? Since the Docker engine isn't running, Step 5 of the playbook (pulling an image) cannot work.
Why it's fine: Look right below the error message. You'll see:
...ignoring
Our ignore_errors: yes safety net caught the crash and allowed the playbook to continue. Exactly as designed.
2. The Systemd Error
System has not been booted with systemd as init system (PID 1).
Why this happened: Same root cause. Docker containers don't have systemd running as PID 1. The agent service can't be started.
Why it's fine: Same ignore_errors: yes — Ansible logs it, prints ...ignoring, and moves on.
3. The Green Checkmark
"msg": "✅ Host db-server-1 provisioned successfully!\nDocker: installed and running\nAgent: deployed on port 8090\n"
Ansible reached the very end of the playbook. It created all directories, wrote the bash scripts, injected our agent_port: 8090 variable dynamically via Jinja2 templating, and printed the final success message.
4. The Play Recap — Your Report Card
PLAY RECAP ****************************************************
db-server-1 : ok=11 changed=4 unreachable=0 failed=0 skipped=1 rescued=0 ignored=3
Let me decode this:
| Metric | Value | What It Means |
|---|---|---|
| failed | 0 | No task permanently failed. The deployment is a success. |
| ignored | 3 | Ansible caught and bypassed 3 expected lab limitations (Docker daemon, image pull, systemd reload). |
| changed | 4 | Four tasks reported a modification, among them the agent directory, mock script, and service file. The Python bootstrap is not one of them: it sets changed_when: false, so it always reports ok. |
| skipped | 1 | The RHEL/CentOS Docker install was skipped (because our containers run Ubuntu). |
The bottom line: The logic is flawless. If you ran this exact playbook against a real bare-metal Ubuntu server, the ignored=3 would drop to 0, the red text would vanish, and it would deploy a real Debezium host end-to-end.
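If the Java side ever wants more detail than the exit code, the recap line is easy to read programmatically. The sketch below parses the counters from one recap line. The class name PlayRecap is hypothetical, and scraping console output like this is my own illustration rather than something the design document describes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlayRecap {

    // Matches counter pairs such as "ok=11" or "changed=4"
    private static final Pattern COUNTER = Pattern.compile("(\\w+)=(\\d+)");

    /** Extracts the counters (ok, changed, failed, ...) from one recap line. */
    static Map<String, Integer> parse(String recapLine) {
        Map<String, Integer> counters = new LinkedHashMap<>();
        // The hostname comes before " : "; the counters come after it
        int separator = recapLine.indexOf(" : ");
        String tail = separator >= 0 ? recapLine.substring(separator + 3) : recapLine;
        Matcher matcher = COUNTER.matcher(tail);
        while (matcher.find()) {
            counters.put(matcher.group(1), Integer.parseInt(matcher.group(2)));
        }
        return counters;
    }

    /** A run counts as successful when nothing failed and the host was reachable. */
    static boolean succeeded(Map<String, Integer> counters) {
        return counters.getOrDefault("failed", 0) == 0
                && counters.getOrDefault("unreachable", 0) == 0;
    }

    public static void main(String[] args) {
        String line = "db-server-1 : ok=11 changed=4 unreachable=0 failed=0 skipped=1 rescued=0 ignored=3";
        Map<String, Integer> counters = parse(line);
        System.out.println(counters);
        System.out.println("succeeded: " + succeeded(counters));
    }
}
```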
Verifying Everything Worked
Trust, but verify:
# Docker installed?
ssh db-server-1 "docker --version"
# Agent directory created?
ssh db-server-1 "ls -la /opt/debezium-agent/"
# Systemd service file exists?
ssh db-server-1 "cat /etc/systemd/system/debezium-agent.service"
Each of these should return exactly what our playbook configured. The directory is there, the mock script is executable, and the service file has our templated agent_port and agent_token values baked in.
The Most Important Concept: Idempotency
This is the golden rule of configuration management and the core reason why the Debezium design document specifies Ansible for the host provisioning engine.
Idempotent means: running an operation multiple times produces the exact same result without unintended side effects. If Docker is already installed, the playbook shouldn't reinstall it. If a directory already exists with the correct permissions, Ansible should leave it untouched.
Why This Matters for DDD-41
According to Section 3 of the design document, the platform architecture features a dynamic file watcher that monitors ~/.ssh/config. If a sysadmin modifies a host's IP address or changes an SSH alias, the watcher automatically re-triggers the provisioning pipeline.
Because of this, our playbook must be completely safe to re-run against an active, healthy host without disrupting running services.
Modules vs. Shell: The Structural Difference
To achieve idempotency, you must favor native Ansible modules over raw shell commands. Here's the exact contrast:
# ❌ NOT IDEMPOTENT: Runs every time, always reports "changed"
- name: Add user to docker group
  ansible.builtin.shell: usermod -aG docker deploy

# ✅ IDEMPOTENT: Checks current state first, only acts if needed
- name: Add user to docker group
  ansible.builtin.user:
    name: "deploy"
    groups: docker
    append: yes
  become: true
The shell version blindly runs usermod every time — even if the user is already in the group. It always reports changed, making it impossible to tell if your playbook actually did anything meaningful.
The module version inspects the system state first. If deploy is already in the docker group, it does nothing and reports ok. This is the difference between "automation" and "reliable automation."
The rule is simple: always prefer built-in Ansible modules over shell or command tasks. Modules are designed to be idempotent out of the box. Shell scripts are blind to existing state unless you manually wrap them with conditional checks.
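As a Java developer, it helped me to picture the "check state first, act only if needed" pattern in plain Java. The sketch below is a toy model of that idea using an in-memory set of groups. It is not how Ansible's user module is implemented, and the names IdempotentGroupAdd and ensureMember are made up for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class IdempotentGroupAdd {

    enum Status { OK, CHANGED }

    /** Adds the group only if it is missing and reports whether anything changed. */
    static Status ensureMember(Set<String> currentGroups, String group) {
        if (currentGroups.contains(group)) {
            return Status.OK;          // desired state already holds: do nothing
        }
        currentGroups.add(group);      // act only when the state differs
        return Status.CHANGED;
    }

    public static void main(String[] args) {
        Set<String> groups = new LinkedHashSet<>();
        groups.add("deploy");
        System.out.println(ensureMember(groups, "docker")); // first run modifies state
        System.out.println(ensureMember(groups, "docker")); // second run is a no-op
        System.out.println(groups);
    }
}
```

The existing membership in deploy survives both calls, which mirrors what append: yes does for real group memberships.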
What Happens on the Second Run?
To verify idempotency, I ran the exact same playbook a second time against the same containers:
PLAY RECAP ****************************************************
db-server-1 : ok=11 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=3
See that? changed=0.
Here's what happened:
- Structural tasks shifted from yellow to green. Creating the agent directory, writing the mock script, and generating the service file all reported ok instead of changed. Ansible checked the containers, verified the files matched the playbook specification exactly, and skipped rewriting them.
- Lab workarounds still triggered. The Docker daemon, image pull, and systemd reload tasks still hit their container limitations and fell back to ignore_errors: yes. That's expected and correct.
- Zero unnecessary changes. The playbook confirmed the environment, touched nothing that was already correct, and proved that our provisioning pipeline is completely safe for continuous re-execution.
That changed=0 on the second run is the ultimate proof that your playbook is well-structured. It means the pipeline can safely loop through the same host over and over without causing drift or disruption.
🧠 Think of it this way: A good playbook is like a good if statement. It only does work when the condition demands it.
What I Took Away From All This
By the end of this phase, I went from viewing Ansible as just another DevOps buzzword to understanding how to design and debug a resilient infrastructure pipeline. Wrangling with configuration errors on my M1 Mac forced me to appreciate the nuance required to build production-ready automation.
Here are the three architectural insights that clicked for me:
1. The Power of Delegating to OpenSSH
The thing that surprised me most was how cleanly Ansible handles networking. I initially assumed I'd have to pass explicit IP addresses, ports, and private key paths into the playbook or build a complex mapping layer in Java.
Instead, Ansible completely offloads connection management to the system's native OpenSSH client. Because OpenSSH automatically reads ~/.ssh/config, Ansible inherits that entire configuration for free. This made the DDD-41 host discovery architecture click: a sysadmin maintains one standard SSH config file, the platform watches it for changes, and the automation engine handles everything else.
2. Idempotency Is Verified, Not Assumed
Running this playbook six or seven times taught me what idempotency actually means in practice. On the first run, my terminal was flooded with yellow changed statuses as directories were created and packages were installed. On the second run, every structural task shifted to green ok.
Seeing changed=0 in the play recap is the ultimate proof of a well-behaved playbook. It proves that by favoring native modules over blind shell commands, the engine inspects remote state before touching a single file. This guarantee is critical for the file-watcher architecture — if a config change re-triggers provisioning, the pipeline passes through an active, healthy host without breaking anything.
3. A Clean Separation of Concerns
The boundary between the Java control plane and the Ansible automation layer is completely decoupled:
- Java's job: Manage the high-level lifecycle state machine (PENDING → PROVISIONING → READY/FAILED), handle asynchronous execution so the application never freezes during provisioning, and feed context variables into the deployment.
- Ansible's job: Manage the concrete reality of the remote host: validate package states, create directories, write service configurations.
They don't need to know anything about each other's internals. Java fires off a ProcessBuilder, passes the runtime flags (-e), and waits for an exit code. A 0 means success; anything else means failure. Clean, simple, decoupled.
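To make that boundary concrete, here is a rough sketch of what the Java half could look like. Everything named here (PlaybookRunner, HostState, the method names) is hypothetical and only mirrors the states and flags mentioned in this post; the real Debezium implementation may differ.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class PlaybookRunner {

    enum HostState { PENDING, PROVISIONING, READY, FAILED }

    /** Builds the argument list for ansible-playbook; each element is one argv entry. */
    static List<String> command(String playbook, String inventory, String extraVars) {
        return List.of("ansible-playbook", playbook, "-i", inventory, "-e", extraVars);
    }

    /** Exit code 0 means success; anything else is treated as a failure. */
    static HostState fromExitCode(int exitCode) {
        return exitCode == 0 ? HostState.READY : HostState.FAILED;
    }

    /** Runs the command from workDir, so an ansible.cfg in that directory is picked up. */
    static HostState run(List<String> command, File workDir) {
        try {
            Process process = new ProcessBuilder(command)
                    .directory(workDir)
                    .inheritIO()               // show Ansible's output on our console
                    .start();
            return fromExitCode(process.waitFor());
        } catch (IOException e) {
            return HostState.FAILED;           // e.g. ansible-playbook is not on the PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return HostState.FAILED;
        }
    }

    /** Runs provisioning on another thread so the caller is never blocked. */
    static CompletableFuture<HostState> runAsync(List<String> command, File workDir) {
        return CompletableFuture.supplyAsync(() -> run(command, workDir));
    }

    public static void main(String[] args) {
        List<String> cmd = command("host-setup.yml", "db-server-1,",
                "agent_port=8090 agent_token=test-bearer-token-abc123");
        // Only prints the command; call runAsync(cmd, workDir) to actually execute it
        System.out.println(String.join(" ", cmd));
        System.out.println(fromExitCode(0) + " / " + fromExitCode(2));
    }
}
```

Passing the arguments as a list (instead of one shell string) means no shell is involved, so the inventory and extra-vars values need no quoting.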
With SSH key authentication (Part 1) and Ansible provisioning playbooks (Part 2) fully tested and running against my local container lab, the automation foundation is rock solid.
Thank you for reading! If this helped you understand Ansible better (or saved you from the same Permission denied errors I hit), drop a comment or share your feedback. I'd love to hear how you'd approach this differently.
Happy automating! 🚀






