What you'll learn:
- How the OpenTofu (Terraform-compatible) project is structured
- How secrets are managed at the infrastructure layer with SOPS
- What Hetzner resources get created and why
- How cloud-init hardens the server before any code runs on it
- How Ansible bootstraps Docker without hardcoded IPs
One Command, One Server, Fully Ready
tofu apply
After approximately five minutes, the following resources are available:
- A hardened Ubuntu 24.04 server on Hetzner
- SSH key-only authentication, fail2ban, kernel hardening
- Docker installed with Swarm initialized
- An overlay network for inter-service communication
- Tailscale VPN joined (secure admin access)
- Cloudflare DNS records pointing to the new server IP
Then run the bootstrap playbook once:
cd server && ansible-playbook setup_docker.yaml
SwarmCD is deployed and monitoring the Git repository. From this point, all deployments are triggered by git push.
This article covers each file that makes this possible.
Project Structure
The OpenTofu project has two layers: a root module that orchestrates everything, and a server/ child module that handles Hetzner-specific resources.
gitops/
├── main.tf # Wires modules, Cloudflare DNS, GitLab remote state
├── providers.tf # SOPS, Cloudflare, Hcloud, Ansible provider versions
├── variables.tf # Root variables (cloudflare_zone_id)
├── data.tf # Loads vault.yaml via SOPS
├── vault.yaml # Encrypted secrets (not vault.yaml.example)
└── server/
    ├── main.tf # All Hetzner resources
    ├── providers.tf
    ├── variables.tf # Sensitive inputs from root
    ├── outputs.tf # Exported values (IP, hostname, etc.)
    ├── hetzner.tfpl # cloud-init template
    ├── inventory.yaml # Dynamic Ansible inventory
    ├── ansible.cfg
    └── setup_docker.yaml # Ansible bootstrap playbook
The root module passes secrets down to the server module. The server module passes connection details back up via outputs, which are then consumed by the dynamic Ansible inventory.
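As a rough illustration of that hand-off, the server module's inputs and outputs might look like the sketch below. The variable names admin_password_hash and tailscale_client_secret, the server_ipv4 output, and the hcloud_primary_ip.primary_ipv4 resource all appear elsewhere in this article. The description strings and the exact split across files are assumptions.

```hcl
# server/variables.tf (sketch): sensitive inputs supplied by the root module
variable "admin_password_hash" {
  description = "Pre-hashed admin password passed to cloud-init" # assumed wording
  type        = string
  sensitive   = true # keeps the value out of plan/apply output
}

variable "tailscale_client_secret" {
  description = "Tailscale OAuth client secret used by cloud-init" # assumed wording
  type        = string
  sensitive   = true
}

# server/outputs.tf (sketch): values handed back up to the root module
output "server_ipv4" {
  description = "Public IPv4 address of the server" # assumed wording
  value       = hcloud_primary_ip.primary_ipv4.ip_address
}
```

Marking the inputs as sensitive does not encrypt them in state, but it does stop them from being printed in plan and apply output.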
Secrets at the Infrastructure Layer
Before anything can be provisioned, OpenTofu needs API keys: Hetzner, Cloudflare, Tailscale OAuth. These are stored in vault.yaml, encrypted with SOPS using an age key that never touches version control.
vault.yaml.example shows the structure:
cloudflare_api_key: <TOKEN>
hetzner_api_key: <TOKEN>
tailscale_client_secret: <SECRET>
server_admin_password_hash: <BCRYPT_HASH>
gitlab_password: <PERSONAL_ACCESS_TOKEN>
The actual vault.yaml is encrypted. OpenTofu decrypts it at plan/apply time using the SOPS provider:
# data.tf
data "sops_file" "vault" {
  source_file = "vault.yaml"
}
This is the first of two secret layers in this stack. Terraform-layer secrets are used only during provisioning — they create infrastructure and pass credentials into the server. They are never stored in Docker or exposed to running applications.
Why two secret layers? The Terraform layer (vault.yaml) holds API keys for cloud providers. The Docker layer holds runtime credentials for apps (GitLab token for SwarmCD, Grafana password for Alloy). Separating these means a compromised application cannot access cloud infrastructure credentials.
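A minimal sketch of how the decrypted values could reach the server module is shown below. The keys on the right-hand side match vault.yaml.example and the argument names match the variables used in server/main.tf. The module block layout itself is an assumption, not a copy of the repository's main.tf.

```hcl
# main.tf (sketch): decrypted values flow from the SOPS data source into the child module
module "server" {
  source = "./server"

  # data.sops_file.vault.data is a map of the decrypted top-level keys in vault.yaml
  admin_password_hash     = data.sops_file.vault.data["server_admin_password_hash"]
  tailscale_client_secret = data.sops_file.vault.data["tailscale_client_secret"]
}
```

Because the values only exist in memory during plan and apply, nothing decrypted needs to be written next to vault.yaml.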
Hetzner Resources
server/main.tf creates every cloud resource the server needs.
Static IP Addresses
resource "hcloud_primary_ip" "primary_ipv4" {
  type          = "ipv4"
  name          = "primary_ipv4"
  location      = local.hcloud_server_location
  auto_delete   = false
  assignee_type = "server"
}
auto_delete = false is critical. Without it, destroying the server also destroys the IP address. When rebuilding a server, losing the IP address requires updating DNS records, waiting for propagation, and may invalidate pending Let's Encrypt certificate challenges. With auto_delete = false, the IP persists independently of the server lifecycle.
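For the persistent address to be useful, the server has to be pointed at it. One way this could be done with the hcloud provider's public_net block is sketched below. The resource name example and the ipv6_enabled setting are illustrative assumptions rather than something taken from the repository.

```hcl
# Sketch: attach the pre-allocated primary IP instead of letting Hetzner assign a new one
resource "hcloud_server" "example" { # hypothetical resource name
  name        = "example"
  image       = "ubuntu-24.04"
  server_type = "cx23"
  location    = local.hcloud_server_location

  public_net {
    ipv4_enabled = true
    ipv4         = hcloud_primary_ip.primary_ipv4.id # reuse the persistent address
    ipv6_enabled = true                              # assumption: IPv6 left on auto-assign
  }
}
```

With this arrangement, destroying and recreating the server leaves hcloud_primary_ip.primary_ipv4 untouched, so DNS records keep pointing at a valid address.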
Firewall
resource "hcloud_firewall" "primary_firewall" {
name = "primary_firewall"
rule { direction = "in"; protocol = "icmp"; source_ips = ["0.0.0.0/0", "::/0"] }
rule { direction = "in"; protocol = "tcp"; port = "80"; source_ips = ["0.0.0.0/0", "::/0"] }
rule { direction = "in"; protocol = "tcp"; port = "443"; source_ips = ["0.0.0.0/0", "::/0"] }
rule { direction = "in"; protocol = "udp"; port = "80"; source_ips = ["0.0.0.0/0", "::/0"] }
rule { direction = "in"; protocol = "udp"; port = "443"; source_ips = ["0.0.0.0/0", "::/0"] }
}
Only ICMP and ports 80 and 443 are open to the public. Everything else is blocked at the Hetzner firewall level, which means packets are dropped in the network infrastructure before they reach the server rather than by software on the host.
UDP 80/443 is included for HTTP/3 (QUIC), which Traefik can use.
SSH is not open in the firewall at all. Tailscale VPN handles all admin access — the SSH port is only reachable over the Tailscale network, so it is invisible to public internet scanners.
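The rule lines above are written in a compact one-line form. HCL does not accept several semicolon-separated arguments inside a single-line block, so a version that parses needs each rule spread over multiple lines. A sketch using a dynamic block to keep it short follows; the inbound_rules local is a hypothetical helper introduced only for this example.

```hcl
locals {
  # Hypothetical helper: one entry per inbound rule described above.
  # ICMP has no port, so its port is null and the argument is left unset.
  inbound_rules = [
    { protocol = "icmp", port = null },
    { protocol = "tcp", port = "80" },
    { protocol = "tcp", port = "443" },
    { protocol = "udp", port = "80" },  # HTTP/3 (QUIC)
    { protocol = "udp", port = "443" }, # HTTP/3 (QUIC)
  ]
}

resource "hcloud_firewall" "primary_firewall" {
  name = "primary_firewall"

  # Generates one rule block per entry in local.inbound_rules
  dynamic "rule" {
    for_each = local.inbound_rules
    content {
      direction  = "in"
      protocol   = rule.value.protocol
      port       = rule.value.port
      source_ips = ["0.0.0.0/0", "::/0"]
    }
  }
}
```

Adding or removing a public port then becomes a one-line change to the list.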
Persistent Volume
resource "hcloud_volume" "primary_volume" {
  name      = "primary_volume"
  size      = 10 # GB
  server_id = hcloud_server.primary_server.id
  automount = true
  format    = "ext4"
}
The 10GB Hetzner volume is mounted separately from the server's root disk. It stores stateful data (database files, application data) that must survive server rebuilds. Hetzner volumes are not deleted when a server is deleted — they persist independently of the server lifecycle and can be attached to a replacement server.
Ansible later creates Docker bind-mount volumes pointing into this volume's mount path.
Server
resource "hcloud_server" "primary_server" {
  name        = "my-server"
  image       = "ubuntu-24.04"
  server_type = "cx23" # 2 vCPU, 4GB RAM
  location    = local.hcloud_server_location
  user_data = templatefile("${path.module}/hetzner.tfpl", {
    admin_username          = local.admin_username
    admin_password_hash     = var.admin_password_hash
    admin_ssh_keys          = values(local.admin_ssh_keys)
    tailscale_client_id     = local.tailscale_client_id
    tailscale_client_secret = var.tailscale_client_secret
    ssh_port                = local.ssh_port
  })
  ...
  lifecycle {
    ignore_changes = [user_data]
  }
}
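The user_data field is where cloud-init lives. The templatefile function renders hetzner.tfpl with values from Terraform variables, including the bcrypt-hashed admin password and the Tailscale OAuth credentials. The values are substituted at apply time, so the secrets never appear in plaintext in the repository. cloud-init does keep a root-readable copy of the rendered user data on the server, so the Tailscale client secret should be treated as known to anyone with root access there.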
The lifecycle block with ignore_changes = [user_data] is essential for production. Without it, every change to the cloud-init template would cause Terraform to destroy and recreate the server — a full machine wipe. cloud-init runs only on first boot; subsequent changes to the template have no effect on a running server anyway. Ignoring user_data drift allows changes to the cloud-init template without triggering a server rebuild.
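The template call above relies on a handful of locals. A sketch of how they might be defined is shown below. Every literal value here (the username, the port, and both key paths) is a hypothetical placeholder, and only the local names themselves come from the resource above.

```hcl
locals {
  admin_username = "admin" # hypothetical value
  ssh_port       = 22      # hypothetical value

  # Map of label => public key. values() in the server resource turns this
  # into a list, which the template renders with jsonencode().
  admin_ssh_keys = {
    personal = trimspace(file(pathexpand("~/.ssh/id_ed25519.pub"))) # hypothetical path
    ansible  = trimspace(file("${path.module}/.ansible_key.pub"))   # assumed key location
  }
}
```

Keeping the keys in a map rather than a bare list makes it easier to see which key belongs to whom when one has to be rotated.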
cloud-init: OS Hardening on First Boot
cloud-init runs once on first boot, before any remote connection is established. This is where OS-level hardening happens.
The full template is at server/hetzner.tfpl. Here are the key sections:
Package Installation
packages:
- fail2ban
- auditd
- unattended-upgrades
- git
- curl
- ca-certificates
- build-essential
fail2ban and auditd are installed from packages (not Docker) because they need to monitor the host OS, not a container. unattended-upgrades enables automatic security patch installation.
User Creation
users:
  - name: ${admin_username}
    passwd: ${admin_password_hash}
    lock_passwd: false
    groups: sudo, docker
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys: ${jsonencode(admin_ssh_keys)}
The admin user is created with:
- A bcrypt-hashed password (generated locally, never sent in plaintext)
- Both your personal SSH key and an Ansible-specific SSH key
- Membership in the docker group (can run Docker without sudo)
- NOPASSWD sudo (automation-friendly)
The Ansible key is a separate ed25519 key generated locally (ssh-keygen -t ed25519 -f .ansible_key). It is used only for the bootstrap playbook and can be rotated or removed after provisioning.
SSH Hardening
- path: /etc/ssh/sshd_config.d/99-hardening.conf
  content: |
    PermitRootLogin prohibit-password
    PasswordAuthentication no
    MaxAuthTries 6
    MaxSessions 3
    X11Forwarding no
    AllowAgentForwarding no
    ClientAliveInterval 300
    ClientAliveCountMax 2
    LoginGraceTime 30
PasswordAuthentication no restricts authentication to SSH keys only. Combined with fail2ban, which bans a source IP for 1 hour after 3 failed attempts within 10 minutes, brute-force SSH attacks are mitigated at both the authentication and network level.
Kernel Hardening
- path: /etc/sysctl.d/99-hardening.conf
  content: |
    # Prevent IP spoofing
    net.ipv4.conf.all.rp_filter = 1
    net.ipv4.conf.default.rp_filter = 1
    # Ignore ICMP redirects (prevent routing attacks)
    net.ipv4.conf.all.accept_redirects = 0
    net.ipv6.conf.all.accept_redirects = 0
    # SYN flood protection
    net.ipv4.tcp_syncookies = 1
    # IP forwarding (required for Docker networking and Tailscale)
    net.ipv4.ip_forward = 1
    net.ipv6.conf.all.forwarding = 1
Reverse path filtering blocks packets with source IPs that couldn't have arrived on the interface they came in on — a basic IP spoofing defense. SYN cookies protect against SYN flood DoS attacks. IP forwarding is required for both Docker's overlay networking and Tailscale's routing.
fail2ban Configuration
- path: /etc/fail2ban/jail.local
  content: |
    [DEFAULT]
    bantime = 3600
    findtime = 600
    maxretry = 3
    [sshd]
    enabled = true
    mode = aggressive
Three failed attempts within 10 minutes result in a 1-hour IP ban. Aggressive mode also covers additional SSH attack patterns beyond basic password failures.
Docker and Swarm Initialization
runcmd:
- sysctl --system
- systemctl enable --now fail2ban
- systemctl enable --now auditd
- systemctl restart ssh
- curl -fsSL https://get.docker.com | sh
- docker swarm init
- docker network create -d overlay --attachable swarm_network
- curl -fsSL https://tailscale.com/install.sh | sh
- tailscale up --ssh --accept-routes --advertise-exit-node
  --advertise-tags=tag:server
  --client-id=${tailscale_client_id}
  --client-secret=${tailscale_client_secret}
- reboot
Docker is installed via the official convenience script. Swarm is initialized immediately. The swarm_network overlay network is created — all services in apps/ connect to this network, enabling inter-service communication across nodes as the cluster scales.
Tailscale authenticates via OAuth, requiring no interactive input. The --ssh flag enables Tailscale SSH, which Ansible uses for the bootstrap playbook.
The final reboot confirms that the sysctl settings, which sysctl --system has already applied, persist across restarts and that all services start from a consistent state.
Cloudflare DNS via Terraform
Back in the root main.tf, Cloudflare DNS records are created after the server is provisioned:
resource "cloudflare_dns_record" "root" {
  zone_id = var.cloudflare_zone_id
  name    = "@"
  content = module.server.server_ipv4
  type    = "A"
  ttl     = 1 # 1 = automatic TTL; recent provider versions expect ttl to be set
  proxied = true
}
resource "cloudflare_dns_record" "www" {
  zone_id = var.cloudflare_zone_id
  name    = "www"
  content = module.server.server_ipv4
  type    = "A"
  ttl     = 1 # 1 = automatic TTL
  proxied = true
}
proxied = true routes traffic through Cloudflare's CDN, concealing the origin server IP address. DNS queries for the domain return Cloudflare edge addresses rather than the Hetzner IP, so visitors connect to Cloudflare's infrastructure and the origin address stays private.
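Since the two records differ only in their name, they could also be expressed as a single resource with for_each. This is an alternative sketch, not how the repository is written. The resource label site is hypothetical, and switching an existing deployment to it would change resource addresses in state.

```hcl
# Sketch: one resource block that yields both the apex and the www record
resource "cloudflare_dns_record" "site" { # hypothetical resource label
  for_each = toset(["@", "www"])

  zone_id = var.cloudflare_zone_id
  name    = each.value
  content = module.server.server_ipv4
  type    = "A"
  ttl     = 1 # 1 = automatic TTL
  proxied = true
}
```

Adding another proxied hostname then only requires a new entry in the set.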
Dynamic Ansible Inventory
Instead of hardcoding the server IP in an inventory file, Ansible reads it directly from Terraform state:
# server/inventory.yaml
plugin: cloud.terraform.terraform_provider
binary_path: tofu
project_path: ../
This uses the cloud.terraform.terraform_provider Ansible plugin, which reads the OpenTofu state (using the binary set in binary_path) and turns the hosts declared through the Ansible provider into inventory entries with their host variables. The server's Tailscale hostname, IP, SSH port, and SSH key path all come from the OpenTofu configuration, so no manual synchronization is needed.
When a server is rebuilt, tofu apply updates the state and the next Ansible run reads the updated connection details from Terraform outputs.
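The inventory plugin builds its host list from resources declared with the Ansible provider (listed in providers.tf). A sketch of what such a declaration might look like is shown below. The group name, the key path, and the HC_Volume_ prefix on volume_id are assumptions inferred from the mount path mentioned in this article, and only the volume_id variable name is taken from the playbook.

```hcl
# Sketch: register the server as an Ansible host so the dynamic inventory can find it
resource "ansible_host" "primary_server" {
  name   = hcloud_server.primary_server.name # assumed to match the Tailscale hostname
  groups = ["swarm"]                         # hypothetical group name

  # Host variables are strings; each one becomes an Ansible hostvar
  variables = {
    ansible_user                 = local.admin_username
    ansible_port                 = tostring(local.ssh_port)
    ansible_ssh_private_key_file = abspath("${path.module}/.ansible_key") # assumed key location
    volume_id                    = "HC_Volume_${hcloud_volume.primary_volume.id}"
  }
}
```

Because these values are computed from other resources, a rebuilt server or a replaced volume updates the inventory on the next tofu apply without any edits to inventory.yaml.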
Ansible Bootstrap Playbook
server/setup_docker.yaml runs once after tofu apply completes. It cannot be part of cloud-init because it requires Docker and Swarm to already be running.
Persistent Storage Directories
- name: Create directories on Hetzner volume
  ansible.builtin.file:
    path: "/mnt/{{ hostvars[inventory_hostname].volume_id }}/{{ item }}"
    state: directory
  loop:
    - crowdsec_db
    - papra_data
Directories are created on the Hetzner volume (mounted at /mnt/HC_Volume_*). The volume path uses the volume_id from Terraform outputs, so there's no hardcoded mount path.
Docker Secrets
- name: Create gitlab_password Docker secret
  community.docker.docker_secret:
    name: gitlab_password
    data: "{{ lookup('community.sops.sops', '../vault.yaml') | from_yaml | json_query('gitlab_password') }}"
    state: present
SOPS decrypts vault.yaml locally on your machine, and Ansible pushes the decrypted value as a Docker secret. The plaintext never touches the server's filesystem — it goes directly from your machine into Docker's encrypted secret store.
Two secrets are created:
- gitlab_password: SwarmCD uses this to authenticate with GitLab when cloning the repo
- age_key: SwarmCD uses this to decrypt SOPS-encrypted secret files in the repo
SwarmCD Deployment
- name: Deploy SwarmCD
  community.docker.docker_stack:
    name: swarmcd
    compose:
      - "{{ lookup('file', '../apps/swarmcd/swarmcd.yaml') | from_yaml }}"
    state: present
SwarmCD is deployed from the Ansible playbook — it cannot manage its own initial deployment, so this bootstrap step is handled outside the GitOps loop. After this, SwarmCD takes over management of all other stacks from Git.
Remote State
The Terraform state is stored in GitLab's managed HTTP backend:
terraform {
  backend "http" {
    address        = "https://gitlab.com/api/v4/projects/.../terraform/state/default"
    lock_address   = "..."
    unlock_address = "..."
  }
}
This means state is shared between team members and persists across machines. GitLab provides free managed Terraform state for any project.
Summary
After tofu apply and ansible-playbook setup_docker.yaml, you have:
- A Hetzner server with hardened OS (SSH keys only, fail2ban, kernel sysctl)
- Docker Swarm initialized with an overlay network
- Tailscale VPN joined (admin access without public SSH)
- Cloudflare DNS records created
- SwarmCD running and watching your Git repository
- A 10GB persistent volume with Docker bind-mount volumes for stateful data
From this point forward, infrastructure changes are managed through Git. New application deployments and configuration updates are applied via git push.
Repository: gitlab.com/sakonn/docker-swarm-gitops