Jakub Korečko

Posted on May 30

Part 2: Provision and Harden a Cloud Server in One Command with OpenTofu

What you'll learn:

How the OpenTofu (Terraform-compatible) project is structured
How secrets are managed at the infrastructure layer with SOPS
What Hetzner resources get created and why
How cloud-init hardens the server before any code runs on it
How Ansible bootstraps Docker without hardcoded IPs

One Command, One Server, Fully Ready

tofu apply

After approximately five minutes, the following resources are available:

A hardened Ubuntu 24.04 server on Hetzner
SSH key-only authentication, fail2ban, kernel hardening
Docker installed with Swarm initialized
An overlay network for inter-service communication
Tailscale VPN joined (secure admin access)
Cloudflare DNS records pointing to the new server IP

Then, run this once:

cd server && ansible-playbook setup_docker.yaml

SwarmCD is deployed and monitoring the Git repository. From this point, all deployments are triggered by git push.

This article covers each file that makes this possible.

Project Structure

The OpenTofu project has two layers: a root module that orchestrates everything, and a server/ child module that handles Hetzner-specific resources.

gitops/
├── main.tf         # Wires modules, Cloudflare DNS, GitLab remote state
├── providers.tf    # SOPS, Cloudflare, Hcloud, Ansible provider versions
├── variables.tf    # Root variables (cloudflare_zone_id)
├── data.tf         # Loads vault.yaml via SOPS
├── vault.yaml      # SOPS-encrypted secrets (template: vault.yaml.example)
└── server/
    ├── main.tf     # All Hetzner resources
    ├── providers.tf
    ├── variables.tf    # Sensitive inputs from root
    ├── outputs.tf      # Exported values (IP, hostname, etc.)
    ├── hetzner.tfpl    # cloud-init template
    ├── inventory.yaml  # Dynamic Ansible inventory
    ├── ansible.cfg
    └── setup_docker.yaml  # Ansible bootstrap playbook

The root module passes secrets down to the server module. The server module passes connection details back up via outputs, which are then consumed by the dynamic Ansible inventory.

Secrets at the Infrastructure Layer

Before anything can be provisioned, OpenTofu needs API keys: Hetzner, Cloudflare, Tailscale OAuth. These are stored in vault.yaml, encrypted with SOPS using an age key that never touches version control.

vault.yaml.example shows the structure:

cloudflare_api_key: <TOKEN>
hetzner_api_key: <TOKEN>
tailscale_client_secret: <SECRET>
server_admin_password_hash: <BCRYPT_HASH>
gitlab_password: <PERSONAL_ACCESS_TOKEN>

The actual vault.yaml is encrypted. OpenTofu decrypts it at plan/apply time using the SOPS provider:

# data.tf
data "sops_file" "vault" {
  source_file = "vault.yaml"
}

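This is the first of two secret layers in this stack. The cloud API keys in this layer are used only during provisioning: they create infrastructure and pass credentials into the server, and they are never stored in Docker or exposed to running applications. The one value that crosses over is gitlab_password, which Ansible later copies into a Docker secret for SwarmCD.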

Why two secret layers? The Terraform layer (vault.yaml) holds API keys for cloud providers. The Docker layer holds runtime credentials for apps (GitLab token for SwarmCD, Grafana password for Alloy). Separating these means a compromised application cannot access cloud infrastructure credentials.

Hetzner Resources

server/main.tf creates every cloud resource the server needs.

Static IP Addresses

resource "hcloud_primary_ip" "primary_ipv4" {
  type          = "ipv4"
  name          = "primary_ipv4"
  location      = local.hcloud_server_location
  auto_delete   = false
  assignee_type = "server"
}

auto_delete = false is critical. Without it, destroying the server also destroys the IP address. When rebuilding a server, losing the IP address requires updating DNS records, waiting for propagation, and may invalidate pending Let's Encrypt certificate challenges. With auto_delete = false, the IP persists independently of the server lifecycle.

Firewall

resource "hcloud_firewall" "primary_firewall" {
  name = "primary_firewall"

  # Inbound allow rules: ICMP, plus ports 80/443 over TCP and UDP (UDP for HTTP/3)
  dynamic "rule" {
    for_each = [
      { protocol = "icmp", port = null },
      { protocol = "tcp", port = "80" }, { protocol = "tcp", port = "443" },
      { protocol = "udp", port = "80" }, { protocol = "udp", port = "443" },
    ]
    content {
      direction  = "in"
      protocol   = rule.value.protocol
      port       = rule.value.port
      source_ips = ["0.0.0.0/0", "::/0"]
    }
  }
}

Only ICMP and ports 80 and 443 are reachable from the public internet. Everything else is blocked by the Hetzner firewall, which filters traffic in Hetzner's network before packets ever reach the server, not just in software on the host.

UDP 443 is included for HTTP/3 (QUIC), which Traefik can serve. HTTP/3 always runs over TLS and does not use UDP 80, so that rule is harmless but optional.

SSH is not open in the firewall at all. Tailscale VPN handles all admin access — the SSH port is only reachable over the Tailscale network, so it is invisible to public internet scanners.
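Once a Hetzner firewall is applied to a server, inbound traffic that matches no rule is dropped, so the rules above act as an allow-list. The following is a minimal Python model of that evaluation, not Hetzner's implementation; it assumes single ports rather than port ranges and ignores outbound rules. It shows why SSH on port 22 never reaches the server from the internet:

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    protocol: str           # "tcp", "udp" or "icmp"
    port: Optional[int]     # None for ICMP, which has no ports
    source_ips: tuple       # CIDR strings allowed to connect


ANYWHERE = ("0.0.0.0/0", "::/0")

# Mirrors the inbound rules of primary_firewall
RULES = [
    Rule("icmp", None, ANYWHERE),
    Rule("tcp", 80, ANYWHERE),
    Rule("tcp", 443, ANYWHERE),
    Rule("udp", 80, ANYWHERE),
    Rule("udp", 443, ANYWHERE),
]


def inbound_allowed(protocol: str, port: Optional[int], src: str, rules=RULES) -> bool:
    """Return True if any allow rule matches; unmatched inbound traffic is dropped."""
    addr = ipaddress.ip_address(src)
    for rule in rules:
        if rule.protocol != protocol or rule.port != port:
            continue
        # An IPv4 address is simply "not in" an IPv6 network (no error), and vice versa
        if any(addr in ipaddress.ip_network(cidr) for cidr in rule.source_ips):
            return True
    return False
```

Because no rule mentions port 22, inbound_allowed("tcp", 22, ...) is False for every source address; admin traffic has to arrive over Tailscale instead.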

Persistent Volume

resource "hcloud_volume" "primary_volume" {
  name      = "primary_volume"
  size      = 10 # GB
  server_id = hcloud_server.primary_server.id
  automount = true
  format    = "ext4"
}

The 10GB Hetzner volume is mounted separately from the server's root disk. It stores stateful data (database files, application data) that must survive server rebuilds. Hetzner volumes are not deleted when a server is deleted — they persist independently of the server lifecycle and can be attached to a replacement server.

Ansible later creates Docker bind-mount volumes pointing into this volume's mount path.

Server

resource "hcloud_server" "primary_server" {
  name        = "my-server"
  image       = "ubuntu-24.04"
  server_type = "cx23"   # 2 vCPU, 4GB RAM
  location    = local.hcloud_server_location

  user_data = templatefile("${path.module}/hetzner.tfpl", {
    admin_username          = local.admin_username
    admin_password_hash     = var.admin_password_hash
    admin_ssh_keys          = values(local.admin_ssh_keys)
    tailscale_client_id     = local.tailscale_client_id
    tailscale_client_secret = var.tailscale_client_secret
    ssh_port                = local.ssh_port
  })

  ...

  lifecycle {
    ignore_changes = [user_data]
  }
}

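The user_data field is where cloud-init lives. The templatefile function renders hetzner.tfpl with values from Terraform variables, including the bcrypt-hashed admin password and the Tailscale OAuth credentials. The admin password itself never appears in plaintext, only its hash. The Tailscale secret does: the rendered user data remains retrievable on the server (cloud-init keeps a copy under /var/lib/cloud/instance/), and every value read through the sops_file data source is recorded in the Terraform state, so the state backend deserves the same protection as the age key.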

The lifecycle block with ignore_changes = [user_data] is essential for production. In the hcloud provider, user_data cannot be updated in place, so without this block every edit to the cloud-init template would make OpenTofu destroy and recreate the server, wiping the machine. Since cloud-init runs only on first boot, a changed template has no effect on a running server anyway; ignoring the drift means template edits take effect only on the next server you create. To roll one out deliberately, force a replacement with tofu apply -replace=module.server.hcloud_server.primary_server.
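The effect of ignore_changes can be illustrated with a deliberately simplified model of how a plan classifies a change. This is a hypothetical sketch, not OpenTofu's planner: plan_action and its arguments are made up for illustration, and user_data is treated as a force-new attribute, as described above for hcloud_server.

```python
from typing import Dict, Iterable, Set


def plan_action(
    current: Dict[str, str],
    desired: Dict[str, str],
    force_new: Set[str],
    ignore_changes: Iterable[str] = (),
) -> str:
    """Classify a resource change the way a simplified plan would (hypothetical model)."""
    ignored = set(ignore_changes)
    # Attributes whose configured value differs from state, minus ignored ones
    changed = {
        key for key in desired
        if current.get(key) != desired[key] and key not in ignored
    }
    if not changed:
        return "no-op"
    if changed & force_new:
        return "replace"  # destroy and recreate: a full machine wipe
    return "update"       # modified in place
```

With user_data listed in ignore_changes, an edited template yields "no-op" instead of "replace", while an unrelated in-place change such as a rename is still planned normally.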

cloud-init: OS Hardening on First Boot

cloud-init runs once on first boot, before any remote connection is established. This is where OS-level hardening happens.

The full template is at server/hetzner.tfpl. Here are the key sections:

Package Installation

packages:
  - fail2ban
  - auditd
  - unattended-upgrades
  - git
  - curl
  - ca-certificates
  - build-essential

fail2ban and auditd are installed from packages (not Docker) because they need to monitor the host OS, not a container. unattended-upgrades enables automatic security patch installation.

User Creation

users:
  - name: ${admin_username}
    passwd: ${admin_password_hash}
    lock_passwd: false
    groups: sudo, docker
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys: ${jsonencode(admin_ssh_keys)}

The admin user is created with:

A bcrypt-hashed password (generated locally, never sent in plaintext)
Both your personal SSH key and an Ansible-specific SSH key
Membership in the docker group (can run Docker without sudo)
NOPASSWD sudo (automation-friendly)

The Ansible key is a separate ed25519 key generated locally (ssh-keygen -t ed25519 -f .ansible_key). It is used only for the bootstrap playbook and can be rotated or removed after provisioning.

SSH Hardening

- path: /etc/ssh/sshd_config.d/99-hardening.conf
  content: |
    PermitRootLogin prohibit-password
    PasswordAuthentication no
    MaxAuthTries 6
    MaxSessions 3
    X11Forwarding no
    AllowAgentForwarding no
    ClientAliveInterval 300
    ClientAliveCountMax 2
    LoginGraceTime 30

PasswordAuthentication no restricts authentication to SSH keys only. Combined with fail2ban banning a source IP for 1 hour after 3 failed attempts within 10 minutes, brute-force SSH attacks are mitigated at both the authentication and network level.
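The 99- prefix in the file name deserves a second look. For most keywords, sshd keeps the first value it reads, and Ubuntu's stock /etc/ssh/sshd_config includes /etc/ssh/sshd_config.d/*.conf near the top, expanding the glob in lexical order. A drop-in named 99-hardening.conf is therefore read last among the drop-ins and loses to any earlier one (for example, a file written by cloud-init) that sets the same keyword. The Python sketch below models just those two rules; it ignores Match blocks and nested Include directives, and the file contents in any example are hypothetical. On a real server, sudo sshd -T prints the effective configuration.

```python
from typing import Dict, Iterable, Tuple


def effective_sshd_config(files: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Resolve settings from (filename, content) pairs of sshd drop-in files.

    Models two sshd_config rules: included files are read in lexical
    filename order, and the first value seen for a keyword wins.
    """
    settings: Dict[str, str] = {}
    for _name, content in sorted(files):  # lexical order by file name
        for raw in content.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # keyword without a value: ignored in this sketch
            keyword, value = parts
            # Keywords are case-insensitive; only the first value is kept
            settings.setdefault(keyword.lower(), value.strip())
    return settings
```

Giving the hardening file a low prefix such as 01- makes its values win instead.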

Kernel Hardening

- path: /etc/sysctl.d/99-hardening.conf
  content: |
    # Prevent IP spoofing
    net.ipv4.conf.all.rp_filter = 1
    net.ipv4.conf.default.rp_filter = 1
    # Ignore ICMP redirects (prevent routing attacks)
    net.ipv4.conf.all.accept_redirects = 0
    net.ipv6.conf.all.accept_redirects = 0
    # SYN flood protection
    net.ipv4.tcp_syncookies = 1
    # IP forwarding (required for Docker networking and Tailscale)
    net.ipv4.ip_forward = 1
    net.ipv6.conf.all.forwarding = 1

Reverse path filtering drops packets whose source address would not be routed back out through the interface they arrived on, a basic defense against IP spoofing. SYN cookies let the kernel keep accepting legitimate connections during a SYN flood. IP forwarding is required for both Docker's overlay networking and Tailscale's routing, since this server also advertises itself as an exit node.
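With rp_filter = 1 (strict mode), the kernel looks up the route it would use to reply to a packet's source address and drops the packet if that route leaves through a different interface than the one the packet arrived on. A minimal Python model of that check, using longest-prefix matching over a hypothetical routing table (the real kernel also applies policy routing rules, which this sketch ignores):

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (destination prefix, outgoing interface)


def route_interface(routes: List[Route], addr: str) -> Optional[str]:
    """Longest-prefix match: which interface would traffic to addr leave from?"""
    ip = ipaddress.ip_address(addr)
    best = None  # (network, interface) of the most specific match so far
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def strict_rp_accept(routes: List[Route], src: str, in_iface: str) -> bool:
    """rp_filter = 1: accept only if the route back to src uses the arrival interface."""
    return route_interface(routes, src) == in_iface
```

A packet that claims a Docker bridge source address but arrives on the public interface fails the check, which is exactly the spoofing case the setting targets.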

fail2ban Configuration

- path: /etc/fail2ban/jail.local
  content: |
    [DEFAULT]
    bantime = 3600
    findtime = 600
    maxretry = 3

    [sshd]
    enabled = true
    mode = aggressive

Three failed attempts within 10 minutes result in a 1-hour IP ban. Aggressive mode also covers additional SSH attack patterns beyond basic password failures.
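The interplay of maxretry, findtime and bantime can be sketched as a per-IP sliding window. This hypothetical Python model covers only the counting logic with the values above; real fail2ban reads log lines through filter regexes and enforces bans with firewall actions, and its exact boundary handling may differ:

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BanTracker:
    """Simplified jail: maxretry failures within findtime seconds => ban for bantime."""

    def __init__(self, maxretry: int = 3, findtime: int = 600, bantime: int = 3600):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned_until: Dict[str, float] = {}

    def is_banned(self, ip: str, now: float) -> bool:
        return self.banned_until.get(ip, 0.0) > now

    def record_failure(self, ip: str, now: float) -> bool:
        """Register a failed login at time `now` (seconds); return True if it triggers a ban."""
        window = self.failures[ip]
        window.append(now)
        # Forget failures older than findtime
        while window and now - window[0] > self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            window.clear()
            return True
        return False
```

Three failures spread over more than 10 minutes never trigger a ban in this model, because the oldest failure falls out of the window before the third one arrives.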

Docker and Swarm Initialization

runcmd:
  - sysctl --system
  - systemctl enable --now fail2ban
  - systemctl enable --now auditd
  - systemctl restart ssh
  - curl -fsSL https://get.docker.com | sh
  - docker swarm init
  - docker network create -d overlay --attachable swarm_network
  - curl -fsSL https://tailscale.com/install.sh | sh
  - tailscale up --ssh --accept-routes --advertise-exit-node
      --advertise-tags=tag:server
      --client-id=${tailscale_client_id}
      --client-secret=${tailscale_client_secret}
  - reboot

Docker is installed via the official convenience script. Swarm is initialized immediately. The swarm_network overlay network is created — all services in apps/ connect to this network, enabling inter-service communication across nodes as the cluster scales.

Tailscale authenticates via OAuth, requiring no interactive input. The --ssh flag enables Tailscale SSH, which Ansible uses for the bootstrap playbook.

sysctl --system at the start of runcmd already applies the kernel settings. The final reboot confirms they persist across boots and ensures all services start from a consistent state.

Cloudflare DNS via Terraform

Back in the root main.tf, Cloudflare DNS records are created after the server is provisioned:

resource "cloudflare_dns_record" "root" {
  zone_id = var.cloudflare_zone_id
  name    = "@"
  content = module.server.server_ipv4
  type    = "A"
  ttl     = 1 # 1 = automatic; proxied records always use automatic TTL
  proxied = true
}

resource "cloudflare_dns_record" "www" {
  zone_id = var.cloudflare_zone_id
  name    = "www"
  content = module.server.server_ipv4
  type    = "A"
  ttl     = 1 # 1 = automatic; proxied records always use automatic TTL
  proxied = true
}

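proxied = true routes traffic through Cloudflare's proxy: DNS lookups for the domain return Cloudflare's addresses rather than the server's, so the origin IP is not published in DNS. The firewall above still accepts ports 80 and 443 from any source, though, so anyone who learns the IP can reach the server directly; restricting source_ips to Cloudflare's published IP ranges would close that gap.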

Dynamic Ansible Inventory

Instead of hardcoding the server IP in an inventory file, Ansible reads it directly from Terraform state:

# server/inventory.yaml
plugin: cloud.terraform.terraform_provider
binary_path: tofu
project_path: ../

This uses the cloud.terraform.terraform_provider inventory plugin, which reads the OpenTofu state and turns the ansible_host resources declared with the Ansible Terraform provider (listed in providers.tf) into Ansible hosts and host variables. Those variables carry the server's Tailscale hostname, IP, SSH port, and SSH key path, filled in from the server module's outputs, so no manual synchronization is needed.
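A rough Python sketch of the idea behind this inventory: read the state as JSON (the output of tofu show -json), walk the root module and its child modules such as module.server, and turn each ansible_host resource into a host with its groups and variables. It assumes the name, groups and variables arguments of the Ansible Terraform provider's ansible_host resource; hosts_from_state is a hypothetical helper, not the plugin's source code.

```python
import json
from typing import Any, Dict, Iterator


def _walk_modules(module: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every resource in a module and, recursively, in its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from _walk_modules(child)


def hosts_from_state(show_json: str) -> Dict[str, Dict[str, Any]]:
    """Map `tofu show -json` output to {hostname: {"groups": [...], "vars": {...}}}."""
    state = json.loads(show_json)
    # An empty state has no "values" key at all
    root = state.get("values", {}).get("root_module", {})
    hosts: Dict[str, Dict[str, Any]] = {}
    for res in _walk_modules(root):
        if res.get("mode") != "managed" or res.get("type") != "ansible_host":
            continue
        values = res.get("values", {})
        hosts[values["name"]] = {
            "groups": values.get("groups") or [],
            "vars": values.get("variables") or {},
        }
    return hosts
```

Because the host list is derived from state on every run, rebuilding the server with tofu apply is all it takes for Ansible to target the new machine.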

When a server is rebuilt, tofu apply updates the state and the next Ansible run reads the updated connection details from Terraform outputs.

Ansible Bootstrap Playbook

server/setup_docker.yaml runs once after tofu apply completes. It is kept out of cloud-init because it needs Docker and Swarm already running, and because it decrypts secrets on your machine, which keeps them out of user_data.

Persistent Storage Directories

- name: Create directories on Hetzner volume
  ansible.builtin.file:
    path: "/mnt/{{ hostvars[inventory_hostname].volume_id }}/{{ item }}"
    state: directory
  loop:
    - crowdsec_db
    - papra_data

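Directories are created on the Hetzner volume, which Hetzner automounts at /mnt/HC_Volume_<id>. The path is built from the volume_id host variable rather than hardcoded; for it to match the mount point, that variable must hold the full HC_Volume_<id> directory name, not just the numeric ID.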

Docker Secrets

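- name: Create gitlab_password Docker secret
  community.docker.docker_secret:
    name: gitlab_password
    # Decrypt vault.yaml locally with SOPS, parse the YAML, pick one key
    data: "{{ (lookup('community.sops.sops', '../vault.yaml') | from_yaml)['gitlab_password'] }}"
    state: present
  no_log: true # keep the decrypted value out of task output and logs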

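SOPS decrypts vault.yaml locally on your machine, and Ansible hands the decrypted value to the Docker API on the server, where Swarm keeps it in its encrypted Raft store. The plaintext is never committed or written to a configuration file on the server. One caveat: without SSH pipelining, Ansible briefly stages module arguments in a temporary file on the remote host, so enabling pipelining in ansible.cfg keeps the plaintext off the server's disk entirely.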

Two secrets are created:

gitlab_password: SwarmCD uses this to authenticate with GitLab when cloning the repo
age_key: SwarmCD uses this to decrypt SOPS-encrypted secret files in the repo

SwarmCD Deployment

- name: Deploy SwarmCD
  community.docker.docker_stack:
    name: swarmcd
    compose:
      - "{{ lookup('file', '../apps/swarmcd/swarmcd.yaml') | from_yaml }}"
    state: present

SwarmCD is deployed from the Ansible playbook — it cannot manage its own initial deployment, so this bootstrap step is handled outside the GitOps loop. After this, SwarmCD takes over management of all other stacks from Git.

Remote State

The Terraform state is stored in GitLab's managed HTTP backend:

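terraform {
  backend "http" {
    address        = "https://gitlab.com/api/v4/projects/.../terraform/state/default"
    lock_address   = "..."
    unlock_address = "..."
    # GitLab's state API expects these HTTP methods for locking
    lock_method    = "POST"
    unlock_method  = "DELETE"
    # Keep credentials out of this file, e.g. in TF_HTTP_USERNAME / TF_HTTP_PASSWORD
  }
}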

This means state is shared between team members and persists across machines. GitLab provides free managed Terraform state for any project.

Summary

After tofu apply and ansible-playbook setup_docker.yaml, you have:

A Hetzner server with hardened OS (SSH keys only, fail2ban, kernel sysctl)
Docker Swarm initialized with an overlay network
Tailscale VPN joined (admin access without public SSH)
Cloudflare DNS records created
SwarmCD running and watching your Git repository
A 10GB persistent volume with Docker bind-mount volumes for stateful data

From this point forward, infrastructure changes are managed through Git. New application deployments and configuration updates are applied via git push.

Repository: gitlab.com/sakonn/docker-swarm-gitops

DEV Community