<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jakub Korečko</title>
    <description>The latest articles on DEV Community by Jakub Korečko (@sakonn).</description>
    <link>https://dev.to/sakonn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2968900%2F50ab1be4-6ccb-4585-bd1e-094281d6189b.JPG</url>
      <title>DEV Community: Jakub Korečko</title>
      <link>https://dev.to/sakonn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sakonn"/>
    <language>en</language>
    <item>
      <title>Part 5: Full Observability for Free — Grafana Alloy, Prometheus, and Loki</title>
      <dc:creator>Jakub Korečko</dc:creator>
      <pubDate>Sat, 30 May 2026 08:55:57 +0000</pubDate>
      <link>https://dev.to/sakonn/part-5-full-observability-for-free-grafana-alloy-prometheus-and-loki-5a2l</link>
      <guid>https://dev.to/sakonn/part-5-full-observability-for-free-grafana-alloy-prometheus-and-loki-5a2l</guid>
      <description>&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why Grafana Alloy replaces both Prometheus Agent and Promtail&lt;/li&gt;
&lt;li&gt;What metrics are scraped from each service and why&lt;/li&gt;
&lt;li&gt;How cardinality management keeps you inside Grafana Cloud's free tier&lt;/li&gt;
&lt;li&gt;How Docker log discovery works with the Alloy pipeline&lt;/li&gt;
&lt;li&gt;What to build in Grafana dashboards for this stack&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  One Container for Everything
&lt;/h2&gt;

&lt;p&gt;Before Grafana Alloy, a standard observability setup required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus&lt;/strong&gt; or &lt;strong&gt;Prometheus Agent&lt;/strong&gt; for metrics scraping and remote write&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Promtail&lt;/strong&gt; for log collection and Loki shipping&lt;/li&gt;
&lt;li&gt;Separate configs, separate containers, separate log rotation concerns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Grafana Alloy replaces both. It runs as one container with one config file (&lt;code&gt;grafana_alloy.alloy&lt;/code&gt;) and handles the metrics and log pipelines in a single process.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why not a full Prometheus stack?&lt;/strong&gt; Self-hosted Prometheus needs storage, retention config, and an alertmanager. For a single server, that's more infrastructure than the apps it monitors. Grafana Cloud's free tier gives you 14-day metric retention, 30-day log retention, and managed alerting — without running any of that yourself.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The free tier limits that matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10,000 active metric series&lt;/li&gt;
&lt;li&gt;50GB logs/month&lt;/li&gt;
&lt;li&gt;14 days metric retention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aggressive metric filtering (covered below) keeps this stack well under those limits.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Alloy Config Structure
&lt;/h2&gt;

&lt;p&gt;Alloy uses its own configuration syntax, which is similar to HCL and was called River in Grafana Agent Flow. The full config is at &lt;a href="https://gitlab.com/sakonn/docker-swarm-gitops/-/blob/main/apps/monitoring/grafana_alloy.alloy" rel="noopener noreferrer"&gt;apps/monitoring/grafana_alloy.alloy&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The pipeline follows this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus.scrape → prometheus.relabel → prometheus.remote_write
loki.source       → loki.process      → loki.write
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each component is declared with a name and wired together via &lt;code&gt;forward_to&lt;/code&gt; references. This makes the data flow explicit and easy to trace.&lt;/p&gt;




&lt;h2&gt;
  
  
  Metrics: What Gets Scraped
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Traefik
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus.scrape "traefik" {
  targets = [{ __address__ = "traefik:8899" }]
  forward_to      = [prometheus.relabel.traefik.receiver]
  scrape_interval = "30s"
}

prometheus.relabel "traefik" {
  forward_to = [prometheus.remote_write.default.receiver]

  rule {
    source_labels = ["__name__"]
    regex = "(traefik_open_connections|traefik_entrypoint_requests_total|traefik_entrypoint_request_duration_seconds_sum|traefik_entrypoint_request_duration_seconds_bucket|traefik_entrypoint_request_duration_seconds_count|traefik_service_requests_total|traefik_service_request_duration_seconds_bucket|traefik_service_request_duration_seconds_sum|traefik_service_request_duration_seconds_count|traefik_service_requests_bytes_total|traefik_service_responses_bytes_total)"
    action = "keep"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Traefik exposes metrics on port &lt;code&gt;:8899&lt;/code&gt; (the &lt;code&gt;metrics&lt;/code&gt; entrypoint defined in static config). Raw Traefik output contains more than 50 metric names. The relabeling rule keeps only the 11 named in the regex, grouped here by purpose:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;What it tells you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;traefik_open_connections&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Active connections right now&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;traefik_entrypoint_requests_total&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Total requests per entrypoint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;traefik_entrypoint_request_duration_seconds_*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Latency distribution (histogram)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;traefik_service_requests_total&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Total requests per backend service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;traefik_service_request_duration_seconds_*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-service latency histogram&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;traefik_service_requests_bytes_total&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Request bytes per service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;traefik_service_responses_bytes_total&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Response bytes per service&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is enough to build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request rate dashboards (requests/sec per service)&lt;/li&gt;
&lt;li&gt;Latency percentile panels (P50, P95, P99)&lt;/li&gt;
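&lt;li&gt;Error rate panels (compare 2xx vs 4xx/5xx using the &lt;code&gt;code&lt;/code&gt; label on &lt;code&gt;traefik_service_requests_total&lt;/code&gt;, or from Traefik access logs)&lt;/li&gt;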
&lt;li&gt;Active connection gauges&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Host Metrics (node-exporter)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus.scrape "node_exporter" {
  targets = [{ __address__ = "node-exporter:9100" }]
  forward_to      = [prometheus.relabel.node_exporter.receiver]
  scrape_interval = "30s"
}

prometheus.relabel "node_exporter" {
  forward_to = [prometheus.remote_write.default.receiver]

  rule {
    source_labels = ["__name__"]
    regex = "(node_cpu_seconds_total|node_memory_(MemTotal|MemFree|MemAvailable|Buffers|Cached|SReclaimable|SwapTotal|SwapFree)_bytes|node_filesystem_(size|avail)_bytes|node_network_(receive|transmit)_bytes_total|node_load(1|5|15)|node_time_seconds|node_boot_time_seconds|node_disk_(read_bytes_total|written_bytes_total|io_time_seconds_total))"
    action = "keep"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;node-exporter by default exposes hundreds of metrics covering every kernel subsystem. The regex keep-rule filters to the essentials:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Metrics kept&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;node_cpu_seconds_total&lt;/code&gt; (all modes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Total, free, available, buffers, cached, swap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Filesystem&lt;/td&gt;
&lt;td&gt;Size and available bytes per mount&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;Receive/transmit bytes per interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Load&lt;/td&gt;
&lt;td&gt;1m, 5m, 15m load averages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk I/O&lt;/td&gt;
&lt;td&gt;Read bytes, write bytes, I/O time per device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System&lt;/td&gt;
&lt;td&gt;Uptime (boot time), clock&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The node-exporter compose file further narrows what the collectors report by excluding mount points, filesystem types, and network devices:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# apps/monitoring/node_exporter.yaml&lt;/span&gt;
&lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|run|var/lib/docker).*'&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--collector.filesystem.fs-types-exclude=^(sysfs|procfs|autofs|cgroup|devtmpfs|devpts|tmpfs|nsfs|overlay|securityfs|tracefs)$'&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--collector.netdev.device-exclude=^(veth|docker|br-).*'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker-internal network interfaces (&lt;code&gt;veth&lt;/code&gt;, &lt;code&gt;docker0&lt;/code&gt;, bridge interfaces) are excluded. Virtual filesystems (&lt;code&gt;tmpfs&lt;/code&gt;, &lt;code&gt;overlay&lt;/code&gt;, etc.) are excluded. This prevents cardinality explosion from Docker's ephemeral per-container veth interfaces — each container creates a new veth, and without filtering you'd accumulate thousands of metric series over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Container Metrics (cAdvisor)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus.relabel "cadvisor" {
  forward_to = [prometheus.remote_write.default.receiver]

  // Drop root/aggregate metrics (no container label)
  rule {
    source_labels = ["name"]
    regex = "^$"
    action = "drop"
  }

  // Extract container name from Docker Swarm task format
  rule {
    source_labels = ["name"]
    regex = "(.+)\\.(\\d+)\\.[a-zA-Z0-9]+$"
    target_label  = "container_name"
    replacement   = "$1.$2"
  }

  // Add service name label
  rule {
    source_labels = ["container_label_com_docker_swarm_service_name"]
    target_label  = "service_name"
  }

  // Keep only essential container metrics
  rule {
    source_labels = ["__name__"]
    regex = "(container_memory_usage_bytes|container_last_seen|container_cpu_user_seconds_total|container_network_(receive|transmit)_bytes_total|container_memory_cache|container_fs_(reads|writes)_bytes_total|container_cpu_usage_seconds_total)"
    action = "keep"
  }

  // Drop unused labels
  rule {
    regex  = "(__name__|container_name|service_name|job|instance)"
    action = "labelkeep"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;cAdvisor exports metrics for every container on the system, including Docker's internal containers. The first rule drops "aggregate" metrics that have no &lt;code&gt;name&lt;/code&gt; label (cAdvisor's root-level stats). The final &lt;code&gt;labelkeep&lt;/code&gt; rule drops all the verbose Docker label metadata that cAdvisor attaches — keeping only the labels that matter for querying.&lt;/p&gt;

&lt;p&gt;The result: per-container CPU, memory, network, and disk I/O, labeled with service name and container name.&lt;/p&gt;
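
&lt;p&gt;The relabel block above needs a scrape component feeding it, which the excerpt doesn't show. A minimal sketch of what that could look like, assuming the cAdvisor service is reachable as &lt;code&gt;cadvisor&lt;/code&gt; on its default port 8080 (both are assumptions; check the full config in the repository for the real target):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Sketch only: the address "cadvisor:8080" is an assumed service name and port.
prometheus.scrape "cadvisor" {
  targets         = [{ __address__ = "cadvisor:8080" }]
  // Hand every scraped sample to the relabel rules shown above
  forward_to      = [prometheus.relabel.cadvisor.receiver]
  // Same interval as the other scrape jobs and cAdvisor's housekeeping
  scrape_interval = "30s"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;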




&lt;h2&gt;
  
  
  Logs: Docker and System
&lt;/h2&gt;

&lt;h3&gt;
  
  
  System Logs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;local.file_match "system" {
  path_targets = [{
    __path__ = "/var/log/**/*log",
    job       = "varlogs",
  }]
}

loki.process "system" {
  forward_to = [loki.write.default.receiver]

  stage.drop {
    older_than = "1h0m0s"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alloy reads all &lt;code&gt;*log&lt;/code&gt; files under &lt;code&gt;/var/log/&lt;/code&gt; (mounted from the host as read-only). The &lt;code&gt;stage.drop&lt;/code&gt; rule discards log lines older than 1 hour — this prevents Alloy from re-shipping old logs after a restart, which would generate duplicate log entries in Grafana Cloud.&lt;/p&gt;
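
&lt;p&gt;The excerpt above declares which files to match and how to process their lines, but not the component that tails the files. A minimal sketch of that link, assuming it is a &lt;code&gt;loki.source.file&lt;/code&gt; component labeled &lt;code&gt;system&lt;/code&gt; (the label is a guess that mirrors the surrounding blocks):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Sketch only: the label "system" is assumed, not taken from the repository.
loki.source.file "system" {
  // Tail every file discovered by local.file_match "system"
  targets    = local.file_match.system.targets
  // Send each line through the stage.drop filter before it is written
  forward_to = [loki.process.system.receiver]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;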

&lt;h3&gt;
  
  
  Docker Container Logs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;discovery.docker "docker_swarm" {
  host             = "unix:///var/run/docker.sock"
  refresh_interval = "5s"
}

discovery.relabel "docker_swarm" {
  targets = []

  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "(.+)\\.(\\d+)\\.[a-zA-Z0-9]+$"
    target_label  = "container_name"
  }

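  // discovery.docker has no service-level meta labels, so the Swarm
  // service and stack names are read from the container's labels
  rule {
    source_labels = ["__meta_docker_container_label_com_docker_swarm_service_name"]
    target_label  = "service_name"
  }

  rule {
    source_labels = ["__meta_docker_container_label_com_docker_stack_namespace"]
    target_label  = "stack_name"
  }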

  rule {
    source_labels = ["__meta_docker_container_id"]
    target_label  = "__path__"
    replacement   = "/var/lib/docker/containers/$1/$1-json.log"
  }
}

loki.source.docker "docker_swarm" {
  host          = "unix:///var/run/docker.sock"
  targets       = discovery.docker.docker_swarm.targets
  forward_to    = [loki.process.docker_swarm.receiver]
  relabel_rules = discovery.relabel.docker_swarm.rules
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker Swarm container names follow the pattern &lt;code&gt;stackname_servicename.replicanumber.taskid&lt;/code&gt;. The regex &lt;code&gt;(.+)\.(\d+)\.[a-zA-Z0-9]+$&lt;/code&gt; extracts a stable name (without the random task ID) so log streams from the same service don't fragment across container restarts.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;stack_name&lt;/code&gt;, &lt;code&gt;service_name&lt;/code&gt;, and &lt;code&gt;container_name&lt;/code&gt; labels are added to every log line, making Loki queries like &lt;code&gt;{stack_name="traefik"}&lt;/code&gt; or &lt;code&gt;{service_name="traefik_traefik"}&lt;/code&gt; work correctly.&lt;/p&gt;
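
&lt;p&gt;&lt;code&gt;loki.source.docker&lt;/code&gt; forwards to &lt;code&gt;loki.process.docker_swarm.receiver&lt;/code&gt;, which the excerpt doesn't include. The simplest component that would satisfy that reference is a pass-through with no stages. This is a hypothetical sketch; the real block may well add stages of its own:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Sketch only: a stage-less processor that relays container logs unchanged.
loki.process "docker_swarm" {
  forward_to = [loki.write.default.receiver]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;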




&lt;h2&gt;
  
  
  cAdvisor and node-exporter Compose Files
&lt;/h2&gt;

&lt;p&gt;Both exporters have carefully tuned compose configurations.&lt;/p&gt;

&lt;h3&gt;
  
  
  cAdvisor
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# apps/monitoring/cadvisor.yaml (key sections)&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cadvisor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/google/cadvisor:0.56.2&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--docker_only=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--housekeeping_interval=30s&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--max_housekeeping_interval=35s&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--global_housekeeping_interval=10m&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--storage_duration=10m&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;128M&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;64M&lt;/span&gt;
      &lt;span class="na"&gt;restart_policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;on-failure&lt;/span&gt;
        &lt;span class="na"&gt;delay&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
        &lt;span class="na"&gt;max_attempts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--docker_only=true&lt;/code&gt; restricts cAdvisor to Docker containers only (no system processes). &lt;code&gt;--housekeeping_interval=30s&lt;/code&gt; matches the Alloy scrape interval — collecting at a higher frequency than the scrape interval provides no benefit. Memory is capped at 128M because cAdvisor has a known tendency to grow unbounded on busy systems.&lt;/p&gt;

&lt;p&gt;The restart policy is &lt;code&gt;condition: on-failure&lt;/code&gt; with &lt;code&gt;max_attempts: 3&lt;/code&gt; rather than Swarm's default of &lt;code&gt;any&lt;/code&gt; with unlimited attempts. Repeated failures indicate a persistent issue that warrants investigation, not indefinite restarts.&lt;/p&gt;

&lt;h3&gt;
  
  
  node-exporter
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;node-exporter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prom/node-exporter:v1.10.2&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/proc:/host/proc:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/sys:/host/sys:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/:/rootfs:ro&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--path.procfs=/host/proc'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--path.sysfs=/host/sys'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--path.rootfs=/rootfs'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;node-exporter needs read-only access to &lt;code&gt;/proc&lt;/code&gt;, &lt;code&gt;/sys&lt;/code&gt;, and the filesystem root to collect host metrics. It runs in the host PID namespace to see all processes. The &lt;code&gt;:ro&lt;/code&gt; mounts ensure it can only read, never write.&lt;/p&gt;




&lt;h2&gt;
  
  
  Remote Write to Grafana Cloud
&lt;/h2&gt;

&lt;p&gt;Both the metrics and the log pipelines ship to Grafana Cloud using basic auth:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus.remote_write "default" {
  endpoint {
    url = "&amp;lt;YOUR_GRAFANA_CLOUD_PROMETHEUS_URL&amp;gt;/api/prom/push"

    basic_auth {
      username      = "&amp;lt;YOUR_GRAFANA_CLOUD_METRICS_USERNAME&amp;gt;"
      password_file = "/run/secrets/grafana_cloud_passwd"
    }
  }
}

loki.write "default" {
  endpoint {
    url = "&amp;lt;YOUR_GRAFANA_CLOUD_LOKI_URL&amp;gt;/loki/api/v1/push"

    basic_auth {
      username      = "&amp;lt;YOUR_GRAFANA_CLOUD_LOGS_USERNAME&amp;gt;"
      password_file = "/run/secrets/grafana_cloud_passwd"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;grafana_cloud_passwd&lt;/code&gt; is an API key from Grafana Cloud. It's stored encrypted in the Git repo (SOPS) and decrypted by SwarmCD at deploy time into a Docker secret. The same password is shared between the Prometheus and Loki write endpoints (Grafana Cloud uses the same credential for both).&lt;/p&gt;




&lt;h2&gt;
  
  
  Suggested Grafana Dashboard Panels
&lt;/h2&gt;

&lt;p&gt;Once data is flowing to Grafana Cloud, here are the most useful panels to build:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server Overview:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU usage (% of total, per-core breakdown)&lt;/li&gt;
&lt;li&gt;Memory usage (total / available / used)&lt;/li&gt;
&lt;li&gt;Disk usage per mount point&lt;/li&gt;
&lt;li&gt;Network traffic (bytes/sec in/out)&lt;/li&gt;
&lt;li&gt;System uptime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Traefik:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests/sec per service (rate of &lt;code&gt;traefik_service_requests_total&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Error rate (4xx + 5xx as % of total)&lt;/li&gt;
&lt;li&gt;P95 latency per service (histogram from &lt;code&gt;traefik_service_request_duration_seconds_bucket&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Active open connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Container Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory usage per service (&lt;code&gt;container_memory_usage_bytes&lt;/code&gt; by &lt;code&gt;service_name&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;CPU usage per service (rate of &lt;code&gt;container_cpu_usage_seconds_total&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Container restarts (gaps in &lt;code&gt;container_last_seen&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Logs (Grafana Explore or dashboard panels):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;{stack_name="traefik"}&lt;/code&gt; — Traefik access logs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;{service_name="swarmcd_swarmcd"}&lt;/code&gt; — SwarmCD deploy events&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;{job="varlogs"}&lt;/code&gt; — System logs (SSH, fail2ban bans, etc.)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The observability stack in this setup achieves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metrics from 3 sources&lt;/strong&gt; (Traefik, node-exporter, cAdvisor) shipped to Grafana Cloud Prometheus&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logs from 2 sources&lt;/strong&gt; (Docker containers, system) shipped to Grafana Cloud Loki&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One container&lt;/strong&gt; (Grafana Alloy) handling all of the above&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero cost&lt;/strong&gt; — Grafana Cloud free tier is sufficient with aggressive metric filtering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitOps-deployed&lt;/strong&gt; — changes to &lt;code&gt;grafana_alloy.alloy&lt;/code&gt; trigger automatic redeployment via SwarmCD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The metric filtering is the key insight: Prometheus exporters expose far more data than you need. Being selective about what you ship keeps you within free tier limits and makes dashboards faster to query.&lt;/p&gt;




&lt;p&gt;All source code: &lt;em&gt;&lt;a href="https://gitlab.com/sakonn/docker-swarm-gitops" rel="noopener noreferrer"&gt;gitlab.com/sakonn/docker-swarm-gitops&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Part 4: Protecting Public Traffic — Traefik, CrowdSec WAF, and Tailscale VPN</title>
      <dc:creator>Jakub Korečko</dc:creator>
      <pubDate>Sat, 30 May 2026 08:51:00 +0000</pubDate>
      <link>https://dev.to/sakonn/part-4-protecting-public-traffic-traefik-crowdsec-waf-and-tailscale-vpn-8ln</link>
      <guid>https://dev.to/sakonn/part-4-protecting-public-traffic-traefik-crowdsec-waf-and-tailscale-vpn-8ln</guid>
      <description>&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The complete traffic flow from user browser to container&lt;/li&gt;
&lt;li&gt;How Traefik handles TLS termination, routing, and zero-downtime updates&lt;/li&gt;
&lt;li&gt;Why services opt in to exposure via Docker labels&lt;/li&gt;
&lt;li&gt;How CrowdSec adds a WAF and IP reputation layer to every request&lt;/li&gt;
&lt;li&gt;How Tailscale VPN secures admin access without opening SSH to the internet&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Threat Model
&lt;/h2&gt;

&lt;p&gt;A public internet server is subject to continuous automated scanning and attack attempts. Within minutes of a new IP address becoming reachable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated scanners probe every common port&lt;/li&gt;
&lt;li&gt;Bots attempt SSH brute-force (hundreds of attempts per hour)&lt;/li&gt;
&lt;li&gt;Crawlers look for exposed admin interfaces (wp-admin, /actuator, .env, etc.)&lt;/li&gt;
&lt;li&gt;Malicious requests probe for SQL injection, XSS, and other OWASP vulnerabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The security architecture in this stack addresses each of these without requiring a separate security engineer. Here's the full picture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Internet
   │
   ▼
Cloudflare (DNS proxy)
   │  Hides server IP, DDoS mitigation, HTTPS at edge
   │
   ▼
Hetzner Firewall
   │  Allows: 80, 443 only. Blocks everything else at network level.
   │
   ▼
fail2ban (host)
   │  Bans IPs after 3 failed SSH attempts (1h ban)
   │
   ▼
Traefik (port 80 / 443)
   │  TLS termination, HTTP→HTTPS redirect, real IP forwarding
   │
   ▼
CrowdSec bouncer (Traefik plugin)
   │  IP reputation check + AppSec WAF rules (SQLi, XSS, etc.)
   │  Block decision: 60s default ban
   │
   ▼
Application (bento, etc.)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Admin traffic takes a different path entirely — through Tailscale VPN, bypassing the public internet stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  Traefik: The Entry Point for All Traffic
&lt;/h2&gt;

&lt;p&gt;Traefik is the reverse proxy that sits in front of all applications. It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS certificate acquisition and renewal (Let's Encrypt, automated)&lt;/li&gt;
&lt;li&gt;HTTP to HTTPS redirection&lt;/li&gt;
&lt;li&gt;Routing requests to the correct backend service&lt;/li&gt;
&lt;li&gt;Running the CrowdSec bouncer plugin&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Static Configuration
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;apps/traefik/traefik_static_conf.yaml&lt;/code&gt; defines the entrypoints, providers, and plugins that are loaded once at startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Entrypoints:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;entryPoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;web&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;:80&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;redirections&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;entryPoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;websecure&lt;/span&gt;
          &lt;span class="na"&gt;scheme&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https&lt;/span&gt;
    &lt;span class="na"&gt;forwardedHeaders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;trustedIPs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nl"&gt;&amp;amp;trustedIps&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;103.21.244.0/22&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;104.16.0.0/13&lt;/span&gt;
        &lt;span class="c1"&gt;# ... all Cloudflare IP ranges&lt;/span&gt;

  &lt;span class="na"&gt;websecure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;:443&lt;/span&gt;
    &lt;span class="na"&gt;forwardedHeaders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;trustedIPs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*trustedIps&lt;/span&gt;
    &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;respondingTimeouts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;readTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;600s&lt;/span&gt;
        &lt;span class="na"&gt;writeTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;600s&lt;/span&gt;

  &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;:8899&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Port 80 (&lt;code&gt;web&lt;/code&gt;) redirects all traffic to 443 and trusts Cloudflare's IP ranges for the &lt;code&gt;X-Forwarded-For&lt;/code&gt; header. Without this &lt;code&gt;trustedIPs&lt;/code&gt; configuration, Traefik would see Cloudflare's IP as the client IP — meaning CrowdSec would evaluate Cloudflare's infrastructure, not the actual user. By trusting Cloudflare's ranges, Traefik unwraps the &lt;code&gt;X-Forwarded-For&lt;/code&gt; header to get the real client IP.&lt;/p&gt;

&lt;p&gt;Port 443 (&lt;code&gt;websecure&lt;/code&gt;) has 600-second timeouts to support long-running operations like PDF generation in the Bento app.&lt;/p&gt;

&lt;p&gt;Port 8899 (&lt;code&gt;metrics&lt;/code&gt;) exposes Prometheus metrics for Grafana Alloy to scrape. This port is not in the Hetzner firewall allow-list and is not accessible from the public internet — Alloy scrapes it from inside the overlay network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Certificate resolvers:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;certificatesResolvers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;staging&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;acme&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;YOUR_EMAIL&amp;gt;&lt;/span&gt;
      &lt;span class="na"&gt;caServer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://acme-staging-v02.api.letsencrypt.org/directory"&lt;/span&gt;
      &lt;span class="na"&gt;httpChallenge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;entryPoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web&lt;/span&gt;

  &lt;span class="na"&gt;production&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;acme&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;YOUR_EMAIL&amp;gt;&lt;/span&gt;
      &lt;span class="na"&gt;caServer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://acme-v02.api.letsencrypt.org/directory"&lt;/span&gt;
      &lt;span class="na"&gt;httpChallenge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;entryPoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;web&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two resolvers exist: &lt;code&gt;staging&lt;/code&gt; (for testing; Let's Encrypt's staging environment has much higher rate limits, but its certificates are not trusted by browsers) and &lt;code&gt;production&lt;/code&gt; (real certificates). Services specify which resolver to use in their labels. During initial setup, use &lt;code&gt;staging&lt;/code&gt; to validate the configuration, then switch to &lt;code&gt;production&lt;/code&gt;.&lt;/p&gt;
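
&lt;p&gt;As a sketch of that switch, only the resolver label on the router changes. The service and router name &lt;code&gt;myapp&lt;/code&gt; are hypothetical and not taken from the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  myapp:  # hypothetical service name
    deploy:
      labels:
        # Step 1: validate routing and the HTTP challenge against staging
        - "traefik.http.routers.myapp.tls.certresolver=staging"
        # Step 2: once a staging certificate has been issued, replace the line above with:
        # - "traefik.http.routers.myapp.tls.certresolver=production"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;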

&lt;p&gt;&lt;strong&gt;Providers:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;providers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;swarm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;exposedByDefault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;docker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;exposedByDefault&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;directory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/traefik&lt;/span&gt;
    &lt;span class="na"&gt;watch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;exposedByDefault: false&lt;/code&gt; means Traefik ignores all containers unless they have &lt;code&gt;traefik.enable=true&lt;/code&gt; in their labels. A service added to Swarm without this label will not be exposed publicly. Every exposure is explicit and intentional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CrowdSec plugin:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;experimental&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;bouncer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;moduleName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github.com/maxlerebourg/crowdsec-bouncer-traefik-plugin"&lt;/span&gt;
      &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1.5.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The plugin is declared here in static config. Its configuration (which requests it applies to, which CrowdSec instance it talks to) is in the dynamic config.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dynamic Configuration
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;apps/traefik/traefik_dynamic_conf.yaml&lt;/code&gt; defines middlewares and routes that Traefik watches for changes without restarting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;middlewares&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;basicAuth&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;USERNAME&amp;gt;:&amp;lt;BCRYPT_HASH&amp;gt;&lt;/span&gt;

    &lt;span class="na"&gt;crowdsec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;plugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;bouncer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;crowdsecMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;live&lt;/span&gt;
          &lt;span class="na"&gt;crowdsecAppsecEnabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;crowdsecAppsecHost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crowdsec_crowdsec:7422&lt;/span&gt;
          &lt;span class="na"&gt;crowdsecAppsecFailureBlock&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;crowdsecLapiKeyFile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/run/secrets/crowdsec_api_key"&lt;/span&gt;
          &lt;span class="na"&gt;crowdsecLapiHost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crowdsec_crowdsec:8080&lt;/span&gt;
          &lt;span class="na"&gt;forwardedHeadersTrustedIPs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;10.0.0.0/8&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;172.16.0.0/12&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;192.168.0.0/16&lt;/span&gt;
          &lt;span class="na"&gt;clientTrustedIPs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;10.0.0.0/8&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;172.16.0.0/12&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;192.168.0.0/16&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;crowdsec&lt;/code&gt; middleware is defined once here and referenced by any service that wants WAF protection. The &lt;code&gt;auth&lt;/code&gt; middleware is used for any internal service (like the Traefik dashboard) that should be behind basic auth.&lt;/p&gt;
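
&lt;p&gt;To illustrate how a service references these middlewares, the labels below sketch one way to route the Traefik dashboard behind basic auth. The hostname, router name, and placeholder port are assumptions for illustration, and the repository's actual Traefik stack may differ. The sketch also assumes the dashboard is enabled under &lt;code&gt;api:&lt;/code&gt; in the static config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  traefik:
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.dashboard.rule=Host(`traefik.yourdomain.com`)"  # hypothetical hostname
        - "traefik.http.routers.dashboard.entrypoints=websecure"
        - "traefik.http.routers.dashboard.tls.certresolver=production"
        # api@internal is Traefik's built-in dashboard/API service
        - "traefik.http.routers.dashboard.service=api@internal"
        # Middlewares run in the order listed: CrowdSec first, then basic auth
        - "traefik.http.routers.dashboard.middlewares=crowdsec@file,auth@file"
        # The Swarm provider expects a port label; api@internal does not use it
        - "traefik.http.services.dashboard.loadbalancer.server.port=9999"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;@file&lt;/code&gt; suffix tells Traefik that the middleware was defined by the file provider rather than by labels.&lt;/p&gt;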

&lt;h3&gt;
  
  
  Zero-Downtime Updates
&lt;/h3&gt;

&lt;p&gt;In the Traefik compose file, the update strategy is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;update_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;order&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;start-first&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;start-first&lt;/code&gt; means Docker Swarm starts the new Traefik container before stopping the old one. During the overlap window, the new container is running and healthy before the old one receives the stop signal. This means Traefik updates happen with no dropped requests.&lt;/p&gt;

&lt;p&gt;Combined with SwarmCD's immutable config versioning (Part 3), every configuration change to Traefik is zero-downtime.&lt;/p&gt;
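
&lt;p&gt;A fuller version of the update strategy might look like the sketch below. The health check and the &lt;code&gt;failure_action&lt;/code&gt; and &lt;code&gt;monitor&lt;/code&gt; values are assumptions added for illustration, not settings taken from the repository. &lt;code&gt;traefik healthcheck --ping&lt;/code&gt; only works if the ping endpoint is enabled in the static config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  traefik:
    healthcheck:
      # Swarm treats the new task as running only after this check passes
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    deploy:
      update_config:
        order: start-first        # start the new task, then stop the old one
        failure_action: rollback  # revert to the previous spec if the update fails
        monitor: 30s              # how long to watch the new task for failure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One caveat: &lt;code&gt;start-first&lt;/code&gt; briefly runs two tasks side by side, so both must be able to bind their ports at once. Ports published in &lt;code&gt;host&lt;/code&gt; mode on a single node would conflict during the overlap.&lt;/p&gt;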




&lt;h2&gt;
  
  
  Service Exposure via Docker Labels
&lt;/h2&gt;

&lt;p&gt;Here's how a service opts into public access. From &lt;code&gt;apps/bento/bento.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;bento&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/alam00000/bentopdf-simple:v2.7.0&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;swarm_network&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.enable=true"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.routers.bento-http.rule=Host(`pdf.yourdomain.com`)"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.routers.bento-http.entrypoints=web"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.routers.bento-http.middlewares=redirect-to-https@file"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.routers.bento.rule=Host(`pdf.yourdomain.com`)"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.routers.bento.entrypoints=websecure"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.routers.bento.tls.certresolver=production"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.routers.bento.middlewares=crowdsec@file"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.services.bento.loadbalancer.server.port=8080"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Breaking this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;traefik.enable=true&lt;/code&gt; — opts in to Traefik management&lt;/li&gt;
&lt;li&gt;Two routers: one for HTTP (redirect to HTTPS), one for HTTPS&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tls.certresolver=production&lt;/code&gt; — request a production Let's Encrypt certificate for this hostname&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;middlewares=crowdsec@file&lt;/code&gt; — all requests to this service pass through the CrowdSec bouncer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.port=8080&lt;/code&gt; — Traefik forwards to this container port&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice that in Swarm mode &lt;strong&gt;labels go under &lt;code&gt;deploy:&lt;/code&gt;, not directly under the service definition&lt;/strong&gt;. This is a Docker Swarm requirement: the labels Traefik watches are service labels, which are set under &lt;code&gt;deploy.labels&lt;/code&gt;. A &lt;code&gt;labels:&lt;/code&gt; key placed directly on the service sets container labels instead.&lt;/p&gt;
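
&lt;p&gt;The two placements are easiest to compare side by side. In this sketch (the service name &lt;code&gt;myapp&lt;/code&gt; is hypothetical), only the second placement is read by Traefik's Swarm provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  myapp:  # hypothetical service name
    labels:
      # Container labels: attached to each container, not read by the Swarm provider
      - "traefik.enable=true"
    deploy:
      labels:
        # Service labels: attached to the Swarm service, this is what Traefik reads
        - "traefik.enable=true"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;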




&lt;h2&gt;
  
  
  CrowdSec: WAF and IP Reputation
&lt;/h2&gt;

&lt;p&gt;CrowdSec adds two protection layers to every request passing through Traefik:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LAPI (Local API) — IP Reputation:&lt;/strong&gt;&lt;br&gt;
CrowdSec maintains a local database of banned IP addresses. This database is populated from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CrowdSec community threat intelligence feed (millions of crowdsourced malicious IPs)&lt;/li&gt;
&lt;li&gt;Local detections (if you run CrowdSec agents on the host)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a request arrives, the bouncer plugin checks the source IP against the LAPI. If it's in the ban list, the request is blocked immediately with a 403.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AppSec — WAF Rules:&lt;/strong&gt;&lt;br&gt;
CrowdSec's AppSec component applies request inspection rules that block common attack patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL injection (e.g., &lt;code&gt;' OR 1=1 --&lt;/code&gt; in query parameters)&lt;/li&gt;
&lt;li&gt;XSS (e.g., &lt;code&gt;&amp;lt;script&amp;gt;alert(1)&amp;lt;/script&amp;gt;&lt;/code&gt; in form fields)&lt;/li&gt;
&lt;li&gt;Path traversal (e.g., &lt;code&gt;../../../etc/passwd&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Known CVE exploit patterns for common web frameworks
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;crowdsec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;plugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;bouncer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;crowdsecAppsecEnabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;crowdsecAppsecHost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crowdsec_crowdsec:7422&lt;/span&gt;
      &lt;span class="na"&gt;crowdsecAppsecFailureBlock&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# Block if AppSec is unreachable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;code&gt;crowdsecAppsecFailureBlock: true&lt;/code&gt; means that if the AppSec engine is unavailable (container restart, etc.), requests are blocked rather than allowed through. This is a fail-closed posture — prefer availability loss over security bypass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal traffic bypass:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;clientTrustedIPs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;10.0.0.0/8&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;172.16.0.0/12&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;192.168.0.0/16&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



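&lt;p&gt;RFC 1918 private address ranges, which include Docker's overlay networks, bypass CrowdSec checks. Inter-service communication inside the cluster doesn't need to be WAF-inspected because it never crosses the public internet boundary. Note that Tailscale assigns node addresses from the 100.64.0.0/10 CGNAT range, which is outside RFC 1918, so requests arriving with a Tailscale source address are not covered by the three ranges above.&lt;/p&gt;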




&lt;h2&gt;
  
  
  Tailscale: Secure Admin Access
&lt;/h2&gt;

&lt;p&gt;SSH is not exposed in the Hetzner firewall. All administrative access is routed through Tailscale VPN.&lt;/p&gt;

&lt;p&gt;During cloud-init (Part 2), the server joins your Tailscale network:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tailscale up &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ssh&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--accept-routes&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--advertise-exit-node&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--advertise-tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;tag:server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--client-id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;TAILSCALE_CLIENT_ID&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--client-secret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;TAILSCALE_CLIENT_SECRET&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--ssh&lt;/code&gt; enables Tailscale SSH, allowing SSH access to the server using Tailscale credentials. The Tailscale hostname (&lt;code&gt;my-server.your-tailnet.ts.net&lt;/code&gt;) is stable even if the server IP changes.&lt;/p&gt;

&lt;p&gt;From any device enrolled in the Tailscale network:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh admin@my-server.your-tailnet.ts.net
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This eliminates the need for public SSH key management, firewall IP exceptions, or a self-managed VPN gateway. Tailscale handles NAT traversal automatically: it establishes a direct, encrypted peer-to-peer connection where the network allows it and falls back to encrypted relay servers otherwise.&lt;/p&gt;




&lt;h2&gt;
  
  
  SSH Hardening Recap
&lt;/h2&gt;

&lt;p&gt;Even though Tailscale VPN is the primary admin path, SSH is still hardened as a defense-in-depth measure:&lt;/p&gt;

&lt;p&gt;From &lt;code&gt;server/hetzner.tfpl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ssh"&gt;&lt;code&gt;&lt;span class="k"&gt;PasswordAuthentication&lt;/span&gt; &lt;span class="no"&gt;no&lt;/span&gt;    → SSH keys only, passwords rejected
&lt;span class="k"&gt;MaxAuthTries&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt;               → Disconnect after &lt;span class="m"&gt;6&lt;/span&gt; failed attempts
&lt;span class="k"&gt;MaxSessions&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;                → Limit concurrent sessions
&lt;span class="k"&gt;X11Forwarding&lt;/span&gt; &lt;span class="no"&gt;no&lt;/span&gt;             → Disable graphical forwarding
&lt;span class="k"&gt;ClientAliveInterval&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;      → Disconnect idle sessions after &lt;span class="m"&gt;5&lt;/span&gt; min
&lt;span class="k"&gt;LoginGraceTime&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;            → Disconnect if auth not completed in &lt;span class="m"&gt;30&lt;/span&gt;s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And fail2ban:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;bantime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;3600               → 1-hour bans&lt;/span&gt;
&lt;span class="py"&gt;findtime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;600               → 10-minute window&lt;/span&gt;
&lt;span class="py"&gt;maxretry&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;3                 → 3 failures triggers ban&lt;/span&gt;
&lt;span class="py"&gt;mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;aggressive            → Also catches scan patterns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source IPs that fail authentication 3 times within a 10-minute window are banned for 1 hour. Combined with key-only authentication and SSH not being exposed to the public internet, the SSH attack surface is substantially reduced.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary: Security in Layers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it protects against&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare DNS proxy&lt;/td&gt;
&lt;td&gt;Hides server IP; DDoS mitigation at edge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hetzner firewall&lt;/td&gt;
&lt;td&gt;Blocks all non-HTTP/HTTPS traffic at network level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fail2ban&lt;/td&gt;
&lt;td&gt;SSH brute-force banning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSH key-only auth&lt;/td&gt;
&lt;td&gt;Password-based SSH attacks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tailscale VPN&lt;/td&gt;
&lt;td&gt;Admin access without exposing SSH to internet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Traefik &lt;code&gt;exposedByDefault: false&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Accidental service exposure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CrowdSec LAPI&lt;/td&gt;
&lt;td&gt;Known malicious IP blocking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CrowdSec AppSec&lt;/td&gt;
&lt;td&gt;Application-layer attack filtering (SQLi, XSS, CVEs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker secrets&lt;/td&gt;
&lt;td&gt;Credentials as files, not environment variables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SOPS encryption&lt;/td&gt;
&lt;td&gt;No plaintext secrets in Git&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each layer is independent — a failure or bypass of any one layer still leaves others intact. This is defense in depth.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Repository: &lt;a href="https://gitlab.com/sakonn/docker-swarm-gitops" rel="noopener noreferrer"&gt;gitlab.com/sakonn/docker-swarm-gitops&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Part 3: Push to Git, Get a Deployment — SwarmCD + SOPS on Docker Swarm</title>
      <dc:creator>Jakub Korečko</dc:creator>
      <pubDate>Sat, 30 May 2026 08:45:52 +0000</pubDate>
      <link>https://dev.to/sakonn/part-3-push-to-git-get-a-deployment-swarmcd-sops-on-docker-swarm-501p</link>
      <guid>https://dev.to/sakonn/part-3-push-to-git-get-a-deployment-swarmcd-sops-on-docker-swarm-501p</guid>
      <description>&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What SwarmCD is and how it compares to ArgoCD&lt;/li&gt;
&lt;li&gt;How the SwarmCD configuration file works&lt;/li&gt;
&lt;li&gt;Why Docker Swarm configs and secrets are immutable — and what that means for updates&lt;/li&gt;
&lt;li&gt;How SOPS + age encryption keeps secrets in Git without exposing them&lt;/li&gt;
&lt;li&gt;The complete GitOps loop from &lt;code&gt;git push&lt;/code&gt; to running container&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Loop
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push
   ↓
SwarmCD detects change (≤45 seconds)
   ↓
Clone repo → parse config → decrypt secrets (SOPS + age)
   ↓
docker stack deploy
   ↓
Traefik picks up new service (if labels changed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No CI/CD pipeline. No webhooks. No manual SSH. Deployment requires only a Git push.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is SwarmCD?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/m-adawi/swarm-cd" rel="noopener noreferrer"&gt;SwarmCD&lt;/a&gt; is a lightweight GitOps controller for Docker Swarm. It does one thing: watch a Git repository and deploy Docker Compose stacks when files change.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why not ArgoCD?&lt;/strong&gt; ArgoCD is excellent for Kubernetes. It has no Swarm support. Its Kubernetes dependency alone would require more RAM than the entire rest of this stack combined.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not Portainer?&lt;/strong&gt; Portainer has a GitOps feature, but it requires a paid Business license for the functionality we need, and it runs as a much heavier process.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;SwarmCD has no web UI complexity, no operator pattern, no CRDs. It reads a config YAML, polls Git, and runs &lt;code&gt;docker stack deploy&lt;/code&gt;. This simplicity is intentional at this scale.&lt;/p&gt;

&lt;p&gt;The SwarmCD service is deployed by Ansible during bootstrap (see Part 2) as the only service that isn't self-managed via GitOps. Everything else is managed by SwarmCD from Git.&lt;/p&gt;




&lt;h2&gt;
  
  
  SwarmCD Configuration
&lt;/h2&gt;

&lt;p&gt;The full configuration lives at &lt;a href="https://gitlab.com/sakonn/docker-swarm-gitops/-/blob/main/apps/swarmcd/config.yaml" rel="noopener noreferrer"&gt;apps/swarmcd/config.yaml&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;update_interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;45&lt;/span&gt;

&lt;span class="na"&gt;repos_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;repos/&lt;/span&gt;

&lt;span class="na"&gt;sops_secrets_discovery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;auto_rotate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;repos&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;my-infra&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://gitlab.com/&amp;lt;YOUR_GITLAB_USERNAME&amp;gt;/&amp;lt;YOUR_REPO_NAME&amp;gt;.git"&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;does_not_matter&lt;/span&gt;
    &lt;span class="na"&gt;password_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/secrets/gitlab_token&lt;/span&gt;

&lt;span class="na"&gt;stacks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;traefik&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-infra&lt;/span&gt;
    &lt;span class="na"&gt;branch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;compose_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/apps/traefik/traefik.yaml&lt;/span&gt;
    &lt;span class="na"&gt;sops_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/apps/traefik/crowdsec_api_key&lt;/span&gt;

  &lt;span class="na"&gt;bento&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-infra&lt;/span&gt;
    &lt;span class="na"&gt;branch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;compose_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/apps/bento/bento.yaml&lt;/span&gt;

  &lt;span class="na"&gt;alloy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-infra&lt;/span&gt;
    &lt;span class="na"&gt;branch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;compose_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/apps/monitoring/alloy.yaml&lt;/span&gt;
    &lt;span class="na"&gt;sops_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/apps/monitoring/grafana_cloud_passwd&lt;/span&gt;

  &lt;span class="na"&gt;cadvisor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-infra&lt;/span&gt;
    &lt;span class="na"&gt;branch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;compose_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/apps/monitoring/cadvisor.yaml&lt;/span&gt;

  &lt;span class="na"&gt;node_exporter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-infra&lt;/span&gt;
    &lt;span class="na"&gt;branch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
    &lt;span class="na"&gt;compose_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/apps/monitoring/node_exporter.yaml&lt;/span&gt;

&lt;span class="na"&gt;address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0:8080&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key settings explained:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;update_interval: 45&lt;/code&gt;&lt;/strong&gt; — SwarmCD polls Git every 45 seconds. This is a tradeoff: shorter intervals mean faster deployments but more Git API requests. 45 seconds is sufficient for most infrastructure change workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;sops_secrets_discovery: true&lt;/code&gt;&lt;/strong&gt; — SwarmCD automatically finds and decrypts SOPS-encrypted files listed in &lt;code&gt;sops_files&lt;/code&gt;. The decrypted content is passed to Docker as secrets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;auto_rotate: true&lt;/code&gt;&lt;/strong&gt; — This is the most significant setting. See the next section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;password_file: /secrets/gitlab_token&lt;/code&gt;&lt;/strong&gt; — The GitLab personal access token is mounted as a Docker secret (created by Ansible during bootstrap). SwarmCD reads it from the filesystem, never from environment variables.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Immutable Config Problem (and the Solution)
&lt;/h2&gt;

&lt;p&gt;Docker Swarm has a strict rule: &lt;strong&gt;configs and secrets are immutable once created&lt;/strong&gt;. You cannot update a config in place. If you change a Traefik config file, you must:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new config object with a different name&lt;/li&gt;
&lt;li&gt;Update the service to reference the new config&lt;/li&gt;
&lt;li&gt;(Optionally) delete the old config&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is different from Kubernetes ConfigMaps, which can be updated in place. Docker's immutability is a feature — it means config history is preserved and rollback is possible — but it requires tooling support.&lt;/p&gt;

&lt;p&gt;SwarmCD's &lt;code&gt;auto_rotate: true&lt;/code&gt; handles this automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You push a change to &lt;code&gt;apps/traefik/traefik_static_conf.yaml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;SwarmCD detects the change&lt;/li&gt;
&lt;li&gt;It creates a new Docker config: &lt;code&gt;traefik_static_conf_&amp;lt;hash&amp;gt;&lt;/code&gt; (where hash is derived from the content)&lt;/li&gt;
&lt;li&gt;It updates the Traefik service to mount the new config instead of the old one&lt;/li&gt;
&lt;li&gt;Docker Swarm rolls out the update (Traefik restarts with the new config)&lt;/li&gt;
&lt;li&gt;The old config remains until you clean it up (SwarmCD can do this automatically)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without &lt;code&gt;auto_rotate&lt;/code&gt;, you'd need to manually rename config objects on every change. With it, the whole process is transparent.&lt;/p&gt;
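
&lt;p&gt;For comparison, manual rotation without &lt;code&gt;auto_rotate&lt;/code&gt; would look roughly like this in a compose file. The &lt;code&gt;_v2&lt;/code&gt; suffix and the mount target are hypothetical; the point is that the object name has to change on every edit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;services:
  traefik:
    configs:
      - source: traefik_static_conf
        target: /etc/traefik/traefik.yaml  # hypothetical mount path

configs:
  traefik_static_conf:
    # Bump this name by hand whenever the file changes,
    # because Swarm refuses to modify an existing config object
    name: traefik_static_conf_v2
    file: ./traefik_static_conf.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Deriving the suffix from a content hash, as SwarmCD does, removes the manual step and gives identical content an identical name.&lt;/p&gt;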




&lt;h2&gt;
  
  
  SOPS + age: Secrets in Git
&lt;/h2&gt;

&lt;p&gt;Storing secrets in Git is a security risk without encryption. SOPS (Secrets OPerationS) addresses this by encrypting secret files before they are committed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;age is an asymmetric encryption tool. You generate a keypair:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;age-keygen &lt;span class="nt"&gt;-o&lt;/span&gt; ~/.age/key.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The public key goes into a &lt;code&gt;.sops.yaml&lt;/code&gt; file in your repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;creation_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.*&lt;/span&gt;
    &lt;span class="na"&gt;age&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;age1youragepublickey...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To encrypt a secret:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"my-secret-value"&lt;/span&gt; | sops &lt;span class="nt"&gt;--encrypt&lt;/span&gt; &lt;span class="nt"&gt;--input-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;binary &lt;span class="nt"&gt;--output-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;binary /dev/stdin &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; apps/traefik/crowdsec_api_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The encrypted file is committed to Git. The private age key (&lt;code&gt;~/.age/key.txt&lt;/code&gt;) is never committed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How SwarmCD decrypts
&lt;/h3&gt;

&lt;p&gt;During the Ansible bootstrap (Part 2), the age private key is stored as a Docker secret named &lt;code&gt;age_key&lt;/code&gt;. SwarmCD mounts this secret and sets the &lt;code&gt;SOPS_AGE_KEY_FILE&lt;/code&gt; environment variable to point to it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# apps/swarmcd/swarmcd.yaml (relevant excerpt)&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;swarmcd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;SOPS_AGE_KEY_FILE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/run/secrets/age_key&lt;/span&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;age_key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When SwarmCD processes a stack that has &lt;code&gt;sops_files&lt;/code&gt;, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Finds each listed file in the cloned repo&lt;/li&gt;
&lt;li&gt;Decrypts it using the age key&lt;/li&gt;
&lt;li&gt;Creates a Docker secret with the decrypted content&lt;/li&gt;
&lt;li&gt;References that secret in the service deployment&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The decrypted value exists only in Docker's encrypted secret store and, at runtime, on an in-memory filesystem mounted inside the container. It is never written to the server's disk in plaintext.&lt;/p&gt;
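&lt;p&gt;The decrypt-and-create steps can be approximated with the two CLIs involved. This is a sketch under assumptions, not SwarmCD's actual code: the function name &lt;code&gt;deploy_sops_secret&lt;/code&gt; and the injectable &lt;code&gt;run&lt;/code&gt; parameter are hypothetical, and it presumes &lt;code&gt;sops&lt;/code&gt; and &lt;code&gt;docker&lt;/code&gt; are on the PATH with &lt;code&gt;SOPS_AGE_KEY_FILE&lt;/code&gt; set in the environment.&lt;/p&gt;

```python
import subprocess


def deploy_sops_secret(encrypted_path, secret_name, run=subprocess.run):
    """Decrypt one SOPS file and store the plaintext as a Docker secret.

    The plaintext travels between the two commands through pipes only,
    so this function never writes it to a file on the host.
    """
    decrypted = run(
        ["sops", "--decrypt", "--input-type=binary",
         "--output-type=binary", encrypted_path],
        check=True, capture_output=True,
    )
    # "docker secret create NAME -" reads the secret value from stdin.
    run(
        ["docker", "secret", "create", secret_name, "-"],
        input=decrypted.stdout, check=True,
    )
    return secret_name
```

&lt;p&gt;A call such as &lt;code&gt;deploy_sops_secret("apps/traefik/crowdsec_api_key", "crowdsec_api_key_ghi789")&lt;/code&gt; would mirror what happens for the Traefik stack. The &lt;code&gt;run&lt;/code&gt; parameter exists only so the flow can be exercised without either tool installed.&lt;/p&gt;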

&lt;h3&gt;
  
  
  Two-layer secret model
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Where stored&lt;/th&gt;
&lt;th&gt;Used by&lt;/th&gt;
&lt;th&gt;Contents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SOPS/vault.yaml&lt;/td&gt;
&lt;td&gt;Git (encrypted)&lt;/td&gt;
&lt;td&gt;Terraform at provisioning time&lt;/td&gt;
&lt;td&gt;Cloud API keys, admin password hash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker secrets&lt;/td&gt;
&lt;td&gt;Docker Swarm secret store&lt;/td&gt;
&lt;td&gt;Running containers&lt;/td&gt;
&lt;td&gt;App credentials (GitLab token, Grafana password, CrowdSec API key)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Terraform-layer secrets are only present during &lt;code&gt;tofu apply&lt;/code&gt;. They are never present in a running container. Docker-layer secrets are created once (by Ansible or by SwarmCD) and mounted into containers as files.&lt;/p&gt;




&lt;h2&gt;
  
  
  Adding a New App
&lt;/h2&gt;

&lt;p&gt;To add a new service to the GitOps setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Write a Docker Compose file&lt;/strong&gt; and add it to &lt;code&gt;apps/yourapp/yourapp.yaml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a stack entry&lt;/strong&gt; to &lt;code&gt;apps/swarmcd/config.yaml&lt;/code&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;stacks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;yourapp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-infra&lt;/span&gt;
       &lt;span class="na"&gt;branch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main&lt;/span&gt;
       &lt;span class="na"&gt;compose_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/apps/yourapp/yourapp.yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Run the Ansible playbook&lt;/strong&gt; to deploy the updated SwarmCD configuration to the server:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;cd &lt;/span&gt;server &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ansible-playbook setup_docker.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SwarmCD's &lt;code&gt;config.yaml&lt;/code&gt; is mounted into the container as a Docker config object. Modifying it in Git is not sufficient — Ansible must redeploy SwarmCD with the updated config so the new stack entry takes effect.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Add Traefik labels&lt;/strong&gt; to expose the service (if needed):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="c1"&gt;# In yourapp.yaml&lt;/span&gt;
   &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;yourapp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# In Swarm mode, Traefik reads service labels under deploy&lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.enable=true"&lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.routers.yourapp.rule=Host(`yourapp.yourdomain.com`)"&lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.routers.yourapp.tls.certresolver=production"&lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traefik.http.services.yourapp.loadbalancer.server.port=8080"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;
&lt;strong&gt;Add a Cloudflare DNS record&lt;/strong&gt; in root &lt;code&gt;main.tf&lt;/code&gt; (and run &lt;code&gt;tofu apply&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Push to Git&lt;/strong&gt; — SwarmCD deploys it within 45 seconds&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the complete workflow. No manual Docker commands, no state to maintain outside of Git.&lt;/p&gt;
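&lt;p&gt;Since the Traefik labels follow the same pattern for every app, they can be generated rather than retyped. A minimal sketch, where the helper name &lt;code&gt;traefik_labels&lt;/code&gt; and its parameters are hypothetical and the label strings are the ones shown in the steps above:&lt;/p&gt;

```python
def traefik_labels(name, host, port, cert_resolver="production"):
    """Build the four Traefik labels used in the article for one service."""
    return [
        "traefik.enable=true",
        f"traefik.http.routers.{name}.rule=Host(`{host}`)",
        f"traefik.http.routers.{name}.tls.certresolver={cert_resolver}",
        f"traefik.http.services.{name}.loadbalancer.server.port={port}",
    ]


for label in traefik_labels("yourapp", "yourapp.yourdomain.com", 8080):
    print(label)
```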




&lt;h2&gt;
  
  
  What SwarmCD Does NOT Do
&lt;/h2&gt;

&lt;p&gt;SwarmCD has limitations worth addressing explicitly, since GitOps tooling often implies more sophisticated features:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No health-check-based rollbacks.&lt;/strong&gt; If a new deployment causes container crashes, SwarmCD will not automatically roll back. Docker Swarm's restart policy (&lt;code&gt;on-failure&lt;/code&gt;) restarts failed containers, and the &lt;code&gt;start-first&lt;/code&gt; update order (configured on the Traefik service) minimizes downtime, but nothing automatically rolls the stack back to a previous Git commit.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mitigation:&lt;/em&gt; If a deployment breaks, &lt;code&gt;git revert&lt;/code&gt; and push. SwarmCD will redeploy the reverted config within 45 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No diff preview.&lt;/strong&gt; SwarmCD doesn't show you what will change before deploying.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mitigation:&lt;/em&gt; Git PR review serves this function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No multi-cluster support.&lt;/strong&gt; SwarmCD manages one Swarm from one config file. This is fine for a single-server setup.&lt;/p&gt;

&lt;p&gt;For a small production setup, none of these limitations prevent adoption. The simplicity SwarmCD offers in exchange for these features is an appropriate tradeoff at this scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Complete Deployment Flow
&lt;/h2&gt;

&lt;p&gt;Here's every step that happens when you push a change to &lt;code&gt;apps/traefik/traefik_static_conf.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. git push origin main
   └── GitLab receives the commit

2. SwarmCD polls (within 45 seconds)
   └── Detects diff in apps/traefik/traefik_static_conf.yaml

3. Clone / pull latest main branch

4. Parse apps/swarmcd/config.yaml
   └── traefik stack → compose_file: /apps/traefik/traefik.yaml

5. Decrypt SOPS secrets
   └── apps/traefik/crowdsec_api_key → decrypted value

6. Create versioned Docker objects
   ├── Config: traefik_static_conf_abc123 (new content hash)
   ├── Config: traefik_dynamic_conf_def456 (unchanged, reuse or recreate)
   └── Secret: crowdsec_api_key_ghi789

7. docker stack deploy traefik
   └── Service update: new config references

8. Docker Swarm rolls out update
   └── Traefik restarts with new static config

9. Traefik re-reads config
   ├── New entrypoint settings take effect
   └── New ACME resolver (if changed) initializes

10. Monitoring captures the event
    └── Alloy ships container restart log to Grafana Cloud Loki
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total time from &lt;code&gt;git push&lt;/code&gt; to running: approximately 45-90 seconds.&lt;/p&gt;
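&lt;p&gt;The polling behaviour behind that estimate (a wait of up to one 45-second interval, then the rollout itself) can be modelled as a single comparison per cycle. The sketch below is an assumption about the general pattern, not SwarmCD's source. &lt;code&gt;fetch_head&lt;/code&gt; and &lt;code&gt;deploy&lt;/code&gt; are hypothetical callables standing in for reading the branch head and running &lt;code&gt;docker stack deploy&lt;/code&gt;.&lt;/p&gt;

```python
def poll_once(fetch_head, last_deployed, deploy):
    """Run one polling cycle: deploy only when the branch head has moved.

    fetch_head and deploy are hypothetical callables standing in for
    "read the latest commit hash" and "docker stack deploy".
    """
    head = fetch_head()
    if head != last_deployed:
        deploy(head)
    return head


deployed = []
state = None
for commit in ["abc123", "abc123", "def456"]:  # three simulated polls
    state = poll_once(lambda: commit, state, deployed.append)

print(deployed)  # ['abc123', 'def456']
```

&lt;p&gt;The second poll sees an unchanged head and does nothing, which is why an idle repository causes no redeployments.&lt;/p&gt;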




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The GitOps loop in this stack is simple by design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SwarmCD&lt;/strong&gt; polls Git every 45 seconds and runs &lt;code&gt;docker stack deploy&lt;/code&gt; on changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;auto_rotate&lt;/code&gt;&lt;/strong&gt; handles Docker Swarm's immutable config requirement transparently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SOPS + age&lt;/strong&gt; lets secrets live in Git safely — encrypted at commit time, decrypted at deploy time by SwarmCD using a Docker-managed age key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No pipeline required&lt;/strong&gt; — the deployment trigger is a Git push, not a CI job&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following part covers how public traffic is routed and secured.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Repository: &lt;a href="https://gitlab.com/sakonn/docker-swarm-gitops" rel="noopener noreferrer"&gt;https://gitlab.com/sakonn/docker-swarm-gitops&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Part 2: Provision and Harden a Cloud Server in One Command with OpenTofu</title>
      <dc:creator>Jakub Korečko</dc:creator>
      <pubDate>Sat, 30 May 2026 08:44:27 +0000</pubDate>
      <link>https://dev.to/sakonn/part-2-provision-and-harden-a-cloud-server-in-one-command-with-opentofu-8om</link>
      <guid>https://dev.to/sakonn/part-2-provision-and-harden-a-cloud-server-in-one-command-with-opentofu-8om</guid>
      <description>&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How the OpenTofu (Terraform-compatible) project is structured&lt;/li&gt;
&lt;li&gt;How secrets are managed at the infrastructure layer with SOPS&lt;/li&gt;
&lt;li&gt;What Hetzner resources get created and why&lt;/li&gt;
&lt;li&gt;How cloud-init hardens the server before any code runs on it&lt;/li&gt;
&lt;li&gt;How Ansible bootstraps Docker without hardcoded IPs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  One Command, One Server, Fully Ready
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tofu apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After approximately five minutes, the following resources are available:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A hardened Ubuntu 24.04 server on Hetzner&lt;/li&gt;
&lt;li&gt;SSH key-only authentication, fail2ban, kernel hardening&lt;/li&gt;
&lt;li&gt;Docker installed with Swarm initialized&lt;/li&gt;
&lt;li&gt;An overlay network for inter-service communication&lt;/li&gt;
&lt;li&gt;Tailscale VPN joined (secure admin access)&lt;/li&gt;
&lt;li&gt;Cloudflare DNS records pointing to the new server IP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then run the Ansible bootstrap playbook once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;server &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ansible-playbook setup_docker.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SwarmCD is deployed and monitoring the Git repository. From this point, all deployments are triggered by &lt;code&gt;git push&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This article covers each file that makes this possible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;

&lt;p&gt;The OpenTofu project has two layers: a root module that orchestrates everything, and a &lt;code&gt;server/&lt;/code&gt; child module that handles Hetzner-specific resources.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gitops/
├── main.tf         # Wires modules, Cloudflare DNS, GitLab remote state
├── providers.tf    # SOPS, Cloudflare, Hcloud, Ansible provider versions
├── variables.tf    # Root variables (cloudflare_zone_id)
├── data.tf         # Loads vault.yaml via SOPS
├── vault.yaml      # SOPS-encrypted secrets (structure shown in vault.yaml.example)
└── server/
    ├── main.tf     # All Hetzner resources
    ├── providers.tf
    ├── variables.tf    # Sensitive inputs from root
    ├── outputs.tf      # Exported values (IP, hostname, etc.)
    ├── hetzner.tfpl    # cloud-init template
    ├── inventory.yaml  # Dynamic Ansible inventory
    ├── ansible.cfg
    └── setup_docker.yaml  # Ansible bootstrap playbook
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The root module passes secrets down to the server module. The server module passes connection details back up via outputs, which are then consumed by the dynamic Ansible inventory.&lt;/p&gt;




&lt;h2&gt;
  
  
  Secrets at the Infrastructure Layer
&lt;/h2&gt;

&lt;p&gt;Before anything can be provisioned, OpenTofu needs API keys: Hetzner, Cloudflare, Tailscale OAuth. These are stored in &lt;code&gt;vault.yaml&lt;/code&gt;, encrypted with SOPS using an age key that never touches version control.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;vault.yaml.example&lt;/code&gt; shows the structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;cloudflare_api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;TOKEN&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;hetzner_api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;TOKEN&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;tailscale_client_secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;SECRET&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;server_admin_password_hash&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;BCRYPT_HASH&amp;gt;&lt;/span&gt;
&lt;span class="na"&gt;gitlab_password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;PERSONAL_ACCESS_TOKEN&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual &lt;code&gt;vault.yaml&lt;/code&gt; is encrypted. OpenTofu decrypts it at plan/apply time using the SOPS provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# data.tf&lt;/span&gt;
&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"sops_file"&lt;/span&gt; &lt;span class="s2"&gt;"vault"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source_file&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"vault.yaml"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the first of two secret layers in this stack. Terraform-layer secrets are used only during provisioning — they create infrastructure and pass credentials into the server. They are never stored in Docker or exposed to running applications.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why two secret layers?&lt;/strong&gt; The Terraform layer (vault.yaml) holds API keys for cloud providers. The Docker layer holds runtime credentials for apps (GitLab token for SwarmCD, Grafana password for Alloy). Separating these means a compromised application cannot access cloud infrastructure credentials.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Hetzner Resources
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;server/main.tf&lt;/code&gt; creates every cloud resource the server needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Static IP Addresses
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"hcloud_primary_ip"&lt;/span&gt; &lt;span class="s2"&gt;"primary_ipv4"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ipv4"&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"primary_ipv4"&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hcloud_server_location&lt;/span&gt;
  &lt;span class="nx"&gt;auto_delete&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="nx"&gt;assignee_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"server"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;auto_delete = false&lt;/code&gt; is critical. With &lt;code&gt;auto_delete = true&lt;/code&gt;, deleting the server also deletes the IP address. When rebuilding a server, losing the IP address means updating DNS records, waiting for propagation, and possibly failing pending Let's Encrypt certificate challenges. With &lt;code&gt;auto_delete = false&lt;/code&gt;, the IP persists independently of the server lifecycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Firewall
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"hcloud_firewall"&lt;/span&gt; &lt;span class="s2"&gt;"primary_firewall"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"primary_firewall"&lt;/span&gt;

  &lt;span class="nx"&gt;rule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;direction&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"in"&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"icmp"&lt;/span&gt;
    &lt;span class="nx"&gt;source_ips&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;# One rule per protocol/port pair: tcp and udp on 80 and 443&lt;/span&gt;
  &lt;span class="nx"&gt;dynamic&lt;/span&gt; &lt;span class="s2"&gt;"rule"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;for_each&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;setproduct&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;"tcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"udp"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"80"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"443"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;direction&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"in"&lt;/span&gt;
      &lt;span class="nx"&gt;protocol&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;port&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;source_ips&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"::/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only ICMP and ports 80 and 443 (TCP and UDP) are open to the public. Everything else is blocked by the Hetzner firewall, which filters in the network infrastructure before packets reach the server rather than in software on the host.&lt;/p&gt;

&lt;p&gt;UDP 80/443 is included for HTTP/3 (QUIC), which Traefik can use.&lt;/p&gt;

&lt;p&gt;SSH is not open in the firewall at all. Tailscale VPN handles all admin access — the SSH port is only reachable over the Tailscale network, so it is invisible to public internet scanners.&lt;/p&gt;
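&lt;p&gt;The effect of the rule set can be expressed as a simple allow-list lookup. This is a toy model for reasoning about the rules, not how Hetzner evaluates them. The names &lt;code&gt;ALLOWED_INBOUND&lt;/code&gt; and &lt;code&gt;is_allowed&lt;/code&gt; are hypothetical, and source IP matching is left out because every rule accepts any source.&lt;/p&gt;

```python
# Inbound rules from the firewall resource, as (protocol, port) pairs.
# ICMP has no port, so None is used as a placeholder.
ALLOWED_INBOUND = {
    ("icmp", None),
    ("tcp", 80), ("tcp", 443),
    ("udp", 80), ("udp", 443),
}


def is_allowed(protocol, port=None):
    """Return True if an inbound packet matches one of the firewall rules."""
    return (protocol, port) in ALLOWED_INBOUND


print(is_allowed("tcp", 443))  # True
print(is_allowed("tcp", 22))   # False: SSH is reachable only over Tailscale
```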

&lt;h3&gt;
  
  
  Persistent Volume
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"hcloud_volume"&lt;/span&gt; &lt;span class="s2"&gt;"primary_volume"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"primary_volume"&lt;/span&gt;
  &lt;span class="nx"&gt;size&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="c1"&gt;# GB&lt;/span&gt;
  &lt;span class="nx"&gt;server_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;hcloud_server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;primary_server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;automount&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;format&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ext4"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 10 GB Hetzner volume is mounted separately from the server's root disk. It stores stateful data (database files, application data) that must survive server rebuilds. Hetzner volumes are not deleted when a server is deleted: they persist independently of the server lifecycle and can be attached to a replacement server.&lt;/p&gt;

&lt;p&gt;Ansible later creates Docker bind-mount volumes pointing into this volume's mount path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"hcloud_server"&lt;/span&gt; &lt;span class="s2"&gt;"primary_server"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-server"&lt;/span&gt;
  &lt;span class="nx"&gt;image&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ubuntu-24.04"&lt;/span&gt;
  &lt;span class="nx"&gt;server_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cx23"&lt;/span&gt;   &lt;span class="c1"&gt;# 2 vCPU, 4GB RAM&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hcloud_server_location&lt;/span&gt;

  &lt;span class="nx"&gt;user_data&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;templatefile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"${path.module}/hetzner.tfpl"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;admin_username&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;admin_username&lt;/span&gt;
    &lt;span class="nx"&gt;admin_password_hash&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;admin_password_hash&lt;/span&gt;
    &lt;span class="nx"&gt;admin_ssh_keys&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;admin_ssh_keys&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nx"&gt;tailscale_client_id&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tailscale_client_id&lt;/span&gt;
    &lt;span class="nx"&gt;tailscale_client_secret&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tailscale_client_secret&lt;/span&gt;
    &lt;span class="nx"&gt;ssh_port&lt;/span&gt;                &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ssh_port&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="p"&gt;...&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;ignore_changes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;user_data&lt;/code&gt; field is where cloud-init lives. The &lt;code&gt;templatefile&lt;/code&gt; function renders &lt;code&gt;hetzner.tfpl&lt;/code&gt; with values from Terraform variables, including the bcrypt-hashed admin password and Tailscale OAuth credentials. These are injected when the template is rendered, so they never appear in plaintext in the repository. On the server, cloud-init keeps a copy of the rendered user data under &lt;code&gt;/var/lib/cloud&lt;/code&gt;, readable by root.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;lifecycle&lt;/code&gt; block with &lt;code&gt;ignore_changes = [user_data]&lt;/code&gt; is essential for production. Without it, every change to the cloud-init template would cause Terraform to destroy and recreate the server — a full machine wipe. cloud-init runs only on first boot; subsequent changes to the template have no effect on a running server anyway. Ignoring &lt;code&gt;user_data&lt;/code&gt; drift allows changes to the cloud-init template without triggering a server rebuild.&lt;/p&gt;




&lt;h2&gt;
  
  
  cloud-init: OS Hardening on First Boot
&lt;/h2&gt;

&lt;p&gt;cloud-init runs once on first boot, before any remote connection is established. This is where OS-level hardening happens.&lt;/p&gt;

&lt;p&gt;The full template is at &lt;a href="https://gitlab.com/sakonn/docker-swarm-gitops/-/blob/main/server/hetzner.tfpl" rel="noopener noreferrer"&gt;server/hetzner.tfpl&lt;/a&gt;. Here are the key sections:&lt;/p&gt;

&lt;h3&gt;
  
  
  Package Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;packages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;fail2ban&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;auditd&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;unattended-upgrades&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;git&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;curl&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ca-certificates&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;build-essential&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;fail2ban and auditd are installed from packages (not Docker) because they need to monitor the host OS, not a container. &lt;code&gt;unattended-upgrades&lt;/code&gt; enables automatic security patch installation.&lt;/p&gt;

&lt;h3&gt;
  
  
  User Creation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${admin_username}&lt;/span&gt;
    &lt;span class="na"&gt;passwd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${admin_password_hash}&lt;/span&gt;
    &lt;span class="na"&gt;lock_passwd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="na"&gt;groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sudo, docker&lt;/span&gt;
    &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/bin/bash&lt;/span&gt;
    &lt;span class="na"&gt;sudo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ALL=(ALL) NOPASSWD:ALL&lt;/span&gt;
    &lt;span class="na"&gt;ssh_authorized_keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${jsonencode(admin_ssh_keys)}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The admin user is created with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A bcrypt-hashed password (generated locally, never sent in plaintext)&lt;/li&gt;
&lt;li&gt;Both your personal SSH key and an Ansible-specific SSH key&lt;/li&gt;
&lt;li&gt;Membership in the &lt;code&gt;docker&lt;/code&gt; group (can run Docker without sudo)&lt;/li&gt;
&lt;li&gt;NOPASSWD sudo (automation-friendly)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Ansible key is a separate ed25519 key generated locally (&lt;code&gt;ssh-keygen -t ed25519 -f .ansible_key&lt;/code&gt;). It is used only for the bootstrap playbook and can be rotated or removed after provisioning.&lt;/p&gt;

&lt;h3&gt;
  
  
  SSH Hardening
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/ssh/sshd_config.d/99-hardening.conf&lt;/span&gt;
  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;PermitRootLogin prohibit-password&lt;/span&gt;
    &lt;span class="s"&gt;PasswordAuthentication no&lt;/span&gt;
    &lt;span class="s"&gt;MaxAuthTries 6&lt;/span&gt;
    &lt;span class="s"&gt;MaxSessions 3&lt;/span&gt;
    &lt;span class="s"&gt;X11Forwarding no&lt;/span&gt;
    &lt;span class="s"&gt;AllowAgentForwarding no&lt;/span&gt;
    &lt;span class="s"&gt;ClientAliveInterval 300&lt;/span&gt;
    &lt;span class="s"&gt;ClientAliveCountMax 2&lt;/span&gt;
    &lt;span class="s"&gt;LoginGraceTime 30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;PasswordAuthentication no&lt;/code&gt; restricts authentication to SSH keys only. Combined with fail2ban, which bans a source IP for one hour after 3 failed attempts within 10 minutes, brute-force SSH attacks are mitigated at both the authentication and network level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kernel Hardening
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/sysctl.d/99-hardening.conf&lt;/span&gt;
  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;# Prevent IP spoofing&lt;/span&gt;
    &lt;span class="s"&gt;net.ipv4.conf.all.rp_filter = 1&lt;/span&gt;
    &lt;span class="s"&gt;net.ipv4.conf.default.rp_filter = 1&lt;/span&gt;
    &lt;span class="s"&gt;# Ignore ICMP redirects (prevent routing attacks)&lt;/span&gt;
    &lt;span class="s"&gt;net.ipv4.conf.all.accept_redirects = 0&lt;/span&gt;
    &lt;span class="s"&gt;net.ipv6.conf.all.accept_redirects = 0&lt;/span&gt;
    &lt;span class="s"&gt;# SYN flood protection&lt;/span&gt;
    &lt;span class="s"&gt;net.ipv4.tcp_syncookies = 1&lt;/span&gt;
    &lt;span class="s"&gt;# IP forwarding (required for Docker networking and Tailscale)&lt;/span&gt;
    &lt;span class="s"&gt;net.ipv4.ip_forward = 1&lt;/span&gt;
    &lt;span class="s"&gt;net.ipv6.conf.all.forwarding = 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reverse path filtering drops packets whose source IP could not legitimately have arrived on the interface they came in on, a basic defense against IP spoofing. Ignoring ICMP redirects stops other hosts from rewriting this machine's routing table. SYN cookies protect against SYN flood DoS attacks. IP forwarding is required for both Docker's overlay networking and Tailscale's routing.&lt;/p&gt;
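&lt;p&gt;To confirm the settings took effect, a couple of Ansible tasks can read the live values back. This is a minimal sketch and not part of the repository's playbook; the task names and the &lt;code&gt;sysctl_values&lt;/code&gt; variable are made up for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Verification sketch (hypothetical tasks, not from the repository)
- name: Read effective sysctl values
  ansible.builtin.command: "sysctl -n {{ item }}"
  loop:
    - net.ipv4.conf.all.rp_filter
    - net.ipv4.tcp_syncookies
    - net.ipv4.ip_forward
  register: sysctl_values
  changed_when: false   # read-only check, never reports a change

- name: Fail if any value differs from 1
  ansible.builtin.assert:
    that: item.stdout == "1"
  loop: "{{ sysctl_values.results }}"
  loop_control:
    label: "{{ item.item }}"   # show the sysctl key instead of the full result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;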

&lt;h3&gt;
  
  
  fail2ban Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/fail2ban/jail.local&lt;/span&gt;
  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;[DEFAULT]&lt;/span&gt;
    &lt;span class="s"&gt;bantime = 3600&lt;/span&gt;
    &lt;span class="s"&gt;findtime = 600&lt;/span&gt;
    &lt;span class="s"&gt;maxretry = 3&lt;/span&gt;

    &lt;span class="s"&gt;[sshd]&lt;/span&gt;
    &lt;span class="s"&gt;enabled = true&lt;/span&gt;
    &lt;span class="s"&gt;mode = aggressive&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three failed attempts within 10 minutes result in a 1-hour IP ban. Aggressive mode also covers additional SSH attack patterns beyond basic password failures.&lt;/p&gt;
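&lt;p&gt;Once the server is up, the jail can be inspected with &lt;code&gt;fail2ban-client status sshd&lt;/code&gt;. The tasks below are a sketch of running that check from Ansible; they are an illustration and not part of the repository's playbook, and the &lt;code&gt;jail_status&lt;/code&gt; variable name is arbitrary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Inspection sketch (hypothetical tasks, not from the repository)
- name: Query the sshd jail
  ansible.builtin.command: fail2ban-client status sshd
  become: true              # fail2ban-client needs root to reach the server socket
  register: jail_status
  changed_when: false       # read-only check

- name: Print failure counters and banned IPs
  ansible.builtin.debug:
    var: jail_status.stdout_lines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;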

&lt;h3&gt;
  
  
  Docker and Swarm Initialization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;runcmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sysctl --system&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;systemctl enable --now fail2ban&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;systemctl enable --now auditd&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;systemctl restart ssh&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;curl -fsSL https://get.docker.com | sh&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker swarm init&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker network create -d overlay --attachable swarm_network&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;curl -fsSL https://tailscale.com/install.sh | sh&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tailscale up --ssh --accept-routes --advertise-exit-node&lt;/span&gt;
      &lt;span class="s"&gt;--advertise-tags=tag:server&lt;/span&gt;
      &lt;span class="s"&gt;--client-id=${tailscale_client_id}&lt;/span&gt;
      &lt;span class="s"&gt;--client-secret=${tailscale_client_secret}&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;reboot&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker is installed via the official convenience script. Swarm is initialized immediately. The &lt;code&gt;swarm_network&lt;/code&gt; overlay network is created — all services in &lt;code&gt;apps/&lt;/code&gt; connect to this network, enabling inter-service communication across nodes as the cluster scales.&lt;/p&gt;
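&lt;p&gt;Because cloud-init creates the network outside of any stack, a stack file has to declare it as external in order to attach to it. A minimal sketch, assuming a hypothetical &lt;code&gt;whoami&lt;/code&gt; service and file name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# apps/example/example.yaml (hypothetical stack, for illustration only)
services:
  whoami:
    image: traefik/whoami    # placeholder image
    networks:
      - swarm_network

networks:
  swarm_network:
    external: true           # already created by cloud-init; the stack must not recreate it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;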

&lt;p&gt;Tailscale authenticates via OAuth, requiring no interactive input. The &lt;code&gt;--ssh&lt;/code&gt; flag enables Tailscale SSH, which Ansible uses for the bootstrap playbook.&lt;/p&gt;

&lt;p&gt;The final &lt;code&gt;reboot&lt;/code&gt; ensures all services start from a consistent state and confirms that the sysctl settings, already applied by &lt;code&gt;sysctl --system&lt;/code&gt;, persist across boots.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cloudflare DNS via Terraform
&lt;/h2&gt;

&lt;p&gt;Back in the root &lt;code&gt;main.tf&lt;/code&gt;, Cloudflare DNS records are created after the server is provisioned:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"cloudflare_dns_record"&lt;/span&gt; &lt;span class="s2"&gt;"root"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudflare_zone_id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"@"&lt;/span&gt;
  &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;server_ipv4&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A"&lt;/span&gt;
  &lt;span class="nx"&gt;proxied&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"cloudflare_dns_record"&lt;/span&gt; &lt;span class="s2"&gt;"www"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudflare_zone_id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"www"&lt;/span&gt;
  &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;server_ipv4&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A"&lt;/span&gt;
  &lt;span class="nx"&gt;proxied&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;proxied = true&lt;/code&gt; routes traffic through Cloudflare's CDN. DNS lookups for the domain return Cloudflare edge addresses instead of the server's own IP, so the origin address stays out of public DNS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Dynamic Ansible Inventory
&lt;/h2&gt;

&lt;p&gt;Instead of hardcoding the server IP in an inventory file, Ansible reads it directly from Terraform state:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# server/inventory.yaml&lt;/span&gt;
&lt;span class="na"&gt;plugin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cloud.terraform.terraform_provider&lt;/span&gt;
&lt;span class="na"&gt;binary_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tofu&lt;/span&gt;
&lt;span class="na"&gt;project_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;../&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses the &lt;code&gt;cloud.terraform.terraform_provider&lt;/code&gt; Ansible plugin, which runs &lt;code&gt;tofu output&lt;/code&gt; internally and maps the results to Ansible host variables. The server's Tailscale hostname, IP, SSH port, and SSH key path all come from Terraform outputs — no manual synchronization needed.&lt;/p&gt;

&lt;p&gt;When a server is rebuilt, &lt;code&gt;tofu apply&lt;/code&gt; updates the state and the next Ansible run reads the updated connection details from Terraform outputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Ansible Bootstrap Playbook
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;server/setup_docker.yaml&lt;/code&gt; runs once after &lt;code&gt;tofu apply&lt;/code&gt; completes. It cannot be part of cloud-init because it requires Docker and Swarm to already be running.&lt;/p&gt;

&lt;h3&gt;
  
  
  Persistent Storage Directories
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Create directories on Hetzner volume&lt;/span&gt;
  &lt;span class="na"&gt;ansible.builtin.file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/mnt/{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hostvars[inventory_hostname].volume_id&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}/{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;item&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
    &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;directory&lt;/span&gt;
  &lt;span class="na"&gt;loop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;crowdsec_db&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;papra_data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Directories are created on the Hetzner volume (mounted at &lt;code&gt;/mnt/HC_Volume_*&lt;/code&gt;). The volume path uses the &lt;code&gt;volume_id&lt;/code&gt; from Terraform outputs, so there's no hardcoded mount path.&lt;/p&gt;
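&lt;p&gt;These directories exist so that stacks can bind-mount them: with the long &lt;code&gt;type: bind&lt;/code&gt; syntax, Swarm refuses to start a task if the source path is missing on the node. A sketch of how a stack might consume one of them, where the volume ID, image, and container path are all hypothetical placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Hypothetical stack excerpt, for illustration only
services:
  papra:
    image: example/papra:latest                      # placeholder image name
    volumes:
      - type: bind
        source: /mnt/HC_Volume_12345678/papra_data   # made-up volume ID
        target: /app/data                            # made-up container path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;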

&lt;h3&gt;
  
  
  Docker Secrets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Create gitlab_password Docker secret&lt;/span&gt;
  &lt;span class="na"&gt;community.docker.docker_secret&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gitlab_password&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;lookup('community.sops.sops',&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'../vault.yaml')&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;from_yaml&lt;/span&gt;
              &lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;json_query('gitlab_password')&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
    &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;present&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SOPS decrypts &lt;code&gt;vault.yaml&lt;/code&gt; locally on your machine, and Ansible pushes the decrypted value as a Docker secret. The plaintext never touches the server's filesystem — it goes directly from your machine into Docker's encrypted secret store.&lt;/p&gt;

&lt;p&gt;Two secrets are created:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gitlab_password&lt;/code&gt;: SwarmCD uses this to authenticate with GitLab when cloning the repo&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;age_key&lt;/code&gt;: SwarmCD uses this to decrypt SOPS-encrypted secret files in the repo&lt;/li&gt;
&lt;/ul&gt;
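&lt;p&gt;A stack consumes secrets created this way by declaring them as external. The excerpt below is a sketch of that pattern rather than the repository's actual &lt;code&gt;swarmcd.yaml&lt;/code&gt;; the image reference is an assumption. Docker mounts each secret as a file under &lt;code&gt;/run/secrets/&lt;/code&gt; inside the container.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only; not the repository's actual compose file
services:
  swarmcd:
    image: ghcr.io/m-adawi/swarm-cd:latest   # assumed image reference
    secrets:
      - gitlab_password    # readable at /run/secrets/gitlab_password
      - age_key            # readable at /run/secrets/age_key

secrets:
  gitlab_password:
    external: true         # created beforehand by the Ansible playbook
  age_key:
    external: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;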

&lt;h3&gt;
  
  
  SwarmCD Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy SwarmCD&lt;/span&gt;
  &lt;span class="na"&gt;community.docker.docker_stack&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;swarmcd&lt;/span&gt;
    &lt;span class="na"&gt;compose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;lookup('file',&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'../apps/swarmcd/swarmcd.yaml')&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;from_yaml&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
    &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;present&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SwarmCD is deployed from the Ansible playbook — it cannot manage its own initial deployment, so this bootstrap step is handled outside the GitOps loop. After this, SwarmCD takes over management of all other stacks from Git.&lt;/p&gt;




&lt;h2&gt;
  
  
  Remote State
&lt;/h2&gt;

&lt;p&gt;The Terraform state is stored in GitLab's managed HTTP backend:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="s2"&gt;"http"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;address&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://gitlab.com/api/v4/projects/.../terraform/state/default"&lt;/span&gt;
    &lt;span class="nx"&gt;lock_address&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt;
    &lt;span class="nx"&gt;unlock_address&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means state is shared between team members and persists across machines. GitLab provides free managed Terraform state for any project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;After &lt;code&gt;tofu apply&lt;/code&gt; and &lt;code&gt;ansible-playbook setup_docker.yaml&lt;/code&gt;, you have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A Hetzner server with a hardened OS (SSH keys only, fail2ban, kernel sysctl)&lt;/li&gt;
&lt;li&gt;Docker Swarm initialized with an overlay network&lt;/li&gt;
&lt;li&gt;Tailscale VPN joined (admin access without public SSH)&lt;/li&gt;
&lt;li&gt;Cloudflare DNS records created&lt;/li&gt;
&lt;li&gt;SwarmCD running and watching your Git repository&lt;/li&gt;
&lt;li&gt;A 10 GB persistent volume holding the directories that stateful services bind-mount&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From this point forward, infrastructure changes are managed through Git. New application deployments and configuration updates are applied via &lt;code&gt;git push&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Repository: &lt;a href="https://gitlab.com/sakonn/docker-swarm-gitops" rel="noopener noreferrer"&gt;gitlab.com/sakonn/docker-swarm-gitops&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I Run a Full GitOps Stack for $6/month — Here's the Architecture</title>
      <dc:creator>Jakub Korečko</dc:creator>
      <pubDate>Sat, 30 May 2026 08:44:03 +0000</pubDate>
      <link>https://dev.to/sakonn/i-run-a-full-gitops-stack-for-6month-heres-the-architecture-53m2</link>
      <guid>https://dev.to/sakonn/i-run-a-full-gitops-stack-for-6month-heres-the-architecture-53m2</guid>
      <description>&lt;p&gt;&lt;strong&gt;What you'll learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What this stack does and what problem it solves&lt;/li&gt;
&lt;li&gt;Why Docker Swarm instead of Kubernetes&lt;/li&gt;
&lt;li&gt;Why Hetzner and why GitOps&lt;/li&gt;
&lt;li&gt;The full tech stack with tool-by-tool rationale&lt;/li&gt;
&lt;li&gt;How the repository is structured&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Every side project or small production deployment starts the same way: you SSH into a server, run some commands, tweak a config, and hope you remember what you did. Six months later something breaks and you can't reproduce the setup. You're firefighting with no audit trail and no way to roll back.&lt;/p&gt;

&lt;p&gt;The solution is GitOps — your infrastructure and application configuration live in a Git repository, and changes to that repository automatically trigger deployments. Git becomes your source of truth, your audit log, and your rollback mechanism.&lt;/p&gt;

&lt;p&gt;The problem is that most GitOps tutorials assume you're running Kubernetes. If you're running a small project on a single server, Kubernetes is enormous overkill.&lt;/p&gt;

&lt;p&gt;This series shows a different approach: a full GitOps pipeline on Docker Swarm, running on a single Hetzner server for roughly €5/month. No cluster management. No operator madness. Just Git, Docker, and a handful of focused tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Stack Does
&lt;/h2&gt;

&lt;p&gt;From a single &lt;code&gt;git push&lt;/code&gt;, the stack:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detects the change within 45 seconds&lt;/li&gt;
&lt;li&gt;Decrypts any secrets that were updated&lt;/li&gt;
&lt;li&gt;Deploys the new service configuration to Docker Swarm&lt;/li&gt;
&lt;li&gt;Routes traffic through Traefik (TLS terminated, WAF filtered)&lt;/li&gt;
&lt;li&gt;Ships metrics and logs to Grafana Cloud&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From &lt;code&gt;tofu apply&lt;/code&gt; (a one-time operation), the stack:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Provisions a hardened Ubuntu 24.04 server on Hetzner&lt;/li&gt;
&lt;li&gt;Configures firewall rules, SSH hardening, fail2ban&lt;/li&gt;
&lt;li&gt;Installs Docker, initializes a Swarm&lt;/li&gt;
&lt;li&gt;Joins Tailscale VPN (for secure admin access without exposing SSH publicly)&lt;/li&gt;
&lt;li&gt;Creates Cloudflare DNS records&lt;/li&gt;
&lt;li&gt;Bootstraps the GitOps controller (SwarmCD) through the follow-up Ansible playbook&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After that initial apply, the server is fully hands-off. All future changes go through Git.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│                 Your Local Machine                  │
│                                                     │
│  git push  ──────────────────────────► GitLab Repo  │
│                                             │       │
│  tofu apply ──► Hetzner + Cloudflare        │       │
└─────────────────────────────────────────────────────┘
                                             │
                                             ▼
┌─────────────────────────────────────────────────────┐
│               Hetzner Cloud Server                  │
│                                                     │
│  ┌─────────────┐    polls every 45s                 │
│  │  SwarmCD    │ ◄──────────────── GitLab Repo      │
│  └──────┬──────┘                                    │
│         │ docker stack deploy                       │
│         ▼                                           │
│  ┌─────────────────────────────────────────────┐    │
│  │           Docker Swarm                      │    │
│  │                                             │    │
│  │  Traefik ──► CrowdSec ──► Apps              │    │
│  │  (reverse proxy + WAF)    (bento, etc.)     │    │
│  │                                             │    │
│  │  Monitoring (Alloy, cAdvisor, node-exporter)│    │
│  └─────────────────────────────────────────────┘    │
│                                                     │
│  Tailscale VPN ──── Admin Access (no public SSH)    │
│                                                     │
│  Hetzner Volume (10GB persistent storage)           │
└─────────────────────────────────────────────────────┘
         │ metrics + logs
         ▼
  Grafana Cloud (free tier)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why Docker Swarm, Not Kubernetes
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"Why not just use Kubernetes?"&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the first question you may ask, so let's address it directly.&lt;/p&gt;

&lt;p&gt;Kubernetes is excellent for teams managing dozens of services across multiple nodes. For a single server with five services, it introduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A control plane that consumes ~1-2GB RAM before your apps even start&lt;/li&gt;
&lt;li&gt;A separate installation process (kubeadm, k3s, minikube, etc.)&lt;/li&gt;
&lt;li&gt;A completely different mental model for networking, storage, and secrets&lt;/li&gt;
&lt;li&gt;Custom Resource Definitions, operators, Helm charts, and dozens of YAML abstractions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker Swarm, by contrast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is built into Docker — one command to initialize: &lt;code&gt;docker swarm init&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Uses the same Docker Compose format you already know&lt;/li&gt;
&lt;li&gt;Has native support for secrets and configs&lt;/li&gt;
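&lt;li&gt;Adds essentially no overhead on a single-node setup&lt;/li&gt;
&lt;li&gt;Handles rolling updates natively; near-zero-downtime deployments need little more than an &lt;code&gt;update_config&lt;/code&gt; entry such as &lt;code&gt;order: start-first&lt;/code&gt;&lt;/li&gt;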
&lt;/ul&gt;
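&lt;p&gt;Rolling-update behaviour is tuned per service under &lt;code&gt;deploy.update_config&lt;/code&gt;. A minimal sketch, assuming a hypothetical single-replica &lt;code&gt;web&lt;/code&gt; service: with &lt;code&gt;order: start-first&lt;/code&gt; Swarm starts the new task before stopping the old one, which is what keeps a one-replica service reachable during an update.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Hypothetical service, for illustration only
services:
  web:
    image: nginx:stable
    deploy:
      replicas: 1
      update_config:
        order: start-first        # start the new task, then stop the old one
        parallelism: 1            # update one task at a time
        failure_action: rollback  # revert automatically if the update fails
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;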

&lt;p&gt;The tradeoff: Docker Swarm has fewer features and a smaller ecosystem, and it doesn't scale horizontally as easily as Kubernetes. For a single server hosting a handful of services, none of that matters.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; Docker Swarm is not "Kubernetes lite" — it's a different tool for a different scale. Choosing the right tool for the scale you actually have is good engineering.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Hetzner
&lt;/h2&gt;

&lt;p&gt;Hetzner is a German cloud provider with data centers in Nuremberg, Falkenstein, and Helsinki, among other locations. The cx23 instance used in this stack costs approximately €4-5/month:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;2 vCPU (shared)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM&lt;/td&gt;
&lt;td&gt;4 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;40 GB SSD (+ 10 GB volume)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;20 TB/month included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Location&lt;/td&gt;
&lt;td&gt;Nuremberg, Germany&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For comparison, a similarly sized AWS EC2 instance (t3.medium, 2 vCPU and 4 GB RAM) costs roughly $30/month on demand. The performance difference for a hobby or small production workload is negligible.&lt;/p&gt;

&lt;p&gt;Hetzner is also EU-based and GDPR-compliant — relevant if you're handling user data from European users.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why GitOps
&lt;/h2&gt;

&lt;p&gt;GitOps solves three problems that every self-hosted deployment eventually hits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. No reproducibility.&lt;/strong&gt; "I set this up six months ago and I don't remember what I did" is the death of many side projects. When infrastructure and app config live in Git, any team member (or future you) can re-provision from scratch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. No audit trail.&lt;/strong&gt; Who changed what and when? Git history answers this. Every deployment is a commit with a timestamp and author.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Config drift.&lt;/strong&gt; The server diverges from what you think is deployed. GitOps eliminates drift by making Git the authoritative source. The GitOps controller continuously reconciles what's running with what's in Git.&lt;/p&gt;

&lt;p&gt;The GitOps controller in this stack is SwarmCD. It polls the Git repository every 45 seconds and deploys any changed stacks. You push to Git; SwarmCD handles the rest.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Why This One&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://opentofu.org/" rel="noopener noreferrer"&gt;OpenTofu&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Infrastructure as Code&lt;/td&gt;
&lt;td&gt;Open-source Terraform fork; provisions Hetzner, Cloudflare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.hetzner.com/cloud" rel="noopener noreferrer"&gt;Hetzner Cloud&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Server hosting&lt;/td&gt;
&lt;td&gt;Cheap, reliable, EU-based, good API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.cloudflare.com/" rel="noopener noreferrer"&gt;Cloudflare&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;DNS + CDN&lt;/td&gt;
&lt;td&gt;Free tier, hides server IP, DDoS protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cloud-init&lt;/td&gt;
&lt;td&gt;OS bootstrapping&lt;/td&gt;
&lt;td&gt;Runs on first boot; installs Docker, hardens SSH, joins Tailscale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.ansible.com/" rel="noopener noreferrer"&gt;Ansible&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Bootstrap provisioning&lt;/td&gt;
&lt;td&gt;One-time Docker setup (volumes, secrets, SwarmCD deploy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://docs.docker.com/engine/swarm/" rel="noopener noreferrer"&gt;Docker Swarm&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Container orchestration&lt;/td&gt;
&lt;td&gt;Built into Docker, no extra install, Compose-compatible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/m-adawi/swarm-cd" rel="noopener noreferrer"&gt;SwarmCD&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;GitOps controller&lt;/td&gt;
&lt;td&gt;Lightweight, native Swarm support, SOPS integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://github.com/getsops/sops" rel="noopener noreferrer"&gt;SOPS&lt;/a&gt; + &lt;a href="https://github.com/FiloSottile/age" rel="noopener noreferrer"&gt;age&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Secret encryption&lt;/td&gt;
&lt;td&gt;Encrypts secrets in Git; decrypted at deploy time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://traefik.io/" rel="noopener noreferrer"&gt;Traefik&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Reverse proxy + TLS&lt;/td&gt;
&lt;td&gt;Label-based routing, Let's Encrypt built in, Swarm-aware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.crowdsec.net/" rel="noopener noreferrer"&gt;CrowdSec&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;WAF + IDS&lt;/td&gt;
&lt;td&gt;Community IP reputation list + AppSec rules via Traefik plugin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://tailscale.com/" rel="noopener noreferrer"&gt;Tailscale&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;VPN&lt;/td&gt;
&lt;td&gt;Secure admin access; no public SSH exposure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://grafana.com/oss/alloy-opentelemetry-collector/" rel="noopener noreferrer"&gt;Grafana Alloy&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Metrics + logs agent&lt;/td&gt;
&lt;td&gt;Single container replaces Prometheus Agent + Promtail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://grafana.com/products/cloud/" rel="noopener noreferrer"&gt;Grafana Cloud&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Observability backend&lt;/td&gt;
&lt;td&gt;Free tier; managed Prometheus + Loki&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fail2ban&lt;/td&gt;
&lt;td&gt;SSH brute-force protection&lt;/td&gt;
&lt;td&gt;Bans IPs after 3 failed SSH attempts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Repository Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gitops/
├── vault.yaml.example      # Template for SOPS-encrypted secrets
├── main.tf                 # Root: wires modules, Cloudflare DNS, remote state
├── providers.tf            # Provider versions (SOPS, Cloudflare, Hcloud, Ansible)
├── variables.tf            # Root variables
├── data.tf                 # Loads vault.yaml via SOPS provider
│
├── server/                 # Hetzner provisioning module
│   ├── main.tf             # Server, firewall, IPs, volume resources
│   ├── providers.tf        # Local provider declarations
│   ├── variables.tf        # Sensitive inputs (API keys, password hash)
│   ├── outputs.tf          # Server IP, SSH port, Tailscale hostname
│   ├── hetzner.tfpl        # cloud-init template (OS hardening + Docker)
│   ├── inventory.yaml      # Dynamic Ansible inventory (reads Terraform state)
│   ├── ansible.cfg         # Ansible config (points to inventory.yaml)
│   └── setup_docker.yaml   # Ansible playbook (bootstrap Docker + SwarmCD)
│
└── apps/                   # Docker Compose stacks (deployed by SwarmCD)
    ├── swarmcd/
    │   ├── config.yaml     # SwarmCD: which stacks to watch, poll interval
    │   └── swarmcd.yaml    # SwarmCD compose file (bootstrapped by Ansible)
    ├── traefik/
    │   ├── traefik.yaml            # Traefik compose file
    │   ├── traefik_static_conf.yaml  # Entrypoints, ACME, providers, plugins
    │   └── traefik_dynamic_conf.yaml # Middlewares (auth, CrowdSec)
    ├── monitoring/
    │   ├── alloy.yaml              # Grafana Alloy compose file
    │   ├── cadvisor.yaml           # cAdvisor compose file
    │   ├── node_exporter.yaml      # node-exporter compose file
    │   └── grafana_alloy.alloy     # Alloy pipeline config
    └── bento/
        └── bento.yaml              # Example app with Traefik labels
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The split between &lt;code&gt;server/&lt;/code&gt; and &lt;code&gt;apps/&lt;/code&gt; reflects a key separation of concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;server/&lt;/code&gt;&lt;/strong&gt; is infrastructure: it runs once and creates the environment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;apps/&lt;/code&gt;&lt;/strong&gt; is application configuration: it changes continuously and is deployed via GitOps&lt;/li&gt;
&lt;/ul&gt;
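
&lt;p&gt;To make the &lt;code&gt;apps/&lt;/code&gt; side concrete, here is a minimal sketch of what a stack file such as &lt;code&gt;apps/bento/bento.yaml&lt;/code&gt; could look like. It is not copied from the repository: the image name, hostname, network name, entrypoint name (&lt;code&gt;websecure&lt;/code&gt;), certificate resolver name (&lt;code&gt;letsencrypt&lt;/code&gt;), and port are all assumptions chosen for illustration. The one Swarm-specific detail worth noting is that Traefik reads routing labels from &lt;code&gt;deploy.labels&lt;/code&gt; rather than from container-level labels.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Hypothetical sketch of apps/bento/bento.yaml; all names and values are assumptions.
services:
  bento:
    image: example/bento:latest      # placeholder image name
    networks:
      - traefik_public               # assumed overlay network shared with Traefik
    deploy:
      labels:                        # in Swarm mode, Traefik reads service labels from deploy.labels
        - traefik.enable=true
        - traefik.http.routers.bento.rule=Host(`bento.example.com`)
        - traefik.http.routers.bento.entrypoints=websecure          # assumed entrypoint name
        - traefik.http.routers.bento.tls.certresolver=letsencrypt   # assumed resolver name
        - traefik.http.services.bento.loadbalancer.server.port=8080 # assumed container port

networks:
  traefik_public:
    external: true                   # assumed to be created outside this stack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A file like this lives entirely under &lt;code&gt;apps/&lt;/code&gt;, so changing a hostname or image tag is a Git commit that SwarmCD picks up, with no change to anything under &lt;code&gt;server/&lt;/code&gt;.&lt;/p&gt;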




&lt;h2&gt;
  
  
  What's in Each Part
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 2:&lt;/strong&gt; Infrastructure as Code — OpenTofu, Hetzner provisioning, cloud-init hardening, Ansible bootstrap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3:&lt;/strong&gt; GitOps loop — SwarmCD, SOPS secret encryption, Docker Swarm immutable configs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4:&lt;/strong&gt; Security — Traefik, CrowdSec WAF, Tailscale VPN&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 5:&lt;/strong&gt; Observability — Grafana Alloy, Prometheus, Loki, Grafana Cloud free tier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All source code is in the repository linked at the end of this article. Each part walks through specific files with annotated code snippets.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Repository: &lt;a href="https://gitlab.com/sakonn/docker-swarm-gitops" rel="noopener noreferrer"&gt;gitlab.com/sakonn/docker-swarm-gitops&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>security</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
