Jakub Korečko

Posted on May 30

Part 4: Protecting Public Traffic — Traefik, CrowdSec WAF, and Tailscale VPN

What you'll learn:

The complete traffic flow from user browser to container
How Traefik handles TLS termination, routing, and zero-downtime updates
Why services opt in to exposure via Docker labels
How CrowdSec adds a WAF and IP reputation layer to every request
How Tailscale VPN secures admin access without opening SSH to the internet

The Threat Model

A public internet server is subject to continuous automated scanning and attack attempts. Within minutes of a new IP address becoming reachable:

Automated scanners probe every common port
Bots attempt SSH brute-force (hundreds of attempts per hour)
Crawlers look for exposed admin interfaces (wp-admin, /actuator, .env, etc.)
Malicious requests probe for SQL injection, XSS, and other OWASP vulnerabilities

The security architecture in this stack addresses each of these without requiring a separate security engineer. Here's the full picture:

```text
Internet
   │
   ▼
Cloudflare (DNS proxy)
   │  Hides server IP, DDoS mitigation, HTTPS at edge
   │
   ▼
Hetzner Firewall
   │  Allows: 80, 443 only. Blocks everything else at network level.
   │
   ▼
fail2ban (host)
   │  SSH only: bans IPs after 3 failed SSH attempts (1h ban)
   │
   ▼
Traefik (port 80 / 443)
   │  TLS termination, HTTP→HTTPS redirect, real IP forwarding
   │
   ▼
CrowdSec bouncer (Traefik plugin)
   │  IP reputation check + AppSec WAF rules (SQLi, XSS, etc.)
   │  Block decision: 60s default ban
   │
   ▼
Application (bento, etc.)
```

Admin traffic takes a different path entirely — through Tailscale VPN, bypassing the public internet stack.

Traefik: The Entry Point for All Traffic

Traefik is the reverse proxy that sits in front of all applications. It handles:

TLS certificate acquisition and renewal (Let's Encrypt, automated)
HTTP to HTTPS redirection
Routing requests to the correct backend service
Running the CrowdSec bouncer plugin

Static Configuration

apps/traefik/traefik_static_conf.yaml defines the entrypoints, providers, and plugins that are loaded once at startup.

Entrypoints:

```yaml
entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
    forwardedHeaders:
      trustedIPs: &trustedIps
        - 103.21.244.0/22
        - 104.16.0.0/13
        # ... all Cloudflare IP ranges

  websecure:
    address: ":443"
    forwardedHeaders:
      trustedIPs: *trustedIps
    transport:
      respondingTimeouts:
        readTimeout: 600s
        writeTimeout: 600s

  metrics:
    address: ":8899"
```

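Port 80 (web) redirects all traffic to 443. Both entrypoints trust Cloudflare's IP ranges for the X-Forwarded-For header (the list is defined once with the YAML anchor &trustedIps and reused on websecure via *trustedIps). Without this trustedIPs configuration, Traefik would treat Cloudflare's edge address as the client IP, so CrowdSec would evaluate Cloudflare's infrastructure rather than the actual user. Because Traefik honors forwarded headers only from trusted sources, requests arriving from Cloudflare's ranges keep the X-Forwarded-For header that carries the real client IP.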
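To make the trusted-proxy idea concrete, here is a minimal Python sketch of how a proxy can pick the client address out of X-Forwarded-For. It is a simplified model, not Traefik's actual code: the function names are hypothetical, and the trusted list holds only the two Cloudflare ranges shown in the config above.

```python
import ipaddress
from typing import Optional

# Two of the Cloudflare ranges from the static config; the real list is longer.
TRUSTED_PROXIES = [
    ipaddress.ip_network("103.21.244.0/22"),
    ipaddress.ip_network("104.16.0.0/13"),
]


def is_trusted(ip: str) -> bool:
    """True if ip belongs to one of the trusted proxy ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_PROXIES)


def real_client_ip(remote_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Pick the client IP the way a trusted-proxy setup does (simplified).

    If the TCP peer is not trusted, the header may be forged, so the peer
    address wins. Otherwise walk the header right to left and return the
    first hop that is not a trusted proxy.
    """
    if not is_trusted(remote_addr) or not x_forwarded_for:
        return remote_addr
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    # Every hop is a trusted proxy: fall back to the left-most entry.
    return hops[0] if hops else remote_addr


# Relayed by Cloudflare: the peer is a Cloudflare edge, the header names the user.
print(real_client_ip("104.16.5.7", "198.51.100.23"))  # 198.51.100.23
# Direct connection from an untrusted peer: the spoofed header is ignored.
print(real_client_ip("203.0.113.9", "1.2.3.4"))  # 203.0.113.9
```

Walking from the right matters: the left-most entries are supplied by the client and can be forged, while each trusted hop appends the address it actually saw.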

Port 443 (websecure) has 600-second timeouts to support long-running operations like PDF generation in the Bento app.

Port 8899 (metrics) exposes Prometheus metrics for Grafana Alloy to scrape. This port is not in the Hetzner firewall allow-list and is not accessible from the public internet — Alloy scrapes it from inside the overlay network.

Certificate resolvers:

```yaml
certificatesResolvers:
  staging:
    acme:
      email: <YOUR_EMAIL>
      caServer: "https://acme-staging-v02.api.letsencrypt.org/directory"
      httpChallenge:
        entryPoint: web

  production:
    acme:
      email: <YOUR_EMAIL>
      caServer: "https://acme-v02.api.letsencrypt.org/directory"
      httpChallenge:
        entryPoint: web
```

Two resolvers exist: staging, for testing (its rate limits are far more generous, but the certificates it issues are not trusted by browsers), and production, which issues real, browser-trusted certificates. Services choose a resolver in their labels. During initial setup, use staging to validate the configuration, then switch to production.

Providers:

```yaml
providers:
  swarm:
    exposedByDefault: false
  docker:
    exposedByDefault: false
  file:
    directory: /etc/traefik
    watch: true
```

exposedByDefault: false means Traefik ignores every container and service unless it carries the traefik.enable=true label. A service added to Swarm without this label gets no Traefik route, so it is never exposed through the proxy. Every exposure is explicit and intentional.
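The opt-in rule can be modeled in a few lines. This sketch is an illustration, not Traefik's provider code: exposed_services is a hypothetical helper, the service names are made up, and label values are compared case-insensitively against "true" for simplicity.

```python
from typing import Dict, List


def exposed_services(services: Dict[str, Dict[str, str]],
                     exposed_by_default: bool = False) -> List[str]:
    """Return the names of services that would receive a proxy route.

    An explicit traefik.enable label always wins; services without the
    label follow exposed_by_default.
    """
    exposed = []
    for name, labels in services.items():
        enable = labels.get("traefik.enable")
        if enable is None:
            if exposed_by_default:
                exposed.append(name)
        elif enable.lower() == "true":
            exposed.append(name)
    return sorted(exposed)


# Hypothetical stack: only bento opts in.
stack = {
    "bento": {"traefik.enable": "true"},
    "postgres": {},  # no label at all
    "alloy": {"traefik.enable": "false"},
}
print(exposed_services(stack))  # ['bento']
print(exposed_services(stack, exposed_by_default=True))  # ['bento', 'postgres']
```

With the default flipped to true, the unlabeled database would silently gain a route, which is exactly the accident exposedByDefault: false prevents.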

CrowdSec plugin:

```yaml
experimental:
  plugins:
    bouncer:
      moduleName: "github.com/maxlerebourg/crowdsec-bouncer-traefik-plugin"
      version: "v1.5.0"
```

The plugin is declared here in static config. Its configuration (which requests it applies to, which CrowdSec instance it talks to) is in the dynamic config.

Dynamic Configuration

apps/traefik/traefik_dynamic_conf.yaml defines middlewares and routes that Traefik watches for changes without restarting:

```yaml
http:
  middlewares:
    auth:
      basicAuth:
        users:
          - <USERNAME>:<BCRYPT_HASH>

    crowdsec:
      plugin:
        bouncer:
          enabled: true
          crowdsecMode: live
          crowdsecAppsecEnabled: true
          crowdsecAppsecHost: crowdsec_crowdsec:7422
          crowdsecAppsecFailureBlock: true
          crowdsecLapiKeyFile: "/run/secrets/crowdsec_api_key"
          crowdsecLapiHost: crowdsec_crowdsec:8080
          forwardedHeadersTrustedIPs:
            - 10.0.0.0/8
            - 172.16.0.0/12
            - 192.168.0.0/16
          clientTrustedIPs:
            - 10.0.0.0/8
            - 172.16.0.0/12
            - 192.168.0.0/16
```

The crowdsec middleware is defined once here and referenced by any service that wants WAF protection. The auth middleware is used for any internal service (like the Traefik dashboard) that should be behind basic auth.

Zero-Downtime Updates

In the Traefik compose file, the update strategy is:

```yaml
deploy:
  update_config:
    order: start-first
```

start-first means Docker Swarm starts the new Traefik task before stopping the old one. The old task receives its stop signal only after the new one is running (and healthy, if a healthcheck is defined), so there is never a moment with no Traefik task available. Requests still in flight on the old task get Traefik's graceful-shutdown period to finish, so updates should not drop requests.
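The difference between the two update orders can be shown with a toy model of a single-replica service. running_during_update is a hypothetical helper that ignores health checks, update delays, and connection draining; it only counts how many Traefik tasks are running after each step.

```python
from typing import List


def running_during_update(order: str) -> List[int]:
    """Number of running tasks before the update and after each step.

    Models one replica being replaced under Swarm's update_config.order.
    """
    if order == "start-first":
        steps = [+1, -1]  # start the new task, then stop the old one
    elif order == "stop-first":
        steps = [-1, +1]  # stop the old task, then start the new one
    else:
        raise ValueError(f"unknown update order: {order}")
    running = 1  # the old task is serving traffic
    timeline = [running]
    for delta in steps:
        running += delta
        timeline.append(running)
    return timeline


print(running_during_update("start-first"))  # [1, 2, 1]: never below one task
print(running_during_update("stop-first"))  # [1, 0, 1]: a gap with no proxy
```

Swarm's default order is stop-first, which is why the zero in the second timeline is what you get if update_config is left out.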

Combined with SwarmCD's immutable config versioning (Part 3), every configuration change to Traefik is zero-downtime.

Service Exposure via Docker Labels

Here's how a service opts into public access. From apps/bento/bento.yaml:

```yaml
services:
  bento:
    image: ghcr.io/alam00000/bentopdf-simple:v2.7.0
    networks:
      - swarm_network
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.bento-http.rule=Host(`pdf.yourdomain.com`)"
        - "traefik.http.routers.bento-http.entrypoints=web"
        - "traefik.http.routers.bento-http.middlewares=redirect-to-https@file"
        - "traefik.http.routers.bento.rule=Host(`pdf.yourdomain.com`)"
        - "traefik.http.routers.bento.entrypoints=websecure"
        - "traefik.http.routers.bento.tls.certresolver=production"
        - "traefik.http.routers.bento.middlewares=crowdsec@file"
        - "traefik.http.services.bento.loadbalancer.server.port=8080"
```

Breaking this down:

traefik.enable=true — opts in to Traefik management
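Two routers: bento-http on the web entrypoint, which redirects to HTTPS via the redirect-to-https@file middleware (it must be defined in the file provider's dynamic config, which the excerpt above does not show), and bento on websecure for HTTPS. Because the web entrypoint already redirects all HTTP traffic, the HTTP router is most likely redundant.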
tls.certresolver=production — request a production Let's Encrypt certificate for this hostname
middlewares=crowdsec@file — all requests to this service pass through the CrowdSec bouncer
server.port=8080 — Traefik forwards to this container port

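Notice that the labels sit under deploy.labels, not directly on the service (services.bento.labels). In Swarm mode, Traefik reads labels from Swarm services rather than from individual containers; Compose sets service labels from deploy.labels, while labels placed directly on the service become container labels that the Swarm provider ignores.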
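Because the HTTPS labels follow a fixed pattern, a small generator can keep them consistent across services. traefik_labels is a hypothetical helper, not part of Traefik or Docker. It reproduces the HTTPS router and service labels from the bento example and deliberately omits the HTTP router, on the assumption that the entrypoint-level redirect in the static config handles plain HTTP.

```python
from typing import List, Sequence


def traefik_labels(name: str, host: str, port: int,
                   middlewares: Sequence[str] = ("crowdsec@file",),
                   resolver: str = "production") -> List[str]:
    """Build HTTPS router and service labels for one Swarm service."""
    router = f"traefik.http.routers.{name}"
    labels = [
        "traefik.enable=true",
        f"{router}.rule=Host(`{host}`)",
        f"{router}.entrypoints=websecure",
        f"{router}.tls.certresolver={resolver}",
    ]
    if middlewares:
        # Traefik accepts a comma-separated middleware list on a router.
        labels.append(f"{router}.middlewares={','.join(middlewares)}")
    labels.append(f"traefik.http.services.{name}.loadbalancer.server.port={port}")
    return labels


for label in traefik_labels("bento", "pdf.yourdomain.com", 8080):
    print(label)
```

The printed lines belong under deploy.labels in the stack file, for the reason given above.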

CrowdSec: WAF and IP Reputation

CrowdSec adds two protection layers to every request passing through Traefik:

LAPI (Local API) — IP Reputation:
CrowdSec maintains a local database of banned IP addresses. This database is populated from:

The CrowdSec community blocklist (IPs reported as malicious by other CrowdSec users)
Local detections (if you run CrowdSec agents on the host)

When a request arrives, the bouncer plugin checks the client IP against the LAPI (in live mode it queries the LAPI per IP and caches the answer for a short time). If the IP has an active ban decision, the request is blocked immediately with a 403.
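The lookup-and-cache flow can be sketched as follows. This is a simplified model, not the plugin's code: LiveBouncer and the lookup callback are hypothetical stand-ins for the plugin and its LAPI query, the 60-second cache mirrors the figure in the diagram above, and this version caches both banned and clean answers, which may differ from the real plugin.

```python
import time
from typing import Callable, Dict, List, Tuple


class LiveBouncer:
    """Toy live-mode bouncer: query once per IP, reuse the answer until it expires."""

    def __init__(self, lookup: Callable[[str], bool], cache_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lookup = lookup  # stand-in for the LAPI query: True means "banned"
        self.cache_seconds = cache_seconds
        self.clock = clock
        self._cache: Dict[str, Tuple[bool, float]] = {}  # ip -> (banned, expires_at)

    def status_for(self, ip: str) -> int:
        """Return 403 for a banned IP, 200 otherwise."""
        now = self.clock()
        entry = self._cache.get(ip)
        if entry is None or entry[1] <= now:
            banned = self.lookup(ip)
            self._cache[ip] = (banned, now + self.cache_seconds)
        else:
            banned = entry[0]
        return 403 if banned else 200


# Usage with a fake LAPI and a controllable clock.
queries: List[str] = []


def fake_lapi(ip: str) -> bool:
    queries.append(ip)
    return ip == "192.0.2.66"


now = [0.0]
bouncer = LiveBouncer(fake_lapi, clock=lambda: now[0])
print(bouncer.status_for("192.0.2.66"))  # 403
print(bouncer.status_for("192.0.2.66"))  # 403, served from cache
print(len(queries))  # 1
now[0] = 61.0
bouncer.status_for("192.0.2.66")
print(len(queries))  # 2, cache entry expired
```

The cache is the trade-off in live mode: fewer LAPI round trips per request, at the cost of a new ban taking up to the cache period to reach an IP that was recently seen as clean.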

AppSec — WAF Rules:
CrowdSec's AppSec component applies request inspection rules that block common attack patterns:

SQL injection (e.g., ' OR 1=1 -- in query parameters)
XSS (e.g., <script>alert(1)</script> in form fields)
Path traversal (e.g., ../../../etc/passwd)
Known CVE exploit patterns for common web frameworks
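To illustrate what request inspection means, here is a toy signature matcher for the first three pattern classes in the list above. The regular expressions and the match_rule helper are illustrative assumptions; CrowdSec's AppSec rules are much broader and harder to evade, so treat this only as a picture of the idea.

```python
import re
from typing import Optional
from urllib.parse import unquote_plus

# Toy signatures, one per attack class; real WAF rule sets are far larger.
TOY_RULES = {
    "sqli": re.compile(r"'\s*or\s+1\s*=\s*1", re.IGNORECASE),
    "xss": re.compile(r"<\s*script\b", re.IGNORECASE),
    "path_traversal": re.compile(r"(?:\.\./){2,}"),
}


def match_rule(raw_query: str) -> Optional[str]:
    """URL-decode the input, then return the first matching rule name (or None)."""
    decoded = unquote_plus(raw_query)
    for name, pattern in TOY_RULES.items():
        if pattern.search(decoded):
            return name
    return None


print(match_rule("id=1%27%20OR%201%3D1%20--"))  # sqli
print(match_rule("comment=<script>alert(1)</script>"))  # xss
print(match_rule("file=../../../etc/passwd"))  # path_traversal
print(match_rule("q=hello+world"))  # None
```

Decoding before matching is the important step: attackers routinely percent-encode payloads, as in the first example.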

```yaml
crowdsec:
  plugin:
    bouncer:
      crowdsecAppsecEnabled: true
      crowdsecAppsecHost: crowdsec_crowdsec:7422
      crowdsecAppsecFailureBlock: true  # Block if AppSec is unreachable
```

crowdsecAppsecFailureBlock: true means that if the AppSec engine is unavailable (container restart, etc.), requests are blocked rather than allowed through. This is a fail-closed posture — prefer availability loss over security bypass.
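The fail-closed choice boils down to which branch runs when the inspection call fails. In this sketch, decide, inspect, and AppSecUnavailable are hypothetical names, and the 403/200 status codes are illustrative.

```python
from typing import Callable


class AppSecUnavailable(Exception):
    """Raised when the inspection engine cannot be reached."""


def decide(request: str, inspect: Callable[[str], bool], failure_block: bool = True) -> int:
    """Return 403 to block or 200 to allow.

    inspect returns True when the request matches a rule. If it raises
    AppSecUnavailable, failure_block picks fail-closed (block) or fail-open (allow).
    """
    try:
        malicious = inspect(request)
    except AppSecUnavailable:
        return 403 if failure_block else 200
    return 403 if malicious else 200


def engine_down(_request: str) -> bool:
    raise AppSecUnavailable("appsec engine unreachable")


print(decide("GET /", engine_down, failure_block=True))  # 403: fail-closed
print(decide("GET /", engine_down, failure_block=False))  # 200: fail-open
print(decide("GET /", lambda _r: False))  # 200: engine up, request clean
```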

Internal traffic bypass:

```yaml
clientTrustedIPs:
  - 10.0.0.0/8
  - 172.16.0.0/12
  - 192.168.0.0/16
```

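Requests from RFC1918 private address ranges, such as Docker's overlay network, bypass CrowdSec checks: inter-service communication inside the cluster never crosses the public internet boundary, so it doesn't need WAF inspection. Tailscale is not covered by these ranges, because tailnet addresses come from 100.64.0.0/10; tailnet clients reaching services through Traefik are still inspected unless that range is added. The bypass also assumes Traefik sees real peer addresses: if Traefik's ports are published through Swarm's ingress routing mesh rather than in host mode, public connections arrive from an ingress-network address inside 10.0.0.0/8 and would be exempted too.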
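A quick way to check which peer addresses would skip the bouncer is to test them against the same three ranges with Python's ipaddress module. The sample addresses are made up; 100.101.102.103 stands in for a Tailscale address, since Tailscale assigns addresses from 100.64.0.0/10, which none of these ranges cover.

```python
import ipaddress

# The clientTrustedIPs ranges from the middleware config.
CLIENT_TRUSTED = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def bypasses_bouncer(ip: str) -> bool:
    """True if a request from ip would skip CrowdSec checks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLIENT_TRUSTED)


print(bypasses_bouncer("10.0.1.15"))  # True: overlay-network style address
print(bypasses_bouncer("100.101.102.103"))  # False: Tailscale-style 100.64.0.0/10 address
print(bypasses_bouncer("198.51.100.23"))  # False: public client
```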

Tailscale: Secure Admin Access

SSH is not exposed in the Hetzner firewall. All administrative access is routed through Tailscale VPN.

During cloud-init (Part 2), the server joins your Tailscale network:

```shell
tailscale up \
  --ssh \
  --accept-routes \
  --advertise-exit-node \
  --advertise-tags=tag:server \
  --client-id=<TAILSCALE_CLIENT_ID> \
  --client-secret=<TAILSCALE_CLIENT_SECRET>
```

--ssh enables Tailscale SSH, so SSH access to the server is authenticated by your tailnet identity and authorized by tailnet ACLs rather than by SSH keys. The Tailscale hostname (my-server.your-tailnet.ts.net) is stable even if the server IP changes.

From any device enrolled in the Tailscale network:

```shell
ssh admin@my-server.your-tailnet.ts.net
```

This eliminates the need to manage SSH keys for public access, add firewall IP exceptions, or run a self-managed VPN gateway. Tailscale handles NAT traversal automatically: it establishes a direct, WireGuard-encrypted peer-to-peer connection when possible and falls back to its DERP relay servers when no direct path can be made.

SSH Hardening Recap

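Even though Tailscale VPN is the primary admin path, the host's OpenSSH daemon is still hardened as a defense-in-depth measure. With --ssh enabled, SSH connections to the server's tailnet address are handled by Tailscale SSH instead, so the settings below govern the regular sshd.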

From server/hetzner.tfpl:

```text
PasswordAuthentication no    → SSH keys only, passwords rejected
MaxAuthTries 6               → Disconnect after 6 failed attempts per connection
MaxSessions 3                → Limit sessions multiplexed over one connection
X11Forwarding no             → Disable graphical forwarding
ClientAliveInterval 300      → Probe silent clients every 5 min; drop unresponsive ones
LoginGraceTime 30            → Disconnect if auth not completed in 30s
```
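ClientAliveInterval on its own does not end idle sessions: sshd sends a keepalive probe after that many seconds of silence from the client and disconnects only after ClientAliveCountMax probes go unanswered. Assuming ClientAliveCountMax is left at its default of 3, the arithmetic works out as below; dead_client_timeout is a hypothetical helper.

```python
def dead_client_timeout(client_alive_interval: int, client_alive_count_max: int = 3) -> int:
    """Approximate seconds before sshd drops a client that stops answering probes."""
    if client_alive_interval <= 0 or client_alive_count_max <= 0:
        raise ValueError("both settings must be positive for this calculation")
    return client_alive_interval * client_alive_count_max


seconds = dead_client_timeout(300)
print(seconds, seconds // 60)  # 900 15
```

A client that is idle but still connected answers every probe, so it stays logged in; the setting only cleans up connections whose other end has gone away.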

And fail2ban:

```text
bantime = 3600               → 1-hour bans
findtime = 600               → 10-minute window
maxretry = 3                 → 3 failures triggers ban
mode = aggressive            → Also catches scan patterns
```

Source IPs that fail authentication 3 times within a 10-minute window are banned for 1 hour. Combined with key-only authentication and SSH not being exposed to the public internet, the SSH attack surface is substantially reduced.
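The ban rule can be modeled as a sliding window per source IP. The Jail class below is a hypothetical, simplified model of the maxretry/findtime/bantime semantics using the values above; real fail2ban works from log lines and enforces bans through the host firewall.

```python
from collections import deque
from typing import Deque, Dict


class Jail:
    """Simplified fail2ban-style jail for a single service."""

    def __init__(self, maxretry: int = 3, findtime: int = 600, bantime: int = 3600) -> None:
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self._failures: Dict[str, Deque[float]] = {}
        self._banned_until: Dict[str, float] = {}

    def is_banned(self, ip: str, now: float) -> bool:
        return self._banned_until.get(ip, float("-inf")) > now

    def record_failure(self, ip: str, now: float) -> bool:
        """Record one failed login; return True if it triggers a ban."""
        window = self._failures.setdefault(ip, deque())
        window.append(now)
        # Drop failures that fell out of the findtime window.
        while window and window[0] <= now - self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self._banned_until[ip] = now + self.bantime
            window.clear()
            return True
        return False


jail = Jail()
print([jail.record_failure("203.0.113.50", t) for t in (0, 100, 200)])  # [False, False, True]
print(jail.is_banned("203.0.113.50", 3799), jail.is_banned("203.0.113.50", 3800))  # True False
print([jail.record_failure("203.0.113.51", t) for t in (0, 400, 800)])  # [False, False, False]
```

The second IP shows the role of findtime: three failures spread over more than 10 minutes never fill the window, so slow guessing is not banned by this jail alone, which is one reason key-only authentication still matters.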

Summary: Security in Layers

| Layer | What it protects against |
|---|---|
| Cloudflare DNS proxy | Hides server IP; DDoS mitigation at edge |
| Hetzner firewall | Blocks all non-HTTP/HTTPS traffic at network level |
| fail2ban | SSH brute-force banning |
| SSH key-only auth | Password-based SSH attacks |
| Tailscale VPN | Admin access without exposing SSH to the internet |
| Traefik `exposedByDefault: false` | Accidental service exposure |
| CrowdSec LAPI | Known malicious IP blocking |
| CrowdSec AppSec | Application-layer attack filtering (SQLi, XSS, CVEs) |
| Docker secrets | Credentials as files, not environment variables |
| SOPS encryption | No plaintext secrets in Git |