DEV Community

Edin Husejnefendic

Posted on

Stop Paying $150/Month for Managed Kubernetes  -  Run Your Own for $10

hetzner-k3s




🔹Task

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  • Create the cheapest Kubernetes solution, with the ability to delete the Kubernetes cluster/instance and recreate it from scratch (as a DR solution).
  • Implement Power On/Off for the environment.
  • $0 cost for offline mode, except for storage space for S3 and volumes.

🔹Assumptions

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  • Use a Hetzner cloud instance as a single Master/Worker
  • Use the integrated Load Balancer in Traefik Gateway
  • Use Kubernetes Gateway instead of Ingress
  • Recreate the Kubernetes environment from a Velero backup
  • Use your host file (/etc/hosts) as a DNS provider
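With /etc/hosts standing in for a DNS provider, every init or restore ends by pointing the cluster hostname at the new master's public IP. A hypothetical entry (the IP 203.0.113.10 and the edok3s name are placeholder values, not from this setup):

```
# /etc/hosts — map the cluster name to the current master's public IP
203.0.113.10   edok3s
```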

🔹Flow

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Init or Restore

────────────────────────────────────────────

(1) Import Configuration
(2) Activate empty Hetzner Project
(3) Install a k3s Kubernetes cluster on a single node as Master/Worker with hetzner-k3s
(4) Install Velero with the Azure plugin and Azure credentials


Init

────────────────────────────────────────────
(5) Install Kubernetes CRD Gateway API
(6) Install Traefik
(7) Install kube-prometheus-stack with Helm
(8) Deploy nginx reverse proxy (/prometheus, /grafana)
(9a) Post steps: generate self-signed TLS cert → secret tls-traefik
(9b) Apply gateway-post.yml
(10) Print costs report (Hetzner servers + volumes)
Update your /etc/hosts


Delete

────────────────────────────────────────────
(1) Set all PVs to Retain
(2) Make Velero Backup
(3) Delete Hetzner Cloud instance


Restore

────────────────────────────────────────────
(5) Wait for Velero to sync backups from Azure storage
(6) Find latest completed backup
(7) Create Velero restore (excludes: kube-system, kube-public, kube-node-lease, velero)
(8) Print costs report (Hetzner servers + volumes)
Update your /etc/hosts


Price

────────────────────────────────────────────
Print costs report (Hetzner servers + volumes)


🔹Info

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
INFO
For the purposes of this project, we need to have:

  • An active Hetzner account
  • The hcloud CLI installed
  • The kubectl CLI installed
  • The HELM CLI installed
  • The Velero CLI installed
  • An SSH key for server access
  • An active Azure account for an S3 bucket
  • The Azure CLI (az) installed

ATTENTION

⚠️ For this occasion, we will be using an S3 bucket on the Azure platform, but keep in mind that you can use this setup for any S3-compatible storage.

⚠️ A list of supported providers for Velero can be found at https://velero.io/docs/main/supported-providers/

⚠️ Depending on your provider, adapt this installation according to your requirements.


🔹Install hcloud

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
INFO
ℹ️ GitHub releases: https://github.com/hetznercloud/cli/releases

Check whether hcloud is already installed

hcloud version

Download to /tmp

wget $(curl -s https://api.github.com/repos/hetznercloud/cli/releases/latest \
       | jq -r '.assets[0].browser_download_url' \
       | sed 's%checksums.txt%hcloud-linux-amd64.tar.gz%g') \
  -P /tmp/

Check tar.gz

tar -tvzf /tmp/hcloud-linux-amd64.tar.gz
-rw-r--r-- runner/docker  1075 2025-01-16 18:15 LICENSE
-rw-r--r-- runner/docker  6811 2025-01-16 18:15 README.md
-rwxr-xr-x runner/docker 15311000 2025-01-16 18:19 hcloud

Extract to /usr/local/bin/

sudo tar -xvzf /tmp/hcloud-linux-amd64.tar.gz \
         -C /usr/local/bin/ hcloud
sudo chmod +x /usr/local/bin/hcloud

Check command

whereis hcloud
which   hcloud
hcloud version
hcloud 1.63.0
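The download step above does not verify the archive. The release also publishes a checksums.txt, which sha256sum can check against the tarball; the mechanic is shown here with a local stand-in file (in the real flow you would download checksums.txt next to the tarball first):

```shell
# Local stand-in demonstrating the checksum verification step (not a real release file)
cd "$(mktemp -d)"
echo "stand-in" > hcloud-linux-amd64.tar.gz         # pretend this is the downloaded tarball
sha256sum hcloud-linux-amd64.tar.gz > checksums.txt # pretend this came from the GitHub release
grep 'hcloud-linux-amd64.tar.gz$' checksums.txt | sha256sum -c -
# → hcloud-linux-amd64.tar.gz: OK
```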

Add to BASH autocomplete

vi ~/.bashrc
# -- hcloud -------------------------------------------------------------------
source <(hcloud completion bash)
# ------------------------------------------------------------------------------
source ~/.bashrc

Test hcloud autocomplete

hcloud <TAB><TAB>
all                 image               server-type
certificate         iso                 ssh-key
completion          load-balancer       storage-box
config              load-balancer-type  storage-box-type
context             location            version
datacenter          network             volume
firewall            placement-group     zone
floating-ip         primary-ip          
help                server

🔹Install kubectl

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ℹ️ See https://kubernetes.io/docs/tasks/tools/


🔹Install HELM

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

INFO
ℹ️ Official Site: https://helm.sh/
ℹ️ How to Install: https://helm.sh/docs/intro/install/

Install

cd /tmp
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-4
chmod +x get_helm.sh
sudo ./get_helm.sh
Helm v4.1.4 is available. Changing from version v3.19.4.
Downloading https://get.helm.sh/helm-v4.1.4-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm

Cleanup

sudo rm -fv /tmp/get_helm.sh

Check version

/usr/local/bin/helm version --short
v4.1.4+g05fa379

Add $PATH and autocomplete to ~/.bashrc

vi ~/.bashrc
# -- HELM ---------------------------------------------------------------------
export PATH="$PATH:/usr/local/bin"
source <(helm completion bash)
# -----------------------------------------------------------------------------

Check $PATH and version

# -- LogOut
# -- LogIn
source ~/.bashrc
helm version --short
v4.1.4+g05fa379

🔹Install AZ (azure CLI) for Linux Mint/DEB

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ℹ️ Based on: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux?view=azure-cli-latest&pivots=apt

APT

sudo apt-get update
sudo apt-get install apt-transport-https \
                     ca-certificates curl \
                     gnupg \
                     lsb-release

Microsoft signing key

sudo mkdir -p /etc/apt/keyrings
curl -sLS https://packages.microsoft.com/keys/microsoft.asc |
  gpg --dearmor | sudo tee /etc/apt/keyrings/microsoft.gpg > /dev/null
sudo chmod go+r /etc/apt/keyrings/microsoft.gpg

Create repository

AZ_DIST=$(lsb_release -cs)
echo "Types: deb
URIs: https://packages.microsoft.com/repos/azure-cli/
Suites: ${AZ_DIST}
Components: main
Architectures: $(dpkg --print-architecture)
Signed-by: /etc/apt/keyrings/microsoft.gpg" | sudo tee /etc/apt/sources.list.d/azure-cli.sources

APT

sudo apt-get update
sudo apt-get install azure-cli

Upgrade

sudo apt-get update
sudo apt-get install --only-upgrade azure-cli

Check version

az version
{
  "azure-cli": "2.85.0",
  "azure-cli-core": "2.85.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {
    "bastion": "1.4.3",
    "ssh": "2.0.6"
  }
}

AZURE

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔹Set AZURE defaults

────────────────────────────────────────────
az login

az login

Set default subscription

az account list --output table
Name     CloudName    SubscriptionId                        TenantId                              State    IsDefault
-------  -----------  ------------------------------------  ------------------------------------  -------  -----------
default  AzureCloud   xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy  Enabled  True

Set default subscription | by SubscriptionId

az account set --subscription "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

Set default location | West Europe

az configure --list-defaults
az configure --defaults location=westeurope
az configure --list-defaults

Create resource group

az group list -o table
az group create -n velero --location westeurope
az group list -o table
az group show --name velero --output table
Location    Name
----------  ------
westeurope  velero

🔹Create the storage

────────────────────────────────────────────
az login

az login
AZURE_SUBSCRIPTION_ID=$(az account list --all --query '[?isDefault].id' -o tsv)
AZURE_TENANT_ID=$(az account list --all --query '[?isDefault].tenantId' -o tsv)
AZURE_BACKUP_RESOURCE_GROUP=velero
BLOB_CONTAINER=edok3s
AZURE_STORAGE_ACCOUNT_ID="velero$(uuidgen | cut -d '-' -f5 | tr '[A-Z]' '[a-z]')"

Create the storage account

az storage account create \
  --name $AZURE_STORAGE_ACCOUNT_ID \
  --resource-group $AZURE_BACKUP_RESOURCE_GROUP \
  --sku Standard_LRS \
  --encryption-services blob \
  --https-only true \
  --min-tls-version TLS1_2 \
  --kind BlobStorage \
  --access-tier Hot
{
  "accessTier": "Hot",
  "accountMigrationInProgress": null,
  "allowBlobPublicAccess": false,
  "allowCrossTenantReplication": false,
  "allowSharedKeyAccess": null,
  "allowedCopyScope": null,
  "azureFilesIdentityBasedAuthentication": null,
  "blobRestoreStatus": null,
  "creationTime": "2026-05-03T19:30:06.161353+00:00",
  "customDomain": null,
  "defaultToOAuthAuthentication": null,
  "dnsEndpointType": null,
  "dualStackEndpointPreference": null,
  "enableExtendedGroups": null,
  "enableHttpsTrafficOnly": true,
  "enableNfsV3": null,
  "encryption": {
    "encryptionIdentity": null,
    "keySource": "Microsoft.Storage",
    "keyVaultProperties": null,
    "requireInfrastructureEncryption": null,
    "services": {
      "blob": {
        "enabled": true,
        "keyType": "Account",
        "lastEnabledTime": "2026-05-03T19:30:06.574215+00:00"
      },
      "file": {
        "enabled": true,
        "keyType": "Account",
        "lastEnabledTime": "2026-05-03T19:30:06.574215+00:00"
      },
      "queue": null,
      "table": null
    }
  },
  "extendedLocation": null,
  "failoverInProgress": null,
  "geoPriorityReplicationStatus": null,
  "geoReplicationStats": null,
  "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/velero/providers/Microsoft.Storage/storageAccounts/velero44dfh567h5gh",
  "identity": null,
  "immutableStorageWithVersioning": null,
  "isHnsEnabled": null,
  "isLocalUserEnabled": null,
  "isSftpEnabled": null,
  "isSkuConversionBlocked": null,
  "keyCreationTime": {
    "key1": "2026-05-03T19:30:06.564067+00:00",
    "key2": "2026-05-03T19:30:06.564067+00:00"
  },
  "keyPolicy": null,
  "kind": "BlobStorage",
  "largeFileSharesState": null,
  "lastGeoFailoverTime": null,
  "location": "westeurope",
  "minimumTlsVersion": "TLS1_2",
  "name": "velero44dfh567h5gh",
  "networkRuleSet": {
    "bypass": "AzureServices",
    "defaultAction": "Allow",
    "ipRules": [],
    "ipv6Rules": [],
    "resourceAccessRules": null,
    "virtualNetworkRules": []
  },
  "placement": null,
  "primaryEndpoints": {
    "blob": "https://velero44dfh567h5gh.blob.core.windows.net/",
    "dfs": "https://velero44dfh567h5gh.dfs.core.windows.net/",
    "file": null,
    "internetEndpoints": null,
    "ipv6Endpoints": null,
    "microsoftEndpoints": null,
    "queue": null,
    "table": "https://velero44dfh567h5gh.table.core.windows.net/",
    "web": null
  },
  "primaryLocation": "westeurope",
  "privateEndpointConnections": [],
  "provisioningState": "Succeeded",
  "publicNetworkAccess": null,
  "resourceGroup": "velero",
  "routingPreference": null,
  "sasPolicy": null,
  "secondaryEndpoints": null,
  "secondaryLocation": null,
  "sku": {
    "name": "Standard_LRS",
    "tier": "Standard"
  },
  "statusOfPrimary": "available",
  "statusOfSecondary": null,
  "storageAccountSkuConversionStatus": null,
  "tags": {},
  "type": "Microsoft.Storage/storageAccounts",
  "zones": null
}

List all storage

az storage account list --output table
AccessTier    AllowBlobPublicAccess    AllowCrossTenantReplication    CreationTime                      EnableHttpsTrafficOnly    Kind         Location    MinimumTlsVersion    Name                PrimaryLocation    ProvisioningState    ResourceGroup    StatusOfPrimary
------------  -----------------------  -----------------------------  --------------------------------  ------------------------  -----------  ----------  -------------------  ------------------  -----------------  -------------------  ---------------  -----------------
Hot           False                    False                          2026-05-03T19:30:06.161353+00:00  True                      BlobStorage  westeurope  TLS1_2               velero44dfh567h5gh  westeurope         Succeeded            velero           available

List storage

#  --name "$AZURE_STORAGE_ACCOUNT_ID" \
az storage account show \
  --name "velero44dfh567h5gh" \
  --resource-group "$AZURE_BACKUP_RESOURCE_GROUP" \
  --query "{name:name, location:location, sku:sku.name, kind:kind, accessTier:accessTier, httpsOnly:enableHttpsTrafficOnly, minTls:minimumTlsVersion}" \
  --output table
Name                Location    Sku           Kind         AccessTier    HttpsOnly    MinTls
------------------  ----------  ------------  -----------  ------------  -----------  --------
velero44dfh567h5gh  westeurope  Standard_LRS  BlobStorage  Hot           True         TLS1_2

🔹Create the blob container

────────────────────────────────────────────
Pull the storage account key once

ACCOUNT_KEY=$(az storage account keys list \
                 --account-name "$AZURE_STORAGE_ACCOUNT_ID" \
                 --resource-group "$AZURE_BACKUP_RESOURCE_GROUP" \
                 --query "[0].value" -o tsv)
echo ${ACCOUNT_KEY}
bhsfgZJd-gkjae79kdgmkasfjk+fgghghflks56GJKJDSSA67jdd+htghlllsjhre46hgfdd2klj3i45fo45QW==

Create the blob container

az storage container create \
  --account-name "$AZURE_STORAGE_ACCOUNT_ID" \
  --name "$BLOB_CONTAINER" \
  --public-access off \
  --account-key "$ACCOUNT_KEY"

Verify

az storage container list \
  --account-name "$AZURE_STORAGE_ACCOUNT_ID" \
  --account-key "$ACCOUNT_KEY" \
  --output table
Name    Lease Status    Last Modified
------  --------------  -------------------------
edok3s                  2026-05-03T19:56:01+00:00

🔹Velero credentials file

────────────────────────────────────────────
Save the same key for Velero's use

AZURE_STORAGE_ACCOUNT_ACCESS_KEY="$ACCOUNT_KEY"
echo ${AZURE_STORAGE_ACCOUNT_ACCESS_KEY}

# -- Write .credentials-velero
cat << EOF > ./.credentials-velero
AZURE_STORAGE_ACCOUNT_ACCESS_KEY=${AZURE_STORAGE_ACCOUNT_ACCESS_KEY}
AZURE_CLOUD_NAME=AzurePublicCloud
EOF

Protect and list

chmod 600 -v ./.credentials-velero
ls -l
cat .credentials-velero

Example: AZURE S3 bucket with Velero backups


🔹Velero CLI


🔹Install CLI

Install CLI

VELERO_VERSION=v1.18.0

# -- Download
cd /tmp
wget https://github.com/vmware-tanzu/velero/releases/download/${VELERO_VERSION}/velero-${VELERO_VERSION}-linux-amd64.tar.gz

# -- Install
ls -alh   velero-${VELERO_VERSION}-linux-amd64.tar.gz
tar -tvzf velero-${VELERO_VERSION}-linux-amd64.tar.gz
tar -xvzf velero-${VELERO_VERSION}-linux-amd64.tar.gz

sudo mv -v velero-${VELERO_VERSION}-linux-amd64/velero /usr/local/bin/


# -- Clean Up
rm -rf /tmp/velero-${VELERO_VERSION}-linux-amd64*

Check version

velero version --client-only
Client:
    Version: v1.18.0
    Git commit: 6adcf06b5b0e6fb93998d3e101e2cbdc134fa3c3

🔹Install Velero on k3s

Path

cd ~/Documents/hetzner-k3s/edo

Env. variables
See the .env file below.

Kubeconfig

source .env
export KUBECONFIG="${HOME}/.kube/config-${CLUSTER_NAME}.yml"

Check dependencies

echo "VER_PLUGIN_AZURE=$VER_PLUGIN_AZURE"
echo "AZURE_SUBSCRIPTION_ID=$AZURE_SUBSCRIPTION_ID"
echo "AZURE_TENANT_ID=$AZURE_TENANT_ID"
echo "AZURE_BACKUP_RESOURCE_GROUP=$AZURE_BACKUP_RESOURCE_GROUP"
echo "BLOB_CONTAINER=$BLOB_CONTAINER"
echo "AZURE_STORAGE_ACCOUNT_ID=$AZURE_STORAGE_ACCOUNT_ID"

test -f .credentials-velero && echo "File exists: .credentials-velero" || f_error "File .credentials-velero DOES NOT EXIST"

Check installation

kubectl -n velero get all

Check backupstoragelocation Available

kubectl -n velero get backupstoragelocation default -o jsonpath='{.status.phase}'

kubectl -n velero get backupstoragelocation default -o jsonpath='{.status.phase}' | grep -qx 'Available'
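The one-shot grep above fails if you check before the Azure plugin has finished initializing. A polling variant is sketched below; the function names and the ~2 minute timeout are my assumptions, and the cluster call is isolated in one function so the loop can be tested offline:

```shell
# get_bsl_phase is the only call that touches the cluster; override it to test offline
get_bsl_phase() {
  kubectl -n velero get backupstoragelocation default -o jsonpath='{.status.phase}' 2>/dev/null
}

# Poll until the BackupStorageLocation reports Available, up to 24 checks x 5s
wait_for_bsl() {
  local i phase
  for i in {1..24}; do
    phase=$(get_bsl_phase)
    if [[ "$phase" == "Available" ]]; then
      echo "BackupStorageLocation Available (check $i)"
      return 0
    fi
    sleep 5
  done
  echo "BackupStorageLocation still '$phase' after 2 minutes" >&2
  return 1
}
```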

Check Logs

kubectl -n velero logs deployment/velero

🔹List Backup

echo "----- List Velero Backups and Describe Last Backup ----------------------------"
prev_count=0
for i in {1..30}; do
    curr_count=$(velero backup get -o json 2>/dev/null | jq '[.items[] | select(.status.phase == "Completed")] | length')
    if [[ "$curr_count" -gt 0 && "$curr_count" -eq "$prev_count" ]]; then
        break
    fi
    prev_count="$curr_count"
    echo "  Attempt $i: Found $curr_count backups, waiting for sync to settle..."
    sleep 5
done
velero backup get
VELERO_LAST_BACKUP_NAME=$(velero backup get -o json 2>/dev/null | \
    jq -r '[.items[] | select(.status.phase == "Completed")] | sort_by(.status.startTimestamp) | last | .metadata.name // empty')
echo ""

velero backup describe "${VELERO_LAST_BACKUP_NAME}"
# velero backup logs   "${VELERO_LAST_BACKUP_NAME}"
echo ""
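The jq pipeline that picks the latest completed backup can be checked offline against a mocked `velero backup get -o json` response (the backup names below are made up for illustration):

```shell
# Mock of `velero backup get -o json` with two Completed backups and one Failed
MOCK_JSON='{"items":[
  {"metadata":{"name":"edok3s-backup-20260101-0900"},"status":{"phase":"Completed","startTimestamp":"2026-01-01T09:00:00Z"}},
  {"metadata":{"name":"edok3s-backup-20260301-0900"},"status":{"phase":"Completed","startTimestamp":"2026-03-01T09:00:00Z"}},
  {"metadata":{"name":"failed-one"},"status":{"phase":"Failed","startTimestamp":"2026-04-01T09:00:00Z"}}
]}'

# Same filter as in the script: latest Completed backup by startTimestamp
echo "$MOCK_JSON" | jq -r \
  '[.items[] | select(.status.phase == "Completed")] | sort_by(.status.startTimestamp) | last | .metadata.name // empty'
# → edok3s-backup-20260301-0900
```

Note that the Failed backup is filtered out before sorting, so a newer failed run never shadows the latest good one.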

🔹File structure

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
File tree

├── cluster.yml                        # hetzner-k3s configuration file
├── .credentials-velero                # Azure credentials for S3
├── .env                               # Your configuration for init_cluster.sh
├── gateway-post.yml                   # k3s Gateway configuration
├── init_cluster.sh
├── kubeconfig                         # File created by hetzner-k3s --config. Don't use it
├── nginx.yml                          # Sets up the nginx reverse proxy for /prometheus and /grafana
├── values_kube-prometheus-stack.yml   # Prometheus/Grafana configuration
└── values_traefik-default.yml         # Traefik configuration

🔹1. init_cluster.sh

#!/bin/bash

# -----------------------------------------------------------------------------
# FLOW:
#   - Check for required arguments
#   - Source .env config; define helper functions (f_error, f_az_login_check)
#     If OPTION is "init" or "restore":
#       - Create k3s cluster with hetzner-k3s
#       - Get kubeconfig from cluster master and update server address to use public IP
#       - Install Velero with Azure plugin
#     If OPTION is "init":
#       - Install Traefik (Gateway API mode)
#       - Install kube-prometheus-stack with Helm
#       - Deploy nginx reverse proxy (nginx.yml: Namespace, ConfigMap, Secret, Service, Deployment)
#       - Post steps: generate self-signed TLS cert → secret tls-traefik; apply gateway-post.yml
#     If OPTION is "delete":
#       - Set all PVs to Retain
#       - Create Velero backup (TTL 6 months)
#       - Delete all servers (with confirmation prompt)
#     If OPTION is "restore":
#       - Wait for Velero to sync backups from Azure storage
#       - Find latest completed backup
#       - Create Velero restore (excludes: kube-system, kube-public, kube-node-lease, velero)
#     If OPTION is "init" or "costs":
#       - Print costs report (Hetzner servers + load balancers + volumes)
#     Note: costs report also runs automatically at the end of "init"
# -----------------------------------------------------------------------------



# -- Check for required arguments ---------------------------------------------
# OPTION argument is required
if [[ -z "${1:-}" ]]; then
    echo "Error: OPTION is required"   >&2
    echo "Usage: $0 <init|delete|restore|costs>" >&2
    exit 1
fi

# OPTION must be one of: init, delete, restore, costs
if [[ "$1" != "init" && "$1" != "delete" && "$1" != "restore" && "$1" != "costs" ]]; then
    echo "Error: OPTION must be 'init', 'delete', 'restore' or 'costs', got '$1'" >&2
    exit 1
fi
# -----------------------------------------------------------------------------


# -- Include Configs ----------------------------------------------------------
CONFIG_FILE="$(dirname "$0")/.env"
[[ -f "$CONFIG_FILE" ]] || { echo "Error: $CONFIG_FILE not found"; exit 1; }
# shellcheck source=/dev/null
source "$CONFIG_FILE"
# -----------------------------------------------------------------------------


# ----- f_error msg -----------------------------------------------------------
f_error() {
  echo "*** Error: $1"
  exit 1
}
# -----------------------------------------------------------------------------


# ----- f_az_login_check ------------------------------------------------------
f_az_login_check() {
    local current_subscription_id current_tenant_id
    current_subscription_id=$(az account show --query id -o tsv 2>/dev/null)
    current_tenant_id=$(az account show --query tenantId -o tsv 2>/dev/null)

    [[ -z "$current_subscription_id" ]] && f_error "Not logged into Azure. Run: az login"
    [[ "$current_subscription_id" != "$AZURE_SUBSCRIPTION_ID" ]] && \
        f_error "Azure subscription mismatch: current=$current_subscription_id, expected=$AZURE_SUBSCRIPTION_ID"

    [[ "$current_tenant_id" != "$AZURE_TENANT_ID" ]] && \
        f_error "Azure tenant mismatch: current=$current_tenant_id, expected=$AZURE_TENANT_ID"

    echo "Azure login verified: subscription=$current_subscription_id, tenant=$current_tenant_id"
}
# -----------------------------------------------------------------------------


if [[ "$1" == "init" || "$1" == "restore" ]]; then

    # -- Create k3s -----------------------------------------------------------
    hcloud context use edok3s || f_error "Failed to switch context to edok3s"
    echo "-----  Hetzner Context List ---------------------------------------------------"
    hcloud context list
    echo ""

    echo "----- Create Cluster ----------------------------------------------------------"
    hetzner-k3s create --config <(envsubst < cluster.yml) || f_error "Failed to create cluster with hetzner-k3s"
    echo ""
    # -------------------------------------------------------------------------

    # -- Kubeconf -------------------------------------------------------------
    IP_CLUSTER_MASTER=$(hcloud server describe "${CLUSTER_NAME}"-master1 -o format='{{.PublicNet.IPv4.IP}}')

    # Get kubeconfig from cluster master and update server address to use public IP
    ssh -p 8512 \
      -i ${SSH_KEY} \
      -o UserKnownHostsFile=/dev/null \
      -o StrictHostKeyChecking=no \
      root@${IP_CLUSTER_MASTER} \
      'cat /etc/rancher/k3s/k3s.yaml' \
        | sed "s%server: https://127.0.0.1:6443%server: https://${IP_CLUSTER_MASTER}:6443%g" \
        > "${HOME}/.kube/config-${CLUSTER_NAME}".yml


    echo "----- Kubeconfig --------------------------------------------------------------"
    cat ${HOME}/.kube/config-${CLUSTER_NAME}.yml
    export KUBECONFIG="${HOME}/.kube/config-${CLUSTER_NAME}.yml"
    echo ""

    echo "----- Test kubectl ------------------------------------------------------------"
    for i in {1..5}; do
      kubectl get all -A && break || { echo "Attempt $i/5 failed, retrying in 3s..."; sleep 3; }
    done            || f_error "Failed to get all resources in cluster with kubectl"
    kubectl version || f_error "Failed to connect to cluster with kubectl"


    # -- Set pool=master label for control-plane nodes
    kubectl get nodes -l node-role.kubernetes.io/control-plane \
            -o name | xargs -I{} kubectl label {} pool=master --overwrite || f_error "Failed to label control-plane nodes with pool=master"

    echo ""
    # -------------------------------------------------------------------------


    # -- VELERO Install -------------------------------------------------------
    echo "VER_PLUGIN_AZURE=$VER_PLUGIN_AZURE"
    echo "AZURE_SUBSCRIPTION_ID=$AZURE_SUBSCRIPTION_ID"
    echo "AZURE_TENANT_ID=$AZURE_TENANT_ID"
    echo "AZURE_BACKUP_RESOURCE_GROUP=$AZURE_BACKUP_RESOURCE_GROUP"
    echo "BLOB_CONTAINER=$BLOB_CONTAINER"
    echo "AZURE_STORAGE_ACCOUNT_ID=$AZURE_STORAGE_ACCOUNT_ID"

    test -f .credentials-velero && echo "File exists: .credentials-velero" || f_error "File .credentials-velero DOES NOT EXIST"


    velero install \
      --provider azure \
      --plugins velero/velero-plugin-for-microsoft-azure:"$VER_PLUGIN_AZURE" \
      --bucket "$BLOB_CONTAINER" \
      --secret-file .credentials-velero \
      --backup-location-config "storageAccount=$AZURE_STORAGE_ACCOUNT_ID,storageAccountKeyEnvVar=AZURE_STORAGE_ACCOUNT_ACCESS_KEY,subscriptionId=$AZURE_SUBSCRIPTION_ID" \
      --use-volume-snapshots=false \
      --wait || f_error "Failed to install Velero with Azure plugin"

    # -- Add nodeSelector for Velero
    kubectl patch deployment velero -n velero -p \
      '{"spec": {"template": {"spec": {"nodeSelector": {"pool": "master"}}}}}'

    kubectl -n velero get all

    if kubectl -n velero get backupstoragelocation default -o jsonpath='{.status.phase}' | grep -qx 'Available'; then
      echo "Velero backupstoragelocation Available"
    else
      kubectl -n velero logs deployment/velero
      f_error "Velero backupstoragelocation Not Available"
    fi

    # -------------------------------------------------------------------------
fi



###### OPTION: init ##################################################################################################
if [[ "$1" == "init" ]]; then
    # -- 01 FUNCTION: Traefik Install -----------------------------------------
    f_Traefik() {
        helm repo add traefik https://traefik.github.io/charts
        helm repo update

        # -- Gateway API CRDs (required; newer chart versions no longer ship them)
        kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml || f_error "Failed to install Gateway API CRDs"

        # -- Install Traefik with custom values
        #  --skip-crds \
        helm upgrade --install traefik traefik/traefik \
          -n traefik \
          --create-namespace \
          --wait --timeout 5m \
          -f <(envsubst < "$(dirname "$0")/values_traefik-default.yml") || f_error "Failed to install Traefik with Helm"


        echo "----- Check Traefik status ----------------------------------------------------"
        kubectl -n traefik get all
        echo "----- Check CRDs Gateway ------------------------------------------------------"
        kubectl get crd | grep gateway.networking.k8s.io
        echo "----- Check Gatewayclass ------------------------------------------------------"
        kubectl get gatewayclass
    }
    # -------------------------------------------------------------------------


    # -- 02 FUNCTION: PROMETHEUS/GRAFANA stack Install ------------------------
    f_Prometheus_Stack() {
        helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
        helm repo update

        echo "----- List available versions of kube-prometheus-stack ------------------------"
        helm search repo prometheus-community/kube-prometheus-stack --versions  | head -n 20
        echo ""

        # -- Install kube-prometheus-stack with custom values
        helm upgrade --install \
          --version "${VER_PROMETHEUS_STACK}" \
          -n monitoring \
          --create-namespace \
          --wait --timeout 5m \
          kube-prometheus-stack prometheus-community/kube-prometheus-stack \
          -f <(envsubst < "$(dirname "$0")/values_kube-prometheus-stack.yml") || f_error "Failed to install kube-prometheus-stack with Helm"

        echo "----- Check kube-prometheus-stack status --------------------------------------"
        kubectl --namespace monitoring get all
    }
    # -------------------------------------------------------------------------


    # -- 03 FUNCTION: Nginx ---------------------------------------------------
    f_Nginx() {
        # envsubst with explicit var list — expands only ${PASSWORD_GUI}, leaves nginx $host/$remote_addr etc. untouched
        kubectl apply -f <(envsubst '${PASSWORD_GUI}' < "$(dirname "$0")/nginx.yml") || f_error "Failed to apply Nginx"

        echo "----- Check Nginx status ------------------------------------------------------"
        kubectl -n nginx get all
    }
    # -------------------------------------------------------------------------


    # -- FUNCTION: Post Restore Steps -----------------------------------------
    f_Post_Restore() {
        # -- Generate cert ----------------------
        echo "*** Generate TLS cert for Traefik with CN=${CLUSTER_NAME} and SAN DNS:${CLUSTER_NAME} ..."
        openssl req -x509 -nodes -days 3650 \
          -newkey rsa:2048 \
          -keyout tls.key -out tls.crt \
          -subj "/CN=${CLUSTER_NAME}" \
          -addext "subjectAltName=DNS:${CLUSTER_NAME}"

        # -- Create the secret Traefik expects
        kubectl create secret tls tls-traefik \
          -n traefik \
          --cert=tls.crt \
          --key=tls.key

        # -- Cleanup
        rm tls.key tls.crt
        # ---------------------------------------

        echo "*** Post restore steps for cluster ${CLUSTER_NAME} ..."
        kubectl apply -f <(envsubst < "$(dirname "$0")/gateway-post.yml")
    }
    # -------------------------------------------------------------------------

    f_Traefik
    f_Prometheus_Stack
    f_Nginx
    f_Post_Restore
fi



###### OPTION: delete ################################################################################################
if [[ "$1" == "delete" ]]; then
    export KUBECONFIG="${HOME}/.kube/config-${CLUSTER_NAME}.yml"

    # -- FUNCTION: Set All PV Retain -----------------------------------------
    f_Set_All_PV_Retain() {
        echo "*** Set all PVs to Retain for cluster ${CLUSTER_NAME} ..."
        kubectl get pv -o name | xargs -I{} kubectl patch {} \
            -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' || f_error "Failed to patch PVs"

        echo "----- Get PV ------------------------------------------------------------------"
        kubectl get pv
        echo ""
        kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,HETZNER:.spec.csi.volumeHandle
        echo ""
    }
    # -------------------------------------------------------------------------


    # -- FUNCTION: Velero Backup ----------------------------------------------
    f_Velero_Backup() {
        echo "*** Create Velero backup for cluster ${CLUSTER_NAME} ..."
        # 6 months
        velero backup create "${CLUSTER_NAME}-backup-$(date +%Y%m%d-%H%M)" \
            --include-cluster-resources=true \
            --include-namespaces "*" \
            --ttl 4380h --wait || f_error "Failed to create Velero backup for cluster ${CLUSTER_NAME}"

        echo "----- List Velero Backups and Describe Last Backup ----------------------------"
        prev_count=0
        for i in {1..30}; do
            curr_count=$(velero backup get -o json 2>/dev/null | jq '[.items[] | select(.status.phase == "Completed")] | length')
            if [[ "$curr_count" -gt 0 && "$curr_count" -eq "$prev_count" ]]; then
                break
            fi
            prev_count="$curr_count"
            echo "  Attempt $i: Found $curr_count backups, waiting for sync to settle..."
            sleep 5
        done
        velero backup get
        VELERO_LAST_BACKUP_NAME=$(velero backup get -o json 2>/dev/null | \
            jq -r '[.items[] | select(.status.phase == "Completed")] | sort_by(.status.startTimestamp) | last | .metadata.name // empty')
        echo ""

        velero backup describe "${VELERO_LAST_BACKUP_NAME}"
        # velero backup logs   "${VELERO_LAST_BACKUP_NAME}"
        echo ""
    }
    # -------------------------------------------------------------------------


    # -- FUNCTION: Delete all servers in current project ----------------------
    f_Delete_Server() {
        local servers
        servers=$(hcloud server list -o noheader -o columns=name)

        if [[ -z "$servers" ]]; then
            echo "*** No servers found in current project"
            return
        fi

        echo "*** Servers to delete:"
        echo "$servers"
        echo ""
        read -r -p "Delete all servers listed above? [y/N] " reply
        [[ "${reply,,}" == "y" ]] || { echo "Aborted."; return; }

        while IFS= read -r server; do
            echo "*** Deleting server: $server ..."
            hcloud server delete "$server" || echo "Warning: failed to delete $server" &
        done <<< "$servers"
        wait
    }
    # -------------------------------------------------------------------------


    f_Set_All_PV_Retain
    f_Velero_Backup
    f_Delete_Server
fi



###### OPTION: restore ###############################################################################################
if [[ "$1" == "restore" ]]; then
    echo "*** Waiting for Velero to discover backups from Azure storage ..."
    prev_count=0
    for i in {1..30}; do
        curr_count=$(velero backup get -o json 2>/dev/null | jq '[.items[] | select(.status.phase == "Completed")] | length')
        if [[ "$curr_count" -gt 0 && "$curr_count" -eq "$prev_count" ]]; then
            break
        fi
        prev_count="$curr_count"
        echo "  Attempt $i: Found $curr_count backups, waiting for sync to settle..."
        sleep 5
    done
    VELERO_LAST_BACKUP_NAME=$(velero backup get -o json 2>/dev/null | \
        jq -r '[.items[] | select(.status.phase == "Completed")] | sort_by(.status.startTimestamp) | last | .metadata.name // empty')
    [[ -z "$VELERO_LAST_BACKUP_NAME" ]] && f_error "No Velero backups found after waiting"
    echo "Found backup: ${VELERO_LAST_BACKUP_NAME}"
    echo ""

    echo "----- Get All Velero Backups --------------------------------------------------"
    velero backup get

    echo "*** Create Velero restore for cluster ${CLUSTER_NAME} from backup ${VELERO_LAST_BACKUP_NAME} ..."
    velero restore create "${CLUSTER_NAME}-restore-$(date +%Y%m%d-%H%M)" \
      --from-backup "$VELERO_LAST_BACKUP_NAME" \
      --include-cluster-resources=true \
      --exclude-namespaces kube-system,kube-public,kube-node-lease,velero \
      --wait || f_error "Failed to create Velero restore"
fi


###### OPTION: costs ################################################################################################
if [[ "$1" == "init" || "$1" == "costs" ]]; then
IP_CLUSTER_MASTER=$(hcloud server describe "${CLUSTER_NAME}"-master1 -o format='{{.PublicNet.IPv4.IP}}')
echo "Kubernetes $CLUSTER_NAME Master IP address: $IP_CLUSTER_MASTER"
echo ""

PRICE_PER_GB_MO_NET="0.044"

{
  printf "%-8s %-10s %-42s %-12s %5s %7s %-6s %10s %10s\n" \
    "TYPE" "ID" "NAME" "MODEL" "CPU" "RAM_GB" "SITE" "PRICE/h" "PRICE/MO"

  {
    jq -s -r '
      .[0] as $servers
      | .[1] as $types
      | $servers[] as $s
      | ($s.datacenter.location.name // $s.location.name) as $loc
      | ($types[] | select(.id == $s.server_type.id)) as $t
      | ($t.prices[] | select(.location == $loc)) as $p
      | [
          "SERVER",
          $s.id,
          $s.name,
          $t.name,
          $t.cores,
          $t.memory,
          $loc,
          $p.price_hourly.net,
          $p.price_monthly.net
        ]
      | @tsv
    ' \
      <(hcloud server list -o json) \
      <(hcloud server-type list -o json)

    jq -s -r '
      .[0] as $lbs
      | .[1] as $types
      | $lbs[] as $lb
      | ($lb.location.name) as $loc
      | ($types[] | select(.id == $lb.load_balancer_type.id)) as $t
      | ($t.prices[] | select(.location == $loc)) as $p
      | [
          "LB",
          $lb.id,
          $lb.name,
          $t.name,
          "-",
          "-",
          $loc,
          $p.price_hourly.net,
          $p.price_monthly.net
        ]
      | @tsv
    ' \
      <(hcloud load-balancer list -o json) \
      <(hcloud load-balancer-type list -o json)

    hcloud volume list -o json |
    jq -r '
      .[]
      | [
          "VOLUME",
          .id,
          .name,
          ((.size | tostring) + "GB"),
          "-",
          .size,
          (.location.name // "-"),
          "0",
          "0"
        ]
      | @tsv
    ' |
    LC_ALL=C awk -F '\t' -v price_gb_mo="$PRICE_PER_GB_MO_NET" '
      {
        size = $6 + 0
        price_mo = size * price_gb_mo
        price_h = price_mo / 730

        printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%.4f\t%.2f\n",
          $1, $2, $3, $4, $5, $6, $7, price_h, price_mo
      }
    '
  } |
  LC_ALL=C awk -F '\t' '
    {
      printf "%-8s %-10s %-42s %-12s %5s %7s %-6s %10.4f %10.2f\n",
        $1, $2, $3, $4, $5, $6, $7, $8 + 0, $9 + 0
    }
  '
}  
fi
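Both the delete and restore paths pick the newest usable backup with the same jq filter: keep only backups whose phase is `Completed`, sort by start timestamp, take the last. The filter can be exercised standalone against canned `velero backup get -o json` output — the sample items below are made up for illustration:

```shell
# Sample shaped like `velero backup get -o json`; names/timestamps are hypothetical
sample='{"items":[
  {"metadata":{"name":"edok3s-backup-20240101-0900"},
   "status":{"phase":"Completed","startTimestamp":"2024-01-01T09:00:00Z"}},
  {"metadata":{"name":"edok3s-backup-20240302-0900"},
   "status":{"phase":"PartiallyFailed","startTimestamp":"2024-03-02T09:00:00Z"}},
  {"metadata":{"name":"edok3s-backup-20240201-0900"},
   "status":{"phase":"Completed","startTimestamp":"2024-02-01T09:00:00Z"}}
]}'

# Same filter as the script: newest *Completed* backup wins; the newer
# PartiallyFailed backup from March is deliberately skipped
latest=$(printf '%s' "$sample" | jq -r \
  '[.items[] | select(.status.phase == "Completed")]
   | sort_by(.status.startTimestamp) | last | .metadata.name // empty')
echo "$latest"   # edok3s-backup-20240201-0900
```

Selecting by `startTimestamp` rather than by name keeps the logic correct even if the backup naming scheme changes later.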

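The Hetzner API returns prices for servers and load balancers but not per volume, so the costs report derives volume prices from the flat `PRICE_PER_GB_MO_NET` constant (€0.044 per GB/month net) and Hetzner's 730-hour billing month. The same arithmetic standalone, for the 10 GB PVC from the example output:

```shell
# Volume cost arithmetic from the costs report:
#   monthly = size_gb * price_per_gb_month, hourly = monthly / 730
size_gb=10
price_gb_mo=0.044   # PRICE_PER_GB_MO_NET from the script (net, EUR)
awk -v s="$size_gb" -v p="$price_gb_mo" \
  'BEGIN { mo = s * p; printf "PRICE/h %.4f  PRICE/MO %.2f\n", mo / 730, mo }'
# → PRICE/h 0.0006  PRICE/MO 0.44
```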
🔹2. .env

# -- Configuration ------------------------------------------------------------

export CLUSTER_NAME=edok3s                                               # Cluster name; Nginx Basic Auth password is "!${CLUSTER_NAME}!"
export SSH_KEY='/mnt/aaa/bbb/.ssh/id_ed25519_ccc'                        # SSH key for cluster node access; must be added to the Hetzner project
export PASSWORD_GUI="$(htpasswd -nbBC 10 admin '!'${CLUSTER_NAME}'!')"   # htpasswd line (bcrypt, cost 10) for Nginx Basic Auth


# az login
# AZURE_SUBSCRIPTION_ID=$(az account list --all --query '[?isDefault].id' -o tsv)
# AZURE_TENANT_ID=$(az account list --all --query '[?isDefault].tenantId' -o tsv)
# AZURE_BACKUP_RESOURCE_GROUP=velero
# BLOB_CONTAINER=edok3s
# AZURE_STORAGE_ACCOUNT_ID="velero$(uuidgen | cut -d '-' -f5 | tr '[A-Z]' '[a-z]')"

# Release: https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/releases
export VER_PLUGIN_AZURE='v1.14.0'

export AZURE_SUBSCRIPTION_ID='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
export AZURE_TENANT_ID='yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy'
export AZURE_BACKUP_RESOURCE_GROUP='velero'
export BLOB_CONTAINER='edok3s'
export AZURE_STORAGE_ACCOUNT_ID='velero44dfh567h5gh'


export VER_PROMETHEUS_STACK='84.5.0'

# -----------------------------------------------------------------------------
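`PASSWORD_GUI` is a full htpasswd line (`admin:<bcrypt hash>`, cost 10); the plaintext follows the `!<cluster name>!` convention noted in the comment. If `htpasswd` (apache2-utils) is not installed, `openssl passwd` can produce a compatible line — a sketch using openssl's apr1/MD5 scheme rather than bcrypt:

```shell
CLUSTER_NAME=edok3s
PLAIN="!${CLUSTER_NAME}!"                          # password convention from .env
# htpasswd-compatible "user:hash" line; apr1 (MD5-based), not bcrypt
PASSWORD_GUI="admin:$(openssl passwd -apr1 "$PLAIN")"
echo "$PASSWORD_GUI"
```

Nginx validates apr1 hashes everywhere; bcrypt support depends on the system `crypt()` (libxcrypt on modern Linux provides it), which is why the author's bcrypt line also works.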

🔹3. cluster.yml

# Quick Start:       https://github.com/vitobotta/hetzner-k3s#quick-start
# Full cluster.yaml: https://vitobotta.github.io/hetzner-k3s/Creating_a_cluster/

# hetzner-k3s create --config cluster.yml

hetzner_token: gfghxdhdgh01OwjffjjwugQ0G6kkhkheo01d4huI7T8p7Px1kcLpmgV3gnkGX965430FLDu58wqUe3VAs
cluster_name: ${CLUSTER_NAME}                                                  # Add name same as Hetzner project name
kubeconfig_path: "./kubeconfig"
k3s_version: v1.35.3+k3s1                                                      # https://github.com/k3s-io/k3s/releases | https://docs.k3s.io/

networking:
  ssh:
    port: 8512
    use_agent: false                                                           # set to true if your key has a passphrase
    use_private_ip: false                                                      # set to true to connect to nodes via their private IPs
    public_key_path: "${SSH_KEY}.pub"
    private_key_path: "${SSH_KEY}"
  allowed_networks:
    ssh:
      - 78.79.224.0/19                                                         # My1
      - 1.2.3.4/32                                                             # My2
      # - 0.0.0.0/0
    api: # this will firewall port 6443 on the nodes
      - 78.79.224.0/19                                                         # My1
      - 1.2.3.4/32                                                             # My2
      # - 0.0.0.0/0
    # OPTIONAL: define extra inbound/outbound firewall rules.
    # Each entry supports the following keys:
    #   description (string, optional)
    #   direction   (in | out, default: in)
    #   protocol    (tcp | udp | icmp | esp | gre, default: tcp)
    #   port        (single port "80", port range "30000-32767", or "any") – only relevant for tcp/udp
    #   source_ips  (array of CIDR blocks) – required when direction is in
    #   destination_ips (array of CIDR blocks) – required when direction is out
    #
    # IMPORTANT: Outbound traffic is allowed by default (implicit allow-all).
    # If you add **any** outbound rule (direction: out), Hetzner Cloud switches
    # the outbound chain to an implicit **deny-all**; only traffic matching your
    # outbound rules will be permitted. Define outbound rules carefully to avoid
    # accidentally blocking required egress (DNS, updates, etc.).
    # NOTE: Hetzner Cloud Firewalls support **max 50 entries per firewall**. The built-
    # in rules (SSH, ICMP, node-port ranges, etc.) use ~10 slots. If the sum of the
    # default rules plus your custom ones exceeds 50, hetzner-k3s will abort with
    # an error.
    custom_firewall_rules:
      - description: "Allow MY own IP"
        direction: in
        protocol: tcp
        port: "443"
        source_ips:
        - 78.79.224.0/19
        - 1.2.3.4/32
      # -- List Cloudflare public IPv4 addresses
      # curl -s https://www.cloudflare.com/ips-v4 \
      #   | sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4 \
      #   | awk 'BEGIN {
      #          print "      - description: \"Allow Cloudflare HTTPS IPv4\""
      #          print "        direction: in"
      #          print "        protocol: tcp"
      #          print "        port: \"443\""
      #          print "        source_ips:"
      #     } /^[0-9]/ { print "        - " $0 }'
      - description: "Allow Cloudflare HTTPS IPv4"
        direction: in
        protocol: tcp
        port: "443"
        source_ips:
        - 103.21.244.0/22
        - 103.22.200.0/22
        - 103.31.4.0/22
        - 104.16.0.0/13
        - 104.24.0.0/14
        - 108.162.192.0/18
        - 131.0.72.0/22
        - 141.101.64.0/18
        - 162.158.0.0/15
        - 172.64.0.0/13
        - 173.245.48.0/20
        - 188.114.96.0/20
        - 190.93.240.0/20
        - 197.234.240.0/22
        - 198.41.128.0/17
    # - description: "Allow HTTP from any IPv4"
    #   direction: in
    #   protocol: tcp
    #   port: 80
    #   source_ips:
    #     - 0.0.0.0/0
    #   - description: "UDP game servers (outbound)"
    #     direction: out
    #     protocol: udp
    #     port: 60000-60100
    #     destination_ips:
    #       - 203.0.113.0/24
  node_port_firewall_enabled: false                                             # optional: set false to disable NodePort firewall rules (TCP/UDP)
  # node_port_range: "30000-32767"                                              # optional: NodePort range to open on firewalls (TCP/UDP)
  public_network:
    ipv4: true
    ipv6: false
    # hetzner_ips_query_server_url: https://.. # for large clusters, see https://github.com/vitobotta/hetzner-k3s/blob/main/docs/Recommendations.md
    # use_local_firewall: false                # for large clusters, see https://github.com/vitobotta/hetzner-k3s/blob/main/docs/Recommendations.md
  private_network:
    enabled: true
    subnet: 10.0.0.0/16
    existing_network_name: ""
  cni:
    enabled: true
    encryption: false
    mode: flannel
    cilium:
      # Optional: specify a path to a custom values file for Cilium Helm chart
      # When specified, this file will be used instead of the default values
      # helm_values_path: "./cilium-values.yaml"
      # chart_version: "v1.17.2"

  # cluster_cidr: 10.244.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for pod IPs
  # service_cidr: 10.43.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for service IPs. Warning, if you change this, you should also change cluster_dns!
  # cluster_dns: 10.43.0.10 # optional: IPv4 Cluster IP for coredns service. Needs to be an address from the service_cidr range

datastore:
  mode: etcd                                                                   # etcd (default) or external
  # external_datastore_endpoint: postgres://....
#  etcd:
#    # etcd snapshot configuration (optional)
#    snapshot_retention: 24
#    snapshot_schedule_cron: "0 * * * *"
#
#    # S3 snapshot configuration (optional)
#    s3_enabled: false
#    s3_endpoint: "" # Can also be set with ETCD_S3_ENDPOINT environment variable
#    s3_region: "" # Can also be set with ETCD_S3_REGION environment variable
#    s3_bucket: "" # Can also be set with ETCD_S3_BUCKET environment variable
#    s3_access_key: "" # Can also be set with ETCD_S3_ACCESS_KEY environment variable
#    s3_secret_key: "" # Can also be set with ETCD_S3_SECRET_KEY environment variable
#    s3_folder: ""
#    s3_force_path_style: false

schedule_workloads_on_masters: true                                            # set to true to allow pods to be scheduled on master nodes (useful for small clusters) | Single instance cluster

image: ubuntu-24.04                                                            # optional: default is ubuntu-24.04 | hcloud image list | awk 'NR==1{print; next} {print | "sort -k3,3"}'
# autoscaling_image: 103908130                                                 # optional, defaults to the `image` setting
# snapshot_os: microos                                                         # optional: specified the os type when using a custom snapshot

masters_pool:
  # cpx22 shared 2cpu/4GB/80GB
  # cpx32 shared 4cpu/16GB/160GB
  instance_type: cpx22                                                         # hcloud server-type list  | grep -E "ID|shared" | grep -v arm
  instance_count: 1                                                            # for HA; you can also create a single master cluster for dev and testing (not recommended for production)
  locations:                                                                   # You can choose a single location for single master clusters or if you prefer to have all masters in the same location. For regional clusters (which are only available in the eu-central network zone), each master needs to be placed in a separate location.
    - fsn1
    # - hel1
    # - nbg1
  image: ubuntu-24.04

worker_node_pools: []                                                          # Single instance Cluster

# worker_node_pools:
# - name: default
#   # hcloud server-type list | grep -v arm | grep fsn1
#   instance_type: cpx22
#   instance_count: 2
#   location: fsn1
#   image: ubuntu-24.04
#   labels: # Kubernetes labels to apply to nodes in this pool (for node selection in workloads)
#     - key: pool
#       value: default
#   # taints: # Kubernetes taints to apply to nodes in this pool (to repel pods unless they tolerate the taint)
#   #   - key: something
#   #     value: value1:NoSchedule
# # - name: medium-autoscaled
# #   instance_type: cpx32
# #   location: fsn1
# #   autoscaling:
# #     enabled: true
# #     min_instances: 0
# #     max_instances: 3

addons:
#   csi_driver:
#     enabled: true   # Hetzner CSI driver (default true). Set to false to skip installation.
#     manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.20.2/deploy/kubernetes/hcloud-csi.yml"
#   traefik:
#     enabled: false  # built-in Traefik ingress controller. Disabled by default.
  servicelb:
    enabled: true                                                             # built-in k3s ServiceLB (klipper-lb); disabled by default, enabled here as the integrated load balancer
  metrics_server:
    enabled: true                                                             # Kubernetes metrics-server addon; disabled by default, enabled here
#   cluster_autoscaler:
#     enabled: true                                                           # Cluster Autoscaler addon (default true). Set to false to omit autoscaling.
#     manifest_url: "https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/hetzner/examples/cluster-autoscaler-run-on-master.yaml"
#     container_image_tag: "v1.35.0"
#     scan_interval: "10s"                                                    # How often cluster is reevaluated for scale up or down
#     scale_down_delay_after_add: "10m"                                       # How long after scale up that scale down evaluation resumes
#     scale_down_delay_after_delete: "10s"                                    # How long after node deletion that scale down evaluation resumes
#     scale_down_delay_after_failure: "3m"                                    # How long after scale down failure that scale down evaluation resumes
#     max_node_provision_time: "15m"                                          # Maximum time CA waits for node to be provisioned
#     cloud_controller_manager:
#       enabled: true                                                         # Hetzner Cloud Controller Manager (default true). Disabling stops automatic LB provisioning for Service objects.
#     manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.30.1/ccm-networks.yaml"
#   system_upgrade_controller:
#     enabled: true                                                           # System Upgrade Controller (default true). Set to false to omit autoscaling.
#     deployment_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.19.2/system-upgrade-controller.yaml"
#     crd_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.19.2/crd.yaml"
#   embedded_registry_mirror:
#     enabled: false # Enables fast p2p distribution of container images between nodes for faster pod startup. Check if your k3s version is compatible before enabling this option. You can find more information at https://docs.k3s.io/installation/registry-mirror

protect_against_deletion: false                                               # when true, prevents accidental deletion via "hetzner-k3s delete"; kept false so the delete/restore flow works

create_load_balancer_for_the_kubernetes_api: true                             # creates a load balancer for HA API access; note: Hetzner firewalls can't yet restrict access to load balancers by IP

k3s_upgrade_concurrency: 1                                                    # how many nodes to upgrade at the same time; increase for faster upgrades in large clusters, but higher values may impact availability

# additional_packages:
# - somepackage

# additional_pre_k3s_commands:
# - apt update
# - apt upgrade -y

# additional_post_k3s_commands:
# - apt autoremove -y
# For more advanced usage like resizing the root partition for use with Rook Ceph, see [Resizing root partition with additional post k3s commands](./Resizing_root_partition_with_post_create_commands.md)

# kube_api_server_args:
# - arg1
# - ...
# kube_scheduler_args:
# - arg1
# - ...
# kube_controller_manager_args:
# - arg1
# - ...
# kube_cloud_controller_manager_args:
# - arg1
# - ...
# kubelet_args:
# - arg1
# - ...
# kube_proxy_args:
# - arg1
# - ...
# api_server_hostname: k8s.example.com # optional: DNS for the k8s API LoadBalancer. After the script has run, create a DNS record with the address of the API LoadBalancer.

🔹4. values_traefik-default.yml

# Git:            https://github.com/traefik/traefik-helm-chart
# Default values: https://github.com/traefik/traefik-helm-chart/blob/master/traefik/values.yaml
#

namespaceOverride: traefik

nodeSelector:
  pool: master

providers:
  kubernetesCRD:
    # kubectl get crd | grep gateway.networking.k8s.io
    # kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml
    enabled: true
  kubernetesIngress:
    enabled: false
  kubernetesGateway:
    enabled: true

gateway:
  enabled: true

service:
  enabled: true
  type: LoadBalancer

ports:
  web:
    port: 8000                        # default: 8000 — internal container port
    expose:
      default: true
    exposedPort: 80                   # external LB port
    protocol: TCP
    # # -- Enable for Hetzner Cloud Load Balancer with TCP protocol --------------
    # proxyProtocol:
    #   trustedIPs:
    #     - "10.0.0.0/8"               # Hetzner private network range
  websecure:
    port: 8443                        # default: 8443 — internal container port
    expose:
      default: true
    exposedPort: 443                  # external LB port
    protocol: TCP
    # # -- Enable for Hetzner Cloud Load Balancer with TCP protocol --------------
    # proxyProtocol:
    #   trustedIPs:
    #     - "10.0.0.0/8"               # Hetzner private network range

🔹5. values_kube-prometheus-stack.yml

# values_kube-prometheus-stack.yml
# kube-prometheus-stack configuration for hetzner-k3s
# Apply with: helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f values_kube-prometheus-stack.yml

# ============================================================================
# K3S-SPECIFIC FIXES
# ============================================================================
# k3s runs all control-plane components in a single binary and exposes
# their metrics through the kubelet endpoint, not on separate ports.
# Disable the scrapers and alerts that assume vanilla Kubernetes.

kubeControllerManager:
  enabled: false

kubeScheduler:
  enabled: false

kubeProxy:
  enabled: false

kubeEtcd:
  enabled: false

defaultRules:
  create: true
  rules:
    etcd: false
    kubeProxy: false
    kubeSchedulerAlerting: false
    kubeSchedulerRecording: false
    kubeControllerManager: false

# ============================================================================
# PROMETHEUS
# ============================================================================
prometheus:
  prometheusSpec:
    nodeSelector:
      pool: master
    externalUrl: https://${CLUSTER_NAME}/prometheus
    routePrefix: /prometheus

    # How long to keep metrics
    retention: 7d
    retentionSize: "18GB"   # leave some headroom in the 20Gi PVC

    # Persistent storage on Hetzner block storage
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: hcloud-volumes
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi

    # Resource requests/limits — sized for a small dev cluster
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        memory: 2Gi

    # Pick up ServiceMonitor / PodMonitor / PrometheusRule resources from any
    # namespace, not just those with the chart's release label.
    # This is what you want for a real cluster — apps in other namespaces
    # can declare their own scrape configs.
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    probeSelectorNilUsesHelmValues: false

    # Single replica is fine for dev; bump to 2 for HA
    replicas: 1

# ============================================================================
# ALERTMANAGER
# ============================================================================
alertmanager:
  alertmanagerSpec:
    nodeSelector:
      pool: master
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: hcloud-volumes
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi

    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        memory: 128Mi

    replicas: 1

  # Default alert routing — all alerts go to the "null" receiver (silenced).
  # Replace with Slack/PagerDuty/email config when you want real alerts.
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ["namespace", "alertname"]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: "null"
      routes:
        - matchers:
            - alertname = "Watchdog"
          receiver: "null"
    receivers:
      - name: "null"

# ============================================================================
# GRAFANA
# ============================================================================
grafana:
  enabled: true
  nodeSelector:
    pool: master

  # Password comes from PASSWORD_GUI in .env (expanded when the values file is applied)
  adminPassword: "${PASSWORD_GUI}"

  grafana.ini:
    server:
      root_url: "https://${CLUSTER_NAME}/grafana"
      serve_from_sub_path: true
    auth.anonymous:
      enabled: true
      org_role: Admin

  persistence:
    enabled: true
    type: pvc
    storageClassName: hcloud-volumes
    accessModes: ["ReadWriteOnce"]
    size: 5Gi

  replicas: 1

  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      memory: 1024Mi

  # Default service is ClusterIP — use port-forward or expose via Gateway/Ingress
  service:
    type: ClusterIP
    port: 80

  # Pre-loaded Grafana dashboards from the chart
  defaultDashboardsEnabled: true
  defaultDashboardsTimezone: utc

  # Helpful Grafana plugins (optional — comment out if you want lean install)
  plugins: []
  # plugins:
  #   - grafana-piechart-panel
  #   - grafana-clock-panel

  # Sidecar that auto-loads dashboards from ConfigMaps with a label.
  # Lets you ship dashboards as Kubernetes manifests later.
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
      labelValue: "1"
      searchNamespace: ALL
      provider:
        allowUiUpdates: true
    datasources:
      enabled: true
      label: grafana_datasource
      labelValue: "1"
      searchNamespace: ALL

# ============================================================================
# PROMETHEUS OPERATOR
# ============================================================================
prometheusOperator:
  enabled: true
  nodeSelector:
    pool: master
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 256Mi

  # Admission webhooks — keep enabled, they validate alert/rule syntax
  admissionWebhooks:
    enabled: true
    patch:
      enabled: true

# ============================================================================
# NODE EXPORTER (host-level metrics)
# ============================================================================
prometheus-node-exporter:
  enabled: true
  resources:
    requests:
      cpu: 50m
      memory: 32Mi
    limits:
      memory: 64Mi

# ============================================================================
# KUBE-STATE-METRICS (Kubernetes object metrics)
# ============================================================================
kube-state-metrics:
  enabled: true
  nodeSelector:
    pool: master
  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      memory: 128Mi

# ============================================================================
# KUBELET — keep enabled, this is where k3s exposes most metrics
# ============================================================================
kubelet:
  enabled: true
  serviceMonitor:
    metricRelabelings: []

# ============================================================================
# COREDNS — k3s ships with CoreDNS, scrape it
# ============================================================================
coreDns:
  enabled: true

# ============================================================================
# KUBE API SERVER — exposed on port 6443 in k3s, scraping works fine
# ============================================================================
kubeApiServer:
  enabled: true

# ============================================================================
# CRDs — let Helm manage them
# ============================================================================
crds:
  enabled: true

# ============================================================================
# CLEANUP JOB — removes leftover resources on uninstall
# ============================================================================
cleanPrometheusOperatorObjectNames: false

# ============================================================================
# COMMON LABELS — applied to all resources for easier filtering
# ============================================================================
commonLabels:
  environment: dev
  cluster: ${CLUSTER_NAME}


🔹6. gateway-post.yml

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: traefik-gateway
  namespace: traefik
spec:
  gatewayClassName: traefik
  listeners:
    - name: web
      protocol: HTTP
      port: 8000
      allowedRoutes:
        namespaces:
          from: All
    - name: websecure
      protocol: HTTPS
      port: 8443
      allowedRoutes:
        namespaces:
          from: All
      tls:
        mode: Terminate
        certificateRefs:
          - name: tls-traefik
            namespace: traefik

---
apiVersion: v1
kind: Namespace
metadata:
  name: nginx

---
# -- HTTPRoute /grafana + /prometheus  →  nginx reverse proxy
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: nginx
  namespace: nginx
spec:
  parentRefs:
    - name: traefik-gateway
      namespace: traefik
  hostnames:
    - ${CLUSTER_NAME}                  # expanded by envsubst when applied in f_Post_Restore
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /grafana
      backendRefs:
        - name: nginx
          port: 80
    - matches:
        - path:
            type: PathPrefix
            value: /prometheus
      backendRefs:
        - name: nginx
          port: 80
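The `websecure` listener terminates TLS with the `tls-traefik` secret that the post step creates from a self-signed certificate. The same openssl invocation can be reproduced and inspected locally in a temp dir (nothing touches the cluster) — note that SAN, not CN, is what modern clients validate:

```shell
CLUSTER_NAME=edok3s
tmp=$(mktemp -d)

# Same invocation as the post step: 10-year self-signed cert,
# CN and SAN both set to the cluster hostname
openssl req -x509 -nodes -days 3650 -newkey rsa:2048 \
  -keyout "$tmp/tls.key" -out "$tmp/tls.crt" \
  -subj "/CN=${CLUSTER_NAME}" \
  -addext "subjectAltName=DNS:${CLUSTER_NAME}" 2>/dev/null

# Inspect subject and SAN (requires OpenSSL >= 1.1.1 for -addext/-ext)
openssl x509 -in "$tmp/tls.crt" -noout -subject -ext subjectAltName
# rm -rf "$tmp"   # clean up when done
```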

🔹How to Run

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Make sure you are logged in to Azure and have created the storage account and blob container for Velero

az login

2. Make sure the correct Hetzner project context is active:

hcloud context list
hcloud config list

🔹1st Run - init

Hetzner server list before 'init'


Hetzner volume list before 'init'


1. Configure cluster.yml

  • IP addresses / Firewall
  • Hetzner Instance Type Model

2. Configure .env

  • Cluster name
  • SSH key
  • AZURE
  • Prometheus stack version
  • etc.

3. Run

./init_cluster.sh init

4. Sanity check

source .env 
export KUBECONFIG="${HOME}/.kube/config-${CLUSTER_NAME}.yml"

kubectl get deployment -A

Every deployment should report 1/1, except cluster-autoscaler, which stays at 0/1 on this single-node setup:

NAMESPACE        NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
kube-system      cluster-autoscaler                         0/1     1            0           23m
kube-system      coredns                                    1/1     1            1           28m
kube-system      hcloud-cloud-controller-manager            1/1     1            1           23m
kube-system      hcloud-csi-controller                      1/1     1            1           23m
kube-system      metrics-server                             1/1     1            1           28m
monitoring       kube-prometheus-stack-grafana              1/1     1            1           22m
monitoring       kube-prometheus-stack-kube-state-metrics   1/1     1            1           22m
monitoring       kube-prometheus-stack-operator             1/1     1            1           22m
nginx            nginx                                      1/1     1            1           22m
system-upgrade   system-upgrade-controller                  1/1     1            1           23m
traefik          traefik                                    1/1     1            1           22m
velero           velero                                     1/1     1            1           23m
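The sanity check can also be scripted; a sketch that fails if any deployment other than cluster-autoscaler is not fully ready:

```shell
# List deployments without headers (columns: NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE),
# print every one that is not 1/1 -- ignoring cluster-autoscaler -- and exit non-zero if any.
kubectl get deployment -A --no-headers \
  | awk '$2 != "cluster-autoscaler" && $3 != "1/1" {print "NOT READY: " $1 "/" $2; bad=1}
         END {exit bad}'
```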

A successful installation prints the master IP address and the current costs in euros [€]:

Kubernetes edok3s Master IP address: 178.105.69.141

TYPE     ID         NAME                                       MODEL          CPU  RAM_GB SITE      PRICE/h   PRICE/MO
SERVER   130758742  edok3s-master1                             cpx22            2       4 fsn1       0.0128       7.99
VOLUME   105698396  pvc-f0278cda-ba7a-422c-859f-054004afda6a   10GB             -      10 fsn1       0.0006       0.44
VOLUME   105698397  pvc-4880c913-33be-40a0-a2c2-9ebb5951ad08   10GB             -      10 fsn1       0.0006       0.44
VOLUME   105698398  pvc-30aa4c80-9c6e-40f8-99d9-5f824cc21a1a   20GB             -      20 fsn1       0.0012       0.88

Hetzner server list after 'init'


Hetzner volume list after 'init'


Access

  • Password: !CLUSTER_NAME!
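Per the assumptions, `/etc/hosts` plays the role of DNS: the HTTPRoute above matches the hostname `edok3s`, so map it to the master IP printed in the cost report (a config fragment; the IP is the example value from above):

```
# /etc/hosts -- map the HTTPRoute hostname to the master IP
178.105.69.141  edok3s
```

After that, `/grafana` and `/prometheus` should be reachable through the Traefik gateway under that hostname (with a browser warning, since the TLS certificate is self-signed).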

🔹Delete (Power Off)

1. Run

./init_cluster.sh delete
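Under the hood this follows the Delete flow from the top of the article (set PVs to Retain, take a Velero backup, delete the instance). A sketch of the equivalent manual steps, with illustrative names:

```shell
# 1) Keep the Hetzner volumes alive when the cluster is deleted
kubectl get pv -o name | xargs -I{} kubectl patch {} \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# 2) Snapshot cluster state to the Azure-backed storage location
velero backup create "edok3s-backup-$(date +%Y%m%d-%H%M)" --wait

# 3) Drop the only server -> compute cost goes to zero, volumes remain
hcloud server delete edok3s-master1
```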

Hetzner server list after 'delete'


Hetzner volume list after 'delete'


🔹Restore (Power On)

1. Run

./init_cluster.sh restore

Done

NAME                          STATUS      ERRORS   WARNINGS   CREATED                          EXPIRES   STORAGE LOCATION   QUEUE POSITION   SELECTOR
edok3s-backup-20260513-0943   Completed   0        0          2026-05-13 09:43:33 +0200 CEST   182d      default                             <none>
edok3s-backup-20260513-1019   Completed   0        0          2026-05-13 10:19:53 +0200 CEST   182d      default                             <none>
edok3s-backup-20260513-1107   Completed   0        0          2026-05-13 11:07:37 +0200 CEST   182d      default                             <none>
edok3s-backup-20260513-1311   Completed   0        0          2026-05-13 13:11:18 +0200 CEST   182d      default                             <none>
*** Create Velero restore for cluster edok3s from backup edok3s-backup-20260513-1311 ...
Restore request "edok3s-restore-20260513-1316" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
......................
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe edok3s-restore-20260513-1316` and `velero restore logs edok3s-restore-20260513-1316`.

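The restore step in the flow (find the latest completed backup, then restore everything except the cluster-internal namespaces) could be sketched like this; the backup-selection one-liner is an assumption about how to parse `velero backup get` output, not the script's actual implementation:

```shell
# Pick the newest Completed backup (names sort chronologically
# thanks to the YYYYMMDD-HHMM suffix)
LATEST=$(velero backup get | awk '$2 == "Completed" {print $1}' | sort | tail -n 1)

# Restore it, excluding the namespaces listed in the flow
velero restore create "edok3s-restore-$(date +%Y%m%d-%H%M)" \
  --from-backup "$LATEST" \
  --exclude-namespaces kube-system,kube-public,kube-node-lease,velero \
  --wait
```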

Hetzner server list after 'restore'

Hetzner volume list after 'restore'


🔹Costs

1. Run

./init_cluster.sh cost

All prices are in euros [€]:

Kubernetes edok3s Master IP address: 178.105.69.141

TYPE     ID         NAME                                       MODEL          CPU  RAM_GB SITE      PRICE/h   PRICE/MO
SERVER   130781232  edok3s-master1                             cpx22            2       4 fsn1       0.0128       7.99
VOLUME   105698396  pvc-f0278cda-ba7a-422c-859f-054004afda6a   10GB             -      10 fsn1       0.0006       0.44
VOLUME   105698397  pvc-4880c913-33be-40a0-a2c2-9ebb5951ad08   10GB             -      10 fsn1       0.0006       0.44
VOLUME   105698398  pvc-30aa4c80-9c6e-40f8-99d9-5f824cc21a1a   20GB             -      20 fsn1       0.0012       0.88
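If you want a single total rather than per-resource lines, the report can be summed with a small awk filter over the `SERVER`/`VOLUME` rows (a sketch; it assumes PRICE/MO stays the last column of the report):

```shell
# Sum the PRICE/MO column of the cost report
./init_cluster.sh cost \
  | awk '/^(SERVER|VOLUME)/ {sum += $NF} END {printf "TOTAL/MO: %.2f EUR\n", sum}'
```

For the example report above this comes to about 9.75 €/month while the cluster is running, and only the volume lines (≈1.76 €/month) plus S3 storage while it is powered off.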
