DEV Community

Goodluck Ekeoma Adiole

AKS DISASTER RECOVERY USING VELERO

This documentation teaches you Velero from the ground up and shows you exactly how to install it and use it to back up an AKS cluster. I assume you're a beginner, so each step explains what it does and gives you copy-paste commands. I'll also point out important choices (CSI snapshots vs. file-system backup with Restic) and recommend a simple path you can follow.

Notes before we start

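  • These instructions show the common Azure/AKS flow: Velero stores backups in an Azure Storage account (blob container) and runs inside your AKS cluster. ([Velero][2])
  • The commands target Velero v1.10 or later (the examples use v1.16.0). Older guides use --use-restic, which v1.10 replaced with --use-node-agent.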

1) Prerequisites (what you need first)

Make sure you have these on your laptop or a management machine:

  1. kubectl configured to talk to the AKS cluster (you should be able to run kubectl get nodes).
  2. Azure CLI (az) logged in as a user that can create resources and assign roles: az login.
  3. helm (optional; this guide uses the Velero CLI installer, not the Helm chart).
  4. A shell (bash) and the ability to create files locally.
  5. The Velero CLI binary (we'll install it below).
  6. jq, used in step 5 to read the service principal's JSON output (optional if you copy the values by hand).

If any of the above are missing, install them first. The Velero docs and AKS docs show these prerequisites. ([Velero][3])
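To check the tools from the list above in one go, here is a small sketch. missing_tools is a hypothetical helper (not part of Velero or the Azure CLI); it only relies on the shell's built-in command -v.

```sh
# missing_tools: print each named command that is not on PATH.
# Returns 0 when every tool is present, 1 otherwise.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, missing_tools kubectl az velero jq prints the tools you still need to install. Before step 3, velero will be listed as missing, which is expected.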


2) High-level plan (what we’ll do)

  1. Install Velero CLI locally.
  2. Create an Azure Storage account + blob container to hold backups.
  3. Create Azure credentials for Velero (service principal or storage key).
  4. Install the Velero server into AKS with the Azure plugin (and optionally file-system backup via the node agent, or CSI snapshot support).
  5. Make a test backup of a namespace / sample app.
  6. Restore from that backup.
  7. (Optional) Set a schedule and check/monitor backups.

Key docs (used to craft commands): Velero install docs, Velero Azure config, Velero backup/restore docs, and AKS guidance. ([Velero][3])


3) Install the Velero CLI (local machine)

Pick a Velero release from the Velero releases page, then download and install it. Set VERSION to the release you choose (the example uses v1.16.0; check the releases page for the latest). Example for Linux x86_64:

# example: change v1.16.0 to the version you choose
VERSION="v1.16.0"
curl -LO https://github.com/vmware-tanzu/velero/releases/download/${VERSION}/velero-${VERSION}-linux-amd64.tar.gz
tar -xvf velero-${VERSION}-linux-amd64.tar.gz
sudo mv velero-${VERSION}-linux-amd64/velero /usr/local/bin/velero
velero version

On macOS you can use brew install velero. The official docs show the same options. ([Velero][3])
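The download command above hardcodes Linux x86_64. A minimal sketch that derives the OS/architecture part of the asset name from uname, assuming the velero-<version>-<os>-<arch>.tar.gz naming used in that example; both helper names are hypothetical.

```sh
# velero_download_url VERSION OS ARCH: build the release tarball URL,
# following the asset naming used in the download example above.
velero_download_url() {
  version="$1"; os="$2"; arch="$3"
  printf '%s\n' "https://github.com/vmware-tanzu/velero/releases/download/${version}/velero-${version}-${os}-${arch}.tar.gz"
}

# detect_platform [KERNEL] [MACHINE]: map uname output to "<os> <arch>".
# Arguments default to `uname -s` / `uname -m`; only common cases are handled.
detect_platform() {
  kernel="${1:-$(uname -s)}"
  machine="${2:-$(uname -m)}"
  case "$kernel" in
    Linux)  os=linux ;;
    Darwin) os=darwin ;;
    *) echo "unsupported OS: $kernel" >&2; return 1 ;;
  esac
  case "$machine" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  printf '%s %s\n' "$os" "$arch"
}
```

Usage: set -- $(detect_platform) && curl -LO "$(velero_download_url "$VERSION" "$1" "$2")", then extract and move the binary as shown above.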


4) Create Azure storage (backup target)

Velero needs an object store for backup tarballs and metadata; on Azure that is a blob container in a storage account. We'll create a resource group, a storage account, and a blob container.

If you are reusing an existing storage account and Azure AD app instead of creating new ones, collect these values first; the credentials file (step 5) and the install command (step 7) need them:

export AZURE_RESOURCE_GROUP=<your-resource-group>
export AZURE_STORAGE_ACCOUNT_ID=<your-storage-account-name>
export BLOB_CONTAINER=<your-container-name>
export AZURE_SUBSCRIPTION_ID=<your-subscription-id>
export AZURE_TENANT_ID=<your-tenant-id>
export AZURE_CLIENT_ID=<your-app-client-id>
export AZURE_CLIENT_SECRET=<your-app-client-secret>

(The tenant ID, client ID, and client secret come from an Azure AD app with permissions on the storage account. If you are starting from scratch, skip this block: the commands below create the storage, and step 5 creates the service principal.)

# variables - change values to your naming
SUBSCRIPTION_ID=$(az account show --query id -o tsv)
RESOURCE_GROUP="velero-backups-rg"
LOCATION="eastus"            # choose your Azure region
STORAGE_ACCOUNT="velerobackups$RANDOM"  # must be globally unique; 3-24 lowercase letters and digits only
CONTAINER="velero"

# create resource group
az group create -n $RESOURCE_GROUP -l $LOCATION

# create storage account
az storage account create \
  --name $STORAGE_ACCOUNT \
  --resource-group $RESOURCE_GROUP \
  --sku Standard_LRS \
  --encryption-services blob \
  --https-only true

# create blob container
az storage container create \
  --account-name $STORAGE_ACCOUNT \
  --name $CONTAINER

Velero requires the storage account and container to store backup tarballs and metadata. The Velero Azure plugin docs and Microsoft guidance explain this. ([GitHub][4])
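Storage account creation fails if the name breaks Azure's naming rules (3 to 24 characters, lowercase letters and digits only). A small sketch to check a name locally before calling az; is_valid_storage_account_name is a hypothetical helper, and it cannot check global uniqueness (az storage account check-name --name <name> asks Azure for that).

```sh
# is_valid_storage_account_name NAME: succeed if NAME follows Azure's
# storage account rules (3-24 characters, lowercase letters and digits).
# Letters are listed explicitly so the check does not depend on the locale.
is_valid_storage_account_name() {
  name="$1"
  case "$name" in
    ''|*[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 24 ]
}
```

Usage: is_valid_storage_account_name "$STORAGE_ACCOUNT" && az storage account check-name --name "$STORAGE_ACCOUNT" --query nameAvailable -o tsv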


5) Create Azure credentials for Velero (service principal recommended)

Velero needs permission to read and write blobs in the container and, if you use the Azure plugin's native disk snapshots, to create and delete disk snapshots. (CSI snapshots are taken by the AKS CSI driver with the cluster's own identity, not with Velero's credentials.) We'll create a Service Principal (SP) with access to the backup resource group, which holds the storage account. For native snapshots, it also needs permissions in the AKS infrastructure resource group (the MC_* resource group AKS created).

Create SP and give access to the storage account:

# create service principal (output will include appId, password, tenant)
AZURE_SP_NAME="velero-sp-$RANDOM"
az ad sp create-for-rbac --name $AZURE_SP_NAME --role "Contributor" --scopes "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP" -o json > velero-sp.json

Note: the Contributor role on the resource group is broad. For production, scope it further with a custom role that grants only the needed permissions (blob read/write/list, plus disk snapshots). This scope also covers only the backup resource group: if you use the Azure plugin's native disk snapshots, the SP additionally needs rights on the AKS node resource group (MC_*). Many guides assign Storage Blob Data Contributor on the storage account and Contributor or a custom role on the AKS node resource group for snapshotting. ([DEV Community][5])

Create a credentials file (credentials-velero) in the format Velero expects:

# Example using service principal credentials
# Read the values from velero-sp.json (created above)
AZ_CLIENT_ID=$(jq -r .appId velero-sp.json)
AZ_CLIENT_SECRET=$(jq -r .password velero-sp.json)
AZ_TENANT_ID=$(jq -r .tenant velero-sp.json)

# AZURE_RESOURCE_GROUP must be the AKS node resource group (MC_*), where the
# cluster's disks and snapshots live. Replace <aks-rg> and <aks-cluster>.
AKS_NODE_RG=$(az aks show -g <aks-rg> -n <aks-cluster> --query nodeResourceGroup -o tsv)

cat > credentials-velero <<EOF
AZURE_SUBSCRIPTION_ID=${SUBSCRIPTION_ID}
AZURE_TENANT_ID=${AZ_TENANT_ID}
AZURE_CLIENT_ID=${AZ_CLIENT_ID}
AZURE_CLIENT_SECRET=${AZ_CLIENT_SECRET}
AZURE_RESOURCE_GROUP=${AKS_NODE_RG}
AZURE_CLOUD_NAME=AzurePublicCloud
EOF

Velero can alternatively use a storage account key instead of SP; plugin docs show both options. ([GitHub][4])
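A typo or empty value in credentials-velero is a common reason for a crash-looping Velero pod. A hedged sketch of a pre-install check for the service-principal format shown above; check_velero_credentials is a hypothetical helper, and the key list is simply the one used in this guide's file.

```sh
# check_velero_credentials FILE: verify that FILE defines every key used in
# the service-principal example above, each with a real value.
# `jq -r` prints "null" for a missing JSON field, so "null" is rejected too.
check_velero_credentials() {
  file="$1"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  for key in AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID AZURE_CLIENT_ID \
             AZURE_CLIENT_SECRET AZURE_RESOURCE_GROUP AZURE_CLOUD_NAME; do
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key" >&2
      return 1
    fi
    if grep -Eq "^${key}=null\$" "$file"; then
      echo "value is null: $key" >&2
      return 1
    fi
  done
}
```

Usage: check_velero_credentials ./credentials-velero && echo "credentials file looks complete". This checks only the file's shape, not whether Azure accepts the secret.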


6) Decide: CSI snapshots vs Restic

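  • CSI volume snapshots: Velero asks the CSI driver to snapshot the underlying cloud disk (fast restores). This requires a CSI driver with snapshot support, a VolumeSnapshotClass, and the EnableCSI feature flag at install time; recent AKS versions ship the Azure Disk CSI driver and snapshot controller by default. The snapshots stay in your Azure subscription and region rather than in the blob container; Velero v1.12 and later can copy snapshot data to the object store with the --snapshot-move-data backup flag (this needs the node agent). Best for block volumes (e.g., Azure Disk). ([Velero][6])
  • File-system backup (Restic or Kopia): Velero's node agent copies file-level data from pod volumes to object storage. It is simpler to get working and works when snapshot support is not available, but it is slower and runs as a node-agent DaemonSet on every node. Velero v1.10 and later enable it with --use-node-agent (older releases used --use-restic) and select the tool with --uploader-type. ([Velero][7])

For a beginner on AKS: if your cluster has the Azure CSI drivers with snapshot support, prefer CSI snapshots for persistent volumes. If you're unsure or want a simpler setup, start with file-system backup. The main install command below enables the node agent; a CSI variant follows it.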
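For CSI snapshots, Velero looks for a VolumeSnapshotClass for the volume's CSI driver that carries the label velero.io/csi-volumesnapshot-class: "true". A minimal sketch that prints such a class, assuming the Azure Disk CSI driver name disk.csi.azure.com; the class name velero-azuredisk-vsc and the helper velero_vsc_manifest are hypothetical.

```sh
# velero_vsc_manifest [NAME] [DRIVER]: print a VolumeSnapshotClass labeled
# for Velero's CSI support. Defaults: a hypothetical class name and the
# Azure Disk CSI driver.
velero_vsc_manifest() {
  name="${1:-velero-azuredisk-vsc}"
  driver="${2:-disk.csi.azure.com}"
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ${name}
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: ${driver}
deletionPolicy: Delete
EOF
}
```

Usage: velero_vsc_manifest | kubectl apply -f -. If kubectl get volumesnapshotclass already shows a class for disk.csi.azure.com, you can instead add the label to it: kubectl label volumesnapshotclass <class-name> velero.io/csi-volumesnapshot-class=true.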


7) Install Velero into AKS (server components)

Run this from your local machine. The velero CLI creates the server components in the cluster your current kubectl context points to. Adjust the variables if you used different names in earlier steps:

# variables used in install
BUCKET="$CONTAINER"
BACKUP_RG="$RESOURCE_GROUP"
STORAGE_ACCOUNT="$STORAGE_ACCOUNT"
SECRET_FILE="./credentials-velero"

# Basic install with the Azure plugin + node agent (file-system backup via Restic/Kopia).
# --default-volumes-to-fs-backup backs up all pod volumes without per-pod annotations.
velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:v1.11.1 \
  --bucket $BUCKET \
  --secret-file $SECRET_FILE \
  --backup-location-config resourceGroup=$BACKUP_RG,storageAccount=$STORAGE_ACCOUNT \
  --use-node-agent \
  --default-volumes-to-fs-backup

Notes & options

  • --plugins points to the Velero Azure plugin image. Each plugin release targets specific Velero versions, so check the compatibility table in the plugin's README and pick the plugin version that matches your Velero version. ([GitHub][4])
  • To use CSI snapshots instead of (or in addition to) file-system backup, add --features=EnableCSI. The --snapshot-location-config flag configures the Azure plugin's native disk snapshots, used for volumes Velero does not snapshot through CSI; its resourceGroup value is the resource group where those snapshots are created, typically the AKS node resource group. Example:
# Example install with CSI support (instead of the node agent)
velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:v1.11.1 \
  --bucket $BUCKET \
  --secret-file $SECRET_FILE \
  --backup-location-config resourceGroup=$BACKUP_RG,storageAccount=$STORAGE_ACCOUNT \
  --snapshot-location-config resourceGroup=MC_myAksRG_myAksCluster_westeurope \
  --features=EnableCSI
  • If you enable both, the choice is made per volume: volumes selected for file-system backup (via the backup.velero.io/backup-volumes pod annotation or --default-volumes-to-fs-backup) are copied by the node agent, and the remaining volumes are snapshotted.

The velero install guide and Azure config pages show these flags in detail. ([Velero][8])
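To switch between the setups above without rewriting the whole command, one option is a helper that prints only the mode-specific flags. A sketch: velero_mode_flags is hypothetical, while the flags it prints are Velero v1.10+ install flags.

```sh
# velero_mode_flags MODE: print the mode-specific `velero install` flags.
#   fs   -> node agent, file-system backup for every pod volume by default
#   csi  -> CSI snapshots via the EnableCSI feature flag
#   both -> CSI by default, node agent available for opted-in volumes
velero_mode_flags() {
  case "$1" in
    fs)   printf '%s\n' "--use-node-agent --default-volumes-to-fs-backup" ;;
    csi)  printf '%s\n' "--features=EnableCSI" ;;
    both) printf '%s\n' "--features=EnableCSI --use-node-agent" ;;
    *)    echo "unknown mode: $1 (expected fs, csi or both)" >&2; return 1 ;;
  esac
}
```

Usage: velero install --provider azure ... $(velero_mode_flags fs). Leave the substitution unquoted so the shell splits it into separate flags. In "both" mode, fs backup is not the default, so CSI handles volumes unless a pod opts a volume in via annotation.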


8) Verify Velero server is running

kubectl get pods -n velero
kubectl get deployments -n velero

You should see the velero pod (and one node-agent pod per node if you used --use-node-agent). If pods are in CrashLoopBackOff, check the logs:

kubectl logs deploy/velero -n velero

If install succeeded, Velero is now watching your cluster and ready to back up resources.
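Running pods don't prove that Velero can reach the blob container. Velero reports that on the BackupStorageLocation resource, whose status phase should be Available (velero backup-location get shows the same information). A minimal sketch of a scriptable check; all_bsl_available is a hypothetical helper that takes the phases printed by kubectl.

```sh
# all_bsl_available PHASES: succeed only if every phase in PHASES is
# "Available". Intended input (space-separated, one phase per location):
#   kubectl -n velero get backupstoragelocations \
#     -o jsonpath='{.items[*].status.phase}'
all_bsl_available() {
  phases="$1"
  # Empty input: no locations exist, or Velero has not validated them yet.
  [ -n "$phases" ] || return 1
  for phase in $phases; do
    [ "$phase" = "Available" ] || return 1
  done
  return 0
}
```

Usage: all_bsl_available "$(kubectl -n velero get backupstoragelocations -o jsonpath='{.items[*].status.phase}')" && echo "backup location reachable". Right after install, validation may take a short while to report a phase.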


9) Test: deploy a sample app (optional but recommended)

Deploy a small nginx deployment in namespace test-app:

kubectl create ns test-app
kubectl -n test-app apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
EOF

Wait until the pod is Running (press Ctrl+C to stop watching):

kubectl -n test-app get pods -w
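The nginx deployment above has no persistent volume, so the backup test won't exercise volume data. A hedged sketch that prints a PVC plus a Deployment mounting it, assuming AKS's built-in managed-csi StorageClass (Azure Disk CSI); the names nginx-data and nginx-pvc and the helper sample_pvc_app are hypothetical. The backup.velero.io/backup-volumes pod annotation opts the data volume in to file-system backup; drop it if you want that volume captured by a CSI snapshot instead.

```sh
# sample_pvc_app [NAMESPACE]: print a PVC and a Deployment that mounts it.
sample_pvc_app() {
  ns="${1:-test-app}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-data
  namespace: ${ns}
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: managed-csi
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-pvc
  namespace: ${ns}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-pvc
  template:
    metadata:
      labels:
        app: nginx-pvc
      annotations:
        # opt the "data" volume in to Velero file-system backup
        backup.velero.io/backup-volumes: data
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: nginx-data
EOF
}
```

Usage: sample_pvc_app | kubectl apply -f -, then write a marker file with kubectl -n test-app exec deploy/nginx-pvc -- sh -c 'echo restored-ok > /usr/share/nginx/html/index.html'. After the restore, reading that file back is a quick way to see whether the volume data came back.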

10) Create a backup with Velero

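Simple namespace backup. This backs up the Kubernetes resources; persistent volume data is included only if file-system backup is enabled for those volumes or snapshots (CSI or native) are configured. The nginx sample has no volumes, so only its resources are captured: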

# create a one-off backup of namespace test-app
velero backup create test-app-backup --include-namespaces test-app --wait

# check backup status
velero backup describe test-app-backup --details
velero backup logs test-app-backup

Important flags:

  • --snapshot-volumes controls whether volumes are snapshotted (CSI or native Azure snapshots, if configured); --default-volumes-to-fs-backup instead backs up all pod volumes with the node agent.
  • --wait makes the command wait until backup completes or fails.

Velero backup/restore docs explain scheduling, including/excluding resources, and common flags. ([Velero][9])
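--wait is convenient interactively; in scripts or CI you may prefer to check the outcome yourself. A sketch that maps a Backup or Restore .status.phase to an exit code. The phase names are Velero's; the helper names and the code scheme (0 completed, 1 failed, 2 partially failed, 3 not finished) are our own assumptions.

```sh
# velero_phase_rc PHASE: map a Velero Backup/Restore phase to an exit code.
#   0 = Completed, 1 = Failed/FailedValidation, 2 = PartiallyFailed,
#   3 = anything else (New, InProgress, WaitingForPluginOperations, ...)
velero_phase_rc() {
  case "$1" in
    Completed)               return 0 ;;
    Failed|FailedValidation) return 1 ;;
    PartiallyFailed)         return 2 ;;
    *)                       return 3 ;;
  esac
}

# wait_for_backup NAME [TRIES]: poll the Backup object every 10 seconds
# until it reaches a final phase, or give up after TRIES polls (default 60).
wait_for_backup() {
  name="$1"; tries="${2:-60}"
  while [ "$tries" -gt 0 ]; do
    phase=$(kubectl -n velero get backup "$name" -o jsonpath='{.status.phase}' 2>/dev/null)
    rc=0; velero_phase_rc "$phase" || rc=$?
    [ "$rc" -ne 3 ] && return "$rc"
    tries=$((tries - 1))
    sleep 10
  done
  return 3
}
```

Usage: velero backup create test-app-backup --include-namespaces test-app, then wait_for_backup test-app-backup || echo "backup did not complete cleanly".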


11) Simulate disaster & restore

Delete the namespace / app and restore:

# delete the namespace to simulate data loss
kubectl delete ns test-app

# restore from the backup, giving the restore an explicit name
velero restore create test-app-restore --from-backup test-app-backup --wait

# inspect the restore result and logs
velero restore describe test-app-restore --details
velero restore logs test-app-restore

If you omit the name, Velero generates one from the backup name and a timestamp; velero restore get lists it. After the restore completes, check the resources:

kubectl get all -n test-app

This demonstrates a full backup → delete → restore workflow. Velero restore docs explain API fields and options. ([Velero][10])
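If you script restores, generating the restore name up front saves looking it up afterwards. A sketch: restore_name_for and its <backup>-restore-<UTC timestamp> convention are hypothetical, and the 63-character limit is a conservative choice so the name also fits in a Kubernetes label value.

```sh
# restore_name_for BACKUP [STAMP]: print "<backup>-restore-<stamp>", where
# STAMP defaults to the current UTC time as YYYYmmddHHMMSS.
# Fails on an empty backup name, characters other than lowercase letters,
# digits, '-' or '.', or a result longer than 63 characters.
restore_name_for() {
  backup="$1"
  [ -n "$backup" ] || { echo "backup name required" >&2; return 1; }
  stamp="${2:-$(date -u +%Y%m%d%H%M%S)}"
  name="${backup}-restore-${stamp}"
  case "$name" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789.-]*)
      echo "invalid restore name: $name" >&2; return 1 ;;
  esac
  if [ "${#name}" -gt 63 ]; then
    echo "restore name too long: $name" >&2; return 1
  fi
  printf '%s\n' "$name"
}
```

Usage: RESTORE=$(restore_name_for test-app-backup) && velero restore create "$RESTORE" --from-backup test-app-backup --wait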


12) Useful Velero commands (cheat sheet)

  • velero backup create <name> --include-namespaces <ns> --wait — create backup.
  • velero backup get — list backups.
  • velero backup describe <name> — see details.
  • velero backup logs <name> — logs for that backup.
  • velero restore create --from-backup <name> --wait — restore backup.
  • velero restore logs <name> — restore logs.
  • velero schedule create <sched-name> --schedule "0 2 * * *" --include-namespaces <ns> — create scheduled backups.
  • kubectl -n velero get pods — see Velero pods. (Official docs contain many more options.) ([Velero][9])
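Backups expire after their TTL (Velero's default is 30 days, shown as 720h0m0s), and the --ttl flag takes a Go-style duration. A small sketch converting a retention period in days; ttl_from_days is a hypothetical helper.

```sh
# ttl_from_days DAYS: print DAYS as a duration for Velero's --ttl flag.
# Accepts positive whole numbers without leading zeros.
ttl_from_days() {
  case "$1" in
    ''|0*|*[!0123456789]*) echo "days must be a positive whole number" >&2; return 1 ;;
  esac
  printf '%sh0m0s\n' "$(( $1 * 24 ))"
}
```

Usage: velero schedule create test-app-daily --schedule "0 2 * * *" --include-namespaces test-app --ttl "$(ttl_from_days 14)"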

13) Production considerations & tips

  • Credentials & least privilege: Create a minimal custom role for Velero rather than broad Contributor. Limit the scope to the storage account and the resource group that needs snapshot permissions. ([DEV Community][5])
  • Test restores regularly — a backup that hasn’t been restored is not proven.
  • Retention & lifecycle: set backup TTLs (the --ttl flag on backups and schedules) and storage lifecycle rules to avoid runaway costs.
  • Monitor & alerts — watch Velero backups and failures; integrate with monitoring/alerts.
  • Storage costs — backups in Azure Blob incur storage and egress costs; plan accordingly.
  • Snapshots vs file-system backup: prefer CSI snapshots where possible (faster, cloud-native), but use file-system backup (Restic/Kopia via the node agent) when snapshots are unavailable or for file-level backups. ([Velero][6])
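For the least-privilege point above, Azure custom roles are defined in JSON and created with az role definition create --role-definition @file.json. A hedged sketch that prints such a definition. The role name "Velero Backup Operator" and the exact permission list are assumptions based on what this guide needs (blob access plus disk snapshots), so compare them with the custom-role example in the Velero Azure plugin README before using this in production.

```sh
# velero_role_json SUBSCRIPTION_ID BACKUP_RG NODE_RG
# Print an illustrative custom role scoped to the backup resource group
# (blob storage) and the AKS node resource group (disk snapshots).
velero_role_json() {
  sub="$1"; backup_rg="$2"; node_rg="$3"
  cat <<EOF
{
  "Name": "Velero Backup Operator",
  "IsCustom": true,
  "Description": "Illustrative minimal permissions for Velero on AKS.",
  "Actions": [
    "Microsoft.Compute/disks/read",
    "Microsoft.Compute/disks/write",
    "Microsoft.Compute/disks/beginGetAccess/action",
    "Microsoft.Compute/disks/endGetAccess/action",
    "Microsoft.Compute/snapshots/read",
    "Microsoft.Compute/snapshots/write",
    "Microsoft.Compute/snapshots/delete",
    "Microsoft.Storage/storageAccounts/read",
    "Microsoft.Storage/storageAccounts/listkeys/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/read"
  ],
  "NotActions": [],
  "DataActions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
  ],
  "NotDataActions": [],
  "AssignableScopes": [
    "/subscriptions/${sub}/resourceGroups/${backup_rg}",
    "/subscriptions/${sub}/resourceGroups/${node_rg}"
  ]
}
EOF
}
```

Usage: write the output to velero-role.json, run az role definition create --role-definition @velero-role.json, then az role assignment create --assignee <sp-app-id> --role "Velero Backup Operator" --scope /subscriptions/<sub>/resourceGroups/<rg> once per resource group, in place of the broad Contributor assignment.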

14) Troubleshooting quick hits

  • Velero pods CrashLoop? kubectl logs deploy/velero -n velero — common causes: bad credentials file, plugin image mismatch, or permission error.
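  • File-system backup issues: ensure the node-agent DaemonSet exists and its pods are running (kubectl -n velero get ds node-agent); check the pod annotations that include or exclude volumes (backup.velero.io/backup-volumes, backup.velero.io/backup-volumes-excludes). ([Velero][7])
  • PVs not snapshotted: ensure the CSI snapshot driver is installed, a VolumeSnapshotClass for the volume's driver carries the velero.io/csi-volumesnapshot-class=true label, and the EnableCSI feature was turned on at install. See the CSI docs. ([Velero][6])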

15) Short checklist you can copy & paste

  1. az login
  2. kubectl config current-context (should point to your AKS cluster)
  3. Create storage account & container (step 4 commands)
  4. Create SP & credentials file (step 5 commands)
  5. velero install ... --provider azure --plugins velero/velero-plugin-for-microsoft-azure:... --bucket <name> --secret-file ./credentials-velero --backup-location-config resourceGroup=<rg>,storageAccount=<sa> --use-node-agent
  6. kubectl get pods -n velero
  7. Deploy test app, velero backup create ... --wait
  8. velero restore create --from-backup ... --wait
