Improving

Posted on Mar 18 • Originally published at improving.com

Backup and Restore Kubernetes Resources Across vCluster using Velero

#kubernetes #devops #tutorial

In Kubernetes environments, teams are constantly looking for ways to move faster without sacrificing security or efficiency. Managing multiple environments like development, testing, and staging often leads to cluster sprawl, higher costs, and complex maintenance. This is where virtual clusters come in.

Virtual clusters make it possible to create isolated, on-demand Kubernetes environments that share the same underlying infrastructure. They give developers the freedom to spin up their own clusters quickly for testing new features, running experiments, or deploying temporary workloads — all without waiting on cluster admins or consuming extra resources. Each virtual cluster runs its own control plane, offering stronger isolation and flexibility than namespace-based setups. We'll be using vCluster, an implementation of virtual clusters by Loft, to illustrate the concept in practice.

Managing workloads across multiple virtual clusters is a common pattern in multi-tenant environments. However, while virtual clusters make isolation easy, moving workloads across them is not straightforward. That's where Velero comes in — it is a powerful Kubernetes backup tool that migrates workloads from one virtual cluster to another.

In this blog post, we'll look at why backups matter and how Velero works, then walk through a practical migration of resources using Velero: backing up one virtual cluster and restoring it in another.

What is Velero?

Velero is an open source tool to back up and restore your Kubernetes cluster resources and persistent volumes. You can run Velero with a cloud provider or on-premises.

Velero lets you:

Take backups of your cluster and restore in case of loss
Migrate cluster resources to other clusters
Replicate your production cluster to development and testing clusters

Velero consists of:

Velero CLI
- Runs on your local machine.
- Used to create, schedule, and manage backups and restores.
Kubernetes API Server
- Receives backup requests from the Velero CLI.
- Stores Velero custom resources (like Backup) in etcd.
Velero Server (BackupController)
- Runs inside the Kubernetes cluster.
- Watches the Kubernetes API for Velero backup requests.
- Collects Kubernetes resource data and triggers backups.
Cloud Provider / Object Storage
- Stores backup data and metadata.
- Creates volume snapshots using the cloud provider's API (e.g., Azure Disk Snapshots).

How it works:

User runs a Velero backup command using the CLI: velero backup create my-backup
CLI creates a backup request in Kubernetes
The Velero server detects the request and gathers cluster resources
Backup data is uploaded to cloud object storage
Persistent volumes are backed up using cloud snapshots (if enabled)

Velero supports a variety of storage providers for different backup and snapshot operations. In this blog post, we will focus on the Azure provider.

What is vCluster?

vCluster is a certified Kubernetes distribution for building virtual clusters, which run as isolated environments within a physical host cluster. Virtual clusters enhance isolation and flexibility in multi-tenant Kubernetes setups. Multiple teams can work independently on shared infrastructure, helping minimize conflicts, increase team autonomy, and reduce infrastructure costs.

A virtual cluster:

Runs inside a namespace of the host cluster
Has its own control plane (including an API server) and a syncer
Maintains its own set of Kubernetes resources, operating like a full cluster

Why Backup and Migrate Workloads Using vCluster?

Common reasons to back up or migrate workloads between vClusters:

Promoting apps from dev to staging or prod: Backing up and restoring workloads between vClusters allows smooth promotion of applications across environments, ensuring consistent configurations and deployments without manual rework.
Replicating test environments: It helps recreate identical test setups quickly, enabling developers to reproduce issues, validate fixes, or test new features in isolated environments.
Disaster recovery (DR) setup: Regular backups across vClusters ensure business continuity by allowing workloads to be restored rapidly in another cluster if the primary one fails.
Tenant migration in multi-tenant environments: vClusters make it easier to move tenants between isolated environments without affecting others, maintaining data security and minimizing downtime.
Cluster version upgrades or deprecations: When upgrading or decommissioning a cluster, backing up workloads to another vCluster ensures a seamless transition without losing data or configurations.

Why Use Velero with vCluster?

Virtual clusters built with vCluster are lightweight and isolated, but they don't provide built-in mechanisms for backing up workloads, restoring them, or moving applications between clusters. Without a backup solution, recovery and migration can be risky.

Using Velero with vCluster fills this gap by enabling simple backup, restore, and migration workflows directly inside virtual clusters. It allows you to move applications between clusters with minimal setup and perform migrations with little to no downtime, especially for stateless workloads.

How to Backup and Migrate Workloads Between vClusters

Let's see how to use Velero to back up workloads from one vCluster and restore them into another. Think of it as moving your app from dev to staging across two vClusters running on two different Azure clusters.

Prerequisites

Before starting, make sure you have the following:

Two clusters up and running on Azure (any cloud offering works)
Two running vClusters (source and destination)
Velero CLI installed on your machine, along with the Azure CLI (az), kubectl, and Helm, which the steps below rely on

Step-by-step Guide

We will install Velero with the same configuration in both the source and destination vClusters, deploy a sample MySQL Pod in the source vCluster, back it up there, and restore it in the destination vCluster. We will be using the Azure provider to run Velero.

To set up Velero on Azure, you have to:

Create an Azure storage account and blob container
Get the resource group details
Set permissions for Velero

Velero needs access to your Azure storage account to upload and retrieve backups. You'll need to assign the "Storage Blob Data Contributor" role (or equivalent) to the identity or service principal Velero uses, ensuring it can read, write, and manage backup data in the blob container.

1. Create Azure Resources

Create a resource group:

AZURE_RESOURCE_GROUP=<YOUR_RESOURCE_GROUP>
az group create --name $AZURE_RESOURCE_GROUP --location <YOUR_LOCATION>

Create the storage account:

AZURE_STORAGE_ACCOUNT=<YOUR_STORAGE_ACCOUNT>
az storage account create \
  --name $AZURE_STORAGE_ACCOUNT \
  --resource-group $AZURE_RESOURCE_GROUP \
  --sku Standard_GRS \
  --encryption-services blob \
  --https-only true \
  --kind BlobStorage \
  --access-tier Hot

Create a blob container:

BLOB_CONTAINER=velero
az storage container create \
  --name $BLOB_CONTAINER \
  --public-access off \
  --account-name $AZURE_STORAGE_ACCOUNT

2. Create a Service Principal with Contributor Privileges

AZURE_SUBSCRIPTION_ID=$(az account list --query '[?isDefault].id' -o tsv)
AZURE_TENANT_ID=$(az account list --query '[?isDefault].tenantId' -o tsv)

az ad sp create-for-rbac \
  --name "velero" \
  --role "Contributor" \
  --scopes /subscriptions/$AZURE_SUBSCRIPTION_ID \
  --query '{clientId: appId, clientSecret: password, tenantId: tenant}'

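This outputs clientId, clientSecret, and tenantId; the subscription ID is already held in AZURE_SUBSCRIPTION_ID. Store these values securely, since the client secret cannot be retrieved again later (only reset).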

Get the Client ID and store it in a variable:

AZURE_CLIENT_ID=$(az ad sp list --display-name "velero" --query '[0].appId' -o tsv)

Assign additional permissions to the Client ID:

az role assignment create \
  --assignee $AZURE_CLIENT_ID \
  --role "Storage Blob Data Contributor" \
  --scope /subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AZURE_RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/$AZURE_STORAGE_ACCOUNT

3. Prepare Credentials

With the output received above, create bsl-creds and cloud-creds for the Velero setup.

bsl-creds: the secret Velero uses to access the BSL (Backup Storage Location), which is the blob container where Velero stores backups.
cloud-creds: the service principal credentials Velero uses to call the Azure API, for example when creating volume snapshots.

You will need the following values:

AZURE_SUBSCRIPTION_ID=<YOUR_SUBSCRIPTION_ID>
AZURE_TENANT_ID=<YOUR_TENANT_ID>
AZURE_CLIENT_ID=<YOUR_CLIENT_ID>
AZURE_CLIENT_SECRET=<YOUR_CLIENT_SECRET>
AZURE_RESOURCE_GROUP=<YOUR_RESOURCE_GROUP>
AZURE_CLOUD_NAME=AzurePublicCloud
AZURE_ENVIRONMENT=AzurePublicCloud
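
The bsl-creds secret in step 5 also expects a storage account key, which none of the commands so far print. A minimal sketch of one way to fetch it with the Azure CLI follows; the function name storage_account_key is a hypothetical helper introduced here, and it assumes you are logged in with an identity that is allowed to list the account's keys.

```shell
# Hypothetical helper: print the first access key of a storage account.
# Arguments: resource group name, storage account name.
storage_account_key() {
  az storage account keys list \
    --resource-group "$1" \
    --account-name "$2" \
    --query '[0].value' \
    --output tsv
}
```

For example: AZURE_STORAGE_ACCOUNT_KEY="$(storage_account_key "$AZURE_RESOURCE_GROUP" "$AZURE_STORAGE_ACCOUNT")". Treat the key like a password and avoid echoing it into shared logs.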

4. Log in to vCluster and Create Velero Namespace

kubectl create namespace velero
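
The heading mentions logging in to the vCluster, but only the namespace command is shown. The sketch below covers both parts under a few assumptions: the vCluster name and host namespace are hypothetical placeholders, the vcluster CLI is installed, and prepare_velero_namespace is a helper name introduced here for illustration. Depending on the CLI version and how the vCluster is exposed, vcluster connect may keep a port-forward running in the foreground; in that case, run the namespace step from a second terminal.

```shell
# Hypothetical names; replace with your vCluster name and the host
# namespace it runs in.
VCLUSTER_NAME="${VCLUSTER_NAME:-source-vcluster}"
VCLUSTER_NAMESPACE="${VCLUSTER_NAMESPACE:-team-dev}"

prepare_velero_namespace() {
  # Point kubectl at the virtual cluster instead of the host cluster.
  vcluster connect "$VCLUSTER_NAME" --namespace "$VCLUSTER_NAMESPACE" || return 1
  # Create the namespace only if it is missing, so reruns are safe.
  kubectl get namespace velero >/dev/null 2>&1 || kubectl create namespace velero
}
```

Call prepare_velero_namespace once per vCluster, changing VCLUSTER_NAME and VCLUSTER_NAMESPACE for the destination.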

5. Create BSL and Cloud Credentials

bsl-creds.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: bsl-creds
  namespace: velero
type: Opaque
data:
  cloud: <BASE64_ENCODED_VALUE>
  # Encode the following as base64:
  # [default]
  # storageAccount: <YOUR_STORAGE_ACCOUNT>
  # storageAccountKey: <YOUR_STORAGE_ACCOUNT_KEY>
  # subscriptionId: <YOUR_SUBSCRIPTION_ID>
  # resourceGroup: <YOUR_RESOURCE_GROUP>

cloud-creds.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: cloud-creds
  namespace: velero
type: Opaque
data:
  cloud: <BASE64_ENCODED_VALUE>
  # Encode the following as base64:
  # AZURE_SUBSCRIPTION_ID=<YOUR_SUBSCRIPTION_ID>
  # AZURE_TENANT_ID=<YOUR_TENANT_ID>
  # AZURE_CLIENT_ID=<YOUR_CLIENT_ID>
  # AZURE_CLIENT_SECRET=<YOUR_CLIENT_SECRET>
  # AZURE_RESOURCE_GROUP=<YOUR_RESOURCE_GROUP>
  # AZURE_CLOUD_NAME=AzurePublicCloud

Apply the secrets:

kubectl apply -f bsl-creds.yaml -n velero
kubectl apply -f cloud-creds.yaml -n velero
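
The <BASE64_ENCODED_VALUE> placeholders in the manifests above have to be produced by hand. A small sketch for the cloud-creds content follows; the values are the same placeholders used in the manifest, not real credentials, and the temporary file is only an illustration of one way to stage them.

```shell
# Placeholder values for illustration only; substitute your real ones.
creds_file="$(mktemp)"
cat > "$creds_file" <<'EOF'
AZURE_SUBSCRIPTION_ID=<YOUR_SUBSCRIPTION_ID>
AZURE_TENANT_ID=<YOUR_TENANT_ID>
AZURE_CLIENT_ID=<YOUR_CLIENT_ID>
AZURE_CLIENT_SECRET=<YOUR_CLIENT_SECRET>
AZURE_RESOURCE_GROUP=<YOUR_RESOURCE_GROUP>
AZURE_CLOUD_NAME=AzurePublicCloud
EOF

# Encode onto a single line; some base64 implementations wrap long output,
# which would break the YAML value.
encoded="$(base64 < "$creds_file" | tr -d '\n')"
echo "cloud: $encoded"
```

Paste the printed value into the data.cloud field, then delete the temporary file, since it holds the client secret. As an alternative, kubectl create secret generic cloud-creds --namespace velero --from-file=cloud=<path-to-file> performs the encoding for you.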

6. Install Velero Using Helm

Use the following values.yaml. Both the source and destination vClusters use the same file:

configuration:
  backupStorageLocation:
    - name: default
      provider: azure
      bucket: velero
      config:
        resourceGroup: <YOUR_RESOURCE_GROUP>
        storageAccount: <YOUR_STORAGE_ACCOUNT>
        subscriptionId: <YOUR_SUBSCRIPTION_ID>
      credential:
        name: bsl-creds
        key: cloud

  volumeSnapshotLocation:
    - name: default
      provider: azure
      config:
        resourceGroup: <YOUR_RESOURCE_GROUP>
        subscriptionId: <YOUR_SUBSCRIPTION_ID>
      credential:
        name: cloud-creds
        key: cloud

credentials:
  useSecret: true
  existingSecret: cloud-creds

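deployNodeAgent: true

nodeAgent:
  podVolumePath: /var/lib/kubelet/pods
  privileged: true

# The chart does not bundle provider plugins. Without the Azure plugin,
# the "azure" provider used above is not available to Velero.
# Pick a plugin version that is compatible with your Velero version.
initContainers:
  - name: velero-plugin-for-microsoft-azure
    image: velero/velero-plugin-for-microsoft-azure:<PLUGIN_VERSION>
    volumeMounts:
      - mountPath: /target
        name: plugins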

Install the Helm chart:

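# Register the chart repository first (skip if it is already added)
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update
helm install velero vmware-tanzu/velero \
  --namespace velero \
  -f values.yaml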

Once installed, you will see velero and node-agent pods running in the velero namespace:

kubectl get pods -n velero

Repeat the same Velero installation steps in the destination vCluster.

Backup and Restore a Sample MySQL Pod

Deploy MySQL in Source vCluster

mysql-pod.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql-pod
  namespace: default
  labels:
    app: mysql
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: rootpassword
        - name: MYSQL_DATABASE
          value: testdb
      volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
  volumes:
    - name: mysql-storage
      persistentVolumeClaim:
        claimName: mysql-pvc

Apply the manifest:

kubectl apply -f mysql-pod.yaml

Add Test Data

Exec into the pod:

kubectl exec -it mysql-pod -- /bin/bash

Run the following commands inside the pod to add test files:

echo "test data 1" > /var/lib/mysql/test1.txt
echo "test data 2" > /var/lib/mysql/test2.txt

This creates test1.txt and test2.txt.

Take a Backup

velero backup create mysql-backup \
  --include-namespaces default \
  --default-volumes-to-fs-backup \
  --wait

Check backup status:

velero backup get

The backup status should show Completed.
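
If the backup runs from a script or pipeline, the status can be checked without reading the table by eye. The sketch below reads the phase from the Backup custom resource that Velero keeps in the velero namespace; backup_completed is a hypothetical helper name, not part of the Velero CLI.

```shell
# Hypothetical helper: succeeds only when the named Backup has reached
# the Completed phase.
backup_completed() {
  name="$1"
  phase="$(kubectl get backups.velero.io "$name" -n velero \
    -o jsonpath='{.status.phase}')" || return 1
  echo "backup $name phase: $phase"
  [ "$phase" = "Completed" ]
}
```

For example, backup_completed mysql-backup || exit 1 stops a script before the restore step when the backup ended as PartiallyFailed or Failed.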

Restore in Destination vCluster

Update values.yaml for Destination

Make sure the Velero configuration matches the source. Use the same values.yaml, but mark the backup storage location as read-only so that the destination cannot modify or delete the source's backups:

# Add this in values.yaml for the destination cluster
configuration:
  backupStorageLocation:
    - name: default
      # Keep all values the same as source — point to the same blob container
      accessMode: ReadOnly   # Destination reads from source's storage

After Velero is installed at the destination vCluster, verify you can see the source backups:

velero backup get

You will see the same backup list as the source vCluster.

Create a Restore

restore.yaml:

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: mysql-restore
  namespace: velero
spec:
  backupName: mysql-backup
  includedNamespaces:
    - default
  restorePVs: true
  itemOperationTimeout: 4h

Apply the restore:

kubectl apply -f restore.yaml -n velero

Check restore status:

velero restore get
velero restore describe mysql-restore --details

To verify the restore, attach the PVC (created after restore completes) to a pod, exec into it, and confirm the data (test1.txt and test2.txt) is present.
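
The check can also be scripted. This sketch assumes the restore recreated mysql-pod along with its PVC, which is expected here because the backup covered the whole default namespace; if only the PVC is present, mount it in a pod of your own and pass that pod's name instead. verify_restored_files is a hypothetical helper name.

```shell
# Hypothetical helper: confirm both test files came back with the
# content written in the source vCluster.
verify_restored_files() {
  pod="${1:-mysql-pod}"
  for n in 1 2; do
    actual="$(kubectl exec "$pod" -n default -- \
      cat "/var/lib/mysql/test${n}.txt")" || return 1
    if [ "$actual" != "test data ${n}" ]; then
      echo "test${n}.txt: unexpected content: $actual" >&2
      return 1
    fi
  done
  echo "restore verified"
}
```

Run verify_restored_files against the destination vCluster once velero restore get reports the restore as Completed.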

Troubleshooting Tips

Issue 1: Backup status is `PartiallyFailed` or `FailedValidation`

Solution: Describe the backup for details:

velero backup describe mysql-backup --details

Check the backup logs:

velero backup logs mysql-backup

If nothing useful appears, check the Velero pod logs:

kubectl logs -n velero deployment/velero | grep mysql-backup

After running the above three commands, you'll likely find the root cause. Common causes include permission issues or incorrect credentials. Sometimes partial failures occur because the node-agent pod isn't running on a node — in that case, manually schedule a pod on that node.

Issue 2: Node Agent Pod is Not Running

node-agent-xxxxx   0/1   Pending   0   5m

Solution: There is a node with no pods running on it, so the node-agent DaemonSet pod is also not scheduled. Manually schedule a sample pod on that node to trigger scheduling. Once a sample pod is running, the node-agent pod will also be scheduled and start running.

Issue 3: Restore Fails Without Specific Errors

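Solution: Restart the restore process from scratch. By default, Velero skips resources that already exist in the target cluster, so leftovers from a failed attempt must be removed first:

Delete all resources created by the restore job (pods, statefulsets, deployments, PVCs, etc.). Alternatively, if you restored a whole namespace, delete the entire restored namespace.
Delete the restore job:

velero restore delete mysql-restore

Recreate the restore. If ArgoCD manages restore.yaml, it will automatically sync and recreate the restore job once it is deleted, triggering the Velero restoration. Otherwise, re-apply restore.yaml yourself.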

Conclusion

Using Velero to back up and restore workloads across vClusters provides a robust and flexible approach for managing multi-tenant Kubernetes environments. Whether you're migrating applications between development and production, setting up disaster recovery, or replicating environments for testing, Velero simplifies the process significantly.

In this blog post, we explored how to back up and restore Kubernetes clusters using Velero. While the process is straightforward in principle, production environments can introduce added complexity — factors like cluster size, workloads, and configurations often make a difference.
