In Kubernetes environments, teams are constantly looking for ways to move faster without sacrificing security or efficiency. Managing multiple environments like development, testing, and staging often leads to cluster sprawl, higher costs, and complex maintenance. This is where virtual clusters come in.
Virtual clusters make it possible to create isolated, on-demand Kubernetes environments that share the same underlying infrastructure. They give developers the freedom to spin up their own clusters quickly for testing new features, running experiments, or deploying temporary workloads, all without waiting on cluster admins or provisioning additional physical clusters. Each virtual cluster runs its own control plane, offering stronger isolation and flexibility than namespace-based setups. We'll be using vCluster, an implementation of virtual clusters by Loft, to illustrate the concept in practice.
Managing workloads across multiple virtual clusters is a common pattern in multi-tenant environments. However, while virtual clusters make isolation easy, moving workloads across them is not straightforward. That's where Velero comes in: it is a Kubernetes backup and restore tool that can also migrate workloads from one virtual cluster to another.
In this blog post, we'll look at why backups matter and how Velero works. Then we'll walk through a practical migration: backing up workloads in one virtual cluster and restoring them in another.
What is Velero?
Velero is an open source tool to back up and restore your Kubernetes cluster resources and persistent volumes. You can run Velero with a cloud provider or on-premises.
Velero lets you:
- Take backups of your cluster and restore in case of loss
- Migrate cluster resources to other clusters
- Replicate your production cluster to development and testing clusters
Velero consists of:
- Velero CLI
  - Runs on your local machine.
  - Used to create, schedule, and manage backups and restores.
- Kubernetes API Server
  - Receives backup requests from the Velero CLI.
  - Stores Velero custom resources (like Backup) in etcd.
- Velero Server (BackupController)
  - Runs inside the Kubernetes cluster.
  - Watches the Kubernetes API for Velero backup requests.
  - Collects Kubernetes resource data and triggers backups.
- Cloud Provider / Object Storage
  - Stores backup data and metadata.
  - Creates volume snapshots using the cloud provider's API (e.g., Azure Disk Snapshots).
How it works:
- The user runs a Velero backup command with the CLI, for example: velero backup create my-backup
- The CLI creates a backup request (a Backup custom resource) in Kubernetes.
- The Velero server detects the request and gathers cluster resources.
- Backup data is uploaded to cloud object storage.
- Persistent volumes are backed up using cloud provider snapshots (if enabled) or, with file system backup, by the node-agent.
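Under the hood, the CLI step creates a Backup custom resource that the Velero server watches. The bash sketch below submits a roughly equivalent request without the CLI. The helper name backup_manifest is hypothetical. The fields shown are only a subset of what the CLI actually sets; 720h0m0s is Velero's default 30-day TTL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal velero.io/v1 Backup resource, roughly
# what `velero backup create NAME --include-namespaces NS` submits.
backup_manifest() {
  local name=$1 namespace=$2
  cat <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: ${name}
  namespace: velero
spec:
  includedNamespaces:
    - ${namespace}
  storageLocation: default
  ttl: 720h0m0s
EOF
}
```

Usage: backup_manifest my-backup default | kubectl apply -f -. The Velero server then processes this object exactly as it would one created by the CLI.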
Velero supports a variety of storage providers for different backup and snapshot operations. In this blog post, we will focus on the Azure provider.
What is vCluster?
vCluster is a certified Kubernetes distribution for building virtual clusters: isolated, virtual Kubernetes environments that run within a physical host cluster. Virtual clusters enhance isolation and flexibility in multi-tenant Kubernetes setups. Multiple teams can work independently on shared infrastructure, which minimizes conflicts, increases team autonomy, and reduces infrastructure costs.
A virtual cluster:
- Runs inside a namespace of the host cluster
- Has an API server, control plane, and syncer
- Maintains its own set of Kubernetes resources, operating like a full cluster
Why Back Up and Migrate Workloads Between vClusters?
Common reasons to back up or migrate workloads between vClusters:
- Promoting apps from dev to staging or prod: Backing up and restoring workloads between vClusters allows smooth promotion of applications across environments, ensuring consistent configurations and deployments without manual rework.
- Replicating test environments: It helps recreate identical test setups quickly, enabling developers to reproduce issues, validate fixes, or test new features in isolated environments.
- Disaster recovery (DR) setup: Regular backups across vClusters ensure business continuity by allowing workloads to be restored rapidly in another cluster if the primary one fails.
- Tenant migration in multi-tenant environments: vClusters make it easier to move tenants between isolated environments without affecting others, maintaining data security and minimizing downtime.
- Cluster version upgrades or deprecations: When upgrading or decommissioning a cluster, backing up workloads to another vCluster ensures a seamless transition without losing data or configurations.
Why Use Velero with vCluster?
Virtual clusters built with vCluster are lightweight and isolated, but they don't provide built-in mechanisms for backing up workloads, restoring them, or moving applications between clusters. Without a backup solution, recovery and migration can be risky.
Using Velero with vCluster fills this gap by enabling simple backup, restore, and migration workflows directly inside virtual clusters. It allows you to move applications between clusters with minimal setup and perform migrations with little to no downtime, especially for stateless workloads.
How to Back Up and Migrate Workloads Between vClusters
Let's see how to use Velero to back up workloads from one vCluster and restore them into another. Think of it as moving your app from dev to staging, with each vCluster running on a different Azure Kubernetes cluster.
Prerequisites
Before starting, make sure you have the following:
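- Two Kubernetes host clusters running on Azure (other cloud providers work too, with the matching Velero provider plugin)
- Two running vClusters (source and destination), one on each host cluster
- Velero CLI, Helm, kubectl, and the Azure CLI (az) installed on your machine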
Step-by-step Guide
We will install Velero with the same configuration in both the source and destination vClusters. Then we will deploy a sample MySQL Pod in the source vCluster, back it up, and restore it in the destination vCluster. We will be using the Azure provider to run Velero.
To set up Velero on Azure, you have to:
- Create an Azure storage account and blob container
- Get the resource group details
- Set permissions for Velero
Velero needs access to your Azure storage account to upload and retrieve backups. You'll need to assign the "Storage Blob Data Contributor" role (or equivalent) to the identity or service principal Velero uses, ensuring it can read, write, and manage backup data in the blob container.
1. Create Azure Resources
Create a resource group:
AZURE_RESOURCE_GROUP=<YOUR_RESOURCE_GROUP>
az group create --name $AZURE_RESOURCE_GROUP --location <YOUR_LOCATION>
Create the storage account:
AZURE_STORAGE_ACCOUNT=<YOUR_STORAGE_ACCOUNT>
az storage account create \
--name $AZURE_STORAGE_ACCOUNT \
--resource-group $AZURE_RESOURCE_GROUP \
--sku Standard_GRS \
--encryption-services blob \
--https-only true \
--kind BlobStorage \
--access-tier Hot
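The storage account create command fails if the name breaks Azure's naming rules. Storage account names must be 3 to 24 characters long and use only lowercase letters and digits. The small bash check below catches this before you call az. The function name is our own and is not part of the Azure CLI.

```shell
#!/usr/bin/env bash
# Succeeds only for names made of 3-24 lowercase letters or digits,
# which is Azure's format rule for storage account names.
valid_storage_account_name() {
  [[ $1 =~ ^[a-z0-9]{3,24}$ ]]
}

# Example guard before creating the account:
# valid_storage_account_name "$AZURE_STORAGE_ACCOUNT" || echo "invalid storage account name" >&2
```

Names must also be globally unique across Azure. Running az storage account check-name --name "$AZURE_STORAGE_ACCOUNT" reports whether a name is still available.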
Create a blob container:
BLOB_CONTAINER=velero
az storage container create \
--name $BLOB_CONTAINER \
--public-access off \
--account-name $AZURE_STORAGE_ACCOUNT
2. Create a Service Principal with Contributor Privileges
AZURE_SUBSCRIPTION_ID=$(az account list --query '[?isDefault].id' -o tsv)
AZURE_TENANT_ID=$(az account list --query '[?isDefault].tenantId' -o tsv)
az ad sp create-for-rbac \
--name "velero" \
--role "Contributor" \
--scopes /subscriptions/$AZURE_SUBSCRIPTION_ID \
--query '{clientId: appId, clientSecret: password, tenantId: tenant}'
This outputs clientId, clientSecret, and tenantId (the subscription ID was already captured in AZURE_SUBSCRIPTION_ID). Store these values; the client secret is shown only once.
Get the Client ID and store it in a variable:
AZURE_CLIENT_ID=$(az ad sp list --display-name "velero" --query '[0].appId' -o tsv)
Assign additional permissions to the Client ID:
az role assignment create \
--assignee $AZURE_CLIENT_ID \
--role "Storage Blob Data Contributor" \
--scope /subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AZURE_RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/$AZURE_STORAGE_ACCOUNT
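The --scope value above is a long resource ID assembled from shell variables. If one of those variables is empty, the result is a malformed scope. The hedged bash sketch below builds the ID and refuses empty parts. The helper name storage_account_scope is hypothetical.

```shell
#!/usr/bin/env bash
# Build the Azure resource ID of a storage account, refusing empty parts so an
# unset variable cannot silently produce a malformed scope string.
storage_account_scope() {
  local sub=$1 rg=$2 account=$3
  if [ -z "$sub" ] || [ -z "$rg" ] || [ -z "$account" ]; then
    echo "usage: storage_account_scope SUBSCRIPTION_ID RESOURCE_GROUP ACCOUNT" >&2
    return 1
  fi
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' \
    "$sub" "$rg" "$account"
}

# Example:
# scope=$(storage_account_scope "$AZURE_SUBSCRIPTION_ID" "$AZURE_RESOURCE_GROUP" "$AZURE_STORAGE_ACCOUNT") &&
#   az role assignment create --assignee "$AZURE_CLIENT_ID" \
#     --role "Storage Blob Data Contributor" --scope "$scope"
```

Alternatively, az storage account show --name "$AZURE_STORAGE_ACCOUNT" --resource-group "$AZURE_RESOURCE_GROUP" --query id -o tsv returns the same ID directly from Azure.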
3. Prepare Credentials
With the output received above, create bsl-creds and cloud-creds for the Velero setup.
- bsl-creds: credentials for the BSL (Backup Storage Location), the blob container where Velero stores backups. Velero needs a secret to access this storage location.
- cloud-creds: the service principal credentials Velero uses to call Azure APIs, for example to create volume snapshots.
You will need the following values:
AZURE_SUBSCRIPTION_ID=<YOUR_SUBSCRIPTION_ID>
AZURE_TENANT_ID=<YOUR_TENANT_ID>
AZURE_CLIENT_ID=<YOUR_CLIENT_ID>
AZURE_CLIENT_SECRET=<YOUR_CLIENT_SECRET>
AZURE_RESOURCE_GROUP=<YOUR_RESOURCE_GROUP>
AZURE_CLOUD_NAME=AzurePublicCloud
AZURE_ENVIRONMENT=AzurePublicCloud
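Rather than typing the cloud-creds content by hand, you could write it to a local file in the same KEY=value format shown in the cloud-creds comments in step 5. The function name and output path in this bash sketch are assumptions. It expects the variables above to be set in your current shell.

```shell
#!/usr/bin/env bash
# Write the service principal values to a KEY=value credentials file.
# Assumes the AZURE_* variables listed above are set in the current shell.
write_cloud_creds() {
  local out=$1 var
  for var in AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID AZURE_CLIENT_ID \
             AZURE_CLIENT_SECRET AZURE_RESOURCE_GROUP; do
    if [ -z "${!var:-}" ]; then
      echo "missing required variable: $var" >&2
      return 1
    fi
  done
  # Create the file with owner-only permissions because it holds a secret.
  ( umask 077 && cat > "$out" <<EOF
AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
AZURE_TENANT_ID=${AZURE_TENANT_ID}
AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}
AZURE_CLOUD_NAME=${AZURE_CLOUD_NAME:-AzurePublicCloud}
EOF
  )
}
```

Usage: write_cloud_creds ./credentials-velero. The file name credentials-velero is only an example.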
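4. Log in to vCluster and Create Velero Namespace
Connect to the source vCluster (the names below are placeholders), then create the namespace:
vcluster connect <VCLUSTER_NAME> --namespace <HOST_NAMESPACE>
kubectl create namespace velero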
5. Create BSL and Cloud Credentials
bsl-creds.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: bsl-creds
  namespace: velero
type: Opaque
data:
  cloud: <BASE64_ENCODED_VALUE>
  # Encode the following as base64:
  # [default]
  # storageAccount: <YOUR_STORAGE_ACCOUNT>
  # storageAccountKey: <YOUR_STORAGE_ACCOUNT_KEY>
  # subscriptionId: <YOUR_SUBSCRIPTION_ID>
  # resourceGroup: <YOUR_RESOURCE_GROUP>
cloud-creds.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-creds
  namespace: velero
type: Opaque
data:
  cloud: <BASE64_ENCODED_VALUE>
  # Encode the following as base64:
  # AZURE_SUBSCRIPTION_ID=<YOUR_SUBSCRIPTION_ID>
  # AZURE_TENANT_ID=<YOUR_TENANT_ID>
  # AZURE_CLIENT_ID=<YOUR_CLIENT_ID>
  # AZURE_CLIENT_SECRET=<YOUR_CLIENT_SECRET>
  # AZURE_RESOURCE_GROUP=<YOUR_RESOURCE_GROUP>
  # AZURE_CLOUD_NAME=AzurePublicCloud
Apply the secrets:
kubectl apply -f bsl-creds.yaml -n velero
kubectl apply -f cloud-creds.yaml -n velero
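The manual base64 step in the two Secret manifests is easy to get wrong, for example by pasting line-wrapped output. The hedged bash sketch below renders such a Secret from a plaintext credentials file. The helper name render_creds_secret is hypothetical.

```shell
#!/usr/bin/env bash
# Print a Secret manifest whose data.cloud key holds the base64-encoded file.
render_creds_secret() {
  local name=$1 file=$2 encoded
  if [ ! -r "$file" ]; then
    echo "cannot read credentials file: $file" >&2
    return 1
  fi
  # tr removes the line wrapping that some base64 implementations add.
  encoded=$(base64 < "$file" | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
  namespace: velero
type: Opaque
data:
  cloud: ${encoded}
EOF
}
```

Usage: render_creds_secret cloud-creds ./credentials-velero | kubectl apply -f -. Alternatively, kubectl can do the encoding itself: kubectl create secret generic cloud-creds --namespace velero --from-file=cloud=./credentials-velero.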
6. Install Velero Using Helm
Use the following values.yaml. Both the source and destination vClusters use the same file:
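configuration:
  backupStorageLocation:
    - name: default
      provider: azure
      bucket: velero
      config:
        resourceGroup: <YOUR_RESOURCE_GROUP>
        storageAccount: <YOUR_STORAGE_ACCOUNT>
        subscriptionId: <YOUR_SUBSCRIPTION_ID>
      credential:
        name: bsl-creds
        key: cloud
  volumeSnapshotLocation:
    - name: default
      provider: azure
      config:
        resourceGroup: <YOUR_RESOURCE_GROUP>
        subscriptionId: <YOUR_SUBSCRIPTION_ID>
      credential:
        name: cloud-creds
        key: cloud
credentials:
  useSecret: true
  existingSecret: cloud-creds
deployNodeAgent: true
nodeAgent:
  podVolumePath: /var/lib/kubelet/pods
  privileged: true
# The Azure provider plugin is loaded through an init container.
# Replace <PLUGIN_VERSION> with the plugin release that matches your Velero version.
initContainers:
  - name: velero-plugin-for-microsoft-azure
    image: velero/velero-plugin-for-microsoft-azure:<PLUGIN_VERSION>
    volumeMounts:
      - mountPath: /target
        name: plugins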
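Both vClusters must use identical values. One option is to keep the file above as a template with its <YOUR_...> placeholders and render it from environment variables. This bash sketch assumes the values contain no |, & or backslash characters, which would need escaping in sed. The helper name render_values is hypothetical.

```shell
#!/usr/bin/env bash
# Replace the <YOUR_...> placeholders read from stdin with environment values.
render_values() {
  local var
  for var in AZURE_RESOURCE_GROUP AZURE_STORAGE_ACCOUNT AZURE_SUBSCRIPTION_ID; do
    if [ -z "${!var:-}" ]; then
      echo "missing required variable: $var" >&2
      return 1
    fi
  done
  sed -e "s|<YOUR_RESOURCE_GROUP>|${AZURE_RESOURCE_GROUP}|g" \
      -e "s|<YOUR_STORAGE_ACCOUNT>|${AZURE_STORAGE_ACCOUNT}|g" \
      -e "s|<YOUR_SUBSCRIPTION_ID>|${AZURE_SUBSCRIPTION_ID}|g"
}
```

Usage: render_values < values.template.yaml > values.yaml. The name values.template.yaml is only an example. Running the same command on each machine with the same variables produces the same file for both vClusters.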
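Add the Velero Helm repository and install the chart:
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update
helm install velero vmware-tanzu/velero \
  --namespace velero \
  -f values.yaml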
Once installed, you will see velero and node-agent pods running in the velero namespace:
kubectl get pods -n velero
Repeat the same Velero installation steps in the destination vCluster.
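To check the pod list non-interactively in each vCluster after installation, you can parse the READY and STATUS columns of kubectl get pods --no-headers. This is a minimal bash/awk sketch with a hypothetical function name. It flags any pod that is not Running with all containers ready, including states such as Pending or Completed.

```shell
#!/usr/bin/env bash
# Read `kubectl get pods --no-headers` output on stdin, print pods that are
# not Running with all containers ready, and return non-zero if any exist.
pods_not_ready() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) {
      print $1 " " $2 " " $3
      bad = 1
    }
  }
  END { exit bad }'
}
```

Usage: kubectl get pods -n velero --no-headers | pods_not_ready && echo "all Velero pods ready". A Pending node-agent pod, like the one in the troubleshooting section, is reported by name.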
Backup and Restore a Sample MySQL Pod
Deploy MySQL in Source vCluster
mysql-pod.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql-pod
  namespace: default
  labels:
    app: mysql
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: rootpassword
        - name: MYSQL_DATABASE
          value: testdb
      volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
  volumes:
    - name: mysql-storage
      persistentVolumeClaim:
        claimName: mysql-pvc
Apply the manifest:
kubectl apply -f mysql-pod.yaml
Add Test Data
Exec into the pod:
kubectl exec -it mysql-pod -- /bin/bash
Run the following commands inside the pod to add test files:
echo "test data 1" > /var/lib/mysql/test1.txt
echo "test data 2" > /var/lib/mysql/test2.txt
This creates test1.txt and test2.txt.
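If you prefer not to open an interactive shell, a single kubectl exec call can write the same two files. The bash sketch below wraps this in a hypothetical function. It takes the target directory followed by a command prefix, so you can also try it locally.

```shell
#!/usr/bin/env bash
# Write test1.txt and test2.txt into DIR, running sh through an optional
# command prefix such as: kubectl exec mysql-pod --
seed_test_files() {
  local dir=$1
  shift
  "$@" sh -c 'echo "test data 1" > "$1/test1.txt" && echo "test data 2" > "$1/test2.txt"' sh "$dir"
}
```

Usage: seed_test_files /var/lib/mysql kubectl exec mysql-pod --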
Take a Backup
velero backup create mysql-backup \
--include-namespaces default \
--default-volumes-to-fs-backup \
--wait
Check backup status:
velero backup get
The backup status should show Completed.
Restore in Destination vCluster
Update values.yaml for Destination
Use the same values.yaml and the same secrets as the source, so the destination points to the same blob container. The only change is to add one parameter to the backup storage location:
# Change this in values.yaml for the destination cluster
configuration:
  backupStorageLocation:
    - name: default
      # Keep all other values the same as the source, pointing to the same blob container
      accessMode: ReadOnly # Destination reads from the source's storage
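If you would rather not maintain a second copy of values.yaml, you can change the access mode on the BackupStorageLocation object after installation, since accessMode is a field of its spec. This hedged bash sketch uses a hypothetical function name, and KUBECTL can be overridden for dry runs. A later helm upgrade with the unmodified values.yaml may set the field back.

```shell
#!/usr/bin/env bash
# Patch the access mode of the "default" BackupStorageLocation in the velero namespace.
set_bsl_access_mode() {
  local mode=$1
  case "$mode" in
    ReadOnly|ReadWrite) ;;
    *)
      echo "accessMode must be ReadOnly or ReadWrite, got: $mode" >&2
      return 1 ;;
  esac
  "${KUBECTL:-kubectl}" patch backupstoragelocation default -n velero \
    --type merge -p "{\"spec\":{\"accessMode\":\"${mode}\"}}"
}
```

Usage: set_bsl_access_mode ReadOnly, run against the destination vCluster. Use ReadWrite to revert.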
After Velero is installed at the destination vCluster, verify you can see the source backups:
velero backup get
You will see the same backup list as the source vCluster.
Create a Restore
restore.yaml:
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: mysql-restore
  namespace: velero
spec:
  backupName: mysql-backup
  includedNamespaces:
    - default
  restorePVs: true
  itemOperationTimeout: 4h
Apply the restore:
kubectl apply -f restore.yaml -n velero
Check restore status:
velero restore get
velero restore describe mysql-restore --details
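Applying restore.yaml returns immediately, unlike velero backup create --wait. To block until the restore finishes, a bash polling loop can read .status.phase from the Restore resource. This is a sketch: the function name, timeout handling, and POLL_INTERVAL variable are our own choices. The command that fetches the phase is passed in as arguments.

```shell
#!/usr/bin/env bash
# Poll a Velero backup/restore phase until it is terminal or TIMEOUT elapses.
# Usage: wait_for_velero_phase TIMEOUT_SECONDS GETTER_COMMAND [ARGS...]
wait_for_velero_phase() {
  local timeout=$1 waited=0 phase=""
  local interval=${POLL_INTERVAL:-5}
  shift
  # Guard against a zero, negative, or non-numeric interval.
  [ "$interval" -ge 1 ] 2>/dev/null || interval=1
  while [ "$waited" -lt "$timeout" ]; do
    phase=$("$@" 2>/dev/null)
    case "$phase" in
      Completed)
        echo "$phase"
        return 0 ;;
      PartiallyFailed|Failed|FailedValidation)
        echo "$phase" >&2
        return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s; last phase: ${phase:-unknown}" >&2
  return 1
}
```

Usage: wait_for_velero_phase 600 kubectl get restores.velero.io mysql-restore -n velero -o jsonpath='{.status.phase}'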
To verify the restore, attach the PVC (created after restore completes) to a pod, exec into it, and confirm the data (test1.txt and test2.txt) is present.
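You can also script the file check. This hypothetical bash function compares each file with the content written before the backup. It runs cat through a command prefix, such as kubectl exec against whichever pod mounts the restored PVC.

```shell
#!/usr/bin/env bash
# Check that DIR/test1.txt and DIR/test2.txt contain the expected test data,
# reading them through an optional command prefix (e.g. kubectl exec POD --).
verify_test_files() {
  local dir=$1 i got
  shift
  for i in 1 2; do
    got=$("$@" cat "$dir/test$i.txt") || return 1
    if [ "$got" != "test data $i" ]; then
      echo "unexpected content in test$i.txt: $got" >&2
      return 1
    fi
  done
  echo "test files verified"
}
```

Usage: verify_test_files /var/lib/mysql kubectl exec mysql-pod --. Here mysql-pod assumes the restored pod; substitute the pod you attached the PVC to.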
Troubleshooting Tips
Issue 1: Backup status is PartiallyFailed or FailedValidation
Solution: Describe the backup for details:
velero backup describe mysql-backup --details
Check the backup logs:
velero backup logs mysql-backup
If nothing useful appears, check the Velero pod logs:
kubectl logs -n velero deployment/velero | grep mysql-backup
After running these three commands, you'll likely find the root cause. Common causes include permission issues or incorrect credentials. Partial failures can also occur when the node-agent pod isn't running on a node. In that case, manually schedule a pod on that node (see Issue 2).
Issue 2: Node Agent Pod is Not Running
node-agent-xxxxx 0/1 Pending 0 5m
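Solution: This happens when a node has no pods running on it, so the node-agent DaemonSet pod is not scheduled there either. In a virtual cluster, this is likely related to node syncing: by default, vCluster only syncs host nodes on which the virtual cluster's own pods are scheduled. Manually schedule a sample pod on that node to trigger scheduling. Once the sample pod is running, the node-agent pod will also be scheduled and start running.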
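To find which nodes lack a running node-agent pod, you can compare the node list with the nodeName of each node-agent pod. In this bash sketch the helper name is hypothetical. The label selector name=node-agent is an assumption about how the chart labels node-agent pods; check it with kubectl get pods -n velero --show-labels.

```shell
#!/usr/bin/env bash
# Print node names from NODES_FILE that do not appear in AGENT_NODES_FILE.
# Empty lines (e.g. Pending pods with no nodeName yet) are ignored.
nodes_missing_agent() {
  comm -23 <(grep -v '^$' "$1" | sort -u) <(grep -v '^$' "$2" | sort -u)
}

# Example inputs:
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' > nodes.txt
# kubectl get pods -n velero -l name=node-agent \
#   -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' > agent-nodes.txt
# nodes_missing_agent nodes.txt agent-nodes.txt
```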
Issue 3: Restore Fails Without Specific Errors
Solution: Restart the restore process from scratch:
- Delete all resources created by the restore job (pods, StatefulSets, Deployments, PVCs, etc.). If you restored a whole namespace, delete the entire restored namespace instead.
- Delete the restore job:
velero restore delete mysql-restore
- After the restore job is deleted, ArgoCD (if used) will automatically sync and recreate the restore job, triggering the Velero restoration. Without ArgoCD, re-apply restore.yaml to start a new restore.
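Without ArgoCD, the last two steps can be combined. This hypothetical bash function deletes the Restore, waits for the object to disappear, and re-applies the manifest. It does not delete the restored workloads, so do that first as described in the list above. Depending on your kubectl version, waiting on an object that is already gone may report an error; the sketch only warns and continues in that case. VELERO and KUBECTL can be overridden for dry runs.

```shell
#!/usr/bin/env bash
# Delete a Velero restore, wait until the Restore object is gone, then re-apply
# its manifest to start a fresh restore.
rerun_restore() {
  local name=$1 manifest=$2
  "${VELERO:-velero}" restore delete "$name" --confirm || return 1
  "${KUBECTL:-kubectl}" wait --for=delete "restore.velero.io/$name" \
    -n velero --timeout=120s ||
    echo "warning: wait for deletion did not succeed; continuing" >&2
  "${KUBECTL:-kubectl}" apply -f "$manifest" -n velero
}
```

Usage: rerun_restore mysql-restore restore.yaml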
Conclusion
Using Velero to back up and restore workloads across vClusters provides a robust and flexible approach for managing multi-tenant Kubernetes environments. Whether you're migrating applications between development and production, setting up disaster recovery, or replicating environments for testing, Velero simplifies the process significantly.
In this blog post, we explored how to use Velero to back up workloads in one vCluster and restore them in another. While the process is straightforward in principle, production environments can introduce added complexity: factors like cluster size, workloads, and configurations often make a difference.
Originally published at improving.com