Igor Bertnyk
A missing step: Backup Azure DevOps Repositories


To back up or not to back up?

Don't get me wrong, I like Azure DevOps. There are some frustrations here and there, for example in managing permissions and caching build resources. And each of the Azure DevOps modules (Dashboards/Wiki, Boards, Repos, Pipelines, Test Plans, Artifacts) might not be the best on the market. But the integration and ease of use make the whole greater than the sum of its parts, especially for small and medium-sized projects.
Still, there is one thing that puzzles me. Backing up your Git repositories seems to me like common sense and good practice. It can also be company policy. However, there is currently no way to do it, either manually or on a schedule. Of course, Microsoft is committed to keeping the data safe, including periodic backups and geo-replication, but we do not have any control over that. And it does not protect against unintentional or malicious actions leading to data loss.
Microsoft's response to such requests, and I quote: "In current Azure DevOps, there is no out of the box solution to this, you could backup your projects by downloading them as zip to save it on your local and then upload it to restore them. And you also could backup your work items by open them with Excel to save in your local machine."
I mean... what? Excel as a backup tool is possibly a new high in data safety. Anyway, are there ways to wrest control back into our hands?
Of course there are, and today we explore two of them.

Back up a repository using a plain old Git bash script

One method is to use a bash script to get a complete copy of the repository. Let's not run it from our laptop, though, but rather spin up a small VM in the cloud.

Plan of attack:

  • Create a cheap Linux virtual machine in Azure
  • Generate new SSH Key Pair
  • Add SSH Public key to Azure DevOps
  • Create bash script to mirror Git Repo
  • Execute that script on schedule

Without diving into too much detail, it is quite easy to create a Linux VM in Azure, and it already comes with everything we need: Git and a shell. Then we can SSH into it and create a bash script, which I named "devopsbackup.sh".
The script is rather primitive, but it gets the job done. Essentially, it deletes the previous backup and creates a mirror copy of the Git repo. Don't forget to replace the variables in angle brackets with your own values.

#!/bin/bash
PROGNAME=$(basename "$0")
# Print an error message to stderr and abort.
error_exit()
{
        echo "${PROGNAME}: ${1:-"Unknown Error"}" 1>&2
        exit 1
}
#
echo "Executing Azure DevOps Repos backup"
cd /home/devopsadmin || error_exit "$LINENO: cannot cd to home directory"
# Delete the previous backup and create a fresh mirror clone.
rm -rf repos/
mkdir -p repos
cd repos/
git clone --mirror git@ssh.dev.azure.com:v3/<organization>/<project>/<repo> || error_exit "$LINENO: clone failed"

cd ..
exit 0

Allow script execution:

chmod 0755 devopsbackup.sh

We also need to generate an SSH key pair using the command:

ssh-keygen -C "devopsbackup"

By default, the keys will be generated in the "~/.ssh" folder. We need to copy the public key "id_rsa.pub" from there and paste it into Azure DevOps. Go to the profile settings at the top right and add a new key from there:
Azure DevOps SSH Key
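
Before scheduling anything, it is worth a quick sanity check that authentication works. You can print the public key for copying and then test the SSH connection; Azure DevOps should respond with a greeting noting that shell access is not supported, which confirms the key was accepted.

cat ~/.ssh/id_rsa.pub
ssh -T git@ssh.dev.azure.com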

We can easily schedule the script's execution. Go ahead, type "crontab -e" in the command line and add something like this to the Cron config. The entry below runs the backup daily at 01:20; adjust the path to wherever you saved the script:

20 1 * * * /home/devopsadmin/bin/devopsbackup.sh >/dev/null 2>&1

A next step could be to extend this script with the Azure CLI and upload the backup to Azure Blob Storage or Data Lake, as sketched below.
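Here is a minimal sketch of such an extension, assuming the Azure CLI is installed on the VM and already authenticated (for instance via "az login" or a managed identity); the storage account and container names below are placeholders.

# Compress the mirrored repos and upload the archive to Blob Storage
tar -czf repos-$(date +%F).tar.gz repos/
az storage blob upload \
    --account-name <storageaccount> \
    --container-name devopsbackup \
    --name repos-$(date +%F).tar.gz \
    --file repos-$(date +%F).tar.gz \
    --auth-mode login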
Alternatively, Azure also has a great feature that allows you to create a daily/weekly backup of your VM. So you can just store a snapshot of the whole VM and not bother with Blob Storage, if you like.

Back up the default branch using the Azure DevOps API

That's all well and good, but is there a more modern way, one that does not require a dedicated VM, shell scripts, or cron? The Azure DevOps REST API looks promising: it allows us to manipulate Azure DevOps data, including work items and repositories. Unfortunately, this API does not have parity with Git, and the full code history cannot be preserved using this method.
However, if all you require is a periodic snapshot of the default branch, it can be used to create a simple backup solution. One advantage over the previous solution is that we can automatically retrieve information about all our projects and repos and do not need to hardcode them. So if you add a new project, no modification is required.

Approach:

  • Use REST API to retrieve hierarchy of projects, repositories, items and blobs
  • Use Azure DevOps token (PAT) for the API authentication
  • Use Azure Function with timer trigger to run this on schedule
  • Use Azure Blob Storage to keep an archive.

Without further ado, here is the gist for the Azure Function. It requires the following parameters, which you can set in the Application Settings:
"storageAccountKey", "storageName", "token", "organization"

using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Microsoft.WindowsAzure.Storage;
using Newtonsoft.Json;
using RestSharp;

namespace AzureDevopsBackupFunction
{
    public static class BackupFunction
    {
        private const string version = "api-version=5.1";

        [FunctionName("BackupFunction")]
        public static async Task Run([TimerTrigger("0 0 20 * * *")]TimerInfo myTimer, ILogger log) //, RunOnStartup = true
        {
            log.LogInformation($"DevOps BackupFunction function starting execution at: {DateTime.Now}");

            // configure connections
            string storageAccountKey = Environment.GetEnvironmentVariable("storageAccountKey", EnvironmentVariableTarget.Process);
            string storageName = Environment.GetEnvironmentVariable("storageName", EnvironmentVariableTarget.Process);
            string token = Environment.GetEnvironmentVariable("token", EnvironmentVariableTarget.Process);
            string organization = Environment.GetEnvironmentVariable("organization", EnvironmentVariableTarget.Process);
            string storageConnection = $"DefaultEndpointsProtocol=https;AccountName={storageName};AccountKey={storageAccountKey};EndpointSuffix=core.windows.net";
            string devopsURL = $"https://dev.azure.com/{organization}/";

            // make API request to get all projects
            string auth = "Basic " + Convert.ToBase64String(System.Text.Encoding.ASCII.GetBytes(string.Format("{0}:{1}", "", token)));
            var clientProjects = new RestClient($"{devopsURL}_apis/projects?{version}");
            var requestProjects = new RestRequest(Method.GET);
            requestProjects.AddHeader("Authorization", auth);
            var responseProjects = clientProjects.Execute(requestProjects);
            if (responseProjects.StatusCode != System.Net.HttpStatusCode.OK)
            {
                throw new Exception("API Request failed: " + responseProjects.StatusCode + " " + responseProjects.ErrorMessage);
            }
            Projects projects = JsonConvert.DeserializeObject<Projects>(responseProjects.Content);

            // connect to Azure Storage
            var storageAccount = CloudStorageAccount.Parse(storageConnection);
            var client = storageAccount.CreateCloudBlobClient();
            var container = client.GetContainerReference("devopsbackup");
            await container.CreateIfNotExistsAsync();

            foreach (Project project in projects.value)
            {
                log.LogInformation(project.name);

                // get repositories
                var clientRepos = new RestClient($"{devopsURL}{project.name}/_apis/git/repositories?{version}");
                var requestRepos = new RestRequest(Method.GET);
                requestRepos.AddHeader("Authorization", auth);
                var responseRepos = clientRepos.Execute(requestRepos);
                Repos repos = JsonConvert.DeserializeObject<Repos>(responseRepos.Content);

                foreach (Repo repo in repos.value)
                {
                    log.LogInformation("Repo: " + repo.name);

                    // get file mapping
                    var clientItems = new RestClient($"{devopsURL}_apis/git/repositories/{repo.id}/items?recursionlevel=full&{version}");
                    var requestItems = new RestRequest(Method.GET);
                    requestItems.AddHeader("Authorization", auth);
                    var responseItems = clientItems.Execute(requestItems);
                    Items items = JsonConvert.DeserializeObject<Items>(responseItems.Content);
                    log.LogInformation("Items count: " + items.count);

                    if (items.count > 0)
                    {
                        // get files as zip
                        var clientBlob = new RestClient($"{devopsURL}_apis/git/repositories/{repo.id}/blobs?{version}");
                        var requestBlob = new RestRequest(Method.POST);
                        requestBlob.AddJsonBody(items.value.Where(itm => itm.gitObjectType == "blob").Select(itm => itm.objectId).ToList());
                        requestBlob.AddHeader("Authorization", auth);
                        requestBlob.AddHeader("Accept", "application/zip");
                        var zipfile = clientBlob.DownloadData(requestBlob);

                        // upload blobs to Azure Storage
                        string name = $"{project.name}_{repo.name}_blob.zip";
                        var blob = container.GetBlockBlobReference(name);
                        await blob.DeleteIfExistsAsync();
                        await blob.UploadFromByteArrayAsync(zipfile, 0, zipfile.Length);

                        // upload file mapping
                        string namejson = $"{project.name}_{repo.name}_tree.json";
                        var blobjson = container.GetBlockBlobReference(namejson);
                        await blobjson.DeleteIfExistsAsync();
                        blobjson.Properties.ContentType = "application/json";
                        await blobjson.UploadTextAsync(responseItems.Content);

                        /* TODO:
                         * File mapping defines the relationship between blob IDs and file names/paths.
                         * To reproduce a full file structure:
                         * 1. Recreate all folders for <item.isFolder>
                         * 2. Extract all other items to <item.path>
                         */
                    }
                }
            }

            log.LogInformation($"DevOps BackupFunction function finished at: {DateTime.Now}");
        }
    }

    struct Project
    {
        public string name;
    }

    struct Projects
    {
        public List<Project> value;
    }

    struct Repo
    {
        public string id;
        public string name;
    }

    struct Repos
    {
        public List<Repo> value;
    }

    struct Item
    {
        public string objectId;
        public string gitObjectType;
        public string commitId;
        public string path;
        public bool isFolder;
        public string url;
    }

    struct Items
    {
        public int count;
        public List<Item> value;
    }
}
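
To address the TODO above, here is a minimal restore sketch. It assumes that each entry in the downloaded zip is named after its blob objectId, which is how the blobs batch endpoint packages the files; the Restore helper and its parameters are hypothetical names introduced for illustration, and it reuses the usings and the Item/Items structs from the function above.

// Hypothetical helper: rebuild the file tree from <repo>_blob.zip and <repo>_tree.json
static void Restore(string zipPath, string treeJsonPath, string targetDir)
{
    // Load the saved file mapping (the tree.json uploaded by the function)
    Items items = JsonConvert.DeserializeObject<Items>(File.ReadAllText(treeJsonPath));

    // 1. Recreate the folder structure
    foreach (Item folder in items.value.Where(i => i.isFolder))
        Directory.CreateDirectory(targetDir + folder.path);

    // 2. Extract each blob entry to its original path
    // (assumption: zip entries are named by blob objectId)
    using (ZipArchive zip = ZipFile.OpenRead(zipPath))
    {
        foreach (Item file in items.value.Where(i => i.gitObjectType == "blob"))
        {
            ZipArchiveEntry entry = zip.GetEntry(file.objectId);
            if (entry != null)
                entry.ExtractToFile(targetDir + file.path, overwrite: true);
        }
    }
}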

Conclusion

Comparing these two approaches, we can see that newer is not always better. With the help of a simple shell script, we can produce a full copy of the repository that can easily be restored or imported into a new project. On the other hand, if all you want is a periodic snapshot of a repo, the Azure DevOps REST API and a scheduled Azure Function make it effortless.

That is all for today. Remember that you always have to protect your work, like the cat protecting its spoils from a dog in the image below.

A Cat Protecting Spoils from a Dog
Dirk Valckenburg, A Cat Protecting Spoils from a Dog, 1717
Cover image by Hebi B. from Pixabay
