Phaustin Karani

Backing Up GitHub Repositories to Amazon S3 (What Nobody Warns You About)

I didn’t start backing up my GitHub repositories because I distrusted GitHub.

I started because I realized something uncomfortable: GitHub had become a single point of failure for work I actually cared about.

Between long-lived projects, experiments I might want years later, and repositories that quietly became important, I didn’t like the idea that a deleted repo, a locked account, or a bad force-push could wipe everything out.

I wanted an off-platform, boring, automated backup.

Amazon S3 fit that mental model perfectly:

  • Independent of GitHub
  • Cheap
  • Extremely durable
  • Built for long-term storage

What sounded simple turned out to be very easy to get wrong.
This article documents the approach that finally worked — including the mistakes.

What this article covers

This guide shows how to:

  • Back up multiple GitHub repositories
  • Run backups weekly
  • Preserve full Git history (branches + tags)
  • Avoid AWS access keys
  • Use OIDC + temporary credentials
  • Store backups safely in Amazon S3

This is not a ZIP download tutorial.
This is a real backup.

High-level architecture (correct model)

Architecture flow

  1. GitHub Actions runs on a schedule
  2. GitHub issues an OIDC identity token
  3. AWS STS validates the token
  4. AWS issues temporary credentials
  5. The workflow uploads backups to S3

No IAM users.
No static secrets.
Nothing long-lived.
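
For intuition, steps 2 to 4 can be reproduced by hand inside a job that has `id-token: write`. The sketch below is roughly what the `aws-actions/configure-aws-credentials` action automates, not its actual implementation. The function names, the session name `github-backup`, and `ROLE_ARN` are placeholders, and it assumes `curl`, `jq`, and the AWS CLI are available on the runner.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Step 2: ask GitHub for an OIDC token. The two ACTIONS_ID_TOKEN_* variables
# are injected by the runner only when the job has `id-token: write`.
fetch_oidc_token() {
  if [ -z "${ACTIONS_ID_TOKEN_REQUEST_URL:-}" ] || [ -z "${ACTIONS_ID_TOKEN_REQUEST_TOKEN:-}" ]; then
    echo "not inside a job with id-token: write" >&2
    return 1
  fi
  curl -sSf -H "Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" \
    "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=sts.amazonaws.com" | jq -r '.value'
}

# Steps 3-4: STS validates the token and returns short-lived credentials.
assume_backup_role() {
  local role_arn="$1" token="$2"
  aws sts assume-role-with-web-identity \
    --role-arn "$role_arn" \
    --role-session-name "github-backup" \
    --web-identity-token "$token" \
    --duration-seconds 900 \
    --query 'Credentials.[AccessKeyId,Expiration]' --output text
}

# Usage inside a workflow step (ROLE_ARN is a placeholder):
#   token="$(fetch_oidc_token)"
#   assume_backup_role "$ROLE_ARN" "$token"
```

The credentials returned by STS expire on their own, which is what makes the "nothing long-lived" claim hold.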

Why git bundle (and not ZIP files)

ZIP files look tempting — until you need to restore.

ZIP backups:

  • ❌ Lose commit history
  • ❌ Drop branches and tags
  • ❌ Are painful to restore correctly

A git bundle is different. It contains:

  • All commits
  • All branches
  • All tags
  • In a single portable file

Creating a bundle

git bundle create repo-backup.bundle --all

If your backup can’t restore history, it’s not a backup.
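
To see what `--all` actually captures, here is a minimal sketch that builds a throwaway repository with two branches and a tag, bundles it, and lists the refs stored inside. The repository, the branch name `experiment`, and the tag `v0.1` are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/main   # name the initial branch explicitly

# Throwaway identity so the commits work on a machine without git config
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }

commit "first commit"
git tag v0.1
git checkout -q -b experiment
commit "work on a side branch"
git checkout -q main

git bundle create "$work/demo.bundle" --all
git bundle verify "$work/demo.bundle"   # fails if the bundle is corrupt or incomplete

# List every ref stored in the bundle
bundle_refs="$(git bundle list-heads "$work/demo.bundle" | awk '{print $2}' | sort)"
echo "$bundle_refs"
```

The listing should include both branches, the tag, and `HEAD`. Only refs that exist in the repository being bundled end up in the file, so it is worth running `git bundle list-heads` on a real backup at least once.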

The IAM problem that caused most of the pain

The hardest part wasn’t GitHub Actions.
It was AWS permissions.

The confusing part

AWS uses two different policy types:

Policy type      | Used for             | Requires Principal
-----------------|----------------------|-------------------
IAM role policy  | Identity permissions | ❌ No
S3 bucket policy | Resource permissions | ✅ Yes

They look similar.
They behave very differently.

Why “invalid principal” kept appearing

At one point, everything looked correct — but AWS kept returning:

Invalid principal

The reason:

  • An IAM policy was pasted into an S3 bucket policy
  • Or the principal ARN didn’t match the actual role

The rule that finally made it click

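  • IAM role permissions policies never define a Principal (the role's separate trust policy does, because it is a resource-based policy attached to the role)
  • S3 bucket policies must define who is allowed access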

S3 authorization model (the missing mental model)

This model explains the core issue that caused most confusion.

Key idea

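Whether an S3 upload succeeds depends on where the bucket lives:

  1. Bucket in the same AWS account as the role: an Allow in either the IAM role policy or the S3 bucket policy is enough
  2. Bucket in a different AWS account: BOTH the IAM role policy and the S3 bucket policy must allow the action

An explicit Deny on either side always wins → AccessDenied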
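
When a bucket policy is part of the setup, it has to name the backup role as its `Principal`. As a hedged sketch, the function below prints such a policy. The function name, the bucket name, and the account ID in the example call are placeholders, and the `github-backups/` prefix is taken from the workflow's upload path.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a bucket policy allowing one role to write under github-backups/.
# Unlike the IAM role policy, this document must carry a Principal.
render_bucket_policy() {
  local bucket="$1" role_arn="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowBackupRoleWrites",
      "Effect": "Allow",
      "Principal": { "AWS": "${role_arn}" },
      "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
      "Resource": "arn:aws:s3:::${bucket}/github-backups/*"
    }
  ]
}
EOF
}

# Placeholder values for illustration only
policy_json="$(render_bucket_policy example-backup-bucket \
  arn:aws:iam::123456789012:role/github-actions-s3-backup)"
echo "$policy_json"

# To apply it (requires credentials allowed to call s3:PutBucketPolicy):
#   aws s3api put-bucket-policy --bucket example-backup-bucket --policy "$policy_json"
```

If the role ARN in `Principal` does not match an existing role, S3 rejects the policy, which is one way to end up staring at the "Invalid principal" error described earlier.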


The GitHub Actions workflow (clean and boring)

Once the security model was clear, the workflow itself became simple.

name: Weekly S3 Repo Backup

on:
  schedule:
    - cron: "15 3 * * 0"   # Sundays at 03:15 UTC
  workflow_dispatch: {}

permissions:
  id-token: write
  contents: read

jobs:
  backup:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout full history
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

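      - name: Create git bundle
        run: |
          set -e
          REPO_NAME="${GITHUB_REPOSITORY#*/}"
          TS="$(date -u +%Y-%m-%dT%H-%M-%SZ)"
          # checkout leaves other branches as remote-tracking refs; make them local so a restore sees them
          git fetch --update-head-ok origin '+refs/heads/*:refs/heads/*'
          mkdir -p backups
          git bundle create "backups/${REPO_NAME}-${TS}.bundle" --all
          git bundle verify "backups/${REPO_NAME}-${TS}.bundle"
          # Record the checksum relative to backups/ so it still verifies after download
          (cd backups && sha256sum "${REPO_NAME}-${TS}.bundle" > "${REPO_NAME}-${TS}.sha256")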

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: <IAM_ROLE_ARN>
          aws-region: <AWS_REGION>

      - name: Upload to S3
        run: |
          aws s3 cp backups/ \
            s3://<bucket-name>/github-backups/${GITHUB_REPOSITORY}/ \
            --recursive

Nothing clever.
Nothing hidden.
That’s intentional.
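
The workflow stores a `.sha256` file beside each bundle, which only helps if something checks it. A minimal sketch of that check after download follows. It compares only the hash field, so it does not depend on the path recorded in the checksum file. The function name `verify_backup` and the paths in the commented commands are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return 0 when the bundle's SHA-256 matches the hash stored in the .sha256 file.
verify_backup() {
  local bundle="$1" checksum_file="$2"
  local expected actual
  expected="$(awk '{print $1; exit}' "$checksum_file")"
  actual="$(sha256sum "$bundle" | awk '{print $1}')"
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Typical use after downloading both objects, for example with:
#   aws s3 cp s3://<bucket-name>/github-backups/<owner>/<repo>/ . --recursive
#   verify_backup <repo>-<timestamp>.bundle <repo>-<timestamp>.sha256 && echo "checksum OK"
```

The backup role in this article has no `s3:GetObject` permission, so the download has to run under a different identity that is allowed to read the bucket.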


Terraform setup (AWS side)

This is a minimal Terraform configuration — no extras.

GitHub OIDC provider

resource "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"

  client_id_list = [
    "sts.amazonaws.com"
  ]

  thumbprint_list = [
    "6938fd4d98bab03faadb97b34396831e3780aea1"
  ]
}

IAM role for GitHub Actions

resource "aws_iam_role" "github_backup" {
  name = "github-actions-s3-backup"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRoleWithWebIdentity"
      Principal = {
        Federated = aws_iam_openid_connect_provider.github.arn
      }
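      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Scope to your own account or organization; "repo:*/*:*" would trust every repository on GitHub
          "token.actions.githubusercontent.com:sub" = "repo:<GITHUB_OWNER>/*:*"
        }
      }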
    }]
  })
}
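
The `sub` condition in the trust policy is matched against a claim inside the OIDC token, so it helps to see what that claim looks like. The sketch below decodes a token's payload without verifying its signature, which is fine for debugging and nothing else. The helper names are hypothetical, the sample token is fabricated locally, and `octo-org/octo-repo` is a made-up repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Decode the payload (second dot-separated segment) of a JWT.
# No signature check: suitable for inspecting claims, never for trusting them.
jwt_payload() {
  local segment
  segment="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url drops padding; restore it so `base64 -d` accepts the input
  while [ $(( ${#segment} % 4 )) -ne 0 ]; do segment="${segment}="; done
  printf '%s' "$segment" | base64 -d
}

# Build a fake token locally so the example runs anywhere.
b64url() { base64 | tr -d '=\n' | tr '/+' '_-'; }
claims='{"sub":"repo:octo-org/octo-repo:ref:refs/heads/main","aud":"sts.amazonaws.com"}'
fake_token="$(printf '{"alg":"none"}' | b64url).$(printf '%s' "$claims" | b64url).signature"

decoded="$(jwt_payload "$fake_token")"
echo "$decoded"
```

A pattern scoped to one owner, such as `repo:octo-org/*:*`, matches this claim. A pattern with a wildcard in the owner position matches tokens from any repository on GitHub, so the owner part should always be fixed.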

IAM role policy (write-only object access, plus bucket listing)

resource "aws_iam_role_policy" "s3_backup" {
  role = aws_iam_role.github_backup.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["s3:ListBucket"]
        Resource = "arn:aws:s3:::example-backup-bucket"
      },
      {
        Effect = "Allow"
        Action = [
          "s3:PutObject",
          "s3:AbortMultipartUpload"
        ]
        Resource = "arn:aws:s3:::example-backup-bucket/*"
      }
    ]
  })
}

Restoring from a backup

Restoring is refreshingly simple.

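# A mirror clone copies every ref in the bundle; a plain clone keeps only one local branch
git clone --mirror repo-backup.bundle restored-repo.git
cd restored-repo.git
# The clone's origin is the bundle file itself, so push to the new repository explicitly
git push --all <NEW_REPO_URL>
git push --tags <NEW_REPO_URL>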

No GitHub API.
No special tooling.
Just Git.
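
A restore is worth rehearsing before it is needed. The following sketch runs the whole round trip locally: it bundles a throwaway repository, mirror-clones the bundle so that every branch stays a local ref, and pushes to an empty bare repository that stands in for a new GitHub remote. All paths and names are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"

# 1. A throwaway source repository with two branches and a tag
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/main
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
commit "first commit"
git tag v0.1
git checkout -q -b experiment
commit "side branch work"
git checkout -q main
git bundle create "$work/repo-backup.bundle" --all

# 2. An empty bare repository standing in for the new remote
git init -q --bare "$work/new-remote.git"

# 3. Restore: a mirror clone keeps every branch as refs/heads/*, so
#    `push --all` sends all of them rather than only the default branch
git clone -q --mirror "$work/repo-backup.bundle" "$work/restored.git"
cd "$work/restored.git"
git push -q --all "$work/new-remote.git"
git push -q --tags "$work/new-remote.git"

restored_refs="$(git -C "$work/new-remote.git" for-each-ref --format='%(refname)' | sort)"
echo "$restored_refs"
```

If the ref list on the stand-in remote is missing a branch you expected, the gap is in how the bundle was created, and it is much better to find that out during a rehearsal.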

Lessons learned

  • Sketch trust relationships before writing policies
  • Don’t trust AWS error messages blindly
  • Never use root as a bucket principal
  • Test with one repo before scaling
  • Keep backups boring

Final thoughts

This setup isn’t flashy — and that’s the point.

A good backup system is something you forget about until the day you need it.
And when that day comes, it should just work.
