Phil Rentier Digital

Posted on • Originally published at rentierdigital.xyz

GitHub Is Not Your Backup. One Suspended Account Proved It This Week.

One developer lost 104 repos this week. I'd already stopped trusting GitHub with mine.

This week a developer found out at 9am that his 104 GitHub repos no longer existed. No hack. No bug. No drive that died. Just an automated email: account suspended, we re-evaluated your 2019 Student Pack, we now think you weren't eligible. Six years later. 24 of those repos had no backup anywhere else. Gone in an algorithmic decision made on a Tuesday morning.

TLDR: If you're in the same spot (everything on GitHub, no mirror, no plan B), you're one classifier away from the same email at 9am. There is a fix. It costs nothing, runs on a server you probably already rent, and almost nobody bothers. The question is why.

The replies under that post are the other story. Hundreds of devs realizing in real time that they're in exactly the same situation. Everything on GitHub. No mirror. No plan B. And the same question coming back in every thread: ok but concretely, what do I do now?

I didn't wait. Not out of paranoia, not because I had a crystal ball. Just because at some point in your career as a dev you stop confusing "hosted somewhere else" with "backed up." GitHub is a great tool. It's a great forge. It's a great social network for code. It is not a backup: that has never been its job, and the terms of service say so in black and white.

104 Repos. One Automated Decision. No Warning.

The story that went around this week is simple enough to tell in two sentences. A developer signed up for a free Student Pack in 2019. Six years later, an automated review re-evaluated the original eligibility, decided retroactively that it was never legitimate, and suspended the account. 104 repos hidden. 24 of them never pushed anywhere else.

It doesn't matter whether the original Student Pack claim was legit or not. What matters is that years of work can disappear behind a process you have no say in, on a timeline you cannot predict, triggered by a 2026 classifier re-grading a form a 19-year-old filled out in 2019.

The threads under the post are full of people doing the math in public. I have 60 repos. I've been on GitHub since 2014. I never thought about a mirror. The ones who are confident are the ones with self-hosted git running somewhere. There aren't many of them.

You don't get a warning before this happens to you.

What GitHub Actually Promises You (It's Less Than You Think)

GitHub does not promise to keep your repos. They promise to run a platform.

Read the Terms of Service sober (it's a chore, do it once). The relevant clauses are not hidden. GitHub can suspend or terminate accounts at their discretion. Your content can be removed if it violates policies, including policies that didn't exist when you created the account. And your obligation as a user is to maintain your own copies of anything you cannot afford to lose.

There is even a name for this. The cloud industry has been using it for over a decade and every major provider has a page on it: the Shared Responsibility Model. The provider runs the platform. The customer owns the data. GitHub doesn't put it on the homepage because nobody would sign up if it said "your data is your problem", but the contract is the same.

The mistake almost everybody makes is one of category. We treat GitHub like a filesystem. It looks like one (folders, files, history). It feels like one (always there, always in sync). But it's a managed service with a TOS, and managed services have an exit door operated by the provider.

I'm not arguing against the contract. I'm just reading it. Once you've read it, you can't unread it.

I Don't Wait for Incidents. That's a Design Principle.

I have a rule for any third party I depend on: if losing it would hurt, it gets a local copy. Not because I expect the provider to fail, but because the cost of being wrong about that one is too high.

This is standard infra hygiene, not paranoia. You don't argue with the DBA about whether the primary "might" go down. You set up the replica because that's how you build infra that survives a Tuesday.

Same principle is why I rebuilt my entire AI agent setup the week Anthropic killed my $200/month OpenClaw setup and forced me to rebuild it for $15. I didn't wait for the announcement to bite me twice. The moment a vendor changes the rules unilaterally, the right reaction is not to renegotiate. It's to own the next version.

Security people call this security by design. The decisions you make at architecture time are decisions you don't have to make under stress. You don't design a fire escape during the fire. You don't write a backup strategy at 9am while staring at a suspension email and a coffee that's gone cold.

So I had a mirror. Before any of this. For one reason: my code is the only deliverable I cannot recreate. Servers I can rebuild. Configs I can rewrite. Six years of commits in 39 private repos, that one I cannot.

The mirror exists so I never have to write a Sunday evening blog post titled How I Recovered From a GitHub Suspension.

My Setup: Forgejo Mirror Behind a NetBird Mesh

[Figure: Git Mirror Architecture — flow diagram with GitHub on the left, an arrow into the self-hosted mirror]

Two pieces. That's the whole setup.

Forgejo is a self-hosted git forge. It forked from Gitea in 2022 when Gitea moved to a for-profit company structure (yes, this is the kind of detail that matters once you've started caring about who owns your tools). It runs in a single container with SQLite. No PostgreSQL cluster, no Redis, no microservices. It speaks the git protocol natively, no web-layer abstraction. If you cloned a repo from Forgejo, you wouldn't notice you weren't on GitHub. Same logic I made the case for in why CLIs beat MCP for AI agents: the primitive beats the wrapper, every time.

NetBird is a WireGuard-based mesh. My laptop, my VPS and a couple of other devices are on a private network with private IPs. No public exposure. No reverse proxy. No TLS certificate to renew. If you're not on the mesh, the port doesn't even respond.
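For reference, enrolling a device onto the mesh is a one-time step. This is a setup fragment sketched from NetBird's CLI; the setup key is a placeholder you generate in the NetBird admin panel:

```shell
# Enroll this device on the mesh (one-time; <SETUP_KEY> is a placeholder):
#   netbird up --setup-key <SETUP_KEY>
#
# Verify connectivity and see this device's private 100.x mesh IP:
#   netbird status
#
# The Forgejo port in the compose file below binds to that mesh IP only,
# so the service never answers from the public internet.
```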

The Forgejo container looks like this:

services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:11
    container_name: forgejo
    restart: unless-stopped
    ports:
      - "100.69.51.147:3000:3000"
    volumes:
      - forgejo_data:/data
    environment:
      - FORGEJO__server__ROOT_URL=http://forgejo.mesh:3000
      - FORGEJO__server__DISABLE_SSH=true
      - FORGEJO__service__DISABLE_REGISTRATION=true
      - FORGEJO__mirror__DEFAULT_INTERVAL=8h

Four choices to call out:

The port binds to the mesh IP only (100.69.51.147). Not 0.0.0.0. Not exposed to the public internet. The mirror is a private resource that lives behind the same fence as my other internal services.

SSH is disabled. I never push to the mirror. It's read-only. Disabling SSH removes an entire attack surface I don't need.

Registration is disabled. Single-user instance. No sign-up form for some bot to find on a Tuesday.

The mirror sync interval is 8 hours. Forgejo has native pull mirror support: you give it a GitHub URL and a PAT, and it pulls every 8 hours forever. No cron, no script, no webhook. The forge does it itself.

Then a small script registers all my GitHub repos as mirrors via the Forgejo API. It's idempotent: it lists the repos, checks which ones already have a mirror, and creates only the missing ones. Run it once at install. Run it again every time you create a new GitHub repo. Or schedule it weekly, your call. The single API call per repo looks like this:

curl -X POST "$FORGEJO_URL/api/v1/repos/migrate" \
  -H "Authorization: token $FORGEJO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "clone_addr": "https://github.com/myorg/repo.git",
    "repo_name": "repo",
    "mirror": true,
    "private": true,
    "auth_token": "'"$GITHUB_PAT"'",
    "service": "github"
  }'

Wrap that in a loop over gh repo list myorg, with a check on whether the mirror already exists, and you're done. The PAT and Forgejo token come from a self-hosted secrets manager at runtime, never on disk.
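The loop itself is short. Here's a minimal sketch of the idempotent version, assuming the `gh` CLI is authenticated, `FORGEJO_URL`, `FORGEJO_TOKEN` and `GITHUB_PAT` are set in the environment, and the Forgejo user is named `me` (all names illustrative):

```shell
#!/usr/bin/env sh
# Idempotent mirror registration: list GitHub repos, skip existing mirrors,
# create the missing ones via the Forgejo migrate API.

# Build the JSON payload for one repo (pure string work, no network).
mirror_payload() {
  org="$1" repo="$2"
  printf '{"clone_addr":"https://github.com/%s/%s.git","repo_name":"%s","mirror":true,"private":true,"auth_token":"%s","service":"github"}' \
    "$org" "$repo" "$repo" "$GITHUB_PAT"
}

# True if a mirror with this name already exists on the Forgejo instance.
mirror_exists() {
  curl -sf -H "Authorization: token $FORGEJO_TOKEN" \
    "$FORGEJO_URL/api/v1/repos/me/$1" > /dev/null
}

# Register every GitHub repo that doesn't have a mirror yet.
sync_mirrors() {
  gh repo list myorg --limit 200 --json name --jq '.[].name' |
  while read -r repo; do
    mirror_exists "$repo" && continue
    curl -sf -X POST "$FORGEJO_URL/api/v1/repos/migrate" \
      -H "Authorization: token $FORGEJO_TOKEN" \
      -H "Content-Type: application/json" \
      -d "$(mirror_payload myorg "$repo")"
  done
}
```

Call `sync_mirrors` at install time, then rerun it whenever you create a new repo (or from a weekly cron).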

Total resource footprint: about 100MB of RAM, 2GB of disk for 39 repos. The container restarts in two seconds. The 8h sync runs in the background and I forget it exists for weeks at a time.
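And if the day ever comes, recovery is plain git. A sketch, assuming the `forgejo.mesh` hostname from the compose file above; the new remote URL is a placeholder:

```shell
# Re-home a repo from the last synced mirror state to a new remote.
# Clone with --mirror to get every ref (all branches and tags),
# then push every ref to the new home in one step.
recover_repo() {
  repo="$1" new_remote="$2"
  git clone --mirror "http://forgejo.mesh:3000/me/$repo.git" "$repo.git" &&
  git -C "$repo.git" push --mirror "$new_remote"
}
```

Usage would be `recover_repo myproject git@newhost:me/myproject.git`, once per repo.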

What This Covers, and What It Doesn't

This is the part most "self-host your git" articles skip. A pull mirror is not a full GitHub backup.

What the mirror saves is the git side of things: every commit, every branch, every tag, full history across all branches. Submodules and LFS work too if you take the extra step to configure them, and you should if you use them.

What the mirror does NOT save is everything that lives outside the git protocol. Issues. Pull requests and review comments. Wiki pages. Actions run logs (the YAML files yes, the run history no). Repo settings, webhooks, deploy keys, collaborator access lists. All of that is GitHub-specific metadata, stored in their database, not in your .git directory. If GitHub vanishes tomorrow, my 39 repos are intact, up to 8 hours stale. My issues and PRs are not.

For my use case (private repos, mostly solo work, infrastructure code) that's an acceptable trade. For a larger team running half their workflow inside GitHub Issues, the conversation is different. You'd want the official GitHub repo backup tool, or a third-party that hits the API for issues and PRs as well, on top of the git mirror.
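If you do need the metadata, the same Gitea/Forgejo migrate endpoint accepts flags for a one-time import of issues, PRs, wiki and releases when the source service is GitHub. This is a sketch, not a continuous sync: the data is copied once at migration time, and private repos additionally need the `auth_token` field from the earlier example. Repo names are illustrative:

```shell
# JSON body for a one-shot metadata import (not a pull mirror:
# "mirror" is false, so issues/PRs/wiki/releases are copied once).
metadata_payload() {
  cat <<'EOF'
{
  "clone_addr": "https://github.com/myorg/repo.git",
  "repo_name": "repo-archive",
  "mirror": false,
  "private": true,
  "service": "github",
  "issues": true,
  "pull_requests": true,
  "wiki": true,
  "releases": true,
  "labels": true,
  "milestones": true
}
EOF
}

# POST it the same way as a mirror registration:
#   curl -X POST "$FORGEJO_URL/api/v1/repos/migrate" \
#     -H "Authorization: token $FORGEJO_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$(metadata_payload)"
```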

There's also the case of me deleting a repo on GitHub by accident. The pull mirror notices the upstream is gone, but Forgejo doesn't auto-delete the local copy. The last synced state stays. That's actually a feature: a destructive action upstream doesn't propagate. (I'm not going to claim I designed this on purpose. I noticed it the first time I cleaned up an old org and saw the mirror still sitting there a month later. Free safety net, kept it.)

Know what your mirror covers and what it doesn't. Don't sell yourself a backup story that doesn't match the contract you actually have with your tooling.

You Don't Have to Wait for Your Own Incident

The 104 repos story is not exceptional. It's just visible. The same thing happens every week to people whose audience is too small for the post to travel. Account suspensions, mistaken DMCA takedowns, billing disputes, classifier false positives, payment method expired in a country GitHub's billing system handles weirdly. The list is long. The fix is the same in every case.

In six months, GitHub will publish a blog post on "improving how we communicate account actions". There will be a new dashboard, a refreshed FAQ, a prettier status page. Nobody will read it before the next wave.

Meanwhile the devs who ship will keep shipping. With a mirror. On their own infra. Reachable when GitHub is down, when a classifier mis-fires, when a 2019 Student Pack suddenly becomes a 2026 problem. Not much. Just a docker-compose and three hours of config one Sunday.

Git is distributed by design. We're the ones who decided to forget.

