DEV Community

Manuel Weiss
Automating Database Branches: PR-Triggered Provisioning, Scheduled Refreshes, and TTL Cleanup in Practice

Your application pipeline is fully automated. A developer runs git checkout -b feature/user-auth, opens a PR, and within minutes a preview environment is live with the new code deployed. The PR closes and the environment tears down. Nobody had to think about it.

The database didn't get that memo.

From what I've seen across most teams, the developer ends up doing one of three things: pointing their feature branch at a shared staging environment, waiting for someone to manually provision (set up and configure) a fresh instance, or patching together a one-off clone script that nobody else maintains or remembers to update. This isn't a tooling problem. It's an automation gap. The database is the one component in the stack that still needs a human to make a decision at every stage of its lifecycle.

The fix is applying the same event-driven (triggered by a specific action, like opening a PR) and schedule-based patterns you already use for application code to your database branches. In this article, I walk through three concrete patterns: PR-triggered branch creation, scheduled refreshes, and TTL-based cleanup (TTL, or "time-to-live": an age limit after which a resource is automatically deleted). I'll show how to wire each one into GitHub Actions using real API calls and shell scripts. These patterns assume a database platform that exposes branch lifecycle operations (create, list, delete) via a REST API. Branching needs to be a core, built-in feature here, not something you're faking with dump-and-restore (manually exporting and re-importing data).

Pattern 1: Auto-Create a Branch When a PR Opens

The trigger is pull_request with types [opened, reopened, closed]. The job calls your database API to create a branch named after the PR number, captures the returned connection string, and injects it into the deployment environment. When the PR closes, a second job in the same workflow file deletes the branch. Teardown is not a separate concern — it lives in the same file.

Here's how the full lifecycle flows:

[Diagram: Auto-Create a Branch When a PR Opens]

And here's the complete GitHub Actions workflow:

name: Database Branch Lifecycle

on:
  pull_request:
    types: [opened, reopened, closed]

env:
  XATA_API_KEY: ${{ secrets.XATA_API_KEY }}
  DB_BRANCH_NAME: pr-${{ github.event.pull_request.number }}
  BASE_URL: https://${{ vars.XATA_WORKSPACE }}.us-east-1.xata.sh/db/${{ vars.XATA_DB }}

jobs:
  create-db-branch:
    if: github.event.action == 'opened' || github.event.action == 'reopened'
    runs-on: ubuntu-latest
    outputs:
      connection_string: ${{ steps.create.outputs.connection_string }}
    steps:
      - name: Create database branch
        id: create
        run: |
          HTTP_STATUS=$(curl -s -o /tmp/response.json -w "%{http_code}" -X POST \
            -H "Authorization: Bearer $XATA_API_KEY" \
            -H "Content-Type: application/json" \
            -d '{"from": "main"}' \
            "$BASE_URL:$DB_BRANCH_NAME")

          if [ "$HTTP_STATUS" -eq 409 ]; then
            # A 409 body carries no connection details, so fetch the
            # existing branch's metadata instead
            echo "Branch already exists — fetching existing branch metadata"
            curl -s -o /tmp/response.json \
              -H "Authorization: Bearer $XATA_API_KEY" \
              "$BASE_URL:$DB_BRANCH_NAME"
          elif [ "$HTTP_STATUS" -ne 201 ]; then
            echo "Branch creation failed: $HTTP_STATUS"
            cat /tmp/response.json
            exit 1
          fi

          CONN=$(jq -r '.databaseURL' /tmp/response.json)
          echo "connection_string=$CONN" >> "$GITHUB_OUTPUT"

      - name: Inject connection string into deployment environment
        env:
          # Note: the default GITHUB_TOKEN cannot manage environments or
          # environment secrets; use a PAT with repo admin scope, stored
          # under whatever secret name you choose
          GH_TOKEN: ${{ secrets.GH_ADMIN_TOKEN }}
        run: |
          # Ensure the per-PR environment exists before scoping a secret to it
          gh api -X PUT \
            "repos/${{ github.repository }}/environments/pr-${{ github.event.pull_request.number }}"
          gh secret set DATABASE_URL \
            --env "pr-${{ github.event.pull_request.number }}" \
            --body "${{ steps.create.outputs.connection_string }}"

  delete-db-branch:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - name: Delete database branch
        run: |
          curl -s -X DELETE \
            -H "Authorization: Bearer $XATA_API_KEY" \
            "$BASE_URL:$DB_BRANCH_NAME"

Xata exposes branch lifecycle operations (create, delete, list) as first-class REST endpoints, which is what makes the curl-based approach here work without any additional tooling. The POST to create and DELETE to remove are the only two calls the PR lifecycle needs. The 409 handling is already in the creation step, making the job safe to re-run.

Branch from main or from a production snapshot?

Branching from main gives you the current schema plus your seeded test data — predictable, fast, zero PII risk. Branching from a recent production snapshot gives you realistic data volume and distribution, which matters when you're testing query plans, pagination behavior, or anything that scales with row count. For feature branches doing schema changes, main is usually sufficient. For branches that touch performance-sensitive paths, the snapshot is worth the extra setup.

GitLab CI Equivalent

If you're on GitLab, the trigger is $CI_PIPELINE_SOURCE == "merge_request_event". The API calls are identical; only the event context variables change:

create-db-branch:
  stage: setup
  script:
    - |
      curl -s -X POST \
        -H "Authorization: Bearer $XATA_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"from": "main"}' \
        "https://$XATA_WORKSPACE.us-east-1.xata.sh/db/$XATA_DB:mr-$CI_MERGE_REQUEST_IID"
  environment:
    name: mr-$CI_MERGE_REQUEST_IID
    on_stop: delete-db-branch
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

delete-db-branch:
  stage: cleanup
  variables:
    GIT_STRATEGY: none  # no checkout needed for a single curl call
  script:
    - |
      curl -s -X DELETE \
        -H "Authorization: Bearer $XATA_API_KEY" \
        "https://$XATA_WORKSPACE.us-east-1.xata.sh/db/$XATA_DB:mr-$CI_MERGE_REQUEST_IID"
  environment:
    name: mr-$CI_MERGE_REQUEST_IID
    action: stop
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: manual

One caveat: GitLab never starts a pipeline when an MR merges or closes, so a rule on $CI_MERGE_REQUEST_STATE would never fire. Registering the delete job as the environment's on_stop action is the reliable route; GitLab triggers it automatically when the MR is merged or closed, and the when: manual rule keeps it out of the normal pipeline.

All secrets go into GitLab CI/CD variables at the project level — same concept, just a different location.

Pattern 2: Scheduled Refreshes to Keep Branches in Sync

Long-lived feature branches go stale. Schema migrations (changes to the database structure, like adding or renaming columns) land on main every day. After 3 to 4 days, a branch that was created from main no longer reflects what production will actually look like when the PR merges. Tests pass on the outdated branch, then fail on merge. The review environment ends up showing behavior that production won't reproduce.

The fix is a scheduled workflow that identifies branches older than a configurable threshold and rebuilds them from the latest main snapshot.

name: Refresh Stale Database Branches

on:
  schedule:
    - cron: '0 3 * * *'  # 3am UTC daily

jobs:
  refresh-branches:
    runs-on: ubuntu-latest
    env:
      XATA_API_KEY: ${{ secrets.XATA_API_KEY }}
      BASE_URL: https://${{ vars.XATA_WORKSPACE }}.us-east-1.xata.sh/db/${{ vars.XATA_DB }}
      STALENESS_DAYS: 3
    steps:
      - name: List and refresh stale branches
        run: |
          BRANCHES=$(curl -s \
            -H "Authorization: Bearer $XATA_API_KEY" \
            "$BASE_URL")

          STALE_CUTOFF=$(date -d "-${STALENESS_DAYS} days" -u +%Y-%m-%dT%H:%M:%SZ)

          echo "$BRANCHES" | jq -r --arg cutoff "$STALE_CUTOFF" \
            '.branches[] | select(.name != "main") | select(.createdAt < $cutoff) | .name' \
          | while read -r BRANCH_NAME; do

            echo "Refreshing: $BRANCH_NAME"

            curl -s -X DELETE \
              -H "Authorization: Bearer $XATA_API_KEY" \
              "$BASE_URL:$BRANCH_NAME"

            curl -s -X POST \
              -H "Authorization: Bearer $XATA_API_KEY" \
              -H "Content-Type: application/json" \
              -d '{"from": "main"}' \
              "$BASE_URL:$BRANCH_NAME"

            echo "Done: $BRANCH_NAME"
          done

The jq filter passes $cutoff in with --arg rather than interpolating it into the filter string. That keeps the timestamp an opaque string value, so shell quoting and special characters can't corrupt the jq program.

The PII problem. If your production snapshot contains personal data (PII: personally identifiable information such as emails, phone numbers, or addresses), you can't feed it directly into a branch that's accessible in CI without first masking it. The right place to do this is between the POST request that creates the branch and the step that hands the connection string over to your deployment environment. The pattern goes: create branch, run a masking or seeding script, then expose DATABASE_URL.

A practical masking step replaces email columns with user-{id}@example.com, phone fields with 555-0100, and anything resembling a name or address with a synthetic equivalent. If building that pipeline isn't feasible right now, schema-only branches are a reasonable fallback. The branch gets the current database structure but no real records, and a seed script (a script that populates the database with safe, fake data) fills it with synthetic data before the connection string is passed to CI.
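That ordering can be sketched as a single provisioning function. This is a sketch, not Xata's API: create_branch, run_masking, and expose_connection_string are hypothetical wrappers around the Pattern 1 POST, your own masking script, and the secret-injection step.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The contract is the ordering: the connection string must not leave this
# function until masking has run. The three helpers are placeholders for
# the Pattern 1 POST, a masking script (e.g. psql -f mask.sql), and the
# gh secret set step.
provision_masked_branch() {
  local branch="$1"
  create_branch "$branch"             # POST {"from": "main"}
  run_masking "$branch"               # scrub PII before anyone can read it
  expose_connection_string "$branch"  # only now: hand DATABASE_URL to CI
}
```

Because the helpers are plain functions, each stage can be swapped (seeding instead of masking, for example) without touching the ordering.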

Pattern 3: TTL-Based Cleanup for Abandoned Branches

PR-triggered creation has a silent failure mode: PRs that are abandoned rather than formally closed. A developer opens a branch, opens a PR, then the work gets deprioritized. The PR stays open but idle for weeks. The closed event never fires, so the database branch never gets deleted. Branches from merged or closed PRs can also linger if the cleanup step failed silently.

Over time these accumulate. Even on platforms that scale compute to zero, storage accrues on idle branches. A team of 30 active developers with a typical amount of work in progress can easily accumulate 50–100 orphaned branches without a cleanup mechanism.

This Bash script handles both cases. It lists all branches older than a configurable TTL and deletes them, with one exception: if the corresponding PR is still open, the branch is left alone. That check protects slow-moving but active PRs from being cleaned up prematurely.

#!/usr/bin/env bash
set -euo pipefail

XATA_API_KEY="${XATA_API_KEY:?XATA_API_KEY is required}"
GITHUB_TOKEN="${GITHUB_TOKEN:?GITHUB_TOKEN is required}"
XATA_WORKSPACE="${XATA_WORKSPACE:?}"
XATA_DB="${XATA_DB:?}"
GITHUB_REPO="${GITHUB_REPO:?}"  # format: owner/repo
TTL_DAYS="${TTL_DAYS:-7}"

BASE_URL="https://${XATA_WORKSPACE}.us-east-1.xata.sh/db/${XATA_DB}"
CUTOFF=$(date -d "-${TTL_DAYS} days" -u +%Y-%m-%dT%H:%M:%SZ)

echo "Checking for branches older than ${TTL_DAYS} days (cutoff: ${CUTOFF})"

BRANCHES=$(curl -s -H "Authorization: Bearer ${XATA_API_KEY}" "${BASE_URL}")

echo "${BRANCHES}" | jq -r --arg cutoff "${CUTOFF}" \
  '.branches[] | select(.name != "main") | select(.createdAt < $cutoff) | .name' \
| while read -r BRANCH_NAME; do

  echo "Evaluating: ${BRANCH_NAME}"

  # Extract PR number — expects format pr-NNN or mr-NNN
  PR_NUM=$(echo "${BRANCH_NAME}" | grep -oE '[0-9]+$' || true)

  if [ -z "${PR_NUM}" ]; then
    echo "  Skipping: cannot parse PR number from '${BRANCH_NAME}'"
    continue
  fi

  PR_STATE=$(curl -s \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${GITHUB_REPO}/pulls/${PR_NUM}" \
    | jq -r '.state // "not_found"')

  if [ "${PR_STATE}" = "open" ]; then
    echo "  Skipping: PR #${PR_NUM} is still open"
    continue
  fi

  echo "  Deleting ${BRANCH_NAME} (PR #${PR_NUM} state: ${PR_STATE})"
  curl -s -X DELETE \
    -H "Authorization: Bearer ${XATA_API_KEY}" \
    "${BASE_URL}:${BRANCH_NAME}"
done

echo "Cleanup complete."

Save this as scripts/db-cleanup.sh and run it on a schedule:

name: Database Branch Cleanup

on:
  schedule:
    - cron: '0 4 * * *'  # 4am UTC daily — runs after the 3am refresh

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run TTL cleanup
        env:
          XATA_API_KEY: ${{ secrets.XATA_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          XATA_WORKSPACE: ${{ vars.XATA_WORKSPACE }}
          XATA_DB: ${{ vars.XATA_DB }}
          GITHUB_REPO: ${{ github.repository }}
          TTL_DAYS: 7
        run: bash scripts/db-cleanup.sh

Hard TTL vs. activity-based TTL

Hard TTL deletes any branch older than N days regardless of PR state. It's simple and fast, but risks deleting a branch that belongs to a slow-moving but active PR. Activity-based TTL checks whether the PR is still open before deleting, which is safer, but makes one GitHub API call per branch. I'd start with activity-based TTL at 7 days. If you're running hundreds of branches and the API calls start hitting GitHub's rate limits (a cap on how many API requests you can make in a given time window), switch to hard TTL at 14 or 21 days.
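If you do make that switch, the hard-TTL variant is the same sweep minus the per-branch GitHub lookup. A sketch, assuming the same BASE_URL, XATA_API_KEY, and .createdAt field as the cleanup script above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# ISO-8601 UTC timestamps sort lexicographically, so a plain string
# comparison decides staleness; no date parsing required.
is_stale() {
  local created="$1" cutoff="$2"
  [[ "$created" < "$cutoff" ]]
}

# Hard-TTL sweep: age alone decides, zero GitHub API calls per branch.
hard_ttl_cleanup() {
  local cutoff
  cutoff=$(date -u -d "-${TTL_DAYS:-14} days" +%Y-%m-%dT%H:%M:%SZ)
  curl -s -H "Authorization: Bearer ${XATA_API_KEY}" "${BASE_URL}" \
    | jq -r --arg cutoff "$cutoff" \
        '.branches[] | select(.name != "main") | select(.createdAt < $cutoff) | .name' \
    | while read -r branch; do
        echo "Deleting ${branch} (older than ${TTL_DAYS:-14} days)"
        curl -s -X DELETE \
          -H "Authorization: Bearer ${XATA_API_KEY}" \
          "${BASE_URL}:${branch}"
      done
}
```

The trade-off is exactly the one described above: no rate-limit pressure, but no protection for slow-moving open PRs, which is why the longer 14- or 21-day TTL matters.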

Wiring the Patterns Together

Now let's see how you can put it all together.

Connection String Injection

Never put a connection string directly in a workflow file. The right approach is to create a GitHub Actions deployment environment for each PR, named pr-{number}, and scope the DATABASE_URL secret to that environment. GitHub doesn't remove environments automatically when a PR closes, so the teardown job should delete the pr-{number} environment as well; the scoped secret disappears with it. Your deployment job reads DATABASE_URL from the named environment, keeping credentials scoped to one PR and isolated from others.

For passing a connection string between jobs in the same workflow run, use job outputs:

jobs:
  create-db-branch:
    outputs:
      db_url: ${{ steps.create.outputs.connection_string }}

  deploy:
    needs: create-db-branch
    environment: pr-${{ github.event.pull_request.number }}
    env:
      DATABASE_URL: ${{ needs.create-db-branch.outputs.db_url }}

Idempotency

The 409 check in Pattern 1 isn't optional. CI jobs fail and get re-run all the time, whether from a flaky network call, a GitHub-side timeout, or a manual re-trigger. If your branch creation step doesn't handle a 409 Conflict response (which means the branch already exists) gracefully, every re-run will error out before it even reaches the steps that deploy your code.
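One way to keep that contract visible is to centralize the success check in a small helper, so every re-runnable creation step uses the same rule. A minimal sketch:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Treat both "created" (201) and "already exists" (409) as success so a
# re-run of the same job never fails at the creation step.
creation_ok() {
  local status="$1"
  [ "$status" -eq 201 ] || [ "$status" -eq 409 ]
}

# Usage inside the workflow step:
#   STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST ...)
#   creation_ok "$STATUS" || { echo "Branch creation failed: $STATUS"; exit 1; }
```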

Monorepo Considerations

If multiple services share one database, branch creation should still be one-per-PR, not one-per-service. Each service reads from the same branch connection string. Use the create-db-branch job as a dependency and share the output:

jobs:
  create-db-branch:
    outputs:
      db_url: ${{ steps.create.outputs.connection_string }}

  deploy-api:
    needs: create-db-branch
    env:
      DATABASE_URL: ${{ needs.create-db-branch.outputs.db_url }}

  deploy-workers:
    needs: create-db-branch
    env:
      DATABASE_URL: ${{ needs.create-db-branch.outputs.db_url }}

The branch name stays pr-{number} regardless of which service file changed. One PR, one database branch, multiple consumers.

Webhook vs. Poll

PR-triggered workflows run on webhooks (notifications that GitHub sends instantly when something happens), so the workflow fires within seconds of a PR opening or closing. Refresh and cleanup workflows run on a schedule instead. This split is intentional. Webhooks give you a near-instant response to lifecycle events. Scheduled polling is the right model for background maintenance where a few minutes' delay doesn't matter. Don't try to make cleanup event-driven. You'll end up with race conditions (where two processes, like a cleanup job and a creation job, run at the same time and conflict with each other) that scheduled polling avoids entirely.

Data Management Decisions You Need to Make Before You Automate

What data goes into a branch is the decision that trips up most teams when they first automate database branching, and it's worth settling before you write a single line of YAML.

Three options for what data goes into a branch:

1. Schema-only with synthetic data

The branch gets the current schema, no records. A seed script runs after creation and populates synthetic data before the connection string is handed to CI. Safe, fast, no PII risk. The downside: synthetic data rarely reflects real-world distributions, which can hide bugs in complex queries, aggregations, or pagination edge cases.

2. Anonymized production snapshot

Realistic data volume and shape, with PII scrubbed before the branch is mounted. This requires a masking pipeline:

capture snapshot → run anonymization → create branch from masked data.

Emails become user-{id}@example.com, names become deterministic fakes, phone numbers become 555-0100. Operationally heavier to set up, but meaningfully better for integration and performance testing.

3. Full production copy

Only appropriate if your infrastructure is fully isolated, your team has explicit compliance approval, and you've thought through the access control implications end-to-end. Most teams running CI environments should not do this.
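The anonymization step in option 2 ultimately boils down to a handful of UPDATE statements. Here's a sketch that generates them per table; the table and column names (users, email, phone, full_name) are hypothetical stand-ins for your own schema:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Emit deterministic masking SQL for one table. Column names here are
# hypothetical; substitute the PII columns in your schema.
build_mask_sql() {
  local table="$1"
  cat <<SQL
UPDATE ${table} SET
  email     = 'user-' || id || '@example.com',
  phone     = '555-0100',
  full_name = 'User ' || id;
SQL
}

# Apply it right after branch creation, before the connection string
# reaches CI, e.g.:
#   build_mask_sql users | psql "$BRANCH_CONN_STRING"
```

Deriving the fake values from the row id keeps them deterministic across refreshes, so tests that assert on specific records stay stable.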

The important thing is to encode this decision in the automation script rather than leaving it to whoever opens a PR. The branch creation job should enforce the data policy automatically, whether that means running a seed script, applying a masking step, or rejecting the operation if the required conditions aren't met. Individual developers shouldn't be making this call each time they create a branch.
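Encoding the policy can be as simple as deriving the branch-creation payload from a repo-level variable instead of a per-developer choice. In this sketch the schema_only field is an assumed API shape, not a documented flag; check your platform's branch-creation docs for the real one:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Select the creation payload from a DATA_POLICY repo variable so the
# data decision lives in automation, not in individual developers' heads.
# "schema_only" is an assumed field name, used here for illustration.
creation_payload() {
  case "${DATA_POLICY:-schema-only}" in
    schema-only) echo '{"from": "main", "schema_only": true}' ;;
    masked)      echo '{"from": "main"}' ;;  # masking step must follow
    *)           echo "Unknown DATA_POLICY: ${DATA_POLICY}" >&2; return 1 ;;
  esac
}
```

The unknown-policy branch is the "rejecting the operation" case: a misconfigured repo fails loudly at creation time instead of silently shipping unmasked data.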

Xata supports schema-only branch creation via its API, which makes option 1 straightforward to implement without managing snapshot files or dump logistics.

Your Starting Point: A Minimal PR-Triggered Workflow You Can Ship Today

Here's the complete lifecycle workflow: branch create on open, branch delete on close, in under 30 lines of YAML:

name: DB Branch Lifecycle

on:
  pull_request:
    types: [opened, reopened, closed]

env:
  BRANCH_NAME: pr-${{ github.event.pull_request.number }}
  BASE_URL: https://${{ vars.XATA_WORKSPACE }}.us-east-1.xata.sh/db/${{ vars.XATA_DB }}

jobs:
  manage-db-branch:
    runs-on: ubuntu-latest
    steps:
      - name: Create branch
        if: github.event.action != 'closed'
        run: |
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
            -H "Authorization: Bearer ${{ secrets.XATA_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d '{"from": "main"}' \
            "${{ env.BASE_URL }}:${{ env.BRANCH_NAME }}")
          [ "$STATUS" -eq 201 ] || [ "$STATUS" -eq 409 ] || exit 1

      - name: Delete branch
        if: github.event.action == 'closed'
        run: |
          curl -s -X DELETE \
            -H "Authorization: Bearer ${{ secrets.XATA_API_KEY }}" \
            "${{ env.BASE_URL }}:${{ env.BRANCH_NAME }}"

Four steps to get this running in any repo:

  1. Store your Xata API key as a GitHub Actions secret named XATA_API_KEY
  2. Add XATA_WORKSPACE and XATA_DB as Actions variables — these aren't sensitive, so variables (not secrets) are appropriate
  3. Commit the workflow file to .github/workflows/db-branch.yml
  4. Open a test PR and verify the branch appears in your database dashboard within the first workflow run

That's the skeleton. Scheduled refresh, TTL cleanup, and connection string injection each layer on top without modifying the creation logic. Start here, ship it, confirm it works, and add the patterns your specific workflow actually needs.

If you want to explore how Xata structures its branching API, including schema-only branch creation and migration tracking, their documentation covers the full surface: xata.io/docs. Which automation pattern are you tackling first: PR-triggered provisioning, scheduled refresh, or TTL cleanup? Drop your setup in the comments. Edge cases from real pipelines are always welcome.
