# Monitoring GitHub Actions scheduled workflows: a practical guide
GitHub Actions is a surprisingly capable cron scheduler. Schedule a workflow, let it run nightly, forget about it.
Until it stops running. And you don't notice for two weeks.
Scheduled workflows in GitHub Actions are quietly unreliable. GitHub delays them, skips them during high load, and — most importantly — gives you no built-in alerting when they fail silently. Adding external monitoring takes about 5 minutes and saves you from that two-week discovery.
## The basic setup
Here's a minimal scheduled workflow with monitoring:
```yaml
name: Nightly export

on:
  schedule:
    - cron: '0 2 * * *'  # 2am UTC every day
  workflow_dispatch:     # allows manual triggering for testing

jobs:
  export:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run export
        run: python scripts/export.py
      - name: Ping DeadManCheck
        if: success()
        run: curl -fsS https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }} > /dev/null
```
The last step pings DeadManCheck only if all previous steps succeeded (`if: success()`). If the export script fails, the ping doesn't fire, and you get alerted after your configured grace period.

Set up the monitor with a 25-hour interval (giving a 1-hour buffer on the 24-hour schedule). Store your token in GitHub: Settings → Secrets and variables → Actions → New repository secret named `DEADMANCHECK_TOKEN`.
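The arithmetic behind that interval is trivial but worth writing down once — a sketch, with variable names of my own invention (DeadManCheck just needs the final value, however its dashboard expects it):

```shell
# 24-hour schedule plus a 1-hour buffer for GitHub's scheduling jitter.
# Variable names are illustrative, not part of any DeadManCheck API.
SCHEDULE_HOURS=24
JITTER_BUFFER_HOURS=1
INTERVAL_SECONDS=$(( (SCHEDULE_HOURS + JITTER_BUFFER_HOURS) * 3600 ))
echo "monitor interval: ${INTERVAL_SECONDS}s"   # 90000s = 25 hours
```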
## Adding start/end pings for longer jobs
For jobs that run more than a few minutes, use the start/end pattern. This catches jobs that hang:
```yaml
steps:
  - uses: actions/checkout@v4
  - name: Ping start
    run: curl -fsS https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}/start > /dev/null || true
  - name: Run ETL
    id: etl
    run: |
      python scripts/run_etl.py
      echo "rows=$(cat /tmp/etl_row_count.txt)" >> $GITHUB_OUTPUT
  - name: Ping done
    if: success()
    run: |
      curl -fsS \
        "https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}?count=${{ steps.etl.outputs.rows }}" \
        > /dev/null || true
  - name: Ping fail
    if: failure()
    run: curl -fsS https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}/fail > /dev/null || true
```
Your ETL script writes the row count to `/tmp/etl_row_count.txt`. The monitoring step picks it up and includes it in the ping — so your monitor can alert on zero-output runs, not just missed runs.
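The handoff relies on the `$GITHUB_OUTPUT` file that Actions provides to each step. You can simulate the mechanism locally — the literal row count here is made up; on a real runner, Actions sets `GITHUB_OUTPUT` for you and parses the file after the step:

```shell
# Local simulation of the row-count handoff. On a runner, Actions points
# GITHUB_OUTPUT at a file it reads afterwards to populate steps.<id>.outputs.
GITHUB_OUTPUT=$(mktemp)
echo "12345" > /tmp/etl_row_count.txt            # what run_etl.py would write
echo "rows=$(cat /tmp/etl_row_count.txt)" >> "$GITHUB_OUTPUT"
cat "$GITHUB_OUTPUT"                             # rows=12345
```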
## The gotchas
### GitHub delays scheduled workflows
This is the big one. GitHub's docs admit that scheduled workflows may be delayed during high load. A workflow scheduled for 2:00am UTC might run at 2:23am or 2:51am. During busy periods, delays of 30–60 minutes aren't unusual.
Don't set your DeadManCheck interval to exactly 24 hours. Set it to 25 hours. That buffer absorbs GitHub's scheduling jitter without letting real failures go undetected.
### Scheduled workflows stop on inactive repos
If a repository has no commits in 60 days, GitHub disables scheduled workflows. You'll get an email warning. If you miss it, the job silently stops running — and your external monitor will catch it where GitHub's notification didn't reach you.
### Test with `workflow_dispatch` before trusting the schedule

Always add `workflow_dispatch` as a trigger (it's in all examples above). You can trigger the workflow manually from the Actions tab or via the CLI:

```shell
gh workflow run nightly-export.yml
```
Test your monitoring integration before the first scheduled run. Confirm the ping appears in your DeadManCheck dashboard with the correct count.
### Secrets aren't available in forks

If your repo is public and someone forks it, `secrets.DEADMANCHECK_TOKEN` will be empty in their fork. The curl will fail silently. This is fine — you don't want random forks pinging your monitor — but be aware of it when debugging.
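If you'd rather make that case explicit than let curl fail on a token-less URL, a guard can skip the ping when the secret is empty. A sketch — in a real workflow you'd expose the secret to the step via `env:`, and `ping_status` is a name I've made up for illustration:

```shell
# In a fork, the secret expands to an empty string. Skip the ping explicitly
# instead of letting curl hit a URL with no token.
DEADMANCHECK_TOKEN=""   # simulates a fork where the secret is unset
if [ -z "$DEADMANCHECK_TOKEN" ]; then
  ping_status="skipped"
  echo "no monitoring token configured; skipping ping"
else
  curl -fsS "https://deadmancheck.io/ping/$DEADMANCHECK_TOKEN" > /dev/null || true
  ping_status="sent"
fi
```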
## Full production example
```yaml
name: Nightly database backup

on:
  schedule:
    - cron: '0 2 * * *'
  workflow_dispatch:

jobs:
  backup:
    runs-on: ubuntu-latest
    timeout-minutes: 30  # hard limit — prevent hung jobs accumulating
    steps:
      - uses: actions/checkout@v4
      - name: Ping start
        run: |
          curl -fsS \
            "https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}/start" \
            > /dev/null || true  # don't fail if monitoring is down
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Run backup
        id: backup
        run: |
          python scripts/backup.py
          echo "rows=$(cat /tmp/backup_row_count.txt)" >> $GITHUB_OUTPUT
      - name: Upload to S3
        run: aws s3 cp /backups/latest.dump s3://my-backups/
      - name: Ping done
        if: success()
        run: |
          curl -fsS \
            "https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}?count=${{ steps.backup.outputs.rows }}" \
            > /dev/null || true
      - name: Ping fail
        if: failure()
        run: |
          curl -fsS \
            "https://deadmancheck.io/ping/${{ secrets.DEADMANCHECK_TOKEN }}/fail" \
            > /dev/null || true
```
A few things worth noting:

- `timeout-minutes: 30` is a hard ceiling. Without it, a hung job can sit there for 6 hours consuming a runner.
- `|| true` on the monitoring pings means a DeadManCheck outage won't cause your backup job to report failed.
- The row count flows from the backup step through `$GITHUB_OUTPUT` to the ping step.
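The `|| true` behavior is easy to verify locally: Actions runs `run:` steps under bash with `-e`, so a failing ping without `|| true` would fail the whole step. A minimal demonstration, with `false` standing in for a curl that can't reach the monitoring service:

```shell
set -e                   # Actions runs `run:` steps with bash -e
false || true            # the failed "ping" is swallowed; execution continues
msg="job step still succeeds"
echo "$msg"
```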
## After deploying
Trigger the workflow manually and confirm:
- The workflow runs end-to-end without errors
- DeadManCheck shows a recent ping on your monitor dashboard
- The count looks correct for what the job processed
Wait for the first scheduled run and verify again. Two successful data points before you trust it.
Scheduled workflows are one of those things that feel reliable until the day they aren't. External monitoring is the difference between finding out immediately and finding out when someone asks why the weekly report is missing.