Hector Flores

Originally published at htek.dev

The Definitive GitHub Actions Debugging Guide: 65+ Real Errors and How to Fix Them

GitHub Actions is the CI/CD backbone for millions of repositories. It's also the source of some of the most confusing, silent, and undocumented failure modes in modern DevOps.

I've spent years debugging Actions workflows, first across 500+ repository migrations at enterprise scale, then building agentic DevOps platforms that push Actions to its limits. This guide is the result: every error message I've collected, every silent failure I've traced, and every workaround that actually works.

This is a reference guide, not a tutorial. Bookmark it. Search it when something breaks. Every section includes the actual error message (so you can Ctrl+F or Google it), the root cause, and the fix with copy-paste code.

Quick Diagnosis Flowchart

[Figure: quick diagnosis flowchart showing six debugging paths for GitHub Actions failures.]

Identify your failure category first, before diving into the 65+ specific scenarios:

  1. Workflow never appears in Actions tab? → YAML Syntax Issues or Trigger Problems
  2. Workflow runs but a step fails? → Check the error message against the sections below
  3. Workflow runs but produces wrong results silently? → Silent Failures
  4. Secrets are empty or permissions denied? → Secrets & Permissions
  5. Cache miss or artifact not found? → Caching & Artifacts
  6. Jobs cancelled unexpectedly? → Concurrency Issues

Pro tip: Install actionlint right now. It catches the majority of syntax and context issues in this guide before you push. Run it locally or add it to your CI: uses: raven-actions/actionlint@v2.


YAML Syntax & Validation Errors

These errors prevent your workflow from even registering with GitHub. No run appears — the workflow is silently rejected.

Unexpected or Typo'd YAML Keys

Error:

The workflow is not valid. .github/workflows/ci.yml (Line: 6, Col: 5):
Unexpected value 'default'

unexpected key "Shell" for step to run shell command. expected one of
"continue-on-error", "env", "id", "if", "name", "run", "shell",
"timeout-minutes", "working-directory" [syntax-check]

Root cause: Workflow keys must match the GitHub Actions schema exactly, including case and spelling. default: is not defaults:. Shell: is not shell:. branch: is not branches:.

Fix: Use actionlint to catch these before pushing. Common corrections:

  • default: → defaults:
  • branch: → branches:
  • Shell: → shell:

Standard YAML linters (yamllint, Python yaml.safe_load()) won't catch these because the YAML is syntactically valid — it's semantically wrong for GitHub Actions.

Missing Required Keys

Error:

"runs-on" section is missing in job "test" [syntax-check]
"jobs" section should not be empty [syntax-check]

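Fix: The workflow needs at least one job under jobs:, and every job needs runs-on: and at least one entry in steps: (jobs that call a reusable workflow use uses: instead). Matrix keys are compared case-insensitively, so node and NODE cannot coexist.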

Expression Syntax Errors

Error:

got unexpected character '"' while lexing expression...
do you mean string literals? only single quotes are available
for string delimiter [expression]

Root cause: GitHub Actions expressions use a custom mini-language, not JavaScript. Double quotes are not valid string delimiters. The + operator doesn't exist for concatenation.

Fix:

# ❌ Wrong
run: echo "${{ "hello" }}"
run: echo "${{ env.VAR1 + env.VAR2 }}"

# ✅ Correct (values must come from a context such as env)
run: echo "${{ 'hello' }}"
run: echo "${{ format('{0}{1}', env.VAR1, env.VAR2) }}"

Context Variable Type Errors

Error:

receiver of object dereference "owner" must be type of object but
got "string" [expression]

Root cause: github.repository is a string ("owner/repo"), not an object. People try github.repository.owner expecting the org name.

Fix: Use github.repository_owner for the owner. Use toJSON(env) to dump environment variables, not ${{ env }} (which outputs the string 'Object').
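
As a small sketch of both fixes, the step below reads the owner from github.repository_owner and dumps the env context through toJSON. The step name and the REPO_OWNER and ENV_JSON variable names are placeholders chosen for this example.

```yaml
steps:
  - name: Inspect contexts
    env:
      # github.repository_owner is a plain string, not an object
      REPO_OWNER: ${{ github.repository_owner }}
      # toJSON serializes the env context object into a JSON string
      ENV_JSON: ${{ toJSON(env) }}
    run: |
      echo "Owner: $REPO_OWNER"
      echo "$ENV_JSON"
```

Passing the values through env: rather than interpolating them into run: keeps quotes inside the JSON from breaking the shell command.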

secrets.* in Unexpected Contexts — Silent Failures

Error: No error. The workflow behaves unexpectedly or steps are silently skipped.

Root cause: GitHub's documentation states that secrets cannot be referenced directly in if: conditionals. Conditions written that way are unreliable, particularly in composite actions and reusable workflows, or when the secret is undefined (an undefined secret evaluates to an empty string). The documented approach is to map the secret to an environment variable and test that instead.

Fix:

# ⚠️ Can behave unexpectedly with undefined secrets
- if: ${{ secrets.MY_SECRET != '' }}
  run: echo "has secret"

# ✅ Map to env first, then check env (more reliable)
- env:
    MY_SECRET: ${{ secrets.MY_SECRET }}
  run: |
    if [ -n "$MY_SECRET" ]; then
      echo "has secret"
    fi

This pattern is especially dangerous because the failure mode is silence — no error, no notification. The env-mapping approach is more explicit and actionlint can validate it.

env Context Unavailable in Reusable Workflow with:

Error:

Unrecognized named-value: 'env'. Located at position 1 within
expression: env.SOMETHING

Root cause: The env context is not available in the with: block when calling reusable workflows. This is a confirmed open bug with 226+ reactions.

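Fix: Pass values via github.event.inputs, the vars context (configuration variables), an upstream job's outputs (needs.<job>.outputs.<name>), secrets: inherit, or hardcode them. The env context itself cannot be referenced here; this is a known platform limitation.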
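
One workaround that is often used, sketched here under assumptions, is to copy the env value into a job output and pass it on through the needs context, which with: does accept. DEPLOY_TARGET, the prepare job, the target input, and deploy.yml are hypothetical names for illustration.

```yaml
env:
  DEPLOY_TARGET: staging                      # hypothetical workflow-level value

jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      deploy_target: ${{ steps.export.outputs.deploy_target }}
    steps:
      # Inside a step the env value is an ordinary shell variable
      - id: export
        run: echo "deploy_target=$DEPLOY_TARGET" >> "$GITHUB_OUTPUT"

  call:
    needs: prepare
    uses: ./.github/workflows/deploy.yml      # hypothetical reusable workflow
    with:
      # needs.* is allowed in with:, env.* is not
      target: ${{ needs.prepare.outputs.deploy_target }}
```

The cost is one extra short job per run, so this trades a little runner time for keeping the value defined in a single place.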

if: Conditionals Always Evaluating to true

Error: No error. The step always runs regardless of condition.

Root cause: Using YAML block scalar |, trailing spaces, or wrapping ${{ }} with extra characters makes the condition a non-empty string — which is always truthy.

# ❌ Always true — trailing newline from |
if: |
  ${{ github.event_name == 'push' }}

# ❌ Always true — trailing space
if: "${{ github.event_name == 'push' }} "

# ❌ Always true — extra characters between ${{ }} blocks
if: ${{ github.event_name == 'push' }} && ${{ github.ref_name == 'main' }}

Fix:

# ✅ Correct — no extra characters
if: github.event_name == 'push'

# ✅ Correct — single expression, no wrapping needed
if: github.event_name == 'push' && github.ref_name == 'main'
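
When a condition is long enough that a multi-line layout is tempting, one option is a folded block scalar with strip chomping (>-) and no ${{ }} wrapper. This is a sketch rather than the only valid form; the step name and command are placeholders.

```yaml
- name: Deploy on push to main
  # >- folds the lines into one and strips the final newline;
  # without ${{ }}, the entire value is evaluated as one expression.
  if: >-
    github.event_name == 'push' &&
    github.ref_name == 'main'
  run: echo "push to main"
```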

Boolean Inputs Are Strings in Composite Actions

# In composite action — this is ALWAYS false:
if: ${{ inputs.realRun == true }}

Root cause: Composite actions receive all inputs as strings, even when declared with type: boolean. This is a confirmed bug with 117+ reactions.

Fix: Compare to the string 'true':

if: ${{ inputs.realRun == 'true' }}
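
For context, a minimal composite action using the string comparison might look like the sketch below. The action name, description, and echo commands are invented for illustration; only the realRun input comes from the example above.

```yaml
# action.yml for a hypothetical composite action
name: Deploy helper
description: Runs a dry run unless realRun is 'true'
inputs:
  realRun:
    description: "Pass 'true' to perform the real operation"
    required: false
    default: 'false'
runs:
  using: composite
  steps:
    - if: ${{ inputs.realRun == 'true' }}   # string comparison
      run: echo "real run"
      shell: bash
    - if: ${{ inputs.realRun != 'true' }}   # anything else is treated as a dry run
      run: echo "dry run"
      shell: bash
```

Treating every value other than 'true' as a dry run means a typo in the caller falls back to the safe path.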

Composite Actions: No defaults: Support

Root cause: Composite actions do not support the defaults: key. You cannot set a default shell. Every run: step must explicitly specify shell:.

Fix:

runs:
  using: composite
  steps:
    - run: echo "hello"
      shell: bash        # Required on EVERY step
    - run: echo "world"
      shell: bash        # Must repeat

Tab Characters in YAML

Error:

found a tab character where an indentation space is expected

Fix: YAML does not allow tabs for indentation. In VS Code: View → Render Whitespace. Add to .editorconfig:

[*.yml]
indent_style = space
indent_size = 2

Silent Failures: The Most Dangerous Category

[Figure: silent failures in CI/CD, where everything looks green but hidden problems lurk beneath the surface.]
The most dangerous bugs are the ones your pipeline says passed.

These are the scenarios where nothing visibly breaks — your workflow just does the wrong thing.

Scheduled Workflows Silently Disabled After 60 Days

Symptom: A cron workflow that's been running for months just stops. No notification.

Root cause: In public repositories, GitHub automatically disables schedule-triggered workflows after 60 days without repository activity (no commits). Workflow runs themselves don't count as activity.

Fix:

- uses: gautamkrishnar/keepalive-workflow@v2
  with:
    time_elapsed: '45'  # triggers 15 days before the 60-day cutoff

Or re-enable manually:

gh workflow enable "Workflow Name" --repo OWNER/REPO

GITHUB_TOKEN Cannot Trigger Downstream Workflows

Symptom: A workflow pushes a commit or creates a tag, but the expected downstream workflow (triggered by on: push) never fires.

Root cause: This is by design. Events created with GITHUB_TOKEN, such as pushed commits or tags, do not trigger further workflow runs (workflow_dispatch and repository_dispatch are the exceptions). It's GitHub's recursion prevention mechanism.

Fix: Use a GitHub App installation token or a PAT:

- uses: actions/create-github-app-token@v1
  id: app-token
  with:
    app-id: ${{ vars.APP_ID }}
    private-key: ${{ secrets.APP_PRIVATE_KEY }}

- uses: actions/checkout@v4
  with:
    token: ${{ steps.app-token.outputs.token }}

Cache Rate Limiting Falls Through as "Cache Not Found"

Error:

Warning: Failed to restore: Failed to GetCacheEntryDownloadURL:
Rate limited: Failed request: (429) Too Many Requests
Cache not found for input keys: ...

Root cause: When the cache API rate limits you, the action reports it as a cache miss — not a rate limit error. Your build proceeds without cache, silently slower.

Fix: Don't trigger hundreds of parallel matrix jobs all saving caches simultaneously. Stagger cache operations or use fewer, broader cache keys.

Fork PR Secrets Evaluate to Empty Strings

Symptom: A contributor opens a PR from a fork. Secret-dependent steps fail or skip silently.

Root cause: Secrets are not passed to workflows triggered by pull_request from forks. This is a deliberate security boundary.

Fix: Design CI to not require secrets for tests. For deployment previews after code review, use pull_request_target with a mandatory label gate:

on:
  pull_request_target:
    types: [labeled]

jobs:
  deploy-preview:
    if: github.event.label.name == 'safe to test'
    # ...

⚠️ Security warning: Never checkout fork code with pull_request_target and then run it with repository secrets. This creates a pwn-request vulnerability.
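
For ordinary pull_request workflows, a common guard is to skip secret-dependent steps when the PR head lives in another repository. The sketch below assumes a Node.js project; API_TOKEN and the test:integration script are hypothetical names.

```yaml
on: pull_request

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Unit tests (no secrets needed)
        run: npm test

      - name: Integration tests (need a secret)
        # False for fork PRs, where the head repository differs from this one
        if: github.event.pull_request.head.repo.full_name == github.repository
        env:
          API_TOKEN: ${{ secrets.API_TOKEN }}   # hypothetical secret name
        run: npm run test:integration           # hypothetical script
```

Fork contributors still get feedback from the unit tests, and the skipped step shows up as skipped rather than failing on an empty token.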


Runner & Environment Problems

Self-Hosted Runner Registration & Update Loops

Error:

Runner update in progress, do not shutdown runner.
Downloading 2.277.1 runner... Generate and execute update script.
Runner will exit shortly for update, should back online within 10 seconds.
[...loops again...]

Root cause: Containerized runners built on older Ubuntu images (18.04) hit glibc incompatibility when auto-update downloads a newer runner binary.

Fix:

  1. Rebuild container on Ubuntu 22.04+
  2. Disable auto-update: pass --disableupdate to ./config.sh (some runner container images expose this as DISABLE_AUTO_UPDATE=1)
  3. Add rm -rf /home/runner/actions-runner to container entrypoint before ./config.sh
  4. Add watchdog cron polling GET /orgs/{org}/actions/runners every 5 minutes

Runner Out of Disk Space

Error:

No space left on device (os error 28)

Root cause: GitHub-hosted ubuntu-latest runners have ~14GB usable, but pre-installed toolchains (Android SDK ~8GB, .NET ~1.5GB, Haskell ~5GB) consume most of it.

Fix: Add a cleanup step before heavy builds:

- name: Free Disk Space
  uses: jlumbroso/free-disk-space@main
  with:
    tool-cache: false
    android: true
    dotnet: true
    haskell: true
    large-packages: true

This reclaims ~10-15GB.

Environment Variables Not Persisting Between Steps

Error:

Warning: The `set-output` command is deprecated and will be disabled soon.

Root cause: ::set-output and ::set-env were deprecated in favor of environment files.

Fix:

# ❌ Deprecated
- run: echo "::set-output name=dir::$(yarn cache dir)"

# ✅ Current (quote the file path so shellcheck and actionlint stay quiet)
- run: echo "dir=$(yarn cache dir)" >> "$GITHUB_OUTPUT"

# For multi-line values:
- run: |
    echo "MY_VAR<<EOF" >> "$GITHUB_ENV"
    echo "$multiline_value" >> "$GITHUB_ENV"
    echo "EOF" >> "$GITHUB_ENV"
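
To show how the values travel between steps, here is a minimal sketch: one step writes an output and an environment variable, and a later step reads both. The step id meta and the version and BUILD_MODE names are placeholders.

```yaml
steps:
  - id: meta
    run: |
      # Step output: read later as steps.meta.outputs.version
      echo "version=1.2.3" >> "$GITHUB_OUTPUT"
      # Environment variable: visible to all later steps in this job
      echo "BUILD_MODE=release" >> "$GITHUB_ENV"

  - run: |
      echo "Version: ${{ steps.meta.outputs.version }}"
      echo "Mode: $BUILD_MODE"
```

Values written to $GITHUB_ENV only become visible in subsequent steps, not in the step that writes them.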

Tools Not Found in Next Step (PATH Issues)

Error:

/bin/bash: my-tool: command not found

Root cause: Each run: step spawns a fresh shell. export PATH=... is lost when that step ends.

Fix: Write to $GITHUB_PATH, not PATH:

- name: Install tool
  run: |
    pip install my-cli-tool
    echo "$HOME/.local/bin" >> "$GITHUB_PATH"

- name: Use tool  # PATH is now updated
  run: my-cli-tool --version

Docker Not Available on Runner

Error:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
Is the docker daemon running?

Root cause: ubuntu-latest-slim, ARC containers, and self-hosted runners without DinD don't expose Docker.

Fix:

  • Standard ubuntu-latest: Docker is available natively
  • ARC/containerized: Use DinD sidecar or switch to JavaScript/composite actions
  • For private registry pulls, add docker/login-action before container actions

Service Container Connectivity

Error:

connection to server at "localhost", port 5432 failed: Connection refused

Root cause: In containerized jobs (container: at job level), service containers are on a Docker bridge network. localhost doesn't work.

Fix: Always add health checks, and use the service label as hostname in containerized jobs:

services:
  postgres:
    image: postgres:15
    env:
      POSTGRES_PASSWORD: password
    ports:
      - 5432:5432
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

For containerized jobs, connect to postgres:5432 (the service label), not localhost:5432.
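
A containerized job could wire this up as in the sketch below. The node:20 image, the npm test command, and the PG* variables (which PostgreSQL client libraries conventionally read) are assumptions for illustration.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    container: node:20            # job-level container: steps run inside it
    services:
      postgres:                   # this label doubles as the hostname
        image: postgres:15
        env:
          POSTGRES_PASSWORD: password
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - run: npm test             # hypothetical test command
        env:
          PGHOST: postgres        # service label, not localhost
          PGPORT: 5432
          PGUSER: postgres
          PGPASSWORD: password
```

No ports: mapping is needed in this layout because the job container and the service container share a Docker network.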

Runner Image Deprecation

Error:

No hosted runners with requested label(s): 'ubuntu-18.04' can be found.
sudo: docker-compose: command not found

Fix:

# ❌ Removed
- run: sudo docker-compose up -d

# ✅ Docker Compose v2 plugin syntax
- run: sudo docker compose up -d

Track upcoming removals at the actions/runner-images releases.

Windows Runner Gotchas

Error:

AssertionError: expected '40-learnings\\passesdefaultgate.md' to contain '40-learnings/'

Root cause: Path separators (\ vs /), missing POSIX tools (jq, sed), shebangs not honored, CRLF line endings.

Fix:

defaults:
  run:
    shell: bash  # uses Git Bash on Windows

# Install missing tools
- if: runner.os == 'Windows'
  run: choco install jq -y
  shell: pwsh

# Disable CRLF auto-conversion
- run: git config --global core.autocrlf false

Node.js Runtime Deprecation

Error:

Node.js 16 actions are deprecated. Please update the following actions
to use Node.js 20: actions/checkout@v3, actions/cache@v3

Fix: Bump all actions to their latest major versions. For your own actions, update action.yml to runs.using: node24. Emergency workaround:

env:
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: 'true'

Deprecation timeline: node12 (cutoff mid-2023) → node16 (mid-2024) → node20 (enforcement rolling out 2025-2026). Check the GitHub Actions changelog for the latest timeline.


Secrets, Permissions & Authentication

[Figure: the GitHub Actions permission model as nested security layers, from repository settings to GITHUB_TOKEN to OIDC federation.]
The GitHub Actions permission model: repo defaults → workflow permissions block → GITHUB_TOKEN scope. The #1 source of 403 errors.

GITHUB_TOKEN Permission Denied (403)

Error:

remote: Permission to org/repo.git denied to github-actions[bot].
fatal: unable to access '...': The requested URL returned error: 403

Root cause: The default GITHUB_TOKEN has been read-only for new repositories and organizations since GitHub tightened the defaults in February 2023.

Fix: Add explicit permissions: to the job:

permissions:
  contents: write       # git push
  pull-requests: write  # PR creation
  packages: write       # GHCR push

Critical: The permissions: block completely replaces defaults. Any permission not listed becomes none. Listing only contents: write drops all other permissions including pull-requests.
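
One way to apply this, sketched with a hypothetical release job, is a read-only baseline at the workflow level and an explicit, complete list on the one job that needs to write.

```yaml
# Workflow level: a read-only baseline for every job
permissions:
  contents: read

jobs:
  release:
    runs-on: ubuntu-latest
    # Job level: this block replaces the workflow-level one for this job,
    # so list everything the job needs.
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - run: echo "git push and PR creation are allowed in this job"
```

Keeping write access on a single job limits what a compromised step elsewhere in the workflow could do with the token.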

OIDC Federation Failures with AWS

Error:

Could not assume role with OIDC: Not authorized to perform
sts:AssumeRoleWithWebIdentity

Root causes and fixes:

  1. Reusable workflows change the sub claim. The OIDC JWT subject reflects the calling repo, not the reusable workflow's repo. IAM trust policies must match the caller.

  2. Missing permissions: id-token: write on the calling job.

  3. Audience mismatch:

- uses: aws-actions/configure-aws-credentials@v4
  with:
    audience: sts.amazonaws.com  # must match trust policy
    role-to-assume: arn:aws:iam::123456789012:role/MyRole
    aws-region: us-east-1
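
Putting the pieces together, a job that assumes a role through OIDC might look like this sketch. The role ARN and region are the placeholder values from the snippet above, and the final verification step assumes the AWS CLI is available on the runner.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write     # lets the job request the OIDC token
      contents: read      # still needed for checkout, since the block replaces defaults
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/MyRole
          aws-region: us-east-1

      # Quick check that the role was actually assumed
      - run: aws sts get-caller-identity
```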

Cross-Repo Access (403)

Error:

remote: Permission to other-org/other-repo.git denied to github-actions[bot].

Root cause: GITHUB_TOKEN is scoped to a single repository. It cannot access other repos — this is a security boundary by design.

Fix: Use a GitHub App installation token (recommended) or a PAT:

- uses: actions/create-github-app-token@v1
  id: app-token
  with:
    app-id: ${{ vars.APP_ID }}
    private-key: ${{ secrets.APP_PRIVATE_KEY }}
    repositories: "target-repo"

- uses: actions/checkout@v4
  with:
    token: ${{ steps.app-token.outputs.token }}
    repository: org/target-repo

Environment Protection Rules Blocking Deployments

Error:

This deployment was rejected

Root cause: The triggering ref doesn't match the environment's allowed branches/tags filter, or the required reviewer also triggered the workflow (GitHub doesn't allow self-approval).

Fix: Ensure the triggering ref matches the environment's branch filter pattern. Add a second reviewer if the triggering user is the sole required reviewer.

GitHub App Token Generation Failures

Error:

error:0909006C:PEM routines:get_name:no start line

Root cause: Private key corrupted during shell escaping or base64 encoding.

Fix: Store the raw PEM file directly as a GitHub secret:

gh secret set APP_PRIVATE_KEY < my-app.private-key.pem

Use actions/create-github-app-token@v1 (official, node20-native) instead of tibdex/github-app-token.

Docker Registry Auth (GHCR)

Error:

denied: installation not allowed to Write organization package

Fix:

  1. Add permissions: packages: write to the job
  2. For org packages: visit package settings → Manage Actions Access → add the repository with Write access
  3. Don't set DOCKER_CONFIG: $HOME/.docker at job level — it breaks credential persistence

Dependabot Secrets Namespace

Root cause: Dependabot runs in a separate secrets namespace. Repository secrets are not available to Dependabot-triggered workflows.

Fix: Add secrets to both namespaces:

gh secret set NPM_TOKEN --body "npm_xxx" --app actions
gh secret set NPM_TOKEN --body "npm_xxx" --app dependabot

PAT vs. GITHUB_TOKEN Decision Matrix

| Scenario | Use |
| --- | --- |
| Push to same repo | GITHUB_TOKEN + contents: write |
| Create PR on same repo | GITHUB_TOKEN + pull-requests: write |
| Push to different repo | GitHub App token or PAT |
| Trigger another workflow | GitHub App token or PAT (events created with GITHUB_TOKEN don't trigger workflow runs) |
| Cross-org operations | Classic PAT with repo scope |

Prefer GitHub App tokens over PATs: PATs are tied to individuals (leave org = token breaks), expire, and are harder to audit.


Caching, Artifacts & Dependencies

Cache Miss Despite Recent Save

Error:

Cache not found for input keys: Linux-node-abc123def456

Root causes:

  1. Branch scoping: Caches from main are accessible to branches, but not vice-versa
  2. Version mismatch: Changing OS or compression tool changes the cache version hash
  3. Rate limiting: 429s fall through silently as "cache not found"
  4. Infrastructure outage: Check githubstatus.com

Fix: Always prime cache on the default branch first. Use the List Caches API to debug version mismatches.

cache-hit Output Semantics

# ❌ Wrong — cache-hit is empty string (not 'false') on full miss
if: steps.cache.outputs.cache-hit == 'false'

# ✅ Correct — always use != 'true'
if: steps.cache.outputs.cache-hit != 'true'

cache-hit is 'true' on exact key match, empty string on miss, and 'false' on restore-keys match. Yes, really.
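
In practice the three states matter when a step should run on anything short of an exact hit. The sketch below uses a hypothetical dataset directory, lockfile, and fetch script to show the pattern.

```yaml
- id: cache
  uses: actions/cache@v5
  with:
    path: .cache/datasets                     # hypothetical directory
    key: ${{ runner.os }}-datasets-${{ hashFiles('datasets.lock') }}   # hypothetical lockfile
    restore-keys: |
      ${{ runner.os }}-datasets-

# Runs on a full miss (cache-hit is '') and on a restore-keys match (cache-hit is 'false')
- if: steps.cache.outputs.cache-hit != 'true'
  run: ./scripts/fetch-datasets.sh            # hypothetical script
```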

Cache Size Limit (10 GB Per Repo)

Symptom: Random cache misses on older branches.

Root cause: Repos have a 10 GB total cache limit. Oldest caches are LRU-evicted silently.

Fix: Clean up branch caches on PR close:

on:
  pull_request:
    types: [closed]
jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      actions: write
    steps:
      - run: |
          for id in $(gh cache list --ref refs/pull/${{ github.event.pull_request.number }}/merge \
            --limit 100 --json id --jq '.[].id'); do
            gh cache delete $id
          done
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GH_REPO: ${{ github.repository }}

upload-artifact v3 → v4 Breaking Changes

Error:

An artifact with the same name already exists for the associated workflow run.

Root cause: v4 artifacts are immutable. Multiple jobs can no longer upload to the same artifact name.

Fix:

# v4 — unique names per matrix job
- uses: actions/upload-artifact@v4
  with:
    name: build-${{ matrix.os }}-${{ matrix.node }}

# Download all and merge
- uses: actions/download-artifact@v4
  with:
    pattern: build-*
    merge-multiple: true
    path: dist/

Cross-Workflow Artifact Download

Error:

Unable to download artifact(s): Artifact not found for name: my-artifact

Fix: Both upload and download must use the same version family (v3↔v3 or v4↔v4 — they use different storage backends):

- uses: actions/download-artifact@v4
  with:
    name: my-artifact
    github-token: ${{ secrets.GITHUB_TOKEN }}  # required for cross-workflow
    run-id: ${{ github.event.workflow_run.id }}

npm ci Cache Save Timeout

Error:

The operation was canceled.

Root cause: Cache save (tar compression) on large node_modules exceeds the job timeout. Missing zstd in DinD containers forces slow gzip fallback.

Fix: Cache ~/.npm (the npm cache directory), not node_modules:

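# Resolve the npm cache directory first so the cache step can reference it
- id: npm-cache-dir
  run: echo "dir=$(npm config get cache)" >> "$GITHUB_OUTPUT"

- uses: actions/cache@v5
  with:
    path: ${{ steps.npm-cache-dir.outputs.dir }}
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}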

For DinD environments, install zstd: apt-get install -y zstd.

Docker Layer Caching

Error:

cache export feature is currently not supported for docker driver

Fix: You must use docker/setup-buildx-action first — the default Docker driver doesn't support cache export:

- uses: docker/setup-buildx-action@v3

- uses: docker/build-push-action@v6
  with:
    cache-from: type=gha,scope=${{ github.workflow }}
    cache-to: type=gha,mode=max,scope=${{ github.workflow }}

Cache Corruption

Error:

tar: Error is not recoverable: exiting now
gzip: stdin: unexpected end of file

Fix: Delete the corrupt cache via CLI:

gh cache list --repo owner/repo
gh cache delete <cache-id> --repo owner/repo

Prevent future corruption with a download timeout:

env:
  SEGMENT_DOWNLOAD_TIMEOUT_MINS: 5

Git LFS Files Not Downloaded

Symptom: Binary files are 140-byte text pointers instead of actual content.

Fix:

- uses: actions/checkout@v4
  with:
    lfs: true
    fetch-depth: 1

Cache LFS objects to reduce bandwidth:

- uses: actions/cache@v5
  with:
    path: .git/lfs
    key: ${{ runner.os }}-lfs-${{ hashFiles('.lfsconfig') }}

Lockfile Hash Returns Empty String

Error:

Cache not found for input keys: Linux-node-

Root cause: hashFiles('**/package-lock.json') matched no files, returning empty string.

Fix: Debug with:

- run: |
    echo "Hash: ${{ hashFiles('**/package-lock.json') }}"
    find . -name "package-lock.json" -not -path "*/node_modules/*"

Correct patterns per ecosystem:

# npm
key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
# pip
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements*.txt', '**/pyproject.toml') }}
# Gradle
key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*', '**/gradle-wrapper.properties') }}

Trigger Problems

Workflow Not Triggering At All

No error. No run appears.

Root causes (in priority order):

  1. Workflow file is not on the default branch
  2. YAML syntax error (silently rejected)
  3. Branch filter mismatch (branches: [master] but default is main)
  4. Workflow disabled via UI or inactivity
  5. Commit made by GITHUB_TOKEN (won't trigger downstream)

Fix:

# Check workflow state
gh workflow list
gh workflow view "My Workflow"

workflow_dispatch Button Not Showing

Root causes:

  1. Workflow file not on default branch (most common)
  2. No write access to repository
  3. Wrong YAML indentation:
# ❌ Wrong — nested under push
on:
  push:
    branches: [main]
    workflow_dispatch:      # indented under push

# ✅ Correct — sibling of push
on:
  push:
    branches: [main]
  workflow_dispatch:        # same level as push

Cron Schedule Running Late or Not Running

Root cause: GitHub does not guarantee cron timing. During high load, scheduled runs can be delayed by hours or skipped entirely. Minimum interval is 5 minutes. Public/free-tier repos are deprioritized. All times are UTC.

A real-world case: workflow configured for */10 * * * * (expected ~144 runs/day), but only 4 runs fired in 32 hours.

Fix: For time-sensitive operations, use an external cron service to trigger workflow_dispatch via API. Accept a ±1 hour SLA for GitHub-hosted scheduled workflows.
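
A workflow can keep its cron schedule as a best-effort fallback while also accepting external triggers, as in this sketch. The sync job is a placeholder; the external scheduler would call the REST endpoint that creates a workflow dispatch event, or run gh workflow run.

```yaml
on:
  schedule:
    - cron: '*/10 * * * *'   # best effort; may run late or be skipped
  workflow_dispatch:         # lets an external scheduler start the same workflow

jobs:
  sync:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      # github.event_name is 'schedule' or 'workflow_dispatch' depending on the trigger
      - run: echo "Triggered by ${{ github.event_name }}"
```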

workflow_run Not Firing

Root causes:

  1. The listener workflow must be on the default branch
  2. workflows: ["CI Build"] must exactly match the source workflow's name: field
  3. Missing types: [completed] — without it, fires on both start and finish
  4. Source workflow triggered by GITHUB_TOKEN (recursion prevention)

Fix:

on:
  workflow_run:
    workflows: ["CI Build"]     # exact match to name: in source workflow
    types: [completed]

jobs:
  post-build:
    if: github.event.workflow_run.conclusion == 'success'

repository_dispatch Returns 204 But Workflow Doesn't Run

Root cause: API returns 204 even when event_type doesn't match — the mismatch is silent.

Fix: Verify that event_type exactly matches an entry in the workflow's types: list:

on:
  repository_dispatch:
    types: [docker-image-updated]  # must EXACTLY match API call

Path Filters Not Working as Expected

Root cause: paths: and paths-ignore: are mutually exclusive — using both on the same event is not supported. docs (without /**) matches a file literally named docs, not the directory.

Fix:

# Correct: ignore docs directory
on:
  push:
    paths-ignore:
      - 'docs/**'
      - '*.md'
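
When a single event needs both inclusion and exclusion, the paths: filter accepts negated patterns prefixed with !. The sketch below assumes a src/ directory whose Markdown files should not trigger runs.

```yaml
on:
  push:
    paths:
      - 'src/**'            # everything under src/
      - '!src/**/*.md'      # except Markdown files; a negation must follow the positive match
```

Pattern order matters here: a negative pattern only excludes paths matched by an earlier positive pattern.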

Tag Push vs. Release Published

| Trigger | When It Fires | Use Case |
| --- | --- | --- |
| push: tags: [v*] | On tag push | Binary build |
| release: types: [created] | Release created | Build + draft release |
| release: types: [published] | Explicit publish | Deploy to prod |

Concurrency & Timing

Jobs Cancelled Unexpectedly

Root cause: Overly broad concurrency group key. Using group: ${{ github.workflow }} alone means all runs compete, even on different branches.

Fix:

# PR workflows — cancel stale runs on same PR
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

# Production deploys — queue, never cancel
concurrency:
  group: deploy-production
  cancel-in-progress: false

# Branch-sensitive — cancel only on non-default branches
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

Empty head_ref Causing Cross-Branch Cancellation

Root cause: github.head_ref is empty for push events. All push-triggered runs get the same group key and cancel each other.

Fix:

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}

Job needs Failure Cascading

Symptom: A downstream job is Skipped even though you want it to run after upstream failure.

Root cause: Default if: on every job is success(), meaning "only run if ALL needs jobs succeeded."

Fix:

# Always run (notifications, cleanup)
final-job:
  needs: [job-a, job-b]
  if: always()
  steps:
    - if: contains(needs.*.result, 'failure')
      run: exit 1
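
always() is not the only status function available. The sketch below shows two narrower alternatives; the job names notify and cleanup are placeholders, and job-a and job-b are assumed to be defined elsewhere in the same workflow.

```yaml
jobs:
  notify:
    needs: [job-a, job-b]
    if: failure()                  # only when at least one needed job failed
    runs-on: ubuntu-latest
    steps:
      - run: echo "Upstream failed"

  cleanup:
    needs: [job-a, job-b]
    if: ${{ !cancelled() }}        # success or failure, but not a cancelled run
    runs-on: ubuntu-latest
    steps:
      - run: echo "Cleaning up"
```

The ${{ }} wrapper is needed on the second condition because a YAML value cannot start with a bare exclamation mark.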

Default Timeout is 6 Hours

Root cause: A hung test suite silently consumes a runner for 6 hours.

Fix: Always set timeout-minutes at the job level:

jobs:
  test:
    timeout-minutes: 20
    steps:
      - run: npm test
        timeout-minutes: 10

Matrix include vs. exclude Confusion

Key insight:

  • include entries that match ALL existing keys add properties to the existing row — they don't create a new job
  • include entries that match NO existing cell create a new job
  • exclude requires ALL keys to exist in the base matrix — unknown keys are silently ignored
  • Max 256 matrix jobs per workflow run

strategy:
  fail-fast: false  # strongly recommended for diagnostics
  matrix: ${{ fromJSON(needs.prepare.outputs.matrix) }}
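
A static matrix makes the include and exclude rules easier to see. In this sketch (the os and node values and the coverage property are arbitrary), exclude is applied first, so the result should be four jobs: two on Ubuntu, one on Windows, and one on macOS.

```yaml
strategy:
  fail-fast: false
  matrix:
    os: [ubuntu-latest, windows-latest]
    node: [18, 20]
    include:
      # Matches an existing combination: adds coverage to it, no new job
      - os: ubuntu-latest
        node: 20
        coverage: true
      # Matches no existing combination: creates a new job
      - os: macos-latest
        node: 20
    exclude:
      # Removes one combination from the base 2x2 matrix
      - os: windows-latest
        node: 18
```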

Dynamic Matrix and Required Status Checks

The problem: Matrix job names like test (ubuntu-latest, 16) change when matrix values change. Branch protection requires exact string matches — no wildcards.

Fix: Add a stable summary job and require that instead:

test-summary:
  needs: [test]
  if: always()
  runs-on: ubuntu-latest
  steps:
    - if: needs.test.result != 'success'
      run: exit 1
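
For reference, a dynamic matrix of the kind this section describes could be produced by a small upstream job, as sketched below. The prepare job, the set step id, and the hardcoded JSON are illustrative; a real workflow would typically compute the JSON from changed files or configuration.

```yaml
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set.outputs.matrix }}
    steps:
      - id: set
        # Emit the matrix as single-line JSON
        run: echo 'matrix={"os":["ubuntu-latest"],"node":[18,20]}' >> "$GITHUB_OUTPUT"

  test:
    needs: prepare
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix: ${{ fromJSON(needs.prepare.outputs.matrix) }}
    steps:
      - run: echo "node ${{ matrix.node }} on ${{ matrix.os }}"
```

A stable summary job that lists test in its needs then gives branch protection a single check name, regardless of what the matrix expands to.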

Known Unsolved Problems

These are confirmed platform limitations with no clean workaround. Understanding them saves hours of debugging dead ends.

No SSH / Interactive Debugging (#241 — 107 👍, open since 2019)

The runner has no TTY allocated. Interactive debugging is not possible natively. Workarounds like mxschmitt/action-tmate open SSH reverse tunnels but are a security risk (session URL is in public logs).

No Step-Level Retry

There's no native retry: 3 syntax on steps. Use nick-fields/retry for run: steps, or a bash loop:

for i in 1 2 3; do
  flaky-command && break
  # Fail the step if the last attempt also failed
  [ "$i" -eq 3 ] && exit 1
  sleep 15
done
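
The action-based alternative might look like the sketch below. The input values and npm test are placeholders, and the @v3 tag is an assumption; check the action's README for the current major version.

```yaml
- uses: nick-fields/retry@v3
  with:
    timeout_minutes: 10       # timeout for each attempt
    max_attempts: 3
    retry_wait_seconds: 15    # pause between attempts
    command: npm test         # hypothetical flaky command
```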

No Early-Exit / Step Flow Control (#662 — 1,031 👍)

The highest-voted open runner issue. You cannot exit a job early with a specific conclusion (success/neutral). Every step must use if: guards to skip, creating verbose YAML.

Reusable Workflows Cannot Be Called from Composite Actions

Composite actions are inlined steps on the parent runner. Calling a reusable workflow (which spawns a separate runner) from inside a composite action is architecturally impossible without a lifecycle model redesign.

No services: or container: in Composite Actions (ADR 0549)

By architectural decision. Service containers require Docker lifecycle management at the job level — composite actions don't have job-level lifecycle.

Secret Masking Edge Cases (#475 — 68 👍, open since 2020)

::add-mask:: echoes the secret value before the mask takes effect. Short secrets (1-3 chars) cause entire log lines to become ***. Base64 and URL-encoded versions of secrets may not be masked.

Cost/Billing Opacity

No per-workflow, per-job, or per-repository breakdown of Actions minutes. The billing page shows total org-level usage. Use gh api /repos/{owner}/{repo}/actions/runs/{id} for approximate per-run duration.


Essential Tooling

actionlint — The Single Most Impactful Tool

rhysd/actionlint catches the majority of syntax, context, and type errors in this guide before you push:

# Install
go install github.com/rhysd/actionlint/cmd/actionlint@latest
# Or brew install actionlint

# Run
actionlint

# In CI
- uses: raven-actions/actionlint@v2

It validates: YAML syntax, expression types, context availability, matrix configurations, reusable workflow inputs/outputs, shell script syntax, and action version compatibility.
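
As a starting point, a dedicated lint workflow could look like the following sketch. The workflow name, the path filter, and the timeout are choices made for this example rather than requirements of the action.

```yaml
name: Lint workflows        # hypothetical workflow name

on:
  pull_request:
    paths:
      - '.github/workflows/**'   # only run when workflow files change

permissions:
  contents: read

jobs:
  actionlint:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: raven-actions/actionlint@v2
```

Marking this job as a required status check stops invalid workflow files from reaching the default branch.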

Online Playground

Don't want to install anything? Use the actionlint playground — paste your workflow YAML and get instant feedback.

Debug Logging

Enable debug logging for any workflow run:

  1. Go to the failed run → "Re-run all jobs" → check "Enable debug logging"
  2. Or set repository variable ACTIONS_STEP_DEBUG to true (adds ##[debug] output to all steps)

gh CLI for Debugging

# List workflow runs
gh run list --workflow ci.yml

# View specific run logs
gh run view <run-id> --log

# Download logs for grep
gh run view <run-id> --log | grep 'error'

# List and delete caches
gh cache list
gh cache delete <id>

# Check workflow state
gh workflow list
gh workflow enable "Workflow Name"


Resources

Every error message, workaround, and fix in this guide is sourced from real GitHub Issues, official documentation, and architecture decision records.

This guide covers the scenarios that have cost me and thousands of other developers the most debugging hours. If your specific error isn't here, open an issue or reach out on LinkedIn — I'll add it to the next update.
