DEV Community

cloud-sky-ops

Mastering GitHub Actions: Insights and pitfalls of "if conditions"

As a DevOps engineer, I live inside CI/CD pipelines. GitHub Actions has become one of my go-to tools, and I’ve spent countless hours tweaking workflows, battling edge cases, and learning lessons the hard way. One of the most powerful features of Actions is the ability to put if conditions on jobs and steps.

In this post, I’ll walk you through the main if conditions available, share a real-world scenario where things went sideways for me, and explain how I fixed it. Hopefully, this saves you from a few hours of frustration the next time you’re wiring up your own pipelines.


The Essential if Conditions in GitHub Actions

Here are the most common conditional expressions you’ll run into:

  • if: success() – Runs the job only if all of its dependencies succeeded. This is also the default check when you don’t write a status function yourself.
  • if: failure() – Runs the job only if at least one dependency (any ancestor in the needs: chain) failed.
  • if: cancelled() – Runs the job only if the workflow run was cancelled.
  • if: always() – Runs the job regardless of the success/failure of dependencies. The catch with always is that it will also run the job if the workflow is cancelled by the user. So always means ALWAYS, no exceptions.

Because of this, it is safer (and recommended by GitHub) to use the following option:

  • if: ${{ !cancelled() }} – Runs the job unless the workflow run was cancelled (my personal favorite fallback for robust pipelines).
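
To make the list above concrete, here is a minimal sketch of how these status functions might be used at both the job and the step level. The workflow name, the job names (build, notify-failure) and the script paths are hypothetical and only there for illustration. Note that an expression starting with ! has to be wrapped in ${{ }} (or quoted), because a bare ! is tag syntax in YAML.

```yaml
name: status-function-sketch   # hypothetical workflow name

on: push

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/build.sh          # hypothetical script
      # Step-level condition: runs even if the build step failed,
      # but not when the run was cancelled.
      - if: ${{ !cancelled() }}
        run: ./scripts/collect-logs.sh   # hypothetical script

  notify-failure:
    needs: build
    # Job-level condition: runs only when build (or one of its ancestors) failed.
    if: ${{ failure() }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "build failed"
```

The same functions work in both places; the only difference is whether they look at earlier steps in the job or at the jobs listed under needs:.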

These look straightforward, but once you combine them with needs: and multi-trigger workflows, like I did, some real quirks emerge.


When if: ${{ !cancelled() }} Bit Me Back

I was working on a CI/CD workflow that needed to support two types of pushes: to main and to release/* branches. The requirement was simple enough. There were 4 jobs:

  • run-test-cases: runs for all branches.
  • run-script-on-main: runs only when the branch is main.
  • docker-build-push: should run on both branches. On main, it must wait for run-script-on-main to complete. On release/*, it should still run even though run-script-on-main is skipped.
  • trigger-deployment: runs after docker-build-push (and run-test-cases) for both main and release/*.

Here’s the workflow:

name: CI/CD Pipeline

on:
  push:
    branches:
      - main
      - release/*

jobs:
  run-test-cases:
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/run-tests.sh

  run-script-on-main:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/main-only.sh

  docker-build-push:
    needs: [run-test-cases, run-script-on-main]
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/docker-build-push.sh

  trigger-deployment:
    needs: [run-test-cases, docker-build-push]
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/trigger-deployment.sh

The Problem

What I expected:

  • On main: run-script-on-main runs, then docker-build-push, then trigger-deployment.
  • On release/*: skip run-script-on-main, run docker-build-push, then trigger-deployment.

What actually happened:

  • On main: ✅ everything worked fine.
  • On release/*: ❌ docker-build-push ran, but trigger-deployment was skipped.

This was confusing to me because trigger-deployment didn’t even list run-script-on-main in its needs: section.


DAG Construction & Pruning (root cause)

In GitHub Actions, the entire workflow is modeled as a Directed Acyclic Graph (DAG):

  • Directed: The flow of execution moves in one direction, from dependencies to dependents.
  • Acyclic: There are no cycles; a job cannot depend on itself (directly or indirectly).
  • Graph: Jobs are nodes, and needs: defines the edges (dependencies) between them.

This DAG determines the exact order in which jobs are executed, skipped, or pruned. GitHub Actions builds the DAG at workflow parsing time before it executes any jobs.


Example

Imagine three jobs:

jobs:
  A:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Job A"

  B:
    needs: A
    runs-on: ubuntu-latest
    steps:
      - run: echo "Job B"

  C:
    needs: [A, B]
    runs-on: ubuntu-latest
    steps:
      - run: echo "Job C"

The DAG looks like this:

A → B → C   (plus a direct edge A → C)
  • A runs first.
  • Once A succeeds, B can run.
  • Once both A and B succeed, C can run.

If A fails, both B and C are automatically skipped because their prerequisites were never satisfied.


Practical Example in GitHub Workflows

Take a workflow where lint and unit-tests must run in parallel, and build depends on both.

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: npm run lint

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - run: npm test

  build:
    needs: [lint, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - run: npm run build

Here the DAG is:

[ lint && unit-tests ] → build
  • If both lint and unit-tests succeed, build executes.
  • If either lint or unit-tests fails (or gets skipped), build won’t run unless you explicitly allow it with if: always() or if: ${{ !cancelled() }}.
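
When it is unclear why a job like build ran or was skipped, one option is a small diagnostic job that prints the result of each dependency. The sketch below assumes the same lint and unit-tests jobs; the job name report is made up for illustration, while needs.<job_id>.result is a standard part of the needs context.

```yaml
on: push

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test

  report:   # hypothetical diagnostic job
    needs: [lint, unit-tests]
    # Run even when a dependency failed or was skipped, but not on cancellation
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    steps:
      # Each result is one of: success, failure, cancelled, skipped
      - run: |
          echo "lint:       ${{ needs.lint.result }}"
          echo "unit-tests: ${{ needs.unit-tests.result }}"
```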

DAG in Practice: Why It Matters

The problem I hit with if: ${{ !cancelled() }} happened because DAG pruning is strict.

  • When a job is skipped, it’s not just “ignored”—it still exists as a node in the DAG.
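  • Any job downstream of it inherits that skipped state unless its own if: overrides the default. A job without a status function gets an implicit success() check, and as far as I can tell that check looks at the whole chain of ancestors, not just the jobs listed directly under needs:.
  • That’s why my trigger-deployment got skipped even though its direct needs (docker-build-push, run-test-cases) were satisfied: run-script-on-main is still an ancestor through docker-build-push, and GitHub propagated the “skipped upstream” status through the graph.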
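
The behavior can be reduced to three jobs. This is a minimal sketch for reproducing it, with made-up job names (optional, middle, last) standing in for run-script-on-main, docker-build-push and trigger-deployment; the comments describe what I would expect on a branch other than main.

```yaml
name: skip-propagation-repro   # hypothetical workflow name

on: push

jobs:
  optional:
    # Skipped on every branch except main
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: echo "optional"

  middle:
    needs: optional
    # Explicit status function, so this job runs even when optional is skipped
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "middle"

  last:
    needs: middle
    # No if: here, so the implicit success() check applies.
    # Expected to be skipped off main, because optional
    # (an ancestor through middle) was skipped.
    runs-on: ubuntu-latest
    steps:
      - run: echo "last"
```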

👉 Understanding DAGs is the key to predicting workflow behavior in GitHub Actions. If you model jobs as a dependency graph instead of thinking in linear “step order,” it becomes much clearer why some jobs run, skip, or silently disappear.


The Fix

After banging my head against the why, I started looking into workarounds. I realized the simplest and most robust solution was to avoid making docker-build-push depend on an optional job. Instead, I folded the branch-specific logic into a step:

docker-build-push:
  needs: [run-test-cases]
  runs-on: ubuntu-latest
  steps:
    # Main-only script runs first, so the build still waits for it on main
    - if: github.ref == 'refs/heads/main'
      run: ./scripts/main-only.sh
    - run: ./scripts/docker-build-push.sh

Now the dependency chain is clean:

  • No skipped optional jobs breaking the DAG.
  • docker-build-push always runs.
  • On main, the extra script runs as a conditional step.

This avoided the skipped-job propagation entirely and made the workflow behave exactly as intended.
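
If the jobs have to stay separate, an alternate approach is to spell out the conditions on every downstream job using needs.<job_id>.result instead of relying on the implicit success() check. This is a sketch of that idea applied to the original workflow, not the fix I used above, and I would treat the exact conditions as an assumption to verify on a test branch.

```yaml
name: CI/CD Pipeline

on:
  push:
    branches:
      - main
      - release/*

jobs:
  run-test-cases:
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/run-tests.sh

  run-script-on-main:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/main-only.sh

  docker-build-push:
    needs: [run-test-cases, run-script-on-main]
    # Tests must pass; the main-only job must not have failed
    # (success on main and skipped on release/* are both fine).
    if: ${{ !cancelled() && needs.run-test-cases.result == 'success' && needs.run-script-on-main.result != 'failure' }}
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/docker-build-push.sh

  trigger-deployment:
    needs: [run-test-cases, docker-build-push]
    # An explicit status function replaces the implicit success(),
    # so the skipped ancestor no longer blocks this job.
    if: ${{ !cancelled() && needs.run-test-cases.result == 'success' && needs.docker-build-push.result == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/trigger-deployment.sh
```

The trade-off is verbosity: every job downstream of the optional one needs its own explicit condition, which is why folding the logic into a step felt cleaner to me.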


Key Takeaways

  1. Be cautious with needs: when optional jobs are involved.
  2. if: ${{ !cancelled() }} is safer than if: always(), but it can still create tricky dependency issues.
  3. When in doubt, prefer conditional steps inside a single job over branching jobs with if:. It keeps the DAG simpler and avoids hidden propagation quirks.

In the end, this was another lesson in the “everything is a DAG” world of GitHub Actions. The more predictable your graph, the fewer surprises you’ll get down the road. Thanks for making it till the end. Do share your thoughts and any alternate approaches on the solution. Have a beautiful day.
