As a DevOps engineer, I spend a lot of time trying to make CI/CD pipelines bulletproof. GitHub Actions is powerful, but some cases call for custom automation logic to get the job done. Recently, I ran into one of those “this should have been simple” problems that ended up forcing me deep into Bash territory.
In this post, I’ll share how I solved it, the tricks I learned along the way, and why those tricks are now staples in my automation toolkit.
The Problem Scenario
I needed to build a GitHub Workflow that:
- Loops through all the changed files in a PR.
- Runs a service-specific script based on which directories were touched.
- Ensures scripts are run only once per unique service.
Sounds easy, right? But here’s what I ran into:
- Filenames and directories contained spaces, dashes, and special characters, which caused word-splitting nightmares.
- Multiple directories had overlapping file changes, and I needed to deduplicate them cleanly before execution.
Naive attempts with `for file in $(git diff …)` blew up instantly when spaces or weird characters showed up. Deduplication with `sort | uniq` worked halfway, but it wasn’t reliable when integrated into the workflow.
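The word-splitting failure is easy to reproduce outside of CI. Here is a minimal sketch (with a hard-coded file list standing in for real `git diff` output) contrasting the naive unquoted loop with a line-safe `read` loop:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulated `git diff --name-only` output: two files, one with a space.
files=$'services/api server/main.go\nservices/web/index.html'

# Naive: unquoted expansion word-splits on the space -> three items, not two.
naive_count=0
for f in $files; do
  naive_count=$((naive_count + 1))
done
echo "naive sees $naive_count items"   # 3

# Safe: read line by line, no word splitting.
safe_count=0
while IFS= read -r f; do
  safe_count=$((safe_count + 1))
done <<< "$files"
echo "safe sees $safe_count items"     # 2
```

The naive loop invents a third “file” out of the space in `api server`, which is exactly what broke my first attempts.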
I also thought of introducing path filters in the workflow trigger to target specific directories.
```yaml
on:
  pull_request:
    paths:
      - "services/service-a/**"
```
But it would require updates to the workflow whenever new directories are added, so it didn't seem scalable.
I needed a bash-powered solution that was safe, idempotent, and CI-friendly.
Step-by-Step Troubleshooting & Resolution
Step 1: Getting the changed files
Inside the workflow, I grabbed the changed files with:
```shell
git diff --name-only origin/main...HEAD > changed_files.txt
```
That gave me a newline-delimited list. Great — until filenames with spaces appeared.
Step 2: Handling spaces and special characters
I realized I couldn’t rely on newlines alone. I switched to NUL-terminated strings, which led to the first crucial trick.
Step 3: Deduplicating directories
Next, I needed to strip each path down to its top-level directory, then deduplicate. A bit of field-splitting and `sort -u` magic did the trick.
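As a standalone sketch, the strip-and-deduplicate step looks like this (hypothetical paths; a real run would feed in `git diff` output, and this newline-delimited form still needs the NUL hardening covered below):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical changed paths; two of them touch the same top-level directory.
printf '%s\n' \
  'service-a/src/app.py' \
  'service-a/README.md' \
  'service-b/Dockerfile' > changed_files.txt

# Keep only the first path component, then deduplicate.
cut -d/ -f1 changed_files.txt | sort -u > changed_dirs.txt

cat changed_dirs.txt   # service-a, service-b -- each exactly once
```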
Step 4: Iterating safely in CI
Finally, I wrote a loop to iterate over each unique directory and run the associated script. The key was keeping it robust against any edge cases.
Advanced Bash Trick #1: NUL-Separated Processing
Before jumping into the fix, let’s talk about what this means.
Normally, when you run a command like `git diff --name-only`, it prints filenames separated by newlines (`\n`). That works most of the time — until you run into filenames with spaces, quotes, tabs, or even emojis. Bash sees those spaces and thinks they’re “separators” between different items, which can break loops in subtle ways.
That’s where NUL-separated processing comes in.
- A NUL (`\0`) character is a special invisible character that represents “nothing.” It’s guaranteed never to appear in a valid filename on Unix-like systems.
- If we use NULs instead of newlines as separators, Bash can handle structured data safely, no matter how strange the entries look.
- Commands like `git diff`, `xargs`, and `sort` all have flags (`-z` or `-0`) to enable this mode.
Think of it like replacing a fragile delimiter (newline) with an unbreakable one (NUL).
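A quick self-contained demo of the difference, using a throwaway directory with one awkward filename (nothing here comes from the workflow itself):

```shell
#!/usr/bin/env bash
set -euo pipefail

# A scratch directory containing one filename with a space in it.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt"

# Whitespace-based counting splits "with space.txt" into two tokens.
word_count=$(find "$demo" -type f | wc -w)

# NUL-based counting sees exactly one entry per file
# (xargs -0 -n1 with no command echoes each entry on its own line).
nul_count=$(find "$demo" -type f -print0 | xargs -0 -n1 | wc -l)

echo "words: $word_count, entries: $nul_count"   # words: 3, entries: 2
```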
Here’s the heart of the fix:
```shell
git diff --name-only -z origin/main...HEAD \
  | xargs -0 -n1 dirname -z \
  | cut -z -d/ -f1 \
  | sort -u -z > changed_dirs.txt
```

Note that every stage runs in NUL mode: GNU `dirname`, `cut`, and `sort` all accept `-z`. If any one stage falls back to newline output, the `-z` at the start is quietly undone.
Why this works

- `-z` makes `git diff` output NUL (`\0`) separated strings instead of newline-separated ones.
- `xargs -0` ensures even entries with spaces, quotes, or emojis (!) are handled safely.
- Every intermediate stage preserves the NUL delimiters; a single tool reverting to newlines would silently break the chain.
- By the time we hit `sort -u -z`, we have a clean, deduplicated list of changed top-level directories.
Real-world use cases
NUL-separated processing is useful well beyond CI workflows:
- Iterating over structured data with spaces/special chars: e.g., parsing JSON keys piped into Bash.

```shell
jq -r 'keys[]' file.json | tr '\n' '\0' | xargs -0 -n1 echo
```
- Handling user input or filenames from external sources: bulk renaming photos with spaces in names.
- Safely processing logs or configs: when entries might contain whitespace or unusual delimiters.
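The bulk-rename case from the list above can be sketched like this (throwaway files and a hypothetical naming scheme, not code from the workflow):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with two "photos" that have spaces in their names.
photos=$(mktemp -d)
touch "$photos/holiday 1.jpg" "$photos/holiday 2.jpg"

# Find names containing spaces, NUL-terminate them, and rename each
# one with spaces replaced by underscores.
find "$photos" -type f -name '* *' -print0 |
  while IFS= read -r -d '' f; do
    mv -- "$f" "${f// /_}"
  done

ls "$photos"
```

Because every filename travels as a single NUL-terminated record, `mv` always receives the whole name, spaces and all.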
Advanced Bash Trick #2: Process Substitution
Let’s pause for a second — because this one looks weird the first time you see it.
In Bash, when you want to feed the output of one command into another, you normally use a pipe (`|`) or redirect to a temporary file. But sometimes, you need the output to “pretend” to be a file itself.
That’s where process substitution comes in.
- It uses the syntax `< <(command)` to tell Bash: “Run this command, and treat its output as if it were a file I can read from.”
- This avoids writing temp files to disk.
- It makes loops and comparisons cleaner.
Think of it as creating a fake file on the fly, backed by a running process.
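One concrete payoff that is easy to miss: piping into a `while` loop runs the loop body in a subshell, so variables set inside it vanish afterwards. Process substitution keeps the loop in the current shell. A tiny demo:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Variant 1: pipe into the loop. The loop body runs in a subshell,
# so the counter increments there and is lost when the pipe ends.
count=0
printf '%s\n' a b c | while IFS= read -r line; do
  count=$((count + 1))
done
pipe_result=$count    # still 0

# Variant 2: process substitution. The loop runs in the current shell,
# so the counter survives.
count=0
while IFS= read -r line; do
  count=$((count + 1))
done < <(printf '%s\n' a b c)
subst_result=$count   # 3

echo "pipe: $pipe_result, substitution: $subst_result"
```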
Here’s how I used it in the workflow loop:
```shell
while IFS= read -r -d '' dir; do
  echo "Running checks for $dir"
  ./scripts/run-checks.sh "$dir"
done < <(git diff --name-only -z origin/main...HEAD \
  | xargs -0 -n1 dirname -z \
  | cut -z -d/ -f1 \
  | sort -u -z)
```
Why this works
- `< <(...)` is process substitution, which feeds the output of a command directly into a loop as if it were a file.
- No need for temporary files (`changed_dirs.txt`).
- Works seamlessly inside GitHub Actions runners.
Real-world use cases
Process substitution shines in many contexts:
- Streaming structured data directly into loops: e.g., looping through a filtered list of Kubernetes resources.

```shell
while IFS= read -r pod; do kubectl logs "$pod"; done < <(kubectl get pods -o name)
```
- On-the-fly comparisons:

```shell
diff <(ls dir1) <(ls dir2)
```
- Data processing pipelines: feeding transformed logs into analysis scripts without creating temp files.
- Parallel workflows: redirecting different process outputs into the same loop for consolidated handling.
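One more sketch in that spirit: `comm` requires sorted input, and process substitution lets you sort two ad-hoc lists inline (hypothetical service names, not part of the workflow):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical state: what is deployed vs. what we want deployed.
deployed=$'svc-b\nsvc-a'
desired=$'svc-a\nsvc-c'

# comm -13 keeps only lines unique to the second input:
# the services that still need deploying.
missing=$(comm -13 <(sort <<< "$deployed") <(sort <<< "$desired"))
echo "needs deploy: $missing"   # svc-c
```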
Final Workflow Snippet
Here’s the cleaned-up version that now powers my workflow:
```yaml
jobs:
  detect-and-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so origin/main exists to diff against
      - name: Run service-specific checks
        run: |
          while IFS= read -r -d '' dir; do
            echo "Running checks for $dir"
            ./scripts/run-checks.sh "$dir"
          done < <(git diff --name-only -z origin/main...HEAD \
            | xargs -0 -n1 dirname -z \
            | cut -z -d/ -f1 \
            | sort -u -z)
```
Key Takeaways
- Don’t trust whitespace in CI — structured data can (and will) break naive loops. Always handle it safely.
- NUL-separated processing (`-z`, `xargs -0`, `read -d ''`) is your best friend for robustness.
- Process substitution makes scripts cleaner, avoids temp files, and opens doors for powerful inline comparisons.
- While GitHub’s path filters look appealing, they fall short in complex repos with multiple services. A Bash-driven approach offers the flexibility and resilience needed for real-world CI/CD.
Since adopting these patterns, I’ve reused them in multiple workflows — from linting to selective deployments — and they’ve held up every time.
Have you ever hit whitespace or multi-service workflow issues in GitHub Actions? I’d love to hear how you solved them in the comments! Thank you for your time.