As a DevOps engineer, I spend a lot of time trying to make CI/CD pipelines bulletproof. GitHub Actions is powerful, but some cases call for custom automation logic to get the job done. Recently, I ran into one of those “this should have been simple” problems that ended up forcing me deep into Bash territory.
In this post, I’ll share how I solved it, the tricks I learned along the way, and why those tricks are now staples in my automation toolkit.
The Problem Scenario
I needed to build a GitHub Workflow that:
- Loops through all the changed files in a PR.
- Runs a service-specific script based on which directories were touched.
- Ensures scripts are run only once per unique service.
Sounds easy, right? But here’s what I ran into:
- Filenames and directories contained spaces, dashes, and special characters, which caused word-splitting nightmares.
- Multiple directories had overlapping file changes, and I needed to deduplicate them cleanly before execution.
Naive attempts with `for file in $(git diff …)` blew up instantly when spaces or weird characters showed up. Deduplication with `sort | uniq` worked halfway, but it wasn’t reliable when integrated into the workflow.
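The word-splitting failure is easy to reproduce outside of CI. Here is a minimal sketch (with a hard-coded file list standing in for real `git diff` output) contrasting the naive unquoted loop with a line-safe `read` loop:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulated `git diff --name-only` output: two files, one with a space.
files=$'services/api server/main.go\nservices/web/index.html'

# Naive: unquoted expansion word-splits on the space -> three items, not two.
naive_count=0
for f in $files; do
  naive_count=$((naive_count + 1))
done
echo "naive sees $naive_count items"   # 3

# Safe: read line by line, no word splitting.
safe_count=0
while IFS= read -r f; do
  safe_count=$((safe_count + 1))
done <<< "$files"
echo "safe sees $safe_count items"     # 2
```

The naive loop invents a third “file” out of the space in `api server`, which is exactly what broke my first attempts.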
I also thought of introducing path filters in the workflow trigger to target specific directories.
```yaml
on:
  pull_request:
    paths:
      - "services/service-a/**"
```
But it would require updates to the workflow whenever new directories are added, so it didn't seem scalable.
I needed a bash-powered solution that was safe, idempotent, and CI-friendly.
Step-by-Step Troubleshooting & Resolution
Step 1: Getting the changed files
Inside the workflow, I grabbed the changed files with:
```shell
git diff --name-only origin/main...HEAD > changed_files.txt
```
That gave me a newline-delimited list. Great — until filenames with spaces appeared.
Step 2: Handling spaces and special characters
I realized I couldn’t rely on newlines alone. I switched to NUL-terminated strings, which led to the first crucial trick.
Step 3: Deduplicating directories
Next, I needed to strip each path down to its top-level directory, then deduplicate. A bit of field-splitting and `sort -u` magic did the trick.
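As a standalone sketch, the strip-and-deduplicate step looks like this (hypothetical paths; a real run would feed in `git diff` output, and this newline-delimited form still needs the NUL hardening covered below):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical changed paths; two of them touch the same top-level directory.
printf '%s\n' \
  'service-a/src/app.py' \
  'service-a/README.md' \
  'service-b/Dockerfile' > changed_files.txt

# Keep only the first path component, then deduplicate.
cut -d/ -f1 changed_files.txt | sort -u > changed_dirs.txt

cat changed_dirs.txt   # service-a, service-b -- each exactly once
```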
Step 4: Iterating safely in CI
Finally, I wrote a loop to iterate over each unique directory and run the associated script. The key was keeping it robust against any edge cases.
Advanced Bash Trick #1: NUL-Separated Processing
Before jumping into the fix, let’s talk about what this means.
Normally, when you run a command like `git diff --name-only`, it prints filenames separated by newlines (`\n`). That works most of the time — until you run into filenames with spaces, quotes, tabs, or even emojis. Bash sees those spaces and thinks they’re “separators” between different items, which can break loops in subtle ways.
That’s where NUL-separated processing comes in.
- A NUL (`\0`) character is a special invisible character that represents “nothing.” It’s guaranteed never to appear in a valid filename on Unix-like systems.
- If we use NULs instead of newlines as separators, Bash can handle structured data safely, no matter how strange the entries look.
- Commands like `git diff`, `xargs`, and `sort` all have flags (`-z` or `-0`) to enable this mode.
Think of it like replacing a fragile delimiter (newline) with an unbreakable one (NUL).
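A quick self-contained demo of the difference, using a throwaway directory with one awkward filename (nothing here comes from the workflow itself):

```shell
#!/usr/bin/env bash
set -euo pipefail

# A scratch directory containing one filename with a space in it.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt"

# Whitespace-based counting splits "with space.txt" into two tokens.
word_count=$(find "$demo" -type f | wc -w)

# NUL-based counting sees exactly one entry per file
# (xargs -0 -n1 with no command echoes each entry on its own line).
nul_count=$(find "$demo" -type f -print0 | xargs -0 -n1 | wc -l)

echo "words: $word_count, entries: $nul_count"   # words: 3, entries: 2
```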
Here’s the heart of the fix:
```shell
git diff --name-only -z origin/main...HEAD \
  | xargs -0 -n1 dirname -z \
  | cut -z -d/ -f1 \
  | sort -u -z > changed_dirs.txt
```

Note that every stage runs in NUL mode: GNU `dirname`, `cut`, and `sort` all accept `-z`. If any one stage falls back to newline output, the `-z` at the start is quietly undone.
Why this works

- `-z` makes `git diff` output NUL (`\0`) separated strings instead of newline-separated ones.
- `xargs -0` ensures even entries with spaces, quotes, or emojis (!) are handled safely.
- Every intermediate stage preserves the NUL delimiters; a single tool reverting to newlines would silently break the chain.
- By the time we hit `sort -u -z`, we have a clean, deduplicated list of changed top-level directories.
Real-world use cases
NUL-separated processing is useful well beyond CI workflows:
- Iterating over structured data with spaces/special chars: e.g., parsing JSON keys piped into Bash.

```shell
jq -r 'keys[]' file.json | tr '\n' '\0' | xargs -0 -n1 echo
```
- Handling user input or filenames from external sources: bulk renaming photos with spaces in names.
- Safely processing logs or configs: when entries might contain whitespace or unusual delimiters.
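The bulk-rename case from the list above can be sketched like this (throwaway files and a hypothetical naming scheme, not code from the workflow):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory with two "photos" that have spaces in their names.
photos=$(mktemp -d)
touch "$photos/holiday 1.jpg" "$photos/holiday 2.jpg"

# Find names containing spaces, NUL-terminate them, and rename each
# one with spaces replaced by underscores.
find "$photos" -type f -name '* *' -print0 |
  while IFS= read -r -d '' f; do
    mv -- "$f" "${f// /_}"
  done

ls "$photos"
```

Because every filename travels as a single NUL-terminated record, `mv` always receives the whole name, spaces and all.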
Advanced Bash Trick #2: Process Substitution
Let’s pause for a second — because this one looks weird the first time you see it.
In Bash, when you want to feed the output of one command into another, you normally use a pipe (`|`) or redirect to a temporary file. But sometimes, you need the output to “pretend” to be a file itself.
That’s where process substitution comes in.
- It uses the syntax `< <(command)` to tell Bash: “Run this command, and treat its output as if it were a file I can read from.”
- This avoids writing temp files to disk.
- It makes loops and comparisons cleaner.
Think of it as creating a fake file on the fly, backed by a running process.
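One concrete payoff that is easy to miss: piping into a `while` loop runs the loop body in a subshell, so variables set inside it vanish afterwards. Process substitution keeps the loop in the current shell. A tiny demo:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Variant 1: pipe into the loop. The loop body runs in a subshell,
# so the counter increments there and is lost when the pipe ends.
count=0
printf '%s\n' a b c | while IFS= read -r line; do
  count=$((count + 1))
done
pipe_result=$count    # still 0

# Variant 2: process substitution. The loop runs in the current shell,
# so the counter survives.
count=0
while IFS= read -r line; do
  count=$((count + 1))
done < <(printf '%s\n' a b c)
subst_result=$count   # 3

echo "pipe: $pipe_result, substitution: $subst_result"
```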
Here’s how I used it in the workflow loop:
```shell
while IFS= read -r -d '' dir; do
  echo "Running checks for $dir"
  ./scripts/run-checks.sh "$dir"
done < <(git diff --name-only -z origin/main...HEAD \
  | xargs -0 -n1 dirname -z \
  | cut -z -d/ -f1 \
  | sort -u -z)
```
Why this works
- `< <(...)` is process substitution, which feeds the output of a command directly into a loop as if it were a file.
- No need for temporary files (`changed_dirs.txt`).
- Works seamlessly inside GitHub Actions runners.
Real-world use cases
Process substitution shines in many contexts:
- Streaming structured data directly into loops: e.g., looping through a filtered list of Kubernetes resources.

```shell
while IFS= read -r pod; do kubectl logs "$pod"; done < <(kubectl get pods -o name)
```
- On-the-fly comparisons:

```shell
diff <(ls dir1) <(ls dir2)
```
- Data processing pipelines: feeding transformed logs into analysis scripts without creating temp files.
- Parallel workflows: redirecting different process outputs into the same loop for consolidated handling.
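One more sketch in that spirit: `comm` requires sorted input, and process substitution lets you sort two ad-hoc lists inline (hypothetical service names, not part of the workflow):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical state: what is deployed vs. what we want deployed.
deployed=$'svc-b\nsvc-a'
desired=$'svc-a\nsvc-c'

# comm -13 keeps only lines unique to the second input:
# the services that still need deploying.
missing=$(comm -13 <(sort <<< "$deployed") <(sort <<< "$desired"))
echo "needs deploy: $missing"   # svc-c
```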
Final Workflow Snippet
Here’s the cleaned-up version that now powers my workflow:
```yaml
jobs:
  detect-and-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so origin/main exists to diff against
      - name: Run service-specific checks
        run: |
          while IFS= read -r -d '' dir; do
            echo "Running checks for $dir"
            ./scripts/run-checks.sh "$dir"
          done < <(git diff --name-only -z origin/main...HEAD \
            | xargs -0 -n1 dirname -z \
            | cut -z -d/ -f1 \
            | sort -u -z)
```
Key Takeaways
- Don’t trust whitespace in CI — structured data can (and will) break naive loops. Always handle it safely.
- NUL-separated processing (`-z`, `xargs -0`, `read -d ''`) is your best friend for robustness.
- Process substitution makes scripts cleaner, avoids temp files, and opens doors for powerful inline comparisons.
- While GitHub’s path filters look appealing, they fall short in complex repos with multiple services. A Bash-driven approach offers the flexibility and resilience needed for real-world CI/CD.
Since adopting these patterns, I’ve reused them in multiple workflows — from linting to selective deployments — and they’ve held up every time.
Have you ever hit whitespace or multi-service workflow issues in GitHub Actions? I’d love to hear how you solved them in the comments! Thank you for your time.