
Florian Stolzenhain

Poor man's parallel in Bash

Original post on my blog, happy to include feedback!
Cover: Paris bibliothèques, via Clawmarks

Topics:

  1. Running scripts in parallel
  2. Tools to limit concurrent jobs
  3. Shell process handling cheatsheet
  4. Limit concurrent jobs with Bash
  5. Bonus: One-liners

1) Running scripts in parallel

does not take much effort: I've been speeding up builds by running commands simultaneously with an appended `&` (ampersand):

# stuff can happen concurrently
# use `&` to run each command as a background job
cmd1 &
cmd2 &
cmd3 &
# wait on all background processes
wait
# these need to happen sequentially
cmd4
cmd5

echo Done!

Job control is a shell feature: each command is put into a background process, and they all run at the same time.

Now assume you want to loop over more than a few commands, e.g. converting files:

for file in *.jpg; do
    # start optimizing every file at once
    jpegoptim -m 45 "${file}" &
done
# finish queue
wait

Running a lot of processes this way is still faster than a sequential loop. But compared to just a few concurrent jobs there are no further speed gains – possibly even slowdowns from contended disk I/O [citation needed].

So you'll want to use

2) Tools to limit concurrent jobs

by either 1) installing dedicated tools like parallel or xjobs, or 2) relying on xargs, which is feature-rich but more complicated.

Transforming wait to xargs code is described here: an example for parallel batch jobs. The article notes small differences between POSIX flavours – e.g. different handling of separators on BSD/MacOS.
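As a rough sketch (not from the linked article), the jpegoptim loop could also be expressed with xargs, which caps concurrency via `-P`:

```shell
# run at most 3 jpegoptim processes at a time;
# printf '%s\0' plus xargs -0 keeps filenames with spaces intact
printf '%s\0' *.jpg | xargs -0 -n 1 -P 3 jpegoptim -m 45
```

`-P` is understood by both GNU and BSD xargs, though – as noted above – details differ between flavours.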

We'll be choosing option 3) – digging into features of wait and jobs to manage processes.

Quoting this great summary, here are some example commands for

3) Shell process handling

# run a child process; save its process id via `$!`
cmd & pid=$!
# list current jobs
jobs
# list job process ids only
# note: not available on zsh
jobs -p
# only wait on the job at position `n`
# note: finished jobs leave empty slots while
#       newer jobs remain at the list's tail
wait %n
# wait on the most recent job in the list
wait %%
# wait on the next process to finish
# note: requires Bash >= 4.3
wait -n
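A tiny usage sketch for the `$!` entry above, with a placeholder background job:

```shell
#!/usr/bin/env bash
# start a background job (here: a subshell exiting with status 3)
# and capture its process id via `$!`
( exit 3 ) & pid=$!
# wait on exactly that process; `wait` then returns its exit status
wait "$pid"
echo "job ${pid} exited with status $?"
```

Unlike a bare `wait`, which returns 0 once the whole queue drains, `wait <pid>` hands you that child's exit status.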

Taking our example from before, we make sure to

4) Limit concurrent jobs with Bash

each time a process finishes, using wait -n:

for file in *.jpg; do

    jpegoptim -m 45 "${file}" &

    # fewer than 3 jobs running? keep looping
    if [[ $(jobs | wc -l) -lt 3 ]]; then continue; fi

    # at 3 jobs, wait for the next one to finish, then loop
    wait -n

done
# finish queue
wait

Sadly, this won't work on macOS, whose bundled Bash is frozen at the old version 3.2. We replace the wait -n command with wait %% to wait on the 3rd/most recent job in the queue – an OK compromise for small groups (a 1/3 chance of hitting the fastest/slowest/medium job):

for file in *.jpg; do

    jpegoptim -m 45 "${file}" &

    # fewer than 3 jobs running? keep looping
    if [[ $(jobs | wc -l) -lt 3 ]]; then continue; fi

    # at 3 jobs, wait for the most recent one, then loop
    wait %%

done
# finish queue
wait

To further develop the code, one could check the Bash version or detect alternative shells (zsh on macOS) and switch code depending on context. I keep using these:

5) Bonus: One-liners

# sequential, slow
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" ; done )

# concurrent, messy
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & done; wait )

# concurrent, fast/compatible
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi; wait %%; done; wait )

# concurrent, fastest
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi; wait -n; done; wait )

Fun Fact

As the 20th birthday post by parallel author Ole Tange explains, the original version leveraged make, which allows asynchronous processes as well.
