
Newbie at bash scripting? Here's some advice

Richard Lenkovits on February 03, 2019

When I started using bash for scripting I couldn't wrap my head around it first. I'm relatively young and the reason behind my confusion was probab... [Read Full]

When writing scripts for automation, and while testing them, it's also a good idea to include x in your set options: set -exuo pipefail

This makes the shell output each command as it is executed, so when, e.g., variable contents differ from your expectations, you can see it right in the output, and debugging time is significantly reduced.
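A minimal sketch of what those flags do (the greeting variable is just an illustration):

```shell
#!/usr/bin/env bash
# -e  : exit immediately when a command fails
# -x  : print each command (with variables expanded) before running it
# -u  : treat use of an unset variable as an error
# -o pipefail : a pipeline fails if any command in it fails
set -exuo pipefail

greeting="hello"
# With -x the shell prints '+ echo hello' to stderr before running this,
# so you can see what $greeting actually expanded to.
echo "$greeting"
```

The trace goes to stderr, so it doesn't pollute output that you capture with command substitution or pipes.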


Modifying the script to make it output debug info is a little… meh.
You wouldn't want to re-compile nginx to get INFO instead of WARNING output in your error log, right? ;-)

The simple way would be having your script not use -x at all, and instead calling it with bash -x; that honours all the options set inside the script and is an on-demand way to print the debug info.
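For example, with a hypothetical script at /tmp/quiet-script.sh that sets its own options but not -x:

```shell
# A script that does not enable -x itself (hypothetical example path)
cat > /tmp/quiet-script.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
target="app-server"
echo "deploying to $target"
EOF

# Normal run: only the intended output
bash /tmp/quiet-script.sh

# On-demand debugging: the same script, every command traced to stderr
bash -x /tmp/quiet-script.sh
```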


People make mistakes, and I'm talking about scripts used in automation.

If your habit is to put the -x on all scripts used in automation, then when you have an error you can just look at the logs to get a decent idea of what's going on. If you don't, you'll first have to figure out which scripts to even add bash -x (or similar) to, make that change, and then trigger another build, hoping that it was the right place.

There's quite a big difference between outputting what a bash script is doing in an automation system vs. having a long-running service print its debugging output to your system logs.

Ah, that makes sense of course.
I didn't figure out exactly which kind of scripts you meant.

I agree that, for example, a build system (Travis CI, GitLab CI, buildbot, …) should run most scripts with -x (exceptions exist; most notably, scripts with lots of pipes produce rather unreadable debug output).
However, if those scripts are the same ones used for e.g. (manually) deploying the application, then arguably the -x output should be suppressed by default since a manually triggered deployment should produce as little output as possible.
The build system can always call the script using bash -x notation if needed then.

There's quite a big difference between outputting what a bash script is doing in an automation system vs. having a long-running service print its debugging output to your system logs.

Depending on your "automation system", a shell script and a long running service should both have proper logging facilities that permit them to log different output to different data-sinks.
A script running in a build-system may then produce artifacts with different log-levels each and only output relevant information in the user-visible output.
Too often I've had to scroll through an eternity of brightly coloured output that was obscuring the real error, which is bothersome and sometimes even misleading, because relevant build information was not properly separated from the tooling's own output (where the tooling's output equates to your -x).
This holds true for build-systems, linting, and most things running in a CI system.
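A minimal sketch of that separation, assuming a script where concise messages go to the user-visible output and verbose details go to an artifact file; the log_info/log_debug names and the artifact path are made up for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sinks: concise messages for the user-visible output,
# noisy details collected in a per-run artifact file.
DEBUG_LOG="${DEBUG_LOG:-/tmp/build-debug.log}"
: > "$DEBUG_LOG"   # start each run with an empty artifact

log_info()  { printf '%s\n' "$*"; }                 # relevant, user-visible
log_debug() { printf '%s\n' "$*" >> "$DEBUG_LOG"; } # artifact only

log_info  "compiling project"
log_debug "compiler invocation: cc -O2 -Wall main.c (example values)"
```

The build system can then attach the debug file as an artifact, while the console output stays short enough to spot a real error at a glance.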

If you don't plan for the "when something goes wrong" path, then instead of being a minor nuisance, a failed deployment can take down your website for hours. Because your deployment didn't just say "running database migration" and "deploying to first container" followed by "health check failed, aborting" so that everyone knows what to do next (check whether the first container has spun down correctly, eventually roll back the database migration), three sysadmins are busy for an hour going through your monstrous output, trying to find the actual error.
Sure, when they finally find it, they immediately know which error made the health check fail, and they can report it properly without consulting another logfile, but your site might have been down for an hour by that time, because the database migration was faulty.
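A sketch of what such a deployment script could look like; run_migration, deploy_container and health_check are stand-ins for real commands, and health_check is rigged to fail so the abort path is visible:

```shell
#!/usr/bin/env bash
set -euo pipefail

run_migration()    { true; }   # stand-in for the database migration
deploy_container() { true; }   # stand-in for rolling out one container
health_check()     { false; }  # stand-in that simulates a failing check

deploy() {
    echo "running database migration"
    run_migration
    echo "deploying to first container"
    deploy_container
    if ! health_check; then
        # Everyone reading this line knows the next steps: check that
        # the container spun down, then roll back the migration if needed.
        echo "health check failed, aborting" >&2
        return 1
    fi
}

# Demonstrate the failure path without killing the calling shell:
deploy || status=$?
```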

I sure hope that

1) you have a way to roll back failing releases instead of having production down for an hour because of a typo somewhere in your database migrations or similar
2) you don't build your massive deployment systems, including migrations etc., with shell scripts

Yes, there are different log levels for different things. But shell scripting is shell scripting: it shouldn't really be used to build massive scripts, and because it's prone to all kinds of mysterious and obscure errors everywhere, you benefit a lot from having -x enabled.

Instead you should use something more akin to e.g. Python (maybe using libraries such as Invoke or Pynt), possibly Ruby/Go/Rust or other such language with tooling that helps you build solid CLI tools.

And then you should use the best tools like Spinnaker, or e.g. Azure DevOps's release pipelines, to orchestrate your releases on the higher level so you don't have to reinvent the wheel while building your release tooling.


The set -euo pipefail combination only really works reliably on bash 4.4 and newer; you might want to double-check there.
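If your script depends on newer bash behaviour, one defensive option is to check BASH_VERSINFO up front; 4.4 here just mirrors the version mentioned above, so adjust it to whatever your script actually needs:

```shell
#!/usr/bin/env bash
# Fail fast with a clear message instead of breaking obscurely later.
# BASH_VERSINFO[0] is the major version, BASH_VERSINFO[1] the minor.
if (( BASH_VERSINFO[0] < 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] < 4) )); then
    echo "this script needs bash >= 4.4, found $BASH_VERSION" >&2
    exit 1
fi
set -euo pipefail
```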

I myself am pretty fluent in writing safe shell-code, and the most important advice I can give is "quote everything everywhere, anytime".
The remaining pitfalls are arbitrary rules that you cannot really put in generic advice (like echo not being safe to print variables).
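Two of those pitfalls in one small example; the file and value variables are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

file="my report.txt"

# Unquoted, $file is split on whitespace into two words, so a command
# like rm $file would act on 'my' and 'report.txt' instead of the file.
words_unquoted=$(printf '%s\n' $file  | wc -l)  # two lines: splitting happened
words_quoted=$(printf '%s\n' "$file" | wc -l)   # one line: value kept whole

# echo is not safe for arbitrary variables: a value like "-n" gets
# swallowed as an option. printf '%s\n' prints any value literally.
value="-n"
printf '%s\n' "$value"
```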

My go-to resource would be the one I linked to above.
It's long, yes, but you decided to go for something that's historically grown when you decided to use the shell, so be ready for long lists of arbitrary rules you have to follow that seem useless or redundant, but make a difference.
