So, you've got a shiny new shell script courtesy of ChatGPT, Copilot, or your favorite AI. It looks good, it even feels good. But that nagging doubt creeps in: "Is this thing really safe to run in production?"
This is the world of unit testing shell scripts generated by LLMs – a world where the stakes are high, sudo is a double-edged sword, and a single misplaced rm -rf can ruin your entire day. This post provides a battle-tested strategy to safely test and validate scripts that manage critical services like PM2, Docker, Nginx, or anything interacting with your system's state.
The Perils of Trusting LLM-Generated Shell Scripts
Large Language Models (LLMs) are fantastic for quickly generating shell scripts. However, even the best LLMs are prone to:
- Making assumptions about your environment: They might assume specific package installations or directory structures that don't exist on your server.
- Using incorrect binary names: for example, using pgrep -x PM2 instead of the correct pm2.
- Overlooking side effects: commands like systemctl restart docker aren't always harmless; they can cause unexpected downtime.
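The first two problems can often be caught before anything runs with a preflight check. The sketch below assumes a hypothetical helper named require_commands (it is not part of any generated script) and relies on the command -v builtin to confirm that each binary the script depends on is actually on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every required command that is missing.
# Returns 0 when all commands are present, 1 otherwise.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: ls and cat exist on virtually any system, so this check passes.
# In a real script you would list pm2, docker, nginx, and so on.
if require_commands ls cat; then
  echo "preflight ok"
fi
```

Calling the helper at the top of a script and exiting on failure turns a wrong binary name into an immediate, readable error instead of a half-executed run.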
Even if the script's logic is 90% correct, that remaining 10% can lead to:
- Services restarting at the wrong time.
- Data written to incorrect log paths.
- Broken idempotency (repeated runs causing unintended changes).
That's why robust unit testing is crucial – not in the traditional pytest sense, but using shell-native methods to verify logic and safety.
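Idempotency in particular is easy to check with nothing but the shell: run the operation twice and compare the results. A minimal sketch, assuming a hypothetical ensure_line helper and an illustrative config line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if it is not already there.
ensure_line() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2> /dev/null || echo "$line" >> "$file"
}

conf=$(mktemp)
ensure_line "worker_processes 4;" "$conf"
cp "$conf" "$conf.first"          # snapshot after the first run
ensure_line "worker_processes 4;" "$conf"

# An idempotent operation leaves the file unchanged on the second run.
if cmp -s "$conf" "$conf.first"; then
  echo "idempotent"
else
  echo "NOT idempotent"
fi
rm -f "$conf" "$conf.first"
```

A naive echo "..." >> file would fail this check, because the second run appends a duplicate line.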
Strategy 1: Embrace the --dry-run Mode
Every LLM-generated script should include a --dry-run flag. This allows you to preview the script's actions without executing them.
Here's how to implement it:
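DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
# Log the command, then run it unless --dry-run was given.
log_action() {
  echo "$(date): $*"
  if [[ "$DRY_RUN" == true ]]; then
    echo "[DRY RUN] $*"
  else
    "$@"
  fi
}
# Example usage (pass the command as separate words, not one string):
log_action sudo systemctl restart nginx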
This approach makes the script's operations traceable: you can inspect every intended command before anything is executed.
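It is worth verifying that dry-run mode really has no side effects. The following is a minimal sketch, assuming a hypothetical run wrapper that executes its arguments directly instead of evaluating a string; the temporary directory and marker file are illustrative only.

```shell
#!/usr/bin/env bash
DRY_RUN=true

# Hypothetical wrapper: print the command in dry-run mode, run it otherwise.
run() {
  if [[ "$DRY_RUN" == true ]]; then
    echo "[DRY RUN] $*"
  else
    "$@"
  fi
}

workdir=$(mktemp -d)
marker="$workdir/touched"

output=$(run touch "$marker")
echo "$output"

# The command was reported, but the file must not exist.
if [[ -e "$marker" ]]; then
  echo "FAIL: dry run modified the filesystem"
else
  echo "PASS: dry run had no side effects"
fi
rm -rf "$workdir"
```

Using an explicit if/else avoids the pitfall of the A && B || C shorthand, where C also runs if B fails.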
Strategy 2: Mock External Commands
You don't want docker restart or pm2 resurrect running during your tests. We can override these commands using mocking:
- Create a mock-bin directory: mkdir mock-bin
- Create a mock docker script:
printf '%s\n' '#!/bin/bash' 'echo "[MOCK] $0 $*"' > mock-bin/docker
chmod +x mock-bin/docker
- Add the mock directory to the front of your PATH: export PATH="$(pwd)/mock-bin:$PATH"
Now, any call to docker will output a harmless message instead of interacting with your containers. Repeat this process for other potentially disruptive commands like systemctl, pm2, and rm.
This technique, commonly used alongside test frameworks such as the Bash Automated Testing System (BATS), allows for isolated and safe testing.
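Mocks become more useful when they record what was called, so a test can assert on the exact commands and their order. A minimal sketch, assuming a hypothetical MOCK_LOG variable and make_mock helper, with two direct calls standing in for the script under test:

```shell
#!/usr/bin/env bash
mock_dir=$(mktemp -d)
export MOCK_LOG="$mock_dir/calls.log"   # hypothetical variable name

# Hypothetical helper: write a mock that appends its name and arguments
# to MOCK_LOG instead of doing any real work.
make_mock() {
  local name="$1"
  printf '%s\n' '#!/usr/bin/env bash' \
    'echo "$(basename "$0") $*" >> "$MOCK_LOG"' > "$mock_dir/$name"
  chmod +x "$mock_dir/$name"
}

make_mock docker
make_mock systemctl
PATH="$mock_dir:$PATH"

# Stand-in for the script under test.
docker restart my-app
systemctl restart nginx

# Inspect (or assert on) what was called, and in what order.
cat "$MOCK_LOG"
# Remove "$mock_dir" once the test is finished.
```

Because the mock directory is first on the PATH, the real docker and systemctl binaries are never reached, even if they are installed.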
Strategy 3: Leverage shellcheck
LLMs sometimes make mistakes with quoting, variables, or command usage. shellcheck is your invaluable ally here.
Simply run:
shellcheck myscript.sh
shellcheck will identify:
- Unquoted variables ("$var" vs $var).
- Incorrect command usage.
- Malformed if conditions.
Think of it as a linter for your shell scripts, ensuring their structural integrity.
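To see why the unquoted-variable warning (reported by shellcheck as SC2086) matters, consider a filename containing a space. The sketch below uses a hypothetical count_args function to show how many arguments a command actually receives in each case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments it received.
count_args() { echo "$#"; }

file="my report.txt"

# The variable is left unquoted on purpose to demonstrate the bug.
# shellcheck disable=SC2086
unquoted=$(count_args $file)
quoted=$(count_args "$file")

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

Unquoted, the value is split on the space and arrives as two arguments; quoted, it arrives as one. With a command like rm, that difference decides which files get deleted.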
Strategy 4: Modularize with Functions
Break your script into smaller, testable functions:
check_pm2() {
  ps aux | grep -q '[P]M2'
}
restart_all() {
  pm2 resurrect
  docker restart my-app
  systemctl restart nginx
}
This allows you to mock and call these functions individually within a test harness, avoiding the need to run the entire script each time.
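Once the logic lives in functions, a test harness can stub the external commands with shell functions of the same name and check what was called. A minimal sketch, assuming the restart_all function shown above and a hypothetical calls array for recording invocations:

```shell
#!/usr/bin/env bash
# Function under test, as defined in the article.
restart_all() {
  pm2 resurrect
  docker restart my-app
  systemctl restart nginx
}

# Shell functions take precedence over binaries on the PATH, so these
# stubs replace the real commands for the duration of the test.
calls=()
pm2()       { calls+=("pm2 $*"); }
docker()    { calls+=("docker $*"); }
systemctl() { calls+=("systemctl $*"); }

restart_all

expected=("pm2 resurrect" "docker restart my-app" "systemctl restart nginx")
if [[ "${calls[*]}" == "${expected[*]}" ]]; then
  echo "PASS: commands issued in the expected order"
else
  echo "FAIL: got ${calls[*]}"
fi
```

Function stubs need no files or PATH changes, which makes them convenient for quick checks; PATH-based mocks remain the better choice when the script is executed as a separate process.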
Strategy 5: Log Everything (Seriously!)
Log every decision point. Why? Because "works on my machine" is unhelpful when a container fails to restart or PM2 silently exits.
log() {
  echo "$(date '+%F %T') [LOG] $1" >> /var/log/pm2_watchdog.log
}
Comprehensive logging provides crucial debugging information when things go wrong.
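Writing to /var/log usually requires root, which is awkward in tests. One possible variation, assuming a hypothetical LOG_FILE variable that defaults to the path used above, lets a test redirect the log to a disposable file:

```shell
#!/usr/bin/env bash
# LOG_FILE is a hypothetical override; it defaults to the article's path.
LOG_FILE="${LOG_FILE:-/var/log/pm2_watchdog.log}"

log() {
  echo "$(date '+%F %T') [LOG] $*" >> "$LOG_FILE"
}

# In a test, point the log somewhere disposable before logging anything.
LOG_FILE=$(mktemp)
log "PM2 not running, calling pm2 resurrect"
cat "$LOG_FILE"
```

A test can then grep the temporary file to confirm that each decision point produced the expected entry.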
Strategy 6: Sandbox Your Tests
If you have access to Docker or a virtual machine, create a replica environment to run your tests. It's far better to break a test server than your production system!
For example:
docker run -it ubuntu:20.04
# Then install necessary packages: pm2, docker, nginx, etc.
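For repeatable runs, the container can be made disposable and the script mounted read-only. The sketch below assumes a hypothetical build_sandbox_cmd helper and a script named myscript.sh in the current directory; it only assembles and prints the command, so nothing is started until you run the array yourself. Drop --network none if the script needs to install packages inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a docker command that runs a script in a
# throwaway container (--rm), without network access, mounted read-only.
build_sandbox_cmd() {
  local script="$1"
  SANDBOX_CMD=(docker run --rm --network none
    -v "$(pwd)/$script:/work/$script:ro"
    ubuntu:20.04 bash "/work/$script" --dry-run)
}

build_sandbox_cmd myscript.sh
echo "${SANDBOX_CMD[*]}"

# To actually execute it (requires Docker):
# "${SANDBOX_CMD[@]}"
```

Keeping the command in an array preserves quoting, so paths containing spaces survive when the command is finally executed.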
Bonus: Useful Tools
- BATS: A powerful Bash unit testing framework.
- shunit2: An xUnit-style testing framework for POSIX shells.
- assert.sh: A simple shell assertion helper.
- shellspec: A full-featured, RSpec-like testing framework.
Final Thoughts: Test Before You Trust
It's tempting to simply run an LLM-generated script, but in production environments, especially those managing critical services, testing is paramount. Use dry-run flags, mock commands, employ shellcheck, add comprehensive logging, and test in a sandbox. Prioritize safety – your sanity and uptime will thank you!
