Sheersh Sinha

🐚 My DevOps Journey: Part 5 — Shell Scripting Lessons from Real Troubleshooting

After setting up cronjobs in Part 4, I realized typing commands manually wasn’t scaling. In the real world, systems don’t wait for you to type; they break when you’re asleep. That’s when I leaned on Shell Scripting.

But instead of learning it the “classroom way,” I picked it up through real troubleshooting scenarios. Each concept — variables, arguments, loops — came alive only when I faced a problem I had to solve.

Here’s how scripting became my real-world rescue tool.

1️⃣ Writing My First Script — The “On-Call Repetition Nightmare”

Scenario: During my internship, disk alerts kept firing at midnight. Every time, I’d log in and type df -h manually. It was exhausting.

Solution:

I wrote a script called diskcheck.sh:

#!/bin/bash
df -h

Now, after a one-time chmod +x diskcheck.sh to make it executable, instead of typing commands half-asleep, I just ran:

./diskcheck.sh


👉 Lesson: Scripts are your second brain during on-call duty.

2️⃣ Variables — The “Multiple Servers, One Script” Challenge

Scenario: I had to check connectivity for 3 different servers. My first script hardcoded the IP. If the server changed, the script broke.

Solution:

Use variables:

Later, I swapped the value, and the script worked everywhere.
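
The idea can be sketched roughly like this. SERVER and check_server are illustrative names, the address is a placeholder from the documentation range, and the ping-based check is an assumption about what "connectivity" meant here:

```shell
#!/bin/bash
# SERVER and check_server are illustrative names; 192.0.2.10 is a
# documentation-only placeholder address, not a real host.
SERVER="192.0.2.10"
PING_COUNT=3

# Ping the given host and report whether it answered.
check_server() {
  local host="$1"
  if ping -c "$PING_COUNT" "$host" > /dev/null 2>&1; then
    echo "$host is reachable"
  else
    echo "$host is NOT reachable"
  fi
}

# Changing SERVER above is the only edit needed to target another machine:
# check_server "$SERVER"
```

Uncommenting the last line runs the check; calling check_server once per address would cover all three servers from the same script.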

👉 Lesson: Variables save scripts from becoming useless with changing infrastructure.

3️⃣ Operators — The “Disk Space Alert” Problem

Scenario: A staging server ran out of space, crashing the app. I needed a script to check disk usage and alert only when it crossed 80%.

Solution:

# Percentage used on /, digits only (-P keeps each filesystem on one line)
USED=$(df -P / | awk 'NR==2 { print $5 }' | tr -d '%')
THRESHOLD=80

if [ "$USED" -gt "$THRESHOLD" ]; then
  echo "Disk usage critical: ${USED}%"
fi


👉 Lesson: Operators + conditions = automated monitoring without external tools.
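
To cover every mounted filesystem instead of only /, the same comparison can move into awk. This is a minimal sketch, assuming a hypothetical report_full_disks helper and mount points that contain no spaces:

```shell
#!/bin/bash
THRESHOLD=80

# report_full_disks is a hypothetical helper. It reads `df -P` style output
# on stdin and prints one line per filesystem above the given threshold.
report_full_disks() {
  local threshold="$1"
  awk -v limit="$threshold" 'NR > 1 {
    used = $5
    sub(/%/, "", used)               # strip the percent sign
    if (used + 0 > limit + 0)        # force a numeric comparison
      printf "Disk usage critical: %s at %s%%\n", $6, used
  }'
}

df -P | report_full_disks "$THRESHOLD"
```

Because the helper reads from stdin, it can be tried against saved df output before it is trusted on a live server.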

4️⃣ Read User Input — The “Backup Flexibility” Problem

Scenario: My backup script only archived /home/, but once I needed /etc/ configs too. Editing the script every time was painful.

Solution: Make the script ask:

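echo "Enter path to backup:"
# Never name a variable PATH: it overwrites the shell's command search path,
# and tar would no longer be found.
read -r BACKUP_DIR
tar -cvf backup.tar "$BACKUP_DIR"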

Now I could run it for any directory.

👉 Lesson: Interactivity makes scripts versatile across environments.

5️⃣ Functions — The “Scattered Logging Mess”

Scenario: I had multiple scripts spitting logs in random formats. Debugging failures became a nightmare.

Solution: I created a reusable logging function:

log() {
  echo "[INFO] $(date): $1"
}
log "Backup started"
log "Backup completed"



Now all my scripts spoke the same language.

👉 Lesson: Functions bring consistency to messy automation.
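
One way to take this further is to pass the level as an argument, so errors and info lines share a single format. The sketch below is illustrative only: the level argument and the timestamp layout are assumptions, not the format I used at the time.

```shell
#!/bin/bash
# log is extended here with a hypothetical level argument;
# the timestamp format is illustrative.
log() {
  local level="$1"
  shift                                  # the remaining arguments form the message
  echo "[$level] $(date '+%Y-%m-%d %H:%M:%S'): $*"
}

log INFO "Backup started"
log ERROR "Backup failed: disk full"
```

Saved in a shared file (say logging.sh, a hypothetical name), the function can be pulled into each script with source so every script logs the same way.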

6️⃣ Shell vs sh vs Bash — The “Works on Ubuntu, Breaks on Alpine” Problem

Scenario: A deployment script ran fine on Ubuntu but failed inside a lightweight Alpine Docker container.

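The issue: Alpine’s /bin/sh is BusyBox ash, which doesn’t support certain Bash features, and Bash itself isn’t installed by default.

Solution: Always specify the interpreter, and make sure it actually exists in the image (on Alpine, apk add bash):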

#!/bin/bash


👉 Lesson: Explicit is better than implicit. Always define your shell.
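
When installing Bash isn’t an option, the other route is to stay within POSIX sh. Here is a small sketch, using a hypothetical starts_with helper in place of the Bash-only [[ $1 == "$2"* ]] test; the branch name is made up for the example:

```shell
#!/bin/sh
# POSIX-only sketch: no [[ ]], no arrays, so it runs under bash, dash,
# and BusyBox ash alike. starts_with is a hypothetical helper.
starts_with() {
  case "$1" in
    "$2"*) return 0 ;;   # $1 begins with the literal prefix $2
    *)     return 1 ;;
  esac
}

if starts_with "release-1.4" "release-"; then
  echo "release branch"
fi
```

The case statement does the pattern matching that [[ ]] would do in Bash, which is usually the first construct to break on a minimal image.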

7️⃣ Conditionals — The “Don’t Delete While Running” Disaster

Scenario: A cleanup script started wiping logs while nginx was still running. The service crashed.

Solution: Add a safety check:

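# -x matches the exact process name; output is discarded so only the exit status matters
if pgrep -x nginx > /dev/null; then
  echo "Nginx running. Skipping cleanup."
else
  rm -rf /var/log/nginx/*
fi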


👉 Lesson: Conditionals are the guardrails that keep automation safe.

8️⃣ Arguments — The “One Script Per Environment” Problem

Scenario: I had separate scripts for dev, test, and prod backups. Unmanageable.

Solution:

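# Abort with a usage message if no argument is given;
# an empty $ENV would otherwise archive /var// (all of /var).
ENV="${1:?Usage: $0 <environment>}"
echo "Backing up $ENV environment..."
tar -cvf "backup_${ENV}.tar" "/var/${ENV}/"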

Run with:

./backup.sh dev
./backup.sh prod


👉 Lesson: Arguments make scripts scale across environments.
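
Since the argument ends up inside a path, it is worth rejecting anything unexpected before tar runs. This is a minimal sketch, assuming a hypothetical validate_env helper and that dev, test, and prod are the only valid values:

```shell
#!/bin/bash
# validate_env is a hypothetical helper: accept only the three known
# environments and print a usage hint for anything else.
validate_env() {
  case "$1" in
    dev|test|prod) return 0 ;;
    *)
      echo "Usage: backup.sh <dev|test|prod>" >&2
      return 1
      ;;
  esac
}

# In backup.sh this would run before tar:
# validate_env "$1" || exit 1
```

A typo such as ./backup.sh prd then fails fast instead of producing an archive of the wrong directory.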

9️⃣ Loops — The “50 Log Files” Problem

Scenario: Compressing logs one by one? Impossible during an outage.

Solution:

for file in /var/log/*.log; do
  [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
  gzip "$file"
done


👉 Lesson: Loops are the ultimate time-savers for bulk operations.

🎭 The Advanced Turning Point: When My Script Nearly Wiped Production

Late one night, I was asked to “clean up temp files.” I rushed out a cleanup script built around rm, but I forgot it was running as root inside a container mount. It began deleting critical shared files.

I pulled the brakes with Ctrl+C, but the damage was real: some mounted configs were gone.

How I Recovered:

  • Restored configs from backup.
  • Added dry-run safety in my scripts, printing each target before actually running rm:
echo "Would delete: $file"
  • Enforced set -e and logging for all critical scripts.
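
Put together, those safeguards might look like the sketch below. It is not the script from that night: cleanup_dir and the DRY_RUN variable are hypothetical names, and the rule of refusing empty or root paths is an extra guard added for the example.

```shell
#!/bin/bash
set -e  # stop at the first failing command

# cleanup_dir is a hypothetical helper. It only prints what it would delete
# unless DRY_RUN is explicitly set to 0.
DRY_RUN="${DRY_RUN:-1}"

cleanup_dir() {
  local dir="$1"
  # Refuse empty or root paths, so a missing variable can never expand to "/".
  if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
    echo "Refusing to clean: '$dir'" >&2
    return 1
  fi
  local file
  for file in "$dir"/*; do
    [ -e "$file" ] || continue          # empty directory: nothing to do
    if [ "$DRY_RUN" = "1" ]; then
      echo "Would delete: $file"
    else
      rm -rf -- "$file"
    fi
  done
}
```

The habit is to run it once with the default, read the list, and only then run it again with DRY_RUN=0.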

👉 Lesson: Scripts are scalpels — powerful, precise, and dangerous. Treat them with respect.

🌟 Key Takeaways from Part 5

  • Shell scripting concepts come alive only through real problems.
  • Variables, arguments, loops → not “syntax,” but solutions to DevOps pain.
  • Conditionals and functions prevent disasters.
  • Always test in a sandbox before production.
  • Scripts aren’t toys — they’re system lifelines.

🚀 What’s Next (Part 6: Networking Basics)

Now that I can script my way out of problems, the next step is to make sure systems can talk to each other.

In Part 6, I’ll explore:

  • Computer Networking Overview
  • OSI Model
  • LAN, Switch, Router, Subnet, Firewall
  • Cloud Networking
  • Microservices Networking

🤝 Over to You

What was your biggest shell scripting disaster? Did you ever run a command that made you sweat? Share it 👇 — maybe we can all save someone else from repeating it.
