There comes a time in every sysadmin's life when he needs to let his scripts take on a life of their own. They start working on big jobs for hours on end, and it becomes unreasonable to keep staring at the terminal to make sure everything is OK.
But, like any good helicopter parent will tell you, you can't trust your kid scripts to do the right thing! Your job is to bring them up correctly, point them in the right direction and make sure they make the right decisions. Here's a handy checklist to put the mental fun back in fundamental!
One of the more common gotchas when writing scripts is assuming the startup directory of the script. You can't afford to be vague like that! Don't let them guess the context; hard-code it in.
For example, you're testing your script locally, and it's looking great! You're now prepared to say "it works on my machine!" even after it breaks in production! But that won't save you once your script simply fails to start from the crontab because of something trivial, like a bad log path.
Ways to approach this:
- Define the absolute paths as constants at the top of the script
- Pull the paths in from environment variables
- Or determine the current directory from within the script itself
- Require the "start path" to be passed as an argument when running the script
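For the "from within the script" option, a common bash idiom resolves the script's own directory no matter where it was started from (the log path below is just a placeholder):

```shell
#!/usr/bin/env bash
# Resolve the directory this script lives in, regardless of the
# working directory it was launched from (cron, /, your home dir...).
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Paths relative to the script are now safe to build:
LOG_FILE="$SCRIPT_DIR/logs/run.log"
echo "Running from: $SCRIPT_DIR"
```

This keeps working even when cron starts the script from `/` or from the user's home directory.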
So, while you're looking directly at the output as it's happening, it behaves wonderfully! Your
child script is a well-behaved and productive member of its environment. Surely you can let them do things on their own?
If you're not paranoid at this point, you're just unaware of all the things that can go wrong! From paths and permissions and all the way to broken loops and undefined behaviour, there's a myriad of things that can break - and you should have a paper trail of it happening.
The last thing you want is to debug a major outage by causing another one!
If you're running your script as a cronjob, it's easy to log the output:
0 9 * * * bash /var/scripts/check_messages.sh >/var/scripts/logs/check_messages.log 2>&1
The above will run the script once a day, but it will write to the same log file each time, overwriting it. So until the next run, you can still look at yesterday's output. That's the least you can do. Also notice the absolute path for the log file! We're not messing around.
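If you'd rather keep some history, one option is to put the date in the log file name (the script and log paths here are the same example ones as above; note that `%` has to be escaped as `\%` inside a crontab command):

```
0 9 * * * bash /var/scripts/check_messages.sh >/var/scripts/logs/check_messages-$(date +\%F).log 2>&1
```

Combine this with logrotate or a cleanup job so the dated files don't pile up forever.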
Alternatively you can have logging within the script itself. A few things to keep in mind:
- Use timestamps everywhere! You need to know when something happened.
- If you think it's needed, include email alerts, so you're notified about critical issues on time.
- If you're keeping most of your logs, use logrotate. You don't want to run out of disk space because of your logs.
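For logging within the script itself, a minimal timestamped logger in bash can be just a small function (the log path and messages are made-up examples):

```shell
#!/usr/bin/env bash
LOG_FILE="/var/scripts/logs/myscript.log"  # hypothetical path

# Prepend a timestamp to every line we log.
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"
}

log "Starting batch run"
log "Processed 42 messages"
```

Every entry then tells you exactly when it happened, which is what you'll be grepping for at 3 AM.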
Say your script is some kind of polling script, or something that needs to run forever. That's not something you should be starting and stopping from the crontab. It's also not ideal to only get a notification when it dies - it should restart on its own as well.
Supervisord to the rescue!
This is an awesome tool you need to be using in these cases. It's like the best personal trainer out there! He never sleeps and just runs after your processes yelling "No slacking! Keep running!".
For example, I've recently had issues with the Docker daemon dying from time to time. Obviously, jumping into the server to start it up again myself was unproductive, so I added this to the
supervisord config file:
[program:dockerd]
command=dockerd
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/dockerd-supervisor.log
This tells it to supervise the
dockerd process, and to restart it when it dies. It also logs what's happening to a log file we can use for debugging later on.
And this works with your own scripts as well! It's also neat if you have a script that only runs for 1 hour and needs to start back up again. This way you avoid having potential same-process overlap like when using cronjobs.
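A hypothetical entry for one of your own long-running scripts looks much the same (the program name and paths below are made up; adjust them to your setup):

```
[program:my_poller]
command=/var/scripts/my_poller.sh
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/my_poller-supervisor.log
```

With autorestart=true, supervisord restarts the script whenever it exits, which is exactly the "runs for an hour, then starts over" behavior, without the overlap risk of cron.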
Eventually, you'll want to be able to tell the fruit of your keyboard to hold on a second while you reconsider what's been happening. If you don't plan for this ahead of time, you might end up sending hard-kill signals like SIGKILL, which can have nasty consequences.
Bad things that could happen by killing a script mid-way:
- File permissions stay wrong
- Corrupted files or database data
- Orphan processes and half-processed data
- Unreleased resources
To avoid the above, you should put in a SIGTERM handler. You can do this in almost any scripting language; I even have it working in my PHP scripts.
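In bash, for example, a graceful-shutdown handler can be as simple as a trap plus a flag (the batch-processing function here is just a placeholder sketch):

```shell
#!/usr/bin/env bash
shutdown_requested=0

# On SIGTERM, just raise a flag - don't exit mid-batch.
trap 'shutdown_requested=1' TERM

# Placeholder for your real work.
process_next_batch() {
    sleep 1
}

while true; do
    process_next_batch

    if [ "$shutdown_requested" -eq 1 ]; then
        echo "SIGTERM received, finishing up and exiting cleanly"
        # release locks, close files, etc. here
        exit 0
    fi
done
```

A nice property of bash here: the trap only runs once the current foreground command finishes, so the batch in flight always completes before the exit check.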
This way the script can finish what it's doing, process its current batch of data, finish writing that CSV line, and release all the resources it was using. So when you run
killall -s SIGTERM dockerd, you let it finish what it's doing before it stops.
Also, a minor side note: if you have sleep() calls in your script, in most languages a caught SIGTERM interrupts the sleep early, so the shutdown doesn't get delayed. Speaking of sleeping...
The great (and terrible) thing about your scripts running is that they will try to get the most out of the resources available. So if they're transferring a 1TB file over the network, you'll wish you'd changed the WiFi password!
They need to realize that they're not the only ones living on the server, that they need to clean up after themselves, and that they shouldn't hog the utilities.
There are a few guidelines you can follow to make sure they play nice:
- When transferring over the network, set upper limits. rsync, for example, has a --bwlimit parameter for this.
- When bombarding the database with queries, give it some leeway - sleep the script from time to time.
- When writing heavily to the disk, make sure it's not hogging the disk that the database uses.
- If possible, have it run during off-peak hours, and monitor the server performance.
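For the network and disk points, something along these lines can help (the file paths and host name are made up for illustration):

```shell
# Cap the rsync transfer rate (--bwlimit is in KB/s, so ~5 MB/s here)
# so the rest of the network stays usable.
rsync --bwlimit=5000 /var/backups/big.tar.gz backup-host:/backups/

# Run a heavy job with the lowest CPU and best-effort I/O priority,
# so interactive work and the database get served first.
nice -n 19 ionice -c2 -n7 bash /var/scripts/reindex_everything.sh
```

Wrapping the cron command itself in nice/ionice works too, so the script never gets a chance to hog the box.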
So that's my list of "script" lessons I've learned the hard way - so you don't have to! Do share your tips in the comments as well, and point me (and others) to other nice resources on the subject.
You can never be too careful with them!
Mandatory disclaimer: I do not condone helicopter parenting when it involves real children as opposed to child processes.