Today, while working on an in-house project, I encountered a really interesting problem. I needed a Python script that runs every 30 minutes, pulls some information from a third party, processes the data, updates my local database, and rests till the next round. I wrote the script and set up the cron job.
Smooth, right?
But, my happiness didn't last long. Sometimes, my script took more than 30 minutes to execute. This presented me with a beautiful issue of cron jobs overlapping & data duplication. I didn't want the jobs to start stacking up over each other.
Ahh. Cute concurrency problem.
To fix this, like any other developer, a couple of thoughts popped up in my mind.
- Modify my Python script to list all running processes with some package & check whether the same job is already running. If yes, maybe it's not a good time to run it.
- Why not look for the existence of a particular file, mylock.txt, and exit if it exists or create it if it doesn't?
Both solutions seemed pretty lousy & unsafe. And touching working code is my biggest nightmare.
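To make the "unsafe" part concrete, here is a minimal sketch of the second idea. The helper name run_with_naive_lock and the simulated crash are made up for illustration; the point is only that a run which dies halfway leaves mylock.txt behind, so every later run skips itself.

```python
import os
import tempfile


def run_with_naive_lock(lock_path, job):
    """Check-then-create lock file, as in the second idea above."""
    if os.path.exists(lock_path):
        return "skipped"
    open(lock_path, "w").close()  # create the marker file
    job()                         # if this raises, the cleanup below never runs
    os.remove(lock_path)
    return "ran"


def crashing_job():
    raise RuntimeError("simulated crash in the middle of a run")


lock_path = os.path.join(tempfile.mkdtemp(), "mylock.txt")

try:
    run_with_naive_lock(lock_path, crashing_job)
except RuntimeError:
    pass  # the run died, but mylock.txt is still on disk

# A later, perfectly healthy run now refuses to start.
next_result = run_with_naive_lock(lock_path, lambda: None)
print(next_result)  # skipped
```

A try/finally would cover exceptions, but not a killed process or a reboot, and there is still a small window between the existence check and the file creation where two runs can both get through.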
An internal discussion pointed me to a beautiful tool, flock.
So, what's flock?
flock is a tiny, simple utility that ships with the util-linux package.
Its mechanism is pretty neat. It takes a lock file & a command to run as input. It puts a lock on the given lock file, runs the command, and releases the lock when the command finishes. On the next round, whether that lock is still held tells flock whether to run the script or not.
Just to add here, file locking is a mechanism for coordinating access to a file among multiple processes. An exclusive lock, which is what flock takes by default, can be held by only one process at a time.
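The same mechanism is available from Python's standard library on Unix-like systems, which makes it easy to watch in isolation. This is a small sketch, not what the flock tool does internally line by line; the helper name try_lock and the temporary lock path are assumptions for the demo.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.mkdtemp(), "my-file.lock")

first = try_lock(lock_path)    # lock is free, so this succeeds
second = try_lock(lock_path)   # lock is held by `first`, so this is refused
first_acquired = first is not None
second_acquired = second is not None

first.close()                  # closing the file releases the lock
third = try_lock(lock_path)    # free again
third_acquired = third is not None
third.close()

print(first_acquired, second_acquired, third_acquired)  # True False True
```

Because the lock disappears as soon as the holder closes the file or exits, a crashed run cannot leave a stale lock behind.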
How to set up a cron job using flock?
Setting up a cron using flock is pretty simple.
Step 1 - Install flock, if it is not available on your system (the command below is for yum-based distributions)
yum install -y util-linux
You can verify that flock has been installed by running
whereis flock
on a Linux system. It should show /usr/bin/flock as a path.
Step 2 - Open up the crontab
crontab -e
Step 3 - Install your new cron
*/30 * * * * /usr/bin/flock -w 0 /home/myfolder/my-file.lock python my_script.py
And you are done.
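The moment flock starts, it locks the my-file.lock file, & if the previous run is still going when the next round comes, it will not run the script again. The -w 0 option tells flock to wait zero seconds for the lock; -n (--nonblock) is the more common way to ask for the same thing.
Don't worry about my-file.lock; flock will create it for you if it doesn't exist.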
To verify the lock, try -
fuser -v /home/myfolder/my-file.lock
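If you would rather see the skip behaviour than take it on faith, here is a rough sketch that can be run outside cron. It assumes the util-linux flock binary is on the PATH, uses a temporary lock file instead of the real one, and passes -n rather than -w 0. The Python process plays the part of the previous, still-running job by holding the lock itself.

```python
import fcntl
import os
import shutil
import subprocess
import tempfile

lock_path = os.path.join(tempfile.mkdtemp(), "my-file.lock")
flock_bin = shutil.which("flock")  # None if flock is not installed

busy_code = free_code = None
if flock_bin:
    # Stand in for the previous, still-running cron job by holding the lock.
    with open(lock_path, "a") as previous_run:
        fcntl.flock(previous_run, fcntl.LOCK_EX)
        busy = subprocess.run([flock_bin, "-n", lock_path, "true"])
        busy_code = busy.returncode
    # Closing the file released the lock, so the next attempt can proceed.
    free = subprocess.run([flock_bin, "-n", lock_path, "true"])
    free_code = free.returncode

print(busy_code, free_code)
```

With util-linux, the first exit code should be 1 (lock busy, command not run) and the second 0 (lock free, command ran).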
So, my crontab entry looked like this -
*/30 * * * * /usr/bin/flock -w 0 /home/myfolder/my-file.lock python my_script.py > /home/myfolder/mylog.log 2>&1
Well, calm down. I know I have added some cryptic text to my cron entry. Let's decode the meaning of > /home/myfolder/mylog.log 2>&1 piece by piece.
- > redirects standard output (STDOUT) to a file
- /home/myfolder/mylog.log is the log file that receives the output
- 2 is the file descriptor for standard error (STDERR)
- >, again, is a redirect
- & marks what follows as a file descriptor rather than a file name
- 1 is the file descriptor for standard output (STDOUT)
> /home/myfolder/mylog.log sends the output of channel 1 (STDOUT) to the log file. 2>&1 then redirects channel 2 (STDERR) to wherever channel 1 currently points, so both outputs end up in the same file.
To sum it up, any output & errors generated during the execution of your script will go to this file.
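The same wiring can be reproduced from Python, which is a handy way to check what ends up in the log. This is only an illustration: the child command and the temporary log path are made up, and stderr=subprocess.STDOUT plays the role of 2>&1.

```python
import os
import subprocess
import sys
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "mylog.log")

# A stand-in for my_script.py that writes one line to each channel.
child_code = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"

with open(log_path, "w") as log_file:            # like: > mylog.log
    subprocess.run(
        [sys.executable, "-c", child_code],
        stdout=log_file,
        stderr=subprocess.STDOUT,                # like: 2>&1
        check=True,
    )

with open(log_path) as log_file:
    log_text = log_file.read()

print(log_text)
```

Both lines land in the file, though not necessarily in the order they were printed, since the two channels are buffered differently. Also note that > starts the log from scratch on every run; >> would append instead.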
The unfortunate truths of flock
How to run multiple commands with flock
I had an interesting use case. Due to some path-related stuff inside my Python script, I had to run the script as a combination of two commands.
Instead of -
python /home/myfolder/script.py
I needed to do -
cd /home/myfolder/ && python script.py
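Running multiple commands with the help of flock is a bit tricky, because flock only wraps the single command that follows the lock file. After a bit of struggle, this one worked for me: change directory first, then let flock run the script.
*/30 * * * * cd /home/myfolder/ && /usr/bin/flock -w 0 /home/myfolder/my-file.lock python my_script.py > /home/myfolder/mylog.log 2>&1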
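If the one-liner keeps growing, another option is a small wrapper that takes the lock once and runs every step from the right directory. The sketch below is hypothetical: run_locked, the temporary working directory and the two toy commands are all made up for illustration, and it relies on Python's fcntl module rather than the flock binary.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def run_locked(lock_path, commands, cwd):
    """Run each command in `cwd` under one lock; return None if the lock is busy."""
    with open(lock_path, "a") as lock_file:
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # a previous run still holds the lock
        # The lock stays held until the `with` block closes the file.
        return [subprocess.run(cmd, cwd=cwd).returncode for cmd in commands]


work_dir = tempfile.mkdtemp()                      # stand-in for /home/myfolder/
lock_path = os.path.join(work_dir, "my-file.lock")
commands = [
    # First step writes a file using a relative path.
    [sys.executable, "-c", "open('step1.txt', 'w').close()"],
    # Second step only succeeds if it runs in the same directory.
    [sys.executable, "-c", "import os; assert os.path.exists('step1.txt')"],
]

exit_codes = run_locked(lock_path, commands, cwd=work_dir)
print(exit_codes)  # [0, 0]
```

The trade-off is one more file to maintain, in exchange for a crontab line that stays short.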
flock doesn't always seem to be working.
flock does advisory locking, a cooperative scheme: the lock only affects processes that also ask for it. A program that never checks the lock can still read and write the file.
It has been reported many times that when flock is used to invoke a command in a subshell, other programs still seem to be able to read/write the locked file. This issue on stackoverflow talks about this in detail.
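The advisory part is easy to reproduce. In this minimal sketch (temporary file, made-up variable names), one handle holds an exclusive lock while a second handle, which never asks for the lock, writes to the file anyway.

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "my-file.lock")

with open(path, "w") as holder:
    fcntl.flock(holder, fcntl.LOCK_EX)      # cooperative process takes the lock

    # An uncooperative writer never calls flock, so nothing stops it.
    with open(path, "a") as intruder:
        intruder.write("written while locked\n")

    with open(path) as reader:
        content = reader.read()

print(repr(content))  # 'written while locked\n'
```

So the lock protects your cron job from overlapping with itself only because every run goes through flock. It does not protect the lock file, or anything else, from programs that ignore it.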