Vladimir Alinsky

Posted on Sep 23, 2023 • Edited on Jan 27

Mastering Python Script Scheduling: Pitfalls and Solutions

#python #devops #automation #tutorial

This article covers the pitfalls of scheduling Python scripts, the possible solutions, and the approach I've settled on.

Imagine the following case: periodically, you have to run SQL updates through the ORM models in your codebase, with some intermediate processing. For example, once per day you want to detect inactive users by side factors, such as actions they have or haven't taken within a certain time frame, and mark them as inactive. This scenario often occurs in data-driven applications where you need to maintain the integrity and accuracy of your data.

Or a different one: every Monday at 10 AM, your script must send internal reports on Slack to every manager about their employees' non-compliant Key Performance Indicators. This scenario is a common requirement in organizations striving to monitor and improve performance across various teams and departments.

What could go wrong?

Async code
If your codebase contains asynchronous parts that you'd like to reuse, you need to consider async support. I wouldn't recommend relying solely on the standard asyncio.run approach: if every job is wrapped in its own asyncio.run call, the jobs run one after another, so you forfeit the advantages of asynchrony and your code operates like regular synchronous code.
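To make the difference concrete, here is a minimal sketch using only the standard library. The job name `fetch_report` and the delays are hypothetical stand-ins for real async I/O; the point is only to compare one asyncio.run call per job with a single shared event loop.

```python
import asyncio
import time


async def fetch_report(name: str, delay: float) -> str:
    # Hypothetical job: asyncio.sleep stands in for an HTTP call or a DB query.
    await asyncio.sleep(delay)
    return name


def run_one_by_one(jobs: list[tuple[str, float]]) -> list[str]:
    # Each asyncio.run call runs a single coroutine to completion before returning,
    # so the jobs execute strictly one after another.
    return [asyncio.run(fetch_report(name, delay)) for name, delay in jobs]


async def run_together(jobs: list[tuple[str, float]]) -> list[str]:
    # One event loop interleaves all jobs while each of them waits on I/O.
    return await asyncio.gather(*(fetch_report(name, delay) for name, delay in jobs))


if __name__ == "__main__":
    jobs = [("users", 0.2), ("kpi", 0.2), ("slack", 0.2)]

    start = time.perf_counter()
    run_one_by_one(jobs)
    print(f"one asyncio.run per job: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    asyncio.run(run_together(jobs))
    print(f"one shared event loop:   {time.perf_counter() - start:.2f}s")
```

The first variant should take roughly the sum of the delays, the second roughly the longest single delay.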

Concurrent execution
If you have more than one job, you'll have to deal with parallel job execution. A common issue here is time shifting: a scheduler that runs jobs one at a time has to wait until one job finishes before it can start the next. This delays executions, especially when jobs have varying runtimes or when you have many jobs.
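The time shift is easy to reproduce. The sketch below is an illustration, not the implementation of any particular scheduler: `slow_job` and `quick_job` are hypothetical, and a thread pool is just one of several ways to start jobs independently.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_job() -> float:
    time.sleep(0.3)  # hypothetical long-running job
    return time.monotonic()


def quick_job() -> float:
    return time.monotonic()  # records the moment the job actually ran


def run_sequentially(jobs):
    # The second job cannot start until the first one returns.
    return [job() for job in jobs]


def run_in_pool(jobs):
    # Each job gets its own worker thread, so a slow job doesn't hold back the rest.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]


if __name__ == "__main__":
    for runner in (run_sequentially, run_in_pool):
        due = time.monotonic()
        _, quick_ran_at = runner([slow_job, quick_job])
        print(f"{runner.__name__}: quick_job delayed by {quick_ran_at - due:.2f}s")
```

In the sequential variant, `quick_job` inherits the full runtime of `slow_job` as a delay; in the pooled variant it starts almost immediately.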

Resource leakage
There are different ways to start jobs: separate Python threads, individual processes, or event loops. The choice can have a significant impact on CPU and memory utilization when you have many jobs.

Restart intolerance
The scheduler can't survive a restart without affecting job intervals. In other words, after a restart you get a time shift because the clock was interrupted. For example, you have a job with a 3-hour interval. Two hours into that interval, you push new commits, and the CI/CD pipeline rebuilds the image and recreates the container to update the code. If the scheduler starts counting from zero again, you get 2 hours before the restart plus 3 hours after it. The interval between executions becomes 5 hours instead of 3, which is a significant time shift. Sometimes this is critical.
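One way to avoid this shift, sketched here under the assumption that run times can be aligned to a fixed reference point instead of to process start, is to compute the next run from the wall clock. The `ANCHOR` value and the `next_run` helper are hypothetical; persisting the last run time to disk is another option.

```python
from datetime import datetime, timedelta, timezone

# Assumption: all runs are aligned to this fixed point, not to the moment the process started.
ANCHOR = datetime(2023, 1, 1, tzinfo=timezone.utc)


def next_run(now: datetime, interval: timedelta, anchor: datetime = ANCHOR) -> datetime:
    """Return the first anchor-aligned run time strictly after `now`."""
    elapsed_intervals = (now - anchor) // interval  # whole intervals since the anchor
    return anchor + (elapsed_intervals + 1) * interval


if __name__ == "__main__":
    interval = timedelta(hours=3)
    first_start = datetime(2023, 9, 23, 0, 0, tzinfo=timezone.utc)
    after_restart = datetime(2023, 9, 23, 2, 0, tzinfo=timezone.utc)
    # Both calls yield the same next run, so the restart does not stretch the interval.
    print(next_run(first_start, interval))
    print(next_run(after_restart, interval))
```

Because the result depends only on the current time, the anchor, and the interval, restarting the process two hours in does not move the next execution.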

Duplicate parallel executions
If a job's execution time exceeds the interval between executions, or varies, the scheduler may start a new execution before the previous one ends. This leads to job multiplication, resource leakage, and even a critical system shutdown due to overload. Consider this case if you have short intervals or long-running jobs.
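Within a single process, a simple guard against overlapping runs is a non-blocking lock. This is a minimal sketch with a hypothetical `run_exclusively` helper; it skips a tick when the previous execution is still running, which is one possible policy among several (queueing the run is another).

```python
import threading
import time

_job_lock = threading.Lock()


def run_exclusively(job) -> bool:
    """Run `job` unless a previous run is still in progress; return whether it ran."""
    if not _job_lock.acquire(blocking=False):
        return False  # the previous execution still holds the lock: skip this tick
    try:
        job()
        return True
    finally:
        _job_lock.release()


if __name__ == "__main__":
    worker = threading.Thread(target=run_exclusively, args=(lambda: time.sleep(0.3),))
    worker.start()
    time.sleep(0.1)
    print("second run started:", run_exclusively(lambda: None))  # previous run still active
    worker.join()
    print("third run started:", run_exclusively(lambda: None))
```

A scheduler would call `run_exclusively(job)` on every tick instead of calling `job()` directly.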

Localization
If you need to run jobs at different times, in different time zones, and on different days, configuration gets tricky. For example, one job must run at 11 AM New York time on Mondays only, another at 5 PM Berlin time on weekends, and a third every 3 hours Tokyo time on Fridays only.
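To show what a scheduler has to do under the hood, here is a rough sketch of the weekly cases using the standard zoneinfo module. It assumes Python 3.9+ with a time zone database available on the system, and the `next_weekly_run` helper is hypothetical. It ignores edge cases around daylight saving transitions.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def next_weekly_run(now: datetime, weekday: int, at: time, tz_name: str) -> datetime:
    """Next occurrence of `weekday` (Monday=0) at wall-clock time `at` in zone `tz_name`."""
    tz = ZoneInfo(tz_name)
    local_now = now.astimezone(tz)
    days_ahead = (weekday - local_now.weekday()) % 7
    candidate = datetime.combine(local_now.date() + timedelta(days=days_ahead), at, tzinfo=tz)
    if candidate <= local_now:
        # Today's slot has already passed, so move to the same weekday next week.
        candidate = datetime.combine(candidate.date() + timedelta(days=7), at, tzinfo=tz)
    return candidate


if __name__ == "__main__":
    now = datetime(2023, 9, 23, 12, 0, tzinfo=timezone.utc)  # a Saturday
    monday_ny = next_weekly_run(now, 0, time(11, 0), "America/New_York")
    saturday_berlin = next_weekly_run(now, 5, time(17, 0), "Europe/Berlin")
    print(monday_ny, "->", monday_ny.astimezone(timezone.utc))
    print(saturday_berlin, "->", saturday_berlin.astimezone(timezone.utc))
```

Computing in the job's own zone and converting to UTC only at the end keeps the wall-clock time stable across daylight saving changes.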

So, a lot could go wrong:)

Approaches

Let's look at possible ways to overcome these pitfalls, from the least to the most effective.

Schedule Python lib

Schedule is the most basic scheduler for Python and the first one you'll find if you start googling.

It looks interesting, but its main disadvantage is that it isn't designed for production use. According to the official documentation:

You should probably look somewhere else if you need:

Job persistence (remember schedule between restarts)

Exact timing (sub-second precision execution)

Concurrent execution (multiple threads)

Localization (workdays or holidays)

Schedule does not account for the time it takes for the job function to execute.

Anyway, let's look at an example and its pros and cons for further comparison.

import schedule
import time

def job():
    print("I'm working...")

# Register the job to run every day at 10:30 local time
schedule.every().day.at("10:30").do(job)

# Poll once per second and run whatever is due
while True:
    schedule.run_pending()
    time.sleep(1)

Pros

Simple

Cons

No async code support
No concurrent execution
Restart intolerance
No localization

Cron Unix utility

Cron is probably the most widely used general-purpose scheduler in the world. It is universal and starts jobs as shell commands. Note that Cron starts a separate process for every job, which can consume a lot of system resources.

Here you need a crontab file:

# m h dom mon dow   command
*/30 * * * *      /usr/local/bin/python /path/to/the/script.py >> /var/log/cron.log 2>&1

In turn, the script can contain whatever your heart desires:

if __name__ == "__main__":
    print("Whatever your heart desires")

Pros

Concurrent execution
Versatility

Cons

No async code support
Resource leakage
Restart intolerance
Duplicate parallel executions
No localization
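Cron itself won't prevent overlapping runs, but the script can guard against them. The sketch below uses an advisory file lock from the standard fcntl module (Unix only); the lock path and the `try_lock` helper are hypothetical, and this is one common workaround rather than a Cron feature.

```python
import fcntl
import os
import tempfile

# Hypothetical lock path; any location writable by the cron user works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "my_script.lock")


def try_lock(path: str = LOCK_PATH):
    """Return an open, exclusively locked file, or None if another run holds the lock."""
    lock_file = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file  # keep it open for the lifetime of the run


if __name__ == "__main__":
    lock = try_lock()
    if lock is None:
        print("Previous run is still in progress, skipping")
    else:
        print("Whatever your heart desires")
```

The operating system releases the lock when the process exits, so a crashed run doesn't block the next one.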

Regta Python utility

Regta is a scheduling tool designed specifically for Python with these pitfalls in mind. Its key advantages are async, multithreading, and multiprocessing support, as well as restart tolerance.

from regta import async_job, Period

@async_job(Period().on.sunday.at("18:35").by("Asia/Almaty"))
async def my_async_job():
    pass  # Do some stuff here

To run it, use the regta run command.

Pros

Async code support
Concurrent execution
Restart tolerance
Localization

Cons

No strong community

Summary

When it comes to scheduling Python programs, the first option, Schedule, may not be the most suitable choice for solving real-world problems. Instead, consider the following points:

For internal automation tasks that involve various programming languages and CLI tools, Cron stands as a robust choice.

If your automation needs are focused on Python exclusively, Regta emerges as a compelling option, offering a wealth of Python-specific optimizations. Give it a try, and feel free to share your thoughts in the comments 🙌
