As a Django application developer, you have probably encountered situations where you would like to be able to perform periodic asynchronous background tasks. This comes in handy if you want to run background checks, send notifications or build caches.
Motivation
My first choice was to install django-celery-beat. I was quite satisfied with the result: I was able to dynamically configure my periodic tasks according to the user configuration, because Celery reads the execution configuration from the database.
On the other hand, my application now had a dependency on Celery (which has to run as a separate service) and a Redis server (which Celery uses as a message broker).
The containerization of the application became awkward. I was not sure whether I was supposed to include the Celery service inside the container or as a dependency using docker-compose. I also thought it was ridiculous to need a Redis instance only to perform periodic tasks, and I wanted to reduce the size of the supervisord configuration.
I wanted to keep the flexibility but reduce the number of dependencies and the amount of configuration boilerplate.
Solution
I got rid of Celery in the applications where it was not necessary (I was not using the rest of the great Celery features). I just wanted to load the periodicity from django.conf.settings or from the database.
I use Alpine Linux as the base image for my applications. The base operating system can already handle the execution of periodic tasks: it's good old crond. The configuration is described in the Alpine Linux docs.
I created a simple Django management command called setup in my application. Let us also assume I have another Django management command called popularity which is supposed to run every five minutes. In this example I will read the configuration from the Django settings variable CRON_JOBS, which can look like this:
CRON_JOBS = {
    'popularity': '*/5 * * * *'
}
The variable consists of Django management command name and periodicity pairs (the schedule uses the standard five-field crontab syntax: minute, hour, day of month, month, day of week). The setup command will create a crond configuration from it using the python-crontab library. It is supposed to be run before the first start and every time the configuration changes (keep in mind that since you are already creating the CRON job rules from a management command, you can use the ORM to read the configuration from the database).
# management/commands/setup.py
from crontab import CronTab
from django.conf import settings
from django.core.management import BaseCommand


class Command(BaseCommand):
    help = 'Installs CRON jobs according to settings.CRON_JOBS'

    def handle(self, *args, **options):
        # Open the root crontab and drop any previously installed jobs
        cron = CronTab(tabfile='/etc/crontabs/root', user=True)
        cron.remove_all()

        # Create one job per management command
        for command, schedule in settings.CRON_JOBS.items():
            job = cron.new(command='cd /usr/src/app && python3 manage.py {}'.format(command), comment=command)
            job.setall(schedule)
            job.enable()

        cron.write()
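For the CRON_JOBS example above, the generated /etc/crontabs/root should contain roughly the following entry (python-crontab writes the comment at the end of the line):
*/5 * * * * cd /usr/src/app && python3 manage.py popularity # popularity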
Keep in mind that if you are loading the configuration from the database, you have to apply the change once again by running the setup management command. You can achieve this using the call_command method.
from django.core import management

def save_config():
    # do whatever you want
    management.call_command('setup')
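If the schedule lives in the database instead of the settings, one way to trigger the reinstallation automatically is a pair of signal receivers. This is only a sketch: the PeriodicTask model below is a hypothetical example, not part of the application described here.
# models.py - hypothetical sketch, PeriodicTask is not part of the original application
from django.core import management
from django.db import models
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver


class PeriodicTask(models.Model):
    command = models.CharField(max_length=64, unique=True)  # management command name
    schedule = models.CharField(max_length=32)  # crontab expression, e.g. '*/5 * * * *'
    enabled = models.BooleanField(default=True)


@receiver(post_save, sender=PeriodicTask)
@receiver(post_delete, sender=PeriodicTask)
def reinstall_cron_jobs(sender, **kwargs):
    # Rebuild /etc/crontabs/root from the current database state
    management.call_command('setup')
The setup command would then iterate over PeriodicTask.objects.filter(enabled=True) instead of settings.CRON_JOBS.items().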
Creating the container
As I said before, my Django applications are based on Alpine Linux containers and are executed from the entrypoint.sh, which is responsible for (in the mentioned order):
- executing migrations,
- executing our setup management command, which creates the initial CRON job configuration,
- and initializing the supervisord service (which will manage the gunicorn and crond services).
supervisord configuration
I use supervisord to manage the execution of the gunicorn application server and the crond service.
If the application is located in the /usr/src/app directory and gunicorn is installed in /root/.local/bin/gunicorn, the supervisor.conf could look like this:
[supervisord]
nodaemon=true
[program:gunicorn]
directory=/usr/src/app
command=/root/.local/bin/gunicorn -b 0.0.0.0:8000 -w 4 my_app.wsgi --log-level=debug --log-file=/var/log/gunicorn.log
autostart=true
autorestart=true
priority=900
[program:cron]
directory=/usr/src/app
command=crond -f
autostart=true
autorestart=true
priority=500
stdout_logfile=/var/log/cron.std.log
stderr_logfile=/var/log/cron.err.log
If you are interested in the details of the configuration, don't hesitate to ask me in the comments or check the supervisord documentation.
Dockerfile
The minimal Dockerfile has to contain at least:
- copying the application source code,
- installing the dependencies,
- copying the configuration,
- executing the entry-point.
FROM alpine:3.15
WORKDIR /usr/src/app
# Copy source
COPY . .
# Dependencies (py3-pip provides pip3, postgresql-client provides psql for entrypoint.sh)
RUN apk add --no-cache python3 py3-pip postgresql-client supervisor
RUN pip3 install --user gunicorn
RUN pip3 install --user -r requirements.txt
# Configuration
COPY conf/supervisor.conf /etc/supervisord.conf
# Execution
RUN chmod +x conf/entrypoint.sh
CMD ["conf/entrypoint.sh"]
entrypoint.sh
#!/bin/sh
# Wait until PostgreSQL is ready (psql comes from the postgresql-client package)
until PGPASSWORD=$DATABASE_PASSWORD psql -h "$DATABASE_HOST" -U "$DATABASE_USER" -c '\q'; do
    >&2 echo "Postgres is unavailable - sleeping"
    sleep 1
done
# Execute migrations
python3 manage.py migrate
# Execute our setup management command which installs CRON jobs
python3 manage.py setup
# Execute the supervisord service; exec replaces the shell so supervisord receives container signals
exec supervisord -c /etc/supervisord.conf
For a complete example, check the EvilFlowersCatalog/EvilFlowersCatalog repository.
Top comments (2)
Can it be used only with Docker, without supervisord?
Hi @andreesnavarroo, first of all: sorry I hadn't noticed your comment. To answer your question: sure, there are multiple strategies for achieving it without supervisord. You can use OpenRC, for example. You still need to run crond somehow; check the Alpine Linux FAQ for a brief example. I can write a small blog post about it if you wish. The solution depends on the base operating system you are running in your container.