Mehedee Siddique

Configuring The Celery App

This post is the second of a multi-part series on Celery. You can find the first one here: Introduction to Celery

In this post we'll explore Celery configurations and in the process learn a bit about how tasks should be designed.

To recap, let's first set up the Celery app instance:

from celery import Celery


# Broker (RabbitMQ) connection string
CELERY_BROKER: str = "pyamqp://user:Pass1234@rabbitmq:5672//"

# Result backend (Redis) connection string
CELERY_BACKEND: str = "redis://redis:6379"


# Celery App instance
celery_app = Celery(
    __name__, broker=CELERY_BROKER, backend=CELERY_BACKEND, include=['app.tasks']
)

Let's talk about some configurations now.

task_acks_late (boolean)

celery_app.conf.task_acks_late = False

Disabled by default, meaning the worker acknowledges the task message to the broker right before it starts executing the task. So, if the worker crashes while executing the task, the task won't be redelivered to the worker. If this setting is enabled, any task lost because the worker crashed in the middle of execution will be delivered to the worker again by the broker.
This raises the question: why isn't the setting enabled by default? To answer that, we need to understand what an ideal task looks like. Ideally, a task should be idempotent: no matter how many times the task is delivered to the worker for execution (due to a worker crash or a retry on failure), the end result is always the same. It is on us developers to write idempotent tasks, because Celery can't detect whether a task is idempotent or not. In reality, it might not always be possible to write an idempotent task. If we enable task_acks_late on a non-idempotent task, we might face unwanted consequences from that task executing multiple times. For this reason, assuming our tasks are not idempotent, Celery keeps the setting disabled by default. If we make sure our tasks are idempotent, we can benefit from enabling this setting, and it is recommended to do so. Read more on this here
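As a quick sketch of the difference (the task names and the app.database helper module are hypothetical, not part of the example app above):

# Hypothetical tasks to illustrate idempotency; celery_app is the instance defined earlier.
from app.database import db  # hypothetical database handle, for illustration only


@celery_app.task
def add_signup_bonus(user_id):
    # NOT idempotent: if the task is redelivered, the bonus is added twice.
    db.execute("UPDATE users SET balance = balance + 10 WHERE id = %s", [user_id])


@celery_app.task
def activate_user(user_id):
    # Idempotent: however many times it runs, the row ends up in the same state.
    db.execute("UPDATE users SET is_active = true WHERE id = %s", [user_id])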
Applying this setting at the application level applies it to all tasks, but we can also set it at the task level with the @celery_app.task(acks_late=True) parameter, which then takes precedence over the app-level setting for that task only.
My preference is to keep this setting disabled at the app level, as it is by default, and for tasks we are sure are idempotent and must be redelivered to the worker in case of a crash, enable it at the task level. I also like to explicitly add this config to the app as disabled to make the codebase more self-explanatory.
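As a sketch of what that looks like (the task name is made up for illustration):

# App level: state the default explicitly so the intent is visible in the codebase.
celery_app.conf.task_acks_late = False


# Task level: this hypothetical task is idempotent, so late acknowledgement is safe
# and the broker will redeliver it if the worker crashes mid-execution.
@celery_app.task(acks_late=True)
def sync_user_profile(user_id):
    ...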

task_ignore_result (boolean)

celery_app.conf.task_ignore_result = False

Disabled by default, meaning task results will be saved to the result backend if you have one configured. Enabling this setting means results won't be stored in the backend for any task.
Saving task results in the backend adds extra performance, memory and storage overhead. In my opinion, we don't need results for most tasks, and if your code is mostly dependent on task results, you might not be getting the asynchronous execution benefit of Celery. For example, say you are handling a newsletter subscribe POST request, have triggered an email sending task from your web application, and are waiting for the task to finish in the worker before moving on with the request. In this case your code still behaves as if the email sending task was executed in the current application process, not in a separate worker. Having said that, there will still be many valid cases where keeping and tracking task results is necessary or required (some Celery Canvas workflows depend on task results).

This setting is also available at the task level, and the task-level value takes precedence for that task.

My preference is to ignore results globally by enabling this setting at the app level, and to store results only for specific tasks where they are necessary by overriding it at the task level. We should also check task_store_errors_even_if_ignored.
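A sketch of that preference (the task name is hypothetical):

# App level: don't store results by default.
celery_app.conf.task_ignore_result = True


# Task level: this hypothetical task's result is actually read back later,
# so store it in the result backend despite the app-level setting.
@celery_app.task(ignore_result=False)
def generate_report(report_id):
    return {"report_id": report_id, "status": "done"}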

result_expires (int(seconds) | timedelta)

celery_app.conf.result_expires = 300

24 hours by default. After the configured time, the result is deleted from the result backend. We should set this to the minimum time we need to access the result; setting an unnecessarily large value adds performance, memory and storage overhead.
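The setting also accepts a timedelta, for example:

from datetime import timedelta

# Keep results only as long as we actually need to read them back, e.g. 10 minutes.
celery_app.conf.result_expires = timedelta(minutes=10)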

worker_concurrency (int)

celery_app.conf.worker_concurrency = 2

Number of CPU cores by default. This setting defines how many concurrent processes the worker will use to execute tasks when no concurrency pool is specified or the prefork pool is used explicitly. If the concurrency pool is eventlet or gevent, this setting denotes the number of green threads spawned.

Celery's concurrency pools are a different discussion, and there is a nice article about it. The rule of thumb: for CPU-bound tasks use the prefork pool, and for I/O-bound tasks use gevent/eventlet. Don't go above the number of available CPU cores with the prefork (default) pool but, with the gevent/eventlet pools, you can use hundreds or even thousands of green threads.
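As a sketch of the two setups (the pool can also be chosen with the --pool/-P flag when starting the worker; the gevent variant assumes the gevent package is installed):

# CPU-bound workload: prefork pool, roughly one process per core (the default behaviour).
celery_app.conf.worker_pool = "prefork"
celery_app.conf.worker_concurrency = 4

# I/O-bound workload: gevent pool with many green threads
# (assumes the gevent package is installed).
# celery_app.conf.worker_pool = "gevent"
# celery_app.conf.worker_concurrency = 500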

broker_pool_limit (int | None)

celery_app.conf.broker_pool_limit = 30

10 connections by default. This setting denotes the maximum number of connections that can be kept open in the broker connection pool. It should be sized based on how many threads/green threads (eventlet/gevent) will be using a broker connection at the same time.
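As the type hint above suggests, the setting also accepts None; my understanding from the Celery docs is that this disables the pool, so a connection is established and closed for every use. A sketch of both options:

# Pool sized for a worker whose green threads publish to the broker concurrently.
celery_app.conf.broker_pool_limit = 30

# Disable pooling entirely: a connection is opened and closed for every use.
# celery_app.conf.broker_pool_limit = None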

worker_prefetch_multiplier (int)

celery_app.conf.worker_prefetch_multiplier = 5

4 by default, meaning the worker will prefetch 4 task messages per concurrency slot, e.g. if worker_concurrency is 4 and worker_prefetch_multiplier is 5, then 20 tasks in total will be prefetched by the worker.
Prefetching tasks reduces the time a worker stays idle between completing one task and starting the next, but prefetching too many tasks can also cause a performance overhead.
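A sketch of the arithmetic with the values used above:

# 2 concurrency slots * prefetch multiplier of 5 = 10 task messages
# reserved by this worker at any given time.
celery_app.conf.worker_concurrency = 2
celery_app.conf.worker_prefetch_multiplier = 5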

These are mostly all the configs we need in most cases, and I hope I didn't miss any important ones. Let me know if you think I should have mentioned any others. We should also go through Celery's complete list of configs and play with them a bit to understand them better.

See you in the next post.
