Every Python app that does anything real eventually hits the same wall: a request needs to do slow work — send an email, render a PDF, call a third-party API, run an inference — and you can't make the user wait for it. The fix is a background job: hand the work to a worker process and return immediately.
The hard part isn't the concept. It's choosing among the four libraries that dominate this space — Celery, RQ, Dramatiq, and Arq — and then not regretting the choice eighteen months later. This guide is the decision framework I wish I'd had: what each one is good at, a minimal working example of each, and the production details that actually bite.
What a task queue actually buys you
A task queue gives you three things: offloading (slow work leaves the request cycle), durability (jobs survive a process restart because they live in a broker like Redis or RabbitMQ), and scalability (you add workers to add throughput). If you only need "run this after the response is sent" and never need durability, an in-process option like FastAPI's BackgroundTasks may be enough. The moment you need retries, scheduling, or work that must not be lost, you want a real queue.
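To make the "offloading" part concrete, here is a minimal stdlib-only sketch of the pattern. It is not how any of the four libraries is implemented: `handle_request`, `send_email`, and the in-memory `jobs` queue are hypothetical names for illustration, and the queue stands in for a real broker.

```python
import queue
import threading

# In-memory stand-in for a broker. Unlike Redis or RabbitMQ, its contents
# vanish when the process exits, so this gives offloading but no durability.
jobs = queue.Queue()
results = []

def send_email(user_id: int) -> None:
    """Hypothetical slow job; a real one would call an SMTP or API client."""
    results.append(f"emailed user {user_id}")

def worker() -> None:
    """Pull jobs until a None sentinel arrives, then exit."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        func, args = job
        try:
            func(*args)
        finally:
            jobs.task_done()

def handle_request(user_id: int) -> str:
    """Enqueue the slow work and return immediately."""
    jobs.put((send_email, (user_id,)))
    return "202 Accepted"

thread = threading.Thread(target=worker, daemon=True)
thread.start()
status = handle_request(42)
jobs.put(None)  # tell the worker to stop after draining the queue
jobs.join()     # wait here only so the demo can exit cleanly
```

Everything in this sketch lives in one process, which is exactly the limitation a real queue removes: kill the process and the pending jobs are gone.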
The four contenders at a glance
| | Celery | RQ | Dramatiq | Arq |
|---|---|---|---|---|
| Broker | Redis, RabbitMQ, SQS | Redis | Redis, RabbitMQ | Redis |
| Async (asyncio) tasks | partial / awkward | no | no (threads/processes) | native |
| Maturity | very high | high | high | medium |
| Ops complexity | high | low | low–medium | low |
| Scheduling/cron | Celery Beat | rq-scheduler | APScheduler (external; none built in) | built-in cron |
| Best for | large, heterogeneous workloads | simple sync jobs | reliability with low ceremony | asyncio-native apps |
Celery — the default everyone reaches for
Celery is the 800-pound gorilla: enormous feature set, every broker, routing, priorities, chords/groups, mature monitoring (Flower). The cost is operational weight and a config surface that can swallow a week.
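```python
# tasks.py
from celery import Celery

app = Celery("myapp", broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

# build_and_email_report and TransientError are your application's own code,
# defined or imported elsewhere.
@app.task(bind=True, max_retries=3, default_retry_delay=10)
def send_report(self, user_id: int):
    try:
        build_and_email_report(user_id)
    except TransientError as exc:
        # Re-enqueue the task; Celery stops retrying after max_retries.
        raise self.retry(exc=exc)

# enqueue (call this from application code, not at import time):
send_report.delay(42)
# worker: celery -A tasks worker --loglevel=info
```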
Pick Celery when you have many task types, need routing/priorities across queues, or want the deepest ecosystem. Avoid it when your needs are modest — you'll pay the complexity tax for features you never use.
RQ — the one you can understand in an afternoon
RQ (Redis Queue) is deliberately small. Redis only, plain functions as jobs, a readable dashboard. There's almost no magic.
```python
from redis import Redis
from rq import Queue, Retry

from myapp.jobs import send_report

q = Queue(connection=Redis())
q.enqueue(send_report, 42, retry=Retry(max=3))  # retry up to 3 times on failure
# worker: rq worker
```
Pick RQ when your jobs are synchronous, Redis is already in your stack, and you value being able to read the entire library's behavior in your head. Avoid it for asyncio-heavy code or when you need RabbitMQ-grade routing.
Dramatiq — reliability without the ceremony
Dramatiq is the "Celery did too much" answer: sane defaults, automatic retries with exponential backoff, message age limits, and a clean middleware system — with far less configuration.
```python
import dramatiq
from dramatiq.brokers.redis import RedisBroker

dramatiq.set_broker(RedisBroker(url="redis://localhost:6379"))

@dramatiq.actor(max_retries=3, min_backoff=1000)  # min_backoff is in milliseconds
def send_report(user_id: int):
    build_and_email_report(user_id)  # your own function, defined elsewhere

send_report.send(42)  # enqueue
# worker: dramatiq myapp
```
Pick Dramatiq when you want Celery-grade reliability semantics (retries, dead-letter handling) but hate Celery-grade config. It's my default recommendation for new synchronous projects.
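The idea behind exponential backoff is easy to sketch on its own. The function below is a generic illustration, not Dramatiq's exact formula: `backoff_delay_ms` is a hypothetical helper, the `max_backoff` ceiling and the jitter range are assumptions I chose for the example, and only `min_backoff=1000` comes from the actor above.

```python
import random

def backoff_delay_ms(attempt: int, min_backoff: int = 1000,
                     max_backoff: int = 60_000, jitter: bool = True) -> int:
    """Delay before retry number `attempt` (0-based), in milliseconds."""
    # Double the delay on each attempt, but never exceed the ceiling.
    delay = min(min_backoff * (2 ** attempt), max_backoff)
    if jitter:
        # Spread retries over [delay/2, delay] so failed jobs do not all
        # come back at the same instant.
        delay = random.randint(delay // 2, delay)
    return delay

# Deterministic schedule with jitter switched off, for inspection.
schedule = [backoff_delay_ms(n, jitter=False) for n in range(8)]
```

With jitter off, the delays double from one second until they reach the ceiling and then stay there, which is the behavior you want when a downstream service is down for minutes rather than milliseconds.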
Arq — built for asyncio from the ground up
If your app is async (FastAPI, aiohttp, async DB drivers), Celery's async story will frustrate you. Arq is asyncio-native: jobs are coroutines, the worker is an event loop, and a single worker handles high I/O concurrency without a thread per job.
```python
from arq import create_pool
from arq.connections import RedisSettings

async def send_report(ctx, user_id: int):
    await build_and_email_report(user_id)  # real awaits

class WorkerSettings:
    functions = [send_report]
    redis_settings = RedisSettings()

# enqueue from async code:
async def main():
    redis = await create_pool(RedisSettings())
    await redis.enqueue_job("send_report", 42)

# worker: arq mymodule.WorkerSettings
```
Pick Arq when your codebase is async end-to-end and your jobs are I/O-bound (HTTP calls, DB, queues). One worker can run hundreds of concurrent jobs, provided you raise the worker's max_jobs setting (the default is 10). Avoid it for CPU-bound work: you still need processes for that.
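To see why one event loop can carry that many I/O-bound jobs, here is a stdlib-only sketch. It does not use Arq at all: `run_jobs` is a hypothetical helper, `asyncio.sleep` stands in for a network call, and the semaphore plays the role of a concurrency cap like Arq's `max_jobs` worker setting.

```python
import asyncio

async def run_jobs(n_jobs: int, max_jobs: int) -> int:
    """Run n_jobs fake I/O-bound jobs on one event loop; return peak concurrency."""
    limit = asyncio.Semaphore(max_jobs)  # cap on in-flight jobs
    in_flight = 0
    peak = 0

    async def job(job_id: int) -> None:
        nonlocal in_flight, peak
        async with limit:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for an HTTP or DB call
            in_flight -= 1

    await asyncio.gather(*(job(i) for i in range(n_jobs)))
    return peak

# 200 jobs, at most 50 in flight at once, all on a single thread.
peak = asyncio.run(run_jobs(n_jobs=200, max_jobs=50))
```

Each job spends nearly all of its time waiting, so the loop interleaves them without a thread per job. Swap the sleep for a CPU-heavy loop and the jobs would run one after another, which is why CPU-bound work still needs processes.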
The 30-second decision
- Async app, I/O-bound jobs → Arq.
- New sync project, want reliability with minimal config → Dramatiq.
- Small, simple, Redis already present → RQ.
- Complex routing, priorities, many task types, or an existing Celery shop → Celery.
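If it helps to see the rules as code, the list above can be sketched as a small function. The flag names are hypothetical, and the precedence (Celery first, then Arq, then RQ, with Dramatiq as the fallback) is my assumption about how to break ties, since the list itself does not rank the rules.

```python
def pick_task_queue(*, async_app: bool = False, io_bound: bool = False,
                    needs_routing: bool = False, existing_celery: bool = False,
                    simple_jobs: bool = False, has_redis: bool = False) -> str:
    """Encode the four rules above; the precedence order is an assumption."""
    # Complex routing/priorities or an existing Celery shop.
    if needs_routing or existing_celery:
        return "Celery"
    # Async app with I/O-bound jobs.
    if async_app and io_bound:
        return "Arq"
    # Small, simple, Redis already present.
    if simple_jobs and has_redis:
        return "RQ"
    # New sync project that wants reliability with minimal config.
    return "Dramatiq"

choice = pick_task_queue(async_app=True, io_bound=True)
```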
The production gotchas nobody warns you about
These apply no matter which library you pick:
- Make tasks idempotent. At-least-once delivery means a task can run twice (worker dies after doing the work but before acking). Use a dedupe key or an INSERT ... ON CONFLICT so a re-run is harmless.
- Set a visibility/ack timeout longer than your slowest task. If the broker reclaims a message because the task ran long, you get duplicate execution. Tune it deliberately.
- Cap retries and route failures to a dead-letter queue. Infinite retries on a poison message will saturate your workers. Bound them and inspect the DLQ.
- Handle graceful shutdown. A deploy that SIGKILLs a worker mid-task loses or double-runs it. Trap SIGTERM, stop pulling new work, finish in-flight jobs.
- Keep payloads small. Enqueue an ID, not a 5 MB object. Fetch the data inside the task. Big payloads bloat the broker and slow everything.
- Monitor queue depth, not just CPU. A growing backlog is your earliest signal that workers can't keep up — alert on it before users notice.
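Two of these, idempotency and bounded retries with a dead-letter queue, can be sketched without any queue library. This is a minimal illustration under stated assumptions: SQLite (3.24 or newer, for `ON CONFLICT`) stands in for your database, a plain list stands in for the DLQ, and `process_once`, `deliver`, and the table names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (job_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE reports (user_id INTEGER)")
dead_letters = []  # stand-in for a dead-letter queue

def process_once(job_id: str, user_id: int) -> str:
    """Record the dedupe key and do the job's work in one transaction."""
    with db:  # commits on success, rolls back if the body raises
        cur = db.execute(
            "INSERT INTO processed (job_id) VALUES (?) "
            "ON CONFLICT (job_id) DO NOTHING",
            (job_id,),
        )
        if cur.rowcount == 0:
            return "duplicate"  # a redelivery: the work is already done
        if user_id < 0:
            raise ValueError(f"bad user_id {user_id}")  # simulated poison message
        db.execute("INSERT INTO reports (user_id) VALUES (?)", (user_id,))
    return "done"

def deliver(job_id: str, user_id: int, max_attempts: int = 3) -> str:
    """Retry a bounded number of times, then park the job for inspection."""
    error = ""
    for _ in range(max_attempts):
        try:
            return process_once(job_id, user_id)
        except ValueError as exc:
            error = repr(exc)
    dead_letters.append((job_id, error))
    return "dead-lettered"

first = deliver("job-1", 42)
second = deliver("job-1", 42)  # simulated duplicate delivery
poison = deliver("job-2", -1)  # fails on every attempt
report_count = db.execute("SELECT COUNT(*) FROM reports").fetchone()[0]
```

The dedupe row and the job's own write share a transaction here, so a crash between them cannot leave one without the other. When the side effect lives outside your database (an email, a third-party API call), that guarantee is not available, and you would need the external service to accept an idempotency key instead.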
Skip the boilerplate
Wiring up a queue, retries, idempotency helpers, a scheduler, and Docker for the worker is the same 200 lines every time. If you'd rather start from a production-ready setup than assemble it by hand, the Async Task Queue Toolkit packages worker configs, retry/idempotency patterns, and deployment recipes for exactly these four libraries so you can ship the feature instead of the plumbing.
Bottom line
There's no single "best" Python task queue — there's the right one for your concurrency model and operational appetite. Match the tool to your app: Arq for async, Dramatiq for low-ceremony reliability, RQ for simple, Celery when you genuinely need its depth. Then spend your real effort on the parts that bite everyone equally — idempotency, retries, and graceful shutdown.