Hey everyone,
While working with FastAPI and Celery, I ran into a subtle reliability issue that I think is easy to overlook.
A very common flow looks like this:
db.commit()
celery_task.delay(...)
At first glance, this seems perfectly fine.
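But there’s a problem: the commit and the enqueue are two separate operations against two separate systems (the database and the broker), so together they are not atomic.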
The hidden bug
If your process crashes between these two lines:
- the database transaction is already committed
- but the task is never enqueued
The job is silently lost. The broker never received it, so there is nothing to retry. No error. No visibility.
This makes it especially dangerous in production systems where reliability matters.
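To make the gap concrete, here is a minimal sketch that reproduces it. It is not FastAPI or Celery code: `sqlite3` stands in for the real database, a plain list stands in for the broker, and `SimulatedCrash`, `create_order`, and the `orders` table are hypothetical names made up for the illustration.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Stand-in for the process dying (OOM kill, deploy, power loss)."""


def create_order(conn, queue, crash_before_enqueue=False):
    conn.execute("INSERT INTO orders (status) VALUES ('pending')")
    conn.commit()  # step 1: the state change is now durable
    if crash_before_enqueue:
        raise SimulatedCrash()  # the process dies right here
    queue.append("process_order")  # step 2: stand-in for celery_task.delay(...)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
queue = []  # assumption: an in-memory list playing the role of the broker

try:
    create_order(conn, queue, crash_before_enqueue=True)
except SimulatedCrash:
    pass  # a real crash would not even get the chance to log anything

orders_committed = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(orders_committed, len(queue))  # one committed order, zero queued tasks
```

The order row exists, the queue is empty, and nothing in the system records that a task was ever supposed to run.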
A possible solution: Transactional Outbox
One approach to solve this is the Transactional Outbox pattern:
- Write both the state change and an "event" into the database in the same transaction
- Use a separate worker to read and publish those events to the queue
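This ensures that a committed change never ends up without its task, although the publisher may deliver an event more than once, so tasks should be idempotent. It also comes with trade-offs: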
- added complexity
- eventual consistency
- extra moving parts
I put together a more detailed write-up here:
https://medium.com/@imgeaslikok/the-bug-between-db-commit-and-queue-enqueue-c8ef92207863
Curious about real-world approaches
I’m really interested in how others handle this in production:
- Do you implement outbox/inbox patterns?
- Do you rely on retries and idempotency instead?
- Have you ever run into this issue in real systems?
- Any simpler alternatives that still guarantee delivery?
Would love to hear your experience.