TL;DR:
`transaction.atomic()` protects SQL work on one connection to one database. It does not protect queues, emails, webhooks, other databases, or third-party APIs. If you design a business process as if one `atomic` block covered the whole thing, sooner or later you will ship a half-commit to production: DB state changed, side effects missing (or the other way around). This post walks through where the real consistency boundary is, and what to reach for when you need more.
I'm a senior backend engineer working in Python/Django. More long-form writing at rafalfuchs.dev.
The illusion atomic() creates
Most Django developers learn transactions through a comforting pattern:
```python
with transaction.atomic():
    order = Order.objects.create(...)
    Payment.objects.create(order=order, ...)
    send_confirmation_email(order)
    publish_to_kafka("order.created", order.id)
```
It reads like one unit of work. It feels atomic. It is not.
Only the two `objects.create` calls are actually protected by the database transaction. The email and the Kafka publish happen inside the `with` block, but they are side effects that do not roll back if the transaction aborts. Worse: if they fire before the commit and the commit then fails, you just notified the world about an order that does not exist.
This is the core misconception I want to unpack: SQL commit is not business-process commit.
1. What atomic() actually guarantees
Think of atomic as a local database safety boundary. Scoped to one connection, one database, one transaction.
It does:
- commit or roll back SQL changes together,
- support nesting via savepoints,
- preserve invariants inside that specific DB transaction.
It does not:
- include external side effects (HTTP calls, emails, message brokers, cache writes),
- guarantee delivery of any asynchronous message,
- solve multi-database atomicity,
- protect you from a successful commit followed by a crash before your handler returns.
That last point trips up a lot of people. The moment `__exit__` on the context manager finishes, your transaction is committed. Anything afterwards is a new world, and the database has no idea whether your Celery task made it into Redis or not.
2. ATOMIC_REQUESTS: a sharp tool, not a default
`ATOMIC_REQUESTS = True` wraps every request in a transaction. It feels like a sane default, and for small apps it genuinely reduces accidental partial writes.
At higher traffic it starts to bite:
- transactions live longer (the full request lifecycle, not just the write),
- lock contention climbs on hot rows,
- throughput on mixed read/write endpoints drops,
- a slow external call inside a view now holds a DB transaction open for its entire duration.
The better architectural question is rarely "can we wrap the whole request?". It is "which specific write-critical section genuinely needs a transaction?". Reach for an explicit `with transaction.atomic():` around that section, and let the rest of the request run without holding row locks.
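For reference, `ATOMIC_REQUESTS` is configured per database alias inside `DATABASES`, not as a standalone setting, and a single view can be exempted with the `@transaction.non_atomic_requests` decorator. A minimal settings sketch (connection details elided, values illustrative):

```python
# settings.py -- ATOMIC_REQUESTS is a per-database-alias flag
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",
        # Leave request-wide transactions off; wrap the
        # write-critical sections explicitly instead.
        "ATOMIC_REQUESTS": False,
    }
}
```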
3. The minimum viable guardrail: `transaction.on_commit`
If you only take one pattern away from this post, take this one. Never fire a side effect from inside an atomic block directly. Register it with `on_commit`:
```python
from django.db import transaction

def create_invoice_and_enqueue(invoice_data):
    with transaction.atomic():
        invoice = Invoice.objects.create(**invoice_data)
        transaction.on_commit(
            lambda: publish_invoice_created(invoice_id=invoice.id)
        )
    return invoice
```
`on_commit` holds the callback until the outermost transaction successfully commits. If the transaction rolls back, the callback never runs. No phantom notifications about invoices that no longer exist.
But note what this still does not give you:
- if the process crashes between commit and `on_commit` execution, the callback is lost,
- if the broker is down when the callback fires, the message is gone,
- if the consumer processes the message twice, you get double side effects.
`on_commit` is a necessary guardrail, not a delivery guarantee. Pair it with retries on the producer side and idempotent consumers on the receiver side, or move to something stronger (see section 6).
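Producer-side retries can be as simple as a wrapper with exponential backoff. A generic sketch, where `publish` is any callable (a hypothetical broker client) that raises on failure:

```python
import time

def publish_with_retry(publish, message, attempts=5, base_delay=0.05):
    """Call publish(message), retrying transient failures with backoff."""
    for attempt in range(attempts):
        try:
            publish(message)
            return True
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error, alert on it
            time.sleep(base_delay * (2 ** attempt))
```

Pair this with a dead-letter log for messages that exhaust their attempts, so "gave up" is an alert rather than silence.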
4. Isolation levels and the race conditions you are not seeing
Django on PostgreSQL defaults to READ COMMITTED. That means: you see rows that were committed before your statement started. It does not mean: nobody can change a row between your SELECT and your UPDATE.
Classic broken pattern:
```python
# BROKEN under concurrency
def reserve_stock(product_id, qty):
    with transaction.atomic():
        product = Product.objects.get(id=product_id)
        if product.available_qty < qty:
            raise ValueError("Insufficient stock")
        product.available_qty -= qty
        product.save(update_fields=["available_qty"])
```
Two concurrent requests both read `available_qty = 5`, both see enough stock for `qty = 3`, both subtract, and you have just oversold by 1 unit. The transaction committed successfully. The business invariant is broken.
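The lost update can be replayed deterministically without threads: run both requests' read-and-check steps before either write, which is exactly the interleaving `READ COMMITTED` permits. Here `stock` is a plain dict standing in for the `Product` row:

```python
stock = {"available_qty": 5}  # stands in for the Product row

def read_and_check(qty):
    # SELECT plus application-level check, with no lock held
    seen = stock["available_qty"]
    if seen < qty:
        raise ValueError("Insufficient stock")
    return seen

# The bad interleaving: both requests read before either writes.
seen_a = read_and_check(3)           # request A sees 5
seen_b = read_and_check(3)           # request B also sees 5
stock["available_qty"] = seen_a - 3  # A writes 2
stock["available_qty"] = seen_b - 3  # B overwrites with 2: lost update

# 6 units were handed out, but the counter only dropped by 3.
```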
Three tools to pick from, in rough order of cost:
4a. Pessimistic locking with `select_for_update`
```python
from django.db import transaction

def reserve_stock(product_id, qty):
    with transaction.atomic():
        product = (
            Product.objects
            .select_for_update()
            .get(id=product_id)
        )
        if product.available_qty < qty:
            raise ValueError("Insufficient stock")
        product.available_qty -= qty
        product.save(update_fields=["available_qty"])
```
Simple, correct, and it serializes everyone hitting the same row. Use it for short critical sections on high-value state (stock, balance, seat booking). Keep the locked section small - do not put HTTP calls inside.
4b. Optimistic locking with a version column
```python
from django.db.models import F

updated = (
    Product.objects
    .filter(id=product_id, version=expected_version)
    .update(
        available_qty=F("available_qty") - qty,
        version=F("version") + 1,
    )
)
if updated == 0:
    raise ConcurrentUpdateError("retry")
```
Lets readers through without blocking. The loser of a race has to retry. Good fit for read-heavy paths where contention is rare but must be detected.
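The retry the loser owes can live in a small wrapper. A sketch, where `operation` is assumed to be a callable that re-reads state, performs the conditional `UPDATE`, and raises `ConcurrentUpdateError` when zero rows matched:

```python
class ConcurrentUpdateError(Exception):
    """Raised when the conditional UPDATE matched zero rows."""

def run_with_retry(operation, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return operation()  # re-reads state, retries the UPDATE
        except ConcurrentUpdateError:
            if attempt == max_attempts - 1:
                raise  # persistent contention: give up and surface it
```

Keep `max_attempts` small: if you routinely need many retries, the row is contended enough that pessimistic locking is the better fit.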
4c. Database constraints as the last line of defence
UNIQUE, CHECK, FK, partial indexes. Your application logic will have bugs. The database is the one layer that will reliably catch a duplicate order number or a negative balance. Treat constraints as non-negotiable, not as "optimization for later".
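The same idea in raw SQL, using sqlite here purely for a self-contained illustration; in Django models the equivalent guard is a `models.CheckConstraint` in `Meta.constraints`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account (balance) VALUES (100)")

# Simulate an application bug that tries to overdraw the account.
caught = False
try:
    conn.execute("UPDATE account SET balance = balance - 150")
except sqlite3.IntegrityError:
    caught = True  # the CHECK constraint rejected the negative balance

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
```

The buggy statement fails, the balance stays at 100, and the invariant holds no matter what the application layer did wrong.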
5. The moment you cross a process boundary, there is no global transaction
The moment your use case touches Celery, a webhook, an email provider, a second database, or any external API, you are out of the ACID world. There is no protocol that wraps "insert row in Postgres" and "send message to SQS" into one atomic action. Two-phase commit exists on paper. Almost nobody runs it in production for good reasons.
What you actually have is a distributed system with partial failures. Your options:
| Scenario | Acceptable approach |
|---|---|
| Side effect is nice-to-have (analytics event) | `on_commit` + fire-and-forget, accept occasional loss |
| Side effect must eventually happen | `on_commit` + retries + idempotent consumer |
| Side effect must happen exactly-once-ish, and loss is unacceptable | Outbox pattern |
The outbox pattern is the one I reach for in anything touching billing, inventory, compliance, or audit.
6. The outbox pattern, concretely
The idea: instead of publishing to a broker from application code, write the message to a regular table in the same transaction as your domain change. A separate worker reads the outbox and publishes. Because the write and the message land in one DB commit, they succeed or fail together.
Schema
```python
from django.db import models

class OutboxEvent(models.Model):
    id = models.BigAutoField(primary_key=True)
    aggregate_type = models.CharField(max_length=64)
    aggregate_id = models.CharField(max_length=64)
    event_type = models.CharField(max_length=128)
    payload = models.JSONField()
    created_at = models.DateTimeField(auto_now_add=True)
    published_at = models.DateTimeField(null=True, db_index=True)

    class Meta:
        indexes = [
            models.Index(
                fields=["published_at", "id"],
                name="outbox_unpublished_idx",
                condition=models.Q(published_at__isnull=True),
            ),
        ]
```
Writing the event
```python
from django.db import transaction

def create_invoice(invoice_data):
    with transaction.atomic():
        invoice = Invoice.objects.create(**invoice_data)
        OutboxEvent.objects.create(
            aggregate_type="invoice",
            aggregate_id=str(invoice.id),
            event_type="invoice.created",
            payload={"id": invoice.id, "total": str(invoice.total)},
        )
    return invoice
```
No `on_commit`, no direct broker call. The invoice row and the outbox row commit together, or neither exists.
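That succeed-or-fail-together property is plain SQL transaction behavior, so it can be demonstrated end to end with sqlite (used here only to keep the sketch self-contained; table shapes are simplified):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, total TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published_at TEXT
    );
""")

def create_invoice(total, fail_before_outbox=False):
    try:
        # sqlite3's context manager commits on success, rolls back on error
        with conn:
            conn.execute("INSERT INTO invoice (total) VALUES (?)", (total,))
            if fail_before_outbox:
                raise RuntimeError("crash between the two writes")
            conn.execute(
                "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                ("invoice.created", total),
            )
    except RuntimeError:
        pass  # the whole transaction rolled back: no invoice, no event

create_invoice("10.00")                           # both rows commit
create_invoice("99.00", fail_before_outbox=True)  # neither row exists

invoices = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

After the failed call there is no orphaned invoice and no phantom event: one invoice, one outbox row, both from the successful call.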
Relaying
A separate process (Celery beat, a small dedicated worker, or a CDC tool like Debezium reading the WAL) pulls unpublished rows and publishes them:
```python
from django.db import transaction
from django.utils import timezone

def relay_outbox(batch_size=100):
    with transaction.atomic():
        events = (
            OutboxEvent.objects
            .select_for_update(skip_locked=True)
            .filter(published_at__isnull=True)
            .order_by("id")[:batch_size]
        )
        for event in events:
            publish_to_broker(
                topic=event.event_type,
                key=event.aggregate_id,
                payload=event.payload,
                message_id=str(event.id),  # for consumer dedup
            )
            event.published_at = timezone.now()
            event.save(update_fields=["published_at"])
```
`skip_locked` lets you run multiple relay workers without them fighting over the same rows.
What you gain
- no lost events on broker outage (they sit in the outbox),
- no phantom events on rollback (they never hit the outbox),
- an auditable history of what was published and when,
- a lag metric (`unpublished outbox rows` and `oldest unpublished row age`) you can alert on.
What you still owe the consumer side
Consumers must be idempotent. Design every handler so that receiving the same message twice is a no-op. The outbox gives you at-least-once delivery, not exactly-once. The message ID (outbox row PK) is your dedup key.
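A consumer-side dedup sketch. The in-memory set here stands in for what in production must be a durable store, for example a processed-message table with a UNIQUE constraint on the message ID, written in the same transaction as the handler's own effects:

```python
def make_idempotent(handler):
    """Wrap a handler so duplicate deliveries of a message are no-ops."""
    seen = set()  # stand-in for a durable dedup table keyed by message ID

    def wrapped(message_id, payload):
        if message_id in seen:
            return False  # duplicate delivery: effects already applied
        handler(payload)
        seen.add(message_id)
        return True

    return wrapped
```

With the outbox row PK as `message_id`, redelivering the same event applies its effect exactly once, which is what turns at-least-once delivery into effectively-once processing.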
7. Multiple databases: there is no atomic across them
Django supports multiple databases. It does not give you a cross-database transaction. This code is a lie:
```python
# Does NOT make the two writes atomic
with transaction.atomic(using="default"):
    with transaction.atomic(using="analytics"):
        Order.objects.using("default").create(...)
        AnalyticsEvent.objects.using("analytics").create(...)
```
If the default commit succeeds and the analytics commit fails (or the process dies in between), you have inconsistent state across databases with no automatic recovery.
Practical rules:
- keep each business invariant anchored in one database,
- if a flow genuinely spans databases, design it as eventual consistency: commit to the source of truth, then propagate via outbox or CDC,
- accept that "propagate" means "retry forever until it sticks, with alerts if lag grows".
8. Decision matrix
| Pattern | Use when | Avoid when |
|---|---|---|
| Plain `atomic` | Single DB write, no external side effects, low business risk | Any side effect leaves the DB |
| `atomic` + `on_commit` | Side effects must run after commit, brief loss is acceptable, basic retry in place | Loss is genuinely unacceptable |
| Outbox + idempotent consumers | Billing, inventory, compliance, audit, anything where partial failure is an incident | You have no consumers and never will |
| Saga / compensation | Long-running workflows across multiple services | Simple CRUD |
Do not jump to the bottom of the table by default. Outbox has operational cost: another table, another worker, another dashboard, another runbook. Use it where the business cost of a lost event exceeds that.
9. Production checklist
Before you ship any write path that has side effects, walk through this:
- Is every critical write anchored in a single database?
- Are all external side effects deferred until after commit (either via `on_commit` or outbox)?
- Are all message consumers idempotent? Do you have a dedup strategy with a concrete key?
- Do you track outbox lag (oldest unpublished row age) and retry rate, with alerts?
- Do your integration tests include concurrent writers hitting the same row?
- Is there a runbook for recovery from a half-commit? Who runs it at 03:00?
- Do you have constraints in the DB that catch the failure modes your application logic might miss?
- Are long-running operations (HTTP, file I/O) kept out of `atomic` blocks?
If any answer is "we will add it later", that is the thing that will page you.
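For the concurrent-writer checklist item, the shape of such a test can be sketched even without a database: several threads race on one counter, and a `threading.Lock` plays the role the row lock from `select_for_update` plays in Postgres. Names here are illustrative, not from any real codebase:

```python
import threading

class Stock:
    def __init__(self, qty):
        self.qty = qty
        self._lock = threading.Lock()  # plays the role of a row lock

    def reserve(self, qty):
        with self._lock:  # serialize the check-then-act section
            if self.qty < qty:
                return False
            self.qty -= qty
            return True

stock = Stock(5)
results = []

def worker():
    results.append(stock.reserve(3))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock held across check and write, exactly one of the
# four reservations of 3 units can succeed from a stock of 5.
```

Drop the lock and the assertion becomes flaky, which is precisely the oversell bug from section 4 reproduced in miniature. Against the real database, the equivalent test runs concurrent transactions in a `TransactionTestCase`.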
Final verdict
`transaction.atomic()` gives you local transactional correctness inside one database. It does not give you global process consistency. Those are different layers, solved by different tools.
Most "committed but broken" production incidents come from conflating them - trusting one `with transaction.atomic():` block to cover a process that actually spans three systems. Treat the database transaction as one small, strict boundary, move every side effect to `on_commit` at minimum, and reach for the outbox pattern when losing an event is unacceptable.
Get that separation right and a whole class of weird half-state bugs disappears from your backlog.
If this was useful, I write more about Django architecture, backend design, and production consistency at rafalfuchs.dev/en/blog. The original version of this post lives here.