Rafał Fuchs

Posted on • Originally published at rafalfuchs.dev

Transaction boundaries in Django: where consistency really ends

TL;DR: transaction.atomic() protects SQL work on one connection to one database. It does not protect queues, emails, webhooks, other databases, or third-party APIs. If you design a business process as if one atomic block covered the whole thing, sooner or later you will ship a half-commit to production: DB state changed, side effects missing (or the other way around). This post walks through where the real consistency boundary is, and what to reach for when you need more.

I'm a senior backend engineer working in Python/Django. More long-form writing at rafalfuchs.dev.

The illusion atomic() creates

Most Django developers learn transactions through a comforting pattern:

with transaction.atomic():
    order = Order.objects.create(...)
    Payment.objects.create(order=order, ...)
    send_confirmation_email(order)
    publish_to_kafka("order.created", order.id)

It reads like one unit of work. It feels atomic. It is not.

Only the two ORM writes (the Order and Payment creation) are actually protected by the database transaction. The email and the Kafka publish happen inside the with block, but they are side effects that do not roll back if the transaction aborts. Worse: if they fire before the commit and the commit then fails, you just notified the world about an order that does not exist.

This is the core misconception I want to unpack: SQL commit is not business-process commit.
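To make the mechanics concrete without a Django project, here is a minimal sketch using sqlite3 from the standard library (a stand-in for atomic(), not Django's API). The in-memory list plays the role of the email/Kafka side effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

emails_sent = []  # stand-in for an external side effect

try:
    with conn:  # sqlite3's transaction context, analogous to atomic()
        conn.execute("INSERT INTO orders (id) VALUES (1)")
        emails_sent.append("order 1 confirmation")  # fires immediately
        raise RuntimeError("commit never happens")  # transaction aborts
except RuntimeError:
    pass

rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rows, emails_sent)  # the insert rolled back; the "email" did not
```

The database honestly reports zero orders, yet the side-effect list still contains the notification. That asymmetry is the whole problem.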


1. What atomic() actually guarantees

Think of atomic as a local database safety boundary. Scoped to one connection, one database, one transaction.

It does:

  • commit or roll back SQL changes together,
  • support nesting via savepoints,
  • preserve invariants inside that specific DB transaction.

It does not:

  • include external side effects (HTTP calls, emails, message brokers, cache writes),
  • guarantee delivery of any asynchronous message,
  • solve multi-database atomicity,
  • protect you from a successful commit followed by a crash before your handler returns.

That last point trips up a lot of people. The moment __exit__ on the context manager finishes, your transaction is committed. Anything afterwards is a new world, and the database has no idea whether your Celery task made it into Redis or not.
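The nesting-via-savepoints behaviour can also be sketched with raw SQL savepoints (sqlite3 here; Django issues equivalent SAVEPOINT statements for inner atomic() blocks). An inner rollback undoes only the inner work while the outer transaction still commits:

```python
import sqlite3

# isolation_level=None gives us manual transaction control
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (v TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES ('outer')")

# Inner "atomic" block: a savepoint, rolled back without killing the outer tx
conn.execute("SAVEPOINT inner_block")
conn.execute("INSERT INTO t VALUES ('inner')")
conn.execute("ROLLBACK TO SAVEPOINT inner_block")  # inner work undone

conn.execute("COMMIT")

values = [row[0] for row in conn.execute("SELECT v FROM t")]
print(values)  # only 'outer' survived
```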


2. ATOMIC_REQUESTS: a sharp tool, not a default

ATOMIC_REQUESTS = True wraps every request in a transaction. It feels like a sane default, and for small apps it genuinely reduces accidental partial writes.

At higher traffic it starts to bite:

  • transactions live longer (the full request lifecycle, not just the write),
  • lock contention climbs on hot rows,
  • throughput on mixed read/write endpoints drops,
  • a slow external call inside a view now holds a DB transaction open for its entire duration.

The better architectural question is rarely "can we wrap the whole request?". It is "which specific write-critical section genuinely needs a transaction?". Reach for explicit with transaction.atomic(): around that section, and let the rest of the request run without holding row locks.
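The lock-contention cost of a long-lived transaction is easy to demonstrate outside Django. In this sqlite3 sketch (a stand-in for a request-wide transaction, not Django's API), one connection holds a write transaction open, simulating a slow external call, and a second writer gets stuck:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, n INTEGER)")
writer.execute("INSERT INTO counters VALUES (1, 0)")

# Long-lived transaction: the "request-wide" atomic block
writer.execute("BEGIN IMMEDIATE")
writer.execute("UPDATE counters SET n = n + 1 WHERE id = 1")
# ... imagine a slow external HTTP call happening here ...

other = sqlite3.connect(path, timeout=0.2)  # second request, short lock wait
blocked = False
try:
    other.execute("UPDATE counters SET n = n + 1 WHERE id = 1")
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

writer.execute("COMMIT")
print(blocked)
```

Postgres blocks at row granularity rather than database granularity, but the shape of the failure is the same: transaction lifetime directly bounds writer concurrency.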


3. The minimum viable guardrail: transaction.on_commit

If you only take one pattern away from this post, take this one. Never fire a side effect from inside an atomic block directly. Register it with on_commit:

from django.db import transaction


def create_invoice_and_enqueue(invoice_data):
    with transaction.atomic():
        invoice = Invoice.objects.create(**invoice_data)

        transaction.on_commit(
            lambda: publish_invoice_created(invoice_id=invoice.id)
        )

    return invoice

on_commit holds the callback until the outermost transaction successfully commits. If the transaction rolls back, the callback never runs. No phantom notifications about invoices that no longer exist.

But note what this still does not give you:

  • if the process crashes between commit and on_commit execution, the callback is lost,
  • if the broker is down when the callback fires, the message is gone,
  • if the consumer processes the message twice, you get double side effects.

on_commit is a necessary guardrail, not a delivery guarantee. Pair it with retries on the producer side and idempotent consumers on the receiver side, or move to something stronger (see section 6).
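Producer-side retries can be as simple as a backoff loop around the publish call. A minimal sketch (publish and flaky_publish are hypothetical stand-ins for your broker client, not a real library API):

```python
import time

def publish_with_retries(publish, message, *, attempts=3, base_delay=0.01):
    """Retry a flaky publish callable with exponential backoff."""
    for attempt in range(attempts):
        try:
            return publish(message)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** attempt)

# Simulated broker that fails twice, then accepts
calls = {"n": 0}
def flaky_publish(message):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("broker unavailable")
    return "ack"

result = publish_with_retries(flaky_publish, {"event": "invoice.created"})
print(result)  # → ack, after two retried failures
```

Retries narrow the loss window; they do not close it. If the process dies mid-loop, the message is still gone, which is exactly what the outbox pattern fixes.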


4. Isolation levels and the race conditions you are not seeing

Django on PostgreSQL defaults to READ COMMITTED. That means: you see rows that were committed before your statement started. It does not mean: nobody can change a row between your SELECT and your UPDATE.

Classic broken pattern:

# BROKEN under concurrency
def reserve_stock(product_id, qty):
    with transaction.atomic():
        product = Product.objects.get(id=product_id)

        if product.available_qty < qty:
            raise ValueError("Insufficient stock")

        product.available_qty -= qty
        product.save(update_fields=["available_qty"])

Two concurrent requests both read available_qty = 5, both see enough stock for qty = 3, both subtract, and you have just oversold by 1 unit. The transaction committed successfully. The business invariant is broken.
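You can replay that race deterministically in plain Python by splitting the broken pattern into its read and write phases and interleaving two "requests" by hand:

```python
stock = {"available_qty": 5}

def read_phase(qty):
    """Step 1 of the broken pattern: read and check against a snapshot."""
    seen = stock["available_qty"]
    if seen < qty:
        raise ValueError("Insufficient stock")
    return seen

def write_phase(seen, qty):
    """Step 2: write back a value computed from the stale snapshot."""
    stock["available_qty"] = seen - qty

# Two requests interleave: both read before either writes
seen_a = read_phase(3)  # sees 5
seen_b = read_phase(3)  # also sees 5
write_phase(seen_a, 3)  # stock becomes 2
write_phase(seen_b, 3)  # stock becomes 2 again: 6 units "reserved" from 5
print(stock["available_qty"])  # → 2, yet two reservations of 3 succeeded
```

Nothing here needs threads or bad luck; any read-check-write sequence without a lock has this hole built in.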

Three tools to pick from, in rough order of cost:

4a. Pessimistic locking with select_for_update

from django.db import transaction


def reserve_stock(product_id, qty):
    with transaction.atomic():
        product = (
            Product.objects
            .select_for_update()
            .get(id=product_id)
        )

        if product.available_qty < qty:
            raise ValueError("Insufficient stock")

        product.available_qty -= qty
        product.save(update_fields=["available_qty"])

Simple, correct, and it serializes everyone hitting the same row. Use it for short critical sections on high-value state (stock, balance, seat booking). Keep the locked section small - do not put HTTP calls inside.

4b. Optimistic locking with a version column

updated = (
    Product.objects
    .filter(id=product_id, version=expected_version)
    .update(
        available_qty=F("available_qty") - qty,
        version=F("version") + 1,
    )
)

if updated == 0:
    raise ConcurrentUpdateError("retry")

Lets readers through without blocking. The loser of a race has to retry. Good fit for read-heavy paths where contention is rare but must be detected.
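The full pattern is that compare-and-set wrapped in a retry loop. A self-contained sketch in raw SQL via sqlite3 (the same WHERE version = ? guard the Django snippet uses; table and function names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO products VALUES (1, 5, 0)")
conn.commit()

def reserve(product_id, qty, max_retries=3):
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT qty, version FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        if row[0] < qty:
            raise ValueError("Insufficient stock")
        cur = conn.execute(
            "UPDATE products SET qty = qty - ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (qty, product_id, row[1]),
        )
        conn.commit()
        if cur.rowcount == 1:  # our version guard matched: we won the race
            return
        # rowcount == 0: someone bumped the version first; re-read and retry
    raise RuntimeError("concurrent update retries exhausted")

reserve(1, 3)
qty = conn.execute("SELECT qty FROM products WHERE id = 1").fetchone()[0]
print(qty)  # → 2
```

Cap the retry count: under sustained contention an unbounded loop just converts a lock queue into a CPU spin.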

4c. Database constraints as the last line of defence

UNIQUE, CHECK, FK, partial indexes. Your application logic will have bugs. The database is the one layer that will reliably catch a duplicate order number or a negative balance. Treat constraints as non-negotiable, not as "optimization for later".
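In Django you would declare this with models.CheckConstraint in Meta.constraints; the underlying mechanism is plain SQL and easy to demonstrate with sqlite3. Here a CHECK constraint catches the oversell that buggy application logic lets through:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "  id INTEGER PRIMARY KEY,"
    "  available_qty INTEGER NOT NULL CHECK (available_qty >= 0)"
    ")"
)
conn.execute("INSERT INTO products VALUES (1, 2)")
conn.commit()

# Buggy application logic tries to reserve more than exists
caught = False
try:
    conn.execute(
        "UPDATE products SET available_qty = available_qty - 5 WHERE id = 1"
    )
except sqlite3.IntegrityError:  # CHECK constraint failed
    caught = True
    conn.rollback()

qty = conn.execute(
    "SELECT available_qty FROM products WHERE id = 1"
).fetchone()[0]
print(caught, qty)  # constraint fired; the row is untouched
```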


5. The moment you cross a process boundary, there is no global transaction

The moment your use case touches Celery, a webhook, an email provider, a second database, or any external API, you are out of the ACID world. There is no protocol that wraps "insert row in Postgres" and "send message to SQS" into one atomic action. Two-phase commit (XA) exists, but almost nobody runs it in production, for good reasons.

What you actually have is a distributed system with partial failures. Your options:

| Scenario | Acceptable approach |
| --- | --- |
| Side effect is nice-to-have (analytics event) | on_commit + fire-and-forget, accept occasional loss |
| Side effect must eventually happen | on_commit + retries + idempotent consumer |
| Side effect must happen exactly-once-ish, and loss is unacceptable | Outbox pattern |

The outbox pattern is the one I reach for in anything touching billing, inventory, compliance, or audit.


6. The outbox pattern, concretely

The idea: instead of publishing to a broker from application code, write the message to a regular table in the same transaction as your domain change. A separate worker reads the outbox and publishes. Because the write and the message land in one DB commit, they succeed or fail together.

Schema

class OutboxEvent(models.Model):
    id = models.BigAutoField(primary_key=True)
    aggregate_type = models.CharField(max_length=64)
    aggregate_id = models.CharField(max_length=64)
    event_type = models.CharField(max_length=128)
    payload = models.JSONField()
    created_at = models.DateTimeField(auto_now_add=True)
    published_at = models.DateTimeField(null=True, db_index=True)

    class Meta:
        indexes = [
            models.Index(
                fields=["published_at", "id"],
                name="outbox_unpublished_idx",
                condition=models.Q(published_at__isnull=True),
            ),
        ]

Writing the event

from django.db import transaction


def create_invoice(invoice_data):
    with transaction.atomic():
        invoice = Invoice.objects.create(**invoice_data)

        OutboxEvent.objects.create(
            aggregate_type="invoice",
            aggregate_id=str(invoice.id),
            event_type="invoice.created",
            payload={"id": invoice.id, "total": str(invoice.total)},
        )

    return invoice

No on_commit, no direct broker call. The invoice row and the outbox row commit together, or neither exists.
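That all-or-nothing property is testable in miniature with sqlite3 (manual BEGIN/COMMIT standing in for atomic(); table names illustrative). A failure before commit leaves neither the domain row nor the outbox row behind:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, total TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, event_type TEXT, payload TEXT)"
)

def create_invoice(invoice_id, total, fail_before_commit=False):
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO invoices VALUES (?, ?)", (invoice_id, total))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("invoice.created", f'{{"id": {invoice_id}}}'),
        )
        if fail_before_commit:
            raise RuntimeError("crash before commit")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")

create_invoice(1, "99.00")                           # both rows land
create_invoice(2, "50.00", fail_before_commit=True)  # neither row lands

invoices = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(invoices, events)  # → 1 1: invoice and event always agree
```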

Relaying

A separate process (Celery beat, a small dedicated worker, or a CDC tool like Debezium reading the WAL) pulls unpublished rows and publishes them:

from django.db import transaction
from django.utils import timezone


def relay_outbox(batch_size=100):
    with transaction.atomic():
        events = (
            OutboxEvent.objects
            .select_for_update(skip_locked=True)
            .filter(published_at__isnull=True)
            .order_by("id")[:batch_size]
        )

        for event in events:
            publish_to_broker(
                topic=event.event_type,
                key=event.aggregate_id,
                payload=event.payload,
                message_id=str(event.id),   # for consumer dedup
            )
            event.published_at = timezone.now()
            event.save(update_fields=["published_at"])

skip_locked lets you run multiple relay workers without them fighting over the same rows.

What you gain

  • no lost events on broker outage (they sit in the outbox),
  • no phantom events on rollback (they never hit the outbox),
  • an auditable history of what was published and when,
  • a lag metric (unpublished outbox rows and oldest unpublished row age) you can alert on.

What you still owe the consumer side

Consumers must be idempotent. Design every handler so that receiving the same message twice is a no-op. The outbox gives you at-least-once delivery, not exactly-once. The message ID (outbox row PK) is your dedup key.
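The consumer-side shape is a dedup check keyed on that message ID. A minimal in-memory sketch (in production the processed-ID store would be a unique-keyed table or a Redis set, and recording the ID should be atomic with the side effect):

```python
processed_ids = set()  # stand-in for a durable, unique-keyed dedup store
side_effects = []

def handle_invoice_created(message_id, payload):
    """At-least-once delivery means duplicates; dedup on the outbox row PK."""
    if message_id in processed_ids:
        return  # duplicate delivery: no-op
    side_effects.append(payload)  # the real work happens here exactly once
    processed_ids.add(message_id)

handle_invoice_created("42", {"id": 1})
handle_invoice_created("42", {"id": 1})  # redelivered by the broker
print(len(side_effects))  # → 1
```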


7. Multiple databases: there is no atomic across them

Django supports multiple databases. It does not give you a cross-database transaction. This code is a lie:

# Does NOT make the two writes atomic
with transaction.atomic(using="default"):
    with transaction.atomic(using="analytics"):
        Order.objects.using("default").create(...)
        AnalyticsEvent.objects.using("analytics").create(...)

If the default commit succeeds and the analytics commit fails (or the process dies in between), you have inconsistent state across databases with no automatic recovery.
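Two independent sqlite3 connections make the failure mode concrete (a rollback stands in for the process dying between the two commits; database names are illustrative):

```python
import sqlite3

orders_db = sqlite3.connect(":memory:")
analytics_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
analytics_db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")

orders_db.execute("INSERT INTO orders VALUES (1)")
orders_db.commit()  # first commit succeeds...

analytics_db.execute("INSERT INTO events VALUES (1)")
analytics_db.rollback()  # ...process dies before the second commit

orders = orders_db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
events = analytics_db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(orders, events)  # → 1 0: inconsistent, and nothing will reconcile them
```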

Practical rules:

  • keep each business invariant anchored in one database,
  • if a flow genuinely spans databases, design it as eventual consistency: commit to the source of truth, then propagate via outbox or CDC,
  • accept that "propagate" means "retry forever until it sticks, with alerts if lag grows".

8. Decision matrix

| Pattern | Use when | Avoid when |
| --- | --- | --- |
| Plain atomic | Single DB write, no external side effects, low business risk | Any side effect leaves the DB |
| atomic + on_commit | Side effects must run after commit, brief loss is acceptable, basic retry in place | Loss is genuinely unacceptable |
| Outbox + idempotent consumers | Billing, inventory, compliance, audit, anything where partial failure is an incident | You have no consumers and never will |
| Saga / compensation | Long-running workflows across multiple services | Simple CRUD |

Do not jump to the bottom of the table by default. Outbox has operational cost: another table, another worker, another dashboard, another runbook. Use it where the business cost of a lost event exceeds that.


9. Production checklist

Before you ship any write path that has side effects, walk through this:

  1. Is every critical write anchored in a single database?
  2. Are all external side effects deferred until after commit (either via on_commit or outbox)?
  3. Are all message consumers idempotent? Do you have a dedup strategy with a concrete key?
  4. Do you track outbox lag (oldest unpublished row age) and retry rate, with alerts?
  5. Do your integration tests include concurrent writers hitting the same row?
  6. Is there a runbook for recovery from a half-commit? Who runs it at 03:00?
  7. Do you have constraints in the DB that catch the failure modes your application logic might miss?
  8. Are long-running operations (HTTP, file I/O) kept out of atomic blocks?

If any answer is "we will add it later", that is the thing that will page you.


Final verdict

transaction.atomic() gives you local transactional correctness inside one database. It does not give you global process consistency. Those are different layers, solved by different tools.

Most "committed but broken" production incidents come from conflating them - trusting one with transaction.atomic(): block to cover a process that actually spans three systems. Treat the database transaction as one small, strict boundary, move every side effect to on_commit at minimum, and reach for the outbox pattern when losing an event is unacceptable.

Get that separation right and a whole class of weird half-state bugs disappears from your backlog.



If this was useful, I write more about Django architecture, backend design, and production consistency at rafalfuchs.dev/en/blog. The original version of this post lives here.
