Why Soft Delete Can Backfire on Data Consistency

Mrakdon.com

“Soft delete is a safety net, but if you’re not careful it can become a data integrity nightmare.”

Introduction

Many modern applications need to preserve historical records while still allowing users to “remove” data. The most common pattern is soft delete: instead of physically deleting a row, we set a flag (deleted_at, is_deleted, etc.) and treat the record as invisible in the UI.

At first glance this looks like a win‑win:

  • Auditability – you can always trace who did what.
  • Recoverability – accidental deletions are reversible.
  • Simpler business logic – no need for cascade deletes.

However, the moment you combine soft delete with unique constraints (e.g., email addresses, usernames, invoice numbers), the simplicity evaporates. A regular unique index does not differentiate between active and soft‑deleted rows, so inserting a new “active” record that collides with a “deleted” one throws a constraint violation.

Developers often respond by moving the uniqueness check to application code. Unfortunately, that opens the door to race conditions and ultimately breaks the very consistency the database was supposed to guarantee.

In this post we’ll dive deep into the problem, illustrate the pitfalls, and present the PostgreSQL partial index as an elegant, database‑level solution.

What You Will Learn

  • Why relying on application‑level validation for uniqueness is unsafe.
  • How race conditions manifest with soft delete.
  • The syntax and semantics of partial indexes in PostgreSQL.
  • A step‑by‑step implementation example, including migration scripts.
  • Best‑practice testing strategies to ensure the solution holds under concurrency.

The Soft Delete Dilemma

1. Unique Constraints vs. Soft‑Deleted Rows

Consider a simple users table:

CREATE TABLE users (
    id          SERIAL PRIMARY KEY,
    email       TEXT NOT NULL,
    deleted_at  TIMESTAMP WITH TIME ZONE,
    CONSTRAINT uq_users_email UNIQUE (email)
);

When a user deletes their account, we run:

UPDATE users SET deleted_at = now() WHERE id = 42;

Now we try to register a new user with the same email address. PostgreSQL raises:

ERROR:  duplicate key value violates unique constraint "uq_users_email"

The database treats the soft‑deleted row exactly the same as an active row because the unique index has no knowledge of the deleted_at column.

2. Shifting Validation to the Application Layer

A common workaround is:

# Python / SQLAlchemy: the racy check-then-insert pattern
# (Query.exists() yields a SQL construct, not a boolean, so use first())
active = db.session.query(User).filter(
    User.email == new_email, User.deleted_at.is_(None)).first()
if active is None:
    db.session.add(User(email=new_email))
    db.session.commit()

At first this seems fine, but under concurrent requests two processes can both find no active row, then both insert, causing a duplicate key error at commit time, or worse, silently duplicated data if the unique index has been dropped.

3. The Race Condition Explained

Process A                         Process B
---------------------------       ---------------------------
SELECT … WHERE email='x'          SELECT … WHERE email='x'
-- returns 0 rows                 -- returns 0 rows
INSERT INTO users …               INSERT INTO users …
COMMIT                            COMMIT

Both processes see no active row because the soft‑deleted one is filtered out, so both proceed to insert: a classic check‑then‑act race. The only reliable guard is a database‑enforced constraint that understands the soft‑delete flag.


Partial Indexes – The PostgreSQL Answer

What Is a Partial Index?

A partial index is an index that only includes rows matching a given WHERE predicate. PostgreSQL evaluates the predicate for every existing row when the index is created, and again whenever a row is inserted or updated, so the index only ever contains rows that satisfy it.

“Think of a partial index as a filtered view of your data that the planner can use for fast look‑ups while ignoring irrelevant rows.”
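Uniqueness is only one use case. As a quick illustration with a hypothetical orders table, a partial index can also keep a hot query path fast by indexing only the rows that matter:

-- Hypothetical example: most queries touch only unshipped orders,
-- so index just those rows and keep the index small
CREATE INDEX idx_orders_unshipped ON orders (order_date)
WHERE shipped_at IS NULL;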

Defining a Conditional Unique Index

For the users example we want uniqueness only for rows that are not soft‑deleted. First drop the unconditional constraint from the earlier definition, then create a partial unique index in its place:

ALTER TABLE users DROP CONSTRAINT uq_users_email;

CREATE UNIQUE INDEX uq_users_email_active ON users (email)
WHERE deleted_at IS NULL;

Now the database enforces:

  • Active rows (deleted_at IS NULL) must have unique emails.
  • Deleted rows are excluded from the index, so their email values can be reused.

Verifying the Behavior

-- Insert an active user
INSERT INTO users (email) VALUES ('alice@example.com');

-- Soft‑delete the user
UPDATE users SET deleted_at = now() WHERE email = 'alice@example.com';

-- Re‑use the same email – succeeds because the row is excluded from the index
INSERT INTO users (email) VALUES ('alice@example.com');

If you try to insert a duplicate while the original row is still active, PostgreSQL blocks the operation with a clear error.
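For completeness, a sketch of the failure path (dave@example.com is a hypothetical address; both rows are active):

-- The first insert succeeds; the second violates the partial unique index
INSERT INTO users (email) VALUES ('dave@example.com');
INSERT INTO users (email) VALUES ('dave@example.com');
-- ERROR:  duplicate key value violates unique constraint "uq_users_email_active"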


Full Implementation Walkthrough

1. Table Definition

CREATE TABLE users (
    id          BIGSERIAL PRIMARY KEY,
    email       TEXT NOT NULL,
    name        TEXT NOT NULL,
    deleted_at  TIMESTAMP WITH TIME ZONE,
    created_at  TIMESTAMP WITH TIME ZONE DEFAULT now(),
    updated_at  TIMESTAMP WITH TIME ZONE DEFAULT now()
);

2. Adding the Partial Unique Index

-- Migration script (e.g., using Flyway or Alembic)
BEGIN;
    CREATE UNIQUE INDEX uq_users_email_active ON users (email)
    WHERE deleted_at IS NULL;
COMMIT;
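On a large, busy table you may prefer building the index concurrently so writers aren't blocked during the build. Note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so this variant drops the BEGIN/COMMIT wrapper:

-- Builds the index without taking a long write‑blocking lock;
-- must run outside an explicit transaction
CREATE UNIQUE INDEX CONCURRENTLY uq_users_email_active ON users (email)
WHERE deleted_at IS NULL;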

3. Soft Delete Helper (SQL Function)

CREATE OR REPLACE FUNCTION soft_delete_user(p_user_id BIGINT)
RETURNS VOID AS $$
BEGIN
    UPDATE users
    SET deleted_at = now(), updated_at = now()
    WHERE id = p_user_id AND deleted_at IS NULL;
END;
$$ LANGUAGE plpgsql;
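A quick smoke test of the helper (user id 42 and the email value are hypothetical):

-- Soft‑delete user 42, freeing whatever email that row held
SELECT soft_delete_user(42);
-- A new active row may now reuse that email
INSERT INTO users (email, name) VALUES ('user42@example.com', 'New Owner');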

4. Application‑Level Insert (No Manual Uniqueness Check Needed)

# Example using SQLAlchemy (Python)
from sqlalchemy.exc import IntegrityError

new_user = User(email='bob@example.com', name='Bob')
session.add(new_user)
try:
    session.commit()
except IntegrityError as e:  # raised when the partial unique index fires
    session.rollback()
    raise ValueError('Email already in use') from e

The only source of truth for uniqueness is now the partial index – no extra code, no race conditions.
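Catching IntegrityError treats every integrity failure as a duplicate email. If the table later gains other constraints, you can disambiguate by the violated constraint's name. A minimal sketch, assuming psycopg2 as the DBAPI driver:

# Sketch, assuming psycopg2: inspect which constraint actually fired
from sqlalchemy.exc import IntegrityError

try:
    session.commit()
except IntegrityError as e:
    session.rollback()
    # psycopg2 exposes diagnostics on the underlying DBAPI exception
    if e.orig.diag.constraint_name == 'uq_users_email_active':
        raise ValueError('Email already in use') from e
    raise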


Testing Under Concurrency

To prove the solution works, simulate concurrent inserts with pgbench or a custom script:

#!/usr/bin/env bash
# concurrent_insert.sh – runs two psql sessions in parallel
psql -d mydb -c "BEGIN; INSERT INTO users (email, name) VALUES ('charlie@example.com', 'Charlie'); COMMIT;" &
psql -d mydb -c "BEGIN; INSERT INTO users (email, name) VALUES ('charlie@example.com', 'Charles'); COMMIT;" &
wait

One transaction will succeed; the other will abort with:

ERROR:  duplicate key value violates unique constraint "uq_users_email_active"

Because the constraint lives in the database, PostgreSQL serializes the conflicting operations safely.


When Partial Indexes Aren’t Enough

  • Multiple soft‑delete columns – you may need a composite predicate (see the sketch below).
  • Complex business rules – sometimes a CHECK constraint combined with a partial index is required.
  • Cross‑table uniqueness – consider a separate “lookup” table with an ON DELETE SET NULL cascade.

In such cases, the same principle applies: push the rule down to the DB, not the application.
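As a minimal sketch of the first case, assuming a hypothetical archived_at column alongside deleted_at, the predicate simply grows:

-- Hypothetical: enforce uniqueness only for rows that are neither
-- soft‑deleted nor archived
CREATE UNIQUE INDEX uq_users_email_live ON users (email)
WHERE deleted_at IS NULL AND archived_at IS NULL;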


Conclusion

Soft delete is a powerful pattern, but when you pair it with unique constraints you must avoid moving the guard rail to application code. Doing so invites race conditions and erodes data integrity.

By leveraging PostgreSQL partial indexes, you get:

  • Conditional uniqueness – active rows stay unique, deleted rows are ignored.
  • Low overhead – the index is maintained automatically and, because deleted rows are excluded, stays smaller than a full index.
  • Robust concurrency handling – the database enforces the rule atomically.

“Let the database do what it does best: guarantee consistency. Your application can then focus on delivering value.”

If you’ve encountered similar challenges or have alternative strategies (e.g., UUID surrogate keys, versioned tables), share your thoughts in the comments. Happy coding!

Top comments (2)

Pascal CESCATO • Edited

I took a different route with MariaDB: instead of soft deletes, I keep a clean customers table and write every create/update/delete into a customers_history table via triggers.
The goal isn’t restoration UX but auditability and query consistency over time.
And there’s no universal “right” pattern here — only trade-offs depending on whether you optimize for simplicity, performance, or long-term auditability.

Mrakdon.com

Great point. Segregating history into a separate table is a very robust way to handle this, and it keeps the active index small and clean.

One of the reasons I love PostgreSQL is that Partial Indexes allow us to keep the 'Soft Delete' logic in a single table without paying the penalty on unique constraints. In databases without partial indexes (or where they are less flexible), the History Table approach you mentioned is almost mandatory to maintain integrity. Thanks for sharing the alternative!