Francisco Perez

Posted on Mar 13 • Originally published at uncorreotemporal.com

Designing an Expiring Inbox System with Background Workers

#python #fastapi #postgres #backend

Temporary email systems have one defining property: everything they create must eventually disappear. That sounds simple until you need to implement it in production.

Expiring inboxes are not a UI gimmick — they are an operational requirement. Without reliable expiration, the database grows unboundedly. Anonymous inboxes pile up. Storage costs accumulate. More critically, without expiration enforcement at the ingestion layer, a mailbox that should have stopped receiving messages an hour ago might still accept them — silently, with no indication to the sender or recipient.

This article documents the expiring inbox implementation in uncorreotemporal.com, a programmable temporary email infrastructure. The system is written in Python with FastAPI, PostgreSQL, Redis, and an async SMTP layer. We'll trace the full expiration path: from the database schema, through the background worker, through the delivery safeguard, and into the failure scenarios that shaped each design decision.

The Problem with Temporary Inboxes

An inbox that expires is, by definition, an inbox with two valid states: active and inactive. Transitioning between those states cleanly is where most problems live.

Consider what "expired" means in practice:

No new messages should be delivered. The SMTP or webhook handler must reject or silently drop incoming mail.
API queries should stop returning data. A client polling an expired inbox should get a clear 404 or an empty response.
Storage should eventually be reclaimed. The inbox row and its associated messages should not live forever.

These three requirements create a tension. The first two need to be immediate and consistent: a client polling an inbox at T+0:01 after a T+0:00 expiry should not see new messages. The third is a background concern that can tolerate lag. Conflating them by deleting rows immediately on expiration would couple message ingestion to garbage collection and make correctness depend on fragile timing.

The design separates the two concerns: expiration is a state transition (is_active = False), while deletion is deferred to an explicit action, at which point PostgreSQL's foreign key cascade handles it atomically. The background worker enforces the state transition; deletion is never implicit.

Inbox Lifecycle

An inbox moves through four distinct stages:

CREATE → ACTIVE → EXPIRED → [DELETED]

1. Creation (POST /api/v1/mailboxes)

A mailbox is created with an expires_at timestamp calculated at creation time. The TTL is clamped to plan limits before the timestamp is computed:

effective_ttl = min(ttl_minutes or plan.default_ttl_minutes, plan.max_ttl_minutes)
expires_at = datetime.now(timezone.utc) + timedelta(minutes=effective_ttl)

Plan ceilings in the seed data: free plan caps at 60 minutes, pro at 1,440 (24 hours), enterprise at 10,080 (7 days). The inbox is inserted with is_active=True and immediately begins accepting mail.
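As a rough illustration of the clamp, the sketch below applies the same min(...) expression to the plan ceilings listed above. The Plan dataclass, the PLANS table, the compute_expires_at helper, and the default_ttl_minutes values are assumptions made for the example; only the ceilings come from the seed data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Plan:
    # Hypothetical stand-in for a plan row.
    default_ttl_minutes: int
    max_ttl_minutes: int


# Ceilings match the seed data; the default values are assumed for illustration.
PLANS = {
    "free": Plan(default_ttl_minutes=10, max_ttl_minutes=60),
    "pro": Plan(default_ttl_minutes=60, max_ttl_minutes=1_440),
    "enterprise": Plan(default_ttl_minutes=60, max_ttl_minutes=10_080),
}


def compute_expires_at(plan: Plan, ttl_minutes: Optional[int], now: datetime) -> datetime:
    # Fall back to the plan default, then cap at the plan maximum.
    effective_ttl = min(ttl_minutes or plan.default_ttl_minutes, plan.max_ttl_minutes)
    return now + timedelta(minutes=effective_ttl)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
requested = compute_expires_at(PLANS["free"], 500, now)   # clamped to 60 minutes
defaulted = compute_expires_at(PLANS["free"], None, now)  # uses the assumed default
print(requested.isoformat(), defaulted.isoformat())
```

Because the expression uses `or`, a requested TTL of 0 is treated like a missing value and falls back to the plan default.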

2. Message reception (core/delivery.py)

When a message arrives — whether from the dev aiosmtpd handler or the production SES/SNS webhook — the delivery layer checks both is_active and expires_at before accepting it:

now = datetime.now(timezone.utc)  # current UTC time, as in the expiry worker
result = await db.execute(
    select(Mailbox).where(
        Mailbox.address == address,
        Mailbox.is_active == True,
        Mailbox.expires_at > now,   # critical double-check
    )
)
mailbox = result.scalar_one_or_none()
if not mailbox:
    # Log message translates to "Mailbox not found or expired"
    logger.debug("Buzon no encontrado o expirado: %s", address)
    return False

Both conditions must be true. An inbox where is_active is still True but expires_at is in the past is also rejected. This is the guard against the window between actual expiry time and when the background worker next runs.

3. Expiration (core/expiry.py)

A background asyncio task running in the same process as the API marks expired inboxes inactive every 60 seconds with a bulk UPDATE:

result = await db.execute(
    update(Mailbox)
    .where(
        Mailbox.expires_at <= now,
        Mailbox.is_active == True,
    )
    .values(is_active=False)
)
await db.commit()

At this point the mailbox is logically inactive but physically still in the database with all its messages intact.

4. Deletion (explicit, cascade)

Users can explicitly delete a mailbox via DELETE /api/v1/mailboxes/{address}, which is also a soft-delete (is_active = False). When a mailbox row is hard-deleted, PostgreSQL cascades the deletion to all associated messages via the ondelete="CASCADE" constraint on the messages.mailbox_id foreign key. No application-level loop is required; the database handles it atomically in a single transaction.

Data Model for Expiration

The two tables that matter are mailboxes and messages.

class Mailbox(Base):
    __tablename__ = "mailboxes"
    __table_args__ = (
        Index("ix_mailboxes_expires_at_is_active", "expires_at", "is_active"),
        Index("ix_mailboxes_owner_id_is_active", "owner_id", "is_active"),
    )

    id:            Mapped[uuid.UUID]        # primary key
    address:       Mapped[str]              # unique, indexed
    expires_at:    Mapped[datetime]         # DateTime(timezone=True), NOT NULL
    is_active:     Mapped[bool]             # default=True, indexed
    owner_type:    Mapped[OwnerType]        # anonymous | api | mcp
    owner_id:      Mapped[uuid.UUID | None] # NULL for anonymous inboxes
    session_token: Mapped[str | None]       # NULL for api/mcp

The expires_at field is always timezone-aware (TIMESTAMPTZ in PostgreSQL). This matters: comparisons with now use UTC on both sides, eliminating the DST edge cases that plague naive datetime comparisons.
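A small standard-library sketch of why timezone-aware values matter. It is independent of the project's code, and the specific dates are arbitrary.

```python
from datetime import datetime, timedelta, timezone

utc_now = datetime(2024, 3, 13, 12, 0, tzinfo=timezone.utc)

# The same instant expressed with a +01:00 offset compares equal to the UTC value.
plus_one = timezone(timedelta(hours=1))
same_instant = datetime(2024, 3, 13, 13, 0, tzinfo=plus_one)
aware_equal = same_instant == utc_now

# Ordering a naive datetime against an aware one fails loudly instead of guessing.
naive = datetime(2024, 3, 13, 12, 0)
try:
    naive <= utc_now
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

print(aware_equal, mixed_comparison_failed)
```

Aware datetimes compare by instant, so the offset a value was created with does not affect an expires_at check.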

The composite index ix_mailboxes_expires_at_is_active on (expires_at, is_active) is what makes the expiration query efficient. Without it, the background worker issues a full table scan every 60 seconds. With it, PostgreSQL uses a B-tree range scan to find rows where expires_at <= now, then filters by is_active = True — the exact set that needs updating.

The messages table binds to mailboxes with a cascade foreign key:

class Message(Base):
    __tablename__ = "messages"
    __table_args__ = (
        Index("ix_messages_mailbox_id_received_at", "mailbox_id", "received_at"),
    )

    mailbox_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("mailboxes.id", ondelete="CASCADE"),
        nullable=False,
    )
    raw_email:   Mapped[bytes]               # LargeBinary, full RFC 2822
    attachments: Mapped[list[dict] | None]   # JSONB metadata only, not binaries
    body_text:   Mapped[str | None]
    body_html:   Mapped[str | None]
    received_at: Mapped[datetime]
    is_read:     Mapped[bool]

The raw_email column stores the complete RFC 2822 bytes of each email. Attachment binary content is not stored separately — only metadata (filename, content type, size, content-id) is stored as JSONB. This preserves the original message for future re-parsing without duplicating binary payloads.

Background Workers for Expiration

The expiration logic lives entirely in core/expiry.py and runs as a single asyncio task co-hosted with the FastAPI application process.

_stop_event: asyncio.Event | None = None
_task:       asyncio.Task | None  = None


async def _run_expiry_loop(interval_seconds: int = 60) -> None:
    logger.info("Expiry loop iniciado (intervalo: %ds)", interval_seconds)  # "Expiry loop started (interval: %ds)"
    while True:
        try:
            await _expire_mailboxes()
        except Exception as exc:
            logger.error("Error en expiry loop: %s", exc, exc_info=True)  # "Error in expiry loop"

        try:
            await asyncio.wait_for(
                _stop_event.wait(),
                timeout=interval_seconds,
            )
            break  # stop signal received
        except asyncio.TimeoutError:
            pass   # normal interval — continue

The sleep mechanism is worth examining. Rather than await asyncio.sleep(interval_seconds), the loop uses asyncio.wait_for(_stop_event.wait(), timeout=interval_seconds). A sleeping coroutine cannot be interrupted without task cancellation. An event wait can be signaled immediately. When the FastAPI lifespan shuts down, stop_expiry_task() sets the event:

async def stop_expiry_task() -> None:
    if _stop_event:
        _stop_event.set()
    if _task:
        try:
            await asyncio.wait_for(_task, timeout=5.0)
        except (asyncio.TimeoutError, asyncio.CancelledError):
            _task.cancel()

On graceful shutdown, the worker exits within milliseconds rather than waiting up to 60 seconds for a sleep to expire. The task is started from the FastAPI lifespan:

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    start_expiry_task(interval_seconds=60)
    yield
    await stop_expiry_task()
    await close_redis()
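The interruptible wait described above can be tried in isolation. The following is a minimal sketch, not the project's module: run_loop and demo are hypothetical names, and the expiry work is replaced by a timestamp append so the example runs without a database.

```python
import asyncio
import time


async def run_loop(stop_event: asyncio.Event, interval_seconds: float, ticks: list) -> None:
    while True:
        ticks.append(time.monotonic())  # stand-in for the real expiry work
        try:
            await asyncio.wait_for(stop_event.wait(), timeout=interval_seconds)
            break  # stop signal received
        except asyncio.TimeoutError:
            pass  # interval elapsed, run another cycle


async def demo() -> float:
    stop_event = asyncio.Event()
    ticks: list = []
    task = asyncio.create_task(run_loop(stop_event, 60.0, ticks))
    await asyncio.sleep(0.05)   # let the first cycle run
    started = time.monotonic()
    stop_event.set()            # what a shutdown hook would do
    await asyncio.wait_for(task, timeout=5.0)
    return time.monotonic() - started


shutdown_seconds = asyncio.run(demo())
print(f"loop stopped in {shutdown_seconds:.3f}s despite a 60s interval")
```

With a plain asyncio.sleep(60) in the loop, the same shutdown would have to cancel the task to avoid waiting out the interval.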

The expiration operation itself is a single SQL statement:

async def _expire_mailboxes() -> int:
    now = datetime.now(timezone.utc)
    async with AsyncSessionLocal() as db:
        result = await db.execute(
            update(Mailbox)
            .where(
                Mailbox.expires_at <= now,
                Mailbox.is_active == True,
            )
            .values(is_active=False)
        )
        await db.commit()
        count = result.rowcount
        if count:
            logger.info("Expirados %d buzones (ts=%s)", count, now.isoformat())  # "Expired %d mailboxes"
        return count

One asyncio task. One SQL statement per cycle. No Python-level iteration over rows, no loading messages into memory, no nested queries. The entire operation is a single round-trip to PostgreSQL, bounded by the number of rows matching the WHERE clause.
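To see the shape of this statement without a PostgreSQL instance, here is a sketch that uses SQLite from the standard library as a stand-in. The table layout is simplified, timestamps are stored as epoch seconds rather than TIMESTAMPTZ, and the addresses are made up.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def expire_mailboxes(conn: sqlite3.Connection, now: datetime) -> int:
    # One statement flips every overdue, still-active row; no per-row loop in Python.
    cur = conn.execute(
        "UPDATE mailboxes SET is_active = 0 WHERE expires_at <= ? AND is_active = 1",
        (now.timestamp(),),
    )
    conn.commit()
    return cur.rowcount


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mailboxes ("
    " address TEXT PRIMARY KEY,"
    " expires_at REAL NOT NULL,"
    " is_active INTEGER NOT NULL DEFAULT 1)"
)
conn.executemany(
    "INSERT INTO mailboxes (address, expires_at) VALUES (?, ?)",
    [
        ("old@example.test", (now - timedelta(hours=2)).timestamp()),
        ("due@example.test", (now - timedelta(minutes=1)).timestamp()),
        ("live@example.test", (now + timedelta(minutes=30)).timestamp()),
    ],
)

first_run = expire_mailboxes(conn, now)    # flips the two overdue rows
second_run = expire_mailboxes(conn, now)   # nothing left to do
still_active = [r[0] for r in conn.execute("SELECT address FROM mailboxes WHERE is_active = 1")]
print(first_run, second_run, still_active)
```

Running the statement a second time changes nothing, which is the property that makes missed or repeated cycles harmless.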

Handling Concurrency and Race Conditions

The codebase has an explicit constraint documented at the module level:

IMPORTANTE: Solo compatible con --workers 1 en uvicorn. (In English: "IMPORTANT: only compatible with --workers 1 in uvicorn.")

This constraint exists because the expiry task runs inside the asyncio event loop of a single OS process. With multiple uvicorn workers, each process starts its own asyncio.Task, and all of them execute the same UPDATE query concurrently. PostgreSQL's row-level locking means only one transaction's changes are applied per row, but you get N redundant round-trips and N lock contentions per cycle with no correctness benefit.

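Within a single worker there is exactly one expiry task, so at most one expiry UPDATE is in flight per cycle. asyncio's cooperative multitasking does not pause the rest of the application, though: while _expire_mailboxes() is suspended at await db.execute(...), the event loop keeps running request handlers and delivery coroutines, each with its own database session. Consistency between them comes from PostgreSQL's row-level locks and the delivery-side expires_at check, not from the event loop.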

The subtler race is between message delivery and the expiry window:

T+0:00  inbox expires_at = T+0:00
T+0:00  SMTP handler receives a message
T+0:01  expiry worker runs and marks is_active=False

Between T+0:00 and T+0:01, the inbox has passed its expiration time but is_active is still True. Without an additional guard, a message could be delivered into an inbox whose TTL has elapsed.

The delivery code closes this window explicitly:

select(Mailbox).where(
    Mailbox.address == address,
    Mailbox.is_active == True,
    Mailbox.expires_at > now,   # closes the expiry window
)

The background worker's state transition is an optimization — it keeps active-mailbox list queries and quota checks efficient without re-evaluating expires_at on every read. But it is not the primary enforcement mechanism. Delivery correctness does not depend on the worker having run.
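The window can be walked through with plain values. In this sketch the Mailbox dataclass and accepts_mail function are simplified stand-ins for the ORM model and the delivery query, not the project's code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Mailbox:
    # Minimal stand-in for the ORM model.
    address: str
    expires_at: datetime
    is_active: bool = True


def accepts_mail(mailbox: Mailbox, now: datetime) -> bool:
    # Mirrors the two delivery conditions: flag still set AND timestamp in the future.
    return mailbox.is_active and mailbox.expires_at > now


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
box = Mailbox("demo@example.test", expires_at=t0)

before_expiry = accepts_mail(box, t0 - timedelta(seconds=1))  # still within the TTL
in_window = accepts_mail(box, t0 + timedelta(seconds=30))     # worker has not run yet
flag_only = box.is_active                                     # what a flag-only check would report

print(before_expiry, in_window, flag_only)
```

Thirty seconds past expiry the flag alone would still admit the message, while the combined check rejects it.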

Performance Considerations

The composite index. The expiration query filters on expires_at <= now and is_active = True. The composite index (expires_at, is_active) lets PostgreSQL satisfy both conditions in one B-tree range scan. As the proportion of is_active=False rows grows over time, a partial index CREATE INDEX ... ON mailboxes (expires_at) WHERE is_active = TRUE would be more selective — it only indexes the rows the expiry worker needs to find, and it automatically shrinks as rows are expired.

Bulk UPDATE vs. per-row operations. The worker issues a single UPDATE ... WHERE ... rather than selecting expired rows and iterating. This keeps the transaction tight, eliminates N+1 round-trips, and lets PostgreSQL optimize the write path. For tables with millions of rows, this design would need batching to avoid long-held locks on large sets. PostgreSQL's UPDATE has no LIMIT clause, so each batch would select its target ids in a subquery (ORDER BY expires_at LIMIT n) and update only those. At the current scale, the single-statement approach is clean and correct.
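One possible shape for such batching is sketched here with SQLite as a stand-in so it runs anywhere. The expire_in_batches helper, the integer id column, and the batch size are assumptions. On PostgreSQL the same id-subquery pattern applies, and the subquery can also take row locks with FOR UPDATE SKIP LOCKED.

```python
import sqlite3

# Each batch picks the oldest overdue ids in a subquery and updates only those rows.
BATCH_SQL = """
UPDATE mailboxes SET is_active = 0
WHERE id IN (
    SELECT id FROM mailboxes
    WHERE expires_at <= ? AND is_active = 1
    ORDER BY expires_at
    LIMIT ?
)
"""


def expire_in_batches(conn: sqlite3.Connection, now_ts: float, batch_size: int) -> list:
    batch_counts = []
    while True:
        cur = conn.execute(BATCH_SQL, (now_ts, batch_size))
        conn.commit()  # short transactions: locks are released between batches
        if cur.rowcount == 0:
            break
        batch_counts.append(cur.rowcount)
    return batch_counts


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mailboxes ("
    " id INTEGER PRIMARY KEY,"
    " expires_at REAL NOT NULL,"
    " is_active INTEGER NOT NULL DEFAULT 1)"
)
# 25 rows expiring at t=0..24; with now=19, the first 20 are overdue.
conn.executemany("INSERT INTO mailboxes (expires_at) VALUES (?)", [(float(i),) for i in range(25)])

batches = expire_in_batches(conn, now_ts=19.0, batch_size=8)
remaining_active = conn.execute("SELECT COUNT(*) FROM mailboxes WHERE is_active = 1").fetchone()[0]
print(batches, remaining_active)
```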

60-second polling interval. The worker fires every 60 seconds with no jitter. For most use cases — inbox TTLs measured in minutes to hours — 60-second precision is entirely adequate. In multi-tenant scenarios where many inboxes share the same creation time (and therefore the same expiry time), the UPDATE touches many rows at once. Adding jitter to creation timestamps or to the polling interval would spread write load if this ever became a bottleneck.

Soft-delete and table growth. Because expiration is a flag flip and not a physical delete, the mailboxes table grows monotonically. The messages table grows faster still, given that raw_email stores full RFC 2822 bytes. Periodic hard-deletes of old is_active=False rows — relying on the CASCADE FK to clean up messages — would reclaim storage and keep index sizes bounded.

Failure Scenarios

Worker crash mid-cycle. The expiry loop wraps _expire_mailboxes() in a try/except that logs the error and continues:

try:
    await _expire_mailboxes()
except Exception as exc:
    logger.error("Error en expiry loop: %s", exc, exc_info=True)  # "Error in expiry loop"

A failed iteration does not kill the loop. The worker retries on the next cycle. Any inboxes that should have been expired during a failed cycle are caught on the next successful run — the expires_at <= now query is self-correcting across restarts.

Process restart. The expiry task holds no external state. If the uvicorn process restarts, the task restarts with it. On first run, _expire_mailboxes() catches all inboxes whose expires_at is in the past, regardless of how long the process was down. There is no watermark, no checkpoint file, no state to recover. The query is always expires_at <= now.

Delayed expiration. The maximum delay between an inbox's expires_at and when the background worker marks it inactive is one polling interval (60 seconds) plus commit time. During this window, is_active is still True, but the delivery layer's expires_at > now check prevents new messages from being accepted. From the API consumer's perspective, the inbox may still appear in list queries for up to 60 seconds after its TTL expires — a known and documented characteristic.

Orphaned inboxes. An orphaned inbox — one not expired due to the worker being down during a deployment — is recovered automatically on the next successful cycle. The stateless query design means there is no "orphan window" that persists beyond the next worker run.

Future Improvements

Distributed locks for multi-worker deployments. The single-worker constraint is the most significant architectural limitation. The direct fix is a distributed lock acquired before the UPDATE cycle — PostgreSQL advisory locks (pg_try_advisory_lock) or Redis Redlock. The worker that holds the lock runs the UPDATE; others skip their cycle. This makes the expiry task safe to co-deploy across N API processes.

Queue-based expiration. An alternative to polling is delayed message delivery. At inbox creation, publish a message to a delay queue with delivery_time = expires_at, for example a Redis Sorted Set scored by expiry timestamp or Amazon SQS with a message delay. Note that SQS caps per-message delay at 15 minutes, so longer TTLs would need re-enqueueing or a different scheduler. A consumer pops messages when they become due and marks the inbox inactive. This removes the fixed 60-second polling window and reduces database load for sparse expiration patterns.

Redis ZSET scheduling. Redis Sorted Sets are a natural primitive for TTL scheduling. Add each inbox address to a ZSET at creation with score = expires_at.timestamp(). A worker calls ZRANGEBYSCORE mailbox_expirations 0 <now> to retrieve all due expirations, updates the database, then removes the entries from the set with ZREM. Two workers may occasionally fetch the same entry, but because the UPDATE is idempotent this is safe without explicit locking. Expiration accuracy is then bounded by how often the worker polls the set, which can be sub-second because a poll only touches PostgreSQL when something is due.
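The scheduling flow can be sketched without a Redis server. The ExpirySchedule class below is an in-memory stand-in that imitates the three sorted-set operations mentioned above; it is not a Redis client, and all of its names are hypothetical.

```python
class ExpirySchedule:
    """In-memory stand-in for a Redis sorted set keyed by inbox address."""

    def __init__(self) -> None:
        self._scores = {}  # member -> score (expiry as epoch seconds)

    def add(self, member: str, score: float) -> None:
        # Roughly ZADD: insert or overwrite the member's score.
        self._scores[member] = score

    def due(self, now_ts: float) -> list:
        # Roughly ZRANGEBYSCORE key 0 now: members with score <= now, lowest score first.
        hits = [(score, member) for member, score in self._scores.items() if score <= now_ts]
        return [member for _, member in sorted(hits)]

    def remove(self, members: list) -> int:
        # Roughly ZREM: returns how many members were actually removed.
        return sum(1 for m in members if self._scores.pop(m, None) is not None)


def poll(schedule: ExpirySchedule, now_ts: float, expire) -> list:
    due = schedule.due(now_ts)
    for address in due:
        expire(address)  # in the real system this would be the database UPDATE
    schedule.remove(due)
    return due


schedule = ExpirySchedule()
schedule.add("a@example.test", 100.0)
schedule.add("b@example.test", 200.0)
schedule.add("c@example.test", 150.0)

expired = []
first_poll = poll(schedule, 160.0, expired.append)
second_poll = poll(schedule, 160.0, expired.append)  # nothing new is due
third_poll = poll(schedule, 250.0, expired.append)
print(first_poll, second_poll, third_poll)
```

The database update happens before the entry is removed, so a crash between the two steps leaves the entry in place to be retried on the next poll.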

Partial index for active inboxes. Replacing the current composite index with a partial index ON mailboxes (expires_at) WHERE is_active = TRUE would reduce index size and maintenance overhead. As inboxes expire and is_active flips, rows fall out of the partial index automatically.

Explicit hard-delete background job. The current design never hard-deletes. A low-priority job that removes mailbox rows older than N days (with cascade deleting messages) would keep the table size bounded. Given that raw_email stores complete RFC 2822 bytes, the storage impact of unbounded message accumulation is more significant here than in typical metadata-only schemas.
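A hard-delete job of this kind could look roughly like the sketch below. It uses SQLite from the standard library as a stand-in for PostgreSQL. The purge_expired helper, the cutoff value, and the simplified columns are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this; PostgreSQL enforces FKs by default
conn.executescript(
    """
    CREATE TABLE mailboxes (
        id INTEGER PRIMARY KEY,
        expires_at REAL NOT NULL,
        is_active INTEGER NOT NULL
    );
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        mailbox_id INTEGER NOT NULL REFERENCES mailboxes(id) ON DELETE CASCADE,
        raw_email BLOB NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO mailboxes (id, expires_at, is_active) VALUES (?, ?, ?)",
    [(1, 100.0, 0), (2, 900.0, 0), (3, 5000.0, 1)],
)
conn.executemany(
    "INSERT INTO messages (mailbox_id, raw_email) VALUES (?, ?)",
    [(1, b"a"), (1, b"b"), (2, b"c"), (3, b"d")],
)
conn.commit()


def purge_expired(conn: sqlite3.Connection, cutoff_ts: float) -> int:
    # Delete only inactive mailboxes that expired before the cutoff;
    # the foreign key removes their messages in the same transaction.
    cur = conn.execute(
        "DELETE FROM mailboxes WHERE is_active = 0 AND expires_at < ?",
        (cutoff_ts,),
    )
    conn.commit()
    return cur.rowcount


purged = purge_expired(conn, cutoff_ts=500.0)
remaining_messages = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(purged, remaining_messages)
```

In practice the cutoff would be something like now minus N days, and the batching concern from the performance section applies to large deletes as well.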

Conclusion

The expiring inbox system in uncorreotemporal.com separates three things that are easy to conflate: expiration enforcement, expiration state, and storage reclamation. The delivery layer enforces expiration at the timestamp level — a message cannot be delivered to an inbox past its expires_at, regardless of whether the background worker has run. The background worker maintains is_active state so that list queries and quota checks remain efficient. Storage reclamation is delegated to PostgreSQL's cascade delete, triggered only by explicit action.

The design is deliberately minimal: one asyncio task, one SQL statement per cycle, one composite index. The 60-second polling interval is coarse but sufficient for the TTLs the system serves. The single-worker constraint is documented honestly and identifies the exact upgrade path when horizontal scaling becomes necessary.

The full system is running at https://uncorreotemporal.com. Anonymous inboxes require no signup — POST /api/v1/mailboxes?ttl_minutes=1 creates a live inbox with a 60-second TTL and returns the expires_at timestamp, letting you observe the full expiration lifecycle in real time.
