137Foundry

Posted on Jun 21

Five Patterns for Making Data Integration Operations Safe to Retry

#api #productivity #programming

Every data integration pipeline has to handle retries, because every network boundary eventually produces a duplicate delivery. The patterns below are the five that show up most often in production-grade integration code, each with a clear use case and a clear set of trade-offs. The right choice depends on the operation shape and the cost structure of the integration.

This is a roundup of the patterns, with notes on when each one fits best and when to reach for a different tool.

1. The Idempotency Key Pattern

This is the most common and most general pattern. The sender generates one stable UUID per logical operation and includes it with every delivery attempt. The receiver records the UUIDs it has processed and discards any delivery whose UUID it has already seen.

Use case: any operation that produces side effects (creates, charges, sends, allocates) and needs to be safe under retries.

Trade-off: requires a dedup store at the receiver with a retention window longer than the maximum retry window. Storage cost scales with message volume.

The Wikipedia overview of idempotence covers the underlying property. The canonical implementation in production code is Stripe's "Idempotency-Key" header, which is the model most homegrown implementations follow.

The key implementation detail: generate the UUID at the source-of-truth event, not at send time. A UUID generated inside the sender function is unique per call, not unique per logical operation, which defeats the entire pattern.
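A minimal Python sketch of the pattern, assuming an in-memory dedup store. The names `OrderPlaced`, `Receiver`, and `send` are hypothetical and not taken from any library; a real receiver would persist the processed keys and expire them after the retention window.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OrderPlaced:
    """Hypothetical source-of-truth event. The key is minted once, at creation."""
    order_id: str
    amount_cents: int
    idempotency_key: str = field(default_factory=lambda: str(uuid.uuid4()))


class Receiver:
    """Hypothetical receiver with an in-memory dedup store (assumption)."""

    def __init__(self):
        self.processed = {}   # idempotency key -> result of the first delivery
        self.charges = []     # stands in for the real side effect

    def handle(self, key, amount_cents):
        if key in self.processed:
            # Duplicate delivery: return the original result, skip the side effect.
            return self.processed[key]
        self.charges.append(amount_cents)
        result = f"charge-{len(self.charges)}"
        self.processed[key] = result
        return result


def send(receiver, event):
    # The key travels with the event, so every retry reuses the same value.
    # Calling uuid.uuid4() here instead would mint a new key per attempt.
    return receiver.handle(event.idempotency_key, event.amount_cents)


receiver = Receiver()
event = OrderPlaced(order_id="o-1", amount_cents=500)
first = send(receiver, event)
retry = send(receiver, event)  # simulated retry after a lost response
```

Because the key lives on the event rather than inside `send`, the retry maps to the same dedup entry and the charge is recorded only once.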

2. The Absolute-State Pattern

Convert operations from relative changes ("increment count by 3") to absolute state ("set count to 7"). Absolute-state operations are naturally idempotent because re-applying produces the same result.
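The difference is easy to see in a few lines of Python. This is only an illustration with hypothetical function names, using a plain dictionary as the receiver's state.

```python
def apply_delta(state, sku, delta):
    # Relative operation: every delivery changes the result.
    state[sku] = state.get(sku, 0) + delta


def apply_absolute(state, sku, value):
    # Absolute operation: re-applying leaves the same end state.
    state[sku] = value


relative = {"widget": 4}
absolute = {"widget": 4}

for _ in range(2):  # the original delivery plus one duplicate
    apply_delta(relative, "widget", 3)
    apply_absolute(absolute, "widget", 7)
```

After the duplicate delivery the relative version has drifted to 10, while the absolute version still holds the intended 7.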

Use case: operations where you can express the desired end state as a value rather than a delta. Inventory levels, status flags, role assignments, configuration values.

Trade-off: messages carry more state, which costs bandwidth and may require sending fields the receiver does not change. For 50-field records where one field changed, sending all 50 is wasteful.

The Wikipedia overview of extract, transform, load processes covers the broader pattern of moving state to a target system in a retry-safe way. Absolute state is the easiest mental model for ETL because each batch overwrites the previous state.

The pattern works best for low-frequency state synchronization (nightly inventory sync) and worst for high-frequency event-stream integrations where bandwidth matters.

3. The Optimistic Concurrency / Version Check Pattern

The entity being modified carries a version number. Each write includes the expected current version, and the receiver only applies the write if the version matches.

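A retry that arrives after the original write was applied finds a higher version on the entity and is harmlessly rejected. The sender can treat that rejection as confirmation that the operation has already been applied, provided it can rule out that a different writer's update caused the mismatch. If it cannot, it has to re-read the entity and decide whether to reapply the change.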
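A minimal in-memory sketch of the version check, assuming a single writer. `Entity`, `VersionConflict`, and `write_if_version` are hypothetical names; in a database the same check would typically be a conditional update on a version column.

```python
class VersionConflict(Exception):
    """Raised when the expected version does not match the stored one."""


class Entity:
    def __init__(self, value):
        self.value = value
        self.version = 1


def write_if_version(entity, expected_version, delta):
    # Apply the relative change only if nobody has written since we read.
    if entity.version != expected_version:
        raise VersionConflict(
            f"expected v{expected_version}, found v{entity.version}"
        )
    entity.value += delta
    entity.version += 1
    return entity.version


stock = Entity(10)
write_if_version(stock, expected_version=1, delta=-3)  # original write

try:
    write_if_version(stock, expected_version=1, delta=-3)  # retry, stale version
    retry_applied = True
except VersionConflict:
    retry_applied = False
```

The retry carries the version the sender originally read, so it is rejected and the decrement is applied once.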

Use case: relative operations on entities that can carry version metadata. The Wikipedia entry on database transactions covers the broader concurrency model this pattern fits into, and standard databases like PostgreSQL support row-level version semantics via xmin or explicit version columns.

Trade-off: the sender has to read the current version before each write, which roughly doubles the number of round trips. For high-throughput integrations where reads are expensive, this becomes the bottleneck.

The pattern is more common in OLTP-style integrations (updating user records, modifying inventory) than in event-stream integrations (replicating a changelog), because OLTP already has version metadata on most entities.

4. The Saga Pattern for Multi-Step Operations

For operations that span multiple receivers and cannot be wrapped in a single transaction, the saga pattern uses local idempotent operations at each step with compensating actions to handle partial failures.

Each step in the saga is independently retry-safe via one of the other patterns (idempotency key, absolute state, version check). The saga adds a layer above the individual steps: if a later step fails, the orchestrator triggers compensating actions on the earlier steps to roll back.
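The orchestration layer can be sketched as a loop over (action, compensation) pairs. This is a simplified illustration with hypothetical names; it keeps the saga state in memory, whereas a production orchestrator would persist it so the saga can resume after a crash.

```python
def run_saga(steps, log):
    """Run steps in order; on failure, compensate completed steps in reverse.

    Each step is a (name, action, compensate) tuple. Actions and
    compensations are assumed to be individually retry-safe.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True


inventory = {"reserved": 0}


def reserve():
    inventory["reserved"] += 1


def release():
    inventory["reserved"] -= 1


def charge():
    raise RuntimeError("payment declined")  # simulated failure in step two


def refund():
    pass  # never called here, because the charge did not complete


log = []
ok = run_saga(
    [("reserve", reserve, release), ("charge", charge, refund)],
    log,
)
```

When the charge step fails, the orchestrator releases the reservation made by the earlier step, so no partial state is left behind.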

Use case: any integration that touches more than one receiver and where partial application has to be cleaned up. Common in workflow orchestration, order processing across multiple services, multi-region replication.

Trade-off: significantly more design work than single-step patterns. Compensating actions are real code that has to be written and tested. The saga state machine has to be persisted and resumable.

Wikipedia's overview of the saga pattern covers the trade-offs against the simpler but more fragile two-phase commit protocol. Writing from practitioners like Martin Fowler at martinfowler.com covers the broader pattern in the context of microservice architectures.

5. The Broker-Level Dedup Pattern

Some message brokers (notably Apache Kafka with its idempotent producer mode) handle a subset of the dedup problem at the transport layer. The producer tags what it sends with a producer ID and a per-partition sequence number, and the broker discards duplicates and rejects out-of-order sequence numbers from the same producer.
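The idea can be shown with a toy model. This is not Kafka's implementation or client API, only a sketch of sequence-number checking with a hypothetical `ToyBroker` class and a single partition.

```python
class ToyBroker:
    """Toy model of broker-side sequence checking (not a real broker)."""

    def __init__(self):
        self.log = []        # appended payloads
        self.next_seq = {}   # producer id -> next expected sequence number

    def append(self, producer_id, seq, payload):
        expected = self.next_seq.get(producer_id, 0)
        if seq < expected:
            return "duplicate"      # already appended, drop silently
        if seq > expected:
            return "out_of_order"   # a gap: reject so the producer can resend
        self.log.append(payload)
        self.next_seq[producer_id] = expected + 1
        return "appended"


broker = ToyBroker()
r1 = broker.append("p1", 0, "a")
r2 = broker.append("p1", 0, "a")  # retry of the same send
r3 = broker.append("p1", 2, "c")  # sequence 1 never arrived
```

The retried send is recognized by its sequence number and never reaches the log a second time.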

Use case: producer-to-broker duplicate prevention. Useful as a foundation layer, not a complete solution.

Trade-off: only covers the producer-to-broker hop. Consumer-side duplicates (from rebalances or consumer crashes after processing but before offset commit) still happen and still need application-level idempotency.
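The consumer-side gap can be simulated in a few lines. This is a toy at-least-once consumer with hypothetical names, not a real client library; `handled_ids` stands in for a durable application-level dedup store.

```python
def consume(log, committed_offset, handled_ids, effects, crash_at=None):
    """Process messages from committed_offset; return the new committed offset.

    If crash_at is set, the consumer "crashes" after processing that offset
    but before committing it, so the message is redelivered on restart.
    """
    for offset in range(committed_offset, len(log)):
        msg_id, payload = log[offset]
        if msg_id not in handled_ids:   # application-level idempotency
            effects.append(payload)
            handled_ids.add(msg_id)
        if offset == crash_at:
            return committed_offset     # processed, but offset never committed
        committed_offset = offset + 1
    return committed_offset


log = [("m1", "a"), ("m2", "b")]
handled_ids = set()
effects = []

# First run crashes after processing m2 but before committing its offset.
offset_after_crash = consume(log, 0, handled_ids, effects, crash_at=1)
# Restart: m2 is delivered again, and the dedup check skips it.
final_offset = consume(log, offset_after_crash, handled_ids, effects)
```

Without the `handled_ids` check, the restart would apply `m2` a second time even though the broker stored it exactly once.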

The right framing is "necessary but not sufficient." Broker-level dedup eliminates one class of duplicate and reduces the volume the application-level dedup has to handle, but the consumer side of the integration still needs one of the other patterns.

How to Choose

A rough decision flow:

State-setting operation with infrequent updates and small messages: Absolute-state. Simplest pattern, naturally idempotent, no dedup store needed.

Side-effect operation (create, charge, send) with high volume: Idempotency key generated at the source event. Most general, requires receiver dedup store but scales well.

Relative operation on an entity that already has version metadata: Optimistic concurrency with version checks. Cleaner than idempotency keys when version is already present.

Multi-step operation across multiple receivers: Saga with idempotent steps. Required when partial failure cleanup is necessary.

Producer-to-broker messaging with consumer crash risk: Broker-level dedup plus application-level idempotency at the consumer. Layered defense.

Most production integrations use a mix. A typical pipeline might use absolute-state for inventory syncs, idempotency keys for outgoing webhooks, version checks for updates to user records, and broker-level dedup as a foundation underneath all of them.

Common Mistakes Across Patterns

A few mistakes reliably cause problems regardless of which idempotency pattern is in use:

Generating idempotency keys at send time. The key has to be stable across all retries of the same logical operation, which means generating it at the source-of-truth event, not at delivery attempt.

Unbounded dedup stores. Without TTL or partitioning, the dedup table grows until it becomes the bottleneck. Add cleanup as a baseline requirement, not as a follow-up.

Treating partial-success responses as success. A receiver that returns 200 OK after the side effect but before recording the dedup key opens a window where retries duplicate the side effect: if it crashes in between, nothing records that the operation already ran. The dedup record has to be written atomically with the side effect.

Skipping the observation phase during retrofit. Adding idempotency to an existing integration in one big-bang deploy is much riskier than the multi-step sequence of "record but do not enforce" then "enforce after observation."
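Two of the points above, the atomic dedup write and the bounded dedup store, can be sketched together with Python's built-in `sqlite3` module. This assumes the side effect is a row in the same database as the dedup table; table and function names are hypothetical. A side effect in an external system cannot be covered by a local transaction this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_keys (key TEXT PRIMARY KEY, recorded_at REAL NOT NULL)"
)
conn.execute(
    "CREATE TABLE charges (key TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
)
conn.commit()


def charge_once(conn, key, amount_cents, now):
    """Return True if the charge was applied, False if the key was a duplicate."""
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed_keys (key, recorded_at) VALUES (?, ?)",
                (key, now),
            )
            conn.execute(
                "INSERT INTO charges (key, amount_cents) VALUES (?, ?)",
                (key, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: this key was already processed.
        return False


def purge_older_than(conn, cutoff):
    """Delete dedup records older than the cutoff; return how many were removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM processed_keys WHERE recorded_at < ?", (cutoff,)
        )
    return cur.rowcount


applied = charge_once(conn, "key-1", 500, now=1000.0)
duplicate = charge_once(conn, "key-1", 500, now=1005.0)
charge_count = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
purged = purge_older_than(conn, cutoff=2000.0)
```

The cutoff passed to `purge_older_than` should stay further in the past than the maximum retry window, otherwise a late retry would find its key already purged and be processed again.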

Why Pattern Choice Matters

The choice of pattern shapes the cost structure of the integration over its lifetime. Idempotency keys with a well-managed dedup store have low ongoing cost but require disciplined design at the start. Absolute-state has minimal ongoing cost but constrains the message format. Sagas have high design cost but produce the cleanest behavior for multi-step operations.

The wrong pattern usually shows up as either ongoing operational pain (cleaning up dedup tables that grew too large) or design pain (trying to add a saga compensation flow after the integration is in production). The right pattern shows up as the absence of incidents.

For more depth on these patterns and the specific design choices that make each one production-grade, https://137foundry.com covers the broader engineering practice these patterns plug into. The 137Foundry services overview covers how integration design fits with the rest of the platform work, and the longer reference on idempotency in data integration pipelines walks through the unified theory that ties the five patterns together.

The Take

There is no single right pattern. There are five well-understood ones, each with a clear fit and a clear cost. The integrations that survive in production are the ones where the pattern was chosen deliberately for the operation shape, not the ones where the team picked one pattern and tried to make every operation fit it.

The five patterns above are the toolkit. Picking the right one is the engineering work.
