Idempotency, Consistency, and Race-Safe Design
This section focuses on the critical final steps of the user journey: securely processing payments and reliably issuing digital tickets. These operations demand high integrity, fault tolerance, and consistency to prevent double-charging, lost tickets, or overselling — while also protecting sensitive financial data and preventing race conditions.
Today, we're going to architect a production-grade solution using a modern, event-driven approach. Our design is built on three key principles:
- Microservice Architecture: We'll separate our system into distinct services, each with one job.
- Saga Pattern: We'll manage the complex, multi-step purchase process as a single, coordinated workflow.
- Transactional Outbox: We'll use this pattern as the "atomic glue" that guarantees our services stay in sync, even if one of them fails.
The Saga Pattern
The Saga Pattern is a design pattern for managing distributed transactions across multiple services. Instead of using traditional ACID transactions (which don't work well across service boundaries), sagas break down complex business processes into a sequence of local transactions, each with a corresponding compensating action.
How it works:
- Each step in the saga is a local transaction within a single service
- If any step fails, the saga executes compensating transactions to undo previous steps
- This ensures eventual consistency across the entire system
- Perfect for complex workflows like: Reserve Seats → Process Payment → Generate Tickets → Send Notifications
Transactional Outbox Pattern
The Transactional Outbox Pattern solves the dual-write problem in distributed systems. When you need to update your database AND publish an event (like sending a notification), you can't do both atomically across service boundaries.
Why it's useful:
- Atomicity: Write to database and outbox table in the same transaction
- Reliability: Guarantees events are eventually published, even if the service fails
- Consistency: Ensures database state and event publishing stay in sync
- Resilience: Handles network failures, service restarts, and partial failures gracefully
How it works:
- Application writes business data AND event data to the same database transaction
- A separate process (outbox processor) reads from the outbox table
- Publishes events to message queues (Kafka, RabbitMQ, etc.)
- Marks events as processed to prevent duplicates
The central theme is separation of concerns. Let's see how it works.
The Core Architecture: Separating Our Concerns
We'll decompose our system into three primary services. The key is to define what each service is—and is not—responsible for.
-
Order Management Service: This is our "smart" orchestrator. It acts as the brain of the operation, managing the business workflow (the Saga). Its only concern is the state of an order (e.g.,
pending,paid,tickets_issued). - Payment Processing Service: This is a "dumb," single-purpose service. Its only concern is interacting with a payment gateway (like Stripe) and reporting success or failure. It knows nothing about seats or tickets.
- Ticket Issuance Service: This is another "dumb" service. Its only concern is managing inventory (seats, sections) and generating ticket records. It knows nothing about credit cards or payment intents.
This separation is what allows us to scale and maintain the system. We can update our payment provider without ever touching the ticket generation code. If the ticket service is slow, it won't block new payments from being accepted.
But how do these separate services talk to each other reliably?
The Glue: A Saga for Distributed Transactions
You can't use a single database COMMIT across three different services. So, how do you guarantee that a successful payment always results in a ticket?
This is where the Saga and Transactional Outbox patterns come in.
-
The Saga: Our "Order Management Service" will manage the purchase as a step-by-step process.
- Step 1: Tell the
Payment Serviceto process a payment. - Step 2 (if success): Tell the
Ticket Issuance Serviceto generate tickets. - Step 3 (if fail): Tell the
Ticket Issuance Service(or Reservation Service) to release the held seats.
- Step 1: Tell the
-
The Transactional Outbox: This is the mechanism that makes the Saga reliable. When the
Payment Serviceconfirms a payment, it can't just hope to send a message to theTicket Issuance Service. What if it crashes right after it saves the payment to its own database? The payment would be successful, but the ticket would never be issued.The Outbox Pattern solves this. In a single, atomic database transaction, the
Payment Servicedoes two things:- Updates the
paymentstable tostatus = 'successful'. - Inserts a message into a
transactional_outboxtable in its own database.
Because this is one transaction, it's 100% atomic. It's impossible for the payment to be marked "successful" without the "issue tickets" message being created. A separate background process (a "message relay") then reliably publishes this message from the outbox to the rest of the system.
- Updates the
How the Message Relay Works:
The message relay is a critical infrastructure component that bridges the gap between database transactions and message queues. It has two common implementation patterns:
Polling Pattern: A simple service that polls the
transactional_outboxtable every few seconds forstatus = 'pending'events, publishes them to Kafka/RabbitMQ, then marks them asstatus = 'published'. This is reliable but has higher latency (2-5 seconds).Change Data Capture (CDC) Pattern: A more advanced approach using tools like Debezium that tail the database's transaction log (WAL) and instantly convert table inserts into Kafka messages. This offers lower latency (milliseconds) but requires more sophisticated infrastructure.
Both patterns guarantee that every outbox event is eventually published, even if the service fails mid-process.
This is the ultimate separation of concerns. The Payment Service's only job is to update its own state and "leave a memo" in its outbox. It doesn't know or care who reads that memo, allowing our services to be beautifully decoupled.
The Blueprint: Our Production-Grade Database Schema
-- Create ENUM types for reusability
CREATE TYPE order_payment_status_enum AS ENUM (
'pending', -- Order created, awaiting payment
'confirmed_optimistic', -- Client-side /confirm received, awaiting webhook
'paid', -- Authoritative webhook received
'failed', -- Payment failed
'refunded' -- Payment refunded
);
CREATE TYPE payment_transaction_status_enum AS ENUM (
'initiated', -- Payment intent created
'confirmed_optimistic', -- Client-side /confirm received
'successful', -- Authoritative webhook received
'failed', -- Payment failed
'refunded' -- Payment refunded
);
CREATE TYPE ticket_status_enum AS ENUM ('pending', 'issued', 'failed');
CREATE TYPE ticket_scan_status_enum AS ENUM ('active', 'scanned', 'canceled');
-- New ENUM for outbox processing
CREATE TYPE outbox_status_enum AS ENUM ('pending', 'sent');
-- Orders table with constraints
CREATE TABLE orders (
order_id UUID PRIMARY KEY,
reservation_id UUID REFERENCES reservations(reservation_id) UNIQUE,
payment_status order_payment_status_enum NOT NULL DEFAULT 'pending',
ticket_status ticket_status_enum NOT NULL DEFAULT 'pending',
created_at TIMESTAMP DEFAULT now(),
updated_at TIMESTAMP DEFAULT now()
);
-- Payments table with transaction tracking
CREATE TABLE payments (
payment_id UUID PRIMARY KEY,
order_id UUID NOT NULL REFERENCES orders(order_id), -- NO CASCADE DELETE
payment_gateway_id VARCHAR(100),
payment_method_token TEXT NOT NULL,
payment_intent_id VARCHAR(100) NULL,
transaction_reference VARCHAR(100) NULL,
amount_minor_units INT NOT NULL,
currency CHAR(3) NOT NULL CHECK (char_length(currency) = 3),
status payment_transaction_status_enum NOT NULL DEFAULT 'initiated',
idempotency_key UUID UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT now(),
updated_at TIMESTAMP DEFAULT now()
);
-- Tickets table with seat/section exclusivity
CREATE TABLE tickets (
ticket_id UUID PRIMARY KEY,
order_id UUID NOT NULL REFERENCES orders(order_id), -- NO CASCADE DELETE
event_id UUID NOT NULL REFERENCES events(event_id),
seat_id VARCHAR(50) REFERENCES seats(seat_id) NULL,
section_id VARCHAR(50) REFERENCES sections(section_id) NULL,
owner_user_id VARCHAR(50) REFERENCES users(user_id) NULL,
qr_code_url TEXT,
status ticket_scan_status_enum NOT NULL DEFAULT 'active',
issued_at TIMESTAMP DEFAULT now(),
UNIQUE(order_id, seat_id), -- prevent duplicate seat assignment within same order
CHECK (
(seat_id IS NOT NULL AND section_id IS NULL)
OR (seat_id IS NULL AND section_id IS NOT NULL)
)
);
-- Critical constraint: prevent double-booking across orders for ACTIVE tickets
CREATE UNIQUE INDEX idx_tickets_event_seat_unique_active
ON tickets (event_id, seat_id)
WHERE seat_id IS NOT NULL AND status IN ('active', 'scanned');
-- NEW: Transactional Outbox Table
-- This table guarantees atomic event publishing
CREATE TABLE transactional_outbox (
event_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type VARCHAR(255) NOT NULL,
aggregate_id VARCHAR(255) NOT NULL,
topic VARCHAR(255) NOT NULL,
payload JSONB NOT NULL,
status outbox_status_enum NOT NULL DEFAULT 'pending',
created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);
-- Index for the message relay to efficiently find pending events
CREATE INDEX idx_outbox_pending_status ON transactional_outbox (status)
WHERE status = 'pending';
-- Indexes for performance
CREATE INDEX idx_orders_reservation_id ON orders (reservation_id);
CREATE INDEX idx_payments_order_id ON payments (order_id);
CREATE INDEX idx_tickets_event_status ON tickets (event_id, status);
CREATE INDEX idx_tickets_owner ON tickets (owner_user_id);
The API & Workflow: A Step-by-Step Breakdown
Here is the complete user flow, showing how each API call and background process works within our separated architecture.
Payment Processing APIs
1. Initiate Payment
This is the first step. The client asks the Payment Service to create a payment intent. No money is charged yet - this only creates the payment intent with Stripe.
Endpoint: POST /payments
Request Body:
{
"order_id": "order_789",
"payment_method_token": "pm_123456789",
"idempotency_key": "idem_987654321"
}
Response:
{
"payment_id": "pay_123456789",
"status": "initiated",
"payment_intent_id": "pi_stripe_abc123"
}
Associated SQL Operations:
-- 1. Check for idempotency
SELECT payment_id, status
FROM payments
WHERE idempotency_key = 'idem_987654321';
-- 2. Get server-side price (NEVER trust the client)
SELECT
r.total_amount_minor_units, r.currency
FROM orders o
JOIN reservations r ON o.reservation_id = r.reservation_id
WHERE o.order_id = 'order_789';
-- 3. [Application calls Stripe API with the server-side amount]
-- 4. Create the payment record
INSERT INTO payments (
payment_id, order_id, payment_intent_id, amount_minor_units, currency,
status, idempotency_key, created_at
) VALUES (
'pay_123456789', 'order_789', 'pi_stripe_abc123',
r.total_amount_minor_units, -- ✅ Server-validated amount
r.currency, -- ✅ Server-validated currency
'initiated', 'idem_987654321', NOW()
);
-- ⚠️ NOTE: This creates the PaymentIntent but does NOT charge the user yet.
-- The actual payment happens on the client-side when the user confirms.
Complete Payment Flow:
-
POST /payments(this endpoint): Creates PaymentIntent, returnsclient_secretto browser -
Client-Side: Browser uses
client_secretwith Stripe.js to show payment form -
Client-Side: User clicks "Pay" →
stripe.confirmCardPayment()is called → This is when money moves! -
Client-Side: On success, browser calls
POST /payments/{id}/confirm(optimistic UX) -
Server-Side: Stripe sends
POST /webhooks/payments(authoritative confirmation)
2. Payment Webhook (The Asynchronous Source of Truth)
This is the most important endpoint. It's called by Stripe's servers, not the user. It is the authoritative record of a transaction.
Endpoint: POST /webhooks/payments
Headers: X-Webhook-Signature: <HMAC_SHA256>
Hardened Implementation Logic:
- Verify Signature: Cryptographically verify the signature.
- Respond Immediately: Return a
200 OKto Stripe to acknowledge receipt. - Queue for Processing: Place the event on an internal queue.
Asynchronous Handler SQL (Processing the event from the internal queue):
This logic demonstrates our separation of concerns perfectly. The Payment Service's only job is to update its own tables and (atomically) create an outbox event. It has no knowledge of how to issue a ticket.
-- CASE 1: Payment Successful (e.g., 'payment_intent.succeeded')
BEGIN;
-- Step 1: Update the payment record idempotently.
UPDATE payments
SET
status = 'successful',
transaction_reference = 'txn_987654321', -- from webhook
updated_at = NOW()
WHERE payment_intent_id = 'pi_stripe_abc123'
AND status IN ('initiated', 'confirmed_optimistic');
-- Step 2: Update the master order status.
UPDATE orders
SET
payment_status = 'paid',
updated_at = NOW()
WHERE order_id = (SELECT order_id FROM payments WHERE payment_intent_id = 'pi_stripe_abc123')
AND payment_status IN ('pending', 'confirmed_optimistic');
-- Step 3: Update reservation status to confirmed
UPDATE reservations
SET
status = 'confirmed',
confirmed_at = NOW(),
updated_at = NOW()
WHERE reservation_id = (
SELECT reservation_id FROM orders WHERE order_id = (
SELECT order_id FROM payments WHERE payment_intent_id = 'pi_stripe_abc123'
)
)
AND status = 'pending_payment';
-- Step 4: Create the outbox event to trigger Ticket Issuance
-- This is ATOMIC with the reservation update.
INSERT INTO transactional_outbox (
aggregate_type, aggregate_id, topic, payload
)
VALUES (
'order',
(SELECT order_id FROM payments WHERE payment_intent_id = 'pi_stripe_abc123'),
'order.paid',
'{ ... webhook payload ... }'
);
COMMIT;
-- CASE 2: Payment Failed (e.g., 'payment_intent.payment_failed')
BEGIN;
-- Step 1: Update payment to failed
UPDATE payments
SET status = 'failed', updated_at = NOW()
WHERE payment_intent_id = 'pi_stripe_abc123'
AND status IN ('initiated', 'confirmed_optimistic');
-- Step 2: Update order to failed
UPDATE orders
SET payment_status = 'failed', updated_at = NOW()
WHERE order_id = (SELECT order_id FROM payments WHERE payment_intent_id = 'pi_stripe_abc123')
AND payment_status IN ('pending', 'confirmed_optimistic');
-- Step 3: Update reservation status to canceled
UPDATE reservations
SET
status = 'canceled',
canceled_at = NOW(),
updated_at = NOW()
WHERE reservation_id = (
SELECT reservation_id FROM orders WHERE order_id = (
SELECT order_id FROM payments WHERE payment_intent_id = 'pi_stripe_abc123'
)
)
AND status = 'pending_payment';
-- Step 4: Create outbox event to trigger reservation release
INSERT INTO transactional_outbox (
aggregate_type, aggregate_id, topic, payload
)
VALUES (
'order',
(SELECT order_id FROM payments WHERE payment_intent_id = 'pi_stripe_abc123'),
'order.failed',
'{ ... failure details ... }'
);
COMMIT;
3. Confirm Payment (The Optimistic UX Endpoint)
This endpoint is only for user experience. It's called by the user's browser to give them an instant success screen, without waiting for the (slower) webhook.
Endpoint: POST /payments/{payment_id}/confirm
Response:
{
"payment_id": "pay_123456789",
"status": "confirmed_optimistic",
"message": "Payment confirmed. Your tickets will be issued shortly."
}
Associated SQL Operations:
This transaction is deliberately fast and lightweight. It only updates the status to confirmed_optimistic. It does not trigger ticket issuance.
BEGIN;
-- Step 1: Mark the payment as optimistically confirmed
UPDATE payments
SET
status = 'confirmed_optimistic',
updated_at = NOW()
WHERE payment_id = 'pay_123456789'
AND status = 'initiated';
-- Step 2: Mark the order as optimistically confirmed
UPDATE orders
SET
payment_status = 'confirmed_optimistic',
updated_at = NOW()
WHERE order_id = (SELECT order_id FROM payments WHERE payment_id = 'pay_123456789')
AND payment_status = 'pending';
COMMIT;
The Saga Continues: Asynchronous Ticket Issuance
This is not an API. This is where our separation of concerns pays off. The Ticket Issuance Service has one job: listen for order.paid events and make tickets.
Trigger: Consuming an order.paid event from the message bus (which was reliably published by the Payment Service's outbox).
Associated SQL Operations (Deadlock-Proofed):
This is the big, complex transaction. By moving it to a background worker, we prevent it from ever blocking our payment flow. If it fails, it can safely retry without losing data.
BEGIN;
-- Step 1: IDEMPOTENCY CHECK - Prevent duplicate ticket generation
-- This makes the entire operation idempotent against message re-delivery
SELECT ticket_status
FROM orders
WHERE order_id = 'order_789'
FOR UPDATE; -- Lock the order row
-- [Application Logic]
-- IF ticket_status is already 'issued', COMMIT immediately and ACK the message.
-- The order is already processed - no need to generate tickets again.
-- IF ticket_status is 'pending', proceed with the rest of the logic...
-- Step 2: Lock the parent reservation
SELECT *
FROM reservations r
WHERE r.reservation_id = (
SELECT reservation_id FROM orders WHERE order_id = 'order_789'
)
AND r.status = 'confirmed'
FOR UPDATE;
-- Step 3: CRITICAL: Lock seats in a consistent order
-- This (ORDER BY s.seat_id) is the key to preventing deadlocks
SELECT *
FROM seats s
WHERE s.seat_id IN (
SELECT rs.seat_id
FROM reservation_seats rs
JOIN orders o ON rs.reservation_id = o.reservation_id
WHERE o.order_id = 'order_789'
)
AND s.status = 'reserved'
ORDER BY s.seat_id -- This deterministic ordering is vital
FOR UPDATE;
-- Step 4: Get reservation details for ticket generation
SELECT
rs.seat_id, rsg.section_id, rsg.quantity, r.event_id
FROM reservations r
LEFT JOIN reservation_seats rs ON r.reservation_id = rs.reservation_id
LEFT JOIN reservation_ga rsg ON r.reservation_id = rsg.reservation_id
WHERE r.reservation_id = (
SELECT reservation_id FROM orders WHERE order_id = 'order_789'
);
-- Step 5: Generate individual tickets (example)
INSERT INTO tickets (
ticket_id, order_id, event_id, seat_id,
section_id, qr_code_url, status, issued_at
) VALUES
('TKT-ABC123DEF456', 'order_789', 'event_456', 'A-101', NULL, '...', 'active', NOW()),
('TKT-JKL345MNO678', 'order_789', 'event_456', NULL, 'GA1', '...', 'active', NOW());
-- Step 6: Atomically update seats from 'reserved' to 'sold'
UPDATE seats
SET status = 'sold', updated_at = NOW()
WHERE seat_id IN (
SELECT rs.seat_id
FROM reservation_seats rs
JOIN orders o ON rs.reservation_id = o.reservation_id
WHERE o.order_id = 'order_789'
)
AND status = 'reserved';
-- Step 7: Atomically update reservation to 'completed'
UPDATE reservations
SET status = 'completed', updated_at = NOW()
WHERE reservation_id = (
SELECT reservation_id FROM orders WHERE order_id = 'order_789'
)
AND status = 'confirmed';
-- Step 8: Update order ticket status
UPDATE orders
SET ticket_status = 'issued', updated_at = NOW()
WHERE order_id = 'order_789';
-- Step 9: Create outbox event to notify user via email
INSERT INTO transactional_outbox (
aggregate_type, aggregate_id, topic, payload
)
VALUES (
'order',
'order_789',
'tickets.issued',
'{ ... ticket URLs and delivery info ... }'
);
COMMIT;
Supporting Operations
Ticket Status Update (Scan)
Endpoint: POST /tickets/{ticket_id}/scan
Associated SQL Operations:
BEGIN;
-- Lock the specific ticket row to prevent double-scans
SELECT ticket_id, status
FROM tickets
WHERE ticket_id = 'TKT-ABC123DEF456'
FOR UPDATE;
-- Check status and update (if active)
UPDATE tickets
SET
status = 'scanned',
scan_count = scan_count + 1,
scanned_at = NOW(),
updated_at = NOW()
WHERE ticket_id = 'TKT-ABC123DEF456'
AND status = 'active';
COMMIT;
Event Cancellation Refunds (Bulk Refunds)
Endpoint: POST /refunds/event-cancellation
Associated SQL Operations:
-- This query is CRITICAL: It selects the `transaction_reference`
-- which is the *actual charge ID* needed by the gateway to process a refund.
SELECT
p.payment_id, p.transaction_reference, p.amount_minor_units
FROM payments p
JOIN orders o ON p.order_id = o.order_id
JOIN reservations r ON o.reservation_id = r.reservation_id
WHERE r.event_id = 'event_456'
AND p.status = 'successful'
AND p.transaction_reference IS NOT NULL;
(The application then loops over these results, calls the gateway for each refund, and updates the payments and orders tables to status = 'refunded'.)
Verify Anonymous Refund Request
Endpoint: POST /refunds/verify
Associated SQL Operations:
-- 1. Validate order exists and is eligible for refund
SELECT
o.order_id, o.payment_status
FROM orders o
JOIN reservations r ON o.reservation_id = r.reservation_id
JOIN payments p ON o.order_id = p.order_id
WHERE o.order_id = 'order_789'
AND o.payment_status = 'paid'
AND p.status = 'successful';
-- 2. [Application Logic: Check rate limits]
-- 3. Generate and store verification code
INSERT INTO refund_verifications (
verification_id, order_id, contact_method,
contact_value, verification_code,
expires_at, created_at
) VALUES (
'ver_001', 'order_789', 'email',
'user@example.com', '123456', -- Securely generated OTP
NOW() + INTERVAL '15 minutes', NOW()
);
Our Design's Superpowers
By strictly separating concerns and using the Saga + Transactional Outbox patterns, our system is now:
-
Resilient: If the
Ticket Issuance Servicefails, payments are still safely accepted. Theorder.paidevents will simply wait in the outbox to be processed when the service recovers. - Consistent: We have an atomic guarantee that a committed payment will always result in an event to generate a ticket. It's impossible to charge a customer and forget to issue their ticket.
-
Scalable: We can add 100 more
Ticket Issuanceworkers to handle a massive on-sale event without ever needing to touch thePayment Service. -
Maintainable: Our services are small and focused. Our
Payment Serviceteam can work on compliance and gateways, while theTicket Issuanceteam can optimize for inventory and concurrency. They don't need to be in the same meetings.
Further Improving Consistency: The Reconciliation Service
While our Saga + Transactional Outbox patterns provide strong consistency guarantees, production systems can benefit from an additional layer of validation. A Reconciliation Service can further enhance data consistency across all system components by continuously monitoring and validating that all financial transactions, seat allocations, and ticket issuances are properly synchronized.
Why it's valuable as an afterthought:
- Financial Accuracy: Ensures payment amounts match between our system and payment gateways
- Inventory Consistency: Verifies seat allocations match between reservations and actual seat status
- Ticket Integrity: Confirms all issued tickets correspond to valid, paid orders
- Audit Compliance: Provides complete audit trails for regulatory requirements
- Discrepancy Detection: Identifies and flags inconsistencies before they impact customers
What it reconciles:
- Payment Gateway vs Database: Stripe/PayPal transaction records vs our payment records
- Reservation vs Seat Status: Reserved seats in our system vs actual seat availability
- Order vs Ticket Count: Number of tickets issued vs order quantities
- Revenue vs Transactions: Total revenue calculations vs individual payment sums
- Refund Consistency: Refund amounts vs original payment amounts
How it works:
- Scheduled Reconciliation: Runs every 15 minutes to check recent transactions
- Real-time Monitoring: Continuously validates critical operations (payments, ticket generation)
- Cross-system Validation: Compares data across payment gateways, databases, and external systems
- Automated Correction: Fixes minor discrepancies automatically (e.g., status updates)
- Alert Generation: Flags major discrepancies for manual investigation
- Audit Reporting: Generates comprehensive reports for compliance and analysis
This reconciliation service acts as a safety net, providing an additional layer of confidence that our distributed system maintains data integrity even under extreme load and complex failure scenarios.
Top comments (0)