DEV Community

Cover image for Writes done Right : Atomicity and Idempotency with Redis, Lua, and Go
Yashaswi Kumar Mishra
Yashaswi Kumar Mishra

Posted on

Writes done Right : Atomicity and Idempotency with Redis, Lua, and Go

Life would have been easy if the world were filled with monolithic functions. Simple functions that execute once, and if they crash, we could try again.
But we build distributed systems and life isn’t that easy (but it is fun).

A classic scenario is : A user on our system clicks “Pay Now”. The backend needs to do two things :

  • Deduct the balance from the user’s wallet (Postgres).
  • Publish an event to send a confirmation email and notify the warehouse (Redis/Kafka).

This looks simple enough in code. But what if the database commits the transaction but the network acts up before the event is published onto the communication queue?
The user is charged, but the email is never sent, and so the warehouse never ships the item.
And if we try reversing it, the email is sent, but we do not get the payment.

This is the Dual Write Problem, and it is the silent killer of data integrity in microservices. To solve it, we cannot rely on luck; we need two architectural pillars: Atomicity and Idempotency.

In this one, we are going to move beyond the "happy path." We will architect a robust backend system using Go, Redis, and Postgres. We will implement the Transactional Outbox Pattern to guarantee that our database writes and event publications happen together (Atomicity), and we will use Redis to ensure that even if a user clicks the button ten times, they are only charged once (Idempotency).

You can find the complete, runnable source code for this implementation on my GitHub:
HERE

The Transactional Outbox Pattern

We have to solve the “Dual Write Problem”. For this, we need to stop treating our database and our message broker as two separate entities during the request lifecycle. They need to behave as one.

Since we cannot make Redis (our message broker here) participate in a Postgres transaction (without complex and slow 2-phase commits), we have to bring the message queue to the database.

This is the Transactional Outbox Pattern.

Instead of publishing a message directly to Redis, we insert the message into a local SQL table called outbox within the same transaction that modifies our business data.
This is an outgoing mailbox. We simply place the mail here and it is taken to the post-office (redis) by a mail carrier (background worker).
If the mailbox burns down (database rollback), the letter is burnt with that, so letter is never delivered and the recipient is not confused.
This preserves Consistency.

Our system needs to behave like this :

HLD

Let us set our schema up for our business table (orders) and our outbox (i.e the mailbox).

-- 1. The Business Table: Stores the actual state
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    amount INT NOT NULL, 
    status VARCHAR(50) DEFAULT 'PENDING',
    created_at TIMESTAMP DEFAULT NOW()
);

-- 2. The Outbox Table: Stores the intent to publish
CREATE TABLE outbox (
    id UUID PRIMARY KEY,
    event_type VARCHAR(255) NOT NULL, -- e.g., "order.created"
    payload JSONB NOT NULL,           -- The actual data to publish
    status VARCHAR(50) DEFAULT 'PENDING',
    created_at TIMESTAMP DEFAULT NOW()
);
Enter fullscreen mode Exit fullscreen mode

As we have a clear picture of what to do now, we can write the go code to perform the operation. Let’s write the CreateOrder function.

type Order struct {
    ID     uuid.UUID `json:"id"`
    UserID uuid.UUID `json:"user_id"`
    Amount int       `json:"amount"`
}

func CreateOrder(ctx context.Context, db *sql.DB, order Order) error {
    // 1. Start the Transaction
    // This is the boundary of our atomicity.
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return fmt.Errorf("failed to begin transaction: %w", err)
    }

    // Defer a rollback. If the function exits without a Commit,
    // all changes (both order and outbox) are discarded.
    defer tx.Rollback()

    // 2. Insert the Business Record
    _, err = tx.ExecContext(ctx, 
        `INSERT INTO orders (id, user_id, amount) VALUES ($1, $2, $3)`,
        order.ID, order.UserID, order.Amount)
    if err != nil {
        return fmt.Errorf("failed to insert order: %w", err)
    }

    // 3. Insert the Outbox Record
    // We marshal the order data to JSON to serve as the event payload.
    payload, err := json.Marshal(order)
    if err != nil {
        return fmt.Errorf("failed to marshal payload: %w", err)
    }

    _, err = tx.ExecContext(ctx,
        `INSERT INTO outbox (id, event_type, payload) VALUES ($1, $2, $3)`,
        uuid.New(), "order.created", payload)
    if err != nil {
        return fmt.Errorf("failed to insert outbox event: %w", err)
    }

    // 4. Commit the Transaction
    // This is the moment of truth. Both the order and the event become visible
    // to the rest of the system at the exact same instant.
    if err := tx.Commit(); err != nil {
        return fmt.Errorf("failed to commit transaction: %w", err)
    }

    return nil
}
Enter fullscreen mode Exit fullscreen mode

In this function, we aren't talking to Redis yet. We are strictly talking to Postgres. We prepare the order, we prepare the message, and we commit them together. All or nothing. (This is Atomicity).

By the end of this function, we have guaranteed that if an order exists, a corresponding event exists in the outbox table waiting to be picked up. We have successfully persisted our intent.

The Guard : Implementing Idempotency

In distributed systems, we cannot really trust the network.

A client might send a request, the server processes it, but the acknowledgment gets lost on the way back. The client, thinking the request failed, retries.

If we aren't careful, we just processed the same order twice.

To prevent this, we use Idempotency Keys. The client generates a unique ID (like a UUID) for every specific action and sends it in the header (Idempotency-Key).

We will use Redis to track these keys because it's fast and supports atomic operations.

Three States of a Request

To handle the robustness, a request key (idempotency key) isn’t just "present" or "absent.” It has three life stages :

  • Null (New): We have never seen this key. Lock it and proceed.
  • "PENDING" (In Progress): We are currently processing this key on another thread/machine. Tell the user to wait (409 Conflict).
  • JSON Payload (Completed): We finished this request 5 minutes ago. Return the saved response immediately.

For this, we will need Lua. WHY?

You might be tempted to write code like this :

// BAD CODE: DO NOT USE
if redis.Get(key) == "" {
    redis.Set(key, "PENDING")
    // ... process ...
}
Enter fullscreen mode Exit fullscreen mode

This is dangerous. If two requests hit the same line at the exact same microsecond, both will see Get(key) == "" and both will proceed. This is a Race Condition.

Here is the disaster scenario:

  • Time 0.00s: Request A asks Redis: "Does Key X exist?" → Redis says: "No."
  • Time 0.00s: Request B asks Redis: "Does Key X exist?" → Redis says: "No."
  • Time 0.01s: Request A thinks: "Great, I'm first!" → Starts processing payment.
  • Time 0.01s: Request B thinks: "Great, I'm first!" → Starts processing payment.

You just charged the user twice.
The solution is Atomic Lock and Proceed.
To fix this, the "Check" and the "Lock" must happen as one indivisible action. In Redis, we do this with a Lua script. Redis guarantees that a Lua script is atomic, no other command runs while the script is executing.

The following is the Lua Script we write :

-- script.lua
local key = KEYS[1]
local pending_status = ARGV[1] -- Value: "PENDING"
local ttl = ARGV[2]            -- Expiration: e.g., 60s

-- 1. Check if key exists
local value = redis.call("GET", key)

-- 2. If it exists, return the value 
-- (This could be "PENDING" or the final JSON response)
if value then
    return value 
end

-- 3. If not, lock it with "PENDING" status and a TTL
-- The TTL is crucial: If our server crashes mid-process, 
-- this key must eventually expire so the user can retry.
redis.call("SET", key, pending_status, "EX", ttl)
return nil
Enter fullscreen mode Exit fullscreen mode

Why the TTL (Time-To-Live) matters?

Imagine your Go server crashes exactly after acquiring the lock (setting "PENDING") but before it writes to the database.

  • The key is set to "PENDING".
  • The server is dead. It never deletes the key or updates it with a result.
  • The user retries. Redis sees "PENDING" and blocks them.
  • The user is locked out of their account forever.

The TTL is the "Dead Man's Switch." If the server dies, Redis will automatically delete the "PENDING" key after 60 seconds, allowing the user to try again.

Now we can wrap the business logic with our guard.

func HandleCreateOrder(w http.ResponseWriter, r *http.Request, db *sql.DB, rdb *redis.Client) {
    // 1. Get the Idempotency Key from headers
    idempotencyKey := r.Header.Get("Idempotency-Key")
    if idempotencyKey == "" {
        http.Error(w, "Missing Idempotency-Key", http.StatusBadRequest)
        return
    }

    // 2. Execute the Lua Guard
    ctx := r.Context()
    // Returns: nil (acquired lock), "PENDING", or "{"id":...}"
    val, err := rdb.Eval(ctx, luaScript, []string{idempotencyKey}, "PENDING", 60).Result()

    if err != nil && err != redis.Nil {
        http.Error(w, "Internal Server Error", http.StatusInternalServerError)
        return
    }

    // CASE A: Request is already running (Thundering Herd Protection)
    // If the user double-clicked, we stop the second request here.
    if val == "PENDING" {
        http.Error(w, "Request is processing, please retry shortly", http.StatusConflict)
        return
    }

    // CASE B: Request was already completed
    if val != nil {
        // We found a cached response! Return it and skip the DB entirely.
        w.Header().Set("Content-Type", "application/json")
        w.Write([]byte(fmt.Sprintf("%v", val)))
        return
    }

    // CASE C: New Request (val is nil)
    // We have acquired the lock. logic proceeds...

    orderID := uuid.New()
    order := Order{ID: orderID, UserID: uuid.New(), Amount: 1000}

    // --- CALL THE ATOMIC TRANSACTION WE WROTE EARLIER ---
    if err := CreateOrder(ctx, db, order); err != nil {
        // CRITICAL: If the DB transaction fails, we MUST delete the key.
        // If we don't, the key stays "PENDING" until it expires (60s), 
        // locking the user out.
        rdb.Del(ctx, idempotencyKey)
        http.Error(w, "Transaction failed", http.StatusInternalServerError)
        return
    }

    // 3. Update Redis with the Final Result
    // The transaction succeeded. We overwrite "PENDING" with the actual success JSON.
    response := fmt.Sprintf(`{"status": "success", "order_id": "%s"}`, orderID)

    // Cache this result for 24 hours so retries tomorrow still get the receipt.
    rdb.Set(ctx, idempotencyKey, response, 24*time.Hour) 

    w.Header().Set("Content-Type", "application/json")
    w.Write([]byte(response))
}
Enter fullscreen mode Exit fullscreen mode

Now we have a fortress.

  • The Database is protected by SQL Transactions (Atomicity).
  • The API is protected by Redis Lua Scripts (Idempotency).

But what about those pending events in the outbox? The emails haven’t been sent.
The final piece of the puzzle is The Background Worker, the process that polls the outbox table and actually delivers the messages to the world.

Let us write the worker.

The Background Worker

Our CreateOrder function has finished. The user is happy, the order is saved, and the event is sitting safely in the outbox table in Postgres.

But the warehouse doesn't know about it yet.

We need a background process, a "Courier", that constantly checks the mailbox and delivers the messages to Redis.

Scaling the Worker

If we only run one instance of your backend, a simple SELECT * FROM outbox WHERE status = 'PENDING' works fine.

But in production, you likely run 5 or 10 replicas of your service. If they all query the database at the same time, they will all grab the same events and try to publish them simultaneously. We’ll spam Redis with duplicate messages.

For this, we will be using : FOR UPDATE SKIP LOCKED.
This clause tells Postgres : "Lock these rows for me so no one else can touch them. If a row is already locked by another worker, just skip it and give me the next one.

Let us implement the worker in Go now :

func StartOutboxWorker(ctx context.Context, db *sql.DB, rdb *redis.Client) {
    ticker := time.NewTicker(500 * time.Millisecond)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            processBatch(ctx, db, rdb)
        }
    }
}

func processBatch(ctx context.Context, db *sql.DB, rdb *redis.Client) {
    // 1. Start a transaction for the worker
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        log.Printf("Worker failed to begin tx: %v", err)
        return
    }
    defer tx.Rollback()

    // 2. Fetch pending events with SKIP LOCKED
    // This allows multiple workers to run in parallel without conflict.
    rows, err := tx.QueryContext(ctx, `
        SELECT id, event_type, payload 
        FROM outbox 
        WHERE status = 'PENDING' 
        ORDER BY created_at ASC 
        LIMIT 10 
        FOR UPDATE SKIP LOCKED
    `)
    if err != nil {
        log.Printf("Worker query failed: %v", err)
        return
    }
    defer rows.Close()

    for rows.Next() {
        var id uuid.UUID
        var eventType string
        var payload []byte

        if err := rows.Scan(&id, &eventType, &payload); err != nil {
            continue
        }

        // 3. Publish to Redis
        // We use the event_type as the channel name (e.g., "order.created")
        err := rdb.Publish(ctx, eventType, payload).Err()
        if err != nil {
            // If Redis is down, we just abort. 
            // The transaction rolls back, and we retry next tick.
            log.Printf("Failed to publish event %s: %v", id, err)
            return 
        }

        // 4. Mark as Processed (or Delete)
        // Since we published successfully, we update the status.
        _, err = tx.ExecContext(ctx, 
            "UPDATE outbox SET status = 'PROCESSED' WHERE id = $1", id)
        if err != nil {
            return
        }
    }

    // 5. Commit the batch
    if err := tx.Commit(); err != nil {
        log.Printf("Worker failed to commit: %v", err)
    }
}
Enter fullscreen mode Exit fullscreen mode

Why this is robust:

  1. At-Least-Once Delivery: If Redis fails (network blip), the rdb.Publish returns an error. We return immediately, the SQL transaction rolls back, and the event stays "PENDING". We will try again in 500ms.
  2. Concurrency Safe: Thanks to SKIP LOCKED, you can scale this worker to 100 instances, and they will never process the same event twice.

You can check out the redis consumer in the codebase.

Conclusion

And actually that is it.

By combining the Transactional Outbox Pattern with Idempotency, we’ve brought that simplicity of a monolithic function back.

We no longer have to worry about ghost orders or lost emails. If our database commits, our event is guaranteed to publish. If our server crashes, our Redis lock ensures we pick up right where we left off.

Reliability isn't about hoping things don't break; it's about building systems that handle the breakages gracefully.

Top comments (0)