Young Gao

Distributed Locking: Preventing Race Conditions Across Microservices (2026 Guide)

Every backend engineer eventually faces the moment: two requests hit your system at the exact same millisecond, and suddenly a customer is charged twice, a seat is double-booked, or inventory goes negative. In a monolith, you might get away with a database transaction or an in-process mutex. In a distributed system with multiple service instances, you need something stronger. You need distributed locks.

This article walks through the practical patterns for distributed locking -- from the simple Redis SET NX to PostgreSQL advisory locks to optimistic concurrency control -- with production-ready code in TypeScript and Go. We'll cover when each pattern shines, where it breaks, and how to choose.

Why Distributed Locks Exist

Consider a payment service running three replicas behind a load balancer. A user clicks "Pay" and a network hiccup causes a retry. Two replicas each receive the request within microseconds. Both read the order status as "pending," both charge the card, both mark it "paid." The customer sees two charges.

Or picture an event ticketing system. Two users try to book the last seat simultaneously. Both services check availability, both see one seat remaining, both confirm the booking. You've now sold a seat that doesn't exist.

These aren't theoretical edge cases. They happen in production at scale, and they cost real money. The core problem: multiple processes need to coordinate access to a shared resource without a shared memory space.

A distributed lock provides mutual exclusion across process boundaries. Only one holder can acquire the lock at a time. Everyone else either waits or fails fast.
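To make the failure mode concrete, here is a self-contained simulation of the check-then-act race -- no real services, just two concurrent handlers sharing state (names are illustrative):

```typescript
// A self-contained simulation of the race: two "replicas" both read the
// seat count before either one writes. (Illustrative, in-memory only.)
let seatsLeft = 1;
const bookings: string[] = [];

async function bookSeat(user: string): Promise<boolean> {
  const available = seatsLeft; // read
  await new Promise<void>((r) => setTimeout(r, 10)); // simulated I/O gap
  if (available > 0) {
    seatsLeft = available - 1; // write based on a now-stale read
    bookings.push(user);
    return true;
  }
  return false;
}

async function demo() {
  const results = await Promise.all([bookSeat("alice"), bookSeat("bob")]);
  return { results, bookings: bookings.length }; // one seat, two bookings
}
```

Both callers read `seatsLeft` as 1 before either decrements it, so both bookings succeed. A distributed lock closes the gap between the read and the write.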

Redis-Based Locks: The SET NX EX Pattern

The simplest and most widely deployed distributed lock uses a single Redis instance with the atomic SET command:

import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL);

interface LockResult {
  acquired: boolean;
  release: () => Promise<boolean>;
}

async function acquireLock(
  key: string,
  ttlMs: number,
  ownerId: string
): Promise<LockResult> {
  const lockKey = `lock:${key}`;

  // SET NX ensures atomicity -- only one caller wins
  const result = await redis.set(lockKey, ownerId, "PX", ttlMs, "NX");

  if (result !== "OK") {
    return { acquired: false, release: async () => false };
  }

  // Release via Lua script to ensure only the owner can unlock
  const release = async (): Promise<boolean> => {
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    const removed = await redis.eval(script, 1, lockKey, ownerId);
    return removed === 1;
  };

  return { acquired: true, release };
}

Usage in a payment handler:

import { randomUUID } from "crypto";

async function processPayment(orderId: string, amount: number) {
  const ownerId = randomUUID();
  const lock = await acquireLock(`payment:${orderId}`, 10_000, ownerId);

  if (!lock.acquired) {
    throw new Error("Payment already being processed");
  }

  try {
    const order = await db.orders.findById(orderId);
    if (order.status !== "pending") {
      return { status: "already_processed" };
    }

    const charge = await paymentGateway.charge(order.customerId, amount);
    await db.orders.update(orderId, {
      status: "paid",
      chargeId: charge.id,
    });

    return { status: "success", chargeId: charge.id };
  } finally {
    await lock.release();
  }
}

The critical details:

  • NX (Not eXists) makes the SET atomic. Only one caller wins the race.
  • PX (expiry in milliseconds) prevents deadlocks if the holder crashes.
  • The Lua release script checks ownership before deleting. Without this, a slow process A could have its lock expire, process B acquires it, then process A finishes and deletes B's lock.
  • Owner ID must be unique per acquisition, not per process. A UUID works.

The same pattern in Go:

package distlock

import (
    "context"
    "crypto/rand"
    "encoding/hex"
    "errors"
    "time"

    "github.com/redis/go-redis/v9"
)

// Sentinel errors returned by Acquire and Release.
var (
    ErrLockNotAcquired     = errors.New("distlock: lock not acquired")
    ErrLockAlreadyReleased = errors.New("distlock: lock already released or expired")
)

type Lock struct {
    client  *redis.Client
    key     string
    ownerID string
}

func Acquire(ctx context.Context, client *redis.Client, key string, ttl time.Duration) (*Lock, error) {
    ownerID := generateOwnerID()
    lockKey := "lock:" + key

    ok, err := client.SetNX(ctx, lockKey, ownerID, ttl).Result()
    if err != nil {
        return nil, err
    }
    if !ok {
        return nil, ErrLockNotAcquired
    }

    return &Lock{client: client, key: lockKey, ownerID: ownerID}, nil
}

var releaseScript = redis.NewScript(`
    if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
    else
        return 0
    end
`)

func (l *Lock) Release(ctx context.Context) error {
    result, err := releaseScript.Run(ctx, l.client, []string{l.key}, l.ownerID).Int64()
    if err != nil {
        return err
    }
    if result == 0 {
        return ErrLockAlreadyReleased
    }
    return nil
}

func generateOwnerID() string {
    b := make([]byte, 16)
    rand.Read(b)
    return hex.EncodeToString(b)
}

Limitation: Single Point of Failure

A single Redis instance means a single point of failure. If Redis goes down, all locks are lost. If Redis fails over to a replica, locks that haven't replicated yet are also lost -- meaning two processes could hold the "same" lock simultaneously.

For many workloads this is acceptable. If your lock protects an idempotent operation with a database uniqueness constraint as a safety net, a rare double-acquisition during failover is tolerable. For financial transactions, you need something stronger.

The Redlock Algorithm

Martin Kleppmann and Redis creator Salvatore Sanfilippo had a famous public debate about its safety -- Kleppmann argued that Redlock's guarantees rest on timing assumptions that distributed systems can't always honor -- but the algorithm remains a practical option when you need stronger guarantees than a single Redis node provides.

Redlock uses N independent Redis instances (typically 5) and requires a majority quorum:

import Redis from "ioredis";
import { randomUUID } from "crypto";

class Redlock {
  private nodes: Redis[];
  private quorum: number;

  constructor(urls: string[]) {
    this.nodes = urls.map((url) => new Redis(url));
    this.quorum = Math.floor(this.nodes.length / 2) + 1;
  }

  async acquire(
    resource: string,
    ttlMs: number
  ): Promise<{ acquired: boolean; release: () => Promise<void> }> {
    const ownerId = randomUUID();
    const lockKey = `lock:${resource}`;
    const startTime = Date.now();

    // Try to acquire on all nodes in parallel
    const results = await Promise.allSettled(
      this.nodes.map((node) =>
        node.set(lockKey, ownerId, "PX", ttlMs, "NX")
      )
    );

    const acquired = results.filter(
      (r) => r.status === "fulfilled" && r.value === "OK"
    ).length;

    const elapsed = Date.now() - startTime;
    const remainingTtl = ttlMs - elapsed;

    // Need majority AND enough remaining TTL to do useful work
    if (acquired >= this.quorum && remainingTtl > ttlMs * 0.1) {
      return {
        acquired: true,
        release: () => this.releaseAll(lockKey, ownerId),
      };
    }

    // Failed -- release any locks we did acquire
    await this.releaseAll(lockKey, ownerId);
    return { acquired: false, release: async () => {} };
  }

  private async releaseAll(key: string, ownerId: string): Promise<void> {
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    await Promise.allSettled(
      this.nodes.map((node) => node.eval(script, 1, key, ownerId))
    );
  }
}

Key properties of Redlock:

  1. Quorum-based: tolerates minority node failures.
  2. Clock-dependent: the validity window shrinks by the time spent acquiring. If acquisition takes too long, the lock is considered invalid.
  3. No fencing: a slow lock holder can still cause issues after TTL expires.

In practice, most teams use the battle-tested redlock npm package or Go's redsync rather than rolling their own. The algorithm has subtle timing requirements that are easy to get wrong.

Lock Renewal with Heartbeats

A fixed TTL creates a tension: too short and the lock expires before work completes; too long and a crashed holder blocks everyone. The solution is automatic renewal -- a background heartbeat that extends the lock while the holder is alive:

type RenewableLock struct {
    *Lock
    cancel context.CancelFunc
    done   chan struct{}
}

func AcquireWithRenewal(
    ctx context.Context,
    client *redis.Client,
    key string,
    ttl time.Duration,
) (*RenewableLock, error) {
    lock, err := Acquire(ctx, client, key, ttl)
    if err != nil {
        return nil, err
    }

    renewCtx, cancel := context.WithCancel(ctx)
    done := make(chan struct{})

    go func() {
        defer close(done)
        ticker := time.NewTicker(ttl / 3) // Renew at 1/3 of TTL
        defer ticker.Stop()

        for {
            select {
            case <-renewCtx.Done():
                return
            case <-ticker.C:
                err := renew(renewCtx, client, lock.key, lock.ownerID, ttl)
                if err != nil {
                    // Lost the lock -- log and exit
                    return
                }
            }
        }
    }()

    return &RenewableLock{Lock: lock, cancel: cancel, done: done}, nil
}

var renewScript = redis.NewScript(`
    if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("pexpire", KEYS[1], ARGV[2])
    else
        return 0
    end
`)

func renew(ctx context.Context, client *redis.Client, key, ownerID string, ttl time.Duration) error {
    result, err := renewScript.Run(ctx, client, []string{key}, ownerID, ttl.Milliseconds()).Int64()
    if err != nil {
        return err
    }
    if result == 0 {
        return ErrLockAlreadyReleased
    }
    return nil
}

func (rl *RenewableLock) Release(ctx context.Context) error {
    rl.cancel()  // Stop the renewal goroutine
    <-rl.done    // Wait for it to finish
    return rl.Lock.Release(ctx)
}

The renewal interval of TTL/3 gives two retry windows before expiry. If the process hangs or the network partitions, the heartbeat stops, and the lock expires naturally.

This is exactly how Redisson (Java) and many production lock libraries work internally.
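The same heartbeat shape can be sketched in TypeScript. This version is kept generic so it isn't tied to a particular client: `renew` would be wired to the pexpire-based Lua script shown above (names here are illustrative, not a library API):

```typescript
// Generic heartbeat loop: call `renew` every intervalMs and stop as soon as
// renewal fails (lock lost) or stop() is called by the holder.
function startHeartbeat(
  renew: () => Promise<boolean>,
  intervalMs: number
): { stop: () => void } {
  let active = true;
  const timer = setInterval(async () => {
    if (!active) return;
    const ok = await renew().catch(() => false);
    if (!ok) {
      // Renewal failed -- we no longer own the lock, so stop heartbeating
      active = false;
      clearInterval(timer);
    }
  }, intervalMs);
  return {
    stop: () => {
      active = false;
      clearInterval(timer);
    },
  };
}
```

The holder calls `stop()` before releasing, mirroring the `cancel()` / `<-done` sequence in the Go version.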

PostgreSQL Advisory Locks

If your system already depends on PostgreSQL and your lock scope aligns with database operations, advisory locks eliminate the need for external infrastructure:

import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Session-level advisory lock -- held until explicitly released or session ends
async function withAdvisoryLock<T>(
  lockId: number,
  fn: () => Promise<T>
): Promise<T> {
  const client = await pool.connect();

  try {
    // pg_try_advisory_lock returns true/false, never blocks
    const { rows } = await client.query(
      "SELECT pg_try_advisory_lock($1) AS acquired",
      [lockId]
    );

    if (!rows[0].acquired) {
      throw new Error(`Could not acquire advisory lock ${lockId}`);
    }

    try {
      return await fn();
    } finally {
      await client.query("SELECT pg_advisory_unlock($1)", [lockId]);
    }
  } finally {
    client.release();
  }
}

// Usage: derive a stable lock ID from business identifiers
function lockIdFromOrderId(orderId: string): number {
  let hash = 0;
  for (const char of orderId) {
    hash = (hash * 31 + char.charCodeAt(0)) | 0;
  }
  return Math.abs(hash);
}

// Process a refund with advisory lock protection
await withAdvisoryLock(lockIdFromOrderId("order-789"), async () => {
  const order = await db.orders.findById("order-789");
  if (order.status === "refunded") return;
  await paymentGateway.refund(order.chargeId);
  await db.orders.update("order-789", { status: "refunded" });
});

PostgreSQL also supports transaction-level advisory locks that auto-release on commit/rollback:

-- Inside a transaction: lock is released when transaction ends
SELECT pg_advisory_xact_lock(hashtext('booking:seat-42'));

-- Do your work within the same transaction
UPDATE seats SET booked_by = 'user-123' WHERE id = 'seat-42' AND booked_by IS NULL;
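Wrapped in application code, the transaction-level variant might look like this. The sketch is dependency-free: `PgLikeClient` is an illustrative stand-in for node-postgres's `PoolClient`, not the real pg types:

```typescript
// Minimal stand-in for a pg client (illustrative).
interface PgLikeClient {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Acquire pg_advisory_xact_lock inside a transaction; Postgres releases it
// automatically on COMMIT or ROLLBACK, so there is no explicit unlock step.
async function withXactLock<T>(
  client: PgLikeClient,
  resource: string,
  fn: () => Promise<T>
): Promise<T> {
  await client.query("BEGIN");
  try {
    // hashtext() maps the string resource to an advisory lock ID
    await client.query("SELECT pg_advisory_xact_lock(hashtext($1))", [resource]);
    const result = await fn();
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the lock's lifetime is exactly the transaction's lifetime, there is no way to leak it on a crashed connection.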

Advisory lock advantages:

  • No extra infrastructure. If you already have Postgres, you have advisory locks.
  • ACID-aligned. Transaction-level locks integrate naturally with your data mutations.
  • Deadlock detection. Postgres detects and breaks deadlock cycles automatically.

Advisory lock limitations:

  • Tied to a single Postgres instance (or primary in a replica set). No cross-database locking.
  • Lock IDs are 64-bit integers. You need a consistent hashing scheme for string-based resource identifiers.
  • Connection pool interactions. If using session-level locks, a released connection that returns to the pool still holds the lock until explicitly unlocked.

Optimistic Locking with Version Columns

Sometimes you don't need mutual exclusion at all. You just need to detect conflicts. Optimistic locking assumes conflicts are rare and checks at write time:

// Schema: orders table has a `version` integer column, default 1

async function updateOrderStatus(
  orderId: string,
  newStatus: string,
  expectedVersion: number
): Promise<boolean> {
  const result = await db.query(
    `UPDATE orders
     SET status = $1, version = version + 1, updated_at = now()
     WHERE id = $2 AND version = $3`,
    [newStatus, orderId, expectedVersion]
  );

  // If no rows affected, someone else modified the order
  return result.rowCount === 1;
}

// Usage with retry
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function processOrderWithRetry(
  orderId: string,
  maxRetries = 3
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const order = await db.orders.findById(orderId);

    if (order.status !== "pending") {
      return { status: "already_processed" };
    }

    const updated = await updateOrderStatus(
      orderId,
      "processing",
      order.version
    );

    if (updated) {
      // We won the race -- proceed with business logic
      await fulfillOrder(order);
      return { status: "success" };
    }

    // Conflict detected -- re-read and retry
    await sleep(Math.pow(2, attempt) * 10); // exponential backoff
  }

  throw new Error("Max retries exceeded due to concurrent modifications");
}

The Go equivalent using database/sql:

func updateOrderStatus(ctx context.Context, db *sql.DB, orderID, newStatus string, expectedVersion int) (bool, error) {
    result, err := db.ExecContext(ctx,
        `UPDATE orders SET status = $1, version = version + 1, updated_at = now()
         WHERE id = $2 AND version = $3`,
        newStatus, orderID, expectedVersion,
    )
    if err != nil {
        return false, err
    }

    rows, err := result.RowsAffected()
    return rows == 1, err
}

Optimistic locking works best when:

  • Conflicts are genuinely rare (most requests touch different resources).
  • The operation is safe to retry.
  • You want maximum throughput -- no lock acquisition overhead in the happy path.

It works poorly when contention is high, because retries cascade and amplify load.

Compare-and-Swap (CAS) Patterns

CAS generalizes optimistic locking beyond databases. Any system that supports atomic conditional writes can implement CAS. Here is a CAS pattern using DynamoDB's conditional expressions:

import { DynamoDBClient, UpdateItemCommand } from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});

async function claimTask(taskId: string, workerId: string): Promise<boolean> {
  try {
    await ddb.send(
      new UpdateItemCommand({
        TableName: "tasks",
        Key: { id: { S: taskId } },
        UpdateExpression:
          "SET #status = :claimed, worker_id = :worker, claimed_at = :now",
        ConditionExpression: "#status = :unclaimed",
        ExpressionAttributeNames: { "#status": "status" },
        ExpressionAttributeValues: {
          ":claimed": { S: "claimed" },
          ":unclaimed": { S: "unclaimed" },
          ":worker": { S: workerId },
          ":now": { S: new Date().toISOString() },
        },
      })
    );
    return true;
  } catch (err: any) {
    if (err.name === "ConditionalCheckFailedException") {
      return false; // Someone else claimed it
    }
    throw err;
  }
}

CAS is also the foundation of leader election, distributed state machines, and consensus protocols. etcd, ZooKeeper, and Consul all expose CAS primitives that work across data centers.
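Leader election, for instance, is just a CAS loop over a lease. The sketch below is store-agnostic: `tryAcquireLease` could be the Redis SET NX from earlier, a DynamoDB conditional update, or an etcd transaction (all names here are illustrative):

```typescript
// Leader election as a CAS loop: whoever wins the atomic conditional write
// is leader until its lease lapses; everyone else retries.
async function runForLeadership(
  tryAcquireLease: () => Promise<boolean>,
  onElected: () => Promise<void>,
  retryMs: number,
  maxAttempts: number
): Promise<boolean> {
  for (let i = 0; i < maxAttempts; i++) {
    if (await tryAcquireLease()) {
      await onElected(); // leader-only work runs while the lease is held
      return true;
    }
    // Someone else is leader -- wait and retry when their lease may lapse
    await new Promise<void>((r) => setTimeout(r, retryMs));
  }
  return false;
}
```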

Lock Contention and Performance

Locks serialize access. Under contention, they become bottlenecks. Here are the practical strategies for managing this:

1. Reduce lock scope. Lock the narrowest resource possible. Don't lock "payments" when you can lock "payment:order-123."

// Bad: global lock
await acquireLock("payment-processing", 10_000, ownerId);

// Good: per-resource lock
await acquireLock(`payment:${orderId}`, 10_000, ownerId);

2. Reduce lock duration. Do I/O and computation outside the lock. Only hold the lock for the critical state transition.

// Validate and prepare outside the lock
const validatedPayment = await validatePaymentDetails(request);
const idempotencyKey = deriveIdempotencyKey(request);

// Lock only for the state mutation
const lock = await acquireLock(`payment:${orderId}`, 5_000, ownerId);
try {
  const order = await db.orders.findById(orderId);
  if (order.status !== "pending") return;
  await db.orders.update(orderId, { status: "paid" });
} finally {
  await lock.release();
}

// Post-processing outside the lock
await sendReceipt(validatedPayment);

3. Use lock-free designs where possible. Idempotency keys, unique constraints, and CAS often eliminate the need for explicit locks entirely.
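For example, an idempotency key backed by a unique constraint turns "lock, check, then write" into a single conditional insert. This is a sketch: it assumes a hypothetical `charges` table with a UNIQUE index on `idempotency_key`, and `SqlClient` is an illustrative stand-in for a pg client:

```typescript
// Minimal stand-in for a SQL client (illustrative).
interface SqlClient {
  query(text: string, values?: unknown[]): Promise<{ rowCount: number }>;
}

// Lock-free idempotent write: the UNIQUE index on idempotency_key makes the
// insert itself the arbiter -- no explicit lock needed.
async function recordChargeOnce(
  client: SqlClient,
  idempotencyKey: string,
  orderId: string
): Promise<boolean> {
  const result = await client.query(
    `INSERT INTO charges (idempotency_key, order_id)
     VALUES ($1, $2)
     ON CONFLICT (idempotency_key) DO NOTHING`,
    [idempotencyKey, orderId]
  );
  // rowCount 0 means a concurrent request with the same key already won
  return result.rowCount === 1;
}
```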

4. Monitor lock metrics. Track acquisition time, hold time, contention rate, and timeout frequency. A sudden spike in lock wait times usually indicates a design problem, not a need for faster locks.
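A thin wrapper is usually enough to capture these metrics. The `LockMetrics` interface below is a placeholder for whatever metrics client you actually run (statsd, Prometheus, etc.):

```typescript
// Placeholder metrics interface (adapt to your metrics client).
interface LockMetrics {
  timing(name: string, ms: number): void;
  increment(name: string): void;
}

// Instrumented acquire: records wait time on every attempt and counts
// contention when acquisition fails (returns null).
async function timedAcquire<T>(
  acquire: () => Promise<T | null>,
  metrics: LockMetrics
): Promise<T | null> {
  const start = Date.now();
  const lock = await acquire();
  metrics.timing("lock.acquire_ms", Date.now() - start);
  if (lock === null) metrics.increment("lock.contention");
  return lock;
}
```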

5. Set aggressive timeouts. A lock acquisition timeout of 1-2 seconds is appropriate for most web requests. If you're waiting longer, the system is degraded and it's better to fail fast.
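A deadline-bounded retry loop puts this into practice: back off exponentially between attempts, but give up once the budget is spent rather than queueing indefinitely. `tryAcquire` could wrap the `acquireLock` function from earlier in this article:

```typescript
// Bounded acquisition: exponential backoff between attempts, hard stop at
// the deadline -- fail fast instead of piling up behind a contended lock.
async function acquireWithDeadline(
  tryAcquire: () => Promise<boolean>,
  deadlineMs: number,
  baseDelayMs = 50
): Promise<boolean> {
  const deadline = Date.now() + deadlineMs;
  for (let attempt = 0; Date.now() < deadline; attempt++) {
    if (await tryAcquire()) return true;
    // Back off exponentially, but never sleep past the deadline
    const delay = Math.min(baseDelayMs * 2 ** attempt, deadline - Date.now());
    if (delay <= 0) break;
    await new Promise<void>((r) => setTimeout(r, delay));
  }
  return false;
}
```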

Choosing the Right Approach

| Pattern | Best For | Infrastructure | Consistency |
| --- | --- | --- | --- |
| Redis SET NX | General-purpose, high throughput | Redis | Single-node durability |
| Redlock | Cross-node Redis safety | 5 Redis nodes | Quorum-based |
| PG advisory locks | DB-centric workflows | PostgreSQL | ACID-guaranteed |
| Optimistic locking | Low-contention reads with rare writes | Any RDBMS | Per-row CAS |
| CAS (DynamoDB/etcd) | Serverless, multi-region | Managed service | Service-dependent |

Decision framework:

  1. Start with optimistic locking if conflicts are rare and operations are retriable. It's the simplest, requires no extra infrastructure, and scales the best.

  2. Use PostgreSQL advisory locks if your critical section is tied to database operations and you already run Postgres. Transaction-level advisory locks are particularly clean.

  3. Use Redis SET NX for general-purpose distributed coordination where you need sub-millisecond lock acquisition and can tolerate rare double-acquisition during Redis failover.

  4. Use Redlock or etcd/Consul only when you need strong mutual exclusion guarantees across multiple nodes and the cost of double-execution is severe (financial transactions, resource provisioning).

  5. Combine layers. The most robust production systems use a distributed lock for distributed coordination AND a database constraint as a safety net. The lock handles the happy path efficiently; the constraint catches the edge cases.

async function processPaymentSafely(orderId: string, amount: number) {
  // Layer 1: Distributed lock prevents concurrent processing
  const lock = await acquireLock(`payment:${orderId}`, 10_000, randomUUID());
  if (!lock.acquired) throw new Error("Payment in progress");

  try {
    // Layer 2: Database check prevents reprocessing
    const order = await db.orders.findById(orderId);
    if (order.status !== "pending") return { status: "already_processed" };

    const charge = await paymentGateway.charge(order.customerId, amount);

    // Layer 3: Unique constraint on charge_id prevents duplicate records
    await db.orders.update(orderId, {
      status: "paid",
      chargeId: charge.id, // UNIQUE constraint in schema
    });

    return { status: "success" };
  } finally {
    await lock.release();
  }
}

Wrapping Up

Distributed locking is one of those areas where the simple solution works 99% of the time, but the 1% can cost you real money. The SET NX EX pattern with proper owner verification handles most use cases. Advisory locks keep things simple when Postgres is already in the picture. Optimistic locking avoids coordination overhead entirely when contention is low.

The key insight is that locks are a means to an end, not the end itself. What you actually want is correctness under concurrency. Sometimes that means a lock. Sometimes it means an idempotency key, a unique constraint, or a CAS operation. Often it means layering multiple mechanisms so that no single failure mode can cause incorrect behavior.

Start simple, measure contention, and add complexity only when the data demands it.


This is part of the "Production Backend Patterns" series. Next up: circuit breakers, bulkheads, and graceful degradation in distributed systems.

