Every backend engineer eventually faces the moment: two requests hit your system at the exact same millisecond, and suddenly a customer is charged twice, a seat is double-booked, or inventory goes negative. In a monolith, you might get away with a database transaction or an in-process mutex. In a distributed system with multiple service instances, you need something stronger. You need distributed locks.
This article walks through the practical patterns for distributed locking -- from the simple Redis SET NX to PostgreSQL advisory locks to optimistic concurrency control -- with production-ready code in TypeScript and Go. We'll cover when each pattern shines, where it breaks, and how to choose.
Why Distributed Locks Exist
Consider a payment service running three replicas behind a load balancer. A user clicks "Pay" and a network hiccup causes a retry. Two replicas each receive the request within microseconds. Both read the order status as "pending," both charge the card, both mark it "paid." The customer sees two charges.
Or picture an event ticketing system. Two users try to book the last seat simultaneously. Both services check availability, both see one seat remaining, both confirm the booking. You've now sold a seat that doesn't exist.
These aren't theoretical edge cases. They happen in production at scale, and they cost real money. The core problem: multiple processes need to coordinate access to a shared resource without a shared memory space.
A distributed lock provides mutual exclusion across process boundaries. Only one holder can acquire the lock at a time. Everyone else either waits or fails fast.
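The interleaving is easy to reproduce in miniature. In this hypothetical in-memory sketch, the `await` inside `bookNaive` yields the event loop, so a second request passes the availability check before the first one writes -- the same gap that exists between replicas. `bookAtomic` does the check and the write in one step, which is the shape that SET NX, conditional UPDATEs, and CAS all provide:

```typescript
const seats = new Map<string, string | null>([["seat-42", null]]);

// Simulated I/O, e.g. a database round trip
const io = () => new Promise<void>((resolve) => setImmediate(() => resolve()));

// Check-then-act with a gap: both callers can see the seat as free.
async function bookNaive(seatId: string, userId: string): Promise<boolean> {
  if (seats.get(seatId) !== null) return false; // check
  await io();                                   // gap -- another request runs here
  seats.set(seatId, userId);                    // act
  return true;
}

// Atomic claim: check and write as one indivisible step.
function bookAtomic(seatId: string, userId: string): boolean {
  if (seats.get(seatId) !== null) return false;
  seats.set(seatId, userId);
  return true;
}

async function demo() {
  const naive = await Promise.all([
    bookNaive("seat-42", "alice"),
    bookNaive("seat-42", "bob"),
  ]);
  const doubleBooked = naive.filter(Boolean).length; // both "won" the seat

  seats.set("seat-42", null); // reset
  const atomic = [bookAtomic("seat-42", "alice"), bookAtomic("seat-42", "bob")];
  const winners = atomic.filter(Boolean).length; // exactly one wins
  return { doubleBooked, winners };
}
```

Everything in the rest of this article is, one way or another, a distributed version of `bookAtomic`.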
Redis-Based Locks: The SET NX EX Pattern
The simplest and most widely deployed distributed lock uses a single Redis instance with the atomic SET command:
import Redis from "ioredis";
const redis = new Redis(process.env.REDIS_URL);
interface LockResult {
acquired: boolean;
release: () => Promise<boolean>;
}
async function acquireLock(
key: string,
ttlMs: number,
ownerId: string
): Promise<LockResult> {
const lockKey = `lock:${key}`;
// SET NX ensures atomicity -- only one caller wins
const result = await redis.set(lockKey, ownerId, "PX", ttlMs, "NX");
if (result !== "OK") {
return { acquired: false, release: async () => false };
}
// Release via Lua script to ensure only the owner can unlock
const release = async (): Promise<boolean> => {
const script = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`;
const removed = await redis.eval(script, 1, lockKey, ownerId);
return removed === 1;
};
return { acquired: true, release };
}
Usage in a payment handler:
import { randomUUID } from "crypto";
async function processPayment(orderId: string, amount: number) {
const ownerId = randomUUID();
const lock = await acquireLock(`payment:${orderId}`, 10_000, ownerId);
if (!lock.acquired) {
throw new Error("Payment already being processed");
}
try {
const order = await db.orders.findById(orderId);
if (order.status !== "pending") {
return { status: "already_processed" };
}
const charge = await paymentGateway.charge(order.customerId, amount);
await db.orders.update(orderId, {
status: "paid",
chargeId: charge.id,
});
return { status: "success", chargeId: charge.id };
} finally {
await lock.release();
}
}
The critical details:
- NX (Not eXists) makes the SET atomic. Only one caller wins the race.
- PX (expiry in milliseconds) prevents deadlocks if the holder crashes.
- The Lua release script checks ownership before deleting. Without this, a slow process A could have its lock expire, process B acquires it, then process A finishes and deletes B's lock.
- The owner ID must be unique per acquisition, not per process. A UUID works.
The same pattern in Go:
package distlock

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"time"

	"github.com/redis/go-redis/v9"
)

// Sentinel errors returned by Acquire and Release.
var (
	ErrLockNotAcquired     = errors.New("distlock: lock not acquired")
	ErrLockAlreadyReleased = errors.New("distlock: lock already released or lost")
)
type Lock struct {
client *redis.Client
key string
ownerID string
}
func Acquire(ctx context.Context, client *redis.Client, key string, ttl time.Duration) (*Lock, error) {
ownerID := generateOwnerID()
lockKey := "lock:" + key
ok, err := client.SetNX(ctx, lockKey, ownerID, ttl).Result()
if err != nil {
return nil, err
}
if !ok {
return nil, ErrLockNotAcquired
}
return &Lock{client: client, key: lockKey, ownerID: ownerID}, nil
}
var releaseScript = redis.NewScript(`
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`)
func (l *Lock) Release(ctx context.Context) error {
result, err := releaseScript.Run(ctx, l.client, []string{l.key}, l.ownerID).Int64()
if err != nil {
return err
}
if result == 0 {
return ErrLockAlreadyReleased
}
return nil
}
func generateOwnerID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err) // crypto/rand failing means the platform is broken
	}
	return hex.EncodeToString(b)
}
Limitation: Single Point of Failure
A single Redis instance means a single point of failure. If Redis goes down, all locks are lost. If Redis fails over to a replica, locks that haven't replicated yet are also lost -- meaning two processes could hold the "same" lock simultaneously.
For many workloads this is acceptable. If your lock protects an idempotent operation with a database uniqueness constraint as a safety net, a rare double-acquisition during failover is tolerable. For financial transactions, you need something stronger.
The Redlock Algorithm
Martin Kleppmann and Salvatore Sanfilippo famously debated whether Redlock's timing assumptions make it safe for correctness-critical use, but the algorithm remains a practical option when you need stronger guarantees than a single Redis node provides.
Redlock uses N independent Redis instances (typically 5) and requires a majority quorum:
import Redis from "ioredis";
import { randomUUID } from "crypto";
class Redlock {
private nodes: Redis[];
private quorum: number;
constructor(urls: string[]) {
this.nodes = urls.map((url) => new Redis(url));
this.quorum = Math.floor(this.nodes.length / 2) + 1;
}
async acquire(
resource: string,
ttlMs: number
): Promise<{ acquired: boolean; release: () => Promise<void> }> {
const ownerId = randomUUID();
const lockKey = `lock:${resource}`;
const startTime = Date.now();
// Try to acquire on all nodes in parallel
const results = await Promise.allSettled(
this.nodes.map((node) =>
node.set(lockKey, ownerId, "PX", ttlMs, "NX")
)
);
const acquired = results.filter(
(r) => r.status === "fulfilled" && r.value === "OK"
).length;
const elapsed = Date.now() - startTime;
const remainingTtl = ttlMs - elapsed;
// Need majority AND enough remaining TTL to do useful work
if (acquired >= this.quorum && remainingTtl > ttlMs * 0.1) {
return {
acquired: true,
release: () => this.releaseAll(lockKey, ownerId),
};
}
// Failed -- release any locks we did acquire
await this.releaseAll(lockKey, ownerId);
return { acquired: false, release: async () => {} };
}
private async releaseAll(key: string, ownerId: string): Promise<void> {
const script = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`;
await Promise.allSettled(
this.nodes.map((node) => node.eval(script, 1, key, ownerId))
);
}
}
Key properties of Redlock:
- Quorum-based: tolerates minority node failures.
- Clock-dependent: the validity window shrinks by the time spent acquiring. If acquisition takes too long, the lock is considered invalid.
- No fencing: a slow lock holder can still cause issues after TTL expires.
In practice, most teams use the battle-tested redlock npm package or Go's redsync rather than rolling their own. The algorithm has subtle timing requirements that are easy to get wrong.
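For intuition, the quorum and validity-window arithmetic can be pulled out as pure functions. This is an illustrative sketch: the 1% clock-drift factor follows the published algorithm's suggestion, but the function names are mine, not from any library:

```typescript
// Majority quorum: the smallest count that two acquirers cannot both reach.
function quorum(nodeCount: number): number {
  return Math.floor(nodeCount / 2) + 1;
}

// The lock is only valid for the TTL minus the time spent acquiring it,
// minus a clock-drift allowance. If that window is gone, treat it as failed.
function validityMs(ttlMs: number, elapsedMs: number): number {
  const drift = Math.ceil(ttlMs * 0.01) + 2; // 1% drift factor plus 2ms
  return ttlMs - elapsedMs - drift;
}

function redlockAcquired(
  locksWon: number,
  nodeCount: number,
  ttlMs: number,
  elapsedMs: number
): boolean {
  return locksWon >= quorum(nodeCount) && validityMs(ttlMs, elapsedMs) > 0;
}
```

Note that winning 3 of 5 nodes is not enough by itself: if acquisition took nearly the whole TTL, the lock is already as good as expired and must be treated as a failure.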
Lock Renewal with Heartbeats
A fixed TTL creates a tension: too short and the lock expires before work completes; too long and a crashed holder blocks everyone. The solution is automatic renewal -- a background heartbeat that extends the lock while the holder is alive:
type RenewableLock struct {
*Lock
cancel context.CancelFunc
done chan struct{}
}
func AcquireWithRenewal(
ctx context.Context,
client *redis.Client,
key string,
ttl time.Duration,
) (*RenewableLock, error) {
lock, err := Acquire(ctx, client, key, ttl)
if err != nil {
return nil, err
}
renewCtx, cancel := context.WithCancel(ctx)
done := make(chan struct{})
go func() {
defer close(done)
ticker := time.NewTicker(ttl / 3) // Renew at 1/3 of TTL
defer ticker.Stop()
for {
select {
case <-renewCtx.Done():
return
case <-ticker.C:
err := renew(renewCtx, client, lock.key, lock.ownerID, ttl)
if err != nil {
// Lost the lock -- log and exit
return
}
}
}
}()
return &RenewableLock{Lock: lock, cancel: cancel, done: done}, nil
}
var renewScript = redis.NewScript(`
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("pexpire", KEYS[1], ARGV[2])
else
return 0
end
`)
func renew(ctx context.Context, client *redis.Client, key, ownerID string, ttl time.Duration) error {
result, err := renewScript.Run(ctx, client, []string{key}, ownerID, ttl.Milliseconds()).Int64()
if err != nil {
return err
}
if result == 0 {
return ErrLockAlreadyReleased
}
return nil
}
func (rl *RenewableLock) Release(ctx context.Context) error {
rl.cancel() // Stop the renewal goroutine
<-rl.done // Wait for it to finish
return rl.Lock.Release(ctx)
}
The renewal interval of TTL/3 gives two retry windows before expiry. If the process hangs or the network partitions, the heartbeat stops, and the lock expires naturally.
This is exactly how Redisson (Java) and many production lock libraries work internally.
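The owner-checked renew and release semantics are easy to model deterministically. This sketch mirrors the Lua scripts over an in-memory store with an explicit clock; the `store` map and `now` parameters are stand-ins for Redis and wall time, not part of any real client API:

```typescript
interface Entry {
  ownerId: string;
  expiresAt: number; // epoch ms
}
const store = new Map<string, Entry>();

// Mirrors SET NX PX: succeed only if no live lock exists.
function tryAcquire(key: string, ownerId: string, ttlMs: number, now: number): boolean {
  const e = store.get(key);
  if (e && e.expiresAt > now) return false; // held by a live owner
  store.set(key, { ownerId, expiresAt: now + ttlMs });
  return true;
}

// Mirrors the renew Lua script: extend only if we still own the live lock.
function renew(key: string, ownerId: string, ttlMs: number, now: number): boolean {
  const e = store.get(key);
  if (!e || e.expiresAt <= now || e.ownerId !== ownerId) return false;
  e.expiresAt = now + ttlMs;
  return true;
}
```

The important property: once the original holder's TTL lapses and another owner acquires, the old holder's renew attempts fail -- it has lost the lock and must stop doing protected work.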
PostgreSQL Advisory Locks
If your system already depends on PostgreSQL and your lock scope aligns with database operations, advisory locks eliminate the need for external infrastructure:
import { Pool } from "pg";
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
// Session-level advisory lock -- held until explicitly released or session ends
async function withAdvisoryLock<T>(
lockId: number,
fn: () => Promise<T>
): Promise<T> {
const client = await pool.connect();
try {
// pg_try_advisory_lock returns true/false, never blocks
const { rows } = await client.query(
"SELECT pg_try_advisory_lock($1) AS acquired",
[lockId]
);
if (!rows[0].acquired) {
throw new Error(`Could not acquire advisory lock ${lockId}`);
}
try {
return await fn();
} finally {
await client.query("SELECT pg_advisory_unlock($1)", [lockId]);
}
} finally {
client.release();
}
}
// Usage: derive a stable lock ID from business identifiers
function lockIdFromOrderId(orderId: string): number {
let hash = 0;
for (const char of orderId) {
hash = (hash * 31 + char.charCodeAt(0)) | 0;
}
return Math.abs(hash);
}
// Process a refund with advisory lock protection
await withAdvisoryLock(lockIdFromOrderId("order-789"), async () => {
const order = await db.orders.findById("order-789");
if (order.status === "refunded") return;
await paymentGateway.refund(order.chargeId);
await db.orders.update("order-789", { status: "refunded" });
});
PostgreSQL also supports transaction-level advisory locks that auto-release on commit/rollback:
-- Inside a transaction: lock is released when transaction ends
SELECT pg_advisory_xact_lock(hashtext('booking:seat-42'));
-- Do your work within the same transaction
UPDATE seats SET booked_by = 'user-123' WHERE id = 'seat-42' AND booked_by IS NULL;
Advisory lock advantages:
- No extra infrastructure. If you already have Postgres, you have advisory locks.
- ACID-aligned. Transaction-level locks integrate naturally with your data mutations.
- Deadlock detection. Postgres detects and breaks deadlock cycles automatically.
Advisory lock limitations:
- Tied to a single Postgres instance (or primary in a replica set). No cross-database locking.
- Lock IDs are 64-bit integers. You need a consistent hashing scheme for string-based resource identifiers.
- Connection pool interactions. Session-level locks belong to the connection, not the caller: a connection returned to the pool still holds any lock that wasn't explicitly unlocked, and the next borrower silently inherits it.
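One way to work around the 64-bit-integer limitation is to hash the resource name client-side. The sketch below uses FNV-1a over `BigInt`; this is an illustrative scheme of my own choosing (the server-side alternative is `hashtext()`, as in the transaction-level example above), and like any hash it admits rare collisions, which for advisory locks only cause extra serialization, not corruption:

```typescript
// Map an arbitrary resource string to a signed 64-bit lock ID (FNV-1a).
function advisoryLockId(resource: string): bigint {
  let h = 0xcbf29ce484222325n;  // FNV-1a 64-bit offset basis
  const prime = 0x100000001b3n; // FNV-1a 64-bit prime
  for (const byte of Buffer.from(resource, "utf8")) {
    h ^= BigInt(byte);
    h = BigInt.asUintN(64, h * prime); // stay within 64 unsigned bits
  }
  return BigInt.asIntN(64, h); // Postgres bigint is signed
}
```

Pass the result as a string parameter (`advisoryLockId("payment:order-123").toString()`), since JavaScript numbers cannot represent the full 64-bit range.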
Optimistic Locking with Version Columns
Sometimes you don't need mutual exclusion at all. You just need to detect conflicts. Optimistic locking assumes conflicts are rare and checks at write time:
// Schema: orders table has a `version` integer column, default 1
async function updateOrderStatus(
orderId: string,
newStatus: string,
expectedVersion: number
): Promise<boolean> {
const result = await db.query(
`UPDATE orders
SET status = $1, version = version + 1, updated_at = now()
WHERE id = $2 AND version = $3`,
[newStatus, orderId, expectedVersion]
);
// If no rows affected, someone else modified the order
return result.rowCount === 1;
}
// Usage with retry (`sleep` is a small promise-based delay helper)
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
async function processOrderWithRetry(
orderId: string,
maxRetries = 3
) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
const order = await db.orders.findById(orderId);
if (order.status !== "pending") {
return { status: "already_processed" };
}
const updated = await updateOrderStatus(
orderId,
"processing",
order.version
);
if (updated) {
// We won the race -- proceed with business logic
await fulfillOrder(order);
return { status: "success" };
}
// Conflict detected -- re-read and retry
await sleep(Math.pow(2, attempt) * 10); // exponential backoff
}
throw new Error("Max retries exceeded due to concurrent modifications");
}
The Go equivalent using database/sql:
func updateOrderStatus(ctx context.Context, db *sql.DB, orderID, newStatus string, expectedVersion int) (bool, error) {
result, err := db.ExecContext(ctx,
`UPDATE orders SET status = $1, version = version + 1, updated_at = now()
WHERE id = $2 AND version = $3`,
newStatus, orderID, expectedVersion,
)
if err != nil {
return false, err
}
rows, err := result.RowsAffected()
return rows == 1, err
}
Optimistic locking works best when:
- Conflicts are genuinely rare (most requests touch different resources).
- The operation is safe to retry.
- You want maximum throughput -- no lock acquisition overhead in the happy path.
It works poorly when contention is high, because retries cascade and amplify load.
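The conflict-detection mechanics can be modeled in a few lines. In this hypothetical in-memory version of the `UPDATE ... WHERE version = $3` check, two writers read the same snapshot and exactly one commits:

```typescript
interface Row {
  status: string;
  version: number;
}
const orders = new Map<string, Row>([
  ["order-1", { status: "pending", version: 1 }],
]);

// Mirrors `UPDATE ... SET version = version + 1 WHERE id = $2 AND version = $3`.
function casUpdate(id: string, newStatus: string, expectedVersion: number): boolean {
  const row = orders.get(id);
  if (!row || row.version !== expectedVersion) return false; // conflict
  row.status = newStatus;
  row.version += 1;
  return true;
}

// Two workers read version 1, then both attempt to write.
const snapshot = orders.get("order-1")!.version;
const first = casUpdate("order-1", "processing", snapshot);
const second = casUpdate("order-1", "processing", snapshot); // stale version
```

The loser sees a clean, unambiguous signal (zero rows affected) and can re-read and retry -- there is no window in which both writers believe they succeeded.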
Compare-and-Swap (CAS) Patterns
CAS generalizes optimistic locking beyond databases. Any system that supports atomic conditional writes can implement CAS. Here is a CAS pattern using DynamoDB's conditional expressions:
import { DynamoDBClient, UpdateItemCommand } from "@aws-sdk/client-dynamodb";
const ddb = new DynamoDBClient({});
async function claimTask(taskId: string, workerId: string): Promise<boolean> {
try {
await ddb.send(
new UpdateItemCommand({
TableName: "tasks",
Key: { id: { S: taskId } },
UpdateExpression:
"SET #status = :claimed, worker_id = :worker, claimed_at = :now",
ConditionExpression: "#status = :unclaimed",
ExpressionAttributeNames: { "#status": "status" },
ExpressionAttributeValues: {
":claimed": { S: "claimed" },
":unclaimed": { S: "unclaimed" },
":worker": { S: workerId },
":now": { S: new Date().toISOString() },
},
})
);
return true;
} catch (err: any) {
if (err.name === "ConditionalCheckFailedException") {
return false; // Someone else claimed it
}
throw err;
}
}
CAS is also the foundation of leader election, distributed state machines, and consensus protocols. Etcd, ZooKeeper, and Consul all expose CAS primitives that work across data centers.
Lock Contention and Performance
Locks serialize access. Under contention, they become bottlenecks. Here are the practical strategies for managing this:
1. Reduce lock scope. Lock the narrowest resource possible. Don't lock "payments" when you can lock "payment:order-123."
// Bad: global lock
await acquireLock("payment-processing", 10_000, ownerId);
// Good: per-resource lock
await acquireLock(`payment:${orderId}`, 10_000, ownerId);
2. Reduce lock duration. Do I/O and computation outside the lock. Only hold the lock for the critical state transition.
// Validate and prepare outside the lock
const validatedPayment = await validatePaymentDetails(request);
const idempotencyKey = deriveIdempotencyKey(request);
// Lock only for the state mutation
const lock = await acquireLock(`payment:${orderId}`, 5_000, ownerId);
try {
const order = await db.orders.findById(orderId);
if (order.status !== "pending") return;
await db.orders.update(orderId, { status: "paid" });
} finally {
await lock.release();
}
// Post-processing outside the lock
await sendReceipt(validatedPayment);
3. Use lock-free designs where possible. Idempotency keys, unique constraints, and CAS often eliminate the need for explicit locks entirely.
4. Monitor lock metrics. Track acquisition time, hold time, contention rate, and timeout frequency. A sudden spike in lock wait times usually indicates a design problem, not a need for faster locks.
5. Set aggressive timeouts. A lock acquisition timeout of 1-2 seconds is appropriate for most web requests. If you're waiting longer, the system is degraded and it's better to fail fast.
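An aggressive timeout pairs naturally with jittered backoff. This sketch is a bounded acquisition loop that retries with full jitter and fails fast at a deadline; `tryAcquire` is injected so the loop works over any of the lock backends in this article (the names and default delays are illustrative, not from a library):

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function acquireWithDeadline(
  tryAcquire: () => Promise<boolean>,
  deadlineMs: number,
  baseDelayMs = 10
): Promise<boolean> {
  const deadline = Date.now() + deadlineMs;
  for (let attempt = 0; ; attempt++) {
    if (await tryAcquire()) return true;
    // Full jitter: random delay in [0, base * 2^attempt), capped at 200ms.
    const delay = Math.random() * Math.min(baseDelayMs * 2 ** attempt, 200);
    if (Date.now() + delay >= deadline) return false; // fail fast, don't queue
    await sleep(delay);
  }
}
```

The jitter matters under contention: if every blocked waiter retries on the same schedule, they stampede the lock in lockstep; randomized delays spread the retries out.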
Choosing the Right Approach
| Pattern | Best For | Infrastructure | Consistency |
|---|---|---|---|
| Redis SET NX | General-purpose, high throughput | Redis | Single-node durability |
| Redlock | Cross-node Redis safety | 5 Redis nodes | Quorum-based |
| PG Advisory Locks | DB-centric workflows | PostgreSQL | ACID-guaranteed |
| Optimistic Locking | Low-contention reads with rare writes | Any RDBMS | Per-row CAS |
| CAS (DynamoDB/etcd) | Serverless, multi-region | Managed service | Service-dependent |
Decision framework:
Start with optimistic locking if conflicts are rare and operations are retriable. It's the simplest, requires no extra infrastructure, and scales the best.
Use PostgreSQL advisory locks if your critical section is tied to database operations and you already run Postgres. Transaction-level advisory locks are particularly clean.
Use Redis SET NX for general-purpose distributed coordination where you need sub-millisecond lock acquisition and can tolerate rare double-acquisition during Redis failover.
Use Redlock or etcd/Consul only when you need strong mutual exclusion guarantees across multiple nodes and the cost of double-execution is severe (financial transactions, resource provisioning).
Combine layers. The most robust production systems use a distributed lock for distributed coordination AND a database constraint as a safety net. The lock handles the happy path efficiently; the constraint catches the edge cases.
async function processPaymentSafely(orderId: string, amount: number) {
// Layer 1: Distributed lock prevents concurrent processing
const lock = await acquireLock(`payment:${orderId}`, 10_000, randomUUID());
if (!lock.acquired) throw new Error("Payment in progress");
try {
// Layer 2: Database check prevents reprocessing
const order = await db.orders.findById(orderId);
if (order.status !== "pending") return { status: "already_processed" };
const charge = await paymentGateway.charge(order.customerId, amount);
// Layer 3: Unique constraint on charge_id prevents duplicate records
await db.orders.update(orderId, {
status: "paid",
chargeId: charge.id, // UNIQUE constraint in schema
});
return { status: "success" };
} finally {
await lock.release();
}
}
Wrapping Up
Distributed locking is one of those areas where the simple solution works 99% of the time, but the 1% can cost you real money. The SET NX EX pattern with proper owner verification handles most use cases. Advisory locks keep things simple when Postgres is already in the picture. Optimistic locking avoids coordination overhead entirely when contention is low.
The key insight is that locks are a means to an end, not the end itself. What you actually want is correctness under concurrency. Sometimes that means a lock. Sometimes it means an idempotency key, a unique constraint, or a CAS operation. Often it means layering multiple mechanisms so that no single failure mode can cause incorrect behavior.
Start simple, measure contention, and add complexity only when the data demands it.
This is part of the "Production Backend Patterns" series. Next up: circuit breakers, bulkheads, and graceful degradation in distributed systems.
If this was useful, consider:
- Sponsoring on GitHub to support more open-source tools
- Buying me a coffee on Ko-fi