Distributed Lock Alternatives: My Pragmatic System Design Experiences

#distributedlocks #systemdesign #dataconsistency #redis

Ensuring data consistency in distributed systems has always been a headache. When multiple services try to access the same resource simultaneously, it can lead to conflicts and inconsistent data. This is where distributed locks come in; however, choosing the right locking mechanism often goes beyond a technical preference, becoming a pragmatic decision that varies based on the application's workload, fault tolerance, and even budget.

In my twenty years of experience, I've wrestled with distributed locks in many different scenarios, from a simple UPDATE query to complex stock movements in a production ERP. Here, I'll share these different alternatives, my experiences with them, what I chose in which situations, and why.

Introduction: Why Do We Need Distributed Locks?

In distributed systems, we need distributed locks to prevent multiple processes or services from simultaneously accessing a shared resource (a file, a database record, inventory information). A lock gives exactly one holder at a time control over the resource, preventing data corruption or unexpected situations. For example, when withdrawing money from a user's balance, we need to prevent two different transactions from simultaneously debiting the balance.

I first encountered this while updating order statuses on an e-commerce site. When both payment confirmation and shipping preparation for the same order were triggered simultaneously, the order status was updated multiple times, leading to inconsistency. To solve this, I had to resort to a simple database locking mechanism.

Database Locks: A Reliable But Costly Option

Databases are a natural candidate for distributed locks. Thanks to the atomic nature of transactions and built-in locking mechanisms, ensuring data consistency is relatively easy. Especially in powerful databases like PostgreSQL, both row-level locks (SELECT FOR UPDATE) and advisory locks (pg_advisory_lock) can be used.

In a production ERP, SELECT FOR UPDATE was indispensable for me when processing stock movements. While one transaction updated a product's stock quantity, I needed to prevent another from working with the same row and making a decision based on a stale value. Note that in PostgreSQL a row lock blocks other transactions that try to lock or modify the row, not plain SELECTs, so every code path that changes stock has to read with FOR UPDATE as well. This could lead to performance bottlenecks, especially in high-volume transactions, but it was a cost worth paying when data consistency was critical.

BEGIN;
SELECT stock_quantity FROM products WHERE product_id = 123 FOR UPDATE;
-- Read stock_quantity and compute the new value in the application.
-- new_quantity below is a placeholder for that computed value.
UPDATE products SET stock_quantity = new_quantity WHERE product_id = 123;
COMMIT;
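
To show where new_quantity would come from, here is a minimal Python sketch of the same transaction. It assumes a DB-API connection such as one from psycopg2 (with its `%s` parameter style and autocommit off). The function name `decrement_stock` and the "insufficient stock" check are illustrative assumptions of mine, not part of the ERP described above.

```python
def decrement_stock(conn, product_id, amount):
    """Lock the product row, compute the new quantity, and write it back."""
    cur = conn.cursor()
    try:
        # FOR UPDATE makes other transactions that lock or modify this row wait.
        cur.execute(
            "SELECT stock_quantity FROM products WHERE product_id = %s FOR UPDATE",
            (product_id,),
        )
        row = cur.fetchone()
        if row is None:
            raise LookupError(f"unknown product {product_id}")

        new_quantity = row[0] - amount
        if new_quantity < 0:
            raise ValueError("insufficient stock")

        cur.execute(
            "UPDATE products SET stock_quantity = %s WHERE product_id = %s",
            (new_quantity, product_id),
        )
        conn.commit()  # the row lock is released when the transaction ends
        return new_quantity
    except Exception:
        conn.rollback()  # also releases the row lock
        raise
    finally:
        cur.close()
```

Keeping the read, the calculation, and the write inside one short function makes it easier to keep the lock duration small.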

ℹ️ Points to Consider

The cost of SELECT FOR UPDATE grows with the number of locked rows and the length of the transaction. Long-running transactions in particular can block other queries and increase the overall system response time. Therefore, keeping transactions as short as possible and using correct indexing is critical.

pg_advisory_lock, on the other hand, provides a lighter lock that is managed at the application level. In one of my side projects, I used pg_advisory_lock to ensure that a specific background task was run by only one instance. Since these locks are not tied to database rows, they stay out of the way of ordinary row locks and offer more flexible usage. However, a session-level advisory lock is held until it is explicitly unlocked or the session ends, so releasing it correctly is the developer's responsibility.
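
A minimal sketch of the "only one instance runs this task" pattern might look like the following. It uses the non-blocking variant pg_try_advisory_lock together with pg_advisory_unlock. The helper names `advisory_key` and `single_instance` are hypothetical, hashing the task name into a 64-bit key is my own choice, and the sketch assumes a psycopg2-style connection in autocommit mode that is dedicated to holding the lock.

```python
import hashlib
from contextlib import contextmanager


def advisory_key(name: str) -> int:
    """Map a task name to a signed 64-bit integer (advisory locks take a bigint key)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


@contextmanager
def single_instance(conn, task_name):
    """Yield True if this session got the lock, False if another session holds it."""
    key = advisory_key(task_name)
    cur = conn.cursor()
    try:
        # Non-blocking: returns immediately with true/false instead of waiting.
        cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
        acquired = bool(cur.fetchone()[0])
        try:
            yield acquired
        finally:
            if acquired:
                # Session-level locks are not released by COMMIT, so unlock explicitly.
                cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
    finally:
        cur.close()
```

A caller would wrap the task in `with single_instance(conn, "nightly-report") as got_lock:` and skip the work when `got_lock` is False. The task name here is only an example.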

Redis Locks: Speed and Considerations

Redis, thanks to its in-memory structure, offers a very fast lock, and implementing distributed locks on top of SETNX (SET if Not eXists) semantics is quite common. To acquire a lock, you write a specific key to Redis with an expiry (TTL, Time To Live). In practice this is done with a single SET command carrying the NX and EX options, so the key and its TTL are set atomically. If the key already exists, the lock cannot be acquired.

In a task management application I developed, I used Redis locks to prevent users from triggering the same task multiple times. When a user clicked the start button for a task, I would acquire a Redis lock with the task's ID and release it when the operation was complete. This prevented the same task from running multiple times in the background.

import redis
import time
import uuid

r = redis.Redis(host='localhost', port=6379, db=0)

def acquire_lock(lock_name, acquire_timeout=10, lock_timeout=10):
    identifier = str(uuid.uuid4())
    end = time.time() + acquire_timeout
    while time.time() < end:
        if r.set(lock_name, identifier, ex=lock_timeout, nx=True):
            return identifier
        time.sleep(0.001)
    return False

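def release_lock(lock_name, identifier):
    pipe = r.pipeline(True)
    while True:
        try:
            # Watch the key so the delete only runs if nobody changes it meanwhile.
            pipe.watch(lock_name)
            value = pipe.get(lock_name)
            # The key may already have expired, in which case get() returns None.
            if value is not None and value.decode('utf-8') == identifier:
                pipe.multi()
                pipe.delete(lock_name)
                pipe.execute()
                return True
            # The lock is gone or belongs to someone else; do not touch it.
            pipe.unwatch()
            break
        except redis.exceptions.WatchError:
            # The key changed between WATCH and EXEC; retry the check.
            pass
    return False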
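
The release step above relies on WATCH/MULTI. A common alternative is a small Lua script that compares the token and deletes the key in one atomic step. The sketch below wraps that in a context manager. The name `redis_lock` is hypothetical, `client` is assumed to be a redis-py `redis.Redis` instance, and it is a single-attempt lock without the retry loop from acquire_lock.

```python
import uuid
from contextlib import contextmanager

# Deletes the key only if it still holds our token; runs atomically inside Redis.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


@contextmanager
def redis_lock(client, name, ttl_seconds=10):
    """Yield True if the lock was acquired, False if someone else holds it."""
    token = str(uuid.uuid4())
    # SET with NX and EX creates the key and its TTL in one command.
    acquired = bool(client.set(name, token, nx=True, ex=ttl_seconds))
    try:
        yield acquired
    finally:
        if acquired:
            # If the TTL already expired and another client took the lock,
            # the token no longer matches and the script leaves the key alone.
            client.eval(RELEASE_SCRIPT, 1, name, token)
```

With this shape the lock is released even when the protected code raises. The TTL still has to be longer than the work it guards.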

However, Redis locks have some weaknesses. The Redlock algorithm was proposed to address them, but issues can still arise in network partition scenarios or when a Redis instance crashes and restarts. Last year, I observed some locks being released prematurely when the Redis instance on my own VPS was OOM-killed. Therefore, it's necessary to set the eviction policy to noeviction, so that Redis rejects new writes instead of evicting lock keys under memory pressure, and to pay attention to Redis's memory limits.

Simple File Locks and Other Local Solutions

For applications that run on a single server rather than across a distributed system, file locks or simple operating system tools like mkdir might suffice. The flock command gives one process at a time an exclusive lock on a file, so cooperating processes do not write to the same file simultaneously.

Once upon a time, I had a script that ran as a cron job and generated specific reports. To prevent this script from running twice simultaneously, I used a simple file lock. When the script started, it would try to create a file named /tmp/rapor_uret.lock; if the file already existed, it would exit. Such a simple solution was perfectly adequate for that scenario.

#!/bin/bash
LOCK_FILE="/tmp/my_script.lock"

# Try to acquire the lock (noclobber makes the redirect fail if the file exists)
if ( set -o noclobber; echo "$$" > "$LOCK_FILE") 2> /dev/null; then
    trap 'rm -f "$LOCK_FILE"; exit $?' INT TERM EXIT
    echo "Script is running, lock acquired."
    # The actual script logic goes here
    sleep 30
    echo "Script finished."
    rm -f "$LOCK_FILE"
else
    echo "Script is already running. Exiting."
    exit 1
fi
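
For comparison, the same single-instance guard can be sketched in Python on top of the flock system call through the standard `fcntl` module (Unix only). The function name `exclusive_file_lock` is hypothetical. One difference from the lock-file approach is that the kernel drops a flock lock when the process exits, so a crash does not leave a stale lock behind.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_file_lock(path):
    """Yield True if the lock was taken, False if it is already held elsewhere."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        # Closing the descriptor releases the lock if we held it.
        os.close(fd)
```

A cron script would wrap its body in `with exclusive_file_lock("/tmp/my_script.lock") as got_lock:` and exit early when `got_lock` is False.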

⚠️ Limitations and Risks

Naturally, these types of local locks do not work in distributed systems and are limited to a single server. Furthermore, if