Distributed Locks in Side Projects: 4 Simpler Approaches

#career #tutorials #distributedsystems #concurrency

Introduction: The Need for Distributed Locks in Side Projects

When working on my small projects or side projects, I often stick to a single server or even a single process. But sometimes, things get a bit more complex. For instance, when I want to update a configuration file that is accessed from multiple places simultaneously, or ensure that only one of several workers processes a given task at a time, I need a distributed lock mechanism. Such scenarios become inevitable as the project grows in size or the number of users increases.

Normally, Redis's distributed lock libraries or more complex solutions like ZooKeeper come to mind. However, for those like me seeking practical solutions, resorting to such heavy systems doesn't always make sense. When resources are limited or the project hasn't yet become that complex, simpler and lighter approaches both ease development and reduce infrastructure costs. In this post, I will explain four relatively simple but effective distributed lock approaches I've used in my side projects, and why I chose them.

1. Ensuring Safety with PostgreSQL's Advisory Locks

PostgreSQL's pg_advisory_lock and pg_try_advisory_lock functions offer a surprisingly effective solution for distributed locks. These functions create session-based locks at the database level. The locks are automatically released when the session ends, which reduces the risk of being locked indefinitely. Especially for projects already using PostgreSQL, leveraging this feature without creating an additional dependency is a significant advantage.

I once used this approach while developing an enterprise resource planning (ERP) system. In the inventory management module, it was necessary to prevent multiple operators from updating the same product simultaneously. We added a pg_try_advisory_lock call to each operator screen. If the lock was successfully acquired, the operation proceeded. If another operator tried to acquire the lock at the same time, pg_try_advisory_lock would return immediately, and the operator would see a message like "This product is currently being processed by another user." This was very effective in preventing data inconsistency.

-- To acquire a lock (blocks until the lock becomes available)
SELECT pg_advisory_lock(123456); -- 123456 is a unique lock ID

-- To try acquiring a lock (returns false immediately if it is already held)
SELECT pg_try_advisory_lock(123456);

-- To release a lock explicitly (it is also released when the session closes;
-- with pooled connections the session may stay open, so unlock explicitly)
SELECT pg_advisory_unlock(123456);

The biggest advantage of this method is that it leverages PostgreSQL's inherent reliability. Also, we can easily generate lock IDs; for example, using a combination of a table ID and the relevant record ID. This makes managing locks easier and ensures that the correct resource is locked. However, this solution won't work in an environment without PostgreSQL.
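From application code, the try-then-proceed flow described above can be sketched as a small context manager. This is only a sketch under assumptions: `try_advisory_lock` is a hypothetical helper name, `conn` is assumed to be a DB-API connection using psycopg2-style `%s` placeholders, and the connection is assumed to be in autocommit mode so the unlock statement still runs after a failed operation.

```python
from contextlib import contextmanager


@contextmanager
def try_advisory_lock(conn, key):
    """Yield True if the session-level advisory lock was acquired, else False.

    `conn` is assumed to be a DB-API connection (for example from psycopg2)
    in autocommit mode; `key` is the bigint lock ID.
    """
    cur = conn.cursor()
    try:
        # Returns immediately: true if we got the lock, false if it is held.
        cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
        acquired = bool(cur.fetchone()[0])
        try:
            yield acquired
        finally:
            if acquired:
                # Release on the same session that acquired the lock.
                cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
    finally:
        cur.close()
```

A caller would wrap the critical section in `with try_advisory_lock(conn, 123456) as acquired:` and show the "currently being processed by another user" message when `acquired` is false. The lock and unlock must run on the same connection, because the lock belongs to the session.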

ℹ️ Advantages of PostgreSQL Advisory Locks

Requires no additional dependencies (if PostgreSQL is already present).

Locks are automatically released when the session ends.

Managed with simple SQL commands.

Enforced by the database server, so every client connected to the same database sees the same lock state.

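When using this method, it's crucial that lock IDs are unique and meaningful. If IDs are chosen randomly, you might accidentally try to acquire the same lock for different resources. Typically, unique IDs can be generated using a formula like table_id * 1000000 + record_id. This keeps the keys unique only as long as every record_id stays below 1,000,000; beyond that, a record in one table collides with a record in the next. PostgreSQL also accepts a two-key form, pg_advisory_lock(key1, key2), that takes two 32-bit integers, so the table ID and record ID can be passed separately without any arithmetic.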
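To make the key derivation concrete, here is a minimal sketch of two ways to build a lock ID. Both function names are hypothetical. `advisory_key` applies the table_id * 1000000 + record_id formula and refuses inputs that would break uniqueness; `named_key` is an alternative I am assuming for illustration, which hashes an arbitrary resource name into the signed 64-bit range that pg_advisory_lock accepts. Hashing can in principle collide, so it trades guaranteed uniqueness for convenience.

```python
import hashlib

RECORDS_PER_TABLE = 1_000_000  # multiplier used by the formula
MAX_BIGINT = 2**63 - 1         # largest value a PostgreSQL bigint can hold


def advisory_key(table_id: int, record_id: int) -> int:
    """Combine a table ID and a record ID into a single bigint lock key."""
    if table_id < 0:
        raise ValueError("table_id must be non-negative")
    if not 0 <= record_id < RECORDS_PER_TABLE:
        # A larger record_id would land in the next table's key range.
        raise ValueError("record_id must be between 0 and 999999")
    key = table_id * RECORDS_PER_TABLE + record_id
    if key > MAX_BIGINT:
        raise ValueError("key does not fit in a bigint")
    return key


def named_key(name: str) -> int:
    """Hash a resource name into a signed 64-bit integer (collisions possible)."""
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)
```

For example, `advisory_key(3, 42)` gives 3000042, while `advisory_key(0, 1_000_000)` raises instead of silently producing the same key as `advisory_key(1, 0)`.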

2. File Locking Mechanisms: A Simple Yet Effective Alternative

On Linux-based systems, file locking mechanisms provide a very basic and effective way to coordinate processes. You can create a lock on a file using the flock command, or programmatically via the flock or fcntl system calls. These locks are held as long as the file stays open and are automatically released when it is closed. This is great for protecting specific resources, especially configuration files.

Once, while developing the backend for my own blog, I was using a queue system to process files uploaded by users. Multiple worker processes were running simultaneously, and each worker needed to pick up and process a task from the queue. To ensure that a task was picked up by only one worker, each worker would try to acquire a lock on a fixed "lock.file" using flock before starting its work. If the lock was successfully acquired, it would pick up the task from the queue and proceed with processing. If the lock could not be acquired, it would understand that another worker had taken the task and would move on to another task.

# Using flock in a script
exec 9>/path/to/your/lock.file
if flock -n 9; then
  echo "Lock acquired, proceeding with operation..."
  # Do your actual work here
  sleep 5
  echo "Operation completed."
  # Lock is released automatically when the script exits and file descriptor 9 is closed
else
  echo "Lock could not be acquired, another process is running."
fi

The biggest advantage of this method is that it doesn't require any additional services or databases; the file system alone is sufficient. This is a great option for minimalist projects. However, managing file locks requires a bit more care. When a worker crashes, the kernel closes its file descriptors and the lock is released, but a worker that hangs, or a child process that inherited the open descriptor, keeps holding the lock indefinitely. Using the -w (timeout) option of the flock command, or managing locks programmatically with fcntl, lets a waiting process give up after a set time instead of blocking forever.
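For the programmatic route, a minimal sketch using Python's standard fcntl module could look like the following. The helper name `file_lock` is hypothetical, and the sketch assumes a Linux or other Unix-like system and a lock file on a local disk. It is the non-blocking equivalent of `flock -n` in the script above.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Yield True if an exclusive lock on `path` was acquired, else False."""
    # Open (or create) the lock file; the lock is tied to this open file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        # Closing the descriptor releases the lock if we held it.
        os.close(fd)
```

A worker would run its task inside `with file_lock("/path/to/your/lock.file") as acquired:` and skip the work when `acquired` is false.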

⚠️ Risks of File Locking

Persistent lock issues can occur if a worker hangs or a child process inherits the lock's file descriptor.

Problems can arise if the disk containing the lock files is full or inaccessible.

Can create performance bottlenecks for a high number of operations.

Furthermore, the scalability of this method is limited. If your project is to be distributed across multiple different servers, this simple file locking method will not be sufficient. However, it is still a valid and simple solution for providing synchronization between multiple processes on a single server.

3. Simple Lock Mechanism with Redis SETNX (SET if Not Exists)

Redis, as an in-memory data structure store, is a popular tool for creating lock mechanisms in distributed systems. The SETNX command checks if a key exists, and if not, sets the key to your specified value. The atomic nature of this command makes it ideal for simple distributed locks. To acquire a lock, you use a unique key name and attempt to set a value (typically a process ID or timestamp) with SETNX. If the command returns 1, you have successfully acquired the lock.

While developing the backend for a mobile application, I used this method to prevent users from winning a unique gift under the same campaign simultaneously. A unique Redis key was created for each campaign (e.g., campaign:123:lock). When a user requested a gift, the backend first executed the SETNX campaign:123:lock user_id_timestamp command. If SETNX was successful, the user was given the gift, and the lock was released. If it failed, it was understood that another user had taken the gift, and the user was shown a message like "All gifts have been claimed."

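import uuid

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

lock_key = "my_resource_lock"
lock_value = uuid.uuid4().hex  # Unique token identifying this lock holder
timeout_seconds = 30  # Time for the lock to automatically expire

# Delete the key only if it still holds our token. Running the check and the
# delete inside one Lua script makes them atomic on the Redis server.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

# Try to acquire the lock
if r.set(lock_key, lock_value, nx=True, ex=timeout_seconds):
    print("Lock acquired, proceeding with operation...")
    try:
        # Do your actual work here
        pass
    finally:
        # Release the lock
        # Caution: We should only release our own lock
        if r.eval(RELEASE_SCRIPT, 1, lock_key, lock_value):
            print("Lock released.")
        else:
            print("Lock had already expired or was taken by another process.")
else:
    print("Lock could not be acquired, another process is running.")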

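The biggest advantage of this approach is Redis's speed and the atomicity of SET with the NX option, which together provide a robust locking mechanism. By using the EX (expire) option to automatically release the lock after a certain period, you can largely solve the problem of persistent locks, provided the timeout is comfortably longer than the work it protects; otherwise the lock can expire while the work is still running. Note that NX and EX are options of the SET command (the older SETNX command cannot set an expiry in the same step), and they are available in Redis 2.6.12 and later versions, so it's important to check your Redis version.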

💡 Lock Management with Redis SETNX

Reliable due to atomic operations.

Can set automatic lock expiration with the `EX` parameter.

Easy integration with a simple key-value structure.

High performance.

An important point to note with this method is to ensure that when releasing the lock (the delete operation), you only release the lock you acquired. Otherwise, if your lock expired and another process acquired it before you released yours, you would delete its lock. Checking the value with GET and then calling DELETE is not enough on its own, because the lock can change hands between the two commands. The check and the delete need to run atomically, for example in a small Lua script executed with EVAL that deletes the key only if its value still matches your own.

4. Lock Management with a Simple Database Table

One of the most basic solutions is to create a dedicated lock table in your database. This table might contain fields such as the lock name, the lock owner (e.g., process ID or server name), and a timestamp. To acquire a lock, you attempt to insert a record into this table with a unique lock name. If the insertion is successful, you have acquired the lock. To release locks, you simply delete the corresponding record from the table.

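Once, while developing a very simple message queue application, I used this method. Each worker would check if a record named message_queue_lock existed in a table called locks before adding a new message to the queue. If no record existed, it would insert a new one. If a record existed, it understood that another worker was adding a message. The check and the insert have to happen in a single statement that relies on the primary key, as in the example below; a separate SELECT followed by an INSERT would let two workers pass the check at the same time.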

-- Example lock table
CREATE TABLE locks (
    lock_name VARCHAR(255) PRIMARY KEY,
    owner_id VARCHAR(255) NOT NULL,
    acquired_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- To acquire a lock (with INSERT IGNORE or ON CONFLICT)
-- PostgreSQL example
INSERT INTO locks (lock_name, owner_id)
VALUES ('message_queue_lock', 'worker_id_abc')
ON CONFLICT (lock_name) DO NOTHING;

-- If INSERT IGNORE (MySQL) or ON CONFLICT DO NOTHING (PostgreSQL)
-- affects 0 rows, the lock was not acquired. If 1 row is affected, the lock was acquired.

-- To release a lock
DELETE FROM locks WHERE lock_name = 'message_queue_lock' AND owner_id = 'worker_id_abc';

The biggest advantage of this approach is that it uses your existing database infrastructure. You don't need to set up or manage an extra service. However, you need to be careful about automatically releasing locks. If a worker crashes and doesn't delete the lock, it can leave a persistent record in the lock table. To prevent this, you can add an "expiration timestamp" field to the lock table and clean up old locks with a periodically running garbage collection process.

🔥 Risks of Database Lock Table

Crashed workers may not clean up locks, requiring manual intervention.

High risk of persistent locks as locks do not automatically expire.

Can create performance pressure on the database with high transaction volumes.

Another disadvantage of this method is that if the locking mechanism becomes complex (e.g., extending lock duration, transferring lock ownership), managing the database table can become complicated. Therefore, this method is generally suitable for very simple scenarios and situations where setting up an additional service is not practical. It can be a pragmatic solution, especially for a few processes running on a single server.

Conclusion: The Power of Pragmatic Solutions

When I need distributed lock mechanisms in my side projects, I don't always opt for the most complex or popular solution. The approaches I mentioned above, such as PostgreSQL advisory locks, file locking, Redis SETNX, and a simple database table, can be quite effective depending on the project's requirements and existing infrastructure. The important thing is to understand the trade-offs and choose the solution that best fits your project's needs.

These simple methods, especially in situations where resources are limited or the project is in its early stages, provide both development ease and infrastructural efficiency. Before moving to complex solutions, it's always beneficial to evaluate whether these lighter approaches can get the job done. Remember, the best solution is not always the most complex one; sometimes, the simplest is the most pragmatic.

In future posts, we can explore more detailed use cases and performance comparisons of these approaches.