Bohdan Stupak
Leader election with Redlock.net

Why lock things

Microservice architecture has become widely adopted these days. One of the benefits it offers is horizontal scaling, which allows us to increase the performance of our application dramatically. However, there are situations when multiple instances of a service contend for some shared resource.
Consider a service which, apart from other functionality, runs a mission-critical job once per day, and that job should be executed by a single instance only. At the same time, deploying just a single instance is counterproductive, because the microservice bears other functionality which would benefit from horizontal scaling. One may argue that we could split such a microservice into even smaller microservices, but I'd caution against making microservices too granular.
As a solution, I suggest electing a single leader which handles the shared resource (the mission-critical job in our case) exclusively at any given point in time.
Well-known leader election algorithms such as the Bully algorithm or the Ring algorithm require a lot of ceremony and knowledge of the logical topology of your system in order to be implemented. That's why we'll have a look at leader election using a distributed lock instead.
You should use this pattern when the tasks in a distributed application need careful coordination and there's no natural leader.
As storage for the distributed lock, we'll use Redis. Redis is an in-memory key-value store, so we'll take advantage of its speed. There is already a library, RedLock.net, that implements a distributed lock over Redis, so we just have to make use of it.
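To get a feel for the library before diving into the sample, here is a minimal sketch of its canonical acquire-and-release pattern (the resource name and expiry here are arbitrary placeholders; disposing the returned IRedLock is what releases the lock):

var resource = "my-resource";
var expiry = TimeSpan.FromSeconds(30);

using (var redLock = await _distributedLockFactory.CreateLockAsync(resource, expiry))
{
    if (redLock.IsAcquired)
    {
        //the lock is held until the end of the using block
    }
}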

The code

The sample code can be accessed on GitHub. Let's break down what actually happens here.
The idea behind leader election via a distributed lock is that whoever acquires the lock over the shared resource becomes the leader. So naturally, we have a lock key, quite similar to the one used by the built-in C# lock construct.

private const string _resource = "the-thing-we-are-locking-on";

Obviously, the storage is a single point of failure, so we have to make sure that it is reliable. RedLock.net, which we use in our case, allows us to use multiple instances of Redis instead of a single one in order to improve reliability.
Here's how we create the connection to Redis during start-up.

var endPoints = new List<RedLockEndPoint>
{
    new DnsEndPoint("redis1", 6379),
    new DnsEndPoint("redis2", 6379),
    new DnsEndPoint("redis3", 6379)
};
_distributedLockFactory = RedLockFactory.Create(endPoints);
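If your service already keeps a StackExchange.Redis connection around, RedLock.net can also reuse it instead of opening its own. A minimal sketch of that variant (the RedLockMultiplexer wrapper comes with the RedLockNet.SERedis package; the connection string is a placeholder):

var existingConnection = ConnectionMultiplexer.Connect("redis1:6379");
var multiplexers = new List<RedLockMultiplexer>
{
    new RedLockMultiplexer(existingConnection)
};
_distributedLockFactory = RedLockFactory.Create(multiplexers);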

Every instance tries to acquire the lock once in a given period of time. If it succeeds, it becomes the leader. If not, it will try again later.

private readonly TimeSpan _expiry = TimeSpan.FromSeconds(_expirySecondsCount);

//fire the first attempt immediately, then retry every _expirySecondsCount seconds
_acquireLockTimer = new Timer(async state => await TryAcquireLock((CancellationToken)state), _cts.Token, 0, _expirySecondsCount * 1000);

However, the leader does not need to re-acquire the lock, since RedLock.net has an auto-extend feature: once acquired, the lock is renewed in the background until it is disposed. At first encounter with RedLock.net this might be unintuitive, so it should be noted. Let's have a look at the TryAcquireLock method.

private async Task TryAcquireLock(CancellationToken token)
{
    if (token.IsCancellationRequested)
        return;

    var distributedLock = await _distributedLockFactory.CreateLockAsync(_resource, _expiry);
    if (distributedLock.IsAcquired)
    {
        _distributedLock = distributedLock; //keep the lock in a field so it stays referenced and keeps auto-extending
        DoLeaderJob();
        _acquireLockTimer.Dispose(); //no need to renew the lock because of auto-extend
    }
    else
    {
        distributedLock.Dispose(); //not acquired; we'll retry on the next timer tick
    }
}

As mentioned above, we get rid of the re-acquire timer as soon as an instance becomes the leader, taking advantage of the auto-extend feature. We also hold on to the acquired lock (in the _distributedLock field) so that it stays alive and keeps being extended.
Once the leader instance fails, its lock expires and is up for the taking by the other instances.
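Besides crashing, a leader can also shut down gracefully. Disposing the acquired lock releases it immediately instead of waiting for the expiry to run out, so a new leader can be elected faster. A possible clean-up sketch, assuming the fields used above:

public void Dispose()
{
    _cts.Cancel();
    _acquireLockTimer?.Dispose();
    _distributedLock?.Dispose(); //releases the lock right away if this instance was the leader
    _distributedLockFactory?.Dispose(); //closes the connections to Redis
}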

Summary

As we can see, the implementation of leader election via a distributed lock is pretty straightforward. Still, it should be used with care, since every lock increases contention between the instances of a microservice and thus reduces the benefits of horizontal scaling.
