Lately I've been trying to re-implement a cache library using redis. The (simplified) requirement is to use cached value for a given key if present else fetch fresh value and cache the value to redis (on the first time or the first time after cache value expires). Also assume that fetching fresh value, even once, is a super expensive operation (it causes heavy load on the database).
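To make the requirement concrete, here is a minimal cache-aside sketch. All names here (`getOrFetch`, `redisGet`, `redisSet`, `fetchFreshValue`) are hypothetical, and the redis calls are stand-ins backed by an in-memory `Map` so the sketch is self-contained:

```javascript
// Cache-aside sketch. redisGet/redisSet stand in for real redis client
// calls; here they are backed by an in-memory Map (ttl is ignored).
const store = new Map();
const redisGet = async (key) => (store.has(key) ? store.get(key) : null);
const redisSet = async (key, value, ttlSeconds) => { store.set(key, value); };

let fetchCount = 0;
async function fetchFreshValue(key) {
  fetchCount += 1; // the expensive part: hits the database
  return `fresh:${key}`;
}

async function getOrFetch(key, ttlSeconds = 60) {
  const cached = await redisGet(key);
  if (cached !== null) return cached;       // cache hit: no DB load
  const fresh = await fetchFreshValue(key); // cache miss: expensive fetch
  await redisSet(key, fresh, ttlSeconds);
  return fresh;
}
```

The trouble starts when many processes run `getOrFetch` for the same key at the same time.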
Seems like a simple requirement, right? But one does not simply write distributed system code!
So we have a couple of challenges here.
1. We need to call fetchFreshValues() as few times as possible, across all server processes, since it is expensive.
2. We need to make sure that all processes return a consistent value for a key. That is, if our system calls fetchFreshValues() independently and in parallel from two processes, only one should successfully write to redis, while the other must fail. The one that failed must fetch whatever was saved and return that, instead of the result of its own fetchFreshValues() call. You might ask whether this is necessary. I'd rather the system either fail or return a stale value than return a potentially different value than the other processes (i.e. consistency; admittedly a stale value blows away the "correctness" part of this post's title, but you can adapt this for strict correctness by returning an error in that case).
One way to achieve the above is to acquire a distributed lock on the key, then fetch the latest value and write it to redis. Redis's own recommendation for distributed locking is the redlock algorithm.
And so, for a long time we did use redlock for this, even though the algorithm had drawn criticism well before we started using it -> https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
The main takeaway from Martin Kleppmann's post is that redlock's dependence on time cannot guarantee an exclusive lock, nor the correctness of the subsequent database operations. There may be situations where two processes manage to acquire a lock on the same resource (either due to a system clock reset or, more likely, because the lock expired before the DB write began).
Martin further suggests that to fix this, the lock service needs to issue incrementing "fencing tokens", and the storage layer must collaborate by accepting writes only for tokens greater than the saved data's token (conversely, rejecting writes whose token is less than or equal to the saved data's token).
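The storage-side check can be sketched in a few lines. This is a simplified in-memory model, not a real lock service; `fencedWrite` and its token argument are hypothetical names, assuming the lock service hands out monotonically increasing tokens:

```javascript
// Fencing-token check at the storage layer: the store keeps the token
// alongside the value and rejects writers holding a stale token.
const fenced = new Map(); // key -> { token, value }

function fencedWrite(key, token, value) {
  const current = fenced.get(key);
  if (current && token <= current.token) {
    return false; // stale token: a newer lock holder already wrote
  }
  fenced.set(key, { token, value });
  return true;
}
```

So if two processes both believe they hold the lock, the one with the older token gets rejected: `fencedWrite("k", 1, "a")` succeeds, a second `fencedWrite("k", 1, "b")` does not.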
However, in my case I am using Redis as the cache storage, and redis doesn't seem to have this kind of concurrency control out-of-the-box. If two parallel processes try to write to the same key at the same time, redis will accept both, and the last write wins.
The first time I read the fencing-token solution, it felt strongly similar to multi-version document stores (like CouchDB), which reject a write if the revision sent doesn't match the one on disk (and the revision gets incremented on save). I think this is usually filed under optimistic concurrency control, closely related to multi-version concurrency control (MVCC).
Redis can't do that out-of-the-box. But searching around, I came across a potential way to implement it: Redis scripts, i.e. custom Lua scripts that redis runs atomically. (Note that, unlike classic transactions, redis does not roll back a script that errors part-way through; atomicity here means no other command runs while the script runs.) References -> https://redis.io/topics/transactions and https://redis.io/commands/eval
Quoting the relevant sections:
> A Redis script is transactional by definition

> Also Redis guarantees that a script is executed in an atomic way: no other script or Redis command will be executed while a script is being executed. This semantic is similar to the one of MULTI / EXEC. From the point of view of all the other clients the effects of a script are either still not visible or already completed.
Seems cool. But what does redis do in a multi-node setup? From their docs:
> All Redis commands must be analyzed before execution to determine which keys the command will operate on. In order for this to be true for EVAL, keys must be passed explicitly. This is useful in many ways, but especially to make sure Redis Cluster can forward your request to the appropriate cluster node.
Ok. So I presume a multi-node setup means that a given key has only one master, i.e. keys are sharded across nodes, and the non-master nodes are replicas. Redis uses the keys passed as separate arguments to decide which node the script runs on. Seems cool so far.
Yet another question remains: what about replicas? We know replication is async in nature, but will scripts execute on the replicas in the same order as on the master? I don't know the answer for sure; I am guessing "yes", since redis replicates writes as a single ordered stream.
Assuming the answer is "yes, order is maintained on replicas", then that does make redis scripts an option to implement a simple version check before every write.
So let's dive into my potential solution (Lua codez!):
```lua
-- KEYS[1] = key to write
-- ARGV[1] = "new-version-number|new data"
local newPayload = ARGV[1]
local newVersionStr, newData = newPayload:match("^([0-9]+)|(.+)$")
local prevVal = redis.call('get', KEYS[1])
if prevVal == false then
  -- first ever write for this key: store it as version 1
  return redis.call('set', KEYS[1], "1|" .. newData)
end
local oldVersionStr, oldData = prevVal:match("^([0-9]+)|(.+)$")
local newVersion = tonumber(newVersionStr)
local oldVersion = tonumber(oldVersionStr)
-- check if version matches before writing
if oldVersion == (newVersion - 1) then
  return redis.call('set', KEYS[1], newPayload)
else
  return nil
end
```

Invoked like so:

```
EVAL "the script as above" 1 "key-to-write" "new-version-number|new data"
```
What I am doing there is writing version 1 on the first write, and making sure that for subsequent writes the application sends the data along with an incremented version number. If two processes try to use the same version, the first one succeeds and the second receives `null`. The one that received `null` then knows that another process has changed the data, and can handle the case appropriately.
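To reason about those semantics without a redis server, here is a plain-JavaScript mirror of the script's logic. `versionedSet` and the in-memory `kv` map are hypothetical stand-ins; one small difference is that it returns the stored payload on success, whereas the real script returns redis's reply to SET:

```javascript
// In-memory mirror of the Lua script: accept a write only when its
// version is exactly one ahead of the stored version; return null on
// conflict, just as EVAL would return null to the losing caller.
const kv = new Map();

function versionedSet(key, payload) {
  const match = payload.match(/^([0-9]+)\|([\s\S]+)$/);
  if (!match) return null;
  const [, newVersionStr, newData] = match;
  const prev = kv.get(key);
  if (prev === undefined) {
    const stored = `1|${newData}`; // first write: force version 1
    kv.set(key, stored);
    return stored;
  }
  const oldVersion = Number(prev.match(/^([0-9]+)\|/)[1]);
  const newVersion = Number(newVersionStr);
  if (oldVersion === newVersion - 1) { // exactly one ahead: accept
    kv.set(key, payload);
    return payload;
  }
  return null; // conflict: another process already wrote this version
}
```

Two processes racing to write version 2 on top of `"1|a"` illustrate the point: the first `versionedSet("k", "2|b")` succeeds, the second attempt with version 2 gets `null`.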
Two small notes here:
- I didn't store the version number inside the JSON payload, because some of our JSONs are so huge that they already slow down node.js's JSON.parse() (that may not hold for redis/lua, I didn't measure it).
- The version check is pretty rigid: `oldVersion == (newVersion - 1)`. Relaxing the check to `newVersion > oldVersion` would probably suffice, and would potentially make the writes order-insensitive for the replicas.
At this point you might ask whether there is any point in using redlock anymore. I think there still is: even though it does not solve challenge #2, it does solve challenge #1 from my initial requirement (minimizing calls to fetchFreshValues()). Probabilistically, the system will most likely have only one write-lock per resource, and the lock holder will usually write to storage before the lock expires. That works most of the time, and "most of the time" is exactly what solving #1 needs. So I'd keep redlock (solves #1) and have this storage-level concurrency control in place (solves #2).
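Putting the two pieces together, the flow looks roughly like this sketch. The lock and store are faked in-memory, and `acquireLock`, `versionedWrite` and `refresh` are hypothetical names; the point is only how the best-effort lock and the strict version check combine:

```javascript
// redlock-like lock minimizes duplicate fetches (challenge #1);
// the versioned write keeps the stored value consistent (challenge #2).
const data = new Map();  // key -> { version, value }
const locks = new Set(); // crude stand-in for a distributed lock

function acquireLock(key) {
  if (locks.has(key)) return false;
  locks.add(key);
  return true;
}
function releaseLock(key) { locks.delete(key); }

function versionedWrite(key, version, value) {
  const cur = data.get(key);
  const curVersion = cur ? cur.version : 0;
  if (version !== curVersion + 1) return null; // lost the race
  data.set(key, { version, value });
  return value;
}

async function refresh(key, fetchFresh) {
  // Best effort: the lock makes duplicate fetches rare, not impossible.
  const locked = acquireLock(key);
  try {
    const cur = data.get(key);
    const nextVersion = (cur ? cur.version : 0) + 1;
    const fresh = await fetchFresh();
    const written = versionedWrite(key, nextVersion, fresh);
    // null means another process won: return its value for consistency
    return written !== null ? written : data.get(key).value;
  } finally {
    if (locked) releaseLock(key);
  }
}
```

Even when the lock fails to be exclusive, the version check ensures only one of the competing writes lands, and the loser returns the winner's value.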
Hope you enjoyed this piece. I'd like more eyes reviewing the code I wrote here. Here is the code with a small test to play with. Any criticism is welcome.
This post was reposted from https://www.codepasta.com/redis/distributed-systems/2020/02/22/concurrent-redis-writes-and-correctness.html. Follow me on twitter for short ideas/tips/tricks on programming and follow my blog at codepasta.com for long-form articles.