A scaled microservice-based application with a large and growing amount of data faces the challenge of efficiently delivering aggregated data such as top lists.
In this article, I show you how to use Redis to cache the aggregated data, while the databases store the item/line data as the "source of truth" and use sharding to scale.
A single Redis instance can handle on the order of 100,000 operations per second.
My example data model with users, posts, and categories can be a basis for your own use cases.
Contents
Example Use-Cases and Data Model
Setup Redis and Implement the Top Categories
Top Users, Latest User Posts, and the Inbox Pattern
Lua Scripting for Atomicity
Final Thoughts and Outlook
1. Example Use-Cases and Data Model
In the example microservice application, users can write posts in categories. They can also read the posts by category, including the author name, with the newest posts on top. The categories are fixed and seldom change.
See my previous post “How to use Database Sharding and Scale an ASP.NET Core Microservice Architecture” if you are interested in source code and more details about the example application.
Logical Data Model:
Currently, one million users exist. Every day each user writes about ten posts.
Top 10 Categories
The top 10 categories will be displayed on the main page. This would require a statement like the following in MySQL:
SELECT CategoryId, COUNT(PostId) FROM Post GROUP BY CategoryId ORDER BY COUNT(PostId) DESC LIMIT 10;
Executing this statement over millions of rows would be very slow, and running it on every page visit would be out of the question.
Because of the large amount of data, I also decided to shard by category. So it would require merging the top lists from multiple databases:
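Since each category lives on exactly one shard, merging the shards' partial results is a plain top-N selection, with no summing across shards. Here is a minimal Python sketch of that merge step; the shard data and function name are made up for illustration:

```python
import heapq

def merge_top(shard_toplists, n=10):
    """Merge per-shard (category, post_count) top lists into one global top-n.

    Each category lives on exactly one shard, so picking the n largest
    counts from the combined lists is enough -- no summing is needed.
    """
    merged = [pair for shard in shard_toplists for pair in shard]
    return heapq.nlargest(n, merged, key=lambda pair: pair[1])

# Hypothetical top lists returned by two shards:
shard_a = [("Category5", 100), ("Category1", 1)]
shard_b = [("Category2", 10)]

top2 = merge_top([shard_a, shard_b], n=2)
assert top2 == [("Category5", 100), ("Category2", 10)]
```

With Redis caching the global top list, this merge only has to run when the cache is rebuilt, not on every page visit.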
2. Setup Redis and Implement the Top Categories
Install Docker Desktop
Create the Redis container:
C:\dev>docker run --name redis -d redis
Connect to the container and start the redis-cli:
C:\dev>docker exec -it redis redis-cli
Add Top Categories
The top categories (“CategoriesByPostCount”) use a Redis sorted set (ZSET).
Add the first entry with ZADD and 99 posts for the category “Category5”:
127.0.0.1:6379> ZADD CategoriesByPostCount GT 99 "Category5"
It adds one entry:
(integer) 1
Add some more entries:
> ZADD CategoriesByPostCount GT 1 "Category1"
(integer) 1
> ZADD CategoriesByPostCount GT 10 "Category2"
(integer) 1
Update Category5:
> ZADD CategoriesByPostCount GT 100 "Category5"
(integer) 0
> ZADD CategoriesByPostCount GT 98 "Category5"
(integer) 0
Both commands return zero because ZADD (without the CH option) only counts newly added members, and "Category5" already exists. More importantly, the second command does not change the score: the GT parameter only updates a score if the new value is greater than the stored one. This helps to handle situations where updates arrive out-of-order (post counts don’t decrease).
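The GT update rule can be sketched in plain Python, with a dict standing in for the sorted set (no Redis needed):

```python
def zadd_gt(zset, member, score):
    """Mimic ZADD <key> GT on a dict acting as the sorted set.

    The score is only written if the member is new or the new score is
    greater than the stored one, so late, stale updates are ignored.
    Returns 1 for a newly added member, else 0 (like ZADD without CH).
    """
    if member not in zset:
        zset[member] = score
        return 1
    if score > zset[member]:
        zset[member] = score
    return 0

counts = {}
assert zadd_gt(counts, "Category5", 100) == 1  # new member
assert zadd_gt(counts, "Category5", 98) == 0   # stale update: score stays 100
assert counts["Category5"] == 100
```

The same rule is why a retried or reordered counter update cannot push a post count backwards.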
Read Top Categories
Use ZRANGE and read the top 10 categories with count of posts:
> ZRANGE CategoriesByPostCount 0 9 WITHSCORES REV
1) "Category5"
2) "100"
3) "Category2"
4) "10"
5) "Category1"
6) "1"
Easily retrieve the second page (entries 11–20), etc:
ZRANGE CategoriesByPostCount 10 19 WITHSCORES REV
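The start/stop indexes for any page follow the same arithmetic. A tiny helper (hypothetical, not part of Redis) could compute them for a 1-based page number:

```python
def page_range(page, page_size=10):
    """Translate a 1-based page number into ZRANGE start/stop indexes.

    ZRANGE indexes are 0-based and inclusive on both ends, so page 1
    maps to (0, 9), page 2 to (10, 19), and so on.
    """
    start = (page - 1) * page_size
    return start, start + page_size - 1

assert page_range(1) == (0, 9)
assert page_range(2) == (10, 19)
```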
Prerequisites
The posts per category can be counted in SQL when a new post is created:
BEGIN TRANSACTION;
INSERT INTO Post (...);
UPDATE Categories SET PostCount = PostCount + 1 WHERE CategoryId = @CategoryId;
COMMIT TRANSACTION;
This is possible because the database is sharded by category. All posts of one category are in the same database.
3. Top Users, Latest User Posts, and the Inbox Pattern
The user’s posts are scattered across all shards. It is therefore not possible to count them in a single transaction with UPDATE User SET PostCount = PostCount + 1 and then update Redis.
The operation in Redis has to be “idempotent”. The inbox pattern makes this possible.
Further reading: Outbox, Inbox patterns and delivery guarantees explained
Add Posts (with a race condition)
On every new post, add an entry to the PostsByTimestamp sorted set of the user:
> ZADD {User:5}:PostsByTimestamp 3455667878 '{Title: "MyPostTitle", Category: "Category5", PostId: 13}'
(integer) 1
Then increment the post count in UsersByPostCount:
> ZINCRBY UsersByPostCount 1 "5"
"1"
To make it idempotent check the result of adding the post to the inbox. Issuing the command again gives a result of zero (the entry already existed):
> ZADD {User:5}:PostsByTimestamp 3455667878 '{Title: "MyPostTitle", Category: "Category5", PostId: 13}'
(integer) 0
Then don’t increment UsersByPostCount.
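Here is a minimal sketch of that idempotency check, using plain dicts in place of the Redis structures (the function name and JSON payloads are illustrative):

```python
def record_post(inbox, user_counts, user_id, timestamp, post_json):
    """Inbox pattern sketch: the per-user sorted set acts as a dedup log.

    Only a first-time insert bumps the aggregate post count, so a
    redelivered message leaves the count unchanged (idempotency).
    Returns 1 if the post was new, else 0 -- mirroring ZADD's reply.
    """
    key = (user_id, post_json)
    if key in inbox:
        return 0                      # duplicate delivery: no count change
    inbox[key] = timestamp
    user_counts[user_id] = user_counts.get(user_id, 0) + 1
    return 1

inbox, counts = {}, {}
record_post(inbox, counts, 5, 3455667878, '{"PostId": 13}')
record_post(inbox, counts, 5, 3455667878, '{"PostId": 13}')  # retried message
assert counts[5] == 1
```

In Redis the two steps map to ZADD on the inbox set and ZINCRBY on the count; the Lua script in section 4 makes that pair atomic.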
The command ZADD to PostsByTimestamp and the command ZINCRBY to UsersByPostCount have to be atomic. I will show you how to use a Redis Lua-Script to make it atomic. But first, let’s read the top users and latest user posts.
Read the Top Users and the Latest User Posts
Top 10 users:
> ZRANGE UsersByPostCount 0 9 WITHSCORES REV
1) "6"
2) "10"
3) "5"
4) "8"
5) "3"
6) "4"
7) "1"
8) "3"
The user with ID 6 has 10 posts, ID 5 has 8 posts, etc.
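redis-cli interleaves members and scores in a WITHSCORES reply, so client code typically regroups the flat list into pairs. A small sketch, assuming the scores are integer post counts:

```python
def pairs(withscores_reply):
    """Group a flat WITHSCORES reply into (member, score) tuples.

    zip(it, it) consumes the same iterator twice per step, pairing
    each member with the score that follows it in the reply.
    """
    it = iter(withscores_reply)
    return [(member, int(score)) for member, score in zip(it, it)]

reply = ["6", "10", "5", "8", "3", "4", "1", "3"]
assert pairs(reply) == [("6", 10), ("5", 8), ("3", 4), ("1", 3)]
```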
The latest 10 posts of the user with ID 5:
> ZRANGE {User:5}:PostsByTimestamp 0 9 WITHSCORES REV
1) "{Title: \"MyPostTitle2\", Category: \"Category1\", PostId: 14}"
2) "3455667999"
3) "{Title: \"MyPostTitle\", Category: \"Category5\", PostId: 13}"
4) "3455667878"
4. Lua Scripting for Atomicity
Atomically Add Posts with Lua Scripting
A Redis Lua script can make the command ZADD to PostsByTimestamp and the command ZINCRBY to UsersByPostCount atomic. But an extra counter per user is needed so that all key parameters map to the same Redis hash tag.
The curly braces, as in the key “{User:5}:PostsByTimestamp”, mark a Redis hash tag: in Redis Cluster, keys with the same tag are stored in the same hash slot, so a multi-key Lua script can access all of them.
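A sketch of the tag extraction rule as Redis Cluster defines it: only the substring between the first “{” and the first “}” after it is hashed, and only if that substring is non-empty:

```python
def hash_tag(key):
    """Return the part of a Redis key that Redis Cluster hashes.

    If the key contains a non-empty {tag}, only the tag is hashed;
    otherwise the whole key is. Keys with the same tag land in the
    same slot, which is what a multi-key Lua script requires.
    """
    start = key.find("{")
    if start == -1:
        return key
    end = key.find("}", start + 1)
    if end <= start + 1:  # no closing brace, or empty {}: hash the whole key
        return key
    return key[start + 1:end]

assert hash_tag("{User:5}:PostsByTimestamp") == "User:5"
assert hash_tag("{User:5}:PostCount") == "User:5"
```

Because both keys hash to “User:5”, they are guaranteed to live on the same node, even in a Redis Cluster setup.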
This Lua script tries to add a member to a sorted set. If the member is new, it also increments a counter; if the member already exists, it returns the current counter value:
if tonumber(redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2])) == 1 then
    return redis.call('INCR', KEYS[2])
else
    return redis.call('GET', KEYS[2])
end
Use EVAL to call the Lua script and pass “{User:8}:PostsByTimestamp” and “{User:8}:PostCount” as keys (one line on the command line):
> EVAL "if tonumber(redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2])) == 1 then return redis.call('INCR', KEYS[2]) else return redis.call('GET', KEYS[2]) end" 2 {User:8}:PostsByTimestamp {User:8}:PostCount 3455667999 "{Title: \"MyPostTitle2\", Category: \"Category1\", PostId: 14}"
(integer) 1
Then set the count for user 8 in UsersByPostCount:
ZADD UsersByPostCount GT 1 "8"
Store the Script in Redis
For performance reasons, you can store the script in Redis:
> SCRIPT LOAD "if tonumber(redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2])) == 1 then return redis.call('INCR', KEYS[2]) else return redis.call('GET', KEYS[2]) end"
"cd9222afab5eb8d579942016a8c22427eff99429"
Use the hash to call the script:
> EVALSHA "cd9222afab5eb8d579942016a8c22427eff99429" 2 {User:8}:PostsByTimestamp {User:8}:PostCount 4455667999 "{Title: \"MyPostTitle3\", Category: \"Category1\", PostId: 20}"
(integer) 2
5. Final Thoughts and Outlook
In this article, you set up Redis and started with a simple use-case to cache aggregated data. Then you used the inbox pattern and Lua scripting for atomicity.
In one of my next articles, I will show you how to implement it in a C# ASP.NET Core microservice application.
Redis offers much more than I showed you in this article. You can explore the other commands and use cases to see how they solve problems in your application. In a real-life application, you might have to use TTL to automatically expire entries so that the cache does not grow without bounds. Maybe you also need to scale Redis.
Please contact me if you have any questions, ideas, or suggestions.