🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Advanced data modeling for Amazon ElastiCache (DAT438)
In this video, Yaron and Kevin McGehee from the ElastiCache team demonstrate advanced data modeling techniques using Amazon ElastiCache and Valkey for building a massively multiplayer online game. They cover caching strategies including lazy loading with DynamoDB triggers, preventing thundering herd problems with locks, and client-side caching with invalidation subscriptions. The session explores various Valkey data structures: Hash and JSON for session storage, HyperLogLog for unique user counting with sub-1% error in 12KB, Bloom filters for membership testing with 90% memory savings, geospatial commands for location queries, and Pub/Sub for real-time chat. They also demonstrate semantic caching with vector search achieving 95% recall, and rate limiting implementations using both simple counters and token bucket algorithms with Lua scripts. Each concept includes practical code examples using Valkey Glide.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction to Amazon ElastiCache and Valkey: Building a Scalable MMORPG
Hello everyone, thank you for joining our session today. My name is Yaron, and I'm a Senior Engineering Manager. Here with me is Kevin McGehee, a Principal Engineer, and we are both from the ElastiCache team. Today, I'm going to start with an introduction to ElastiCache and Valkey, and later we're going to dive deep together into some advanced use cases and data modeling that I hope you'll benefit from when building your applications.
For those of you who are not familiar with Amazon ElastiCache, ElastiCache is a fully managed service that makes it easy to deploy, operate, and scale an in-memory data store in the cloud. ElastiCache significantly improves the performance of your application by providing microsecond response times. We support three open source engines: Redis Open Source, Memcached, and Valkey. Valkey is an open source, high-performance key-value data store. It was forked from Redis Open Source last year when Redis changed its license, and last year we also announced that ElastiCache and MemoryDB support the Valkey engine. Today, all the examples will be based on the Valkey engine, and to make it even more engaging, we decided to build an application together with you so we can see how data modeling helps us build a highly scalable application. Kevin and I chose to build a massively multiplayer online game together with you. We definitely need highly scalable technology to support that. We need to make sure we're providing very low latency and high throughput to our customers so they can enjoy the game.
Let's start with caching, because caching is fundamental to making your application run much faster. I want to start with some high-level concepts. Let's assume we start building our application on EC2 and we have persistent data stored in a relational database; I'm using RDS. As long as our workload is in a steady state, this architecture can work well. However, when we start to see increased read and write throughput, we have multiple options. One option is to scale up our RDS instance or add more replicas so we can scale out and gain read throughput from replication itself. RDS has its own caching layer, but it caches pages and blocks rather than the query results themselves. From time to time, you need to fetch the data from disk, and there is additional latency involved in that.
For that reason, we can choose Amazon ElastiCache, which explicitly stores data in memory. Once we start to populate results into the cache, we can immediately start reading data from the cache itself and improve read response times to microseconds. We can also relieve pressure from the backend and from the database itself, and we can optimize our costs by allowing the backend to scale down. Now let's move forward and see more advanced use cases. I want to talk about lazy loading, which is one of the strategies for managing the cache, and I want to show you how I combine it with cache invalidation so we can make sure the cache stays fresh and free of stale data.
Caching Strategies: Lazy Loading, Cache Invalidation, and Solving the Thundering Herd Problem
I'm using Amazon ElastiCache, and I'm starting to read the data directly from the cache. If the data is there, we have a cache hit. All is good; we get the result with microsecond response time. If the data is not there, we have a cache miss, and we need to go to the persistent storage layer; here I'm using Amazon DynamoDB. Right after that, I update the data in the cache. This is how I lazily populate the data, warming the cache upon every read. Now, let's assume that one of the items has been updated. What we're doing here is making sure the cache stays valid and has no stale data.
For that, I'm going to use the trigger function that I have on Amazon DynamoDB to trigger a Lambda. The Lambda is responsible for invalidating the specific item on the cache so we can make sure that the next time we do lazy loading on the same path, we're going to update that data.
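A minimal sketch of this cache-aside read path, assuming valkey-glide's Python client and boto3 for DynamoDB; the table name, key names, and player fields here are hypothetical:

```python
import json
import boto3
from glide import GlideClient

async def get_player(client: GlideClient, player_id: str) -> dict:
    key = f"player:{player_id}"

    # 1. Try the cache first.
    cached = await client.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: microsecond response

    # 2. Cache miss: fall back to the persistent store.
    table = boto3.resource("dynamodb").Table("players")  # hypothetical table
    item = table.get_item(Key={"player_id": player_id})["Item"]

    # 3. Lazily populate the cache so the next read is a hit.
    await client.set(key, json.dumps(item, default=str))
    return item
```

The DynamoDB-triggered Lambda then only has to delete `player:{player_id}` on writes; the next read through this function repopulates it.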
Now, sometimes we have very hot data in our cache. We also use time to live, because this is the best way to manage our cache memory, and we want to make sure that data is evicted from time to time, so we're going to use expiry. But if we have a very hot item in the cache, we still want a way to extend the time to live before it expires so we can keep reading it from the cache. For that, we can let a background task refresh the expiry before the item expires. In my example here, I'm using Valkey's MULTI command, which queues multiple commands and executes them atomically on the server side. As you can see, I'm using GET first to receive the item itself, and then checking how much expiry time is left for this item. In this example, multiple clients periodically check the remaining expiry, and the client that observes the time to live crossing the 5-second threshold is the one that refreshes the expiry on the cache, keeping this hot item alive longer.
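A sketch of that refresh logic, assuming Glide's ttl and expire helpers; the talk wraps the GET and TTL in a MULTI transaction for atomicity, while this simplified version issues them as separate commands:

```python
async def read_and_refresh(client, key: str, ttl_floor: int = 5, new_ttl: int = 60):
    # Read the hot item and see how much time it has left to live.
    value = await client.get(key)
    remaining = await client.ttl(key)  # seconds left; negative if missing/no TTL

    # The client that observes the remaining TTL crossing the threshold
    # is the one that extends it, keeping the hot item cached.
    if value is not None and 0 <= remaining < ttl_floor:
        await client.expire(key, new_ttl)
    return value
```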
Connected to that, there is the thundering herd problem that we want to avoid. This problem occurs when multiple processes or threads attempt to proceed simultaneously, often resulting in high contention, resource strain, or bottlenecks. So let's assume that I have a very hot item in my cache that's about to expire. If all my microservices try to fetch the item from the cache, they will all receive a cache miss. What happens next is that they all go to the database at the same time and put much more pressure on my RDS instance. Eventually we impact our latency, and this is something that we want to avoid.
For that, we can build a kind of barrier, a synchronization mechanism that allows only one of the clients to request the data from the database, and only that client populates it into the cache. Once that client finishes, all the others that were waiting can read the data and receive a cache hit. So how do we build that? For this example, I took only two microservices, and I will go step by step and show you how we can build it together.
The first microservice tries to fetch the item from the cache, but the item is not there, so it receives a nil. The second microservice is a bit slower; it tries to fetch the same item and also receives a nil because the item is not there. Now, the first microservice tries to acquire a lock. I'm using the NX and EX arguments here: NX means the command fails if the lock already exists, and EX means the lock expires after 5 seconds. The acquisition succeeds because the lock is not there, so the first microservice starts querying the data from the database. Now, my second microservice tries to acquire the lock and, because of NX, it fails, since the lock is already there. What it does now is wait until the lock is freed.
So it polls periodically until the lock is freed by the first microservice. Meanwhile, the first microservice finishes fetching the item from the database, populates it into the cache, and frees the lock. After that, my second microservice can continue and fetch the item from the cache itself.
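Putting the whole lock flow together, a hedged sketch assuming valkey-glide's conditional SET options (ConditionalChange, ExpirySet); fetch_from_db stands in for your actual database query:

```python
import asyncio
import uuid
from glide import ConditionalChange, ExpirySet, ExpiryType

async def get_with_lock(client, key: str, fetch_from_db):
    value = await client.get(key)
    if value is not None:
        return value  # cache hit, nothing else to do

    lock_key = f"lock:{key}"
    while True:
        # SET lock NX EX 5: fail if the lock already exists,
        # and auto-expire it after 5 seconds to avoid deadlocks.
        acquired = await client.set(
            lock_key,
            str(uuid.uuid4()),
            conditional_set=ConditionalChange.ONLY_IF_DOES_NOT_EXIST,
            expiry=ExpirySet(ExpiryType.SEC, 5),
        )
        if acquired is not None:
            # We hold the lock: query the database, populate the cache,
            # then free the lock so waiting clients can proceed.
            value = await fetch_from_db(key)
            await client.set(key, value)
            await client.delete([lock_key])
            return value

        # Someone else holds the lock: wait briefly, then retry the cache.
        await asyncio.sleep(0.05)
        value = await client.get(key)
        if value is not None:
            return value
```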
Client-Side Caching with Connection Pooling and Invalidation Tracking
Another advanced approach is client-side caching. If my application is even more sensitive to latency, I can use a local cache to serve requests and benefit from extremely low latency. There are multiple ways to achieve this.
First, the idea is very simple. We are fetching the item from the remote cache. Once we receive it, we store it on the local cache, and from that point, I don't need to talk with the remote cache because I can serve the items and requests from my local cache. But there is a problem: I need to manage the data freshness on my local cache. One approach is to use the time to live, the expiry, that I can put on my local cache. The second one is to subscribe to the remote cache and to receive notifications upon every change.
So let's see how it works. The first, default approach in Valkey is for the server to store the mapping between clients and the keys they accessed; every time one of those keys changes, Valkey has the data and can notify the specific clients. This is very efficient, but on the other hand, it requires additional memory on the server side. The second way is to subscribe to a prefix. In my example here, I can subscribe to "user" as a prefix, so every time any microservice updates a key with that prefix, I receive a notification for it.
So how am I going to implement that on my client side? First, as a key principle, we recommend using a connection pool with long-lived connections. The reason is that creating a TCP connection is an expensive operation, especially compared to Valkey GET and SET operations, which are orders of magnitude faster than creating a connection. Additionally, using a client connection pool with a finite size reduces the overhead of connection management, and it also bounds the concurrent incoming connections from the client application.
So what I'm doing here is splitting my connection pool into two types. Connection zero is the invalidation connection, the control connection, and all the rest, from one to nine, serve as data connections. Connection zero subscribes to the invalidation channel, and all the other connections enable tracking with redirection to connection zero, which means connection zero receives the invalidation messages on behalf of all the other connections. So let's assume that I update one of the keys. Connection zero immediately receives that update on the invalidation connection with the specific item, and we make sure to update the data in my local cache.
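Conceptually, the setup looks something like the following. Treat this as protocol-level pseudocode made concrete: Glide manages its own connection pool internally, so a real implementation would hook into the client's connection handling rather than hold one client object per connection as assumed here.

```python
async def setup_client_tracking(control_conn, data_conns, local_cache: dict):
    # Connection zero discovers its own connection ID and subscribes to
    # the invalidation channel (named __redis__:invalidate for
    # compatibility) to receive invalidation push messages.
    control_id = await control_conn.custom_command(["CLIENT", "ID"])

    # Each data connection redirects its invalidation notifications to
    # connection zero; BCAST PREFIX tracks every key change under the
    # "user" prefix instead of per-key tracking.
    for conn in data_conns:
        await conn.custom_command([
            "CLIENT", "TRACKING", "ON",
            "REDIRECT", str(control_id),
            "BCAST", "PREFIX", "user",
        ])

def on_invalidation_message(local_cache: dict, key: str) -> None:
    # Drop the invalidated key; the next read lazily refetches it.
    local_cache.pop(key, None)
```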
Data Structures for Session Management: Hash vs JSON
Now, sometimes we are thinking about how to store the data in our Valkey engine, and we have a lot of ways to store data there. I want to talk about two types because, in my example, I want to store some session information. I will start with the hash data structure, which is a very common data structure for session data. It's very simple because it's a mapping of field-value pairs under a single key that directly accesses the hash in the cache. To create the hash, we use the HSET command, providing the hash name and a list of field-value pairs. We can also benefit from very fast, constant-time random access to retrieve a specific field from the cache.
Let's go through a simple piece of code to see how easy it is to use. Here I'm using Valkey Glide to connect to the Valkey server. I'm generating a session, using a native Python dictionary to hold the session data. Then I'm generating a UUID that serves as the session ID. Next, I use the HSET Valkey command to store the hash data under the session ID. To receive the data from Valkey, I just need the session ID that I generated and the HGETALL command, and then I can deserialize the result into my local objects. Very simple. A different data structure that we also recommend is JSON, which is very popular.
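Reconstructing that flow as a runnable sketch with valkey-glide; the session fields are hypothetical:

```python
import asyncio
import uuid
from glide import GlideClient, GlideClientConfiguration, NodeAddress

async def main():
    config = GlideClientConfiguration([NodeAddress("localhost", 6379)])
    client = await GlideClient.create(config)

    # The session is a plain Python dict of field/value pairs.
    session = {"user": "alice", "level": "12", "zone": "shadow-keep"}

    # A generated UUID serves as the session ID (and the hash key).
    session_id = f"session:{uuid.uuid4()}"

    # HSET stores all field/value pairs under the session ID.
    await client.hset(session_id, session)

    # HGETALL reads the whole hash back, ready to map onto local objects.
    print(await client.hgetall(session_id))

asyncio.run(main())
```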
JSON is a data structure that consists of two main constructs: arrays and key-value pairs. In this example, I'm creating the JSON document using JSON.SET. I can simply use a JSONPath to retrieve specific items from the JSON, whether from an array or from the top level of the document. I can also use the array append command to add items to an existing array very simply.
Let's go through a simple code example to see how it works. I'm connecting to my Valkey server and creating the JSON document at the beginning. Then I retrieve the email of one of my users very simply, and I do the same to retrieve the age. I can also use the numeric increment command to increment a number inside the JSON itself.
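A hedged sketch of those JSON operations, issued here through Glide's generic custom_command since the exact module-helper API may vary by client version; the document key and fields are hypothetical:

```python
import json

async def json_demo(client):
    doc = {
        "name": "alice",
        "email": "alice@example.com",  # hypothetical fields
        "age": 30,
        "achievements": ["first-quest"],
    }

    # Create the JSON document at the root path "$".
    await client.custom_command(["JSON.SET", "user:1", "$", json.dumps(doc)])

    # JSONPath retrieves specific items on the server side.
    email = await client.custom_command(["JSON.GET", "user:1", "$.email"])
    age = await client.custom_command(["JSON.GET", "user:1", "$.age"])

    # Append an item to an array inside the document.
    await client.custom_command(
        ["JSON.ARRAPPEND", "user:1", "$.achievements", json.dumps("boss-slain")]
    )

    # Increment a numeric field on the server.
    await client.custom_command(["JSON.NUMINCRBY", "user:1", "$.age", "1"])
    return email, age
```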
So what is the main difference between JSON and Hash? JSON is usually used for nested documents, while Hash is a flat field-value mapping suited to simpler objects like a session store; JSON can handle more complex ones. JSON supports JSONPath to filter on the server side, while with Hash we need to do that manipulation on the client side. With both of them, however, we can modify items on the server side.
Semantic Caching and Vector Search for Generative AI Applications
We announced a few weeks ago support for semantic caching on ElastiCache, which delivers some of the best vector search performance with a high recall ratio of 95 percent. To understand semantic caching, we first need to understand vector embeddings. You can think of them as translating unstructured data like documents, videos, audio, and images into a numerical representation of your data. Vector embeddings capture the semantic relationships between different elements.
For example, with text, the meaning of a word like "book" depends on context: sometimes it refers to something we read, and sometimes to the act of reserving something. Creating vectors involves ingesting data from the source, splitting it into chunks, and then converting the chunks into vectors. Let's assume that we have a chatbot or some other application in our game, and we receive a query prompt from the user.
We will use the same example I mentioned before with a book, but I'm going to add furniture and finance here, and we will visualize it with only three dimensions for simplicity. Let's assume that we already have a question in our vector store that says "How long do I have to wait for the next quest?" and then I receive a new query that says "When does the next quest start?" The cool thing here is that semantic search can understand that these have the same meaning and return the cached result.
Let's see how it works. Let's assume that we have a generative AI application, and our user queries, "What is the next quest to start? When is it supposed to start?" What we do is query the foundation model and ask when the next quest is going to start. This is expensive from both a cost and a performance perspective, but once we receive the answer, we can return it to the client.
Now let's assume that I'm adding ElastiCache into the architecture, using vector search. The user asks the same question of my application, but before anything else, I use Amazon Titan to generate the embedding vector. Once I have the vector, I query my ElastiCache vector search. Because this is the first time, the data isn't in my cache yet, so I receive a cache miss.
Now I need to go again to my foundation model, ask the question, and get back the response, but this time I store the item in my cache for the next read. Now let's make it even cooler and more complex.
Now I receive a similar query, not the exact same query. Traditional caching will not work here: we cannot just store the data keyed on the exact text and expect a hit on the next iteration. For that, we have vector search. Let's go through the flow. The query goes to the generative AI application, and we create the embedding vector. Once we generate the embedding vector for the different question, we query the ElastiCache vector search, and this time we receive a cache hit because it understands that the two queries have the same meaning. With that approach, we avoid the trip to the foundation model, which is expensive from both cost and performance perspectives.
Let's go through the API itself and see how it works. I'm using FT.CREATE to create the index. I could also use JSON to store the data, but here I'm showing an example with Hash. We are using HNSW as the algorithm because it works well in high-dimensional spaces and is highly optimized for performance. I'm going to use 128 dimensions; in my previous example I showed you only 3, but here I'm using more. I'm going to use the cosine distance metric as well.
Now, to search, I'm using a K-nearest-neighbors query, retrieving only the 10 closest neighbors in this example from my vector search. I also have the option to filter some of the results on the fly on the server side using the index; in this example, I'm filtering by country. So in the code that I'm going to show you, we populate the vector store with the text data that forms the foundation of the semantic search corpus.
First, we load the text data and split it into chunks; we use the LangChain library for that. Next, we convert the chunks into embeddings using the embedding model, Amazon Titan, to produce the embedding vectors. We create the index with FT.CREATE and store the vectors first. Then, to fetch the data, we use FT.SEARCH with the same closest-neighbors approach; here we retrieve only the 3 closest neighbors, and once we have them, we can serve the answer from the vector search results.
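A condensed sketch of the index creation, ingestion, and KNN query, assuming the RediSearch-compatible FT.* syntax that the talk describes; embed stands in for a call to the embedding model (for example Amazon Titan via Bedrock), and the index and key names are hypothetical:

```python
import struct

async def semantic_cache_demo(client, embed):  # embed: text -> list of 128 floats
    # Create the index: HNSW over a 128-dimensional FLOAT32 vector field,
    # stored in hashes under the "doc:" prefix, using cosine distance.
    await client.custom_command([
        "FT.CREATE", "idx:semantic", "ON", "HASH", "PREFIX", "1", "doc:",
        "SCHEMA", "embedding", "VECTOR", "HNSW", "6",
        "TYPE", "FLOAT32", "DIM", "128", "DISTANCE_METRIC", "COSINE",
    ])

    # Store one chunk: the raw text plus its embedding packed as FLOAT32 bytes.
    text = "How long do I have to wait for the next quest?"
    vec = struct.pack("128f", *embed(text))
    await client.hset("doc:1", {"text": text, "embedding": vec})

    # KNN query: the 3 nearest neighbors to the embedding of a new question.
    qvec = struct.pack("128f", *embed("When does the next quest start?"))
    return await client.custom_command([
        "FT.SEARCH", "idx:semantic", "*=>[KNN 3 @embedding $qv]",
        "PARAMS", "2", "qv", qvec,
    ])
```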
Building In-Game Chat with Pub/Sub: Sharded vs Classic Approaches
Now I want to call on Kevin to continue with more advanced use cases. Thank you. Let's talk about some additional data structures that you can use in Valkey, starting with the publish and subscribe functionality. Valkey, as we talked about before, is a rich data structure server. For our MMORPG, we want to build chat functionality where players can join and participate in an in-game chat that's ephemeral.
Let's say here we have a Shadow Dragons Guild and a bunch of users that want to participate and communicate with each other. For that, we can use the pub sub feature within Valkey. If you're not familiar with the pub sub pattern, it's a messaging pattern where there are producers and publishers of messages that are decoupled from the subscribers. A subscriber comes in and subscribes to a named topic. Let's say our topic here is just shadow dragons, and a user can come in and say I want to listen to messages on this topic. Another user can come and publish messages to those topics. So I can come say hello, and then anyone that subscribed to that topic will get those messages.
So how does this work inside Valkey? The data here is ephemeral. Valkey is basically just used as a messaging bus so that the data is relayed from producers to consumers directly. It's not stored in memory other than for the transition between the producer and consumer.
It provides at-most-once delivery, so if you're not listening on the topic, you'll miss the message. However, there is another data structure that we're not going to go into today, called streams, that lets you store messages inside Valkey in memory for longer-term storage and has a number of other rich features, such as consumer groups, to resume where you left off. But Pub/Sub is our choice here since the chat messages are just ephemeral.
There are two different types of Pub/Sub in Valkey. One is what I call classic Pub/Sub, and one is sharded Pub/Sub, which is relatively new. We recommend sharded Pub/Sub because it has better scalability properties. In classic Pub/Sub, every message you send is forwarded to all of the nodes in the cluster, so you have this large fan-out, and that means any subscriber can go to any node and get all of the messages on a topic. But of course that doesn't scale as well when you have very high-volume Pub/Sub traffic.
Whereas with sharded Pub/Sub, just like with the sharded key space in cluster mode enabled, a topic maps to a particular shard, and all of the producers and consumers know which shard to go to, so the fan-out is much smaller. The main downside of sharded Pub/Sub is that it doesn't support wildcard patterns: you can't subscribe to a large set of topics at once. You have to subscribe to a specific named topic so that it hashes to the right shard.
So how does this work in practice? Let's say we have two users here, Alice and Bob, and they want to join and participate in this chat room. For sharded Pub/Sub we'll use the SSUBSCRIBE command for Alice to subscribe, and we'll prefix our key name here with "chat" and then "shadow dragons" being the name. It'll return back the identifier of the topic as well as a message saying it's successfully subscribed.
Now Alice can publish to this topic. She doesn't have to be subscribed in order to publish. We can publish any string to a topic, but here we'll format it as JSON so it carries enough metadata to render the chat in the application. So we do SPUBLISH, we get back a one saying it's been successfully delivered, and immediately after that we get an SMESSAGE because Alice is also subscribed to that topic. The message contains the topic it was published to as well as the actual message that the publisher sent.
Meanwhile, Bob can come along and subscribe similar to how Alice did, and then publish a message saying "Bob here." That message shows up for both of them because they're both subscribed at the same time. You can build pretty rich applications this way, and it doesn't require a whole lot of memory since these messages are stored only transiently and relayed to the subscribers.
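A sketch of the publishing side; in Glide, subscriptions (SSUBSCRIBE) are configured on the client at creation time, so only the SPUBLISH path is shown here, via the generic command interface:

```python
import json

async def send_chat(publisher, author: str, text: str) -> int:
    # Format the message as JSON so the app has enough metadata
    # (author, text) to render the chat. SPUBLISH routes the message
    # to the shard that owns the topic.
    message = json.dumps({"from": author, "text": text})
    receivers = await publisher.custom_command(
        ["SPUBLISH", "chat:shadow-dragons", message]
    )
    # Returns the number of subscribers that received the message.
    return receivers
```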
Probabilistic Data Structures: HyperLogLog for User Counts and Bloom Filters for Membership Testing
Next, let's look at a couple of probabilistic data structures in Valkey. Valkey is an in-memory data structure server, so storage matters: you don't want to store a large amount of data if you don't have to. There are some applications in our MMORPG where accuracy isn't paramount, so we can get away with answers that are perhaps off by a bit.
One such example is a unique user count. Say we want to keep track of how many unique daily active users are playing our game. Of course we could use a plain counter, but that counts non-unique users, so we want something a little more advanced. Let's say we want to do it on a day-by-day basis, and we have four different users playing our game.
Traditionally, if you were to use a set data structure, it would take O(n) space: we would store all of the different user names or user IDs associated with that counter, and then take the set cardinality to ensure uniqueness. But within Valkey there is a data structure called a HyperLogLog. This can approximate set cardinality, so it can answer the question of how many unique users you have seen. The advantage is that it doesn't take linear space: it is bounded to 12 kilobytes in Valkey, and it has a less than 1% standard error rate, which is pretty remarkable.
It also allows constant-time operations because it is small, requiring neither linear space nor even logarithmic access time.
So what is the intuition behind how HyperLogLog works? Say these are some poorly drawn coins that we're flipping. If I tell you I got 5 heads in a row, it's likely that I flipped a lot of coins in order to see a sequence of 5 heads in a row. This is the intuition behind how the algorithm works: it observes the hashed values and uses the rarity of long runs, which become less and less likely, to estimate how many values it must have seen.
So what does that mean exactly? The HyperLogLog algorithm takes your user ID, or whatever string you want to count, and runs it through MurmurHash. This produces a 64-bit uniformly distributed binary string, essentially a seemingly random sequence of zeros and ones. It takes the first 14 bits of that binary value and computes a hash bucket from it; there are 16,384 (2^14) buckets within the HyperLogLog.
Then it takes the remaining bits and counts the number of leading zeros. This is the analogue of the coin flips: the larger the number of leading zeros in this uniformly distributed binary string, the more values you have likely seen in order to observe that pattern. It stores, per bucket, the maximum number of leading zeros observed across the sequence of strings added to it. Then it takes the harmonic mean over all buckets, a mathematical formula that estimates the total number of unique observations.
Luckily, you don't have to deal with any of that math when using HyperLogLog in Valkey. It has a pretty simple API, using the PF prefix. PFADD adds a number of items to a particular HyperLogLog; say we name it after daily active users with the date. We can add 3 users to it, and we'll get back a 1 saying the operation succeeded. If we issue a PFCOUNT command, we'll get back 3 in this case.
If we were to add the same user again, it would come up with the same hash and shouldn't modify the structure; we get back a 0 saying the estimate was unchanged, and the count stays the same. There are some other interesting properties here too. Say we want to subdivide it: instead of counting daily active users over the entire MMORPG, we partition it into different games. We can add the 3 users to game one and just Alice and Doris to game two. These are different HyperLogLogs, and HyperLogLogs are composable, so we can use a command called PFMERGE.
We can take the two of them, and PFMERGE walks through the data structure we talked about, keeping the maximum number of leading zeros in each bucket. It lets you get a total count of unique users across multiple HyperLogLogs. In this case, we have only 4 unique users even though we've had 5 different observations, so we get back a 4 when we count the aggregated HyperLogLog.
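A sketch of that counting flow, assuming Glide's pfadd, pfcount, and pfmerge helpers; the key names and users are hypothetical:

```python
async def unique_users_demo(client):
    # Track unique daily active users per game.
    await client.pfadd("dau:2025-12-01:game1", ["alice", "bob", "charlie"])
    await client.pfadd("dau:2025-12-01:game2", ["alice", "doris"])

    # Per-game estimate (~3 for game one).
    game1 = await client.pfcount(["dau:2025-12-01:game1"])

    # Merge the per-game HyperLogLogs; alice appears in both but is
    # deduplicated by construction, so the total is ~4, not 5.
    await client.pfmerge(
        "dau:2025-12-01:all",
        ["dau:2025-12-01:game1", "dau:2025-12-01:game2"],
    )
    total = await client.pfcount(["dau:2025-12-01:all"])
    return game1, total
```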
I ran a quick simulation of the memory usage and error rate of a set versus a HyperLogLog. Adding 10,000 numbers to a set, you get back exactly 10,000, because it stores every single number as a string and then computes the exact cardinality, but the memory usage of the set as reported by Valkey is about 420 kilobytes. Doing the same thing with a HyperLogLog, I got back 9,987, a 0.13% error, but the memory usage is 12 kilobytes. And the HyperLogLog will never expand beyond 12 kilobytes, its maximum size, while the set's memory usage keeps growing with the number of unique users added.
Next, let's answer a slightly different question: do two players know each other? Say you want to keep track of whether you've observed or interacted with a player before. This forms a graph of interactions between users, which you could again store in Valkey with sets, one set of interactions per player.
The downside, of course, is that this also costs storage space. Can we do better and avoid storing all these combinations? For that, we can use a Bloom filter, a relatively recent data structure added to Valkey. It answers a different question: membership testing, that is, whether a given user has interacted with the particular user you're querying for. This can result in over 90% memory savings compared to a set, depending on your configuration.
Unlike the HyperLogLog, this has some configuration options, including a tunable false positive rate. The lower the false positive rate, the higher the memory usage of the data structure; by default, it's set to a 1% false positive rate. The error you can get is a report that you have interacted with someone when in reality you hadn't. This is also a constant-time operation: it uses a number of different hash functions, which we'll see in a minute, to test and add members, but it's typically a relatively small constant.
So how does this work under the covers? The intuition is that you have a fixed-size bit array with a number of different hash functions operating on it. Say for this example we have 10 bits, and we want to add Bob, with just two hash functions. We calculate hash #1 of Bob, take it modulo the number of bits, and get bit number 5, which we set in our bit array. We do the same thing with the second, independent hash function, get the value 9, and set that bit too.
We can do the same thing with Alice; every member we add goes through the same hash functions, which also use MurmurHash. For Alice there's a collision: hash #1 gives us 5 again, and bit 5 is already set to 1, so nothing changes. Say hash #2 of Alice gives the first position, so we set that bit. Testing membership goes through the same exact process: we check whether the bits the hashes point to are 1 or 0. And because the hash outcomes are not unique per member, you can get false positives.
In this case, we can see that the user we're testing was not previously added, because one of its hashed bits is 0. But if, for example, another user hashes to bit 5 and bit 9, you might get a false positive where it collides with Bob. If any of the bits is 0, the member was definitely never added to the set; if all of the bits we look at are 1, it was maybe added previously. So we don't have to store the underlying strings, just this compact representation of their hashed values, which makes the memory savings pretty significant.
If you look at a 1% false positive rate, you can get about 112 million items with 128 megabytes, or scaling linearly, 448 million items with 512 megabytes. If you were to drop the false positive rate to 0.1%, then with the same memory usage, you can still get a relatively large capacity of 74 million with 128 megabytes or 298 million with 512 megabytes. And of course, you can tune this as your needs fit.
How do you actually use this in Valkey? It has a relatively simple API where you interact with it much like a set. There's the BF.ADD command, which adds a string to a given Bloom filter; we get back a 1 here saying Bob has been added. We can similarly add multiple users simultaneously and atomically to the interactions. Then there's the BF.EXISTS command, the membership test, where you check whether a given user is there. For a user who was never added we get back a 0, while Bob, of course, returns a 1, saying he exists and has been added. But you can also get a false positive: here we test whether Frank has been added to Alice's interactions, and if his hash values collide with those of Bob, Charlie, or Doris, we get back a 1 even though he was never added.
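A sketch of those calls through the generic command interface, assuming the BF.* command names from the Valkey Bloom module; key and user names are hypothetical:

```python
async def interactions_demo(client):
    # Record that Bob interacted with Alice.
    await client.custom_command(["BF.ADD", "interactions:alice", "bob"])

    # Add several users at once, atomically.
    await client.custom_command(
        ["BF.MADD", "interactions:alice", "charlie", "doris"]
    )

    # Membership tests: 0 means definitely never added;
    # 1 means probably added (false positives are possible).
    maybe_bob = await client.custom_command(
        ["BF.EXISTS", "interactions:alice", "bob"]    # 1
    )
    maybe_frank = await client.custom_command(
        ["BF.EXISTS", "interactions:alice", "frank"]  # usually 0, can be 1
    )
    return maybe_bob, maybe_frank
```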
Geospatial Commands for Location-Based Queries in Virtual Worlds
Let's move on to another capability within Valkey: keeping track of geospatial data. Let's say we are managing an MMORPG with a rich virtual world where users are in different locations. We want to see which users and points of interest are located near a given user at a point in time, and for this we can use Valkey's geospatial commands.
So what is geospatial data and how does it work? In Valkey, you start with a latitude and longitude; essentially, you take your virtual world and divide it into a regular grid. The latitude and longitude you store as input go through a geohash algorithm, which works by subdividing the world into smaller and smaller grids and generates a geohash, traditionally a string of characters; Valkey encodes it as a 52-bit integer.
The properties of that integer are interesting: the closer two numbers are, the closer the locations they represent, because of the way the encoding interleaves the latitude and longitude. This has the nice property that you can use it as a score, so things that are close to each other geographically are numerically close as well. This score is then fed into another data structure within Valkey called a sorted set. Sorted sets are often used for things like leaderboards where you want to keep track of high scores; in this case, the sorted set is used to query things that are near a given latitude and longitude. Valkey layers convenience functions on top of it, allowing you to do bounding-box and radius querying in O(log n) time.
How would you use it in practice? You have the GEOADD command, one of these convenience functions on top of the sorted set. You can do a GEOADD with different latitudes and longitudes; here we're adding Alice as well as a point of interest, the beach down at the bottom, and we get back a 1 saying it's been successfully added. Then let's say another player wants to query for the points of interest around them. You use the GEORADIUS command and query the map from a slightly different latitude and longitude, and you can use different radii. With a 3-kilometer radius we get back both items we added, Alice as well as the beach; with a smaller 1-kilometer radius around the same point, we get back just Alice. You can also query the distance between any two points and get it back in a number of different units: say we want the distance between Alice and Beach One in meters, and we get back that it's about 1,617 meters. So you can use distances to further filter the results of the radius query if you want to do things like bounding boxes.
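A sketch of those geospatial calls via the generic command interface; the coordinates are hypothetical, and note that GEOADD takes longitude before latitude:

```python
async def geo_demo(client):
    # Add Alice and a point of interest as (longitude, latitude, member).
    await client.custom_command(
        ["GEOADD", "map",
         "-122.270", "37.800", "alice",
         "-122.260", "37.790", "beach-1"]  # hypothetical coordinates
    )

    # Everything within 3 km of a query point (both members here).
    nearby = await client.custom_command(
        ["GEORADIUS", "map", "-122.270", "37.800", "3", "km"]
    )

    # Distance between two members, in meters.
    dist = await client.custom_command(
        ["GEODIST", "map", "alice", "beach-1", "m"]
    )
    return nearby, dist
```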
Rate Limiting with Lua Scripts: From Simple Counters to Token Bucket Algorithms
Now let's look at rate limiting in our application. You might want rate limiting in an MMORPG to enforce game mechanics. Let's say we want to answer the question: does a player have enough stamina to cast a given spell at a point in time? Bob is trying to access a resource, in this case spell casting, and can only cast so many spells in a certain time period before the quota resets. We can do this simply in Valkey using what is effectively a counter. The string data type supports numerical operations: you have increment and decrement commands in Valkey to manipulate numbers. If I increment a counter that doesn't exist yet, it is created and set to one; if I increment it again, it goes to 2. This is a constant-time operation, so we can use it to build a rate limiter for our application. Let's say we want to start our counter at 0 and allow 3 requests until a given time delay is up, so just 3 requests in a 5-second period.
What we can do is start our counter and use a Lua script to perform the counting logic atomically and return a result, then make sure the counter expires at the end of the time period; in effect, we use a TTL to reset our rate limiter at the end of the window. Looking at this numerically, each request counts the user up toward 3. Once the counter reaches 3, no more requests are allowed until the TTL fires. As soon as the TTL fires, the item is removed from the cache, and the process restarts as if the user has a fresh quota.
Let's walk through that Lua script to see how it works. Valkey supports scripting so that you can perform compound operations atomically in a Lua script. Lua is a language you might not be familiar with, but it's relatively straightforward: it can invoke server commands and apply simple logic on top of them. Let's say we have a couple of globals in our script: we set our limit to 3, and our expiry time to 4 seconds in this case.
We'll start by fetching the key that we are operating on, in this case the given user's stamina, and we'll fetch it from the Valkey server that we are running on. If it doesn't exist, meaning that it has either been evicted or it's starting from scratch, then we'll set it to zero and set the expiry time to 4 seconds. If it does exist, then we have already previously fetched the value, and then we just check if we're within our limit. If we are, then we'll be able to increment the key and return 1, saying that the user is allowed to cast the spell. Otherwise, we'll return 0.
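A reconstruction of that script as described, embedded in a Python helper; it assumes Glide's Script/invoke_script API for loading and SHA-based invocation, and the key name is hypothetical:

```python
from glide import Script

RATE_LIMIT_LUA = """
local limit = 3        -- allowed requests per window
local expire_s = 4     -- window length in seconds

local current = redis.call('GET', KEYS[1])
if current == false then
  -- First request in the window (or the key expired): start from zero
  -- and arm the TTL that resets the rate limiter.
  redis.call('SET', KEYS[1], 0, 'EX', expire_s)
  current = 0
end

if tonumber(current) < limit then
  redis.call('INCR', KEYS[1])
  return 1   -- allowed: the spell may be cast
end
return 0     -- limit reached until the TTL resets the counter
"""

async def can_cast(client, user: str) -> bool:
    # Glide hashes the script and invokes it by SHA, loading it if needed.
    script = Script(RATE_LIMIT_LUA)
    result = await client.invoke_script(script, keys=[f"stamina:{user}"])
    return result == 1
```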
This is sufficient to build a simple rate limiter that allows a fixed number of requests. Before moving to something more advanced, let's see how you actually use it. We load the script first by inserting it into Valkey with SCRIPT LOAD, passing the entire string you just saw to the server. That gives you back a SHA ID, the unique hash identifier of the script, which you then use in subsequent requests to invoke it.
You then call EVALSHA with the script ID and pass it the arguments like we saw before: the key name tracking the number of spells Bob has cast in this example. It returns a 1 or a 0 indicating whether the spell is allowed or disallowed. This is relatively simple, but let's say we want spells of different types with different costs, so a player can cast a simple spell or a more complex one. Our previous example wouldn't work for that, because it's just a simple counter.
With this we can do a token bucket, which is a more advanced rate limiting algorithm where you have a given capacity and refill rate of that bucket. If the bucket has enough credits for your given spell, we'll subtract from it. Otherwise, it will be disallowed. We keep track of a little bit more metadata associated with a given user's spell count here. We can do this in Valkey by using the hash data structure that was talked about earlier. For this we need to keep track of two different pieces of information. One is the number of tokens that are currently in the bucket, and two is the last time we updated or touched that bucket. You need the timestamp because tokens accrue over time, so you can calculate the difference there.
Let's break down what the script looks like in this scenario. It's still a Lua script, but a little more complex, so we can look at some more advanced Lua scripting. Rather than hard-coding global values in the script as before, we take these values in as arguments: the bucket size and the refill rate, as well as the number of tokens for the given request, and we still pass in the key.
The first thing we do is read the hash, which may or may not exist if we haven't run this before; we fetch the stored values with HMGET, along with the current time using Valkey's TIME command. The first step is to refill the bucket: we check whether there is a last-update time, and if it exists, we calculate the elapsed time and the credits accrued. Refilling the bucket lazily as the first step means you don't have to run anything periodically; you can just run the script whenever you need it.
It will refill the bucket up to the maximum size. If the bucket is already at the maximum size or beyond that, we'll reset it to the maximum capacity. Then we need to determine if the request is allowed. So if we have enough tokens, if our current token count is more than the number of requested tokens, then we will subtract and allow the request. Otherwise, it will be disallowed and the number of tokens will not be updated.
Then we'll update the data structure to represent the current number of tokens and the last update time. Lastly, we'll do some optimization here. We could keep this hash in the data structure forever and it will eventually get updated, but if the user goes away, we want to be able to reap these at some point in time. So we can use the TTL to expire this hash data structure at the point where it would have reached the ceiling of the bucket capacity anyway. We'll take the bucket size divided by the refill rate and expire it at that point in time.
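A reconstruction of the token bucket script following the steps just described; the capacity and refill rate passed from Python are hypothetical example values:

```python
from glide import Script

TOKEN_BUCKET_LUA = """
-- KEYS[1]: the per-user bucket hash
-- ARGV[1]: bucket capacity, ARGV[2]: refill rate (tokens/sec),
-- ARGV[3]: tokens requested for this spell
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])

local state = redis.call('HMGET', KEYS[1], 'tokens', 'last_update')
local now = tonumber(redis.call('TIME')[1])

local tokens = capacity
if state[1] then
  -- Refill lazily: accrue tokens for the time since the last update,
  -- capped at the bucket capacity.
  local elapsed = now - tonumber(state[2])
  tokens = math.min(capacity, tonumber(state[1]) + elapsed * refill_rate)
end

local allowed = 0
if tokens >= requested then
  tokens = tokens - requested
  allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'last_update', now)
-- Reap idle buckets once they would have refilled to capacity anyway.
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill_rate))
return allowed
"""

async def cast_spell(client, user: str, cost: int) -> bool:
    script = Script(TOKEN_BUCKET_LUA)
    result = await client.invoke_script(
        script,
        keys=[f"bucket:{user}"],
        args=["10", "1", str(cost)],  # capacity 10, 1 token/sec (hypothetical)
    )
    return result == 1
```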
With that, I want to thank you for coming, and we'll be available off to the side here to answer any questions you may have about Valkey or data structure modeling. Thank you so much.
This article is entirely auto-generated using Amazon Bedrock.