Please Redis Responsibly

Molly Struve on January 17, 2019

Redis is all about speed. But just because Redis is fast, doesn't mean it still doesn't take up time and resources when you are making requests t... [Read Full]
markdown guide
 

There are only 2 hard problems in Computer Science: cache invalidation, naming things and off-by-one errors.

Are indexes guaranteed to never change once created? If Redis goes down and has to be rebuilt will the resulting KV store be the same? Basically, is it possible for the client cache to become stale?

While I sometime consume data from Elastic though other tools I've never had to actually use it and am not familiar with its implementation, so forgive me if these are "no, DUH" questions.

 

Great questions!

Are indexes guaranteed to never change once created?

Indexes can change, but it is not common. We shuffle data maybe once or twice a year depending on growth. For the most part, once an index is created it will not change. Sometimes we have a couple clients on a single index and a client will get too big so we have to move them to a new index. When we do that we use what we call index aliases. These allow us to name the indexes whatever we want and reference them by standard names.

For example: Say I have an index named index_a. If I want to store client 1, 2, and 3 on that index I would give it three aliases, client_1_index, client_2_index, and client_3_index. Then any time I make a request to client_1_index Elasticsearch knows I mean index_a. This allows us to move data around and when the data transfer is complete we simply point the alias at the new index.

If Redis goes down and has to be rebuilt will the resulting KV store be the same?

For Redis, we use Elasticache on AWS, so it has a replica in case our main instance fails which is helpful. However, if Redis did completely go down, we would still survive. The Redis values are populated by a request to Elasticsearch. If Redis went down that would mean rather than fetching the values from Redis, we would have to talk to Elasticsearch to figure out where to put the vulnerability. This means more load for Elasticsearch, which we would not want long term, but we could handle it for a short period.

Basically, is it possible for the client cache to become stale?

When we store the keys for the index_name for a client, we set an expiration of 24 hours on them. This just ensures that if a client gets removed we dont have keys hanging around forever. Otherwise, since we are just moving the aliases around and and those basically never change the cache's don't ever go stale. If for some reason we needed to update an alias we would manually bust the cache for that client.

Hopefully I explained that clearly! If not please let me know and I will take another whack at it 😃

 

So the cached data can change, but extremely rarely and you control when it changes. You have a method to patch the cache using aliasing. And even if everyone drops the ball you have an expiry rule of 24 hours so the worse case scenario is that some people would have the wrong data (or no data) for at most a day.
Cool, got it.
PR approved 😄

You got it! We have guards in place so its never wrong data(bc that would be bad!), worst case it would be no data 😊

 

Of course I am not familiar with the whole problem as you are. But, I am wondering if using aliases was enough in this case without the need of storing index names in Redis.

alias_user_1    |    alias_user_2   |   alias_user_3, alias_user_4
Index A         | Index B           | Index C
user_1          | user_2            | user_3,user_4

Then if you need a specific document you just request it using the alias and not the index name.

 

We actually store the alias names in Redis. The index names are never referenced anywhere in our application. We could have stored them in MySQL on each client but we choose Redis because it was fast and if we ever needed to update one it was as simple as just busting the cache.

 
 

Great overview! Also nice to see someone else using a very similar implementation to what I’ve used in the past for fairly long lived data - did you have to worry about hash tags / slots for multi-deletes in your cache busting strategy? I had some headaches with that when first moving to clustered mode...

 

We never ran into any issues with hashtags or multi deletes, we have run into issues with large key deletes but those are a thing of the past now with Redis 4 which will handle them async.

 

A cache within a cache within a cache. An inception of caches.

Good work! Quite a lot of $$$ saving as well ;)

 
 
 

Was talking with some Netflix engineers who said they use Redis waaay beyond it's intended use because it was just so easy and fast. LOLz!

code of conduct - report abuse