Husni Adil Makmur

Posted on Dec 12, 2020 • Edited on Dec 16, 2020

Cache-Aside Pattern

#go #mysql #redis #cache

Introduction

When you develop an app that uses database as storage, i.e. MySQL, you probably need to use a cache layer to help your application serve data more quickly, especially for serving data that's not changed too often. Whether you stored it in application memory or external cache server, i.e. Redis. This is called cache-aside pattern.

When your app receives a request, it'll try to find the data in the cache layer first.

If the data is in the cache (cache hit), it uses that data instead of requesting to database.
If the data is not in the cache (cache miss), it will get the data from database.
Write the copy of data that's obtained from the database into cache layer.

Cached data lifetime

When you're using this strategy, you need to define the lifetime of cached data by defining TTL (Time to live). You need to set the TTL value correctly for getting the most of caching benefit. If you set the period too short, your app may hit your database too often and this could increase latency. In the other hand, when you set the period too long, your app may be in the inconsistent state since it serves stale data. There's no one correct setting for all cases, so you need to find a correct setting that suits your use case.

Time to live (TTL) or hop limit is a mechanism which limits the lifespan or lifetime of data in a computer or network. TTL may be implemented as a counter or timestamp attached to or embedded in the data. Once the prescribed event count or timespan has elapsed, data is discarded or revalidated. In computer networking, TTL prevents a data packet from circulating indefinitely. In computing applications, TTL is commonly used to improve the performance and manage the caching of data.

View on Wikipedia>

Data eviction policies

In some cases, you may be considering and using the data eviction policy in case the cache storage is full. This is very useful when you need to store many objects in the cache on a limited amount of cache storage/memory. There are many policies out there, for example FIFO, LIFO, LRU, ARC, etc. You may choose one that suits for your needs.

In computing, cache replacement policies are optimizing instructions or algorithms which a computer program or hardware-maintained structure can utilize to manage a cache of information. Caching improves performance by keeping recent or often-used data items in memory locations which are faster, or computationally cheaper to access, than normal memory stores. When the cache is full, the algorithm must choose which items to discard to make room for new data.

View on Wikipedia>

If you choose to use Redis, you might want to read the Redis Data Eviction Policies as well.

Another additional mechanism you might need to implement is to manually evict corresponding cached data when there's a write in the database for a particular data.

In-memory cache vs external cache

Choosing caching strategy is always depending on the use case. Both strategies (in-memory cache or external cache) have their own strength.

Using In-memory cache

Pros

Cheap, no need to spawn cache server instance.

Cons

Different cache state when your application has two or more instances, since every instance will have their own managed cache data.

Using external cache

Pros

Shared state, means consistent cached data accross multiple app instances.
Your app doesn't need additional memory to hold all cached data.

Cons

Might be expensive. You have to spend more bucks for having external layers, i.e. Redis servers/clusters.
Additional effort if you manage the servers yourselves.
Network latency.

When to use

There is no silver bullet on solving app performance problem. This is being one of many patterns you can use. All is depending on the use case. You might consider this pattern when:

Your database doesn't have built-in cache layer
Your existing cache layer doesn't have a built-in read-through or write-through operation to the database

But then again, you need to look up on your use case.

Caching static data

If your app only depends on static data, it may not be suitable for using cache-aside pattern. Instead, you may consider:

Loading the data on app startup if possible.
Setting the cached data to never expire.

Cache-aside pattern in Go

Let's go with some coding in Go to demonstrate this pattern.

Since this is a simulation, We're not actually using a real MySQL or Redis instances. Instead, we'll have some simulation. And by the way, this is a very simple example that demonstrate how cache-aside pattern works.

First of all, we need to have an interface that you're gonna use it for simulating a repository that we can implement it later as a MySQL and Redis repositories.

type Repository interface {
  DoAnExpensiveQuery(id string) (*string, error)
}

We have one method here. Both MySQL and Redis implementation should implements this method.

Now we're gonna create the MySQL repository that's responsible for handling communication with MySQL.

package repository

import (
  "errors"
  "fmt"
  "sync"
  "time"
)

type MySQLRepository struct {
  mysqldb sync.Map
}

func NewMySQLRepository() Repository {
  return &MySQLRepository{
    mysqldb: sync.Map{},
  }
}

// demonstrate an expensivev query or frequently accessed query
func (m *MySQLRepository) DoAnExpensiveQuery(id string) (*string, error) {
  if rawData, ok := m.mysqldb.Load(id); ok {
    data := rawData.(string)
    return &data, nil
  }
  return nil, errors.New("record not found")
}

So here we're using sync.Map package to simulate the mysql client. Instead of reading from and writing to real database, we're actually reading from and writing to the Map.

And now, the Redis part.

package repository

import (
  "fmt"
  "time"

  "github.com/bluele/gcache"
)

type RedisRepository struct {
  redis gcache.Cache
  mysql Repository
  ttl   time.Duration
}

func NewRedisRepository(ttl time.Duration, mysql Repository) Repository {
  return &RedisRepository{
    redis: gcache.New(100).LRU().Build(),
    mysql: mysql,
    ttl:   ttl,
  }
}

func (r *RedisRepository) DoAnExpensiveQuery(id string) (*string, error) {
  rawData, err := r.redis.Get(id)
  if err == nil && rawData != nil {
    // if the data is in redis, return it
    fmt.Printf("Cache hit for id: %s\n", id)
    data := rawData.(string)
    return &data, nil
  }

  if err != nil {
    // in case of error, do not return
    // have a try reading from database
    fmt.Printf("Cache miss for id: %s\n", id)
  }

  // if the data is not in the cache yet,
  // get it from database
  result, err := r.mysql.DoAnExpensiveQuery(id)
  if err != nil {
    return nil, err
  }
  // and eventually store the value to cache
  go func(result string) {
    r.redis.SetWithExpire(id, result, r.ttl)
  }(*result)

  return result, nil
}

This part is more complex than the MySQL. The Redis repository depends on MySQL repository as the source of truth.

So basically this adheres to our previous diagram. Let me paste it here so you don't need to scroll up.

When your app receives a request, it'll try to find the data in the cache layer first.

If the data is in the cache (cache hit), it uses that data instead of requesting to database.
If the data is not in the cache (cache miss), it will get the data from database.
Write the copy of data that's obtained from the database into cache layer.

And then we'll have a main function to call them. Notice that we're using decorator pattern to achieve this.

package main

import (
  "fmt"
  "time"

  "github.com/husniadil/cache-aside-pattern/repository"
)

func main() {
  ttl := time.Second * 3

  repo := repository.NewMySQLRepository()
  repo = repository.NewRedisRepository(ttl, repo)

  id := "2c1b7cd2-0420-4b73-a3f9-734504842fb9"

  var names []string
  for i := 0; i < 5; i++ {
    name, err := repo.DoAnExpensiveQuery(id)
    if err != nil {
      fmt.Printf("Error loading [%s]: %s", id, err.Error())
      continue
    }
    names = append(names, *name)
  }

  // wait for cache expiration and we'll see a cache miss
  fmt.Printf("-------------- waiting cache expiration --------------\n\n")

  time.Sleep(ttl)

  for i := 0; i < 5; i++ {
    name, err := repo.DoAnExpensiveQuery(id)
    if err != nil {
      fmt.Printf("Error loading [%s]: %s", id, err.Error())
      continue
    }
    names = append(names, *name)
  }

  fmt.Println("Result:", names)
}

It prints:

Cache miss for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Cache hit for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Cache hit for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Cache hit for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Cache hit for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

-------------- waiting cache expiration --------------

Cache miss for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Cache hit for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Cache hit for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Cache hit for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Cache hit for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Result: [Husni Husni Husni Husni Husni Husni Husni Husni Husni Husni]

At the first hit, since we don't have a cached version of the data, it hits the database, and subsequent calls will hit the cache layer.

All code above is hosted on GitHub. I added some additional logs to simulate the latency.

You can browse the code or run the simulation here.

Cache stampede problem

There is another beast to tame when you're working on cache-aside pattern, it's the cache stampede problem. If we quote it from Wikipedia, it says:

A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under a very high load. This behaviour is sometimes also called dog-piling.

View on Wikipedia>

So, for a real world example, let's say we have an e-commerce site, and we're sending some promotions directly via push notification. It means that most of our users are clicking the notification on their phone directly at the same time.

Let's say we'll have 1000 users opening the app at the same time, at the same second. Just a little time prior to the traffic burst, our cache layer is on a cold state, it means that some of the cached data are already expired. In this scenario all sudden traffic is likely hitting the database directly -- cache miss. The database is under very high load at this time.

Our code above does not withstand the cache stampede when we're simulating parallel execution.

Let's change the code to simulate parallel execution.

import (
  // ............
  "sync"
  // ............
)

func main() {
  // ............
  // ............

  id := "2c1b7cd2-0420-4b73-a3f9-734504842fb9"

  var wg sync.WaitGroup
  wg.Add(5)
  var names []string
  for i := 0; i < 5; i++ {
    go func(id string) {
      defer wg.Done()
      name, err := redisRepository.DoAnExpensiveQuery(id)
      if err != nil {
        fmt.Printf("Error loading [%s]: %s", id, err.Error())
        return
      }
      names = append(names, *name)
    }(id)
  }
  wg.Wait()

  fmt.Println("Result:", names)
}

It prints:

Cache miss for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9
Cache miss for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9
Cache miss for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9
Cache miss for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9
Cache miss for id: 2c1b7cd2-0420-4b73-a3f9-734504842fb9

Result: [Husni Husni Husni Husni Husni]

You can browse the code or run the simulation here.

As you can see, all of them is cache miss. Our MySQL database is now working hard to serve all the same data. So how can we solve it?

There are some different ways to handle this problem, I'll cover it in the next posts.

I hope this article is clear enough and easy to digest. If you have some feedbacks, let me know in the comment section below.

Cover image: Tyler Daviaux on Unsplash

Top comments (8)

David Kröll • Dec 12 '20 • Edited

Hi, very nice article with real background knowledge. But I think you missed the biggest advantage of the (self-written) in-memory cache: real hard performance. There are no blocking I/O connections, networking syscalls etc.

I've also developed such a system (was never real productive) where I implemented an in-memory cache. I think in such an example, you pointed out, like just fetching data from a database there is no real need to have an external caching service deployed. By optimizing and adding more memory to the MySQL instance you may also achieve near the performance like with redis (with less complexity). You just have to make sure that the cache (query and results) inside MySQL is big enough to fit as much pages in it. You would not have to deal with TTL, cache hit/miss and so on. Furthermore you would also have the single instance of thruth.

Of course you are welcomed to read about the results I managed to achive with my in-memory cache (also in Go). I've also conducted much performance evaluations.

davidkroell / shortcut

URL Shortner in Go

shortcut

Shortcut is an URL Shortener written in Go. It's fully featured with a JSON API and an in-memory cache.

Benchmark

For an URL Shortener it's very important to satisfy as most request per seconds as possible Therefore, a benchmark is conducted I am measuring the speed of shortcut with wrk 4.1.0 in a VMWare Workstation 15 virtual machine In order to find out what slows down the application, various approaches were pursued.

Host	Virtual Machine
OS	Windows 10 (1803)	Fedora 29 (Linux 4.20)
CPU	Intel i7-7500 Dual Core	Intel i7-7500 Dual Core
Memory	16GB	4GB
Go version	go 1.12 linux/amd64
Database	MySQL 8.0.15 Community Server

Original

The original application writes every request to the database (logging table). Furthermore, any request is logged on stdout. This code uses UUID as primary keys.

$ wrk -c 256 -d 5s -t 48 http://localhost:9999/asdf
Running 5s test @ http://localhost:9999/asdf
  48 threads and 256

…

View on GitHub

Husni Adil Makmur • Dec 12 '20

Hi, thanks for the feedback.

I think in such an example, you pointed out, like just fetching data from a database there is no real need to have an external caching service deployed.

Yeah, I might put a very simple example that looks overkill to implement cache-aside pattern for such problem space, you're right. But I did this for the sake of simplicity. All comes back again to the real use case.

By optimizing and adding more memory to the MySQL instance you may also achieve near the performance like with redis (with less complexity). You just have to make sure that the cache (query and results) inside MySQL is big enough to fit as much pages in it. You would not have to deal with TTL, cache hit/miss and so on. Furthermore you would also have the single instance of thruth.

By tuning up MySQL, it should be able to withstand more traffic. But I think there are also trade-offs, for instance we need to aware the maximum numbers of database connections, and need more bucks for better machine specs.

I believe there's no silver bullet to handle this problem. There are some benefits when we separate the read model for this, for example if there's a heavy locking on MySQL during writes, the customer-facing app will still serves quickly on a warm cache situation, versus no cache at all.

Of course you are welcomed to read about the results I managed to achive with my in-memory cache (also in Go). I've also conducted much performance evaluations.

This is interesting, I might miss on finding on how you handle cache invalidation when there are some data being updated (not inserted) into the database during heavy reads. Will the app serve stale data for a longer period of time until no one is accessing the endpoint?

David Kröll • Dec 12 '20

This is interesting, I might miss on finding on how you handle cache invalidation when there are some data being updated (not inserted) into the database during heavy reads.

Thanks for the reminder, I maybe missed this out (this piece of software is already two years old). But the idea of course was to update the cache first and afterwards the database.

Will the app serve stale data for a longer period of time until no one is accessing the endpoint?

I do not fully understand you question, but I've created a manager goroutine which cleans up the cache based on the configuration here:
github.com/davidkroell/shortcut/bl...

Husni Adil Makmur • Dec 12 '20

I do not fully understand you question

When a particular data it's still being accessed by many users in specific time range, that specific data will still be in the cache, although there's an update data request coming up from another endpoint. Thus new users is getting stale data (old data prior update).

Eventually, after no one is accessing the data for a period of time, the data will be deleted by the cache manager. CMIIW.

Aldira Putra • Dec 12 '20

Nice article mas 🎉
Some suggestions, please explain more on the data eviction policy, for example pros and cons of each method, sample usages and so on.