
A simple "Cache Invalidation" strategy, Part 2

Vignesh M ・ 3 min read

"There are only two hard problems in Computer Science: cache invalidation, and naming things."
— Phil Karlton

This is a continuation of my last post, A simple caching strategy for Node REST APIs, Part 1, in which we implemented a simple cache middleware, summarised in the flow chart above. If you have not read that post, please read it before continuing. That post ended with a question, "How can we do cache invalidation?", which we will explore now.

Alright, let's do this one more time.

Ques. Why did we need caching?
Ans. So that users could get data quicker.
Ques. Why would we need cache invalidation?
Ans. So that users get up-to-date data.

And what is Cache Invalidation?

Cache invalidation is a process in a computer system whereby entries in a cache are replaced or removed.

  • "Replaced": the cached entry is overwritten with the recently updated data.
  • "Removed": the cached entry is removed entirely.

Of these two approaches, "Remove" is the easiest to implement: the cache is cleared and we let it be rebuilt with new data.
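Here is a rough sketch of the two options. A plain JavaScript Map stands in for the cache (this is not node-cache's API), and `replaceEntry`, `removeEntry` and `readThrough` are illustrative names, not from the post.

```javascript
// Hypothetical stand-in for the cache: a Map keyed by GET URL.
const cache = new Map();
cache.set('/products', [{ id: 1, name: 'Pen' }]);

// "Replaced": overwrite the cached entry with the freshly updated data.
function replaceEntry(store, key, freshValue) {
  store.set(key, freshValue);
}

// "Removed": drop the entry; the next read misses and rebuilds it.
function removeEntry(store, key) {
  store.delete(key);
}

// Read-through helper: on a miss, load from the data source and cache the result.
function readThrough(store, key, load) {
  if (!store.has(key)) {
    store.set(key, load());
  }
  return store.get(key);
}
```

With removal you never have to compute the new cached value at write time; the next read rebuilds it, which is why it is the simpler of the two.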

Cache Invalidation Strategy

This has only two steps, but the implementation can vary drastically from architecture to architecture.

  1. Find all sources that can trigger a data change: data could be changed through an API endpoint, a periodic task, or a trigger hidden somewhere deep inside your codebase. Your job is to find them all.
  2. Add a method to clear or update the cache after the data changes. Simple as that. 😅
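The two steps can be sketched like this. Everything here is hypothetical: a Map keyed by GET URL stands in for the cache, and `createProductHandler` and `nightlyPriceSync` are made-up write paths. The point is that every source of change found in step 1 calls the same invalidation helper from step 2.

```javascript
// Hypothetical cache keyed by GET URL (stand-in for the real cache).
const cache = new Map([
  ['/products', ['cached list']],
  ['/products?page=2', ['cached page 2']],
  ['/orders', ['cached orders']],
]);

const products = [];

// Step 2: one helper that every write path calls after changing data.
function invalidate(basePath) {
  // Drop every cached GET URL that starts with this base path.
  for (const key of [...cache.keys()]) {
    if (key.startsWith(basePath)) cache.delete(key);
  }
}

// Step 1, source A: an API handler that creates a product.
function createProductHandler(body) {
  products.push(body);
  invalidate('/products');
}

// Step 1, source B: a periodic task that also mutates products.
function nightlyPriceSync() {
  for (const p of products) p.price = Math.round(p.price * 1.1);
  invalidate('/products');
}
```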

Again, the exact implementation can vary. You may:

  1. Add a cache-clearing method at a low level, like the $afterUpdate hook of your database model.
  2. Add it to every method that changes data.

Which one fits depends on the complexity of the application.
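For the low-level option, the idea is that the model's own update path runs a hook after every write, so no caller can forget to invalidate. The sketch below is a plain class, not a real ORM; `Model`, `update` and `afterUpdate` are hypothetical names that only mirror the $afterUpdate idea.

```javascript
// Hypothetical cache shared by the app (stand-in for node-cache).
const cache = new Map();

// Minimal hypothetical base model: just enough to show the hook.
class Model {
  constructor(basePath) {
    this.basePath = basePath; // GET URL prefix this model backs, e.g. '/products'
    this.rows = new Map();
  }

  update(id, patch) {
    const row = { ...this.rows.get(id), ...patch };
    this.rows.set(id, row);
    this.afterUpdate(); // low-level hook: runs for every update, whoever calls it
    return row;
  }

  afterUpdate() {
    // Clear every cached GET URL under this model's base path.
    for (const key of [...cache.keys()]) {
      if (key.startsWith(this.basePath)) cache.delete(key);
    }
  }
}

const products = new Model('/products');
```

The trade-off: a model hook catches every write that goes through the model, but writes that bypass it (raw SQL, another service) still need their own invalidation, which brings us back to step 1.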

For this post, we will extend our cache middleware to handle invalidation too; let's see how that works. This example assumes that the only way to change the resource's data is a POST request. We will add a new method called clear to our middleware.

// middlewares/cache.js

const NodeCache = require('node-cache')
const cache = new NodeCache({ stdTTL: 5 * 60 })

function getUrlFromRequest(req) {
    ...
}

function set(req, res, next) {
    ...  
}

function get(req, res, next) {
    ...
}

+ function clear(req, res, next) {
+   // keys() returns the key array directly; the callback form
+   // is deprecated in newer node-cache versions
+   const resourceUrl = req.baseUrl;
+   const resourceKeys = cache.keys().filter(k => k.includes(resourceUrl));
+   cache.del(resourceKeys);
+   return next();
+ }

module.exports = { get, set, clear }
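If you want to exercise clear() without installing node-cache or Express, here is a self-contained sketch. `FakeCache` is a hypothetical stand-in that only mimics the two node-cache calls clear() uses (`keys()` returning an array, and `del()` accepting a key or an array of keys), and the request is a plain object with a `baseUrl`.

```javascript
// Hypothetical stand-in for node-cache: only the calls clear() needs.
class FakeCache {
  constructor() {
    this.store = new Map();
  }
  set(key, value) {
    this.store.set(key, value);
    return true;
  }
  keys() {
    return [...this.store.keys()];
  }
  del(keys) {
    const list = Array.isArray(keys) ? keys : [keys];
    let count = 0;
    for (const k of list) if (this.store.delete(k)) count++;
    return count; // number of deleted entries
  }
}

const cache = new FakeCache();

// Same logic as clear() above, using the synchronous keys() call.
function clear(req, res, next) {
  const resourceUrl = req.baseUrl; // e.g. '/products' when the router is mounted there
  const resourceKeys = cache.keys().filter(k => k.includes(resourceUrl));
  cache.del(resourceKeys);
  return next();
}
```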

And use it in our routes:

// products/routes.js

router.get(
    ...
)

router.post(
    '/',
    productsController.create,
    cache.clear, // 👈
    responseHandler
)

And done!

Whenever a POST request is made, the data has changed, so we clear the cache; it will be rebuilt when the next GET request comes in.
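To see the order of operations, here is a toy simulation of the POST chain followed by a GET. It is not Express: `run` is a hypothetical helper that mimics how `next()` hands control to the next middleware, and `create`, `responseHandler` and `getProducts` are made-up stand-ins for the controller, the response handler and the GET route.

```javascript
// Hypothetical mini pipeline: calls each middleware in order via next().
function run(middlewares, req, res) {
  let i = 0;
  const next = () => {
    const mw = middlewares[i++];
    if (mw) mw(req, res, next);
  };
  next();
}

const cache = new Map([['/products', ['Pen']]]);
const db = ['Pen']; // hypothetical data source

// Stand-in for productsController.create: writes, then passes control on.
function create(req, res, next) {
  db.push(req.body.name);
  res.data = req.body;
  next();
}

// Stand-in for cache.clear: drop keys containing the base path.
function clear(req, res, next) {
  for (const key of [...cache.keys()]) {
    if (key.includes(req.baseUrl)) cache.delete(key);
  }
  next();
}

// Stand-in for responseHandler: ends the chain.
function responseHandler(req, res) {
  res.sent = true;
}

// GET route: serve from cache, or rebuild it from the data source on a miss.
function getProducts(req, res) {
  if (!cache.has(req.originalUrl)) cache.set(req.originalUrl, [...db]);
  res.data = cache.get(req.originalUrl);
}
```

Because clear sits after the controller, the cache is only dropped once the write has happened, so the next GET rebuilds from the new data.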

What exactly is happening in cache.clear?

// middlewares/cache.js

...
...

function clear(req, res, next) {
    // Again, it depends on your application architecture
    // how you retrieve and clear the cache entries that need clearing.
    // You may use the request path, query params or anything else.
    const resourceUrl = req.baseUrl;

    // keys() returns every stored key (the cached GET URLs)
    const resourceKeys = cache.keys().filter(k => k.includes(resourceUrl));

    cache.del(resourceKeys);
    return next();
}

  • cache.keys() returns all stored cache keys.
  • req.baseUrl holds the base path the router is mounted on, e.g. '/products'.
  • resourceKeys collects every key that contains the base path as a substring. (Remember? The keys were simply the GET URLs pointing to the resource.)
  • cache.del() deletes the cache entries for those keys.

For example, if our cache had values with keys like

A POST request to /products/ will clear all these 👆 caches and they will be rebuilt when a new GET request comes in.

For my use case, simply clearing every cache entry whose key falls within the POST request's base path was enough.
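One caveat: `includes()` is a plain substring match, so with a base path of '/products' it also clears keys such as '/products-archive', or a URL that merely mentions /products in its query string. Over-clearing is harmless for correctness (those entries just get rebuilt), but if it becomes costly, a path-prefix match is one option. The keys below are hypothetical, and `byPathPrefix` is our own suggestion, not part of the post.

```javascript
// Hypothetical cache keys: GET URLs, as in Part 1.
const keys = [
  '/products',
  '/products?page=2',
  '/products/42',
  '/products-archive',
  '/users?ref=/products',
];

// What clear() does: substring match on req.baseUrl.
const bySubstring = (allKeys, base) => allKeys.filter(k => k.includes(base));

// Stricter alternative (assumption, not from the post): match the path itself
// or anything below it, ignoring the query string.
function byPathPrefix(allKeys, base) {
  return allKeys.filter(k => {
    const path = k.split('?')[0];
    return path === base || path.startsWith(base + '/');
  });
}
```

Here `bySubstring(keys, '/products')` returns all five keys, while `byPathPrefix` keeps only '/products', '/products?page=2' and '/products/42'.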

With this setup we could, in theory, set the cache TTL to infinite (stdTTL: 0 in node-cache), because every change to the data clears the cache, so it always holds the latest data. But for the sake of sanity, we kept our TTL at 15 minutes. Now our users always get the latest data, faster.
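Why keep a TTL at all? With invalidation wired into every write path, the TTL only acts as a backstop for changes you missed, such as an edit made directly in the database. The sketch below is a hypothetical Map-based cache (not node-cache) with an injectable clock so that expiry can be tested; as with node-cache's stdTTL, a TTL of 0 means entries never expire.

```javascript
// Hypothetical TTL cache, loosely modelled on node-cache's stdTTL behaviour.
class TtlCache {
  constructor(ttlSeconds, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now; // injectable clock, handy for tests
    this.store = new Map();
  }

  set(key, value) {
    // A TTL of 0 means "never expire".
    const expiresAt = this.ttlMs === 0 ? Infinity : this.now() + this.ttlMs;
    this.store.set(key, { value, expiresAt });
  }

  get(key) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.store.delete(key); // expired: behaves like a miss
      return undefined;
    }
    return entry.value;
  }
}
```

With a 15-minute TTL, an entry that some missed write path left stale is served for at most 15 minutes before the next GET rebuilds it.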

That's all for today. Happy Coding!

Follow me on Twitter | Github, I build and post cool stuff. 👨‍💻
