First steps to caching PostgreSQL queries with Redis

#postgres #redis #database #sql

Is Redis the instant silver bullet to solve all of your app’s performance issues? Yes. </end>

Except, of course: it isn’t. Introducing Redis to cache your database layer can certainly boost your performance. But the caveat, like everything we do in development, is that half the battle is in planning your strategy.

The old joke goes that there are two problems in computer science: cache invalidation and naming things. Like many old jokes, there’s more than a hint of truth there. Caching has the potential to solve bottlenecks and squeeze more performance out of your existing Postgres instance. However, there are some basics that you need to get right.

Before we dive into the nitty-gritty, let’s have a quick refresher on both PostgreSQL and Redis.

PostgreSQL: the truth

If you’re using Rails, Laravel, or some other web development framework that has its own ORM, it’s easy to forget about the engine behind your data.

Let’s get this out of the way: PostgreSQL is fast but its primary job is to be right. PostgreSQL is your app’s undeniable source of truth. Whereas some NoSQL databases will serve a stale version of a record if doing so saves time, PostgreSQL only ever gives you the most up-to-date data.

Where Postgres can become slow is when its power is misused. “Powerful” is often a weasel word when it comes to tech, but in the case of PostgreSQL it’s an accurate description. With Postgres, you can structure and query your data in almost infinite ways. It gives you immense power over your data but with that power comes a hundred and one ways to trip yourself up.

The two big gotchas are data modelling and query design. Web framework ORMs go to some lengths to optimize both but they also need to be generalists. Once you get to the point of noticing the speed of your Postgres instance, it might be that you’ve hit the limits of how well a generalist ORM can handle your app’s data needs. Or perhaps you already use hand-written queries but they’re not as performant as they should be.

Either way, introducing caching can help you offset problems brought about by inefficient data models and queries but, unless you fix them, your database will only ever be working against the overall efficiency of your app.

Redis: the accelerator

If PostgreSQL is the truth, then Redis is the speed. You probably know Redis already as an in-memory key-value store. Perhaps you’ve used it to store session data and similar ephemeral data. As it’s primarily in-memory, Redis is ideal for that type of data where speed of access is the most important thing. So, that makes it great for caching, right?

Yes—and there’s more to Redis. Not only does it give you a bunch of different data types but it also persists to disk.

So, Redis is the truth, too? It can be if you want it to be. And this is why caching PostgreSQL with Redis can be a bigger discussion than you might expect.

Three ways to use Redis with PostgreSQL

So, this gives us three different ways to use Redis:

Read-through: off-set reads from Postgres by caching commonly accessed data in Redis.
Write-back: write data to Redis first and then push it out to Postgres later.
Cut out Postgres entirely: some data doesn’t need to be in Postgres at all and, instead, can live only in Redis .

How do you know which is right and when?

Choosing what to cache

There aren’t hard and fast rules to say what you should cache and what you shouldn’t but the following questions can help make the decision:

Does this query impact user experience? If a query blocks rendering of the UI, for example, then it’s likely to be a good candidate for caching.
Does it matter if this data is slightly stale? Some data absolutely has to be the very latest; a bank balance, for example. Other data can be somewhat old without any negative effects. If it can be stale, it’s a good candidate for caching.
Will this query grow with usage? As your app grows, are there queries that will run more and more?
Do I need to run complex queries on this data or do I only ever pull it straight back out of the database in the same way it was written? If you’re not using the power of PostgreSQL’s querying on the data, then maybe it doesn’t need to be stored there.
Could I stand to lose this data? Even if you’re using Redis’s persistence, it is still memory-first. So, if there’s data you could stand to lose, perhaps it’s best to write it to Redis first and then, optionally, write it out to Postgres later.
Is there a specialized tool that would do a better job? For example, logs. Logs have high write throughput, which isn’t PostgreSQL’s strong point, and there are specialized logging tools that will do the job better.

And, don’t forget the one question you should ask before all of these: "could I rewrite the query in such a way that it runs better on PostgreSQL?" That’s not the end of the story, but you should fix fundamental problems before hoping caching will solve them.

Keeping the cache up to date

That old joke –– the one about cache invalidation –– is so on point because, once you start using a cache, you’ll spend a lot of your time thinking about how to ensure the data in your cache is up to date.

When you decide to refresh your cache depends on the tradeoff you want to make between the expense of updating the cache versus the risk of serving outdated data.

There are many cache invalidation policies to choose from but broadly speaking, you’ll be using something that looks like one of these:

Time to live: you ask Redis to automatically expire an item after a set time period, meaning your data layer will have to go to PostgreSQL to refresh the item.
Cache on write: for data with a high read rate and that changes infrequently, you can simply update the cache each time you write to PostgreSQL; this becomes expensive as write frequency increases.
Least recently used: this eviction method focuses on freeing resources, rather than keeping data fresh, by deleting from the cache data that isn’t getting read.

Those three are a simplification but offer a glimpse of how to think about keeping your cache fresh.

Taking the first steps

Getting started with Redis is pretty straightforward. You can tinker on your laptop and there are packages for Rails, Laravel, Django, and most major web frameworks to help you get going. Once you’re ready for production, a hosted solution like Heroku Redis can provide a low-ops way to put Redis into production without the overhead of maintaining the server.

Once you’ve experimented with caching, you’ll find that there’s so much more to Redis. Whether it’s Redis’s support for pub/sub messaging, advanced querying with Lua scripting, or creating a cluster for high availability, Redis can take a first-class position in your app’s data layer.