Matthew Revell for Heroku

Posted on Feb 3, 2020

Preparing your Postgres data for scale-out

#postgres #scaling #partitioning #data

App success is a double-edged sword. You’re getting the uptake you wanted but now you have new problems.

Those hacky bits of code you thought you could get away with suddenly become bottlenecks. But that’s nothing compared to what’s happening in your Postgres instance.

There comes a point, though, when you’ve optimized everything and yet you just can’t squeeze any more performance out of your database layer. That’s when it’s time to look at how to scale Postgres.

What type of scale?

There are two ways to scale a database:

Scale-up: you have one database instance but give it more memory, CPU, disk
Scale-out: you add more database instances.

Scaling up –– or vertical scaling –– is relatively easy. Add more CPU and, broadly speaking, Postgres can handle more concurrent connections. Add RAM and more queries will run in memory rather than paging out to disk. Add more disk and you have more capacity both for the data itself and indexes to speed-up querying.

That approach can only go so far. Not only are there limits to how large the database server can grow but you have an increasingly unwieldy single point of failure.

Scaling out –– otherwise known as horizontal scaling –– adds more database nodes that each take part of the workload. This takes more work than scaling up but means that you’re no longer piling more and more work onto a single database instance.

	Scale-up	Scale-out
Ease	Can be very easy, especially if using a hosted Postgres service	Takes up-front planning and ongoing maintenance
Effectiveness	Delivers a quick performance boost but with diminishing returns	Offers a long-term solution to scaling issues
Unique benefits	The entire database scales for both reads and writes	Parts, and maybe all, of your database can stay online even if an individual node goes offline
Gotchas	The database is a single point of failure	It’s easier to scale out for reads than writes The data model might need to change

Read-only or read-write

If you choose to scale-out, then there’s another decision you need to make straight away: are you scaling reads or both reads and writes?

A standard Postgres database can have just one primary node. That’s the node that accepts writes. However, it can have many secondary read-only nodes. If your app’s data pattern is read-heavy, that’s often an relatively easy model for scaling-out. Let’s say your app provides train times. Most database activity will be time table look-ups. In that case, secondary read-only nodes would be an ideal way to scale capacity.

If your app is write heavy, then a single writable node quickly becomes a bottleneck. In that case, you need to think about splitting the write-heavy parts of your database into separate partitions or “shards”.

Sharding data

The idea behind sharding data is pretty simple: you effectively split your database into multiple separate databases. And with multiple databases come multiple writable primary nodes.

Sharding isn’t unique to scale-out; it’s common to shard data even in scale-up scenarios, as splitting data can help reduce index growth and reduce write locking on a single instance. However, for most Postgres users, sharding is the only way to achieve a writeable scale-out database.

Either way, splitting your database into different partitions impacts how you think about your data. The good news is that there are common sharding patterns you can choose from. Which you select depends on the shape of your data and how you need to query it.

Four common sharding patterns

When you come to shard your data, you should start by looking at whether a commonly used sharding scheme suits your app. That way, you’ll find it easier to get help from peers and online.

The most common sharding schemes are:

Range
Vertical
Key-based
Directory-based.

Let’s look at each in turn.

Range-based partitioning

Let’s say you want to scale -out a database of customers. One simple way to shard the database would be by customer name. People whose name begins with A-F might be in partition number 1, G-L in partition 2, M-R in partition 3, and S-Z in the fourth partition.

This is easy to think about. You can take a look at your data and quickly come up with the ranges that make sense. However, it’s easy to get into difficulties. Naive ranges –– such as alphabetizing surnames –– can easily lead to one shard being much busier than others. For example, according to the US census, there were 2.37 million Smiths in the United States in the year 2000, 1.85 million Joneses, and 1.5 million people with the last name Williams. In our simple example above, that fourth partition would be home to two of the top three most popular names.

Looking more closely at the data we’re working with can help us to make nuanced decisions in setting up a range-based partitioning scheme. We might decide that S gets its own partition, for example, in order to handle all those Smiths.

However, the risk with range-based partitioning is that the scheme isn’t necessarily drawn from the shape of the data but instead from something that makes sense to us as humans.

Vertical partitioning

Take the example of an ecommerce site. There are natural distinctions between different types of data that only rarely need to come together. For example, product descriptions and warehouse employee data are distinct.

Vertical partitioning is where you take the natural divisions in your data as the lead for your sharding scheme. So, in your ecommerce example, we might have a shard for product information, one for shipping, another for competitor pricing, and so on.

As this scheme is based on the data itself, then sizing each shard is more obvious. Employee timesheets might need a far smaller node than, say, customer account data.

The main disadvantage, though, is that you will eventually get to a point where you’ve partitioned the data as much as makes sense. Let’s say your ecommerce site were particularly successful. You might have a shard dedicated only to customer records. How then should you further partition that data if you need to scale more?

Hash-based partitioning

The previous two schemes were based on things that make sense to humans. Hash based partitioning is effectively random. Rather than partitioning on some aspect of the data itself, you create a hash from the key of what you’re storing and that determines which partition to use.

In our ecommerce example, the key for a customer record might be their email address. Let’s say we have five servers. We perform a hash function on the email address that gives us the number 137692 and we find the partition where it should live by performing a modulo operation using the number of servers as the divisor. In this case, the modulo gives us 2, so that customer’s record should live in partition 2.

In effect, hash-based partitioning is random and so it’s less likely for hotspots to appear. Smiths, for example, would find themselves distributed across the cluster. Hash-based sharding can also be a solution to the problem of how to further scale a vertically sharded database.

The downside is that adding more servers to the cluster would require recalculating where each record should live, because it will change the divisor in the modulo operation.

Directory-based partitioning

This is the most complex scheme of the four as it requires an intermediary layer between your app code and the database. With directory-based sharding, that intermediary layer is a look-up service that implements another sharding scheme.

The advantage here is that you can change the underlying partitioning scheme without having to change your application code. It can also act as a bridge between a previous scheme and its replacement. In our hash-based example above, a look-up service could translate hashes created for a five node cluster into hashes that work in a newly enlarged ten server cluster.

One obvious disadvantage is that now you’re running not only a partitioned database cluster but also another service. That increases maintenance and adds a step on the path between your app and the data you need.

Sharding helps but it’s not all good

Sharding offers a way to scale-out a write-heavy Postgres database but there are disadvantages. Joins across shards become expensive and might not even be possible at all. You might even find you need to compromise one of the key advantages of a relational database and maintain multiple copies of the same data in different places.

There are other options. Redis can help offset the burden on your Postgres instance, for both reads and writes. And once you’ve taken that first step outside of Postgres, you might find that some aspects of your data are better suited to other tools, such as Elastic or Kafka.

Either way, scaling beyond a single Postgres instance takes preparation and a thorough understanding of how your app accesses your data.

Jukebox letters photo by Diomari Madula
Candy photo by vaun0815
Numbers photo by Nick Hillier
Drawers photo by Kolar.io

Top comments (2)

Masked Man • Jul 7 '21

You mentioned Redis in the final section, but how come? Should I queue every write operation and feed it to the database in a constant speed?

nexovec • Dec 28 '23 • Edited

I suspect that referred mainly to caching and batching read queries, as opposed to writes, because with writes you'd lose ACID - some of the operations might fail. With reads, your cache may be outdated, but it won't break your data