ScaleGrid for ScaleGrid

Posted on Nov 13, 2019 • Edited on Dec 26, 2019 • Originally published at scalegrid.io

Making it Easier to Manage a Production PostgreSQL Database

#production #database #sql #postgres

The past several years have seen increasing adoption for PostgreSQL. PostgreSQL is an amazing relational database. Feature-wise, it is up there with the best, if not the best. There are many things I love about it - PL/ PG SQL, smart defaults, replication (that actually works out of the box), and an active and vibrant open source community. However, beyond just the features, there are other important aspects of a database that need to be considered. If you are planning to build a large 24/7 operation, the ability to easily operate the database once it is in production becomes a very important factor. In this aspect, PostgreSQL does not hold up very well. In this blog post, we will detail some of these operational challenges with PostgreSQL. There is nothing fundamentally unfixable here, just a question of prioritization. Hopefully we can generate enough interest in the open source community to prioritize these features.

1. No Automatic Client Driver Detection of Master Failover

The PostgreSQL client driver does not automatically detect when there has been a master failover (and a new master has been elected). In order to work around this, administrators have to deploy a proxy layer on the server-side. The popular choices are DNS mapping, virtual IP mapping, PgPool and HAProxy. All of these options can be made to work well, but there is a significant additional learning and administrator effort required. In the case where a proxy is introduced in the data path, there is also a considerable performance impact. This is a standard built-in feature in many of the new NoSQL databases, and PostgreSQL would do great to take a leaf out of their books when it comes to operations.

2. No Built-In Automatic Failover Between Master & Standby

When a PostgreSQL master fails, one of the standby servers needs to be elected to master. This mechanism is not built into PostgreSQL. Typically, third-party software tools like Patroni, Pacemaker, etc. are used to handle this scenario. Why not have this built it into the server? These third-party tools look deceptively simple, but it requires considerable effort, knowledge, and testing on the part of the administrator to get this right. By building this into the database, you are doing an enormous favor to your database administrator.

3. No Zero Downtime Major Upgrade

It is not possible to upgrade your PostgreSQL database from one major version to another without downtime. You essentially have to shut down all of your servers and use pg_upgrade to upgrade your data to the newer version. The downtime is not large since there is no data copy involved, however, there is still downtime. If you are running a 24/7 operation, this might not be an option for you.

With the release of logical replication, we do have an alternative option for online upgrade.

Build a brand new PostgreSQL Master-Standby setup with the new version.
Setup logical replication to replicate from the older version to the newer version.
Once you are ready, change your connection string to point from the older setup to the new setup.

Again, this can be made to work, but the overhead is enormous. Ideally, what is needed here is a way to upgrade in-place in a rolling fashion over a master standby setup. MySQL upgrade allows you to upgrade your slaves in-place to the new version and then trigger a failover.

4. No In-Place VACUUM FULL

Autovacuum/VACUUM is very useful and helps address this issue to an extent. You should regularly examine the bloat on your tables to ensure your autovacuum settings are appropriate and working well for your table. However, autovacuum does not go all the way - it doesn't actually end up merging and deleting the pages. If you have a large number of updates, insert and delete workloads, your pages will end up fragmented, affecting your performance. The only way around this is to run VACUUM FULL that will basically rebuild all the pages to eliminate fragmentation. However, this process can only be done with downtime - your table is down for the duration for the VACUUM FULL. For large datasets, this can end up taking several hours and is not a practical alternative if you want to run a 24/7 operation.

Note: The community has already started work on the zheap storage engine that overcomes this limitation.

If there are other improvements that you think would be useful, feel free to leave a comment on our blog post.