I've finally finished migrating our 9.6 PostgreSQL database from Google Cloud SQL to 14 running on a VM under our control. This may not be its final home, but at least it has been liberated from the tyranny of the managed database.
I have less experience with AWS, and no experience with any other managed database solutions. However, based on my experience of Cloud SQL, I think that self managing a database like PostgreSQL will be my default, with managed databases being the exception rather than the rule.
There are some criticisms I have that are specific to GCP (Google Cloud Platform), and some that apply to any managed solution. Without further ado, here are the things I have disliked like about managed solutions, and Google Cloud SQL in particular.
A major point of having a backup is that it enables you to recover in the case of a disaster. One of the most far reaching/thorough disaster is your database is deleted, perhaps by accident or maliciously.
No problem, you might assume. A managed database solution provides backups, so just restore from backups -- you did enable backups, right?
Well, as it turns out, Cloud SQL deletes backups along with the instance. That is insane behaviour, and the bold red warning in the documentation does not do it justice:
Warning: All data on an instance, including backups, is permanently lost when that instance is deleted. To preserve your data, export it to Cloud Storage before you delete it. The Cloud SQL Admin role includes the permission to delete the instance. To prevent accidental deletion, grant this role only as needed.
And what if your database is actually accidentally deleted? You scramble against the clock and hope you can find it hidden away.
It is possible to do exports as described above. I'd strongly recommend automating such an export if you can, to ensure you have a dump in case of a disaster of this magnitude. However, these are full dumps of the database, and so hard to do regularly enought to reduce the amount of data lost.
This wouldn't be so bad if you could better manage backups...
Backups are limited. These days, pgbackrest is the go-to backup solution for PostgreSQL, and having used it I am very impressed so far. It provides full backups, differential, and incremental, as well as archiving of WAL segments for point in time recovery. It allows great flexibility in schedules and destinations for backups, how long to keep backups for, how many full backups. For example, you can have backups made to a local disk, and other backups to an external S3-compatible bucket, each with their own settings and schedules (e.g., scheduled via cron).
In Cloud SQL, you can enable point in time recovery, you can run ad-hoc backups, and you can schedule them with limited scheduling options. But if you ever delete your instance, those are all gone. Your scheduled backups and point in time recovery options are, it seems, linked to that instance that are promised to be wiped if you delete it.
If you want to have some external backups, well, as I described above, you're looking at a full dump each time
Want to run backups to an external location, like your own server or backups? Then you're back to the export option, where it does a full dump of the database each time. It's not great, but it's better than having all your eggs in a basket that can get deleted all at once.
There might be options around replicating to a database under your control, so that you can run better backups on that replica. I had troubles, which I describe in part below. Perhaps the story is better with more recent versions of Cloud SQL, but I won't be testing -- logical replication wasn't built into PostgreSQL until version 10, so I was relying on Google's support for pglogical.
If you could pull off a logical replica to a database you manage yourself, then you could arrange solid backups from that replica. However, if you get to the point where you're using replicas in order to better manage backups, then it might be worth considering where the value of a managed solution is for you.
We've been stuck on 9.6 for a while, watching major releases tick by. Our database wasn't particularly large, maybe 200GB at peak, which is by no means a large database compared to the scale that some operate at. However, even at that size, it's still problematic to do upgrades.
The problem is that the managed database offers no way to perform upgrades. The officially recommended method:
To upgrade to a later version, consider using the Database Migration Service or exporting and then importing your database to a new Cloud SQL instance.
The Database Migration Service worked, with some effort, when I tried to use it. It used logical rather than physical replication, allowing us to have the replica on a newer version -- specifically, the pglogical extension because we were on 9.6. The replica was set up and streaming changes, and running on the latest version, working well enough. However, I couldn't cut over immediately, and waiting a few days seemed to be a foolish decision. The sync stopped working, and I could not find any error messages or information to debug why. Re-creating the migration from scratch did not work.
Contacting support was of no help. When attempting to purchase support, the support button wouldn't work. I then attempted to contact support to get support in helping me to purchase support, whereupon I was unhelpfully directed back to the page that wasn't working.
That put us back to option 2: export then import database. The problem with this is that it involves significant downtime. The time to export the database, and then import, took hours.
Needless to say, we've been through that process, after significantly cutting down the database in size, to reduce the downtime. It was the perfect opportunity to move off Cloud SQL permanently, so that we won't be at the mercy of inscrutable cloud tools and restricted backup options.
Now we move to some of the other reasons why I lean away from managed databases overall. The first (but not most important) on the list is extensions. As part of my recent efforts to master my craft, including databases, I've become interested in the possibilities of extending the database. That includes using third party extensions, as well as developing my own. This is a theoretical criticism for now, as I have not yet had a need for this, but I can definitely foresee it as a possibility.
Managed databases do support a collection of extensions out of the box, but you cannot use any that are not on their list, including your own.
Since the database is not controlled by you, you have limited options for tuning its settings. By managing the database yourself, you have the option to change whichever settings you wish to match your operational needs.
If you are fortunate enough that your cloud provider makes upgrades easier, you are still limited to upgrading to only versions that they support. When you manage the database yourself, upgrades are in your control, and you can jump to a new version early if you need.
With Cloud SQL, you do not have access to the super user, and so your ability to manage the database is limited. With a self-managed database, you have full control.
Of course, by running a self managed database, there are absolutely downsides. If you want a new database, then spinning it up with a manged solution is very quick and easy to do. Managing it yourself, you'll have to do things like:
- Set up the VM and install the software
- Run OS updates
- Set up and test backups
- Set up replication yourself
- Clones for testing may be more difficult to make
- PostgreSQL out of the box settings are not great, and you'll need to tune these yourself
There are absolutely downsides to self management, but for projects of a certain significance, I believe the benefits become worth the effort. If for no other reason, when considering Cloud SQL, better control over backups, and easier upgrades, are hard to pass up.