Patryk Woziński
Working with databases at scale

Hey! This time I will cover the topic of unsafe migrations in bigger projects. If you have a relatively small project, there’s nothing wrong with applying schema changes directly. Scale changes everything in working with databases. But let’s start from scratch - feel free to jump to the topic you are interested in.

What is a database transaction?

Let’s assume we are playing a game with boxes. You can buy a package from me, but you can’t see what’s inside, okay? I pack everything into these boxes as a group, and only when a box is ready will you see it in the game. While it’s still being packed, you don’t know the box exists. After that, if the package matches your expectations (the size, the color, etc.), you can buy it. At this moment, I’m passing that box to you, and you can tell me if it’s okay or not. If, for instance, the box contains the same t-shirt you already have, you can return the whole package, but not that single item. Everything or nothing - that’s the rule!

And… well, now let’s compare that shitty analogy to databases. The box is a transaction that may include a lot of different elements (SQL statements like INSERT, DELETE, or UPDATE). Buying the box is an analogy for committing (saving, persisting) a transaction to the database. Transactions are ACID (atomic, consistent, isolated, and durable), so either everything is committed or nothing is. If you don’t agree with some box elements → you can discard the whole package - the same as the database doing a rollback of your operations.
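
In SQL terms, such a box might look like the minimal sketch below (the table and column names are made up for illustration):

BEGIN;

INSERT INTO orders (customer_id, status) VALUES (42, 'new');
UPDATE stock SET quantity = quantity - 1 WHERE product_id = 7;

-- If we like everything in the box, we keep it all:
COMMIT;

-- If anything was wrong, we could discard the whole box instead:
-- ROLLBACK;

Until COMMIT runs, no other connection sees the new order or the changed stock - exactly like the box during packing.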

What is database migration?

Okay, now that you know what a database transaction is, it's time to explain database migrations. When working with applications, we need to introduce changes to our databases. As Heraclitus (a Greek philosopher) said: "change is the only constant in life" - the same is true for our applications. If you're creating any professional software, you should use migrations to avoid manual changes in your databases and to keep a proper history of changes. Most (if not all) frameworks have built-in tools for managing the database structure from code. Thanks to these tools, we can write every structure change as a series of SQL queries or, as in many technologies, use the DSL they provide to describe what's changing. For instance, a migration might look like this PHP Doctrine example:

<?php

declare(strict_types=1);

namespace DoctrineMigrations;

use Doctrine\DBAL\Schema\Schema;
use Doctrine\Migrations\AbstractMigration;

/**
 * Auto-generated Migration: Please modify to your needs!
 */
final class Version20220530204833 extends AbstractMigration
{
    public function getDescription(): string
    {
        return '';
    }

    public function up(Schema $schema): void
    {
        $this->addSql('ALTER TABLE blog.posts ADD identifier CHAR(10)');
    }

    public function down(Schema $schema): void
    {
        $this->addSql('ALTER TABLE blog.posts DROP identifier');
    }
}

Or like this Elixir Ecto example:

defmodule PostRepo.Migrations.AddIdentifierToPostsTable do
  use Ecto.Migration

  def up do
    alter table("posts") do
      add :identifier, :string, size: 10
    end
  end

  def down do
    alter table("posts") do
      remove :identifier
    end
  end
end

The two look quite different, but that’s just the difference between writing direct SQL queries (as in PHP Doctrine) and using the DSL provided by the framework (Elixir’s Ecto). As you can see, both contain up and down sections - the first is used when we’re deploying our changes, and the second is for rolling them back when something goes wrong and we have to revert. These migrations might be auto-generated by the framework or written manually, but the effect is the same - database changes described in code and kept in the same git repository.

What is a deployment unit?

Okay, now that we are on the same page about what transactions and migrations are, we can cover the last topic necessary to understand the problems with migrations at a bigger scale: the deployment unit. When we deploy our changes to the production environment, they are usually packaged together as one Docker image. This image may contain migration code with the database changes you plan to introduce. A deployment is composed of at least one merged pull request - sometimes more, especially if you're working on a much bigger project, or you're not doing continuous delivery but releasing on demand. What is worth adding: the migrations hit production first; only after that are your business code changes available to the users. That has some drawbacks, as you'll see in the next part of this post.

Mistakes

Right now, I’ll cover some possible mistakes when you are working on a bigger project with more significant traffic. I will split the errors into two groups: common mistakes that are a problem no matter which database system you’re using, and PostgreSQL-specific mistakes. So let’s move on and cover all the tricky things.

Common mistakes

  • Adding a column with a default value

    If you are using PostgreSQL below version 11, MariaDB below 10.3, or MySQL below 8.0.12, you should be careful when adding a new column with a default value. The reason is that such an ALTER TABLE operation rewrites the whole table: all changes are executed on a table copy, which is then swapped in for the original. This operation can hurt on bigger tables because we block incoming queries in the meantime. However, the INSTANT algorithm in newer database engine versions will not stop incoming SQL queries, so there it's not that bad. For older versions, see the first SQL sketch after this list for a safer way to do it.

  • Modifying or removing columns in the same deployment unit with your code change

    As you already know from the point about deployment units, migrations are executed before the code from a particular deployment goes live. Because of that, you may encounter downtime if one deployment unit contains both a migration that changes the database structure and the code that uses the new structure: after the migration runs, the old code will still be looking for the old structure, ouch!

  • Changing the type of a column

    This problem is so widespread that it might be the prom queen. Well, you might ask - why? The situation is doubly difficult. First, migrations are deployed before the code (as you already know from previous paragraphs), so all the old code will try to use the old, pre-migration structure. Second... this ALTER forces the entire table to be recreated (copied, changed, and swapped in). That blocks all reads and writes in engines like PostgreSQL, and in MySQL and MariaDB it stops all writes. That's horrible, isn't it? So, for instance, if you have a large database table with millions of records, changing a column type is dangerous in terms of extended downtime. Remember: reliability is the essential feature of every application, because an application that isn't working is just useless. Instead of changing the column type in place, consider adding a new column with the desired type, writing data to both columns, backfilling the new column from old records, and then preventing writes to the old column. After that, you can drop the old column - see the second SQL sketch after this list. It sounds like a journey, but that's no sandbox when you take care of an extensive database table. Do it slower but without any downtime; it's a win!

  • Renaming database columns and tables

    That's similar to the previous one about changing a column's type. Renames may cause a lot of errors in your system, so it's much better to follow the same approach: create a new table/column, write to both places, and iteratively get rid of the old one with the invalid name - the second sketch after this list applies here too.
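
Here is the first sketch mentioned above - one way the safer path for adding a column with a default could look in raw SQL on an older PostgreSQL (the status column is made up for illustration):

-- Risky on PostgreSQL < 11: rewrites the whole table under a lock.
ALTER TABLE blog.posts ADD COLUMN status VARCHAR(20) DEFAULT 'draft';

-- Safer split: add the column without a default (metadata-only)...
ALTER TABLE blog.posts ADD COLUMN status VARCHAR(20);

-- ...then set the default for future rows only (also metadata-only)...
ALTER TABLE blog.posts ALTER COLUMN status SET DEFAULT 'draft';

-- ...and backfill existing rows in small batches:
UPDATE blog.posts SET status = 'draft'
WHERE status IS NULL AND id BETWEEN 1 AND 10000;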
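
And the second sketch - one possible shape of the add-write-backfill-drop approach for a type change (the price columns are made up for illustration; each step should live in its own deployment unit):

-- Deployment 1: add a column with the desired type
ALTER TABLE blog.posts ADD COLUMN price_cents BIGINT;

-- Deployment 2: the application writes to both price and price_cents

-- Background job: backfill old rows in small batches
UPDATE blog.posts SET price_cents = price * 100
WHERE price_cents IS NULL AND id BETWEEN 1 AND 10000;

-- Deployment 3: the application reads price_cents and stops writing price

-- Deployment 4: drop the old column
ALTER TABLE blog.posts DROP COLUMN price;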

PostgreSQL mistakes

  • Not using concurrently built indexes.

    PostgreSQL has the functionality to build indexes concurrently. Remember - by default, migrations are transactional, and CREATE INDEX CONCURRENTLY cannot run inside a transaction, so when adding an index to an existing table it's better to disable the transaction for that particular migration and build the index concurrently. Thanks to that, PostgreSQL will not lock mutations (INSERT, UPDATE, DELETE) the way a plain CREATE INDEX operation does. However, it's more expensive, because the engine needs to execute two table scans. More info is in the PostgreSQL documentation, and the first sketch after this list shows the difference.

  • Setting NOT NULL on an existing column directly.

    If you are working on a bigger database with higher throughput, adding NOT NULL to an existing column directly might not be the best option in PostgreSQL. Why? Because this operation blocks reads and writes to your table while every row is checked to see whether NOT NULL can be applied. Instead, it's better to add a CHECK constraint that ensures you're ready to set NOT NULL. This constraint shouldn't be validated right away; instead, you validate it in a subsequent migration (ideally a separate deployment unit) and only then apply NOT NULL the classic way. This is a much safer way - see the second sketch after this list.
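
The first sketch: the raw SQL behind a concurrent index build (the index name is made up for illustration; remember that CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so the migration tool must skip its usual transaction wrapper):

-- Blocks INSERT/UPDATE/DELETE for the whole build:
CREATE INDEX idx_posts_identifier ON blog.posts (identifier);

-- Does not block writes, at the cost of two table scans;
-- must run outside a transaction block:
CREATE INDEX CONCURRENTLY idx_posts_identifier ON blog.posts (identifier);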
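
And the second sketch - a possible shape of the CHECK-constraint approach, spread over separate migrations (the constraint name is made up for illustration):

-- Migration 1: add the constraint without checking existing rows
ALTER TABLE blog.posts
    ADD CONSTRAINT posts_identifier_not_null
    CHECK (identifier IS NOT NULL) NOT VALID;

-- Migration 2 (next deployment): validate existing rows;
-- this takes a lighter lock and doesn't block reads or writes
ALTER TABLE blog.posts VALIDATE CONSTRAINT posts_identifier_not_null;

-- Migration 3: on PostgreSQL 12+, SET NOT NULL skips the full table scan
-- because the validated CHECK constraint already proves it
ALTER TABLE blog.posts ALTER COLUMN identifier SET NOT NULL;
ALTER TABLE blog.posts DROP CONSTRAINT posts_identifier_not_null;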

It all matters when it comes to scale

Yep, all these rules and tips apply when you’re struggling with a bigger scale. For small applications with small databases and not much traffic, it doesn’t matter whether you go by the book or take a shortcut. But when you are doing everything possible to keep the application working and you have a massive load on your system, you should keep these mistakes in mind and try to avoid such situations. Of course, you’ll spend more time making database-related changes, but thanks to that, you’ll save money and your clients’ trust. That’s important - keep your system reliable, and you’ll sleep better. I’m not Matthew Walker (the author of the phenomenal book “Why We Sleep”), so I won’t lecture you about the nature of sleep; just trust me - it’s better to make changes incrementally when we’re talking about databases.
