Ovid

Posted on Apr 5, 2021

Sane Database Change Management

#database

This happens all the time when dealing with new clients:

Me: How do you update the database?
Client: Check the 'sql' directory to see if there's new SQL in there. Run that.
Me: Manually? Are you serious?
Client: Yes. We're used to it.
Me: Um, OK. So how do you back out the change if there's a problem?
Client: There's hardly ever a problem, but we fix it by hand.

And things go downhill from there. I frequently meet clients with insane database migration strategies. The "dump SQL in a directory" for people to apply is an annoyingly common strategy. There's no clear way to roll it back and, you can't declare dependencies. If you're using a database like MySQL or Oracle, if it contains DDL changes, those aren't transaction safe, so they really should be in their own migration, but they're not. I even had one client where they emailed developers to let them know which SQL to apply.

A few clients have an in-house database migration strategy involving numbered migrations. It often looks like this:

...
213-up-add-index-on-address-state.sql
213-down-add-index-on-address-state.sql
214-up-add-customer-notes-table.sql
214-down-add-customer-notes-table.sql
215-up-add-sales-tax-sproc.sql
215-down-add-sales-tax-sproc.sql

That, at least, can allow devs to back out changes (but tricky if your DDL isn't transaction-safe), but it's amazing when you get to migration 215 and you have eight developers, four of whom need to make a database change and they're arguing over who gets number 216. Yes, I've seen this happen more than once.

With a naïve numbering strategy, you can't declare dependencies, you get numbering conflicts, you really can't "tag" a particular migration for deployment, and so on.

Or there are the migration strategies which allow migrations to be written in your favorite programming language. Those are often nice, but can't always leverage the strength of the database, often write very poor SQL, and make it hard for other teams not using the language to write migrations.

There's a better way.

Sqitch

The sqitch (pronounced "skitch", not "skwitch") Web site describes sqitch as:

Sensible database-native change management for framework-free development and dependable deployment.

That is, uh, quite the mouthful. But it's accurate. It has great documentation with tutorials for Postgres, SQLite, MySQL, Firebird, Exasol, Oracle, Snowflake, and Vertica. Out of the box, sqitch offers sane, easy-to-use database change management. Since it's both free and open source, it's also easy to hook into and customize, if needed.

sqitchers / sqitch

Sensible database change management

App/Sqitch version v1.4.2-dev

Release	Coverage	Database

Sqitch is a database change management application. It currently supports:

PostgreSQL 8.4+
YugabyteDB 2.6+
CockroachDB 21+
SQLite 3.8.6+
MySQL 5.1+
MariaDB 10.0+
Oracle 10g+,
Firebird 2.0+
Vertica 7.2+
Exasol 6.0+
Snowflake

What makes it different from your typical migration approaches? A few things:

No opinions

Sqitch is not tied to any framework, ORM, or platform. Rather, it is a standalone change management system with no opinions about your database engine, application framework, or development environment.
Native scripting

Changes are implemented as scripts native to your selected database engine Writing a PostgreSQL application? Write SQL scripts for psql. Writing an Oracle-backed app? Write SQL scripts for SQL*Plus.
Dependency resolution

Database changes may declare dependencies on other changes -- even on changes from other Sqitch projects. This ensures proper order of execution, even when you've committed changes to your VCS out-of-order.
Deployment integrity

…

View on GitHub

Sqitch 101

I won’t cover setting up sqitch. It's pretty easy and the tutorials handle it well. Instead, I'll explain how I get teams up-and-running quickly with it.

However, there is one small change I recommend. By default, in the current directory, sqitch will add deploy, revert, and verify directories. Your SQL will go into those directories. I prefer to minimize the number of top-level directories in my projects, so after I have things set up, I usually run a command similar to this (the following assumes PostgreSQL):

sqitch engine alter pg --top-dir sql

That tells the sqitch program that when you're using the PostgreSQL engine, create sql/deploy/, sql/revert, and sql/verify directories. Thus, you only have one top-level sql directory for managing your sqitch files.

And for the sake of what follows, we'll assume that we have a acme_test target that we run our tests against and that we've set that target to be the default (that will make more sense when you've read the docs).

Also, please note that my usage pattern is not quite the same as what's taught in the tutorials. Instead, it's designed to be an easy workflow for any developer to understand.

After you have sqitch set up and have your initial schema added to sqitch, you can add a change. The basic pattern is "create a branch", "add sql changes", "modify code as needed", "commit" and merge back.

To be more specific, let's say that you want to add a title column to the customers table (note that the name passed to the sqitch add command is arbitrary, but I recommend you pick a naming convention and stick to it).

Create a branch in your source control
Run sqitch add customers/title
Edit sql/deploy/customers/title and sql/revert/customers/title to add your sql
Run sqitch deploy to deploy those changes
Edit your code if needed
Run your tests
Commit your changes
Merge back to your main branch

(Note: for the first and last steps, if you're using git, see my easy git workflow)

When doing this, it's also a good idea to revert your the sqitch change(s) you've added and redeploy. This makes it easier to spot the case where your revert file doesn't properly revert the deploy:

sqitch rebase --onto @HEAD^ -y

Alternatively, if you're not comfortable with the rebase command:

alias bounce='sqitch revert --to @HEAD^ -y && sqitch deploy'

Every change goes through the same pattern. It's almost like working with git, with a steady queue of changes adding up. This is a simpler pattern than what is explained in the sqitch docs, but you don't have to explain reworking or rebasing changes. There are some presentations you might want to watch if you'd like to learn more.

Sqitch was also highlighted on FLOSS Weekly:

So What?

First and foremost, you get to write your database changes in SQL, not in some "DSL" that you're provided with. You can leverage the full power of your database. Often, I find SQL generators produce poor SQL, or simply won't produce the SQL that I need. When was the last time your DSL let you create optimizer hints?

However, if you do need more than just SQL, it's easy enough to write sqitch middleware to intercept the call to the database with your own wrapper:

sqitch engine add pg --client /path/to/my/middleware

I've done this for a client who wanted to write database changes using Percona's excellent (and free) pt-online-schema-change tool.

Also, what happens if you get a conflict with git and the sqitch.plan file? It's easy to have a bad rebase and fix it incorrectly. Internally, sqitch uses checksums to determine the changes you've applied and their order, so a bad rebase won't allow you to accidentally apply the wrong changes.

Conclusion

I've barely scratched the surface of what you can do with sqitch. It's amazing how many other database change management systems get this wrong. When I switch teams over to sqitch, most of their database development pain just goes away. If you're having trouble with database migrations, try sqitch. You won't regret it.

Cover Photo by Tobias Fischer

Top comments (3)

Brian Wisti • Apr 5 '21

I'll keep this handy. Folks have enthusiastically recommended sqitch to me, but their pitch always goes to how it resolves the most convoluted scenarios. Which is great, but doesn't give much of a small-path rationale or starting point. This covers that gap.

jime • Apr 7 '21

I thought I would try sqitch and your setup.
From the sqitch tutorial
$ sqitch init firstsqitch --uri github.com/USERNAME/firstsqitch/ --engine pg

Then your alter
$ sqitch engine alter pg --top-dir sql
Missing Engine "pg"; use "add pg db:pg:" to add it

Thought I try
$ sqitch engine add pg --top-dir sql
Created sql/
Created sql/sqitch.plan
Created sql/deploy/
Created sql/revert/
Created sql/verify/

It added 3 lines to sqitch.conf
$ cat sqitch.conf
[core]
engine = pg
# plan_file = sqitch.plan
# top_dir = .

[engine "pg"]

# target = db:pg:
# registry = sqitch
# client = psql

[engine "pg"]
target = db:pg:
top_dir = sql

Forward I go.

jime • Apr 9 '21

On my second try with project, I put the top_dir declaration on 'sqitch init' command

sqitch init secondsqitch --uri github.com/jimeUWOSH/secondsqitch/ --engine pg --top-dir sql

my repo has notes for the OS and DB user