DEV Community

Mat
Mat

Posted on

How to manage a huge database schema with Liquibase

Over the last couple of weeks I've been working on introducing Liquibase to an existing project, so that our database schema is defined in code, and we can create copies of the database for development and testing.

This is something that often comes out of the box with a lot of web frameworks like Rails, Django, Dropwizard etc. but this is an older project where the database has grown over time, and changes have traditionally been managed by DBAs (database administrators).

The basic idea behind database migrations is that you should never change your database schema directly in the database; instead you write scripts to change it, and the tool applies that to the database in a controlled way so that databases are always in a known state, and changes are checked into git and reviewed just like regular code.

What this allows us to do

I want to get to the point where anyone can checkout the app and build their own database, and if anything goes wrong, they can always trash it and start again. At the moment we share a development database in AWS, and this is also where we run our automated tests. Nothing guarantees that this remains production-like.

If we adopt database migrations we could use a tool like CircleCI to automatically build a fresh database and run our tests against every pull request, which would give us some confidence that changes to the database haven't broken important functionality in the application, and vice versa.

Finally, if we use liquibase for production deploys, we can speed up releases and remove sources of human error such as missing out part of a migration or doing something differently on different environments.

How we structured the migrations

We chose liquibase because it's very flexible. We have a lot of stored procedures and liquibase has a runOnChange feature that allows us to automatically CREATE OR REPLACE these when their hash changes. So rather than writing a migration script for every change to these files, you just run liquibase update and it will apply the update.

With liquibase you can write all your migrations as SQL files annotated with comments, so you don't need to learn a special syntax and you can do anything that is possible in SQL.

The first thing we've done has been to export all the objects from our production DB, and get them into a liquibase migration format. We've split things into 3 directories:

  • initial_schema
  • code
  • schema_migrations

where initial_schema is all the tables, constraints and indexes we have right now.

Once we have this matching production we intend to switch to creating small, timestamped change migrations in schema_migrations.

code contains all the objects we use with runOnChange. This allows us to see the history of all our stored procedures using github.

Adding migrations to an existing project is hard

I think when we started this we understimated how much work it would be. On paper it sounded easy: export the schema, convert it to the right format, job done.

In practice, we spent a lot of time on:

  • getting our initial schema to truly reflect production (and checking this)
  • deciding exactly what's in scope (we have multiple users/schemas in our database and not all of them are needed by the app, so we had to figure out where the boundary of the app actually is)
  • working out what order Oracle will let us run things in without throwing errors

Early on, we noticed that Oracle will allow you to create objects which are "invalid" (for example, they reference other objects that don't exist). To protect against this we added a liquibase "precondition" that runs SELECT count(*) FROM all_objects WHERE status = 'INVALID' and halts the migrations if the count is nonzero.

There were also some gnarly bits of the database we deliberately excluded from the intial schema for the time being. These dealt with features like email sending and file uploads (all of which are implemented in the database). We expect that making these work in a sensible way locally would have be a lot of extra work. We can probably get most of the benefits without doing this bit, and we could always add in the missing pieces later on.

Making the app easier to work on

Ultimately, the point of this is to make it easier to work on the app. With a bit more work, we can remove the need for a shared development database entirely. We can write tests of our stored procedures that can be run at any stage of the development process. We can automate a whole deployment. All of this helps to reduce the time we waste on non-valuable tasks just to get something done.

Top comments (0)