Ingvar

Posted on Sep 18, 2020 • Edited on Oct 4, 2020

Database Migration in Production: Tips and Tricks

#database #csharp

Hello guys, today I gonna describe such complicated process as switching database on production server.

When and why is it needed

1) Sometimes currently used database stops to comply with all your requirements. It often happens when it reaches some critical size and starts to work much slower. If no performance improvements are available consider idea about changing your database.
2) Changing infrastructure. If you're using DynamoDb but decided to use Azure instead of AWS it's not an option to continue using your current db. You should switch to another database which could be used on Azure.

Used approach

Following approach was used many times and recommended itself as the most stable and clear one. It allows to migrate your databases w/o downtime what is often required. If long downtime is acceptable you can follow less strict and complicated way.
So approach is the following: add new database support into your code so it would be able to write to both destination databases and add some kind of dynamical flag. Based on this flag your app should decide what database(s) is being used for read and write operations at the moment. Start from using only old database (old mode), later switch to dual write read old mode so code starts to write to all databases. Reading from old database is used because new database is almost empty. After that you can start actual database migration, dual write will prevent possible race conditions w/o downtime. After that you can switch to dual write read new mode. In this mode you still write to both databases but use new one for read operation. You can live on this mode for some time if you want to be sure that all data is migrated correctly. Next mode is called new because it uses only new database for read/write. It's a final state of migration. Cleanup your old database code and job is done. Sounds easy? Let's go through it step by step

Initial implementation

Start implementing code before entering old mode. Your code should be able to read and write data from/to both databases. This example assumes you have something like repository pattern used. If you don't have a repository/unit of work patterns you still do following but it will be harder.

Here is an enum that shows all possible options:

public enum RepositoryMode
{
    Old,
    DualWriteReadOld,
    DualWriteReadNew,
    New
}

You can now create your repository for using new database. Lets say you have repository similar to this simple example:

public interface IRepository<T>
{
    T Get(int id);

    void Insert(int id, T data);
}

After that you can create AbRepository that reads/writes data to/from repository based on flag:

public class AbRepository<T> : IRepository<T>
{
    private readonly Func<RepositoryMode> _getRepoModeFunc;
    private readonly IRepository<T> _oldRepository;
    private readonly IRepository<T> _newRepository;

    public AbRepository(
        Func<RepositoryMode> getRepoModeFunc, // if restart is an option you can inject static configuration, options pattern is also could work here
        IRepository<T> oldRepository,
        IRepository<T> newRepository)
    {
        _getRepoModeFunc = getRepoModeFunc;
        _oldRepository = oldRepository;
        _newRepository = newRepository;
    }

    public T Get(int id)
    {
        var mode = _getRepoModeFunc();

        // we read data from current active database
        return mode == RepositoryMode.Old || mode == RepositoryMode.DualWriteReadOld
            ? _oldRepository.Get(id)
            : _newRepository.Get(id);
    }

    public void Insert(int id, T data)
    {
        var mode = _getRepoModeFunc();

        if (mode != RepositoryMode.New)
        {
            // write data to old db if needed
            _oldRepository.Insert(id, data);
        }

        if (mode != RepositoryMode.Old)
        {
            // write data to new db if needed
            _newRepository.Insert(id, data);
        }
    }
}

This class could be more complex and process writing errors etc.

Old mode

After finishing code deploy it to production in old mode. App should work as previously, only old database is used.

Dual write read old mode

In this mode app starts to write to new database too. It's useful if you can't do a downtime during migration because it prevents possible race conditions. Also you can check your write performance to new database on this step. I recommend to start migration right after switching to this step (of course don't forget to check that write to new database works as expected).

Migration

It's the hardest and the most interesting part of whole process. For migration you should write migrator program. It's mostly console application that reads data from old database and writes it to new. But good implementation is hard and not so simple. Here are things that you should do in your migrator:
1) Read/write balancing. You shouldn't read faster than you can write to new database. Also I don't recommend to read too fast because it could affect production performance. Check your read speed and read items per second(minute) is above some limit pause reading for small amount of time. Same with write operations. In write operations I recommend to write in few (limited) count of threads, it speeds up migration process.
2) Use replica for read from old storage if possible
3) Add retries everywhere. If operation fails it should be executed again after some timeout and recorded to log. It prevents you from loosing any data during migration procedure.
4) Add pausing option. If possible you should be able to stop migrator and continue your migration from saved point later. I achieved this using reading by index. You can save your latest write position (note, write to new database position, not read from old database position!) into JSON file and read it on next start.
5) Split read and write code. Read and write operations should be separated in your code and shouldn't know anything about each other. I did it following way: moved read and write to different classes. Read class writes to BlockingCollection instance and write code reads from it and submits data to new storage.

Migrator high-level design
6) Don't afraid to modify your entities from source db before writing to destination db. You can fix bugs/issues with entities during migration process, skip empty items or use completely different model in your new code. That's fine.
7) Try to reuse part of your code for migrating different entities. Create generic base classes for read/write.
8) Use write with version if possible. If your source database has records versioning option you can read item before write (don't read one by one, do bulk read instead) and try to put them with incremented version. It prevents possible race conditions. If race condition occurred you should process it in your code (retry operation or cancel if dual write mode did job).
9) Monitor databases load during migration for safety.
10) Print some statistics to console/log file. You will be able to see progress of operation in this case.

Dual write read new mode

After migration both databases have up to date data so you can finally start to use new database for read operations. This mode is required because it gives you time to check if migration was successful. It's still an option to rollback to dual write read old mode if something went wrong and do migration again if needed. I recommend to live with this mode in production for a while before switching to next mode.

New mode

In new mode only read/write to new database is used. Old code still exists but no more called. Note that you can't rollback to previous mode now because your old database has outdated data after switching to this mode so please double check that you're ready to switch to this mode.

Cleanup

After all other steps done don't forget to cleanup obsolete code that still uses your old database. It's a common mistake when people forget to do that. Migration process is finished only after this step.

Thanks for reading, hope it was useful for you. In next post for experts I will write about monitoring your services using Prometheus. Stay tuned!

Oldest comments (2)

Jonathan Marbutt • Sep 19 '20

Great job, I am planning a very similar process and this is very close to what I am doing also. Great job!

Jonathan Marbutt • Sep 20 '20

The only difference I am working on going from RDMS to Azure's CosmosDB which is document db. So I am changing my repositories to read the RDMS into a model that will eventually be saved in the document db. This way when I do dual write, they will be writing from a better model. Or at least that is what I am thinking through.