Aleksei Ermilov

Migrating a geodata microservice from MS SQL to PostgreSQL

Problem Setting

It is no secret that moving IT systems to technologies that do not require expensive licensing is becoming an increasingly relevant topic.

After analyzing the available options, we decided to move our projects to openly licensed technologies, in particular .NET 6 and PostgreSQL. This opened the way for us both to optimize application performance and to reduce licensing costs.

In this article, I will discuss the migration of a geographic microservice from MS SQL to PostgreSQL, with a focus on spatial data types.

We will leave aside the cost of licenses and a direct MS SQL vs PostgreSQL comparison, because this topic is covered very well in the DotNext talk (in Russian, with auto-generated subtitles) by my colleague Stanislav Flusov. I recommend watching it!

Ways of Migration

Since data types and built-in functions in MS SQL and PostgreSQL often have their own peculiarities, you cannot simply take a backup or script from one DBMS and restore it on the other.

The task was further complicated by the need to transfer 150 GiB of spatial data (the GEOGRAPHY and GEOMETRY types from MS SQL) into PostgreSQL.

There are already ready-made solutions on the market that allow you to transfer the schema or data from MS SQL to PostgreSQL:

  • Babelfish by AWS
    • It is installed as a plugin for PostgreSQL and makes it compatible with the T-SQL syntax, making the client application think that it works with MS SQL.
    • However, it is positioned only as a way to migrate to PostgreSQL quickly, and it has some limitations in terms of T-SQL compatibility.
    • Babelfish does not yet support the spatial data types GEOMETRY and GEOGRAPHY.
  • AWS Schema Conversion Tool
    • A utility that generates SQL scripts with a database schema for a target DBMS based on the schema of a source DBMS.
    • It migrates data through special data extraction agents.
    • The data migration is designed for the AWS cloud.
  • There are also various utilities for importing and exporting spatial data, for example ogr2ogr.
    • Such a utility focuses solely on spatial data, so we would have to separate the export of spatial data from the export of everything else, which would complicate development.
    • Part of the pipeline would look like this: MS SQL → geodata file → PostgreSQL.
    • A rather thorny path, in my opinion.
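For a sense of what that path looks like, here is a hypothetical ogr2ogr invocation; host names, credentials, and the Settlements layer name are placeholders, not our real configuration:

```shell
# Sketch only: connection strings and the layer name are illustrative.
# GDAL's MSSQLSpatial driver reads from MS SQL; the PostgreSQL driver writes to PostGIS.
ogr2ogr -f PostgreSQL \
  PG:"host=pg-host dbname=geography user=geo_user password=secret" \
  "MSSQL:server=mssql-host;database=Geography;uid=geo_user;pwd=secret" \
  Settlements -overwrite
```

Even in this one-step form, each spatial table has to be transferred separately from the rest of the data.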

After reviewing the various utilities and tools for migrating schemas and data, and weighing the pros and cons, we finally decided to turn to the Entity Framework technology already used in the project and write our own migrator (yes, reinvent the wheel).

About the Geography Project

This microservice was created in 2016. It is based on .NET and uses MS SQL for data storage and the Entity Framework as the ORM.

Some time ago, it was decided to adapt this project to modern realities, including updating the versions of .NET and EF Core and, finally, transferring data storage to PostgreSQL.

Modern versions of .NET offer not only the free MIT license, but also a significant increase in performance (there is a whole series of articles on this topic), along with new development features. PostgreSQL, in turn, offers data versioning (MVCC) out of the box, which effectively removes blocking at the data-manipulation (DML) level (assuming the server is operating normally), and provides free High Availability up to the cluster level, not to mention read-only and logical replicas. In addition, we get full compatibility with the Linux stack and a well-developed community. And after all, who doesn't love new technology? That's the way to go.

It is worth noting that Entity Framework Core is already a fairly mature ORM technology.

The components of the solution at the start of the move to PostgreSQL:

(Diagram: solution components at the start of the migration)

  • Api: the ASP.NET web API.
  • Api.Client
  • Business Logic Services: the business logic layer.
  • EF.Sql: the EF Core database context, database schema and data migrations, repositories.
  • Migration utility: a console utility for schema and data migrations. It is used for CI/CD.
  • Projects with API, services, and integration tests.

Technology stack

  • .NET Core 3.1
  • MS SQL Server 2016
  • Entity Framework Core 5
  • All entities are configured through Data Annotations.
  • ASP.NET Core 3.1
  • Autofac as a DI container

Project Refactoring

To begin with, several layers were extracted from the database context project:

  • EF entities as POCO classes, without any EF references or annotations/attributes that describe constraints, keys, and relationships between entities.
  • Interfaces

    • For the EF context.
    • For configuring the EF context.
    • For reading from and writing to the database; UnitOfWork is part of the company's unified Data Access Layer approach. A fragment of the Fortis.Common.DAL library:
// Copyright © 2022 LLC "Fortis", "Monopoly" Group
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Fortis.Common.DataAccessLayer.Contracts;

/// <summary>
/// Provides a query to get entities from storage
/// Abstracted from specific DBMS and ORM
/// </summary>
public interface IDbReader
{
    /// <summary>
    /// Provides a query to retrieve entities from a repository
    /// </summary>
    IQueryable<TEntity> Read<TEntity>() where TEntity : class, IDbEntity;
}

/// <summary>
/// Interface for creating and modifying entities in storage
/// Abstracted from specific DBMS and ORM
/// </summary>
public interface IDbWriter
{
    /// <summary>Add new entity</summary>
    void Add<TEntity>(TEntity entity) where TEntity : class, IDbEntity;

    /// <summary>Modify existing entity</summary>
    void Update<TEntity>(TEntity entity) where TEntity : class, IDbEntity;

    /// <summary>Delete existing entity</summary>
    void Delete<TEntity>(TEntity entity) where TEntity : class, IDbEntity;
}

/// <summary>UnitOfWork for DbContext</summary>
public interface IDbUnitOfWork
{
    /// <summary>Asynchronously save all the context changes to Db</summary>
    /// <returns>The number of entities written to the database</returns>
    Task<int> SaveChangesAsync(CancellationToken cancellationToken);

    /// <summary>Save the context changes without writing audit records</summary>
    Task<int> SaveChangesWithoutAuditAsync(CancellationToken cancellationToken);

    /// <summary>Create a new transaction</summary>
    IDbTransaction InTransaction();
}

Interfaces implementation:

// Copyright © 2022 LLC "Fortis", "Monopoly" Group
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Fortis.Common.DataAccessLayer.Contracts;
using Fortis.GeographyEF.Abstract;
using Microsoft.EntityFrameworkCore;

namespace Fortis.GeographyEF;

partial class GeographyDbContext :
    IMsSqlGeographyDbReader,
    IMsSqlGeographyDbWriter,
    IDbUnitOfWork
{
    void IDbWriter.Add<TEntity>(TEntity entity)
        => Entry(entity).State = EntityState.Added;

    void IDbWriter.Update<TEntity>(TEntity entity)
        => base.Entry(entity).State = EntityState.Modified;

    void IDbWriter.Delete<TEntity>(TEntity entity)
        => base.Entry(entity).State = EntityState.Deleted;

    IQueryable<TEntity> IDbReader.Read<TEntity>()
        => base.Set<TEntity>()
            .AsNoTracking()
            .AsQueryable();

    Task<int> IDbUnitOfWork.SaveChangesWithoutAuditAsync(CancellationToken cancellationToken)
        => SaveChangesAsync(cancellationToken);

    Task<int> IDbUnitOfWork.SaveChangesAsync(CancellationToken cancellationToken)
        => SaveChangesWithDetachAsync(cancellationToken);

    IDbTransaction IDbUnitOfWork.InTransaction()
    {
        var internalTransaction = Database.BeginTransaction();
        return new GeographyDbTransaction(internalTransaction);
    }

    internal async Task<int> SaveChangesWithDetachAsync(CancellationToken cancellationToken)
    {
        var count = await base.SaveChangesAsync(cancellationToken);
        foreach (var entry in base.ChangeTracker.Entries().ToArray())
        {
            entry.State = EntityState.Detached;
        }

        return count;
    }
}
      
    • The DbContext for MS SQL itself, which implements the interfaces described above, its migrations, and the Autofac module that registers the context in the DI container and plugs it into the application's web API. The MS SQL context configuration is based on the Fluent API: for each table, a separate configuration class was created that implements the IEntityTypeConfiguration interface.

      Entity configuration example:

    
// Copyright © 2022 LLC "Fortis", "Monopoly" Group
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

namespace Fortis.GeographyEF.Entities.Configurations.MsSql;

public class SettlementConfiguration : IEntityTypeConfiguration<Settlement>
{
    public void Configure(EntityTypeBuilder<Settlement> builder)
    {
        builder.ToTable("Settlements");
        builder.HasKey(p => p.Id);
        builder.Property(p => p.Id).ValueGeneratedOnAdd();

        builder.Property(o => o.Name)
            .IsRequired()
            .HasMaxLength(100)
            .HasColumnType("NVARCHAR(100)");
        builder.Property(o => o.Abbreviation)
            .IsRequired()
            .HasMaxLength(15)
            .HasColumnType("NVARCHAR(15)");

        builder.HasOne(p => p.Region)
            .WithMany(p => p.Settlements)
            .HasForeignKey(p => p.RegionId)
            .OnDelete(DeleteBehavior.Cascade);
    }
}
    
    • Plugging such a configuration into the database context is quite simple. An example of applying the entity configurations:
    
// Copyright © 2022 LLC "Fortis", "Monopoly" Group
using Microsoft.EntityFrameworkCore;

namespace Fortis.GeographyEF;

internal sealed class GeographyDbContext : DbContext, IGeographyDbContext
{
    //...
    protected override void OnModelCreating(ModelBuilder builder)
    {
        base.OnModelCreating(builder);

        builder.ApplyConfigurationsFromAssembly(typeof(SettlementConfiguration).Assembly);
    }
}
    
    • A layer of repositories that depend not on a specific context, but on an abstraction (interface).

As a result, we obtained a cleaner and more extensible arrangement of the application's data layer, which opened the way to abstracting away the Entity Framework implementation for a specific DBMS.

Solution Components After Refactoring

(Diagram: solution components after refactoring)

Adding PostgreSQL

At this stage, two more projects were added to the application, similar to those previously implemented for MS SQL:

  • DbContext for PostgreSQL, schema migration and Autofac module.
    • Here we also created the initial (baseline) migration.
  • PostgreSQL EF context configuration based on Fluent API Configuration.

(Diagram: solution components after adding PostgreSQL)
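One PostgreSQL-specific detail for spatial data: the Npgsql EF Core provider maps PostGIS columns to NetTopologySuite types only when its NetTopologySuite plugin is enabled. A minimal sketch of the context registration (the connection string and the service wiring are illustrative; the real project registers the context through its Autofac module):

```csharp
// Illustrative only: in the real solution the context is registered via an Autofac module.
services.AddDbContext<PgGeographyContext>(options =>
    options.UseNpgsql(
        "Host=pg-host;Database=geography;Username=geo_user",
        npgsql => npgsql.UseNetTopologySuite())); // map PostGIS geometry <-> NTS types
```

This requires the Npgsql.EntityFrameworkCore.PostgreSQL.NetTopologySuite package in addition to the base provider.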
In addition, in the application's Web API, within the Startup class, we added reading a new configuration variable from IConfiguration; depending on its value, the Autofac module of the required database context is registered.

That is, the DBMS to be used is selected at the configuration level of the executable application itself.
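A minimal sketch of such a switch (the DatabaseProvider key and the module names here are my stand-ins, not the project's actual identifiers):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the real Autofac modules of the two contexts.
public interface IDbContextModule { string Provider { get; } }
public sealed class MsSqlGeographyModule : IDbContextModule { public string Provider => "MsSql"; }
public sealed class PgGeographyModule : IDbContextModule { public string Provider => "PostgreSql"; }

public static class DbModuleSelector
{
    // Picks the DI module for the configured DBMS; "DatabaseProvider" is an assumed key.
    public static IDbContextModule Select(IReadOnlyDictionary<string, string> configuration) =>
        configuration["DatabaseProvider"] switch
        {
            "PostgreSql" => new PgGeographyModule(),
            "MsSql" => new MsSqlGeographyModule(),
            var other => throw new InvalidOperationException($"Unknown provider: {other}"),
        };
}
```

In the real application, the selected module is then registered in the Autofac container at startup.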

Of course, additional configuration of PostgreSQL itself was also required: we needed to install the PostGIS extension, which adds support for spatial data types to the DBMS.
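Enabling the extension is a one-time operation per database, assuming the PostGIS packages are already installed on the server; the version check is optional:

```sql
-- Run once in the target database.
CREATE EXTENSION IF NOT EXISTS postgis;

-- Optional sanity check: shows the installed PostGIS version.
SELECT PostGIS_Full_Version();
```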

Data Migration and PostgreSQL Migration Strategy

The Geography service deployment pipeline uses a console migration utility to migrate the database schema. It was decided to extend its configuration with the ability to select the currently active DBMS (MS SQL or PostgreSQL), as well as with a command-line argument that starts the data migration process.

Since an initial migration for PostgreSQL had already been created, it was applied to an empty database and taken as the baseline for future deployments.

To transfer all 150 GiB of data (there were several tables with about 10 million records each, but most of the space was taken up by values in columns with spatial data types), it was decided to iterate over each table in the MS SQL EF context and write its rows into the PostgreSQL context.

To do this, we used a topological sorting algorithm with a depth-first search strategy, so that every database entity is visited exactly once while taking the relationships between entities into account: all dependencies are exported first, followed by the dependent entities.

The topological sort algorithm implementation example:

// Copyright © 2022 LLC "Fortis", "Monopoly" Group
using System;
using System.Collections.Generic;
using System.Reflection;
using NetTopologySuite.Geometries;

namespace Fortis.Geography.Migrate.DbSync;

internal sealed class TopSortService : ITopSortService
{
    private static readonly HashSet<Type> excludedTypes = new HashSet<Type>
    {
        typeof(string), // don't analyze string properties
        typeof(Geometry), // and Geometry types
        typeof(LineString),
    };

    public IEnumerable<Type> Sort(IEnumerable<Type> typesList)
    {
        var graph = new Dictionary<Type, List<Type>>();
        foreach (var vertex in typesList)
        {
            var edges = new List<Type>();
            graph.Add(vertex, edges);
            foreach (var propertyInfo in vertex.GetProperties(BindingFlags.Public | BindingFlags.Instance))
            {
                var edge = propertyInfo.PropertyType;
                if (!edge.IsValueType && // don't analyze primitive types
                    !edge.IsGenericType && // and generic types
                    !excludedTypes.Contains(edge)
                   )
                {
                    edges.Add(edge);
                }
            }
        }

        var result = new List<Type>();
        var visited = new HashSet<Type>();
        foreach (var vertex in graph.Keys)
        {
            Dfs(vertex, graph, visited, result);
        }

        return result;
    }

    private static void Dfs(
        Type vertex,
        Dictionary<Type, List<Type>> graph,
        HashSet<Type> visited,
        List<Type> result)
    {
        if (!visited.Add(vertex))
        {
            return;
        }

        foreach (var edge in graph[vertex])
        {
            Dfs(edge, graph, visited, result);
        }

        result.Add(vertex);
    }
}
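To make the resulting order concrete, here is a small self-contained run of the same idea on two hypothetical entities, Region and Settlement; the sorter is inlined and simplified relative to the service code above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical POCO entities: a Settlement references its Region.
public class Region { public int Id { get; set; } }
public class Settlement { public int Id { get; set; } public Region Region { get; set; } }

public static class TopSortDemo
{
    // Simplified version of the article's algorithm: dependencies come first in the result.
    public static List<Type> Sort(IEnumerable<Type> types)
    {
        var graph = types.ToDictionary(
            t => t,
            t => t.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                  .Select(p => p.PropertyType)
                  .Where(p => !p.IsValueType && p != typeof(string))
                  .ToList());

        var result = new List<Type>();
        var visited = new HashSet<Type>();
        foreach (var vertex in graph.Keys)
        {
            Dfs(vertex, graph, visited, result);
        }

        return result;
    }

    private static void Dfs(Type vertex, Dictionary<Type, List<Type>> graph, HashSet<Type> visited, List<Type> result)
    {
        if (!visited.Add(vertex)) return;
        if (graph.TryGetValue(vertex, out var edges)) // skip types outside the entity set
        {
            foreach (var edge in edges) Dfs(edge, graph, visited, result);
        }
        result.Add(vertex);
    }
}
```

The sort returns Region before Settlement, so parent rows are inserted before the child rows that reference them.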

Next, the migrator code itself was written.

Migrator implementation example:

// Copyright © 2022 LLC "Fortis", "Monopoly" Group
using System;
using System.Linq;
using System.Reflection;
using System.Threading;
using System.Threading.Tasks;
using Fortis.Common.DataAccessLayer.Entity.Contracts;
using Fortis.Common.Logging.Contracts;
using Fortis.GeographyEF.Abstract;
using Fortis.GeographyEF.Common;
using Fortis.GeographyEF.PostgreSql.Abstract;
using Microsoft.EntityFrameworkCore;

namespace Fortis.Geography.Migrate.DbSync;

internal sealed class DbSynchronizationService : IDbSynchronizationService
{
    private static readonly MethodInfo synchronizeEntityMethodInfo = typeof(DbSynchronizationService).GetMethod(nameof(SynchronizeEntities));

    private readonly ITopSortService topSortService;
    private readonly IMsSqlGeographyDbReader msReader;
    private readonly IGeographyEntitiesResolver entitiesResolver;
    private readonly IPgGeographyDbWriter pgWriter;
    private readonly IPgGeographyDbUnitOfWork pgUnitOfWork;
    private readonly IFortisLogger logger;

    public DbSynchronizationService(
        ITopSortService topSortService,
        IMsSqlGeographyDbReader msReader,
        IGeographyEntitiesResolver entitiesResolver,
        IPgGeographyDbWriter pgWriter,
        IPgGeographyDbUnitOfWork pgUnitOfWork,
        IFortisLogManager logManager)
    {
        logger = logManager.GetLogger();

        this.topSortService = topSortService;
        this.msReader = msReader;
        this.entitiesResolver = entitiesResolver;
        this.pgWriter = pgWriter;
        this.pgUnitOfWork = pgUnitOfWork;
    }

    async Task IDbSynchronizationService.Synchronize(CancellationToken cancellationToken)
    {
        var dbTypes = entitiesResolver.GetClrTypes();
        var sortedDbTypes = topSortService.Sort(dbTypes);
        foreach (var dbType in sortedDbTypes)
        {
            await (Task)synchronizeEntityMethodInfo
                .MakeGenericMethod(dbType)
                .Invoke(this, new object[] {cancellationToken});
        }
    }

    // don't change modifier, because we use reflection
    public async Task SynchronizeEntities<T>(CancellationToken token)
        where T : class, IDbEntity, ICreatedAtEntity, IEntityWithId
    {
        const int batchSize = 4024;

        // a stable ordering is required for Skip/Take paging to be deterministic
        var baseMsQuery = msReader.Read<T>().OrderBy(e => e.Id);

        var recordsTotal = await baseMsQuery.CountAsync(token);
        var recordsProcessed = 0;

        logger.Info($"Synchronizing {typeof(T).Name}. Records total: {recordsTotal}");

        try
        {
            while (recordsProcessed < recordsTotal)
            {
                var srcList = await baseMsQuery
                    .Skip(recordsProcessed)
                    .Take(batchSize)
                    .ToListAsync(token);

                foreach (var entity in srcList)
                {
                    pgWriter.Add(entity);
                }

                await pgUnitOfWork.SaveChangesAsync(token);

                recordsProcessed += srcList.Count;
            }
        }
        catch (Exception e)
        {
            logger.Error($"{typeof(T).Name} synchronization failed. Records processed: {recordsProcessed}.", e);
            throw;
        }

        logger.Info($"{typeof(T).Name} has been synchronized.");
    }
}

It is not the database contexts themselves that are injected into the migrator class, but abstractions over them (for read, write, and UnitOfWork operations). It is also worth clarifying the purpose of IGeographyEntitiesResolver: this interface provides the list of CLR types associated with a context.

IGeographyEntitiesResolver interface and its implementation:

// Copyright © 2022 LLC "Fortis", "Monopoly" Group
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public interface IGeographyEntitiesResolver
{
    /// <summary>
    /// Return all the CLR types associated with the database context
    /// </summary>
    IReadOnlyCollection<Type> GetClrTypes();
}

partial class PgGeographyContext : IGeographyEntitiesResolver
{
    private static readonly Assembly entitiesAssembly = typeof(Location).Assembly;

    IReadOnlyCollection<Type> IGeographyEntitiesResolver.GetClrTypes() =>
        Model.GetEntityTypes()
            .Select(o => o.ClrType)
            .Where(o => o.Assembly == entitiesAssembly)
            .ToList();
}

I would like to note right away that this implementation of the migrator is not a panacea and may differ depending on the characteristics of your databases. The migrator in this example was launched on a static, separate copy of the production environment database.

It goes without saying that by the time the first data migration to PostgreSQL finished, the production database was guaranteed to differ in its number of rows, so a mechanism for compensating for the discrepancies had to be implemented.

This task was facilitated by the fact that the business logic of the Geography service involves only adding new data about cities, points of interest and routes. Data modification is only possible through individual EF data migrations.

Thus, it was enough to upload the several thousand missing rows at the moment of the release that finalized the transition to PostgreSQL. To support this, the migrator code was changed to determine the last uploaded row for each table and transfer only the newer rows. During the deployment, the migrator was run and the data was brought up to date in a short time.
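The "only new rows" idea can be sketched in isolation like this (the Row shape and names are hypothetical stand-ins for ICreatedAtEntity/IEntityWithId; the real code compares against the target EF context rather than in-memory lists):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape mirroring ICreatedAtEntity/IEntityWithId.
public record Row(long Id, DateTime CreatedAt);

public static class DeltaSync
{
    // Returns source rows that are newer than anything already in the target.
    public static List<Row> MissingRows(IEnumerable<Row> source, IEnumerable<Row> target)
    {
        // MinValue means "the target is empty: copy everything".
        var lastLoaded = target.Select(r => r.CreatedAt)
                               .DefaultIfEmpty(DateTime.MinValue)
                               .Max();

        return source.Where(r => r.CreatedAt > lastLoaded)
                     .OrderBy(r => r.CreatedAt) // insert in creation order
                     .ToList();
    }
}
```

This works because the service only ever appends new rows; if rows could be updated in place, a high-water mark on CreatedAt would not be enough.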

Of course, alternative strategies for compensating for data discrepancies were also possible:

  • Implementation of simultaneous writing to both DBMSs via repositories.
  • Tracking changes at the trigger level in tables and notifying about them through the Outbox pattern implementation.
  • And so on.

All options have their pros and cons, so the most effective one for our particular situation was chosen.

Results

The Geography service was successfully migrated to the new DBMS without data loss, and the accompanying refactoring improved the structure of the solution. There have been no problems with query performance.

Since the migration approach based on EF and topological sorting has proven its worth, we plan to apply it, with some improvements, to other projects.

I hope that the above will help you in migrating from MS SQL to PostgreSQL. If you have any questions, feel free to comment.
