Marek Kregiel

Migration Experience: 60,000 Users from MongoDB to SQL

Moving data from one database to another can be a daunting task, especially when you need to do it with minimal disruption and maximum safety. This is how I tackled a migration of 60,000 users and their data from MongoDB to a SQL database, handling approximately 13 columns across two related tables (a one-to-many relationship).

My primary goals were to ensure the migration was as easy, safe, and quick as possible. A key requirement was to keep data flowing on a secure network and execute the process within our existing application environment.

The Strategy: State Management and Controlled Execution

To manage the migration, I built a simple, in-memory state management system. This system gave me real-time visibility and control over the entire process.

The Migration State

I defined a MigrationState interface to track the key metrics of each run:

interface MigrationState {
  concurrency: number;
  stopped: boolean;
  startedAt: Date;
  count: number;
  resolvedCount: number;
  successCount: number;
  errorCount: number;
}

I used a global object, migrationsState, to store the state for each migration run:

export const migrationsState: { [key: string]: MigrationState } = {};

Core Helper Functions

A set of helper functions allowed me to manage the migration's lifecycle and monitor progress. I created functions to do the following (simple as that):

Start a new migration run.

export const startMigration = (migrationId: string, concurrency: number, count: number): void => {
    migrationsState[migrationId] = {
        concurrency,
        stopped: false,
        startedAt: new Date(),
        count,
        resolvedCount: 0,
        successCount: 0,
        errorCount: 0
    }
}

Stop a running migration.

export const stopMigration = (migrationId: string): void => {
    migrationsState[migrationId].stopped = true
}

Change the concurrency level.

export const changeConcurrency = (migrationId: string, concurrency: number): void => {
    migrationsState[migrationId].concurrency = concurrency
}

Get the current state.

export const getMigration = (migrationId: string): MigrationState => {
    return migrationsState[migrationId]
}

These functions were triggered via API calls, giving me an external, controllable interface to the migration process.
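
As a rough sketch, the endpoints looked something like this (the route names and request shapes here are illustrative, not the exact API, and the module path is assumed):

import express from "express";
// Hypothetical module path for the helpers shown above
import { startMigration, stopMigration, changeConcurrency, getMigration } from "./migrationState";

const app = express();
app.use(express.json()); // parse JSON request bodies

// Start a run with an initial concurrency and the total user count
app.post("/migrations/:id/start", (req, res) => {
  const { concurrency, count } = req.body;
  startMigration(req.params.id, concurrency, count);
  res.status(202).json(getMigration(req.params.id));
});

// Flip the stopped flag; the migration loop checks it between users
app.post("/migrations/:id/stop", (req, res) => {
  stopMigration(req.params.id);
  res.sendStatus(204);
});

// Adjust the concurrency level while the run is in progress
app.patch("/migrations/:id/concurrency", (req, res) => {
  changeConcurrency(req.params.id, req.body.concurrency);
  res.sendStatus(204);
});

// Poll the live counters to monitor progress
app.get("/migrations/:id", (req, res) => {
  res.json(getMigration(req.params.id));
});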

The Execution Logic

The core of the migration was an asynchronous function that processed each user's data, updating the in-memory state as it went.

const handleUserAction = async (migrationId: string, user: MongoUserModel): Promise<{ userId: string; }> => {
  try {
    // This function handles the actual data transfer and transformation per user
    await moveMongoUserDataToSql(user); 
    bumpSuccessCount(migrationId);
    return {
      userId: user.userId
    };
  } catch (error) {
    bumpErrorCount(migrationId);
    throw error;
  } finally {
    bumpResolvedCount(migrationId);
  }
};

This simple try...catch...finally block ensured that I always updated my counters, whether the operation succeeded or failed, giving me an accurate picture of the process.
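
The bump* counter helpers aren't shown above; they're just one-line increments on the shared state object, along these lines:

const bumpSuccessCount = (migrationId: string): void => {
    migrationsState[migrationId].successCount++
}

const bumpErrorCount = (migrationId: string): void => {
    migrationsState[migrationId].errorCount++
}

const bumpResolvedCount = (migrationId: string): void => {
    migrationsState[migrationId].resolvedCount++
}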

Real-Time Monitoring and Control

Monitoring was crucial. I tracked both database and server metrics to make informed decisions.

Database (SQL): I kept a close eye on DTU % (Database Transaction Unit) to ensure the database wasn't overloaded and could still handle normal application traffic.

Server: I monitored request response time, CPU usage, memory usage, and server logs. This told me how the migration was impacting my application's performance.

Lessons from Concurrency

By watching these metrics, I could quickly decide when to adjust the concurrency. A higher concurrency level sped up the migration but also increased resource consumption.

For context, these numbers refer to a medium-complex query in our system:

  • Normal response time: 50ms - 100ms
  • Concurrency 5: 200ms - 500ms
  • Concurrency 10: 200ms - 2s

Pushing concurrency too high risked impacting the live application. This highlighted a key takeaway: having real-time metrics and the ability to dynamically adjust concurrency was critical for balancing speed and stability.
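
To make that concrete, here is a simplified sketch of a driver loop that re-reads the state between batches, so stop requests and concurrency changes take effect quickly (runMigration is an illustration, not the exact production loop):

// Simplified sketch: process users in batches sized by the current concurrency
const runMigration = async (migrationId: string, users: MongoUserModel[]): Promise<void> => {
  startMigration(migrationId, 5, users.length);

  let index = 0;
  while (index < users.length) {
    const state = getMigration(migrationId);
    if (state.stopped) break; // honor a stop request between batches

    // Re-read concurrency each batch so API changes take effect immediately
    const batch = users.slice(index, index + state.concurrency);

    // allSettled keeps one failing user from aborting the whole batch;
    // successes and errors are already counted inside handleUserAction
    await Promise.allSettled(batch.map(user => handleUserAction(migrationId, user)));
    index += batch.length;
  }
};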

For future migrations, an additional control to change the interval time between user iterations would provide even more granular control over resource usage. This would allow me to "throttle" the migration more effectively.
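
As a sketch, that could be as simple as a sleep helper plus a new intervalMs field on MigrationState (hypothetical; not part of the implementation above):

const sleep = (ms: number): Promise<void> =>
  new Promise(resolve => setTimeout(resolve, ms));

// In the driver loop, after each batch settles (intervalMs being a new,
// hypothetical field on MigrationState, adjustable via the same API):
// const { intervalMs } = getMigration(migrationId);
// if (intervalMs > 0) await sleep(intervalMs);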

This approach provided me with the flexibility and insight necessary to safely and successfully migrate our users. It proved that a well-defined process, combined with real-time monitoring and dynamic control, can make a complex data migration feel manageable and secure.


Feedback & Experiences

If you’ve used a similar approach before or see ways this could be improved, I’d love to hear your feedback.

  • What worked well (or didn’t) when you tried something like this?
  • Are there pitfalls I might have missed?
  • Any improvements you’d suggest?

Please share your thoughts.
