Sibelius Seraphini for Woovi

Speeding up MongoDB migrations with cursors and bulkWrite

Data Migrations at Scale

As Woovi grows we have to deal with more and more data in our database collections.
We are constantly evolving our product to support new use cases for our customers.
We also need to evolve our collections to support these changes.
Data migrations let us migrate older data to avoid having to deal with two data formats in our codebase.

Reading data in batches with a cursor

The naive way of reading data is like this:

const users = await User.find()

This loads every user from the database into memory at once.
It is fast and works well if you only have a few users.
However, if you have millions of users, it will be slow and consume a lot of memory.

You can use a cursor to process one item at a time, like this:

// cursor() streams documents instead of loading them all at once
const cursor = User.find().cursor();

for await (const doc of cursor) {
  await migrateItem(doc);
}

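This approach is better for memory, but documents are still pulled from the cursor one at a time, and the number of network requests depends on how many documents come back per batch. In the worst case, with 1 million users, you are going to do 1 million requests to the database.

You can improve this by setting the batchSize option on the cursor and processing the documents in groups: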

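const batchSize = 10000;

// Ask the database for up to batchSize documents per request
const cursor = User.find().cursor({ batchSize });

// Group the documents coming out of the cursor into arrays of batchSize
const batched = batchCursor(cursor, batchSize);

while (true) {
  const { value, done } = await batched.next();

  // The last (possibly partial) batch arrives together with done: true
  for (const doc of value) {
    await migrateItem(doc);
  }

  if (done) {
    break;
  }
}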

batchCursor helper definition:

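// Groups documents from a cursor into arrays of up to n items.
// Full batches are yielded; the final (possibly partial or empty) batch is
// returned, so it arrives with done: true when calling next() by hand.
export async function* batchCursor(cursor, n) {
  while (true) {
    const batch = [];

    while (batch.length < n) {
      const doc = await cursor.next();

      if (!doc) {
        // Cursor exhausted: hand back whatever was collected
        return batch;
      }

      batch.push(doc);
    }

    yield batch;
  }
}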

With a batch size of 10,000, this cuts 1 million requests to the database down to only 100.
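
To sanity-check the batching arithmetic without a database, here is a minimal sketch that uses an in-memory stand-in for the cursor. The names inBatches, fakeCursor and countBatches are hypothetical and not part of our codebase or of Mongoose. inBatches is a variant of the helper above that yields every batch, including the last partial one, so it can be consumed with a plain for await loop. It assumes the source is async iterable, which Mongoose query cursors are.

```javascript
// Variant of the batching helper (an assumption, not the helper above):
// yields every batch, including the final partial one.
async function* inBatches(source, size) {
  let batch = [];

  for await (const doc of source) {
    batch.push(doc);

    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }

  if (batch.length > 0) {
    yield batch; // final partial batch
  }
}

// Hypothetical stand-in for a database cursor: an async generator of fake docs.
async function* fakeCursor(total) {
  for (let i = 0; i < total; i++) {
    yield { _id: i };
  }
}

// Counts how many batches and documents come out for a given total and size.
async function countBatches(total, size) {
  let batches = 0;
  let docs = 0;

  for await (const batch of inBatches(fakeCursor(total), size)) {
    batches++;
    docs += batch.length;
  }

  return { batches, docs };
}
```

For example, countBatches(25, 10) resolves to { batches: 3, docs: 25 }: two full batches of 10 and a final batch of 5. The same arithmetic gives 100 batches for 1 million documents at a batch size of 10,000.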

Writing in batches

Even if you reduce the number of database reads, your data migrations will still be slow if you don't reduce the number of database writes.

MongoDB provides a bulkWrite API that enables you to batch writes in a single database request.

const writes = docs.map((doc) => getWriteOperation(doc));

await User.bulkWrite(writes);

Instead of writing to the database itself, each item returns a write operation. The operations are collected and sent to the database in bulk.

Below is an example of a bulkWrite operation using updateOne. It updates the User whose _id is user._id, setting the emails field to an array containing user.email.

// Builds the write operation for a single user
const getWriteOperation = (user) => ({
  updateOne: {
    filter: { _id: user._id },
    update: {
      $set: {
        emails: [user.email],
      },
    },
  },
});
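
Putting reads and writes together, a migration can send one bulkWrite per batch of documents. The sketch below is a minimal illustration and not our production code. migrateInBatches and toEmailsArray are hypothetical names, Model is assumed to be anything with a Mongoose-style bulkWrite(operations) method, and batches is assumed to be an iterable or async iterable of document arrays.

```javascript
// Sketch: turn each batch of documents into a single bulkWrite call.
// `Model`, `batches` and `toWriteOperation` are supplied by the caller.
async function migrateInBatches(Model, batches, toWriteOperation) {
  let written = 0;

  for await (const docs of batches) {
    if (docs.length === 0) {
      continue; // skip empty batches rather than sending an empty request
    }

    // One write operation per document, sent together in one request
    const writes = docs.map(toWriteOperation);
    await Model.bulkWrite(writes);

    written += writes.length;
  }

  return written; // number of write operations sent
}

// Example write operation, mirroring the updateOne shape above.
const toEmailsArray = (user) => ({
  updateOne: {
    filter: { _id: user._id },
    update: { $set: { emails: [user.email] } },
  },
});
```

One caveat: for await ignores the value a generator returns, so the source of batches should yield its final partial batch rather than return it. Otherwise the last documents would never be written.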

To Sum Up

These improvements not only make data migrations run faster, they also reduce the workload on your database.

These improvements also made our migration code better and more intuitive.


Woovi
Woovi is a startup that enables shoppers to pay as they like. To make this possible, Woovi provides instant payment solutions that merchants use to accept orders.

If you want to work with us, we are hiring!


Photo by Sigmund on Unsplash
