Data Migrations at Scale
As Woovi grows we have to deal with more and more data in our database collections.
We are constantly evolving our product to support new use cases for our customers.
We also need to evolve our collections to support these changes.
Data migrations let us migrate older data to avoid having to deal with two data formats in our codebase.
Reading data in batch with cursor
The naive way of reading data is like this:
const users = await User.find();
This will read all users from the database to your memory.
This is fast and works well if you have a few users.
However, if you have millions of users, this will be slow and consume a lot of memory.
You can use a cursor to fetch one document at a time, like this:

const cursor = User.find().cursor();
for await (const doc of cursor) {
  await migrateItem(doc);
}
This approach is better, but you do one network request for each read. If you have 1 million users, you are going to do 1 million requests to the database.
You can improve this using the batchSize option in the cursor:

const batchSize = 10000;
const cursor = User.find().cursor({ batchSize });
const batched = batchCursor(cursor, batchSize);
while (true) {
  const { value, done } = await batched.next();
  for (const doc of value) {
    await migrateItem(doc);
  }
  if (done) {
    break;
  }
}
batchCursor helper definition:

export async function* batchCursor(cursor, n) {
  while (true) {
    const ret = [];
    let i = 0;
    while (i < n) {
      const val = await cursor.next();
      if (val) {
        ret.push(val);
      } else {
        // Cursor exhausted: return the last (possibly partial) batch.
        return ret;
      }
      i++;
    }
    yield ret;
  }
}
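As a sanity check, the helper can be exercised without a database by feeding it an in-memory stand-in for a cursor. This is a minimal sketch: makeFakeCursor is a hypothetical helper that mimics a cursor's next() method, not part of Mongoose.

```javascript
// Same batchCursor helper as above, repeated here so the sketch is self-contained.
async function* batchCursor(cursor, n) {
  while (true) {
    const ret = [];
    let i = 0;
    while (i < n) {
      const val = await cursor.next();
      if (val) {
        ret.push(val);
      } else {
        // Cursor exhausted: return the last (possibly partial) batch.
        return ret;
      }
      i++;
    }
    yield ret;
  }
}

// Hypothetical stand-in for a MongoDB cursor: yields `total` documents,
// then null once exhausted, just like cursor.next() does.
function makeFakeCursor(total) {
  let id = 0;
  return {
    next: async () => (id < total ? { _id: ++id } : null),
  };
}

async function main() {
  const batched = batchCursor(makeFakeCursor(25), 10);
  const sizes = [];
  while (true) {
    const { value, done } = await batched.next();
    sizes.push(value.length);
    if (done) break;
  }
  console.log(sizes); // batch sizes: 10, 10, 5
}

main();
```

Note how the last batch is smaller than batchSize: the generator returns whatever remains when the cursor runs dry, so the consuming loop must process value before checking done.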
This reduces the number of database requests from 1 million to only 100.
Writing in batch
Even if you reduce the number of database reads, your data migrations will still be slow if you don't reduce the number of database writes.
MongoDB provides a bulkWrite API that enables you to batch writes in a single database request.
const writes = docs.map((doc) => getWriteOperation(doc));
await User.bulkWrite(writes);
Instead of issuing one write per document, each document is mapped to a write operation, and the operations are joined and sent to the database in a single bulk request.
Below is an example of a bulkWrite operation using updateOne. It updates the User whose _id matches user._id, setting the emails field to [user.email].
{
  updateOne: {
    filter: { _id: user._id },
    update: {
      $set: {
        emails: [user.email],
      },
    },
  },
};
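The getWriteOperation helper used earlier can be sketched as a plain function that maps a document to the updateOne shape above. Its exact body is an assumption here (the article only names the helper), and the sample docs array is made up for illustration:

```javascript
// Hypothetical implementation of getWriteOperation: maps a user document
// to an updateOne operation that backfills the new `emails` array field.
function getWriteOperation(user) {
  return {
    updateOne: {
      filter: { _id: user._id },
      update: {
        $set: {
          emails: [user.email],
        },
      },
    },
  };
}

// Building the bulk payload from a batch of documents (sample data):
const docs = [
  { _id: 1, email: "a@woovi.com" },
  { _id: 2, email: "b@woovi.com" },
];
const writes = docs.map((doc) => getWriteOperation(doc));
console.log(writes.length); // 2 operations, sent in one bulkWrite call
```

With this in place, each batch produced by batchCursor turns into a single User.bulkWrite(writes) round trip instead of one update per document.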
To Sum Up
These improvements not only make data migrations run faster but also reduce the load on your database.
These improvements also made our migration code better and more intuitive.
Woovi
Woovi is a startup that enables shoppers to pay as they like. To make this possible, Woovi provides instant payment solutions for merchants to accept payments.
If you want to work with us, we are hiring!