The primary focus of this series is to demonstrate the significant performance gains you can achieve—and the costs you can save—by using MongoDB properly. This includes following best practices, studying your application's specific needs, and using those insights to model your data effectively.
To illustrate these potential gains, we will present a sample application. We will then develop and load-test various MongoDB implementations for this application. These implementations will cater to different levels of MongoDB expertise: beginner, intermediate, senior, and mind-blowing (🤯).
All code and supplementary information used throughout this series are available in the GitHub repository.
The Application: Finding Fraudulent Behavior in Transactions
The application's goal is to identify fraudulent behavior within a financial transaction system. It achieves this by analyzing the status of transactions for a specific user over a defined time period. The possible transaction statuses are approved, noFunds, pending, and rejected. Each user is uniquely identifiable by a 64-character hexadecimal key value.
The application receives details of each transaction through an event document. Each event document contains information for a single transaction, for one user, on a specific day. Consequently, it will include only one of the possible status fields, with this field having a numeric value of 1. For example, the following event document represents a pending transaction for the user with the key ...0001, which occurred on the date 2022-02-01:
```javascript
const event = {
  key: "0000000000000000000000000000000000000000000000000000000000000001",
  date: new Date("2022-02-01"),
  pending: 1,
};
```
Transaction statuses are analyzed by comparing the total counts of each status for a given user over several trailing periods: oneYear, threeYears, fiveYears, sevenYears, and tenYears. These totals are provided in a reports document, which can be requested by providing the user's key and the end date for the report.
The following is an example of a reports document for the user with key ...0001 and an end date of 2022-06-15:
```javascript
export const reports = [
  {
    id: "oneYear",
    end: new Date("2022-06-15T00:00:00.000Z"),
    start: new Date("2021-06-15T00:00:00.000Z"),
    totals: { approved: 4, noFunds: 1, pending: 1, rejected: 1 },
  },
  {
    id: "threeYears",
    end: new Date("2022-06-15T00:00:00.000Z"),
    start: new Date("2019-06-15T00:00:00.000Z"),
    totals: { approved: 8, noFunds: 2, pending: 2, rejected: 2 },
  },
  {
    id: "fiveYears",
    end: new Date("2022-06-15T00:00:00.000Z"),
    start: new Date("2017-06-15T00:00:00.000Z"),
    totals: { approved: 12, noFunds: 3, pending: 3, rejected: 3 },
  },
  {
    id: "sevenYears",
    end: new Date("2022-06-15T00:00:00.000Z"),
    start: new Date("2015-06-15T00:00:00.000Z"),
    totals: { approved: 16, noFunds: 4, pending: 4, rejected: 4 },
  },
  {
    id: "tenYears",
    end: new Date("2022-06-15T00:00:00.000Z"),
    start: new Date("2012-06-15T00:00:00.000Z"),
    totals: { approved: 20, noFunds: 5, pending: 5, rejected: 5 },
  },
];
```
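The five trailing periods share the same end date and differ only in how many years they reach back. As a minimal sketch (the helper name is ours, not from the series), the period boundaries above can be derived from the end date like so:

```javascript
// Map each period id to the number of years it reaches back.
const PERIODS = { oneYear: 1, threeYears: 3, fiveYears: 5, sevenYears: 7, tenYears: 10 };

// Given a report end date, compute the { id, end, start } boundaries
// for every trailing period (a sketch; the real app may differ).
function periodBoundaries(end) {
  return Object.entries(PERIODS).map(([id, years]) => {
    const start = new Date(end);
    start.setUTCFullYear(start.getUTCFullYear() - years);
    return { id, end, start };
  });
}
```

Filling in each period's `totals` is then a matter of counting statuses between `start` and `end`, which is exactly what the different application versions implement in different ways.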
Load Testing Methodology
To evaluate the performance of each application version, two functions were designed to run concurrently under load:
- Bulk Upsert: inserts `event` documents.
- Get Reports: generates a `reports` document for a specific user `key` and `date`.
Parallel execution of these functions was achieved using worker threads, with 20 workers allocated to each. Each application version was tested for 200 minutes, with varying execution parameters applied throughout this period.
Bulk Upsert Function
The Bulk Upsert function receives batches of 250 event documents for registration. As its name suggests, these registrations are performed using MongoDB's upsert functionality, which updates a matching document if one exists or creates a new one from the update's data if it doesn't. Each Bulk Upsert iteration is timed, and its duration is recorded in a secondary database.
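The exact update shape varies by application version, but as a rough sketch, a batch of event documents could be turned into a `bulkWrite` payload of upserts like this (the helper names are ours):

```javascript
// Find which of the four status fields an event document carries.
function statusOf(event) {
  return ["approved", "noFunds", "pending", "rejected"].find((s) => s in event);
}

// Build a bulkWrite payload: one upsert per event, keyed by user and date.
// This is illustrative only; each app version in the series models the
// update differently.
function buildBulkUpserts(events) {
  return events.map((e) => ({
    updateOne: {
      filter: { key: e.key, date: e.date },
      update: { $inc: { [statusOf(e)]: 1 } },
      upsert: true,
    },
  }));
}
```

The resulting array would then be handed to the driver, e.g. `collection.bulkWrite(ops, { ordered: false })`.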
The batch processing rate is divided into four 50-minute phases, totaling 200 minutes. The rate begins at one batch insert per second and is incremented by one batch insert per second every 50 minutes, ultimately reaching four batch inserts per second (equivalent to 1,000 event documents per second).
Get Reports Function
The Get Reports function generates one reports document per execution. The duration of each execution is timed and recorded in the secondary database.
The rate of reports generation is divided into 40 phases, distributed as 10 sub-phases within each of the four Bulk Upsert phases. Within each Bulk Upsert phase, the Get Reports rate starts at 25 report requests per second and increases by 25 requests per second every five minutes. This culminates in 250 report requests per second by the end of that Bulk Upsert phase.
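Putting the two phase schemes together, the target rate at any minute of the 200-minute run can be sketched as follows (the function names are ours):

```javascript
// Target Bulk Upsert rate (batches/second) at minute t of the test:
// starts at 1 and steps up by 1 every 50 minutes, capping at 4.
function bulkUpsertRate(t) {
  return Math.min(Math.floor(t / 50) + 1, 4);
}

// Target Get Reports rate (requests/second) at minute t: within each
// 50-minute Bulk Upsert phase, starts at 25 and steps up by 25 every
// 5 minutes, reaching 250 in the phase's final sub-phase.
function getReportsRate(t) {
  const minuteInPhase = t % 50;
  return (Math.floor(minuteInPhase / 5) + 1) * 25;
}
```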
The following graph depicts the target rates for Bulk Upsert and Get Reports throughout the test scenario:
Initial Scenario and Data Generation
For a fair comparison across application versions, the initial dataset (working set) for the tests was designed to be larger than the available memory on the MongoDB server. This approach ensures significant cache activity and prevents the entire working set from residing in memory.
The following parameters were established for the initial dataset:
- Data spanning 10 years: from `2010-01-01` to `2020-01-01`.
- 50 million events per year, resulting in a total working set of 500 million events.
- An average of 60 events per user (`key`) per year.
Given 50 million events per year and 60 events per user per year, the total number of unique users is approximately 833,333 (50,000,000 / 60). The user's key generator was configured to produce keys following an approximately normal (Gaussian) distribution. This simulates a real-world scenario where some users generate more events than others. The following graph illustrates the distribution of 50 million keys generated:
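The series' actual key generator lives in its GitHub repository; as a purely illustrative sketch, an approximately normal distribution over user indices can be produced by summing uniform samples, then encoded as a 64-character hexadecimal key:

```javascript
// Approximate number of unique users (50,000,000 / 60).
const TOTAL_USERS = 833333;

// Pick a user index with a roughly Gaussian (bell-shaped) distribution:
// the sum of 12 uniforms approximates a normal with mean 6, stddev 1.
// This is only a sketch; the repo's generator may differ.
function randomUserIndex() {
  let sum = 0;
  for (let i = 0; i < 12; i++) sum += Math.random();
  const normalized = (sum - 6) / 6; // roughly in [-1, 1]
  const index = Math.floor((normalized / 2 + 0.5) * TOTAL_USERS);
  return Math.min(Math.max(index, 0), TOTAL_USERS - 1);
}

// Encode an index as a zero-padded 64-character hexadecimal key.
function toKey(index) {
  return index.toString(16).padStart(64, "0");
}
```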
To further simulate a real-world scenario, the distribution of event statuses was set as follows:
- 80% `approved`
- 10% `noFunds`
- 7.5% `pending`
- 2.5% `rejected`
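A weighted status like this can be drawn with a simple cumulative-weight pick; the following is a minimal sketch (the names are ours), not the series' actual generator:

```javascript
// Status weights matching the distribution above.
const STATUS_WEIGHTS = [
  ["approved", 0.8],
  ["noFunds", 0.1],
  ["pending", 0.075],
  ["rejected", 0.025],
];

// Draw one status according to its weight by walking the cumulative sum.
function randomStatus() {
  let r = Math.random();
  for (const [status, weight] of STATUS_WEIGHTS) {
    if (r < weight) return status;
    r -= weight;
  }
  return "rejected"; // floating-point safety fallback
}
```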
Initial Scenario Collection Statistics
| Collection | Documents | Data Size | Avg. Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV1 | 359,639,622 | 39.58GB | 119B | 8.78GB | 2 | 20.06GB |
| appV2 | 359,614,536 | 41.92GB | 126B | 10.46GB | 2 | 16.66GB |
| appV3 | 359,633,376 | 28.7GB | 86B | 8.96GB | 2 | 16.37GB |
| appV4 | 359,615,279 | 19.66GB | 59B | 6.69GB | 1 | 9.5GB |
| appV5R0 | 95,350,431 | 19.19GB | 217B | 5.06GB | 1 | 2.95GB |
| appV5R1 | 33,429,649 | 15.75GB | 506B | 4.04GB | 1 | 1.09GB |
| appV5R2 | 33,429,649 | 11.96GB | 385B | 3.26GB | 1 | 1.16GB |
| appV5R3 | 33,429,492 | 11.96GB | 385B | 3.24GB | 1 | 1.11GB |
| appV5R4 | 33,429,470 | 12.88GB | 414B | 3.72GB | 1 | 1.24GB |
| appV6R0 | 95,350,319 | 11.1GB | 125B | 3.33GB | 1 | 3.13GB |
| appV6R1 | 33,429,366 | 8.19GB | 264B | 2.34GB | 1 | 1.22GB |
| appV6R2 | 33,429,207 | 9.11GB | 293B | 2.8GB | 1 | 1.26GB |
| appV6R3 | 33,429,694 | 9.53GB | 307B | 2.56GB | 1 | 1.19GB |
| appV6R4 | 33,429,372 | 9.53GB | 307B | 1.47GB | 1 | 1.34GB |
Infrastructure Configuration
MongoDB Server Instance
The MongoDB server ran on an AWS EC2 c7a.large instance, equipped with 2 vCPUs and 4GB of memory. Two disks were attached:
- A 15GB GP3 disk for the operating system.
- A 300GB IO2 disk with 10,000 IOPS for MongoDB data storage.
The instance ran Ubuntu 22.04, fully updated at the time of testing. All recommended production settings were applied to optimize MongoDB performance on the available hardware.
Application Server Instance
The application server ran on an AWS EC2 c6a.xlarge instance, featuring 4 vCPUs and 8GB of memory. Two disks were attached:
- A 10GB GP3 disk for the operating system.
- A 10GB GP3 disk for a secondary MongoDB server, used for storing load test metrics.
This instance also ran Ubuntu 22.04, fully updated. Recommended production settings were applied to optimize its performance.

