Table Of Contents
- Application Version 6 Revision 0: A Dynamic Monthly Bucket Document
- Application Version 6 Revision 1: A Dynamic Quarter Bucket Document
- Application Version 6 Revision 2: A Dynamic Bucket and Computed Document
- Application Version 6 Revision 3: Getting everything at once
- Application Version 6 Revision 4: The zstd Compression Algorithm
Article Introduction
Welcome to the third and final part of the series, "The Cost of Not Knowing MongoDB". Building upon the foundational optimizations explored in Part 1 and Part 2, this article delves into advanced MongoDB design patterns that can dramatically transform application performance.
In Part 1, we improved application performance by concatenating fields, changing data types, and shortening field names. In Part 2, we implemented the Bucket and Computed Patterns and optimized the aggregation pipeline to achieve even better performance.
In this final article, we address the Issues and Improvements identified in AppV5R4. Specifically, we focus on reducing the document size in our application to alleviate the disk throughput bottleneck on the MongoDB server. This reduction will be accomplished by adopting a Dynamic Schema and modifying the storage compression algorithm.
All the application versions and revisions in this article would typically be developed by a senior MongoDB developer, as they build on all the previous versions and use the Dynamic Schema pattern, which isn't commonly seen.
Application Version 6 Revision 0 (appV6R0): A Dynamic Monthly Bucket Document
Introduction
As mentioned in the Issues and Improvements of appV5R4 from the previous article, the primary limitation of our MongoDB server is its disk throughput. To address this, we need to reduce the size of the documents being stored.
Consider the following document from appV5R3, which has provided the best performance so far:
const document = {
_id: Buffer.from("...01202202"),
items: [
{ date: new Date("2022-06-05"), a: 10, n: 3 },
{ date: new Date("2022-06-16"), p: 1, r: 1 },
{ date: new Date("2022-06-27"), a: 5, r: 1 },
{ date: new Date("2022-06-29"), p: 1 },
],
};
The items array in this document contains only four elements, but on average, it will have around 10 elements, and in the worst-case scenario, it could have up to 90 elements. These elements are the primary contributors to the document size, so they should be the focus of our optimization efforts.
One commonality among the elements is the date field, whose year and month are shared by every element of the document above. By rethinking how this field and its value are stored, we can reduce storage requirements.
An unconventional solution we could use is:
- Changing the items field type from an array to a document.
- Using the date value as the field name in the items document.
- Storing the status totals as the value for each date field.
Here is the previous document represented using the new schema idea:
const document = {
_id: Buffer.from("...01202202"),
items: {
20220605: { a: 10, n: 3 },
20220616: { p: 1, r: 1 },
20220627: { a: 5, r: 1 },
20220629: { p: 1 },
},
};
While this schema may not significantly reduce the document size compared to appV5R3, we can further optimize it by leveraging the fact that the year is already embedded in the _id field. This eliminates the need to repeat the year in the field names of the items document.
With this approach, the items document adopts a Dynamic Schema, where field names encode information and are not predefined.
To demonstrate various implementation possibilities, we will revisit all the bucketing criteria used in the appV5RX implementations, starting with appV5R0.
For appV6R0, which builds upon appV5R0 but uses a dynamic schema, data is bucketed by year and month. The field names in the items document represent only the day of the date, as the year and month are already stored in the _id field.
A detailed explanation of the bucketing logic and functions used to implement the current application can be found in the appV5R0 introduction.
The following document stores data for January 2022 (2022-01-XX), applying the newly presented idea:
const document = {
_id: Buffer.from("...01202201"),
items: {
"05": { a: 10, n: 3 },
16: { p: 1, r: 1 },
27: { a: 5, r: 1 },
29: { p: 1 },
},
};
Schema
The application implementation presented above would have the following TypeScript document schema, named SchemaV6R0:
export type SchemaV6R0 = {
_id: Buffer;
items: Record<
string,
{
a?: number;
n?: number;
p?: number;
r?: number;
}
>;
};
Bulk Upsert
Based on the specification presented, we have the following updateOne operation for each event generated by this application version:
const DD = getDD(event.date); // Extract the `day` from the `event.date`
const operation = {
updateOne: {
filter: { _id: buildId(event.key, event.date) }, // key + year + month
update: {
$inc: {
[`items.${DD}.a`]: event.approved,
[`items.${DD}.n`]: event.noFunds,
[`items.${DD}.p`]: event.pending,
[`items.${DD}.r`]: event.rejected,
},
},
upsert: true,
},
};
filter:
- Target the document where the _id field matches the concatenated value of key, year, and month.
- The buildId function converts the key + year + month into a binary format (see the sketch after this list).

update:
- Uses the $inc operator to increment the fields corresponding to the same DD as the event by the status values provided.
- If a field does not exist in the items document and the event provides a value for it, $inc treats the non-existent field as having a value of 0 and performs the operation.
- If a field exists in the items document but the event does not provide a value for it (i.e., undefined), $inc treats it as 0 and performs the operation.

upsert:
- Ensures a new document is created if no matching document exists.
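The helper functions above come from the earlier parts of the series. As a reference only, here is a minimal sketch of what they could look like for this revision, together with how the operations might be submitted in batches; the exact key encoding is an assumption, not the series' actual implementation:

// Hypothetical sketches of the helpers assumed above.
const getDD = (date: Date): string =>
  date.getUTCDate().toString().padStart(2, "0"); // "05", "16", ...

const buildId = (key: string, date: Date): Buffer => {
  const YYYY = date.getUTCFullYear().toString();
  const MM = (date.getUTCMonth() + 1).toString().padStart(2, "0");
  // key + year + month stored as binary, mirroring the "...01202201" examples.
  // Assumes the concatenated string is hex-encodable, as in the earlier parts.
  return Buffer.from(`${key}${YYYY}${MM}`, "hex");
};

// Given a connected `collection` and an array of `operations` like the one above,
// the updates would then be sent in a single batch:
await collection.bulkWrite(operations, { ordered: false });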
Get Reports
To fulfill the Get Reports operation, five aggregation pipelines are required, one for each date interval. Each pipeline follows the same structure, differing only in the filtering criteria in the $match stage:
const pipeline = [
{ $match: docsFromKeyBetweenDate },
{ $addFields: buildTotalsField },
{ $group: groupSumTotals },
{ $project: { _id: 0 } },
];
The complete code for this aggregation pipeline is quite complex, so we will present only pseudocode for it here.
{ $match: docsFromKeyBetweenDate }
- Range-filters documents by _id to retrieve only buckets within the report date range. It has the exact same logic as appV5R0.
{ $addFields: buildTotalsField }
- The logic is similar to the one used in the Get Reports of appV5R3.
- The $objectToArray operator is used to convert the items document into an array, enabling a $reduce operation.
- Filtering the items fields within the report's range involves extracting the year and month from the _id field and the day from the field names in the items document.
- The following JavaScript code is logically equivalent to the real aggregation pipeline code.
// Equivalent JavaScript logic:
const MM = _id.slice(-2).toString(); // Get month from _id
const YYYY = _id.slice(-6, -2).toString(); // Get year from _id
const items_array = Object.entries(items); // Convert the object to an array of [key, value]
const totals = items_array.reduce(
  (accumulator, [DD, status]) => {
    let statusDate = new Date(`${YYYY}-${MM}-${DD}`);
    if (statusDate >= reportStartDate && statusDate < reportEndDate) {
      accumulator.a += status.a || 0;
      accumulator.n += status.n || 0;
      accumulator.p += status.p || 0;
      accumulator.r += status.r || 0;
    }
    return accumulator;
  },
  { a: 0, n: 0, p: 0, r: 0 }
);
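As a minimal illustration (not the full pipeline), this is roughly how the $objectToArray step mentioned above exposes the dynamic field names to the aggregation framework; the itemsArray field name is only for this example:

// A single stage showing $objectToArray over the dynamic `items` document.
const objectToArrayStage = {
  $addFields: { itemsArray: { $objectToArray: "$items" } },
};
// For the January document shown earlier, itemsArray would become:
// [ { k: "05", v: { a: 10, n: 3 } }, { k: "16", v: { p: 1, r: 1 } },
//   { k: "27", v: { a: 5, r: 1 } }, { k: "29", v: { p: 1 } } ]
// The $reduce operation then iterates over these k/v pairs the same way the
// JavaScript reduce above iterates over Object.entries(items).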
{ $group: groupSumTotals }
- Groups the totals of each document in the pipeline into final status totals using $sum operations.
{ $project: { _id: 0 } }
- Formats the resulting document to have the reports format.
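Putting the stages together, the five windowed pipelines could be issued concurrently and merged into the final reports document. The sketch below assumes a hypothetical buildPipeline helper and a connected collection; it is not the series' actual implementation:

const windows = ["oneYear", "threeYears", "fiveYears", "sevenYears", "tenYears"] as const;

// Run one aggregation per report window and assemble the `reports` document.
const results = await Promise.all(
  windows.map((window) =>
    collection.aggregate(buildPipeline(key, window, reportDate)).toArray()
  )
);
const reports = Object.fromEntries(
  windows.map((window, i) => [window, results[i][0]])
);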
Indexes
No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.
Initial Scenario Statistics
Collection Statistics
To evaluate the performance of appV6R0, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:
| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV5R0 | 95,350,431 | 19.19GB | 217B | 5.06GB | 1 | 2.95GB |
| appV5R3 | 33,429,492 | 11.96GB | 385B | 3.24GB | 1 | 1.11GB |
| appV6R0 | 95,350,319 | 11.1GB | 125B | 3.33GB | 1 | 3.13GB |
Event Statistics
To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total Data Size and Index Size by the 500 million events.
| Collection | Data Size/events | Index Size/events | Total Size/events |
|---|---|---|---|
| appV5R0 | 41.2B | 6.3B | 47.5B |
| appV5R3 | 25.7B | 2.4B | 28.1B |
| appV6R0 | 23.8B | 6.7B | 30.5B |
It is challenging to make a direct comparison between appV6R0 and appV5R0 from a storage perspective. The appV5R0 implementation is the simplest bucketing possible, where event documents were merely appended to the items array without bucketing by day, as is done in appV6R0.
However, we can attempt a comparison between appV6R0 and appV5R3, the best solution so far. In appV6R0, data is bucketed by month, whereas in appV5R3, it is bucketed by quarter. Assuming document size scales linearly with the bucketing criteria (though this is not entirely accurate), an equivalent appV6R0 quarter document would be approximately 3 * 125 = 375 bytes, slightly smaller than the 385 bytes of appV5R3.
Another indicator of improvement is the Data Size/events metric in the Event Statistics table. For appV6R0, each event uses an average of 23.8 bytes, compared to 25.7 bytes for appV5R3, representing a 7.4% reduction in size.
Load Test Results
Executing the load test for appV6R0 and plotting it alongside the results for appV5R0 and Desired rates, we have the following results for Get Reports and Bulk Upsert.
Get Reports Rates
The two versions have very similar rate performance, with appV6R0 being slightly better in the second and third quarters, while appV5R0 is better in the first and fourth quarters.
Get Reports Latency
The two versions have very similar latency performance, with appV6R0 being slightly better in the second and third quarters, while appV5R0 is better in the first and fourth quarters.
Bulk Upsert Rates
Both versions have similar rate values, but it can be seen that appV6R0 has a small edge compared to appV5R0.
Bulk Upsert Latency
Although both versions have similar latency values for the first quarter of the test, for the final three-quarters, appV6R0 has a clear advantage over appV5R0.
Performance Summary
Despite the significant reduction in document and storage size achieved by appV6R0, the performance improvement was not as substantial as expected. This suggests that the bottleneck in the application when bucketing data by month may not be related to disk throughput.
Examining the collection stats table reveals that the index size for both versions is close to 3GB. This is near the 4GB of available memory on the machine running the database and exceeds the 1.5GB allocated by WiredTiger for cache. Therefore, it is likely that the limiting factor in this case is memory/cache rather than document size, which explains the lack of a significant performance improvement.
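One way to confirm this hypothesis is to compare the configured WiredTiger cache with its current usage. A quick check in mongosh could look like the following; the field names come from the serverStatus output:

// Inspect the WiredTiger cache configuration and usage (mongosh).
const cache = db.serverStatus().wiredTiger.cache;
print(cache["maximum bytes configured"]); // configured cache size (~1.5GB here)
print(cache["bytes currently in the cache"]); // how much of it is in use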
Issues and Improvements
To address the limitations observed in appV6R0, we propose adopting the same line of improvements applied from appV5R0 to appV5R1. Specifically, we will bucket the events by quarter in appV6R1. This approach not only follows the established pattern of enhancements but also aligns with the need to optimize performance further.
As highlighted in the Load Test Results, the current bottleneck lies in the size of the index relative to the available cache/memory. By increasing the bucketing interval from month to quarter, we can reduce the number of documents by approximately a factor of three. This reduction will, in turn, decrease the number of index entries by the same factor, leading to a smaller index size.
Application Version 6 Revision 1 (appV6R1): A Dynamic Quarter Bucket Document
Introduction
As discussed in the previous Issues and Improvements section, the primary bottleneck in appV6R0 was the index size nearing the memory capacity of the machine running MongoDB. To mitigate this issue, we propose to increase the bucketing interval from month to quarter for appV6R1, the same way we did in appV5R1.
This adjustment aims to reduce the number of documents and index entries by approximately a factor of three, thereby decreasing the overall index size. By adopting a quarter-based bucketing strategy, we align with the established pattern of enhancements applied in appV5R1 versions while addressing the specific memory/cache constraints identified in appV6R0.
The implementation of appV6R1 retains most of the code from appV6R0, with the following key differences:
- The _id field will now be composed of key + year + quarter.
- The field names in the items document will encode both month and day, as this information is necessary for filtering date ranges in the Get Reports operation.
The following example demonstrates how data for June 2022 (2022-06-XX), within the second quarter (Q2), is stored using the new schema:
const document = {
_id: Buffer.from("...01202202"),
items: {
"0605": { a: 10, n: 3 },
"0616": { p: 1, r: 1 },
"0627": { a: 5, r: 1 },
"0629": { p: 1 },
},
};
Schema
The application implementation presented above would have the following TypeScript document schema, named SchemaV6R0:
export type SchemaV6R0 = {
_id: Buffer;
items: Record<
string,
{
a?: number;
n?: number;
p?: number;
r?: number;
}
>;
};
Bulk Upsert
Based on the specification presented, we have the following updateOne operation for each event generated by this application version:
const MMDD = getMMDD(event.date); // Extract the month (MM) and day(DD) from the `event.date`
const operation = {
updateOne: {
filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
update: {
$inc: {
[`items.${MMDD}.a`]: event.approved,
[`items.${MMDD}.n`]: event.noFunds,
[`items.${MMDD}.p`]: event.pending,
[`items.${MMDD}.r`]: event.rejected,
},
},
upsert: true,
},
};
This updateOne operation has a similar logic to the one in appV6R0, with the only differences being the filter and update criteria.
filter:
- Target the document where the _id field matches the concatenated value of key, year, and quarter.
- The buildId function converts the key + year + quarter into a binary format.
update:
- Uses the $inc operator to increment the fields corresponding to the same MMDD as the event by the status values provided, as in the sketch below.
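As referenced above, a minimal sketch of the getMMDD helper assumed here (hypothetical; the real one lives in the application code) could be:

// Builds the dynamic field name ("0605", "0616", ...) from the event date.
const getMMDD = (date: Date): string => {
  const MM = (date.getUTCMonth() + 1).toString().padStart(2, "0");
  const DD = date.getUTCDate().toString().padStart(2, "0");
  return `${MM}${DD}`;
};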
Get Reports
To fulfill the Get Reports operation, five aggregation pipelines are required, one for each date interval. Each pipeline follows the same structure, differing only in the filtering criteria in the $match stage:
const pipeline = [
{ $match: docsFromKeyBetweenDate },
{ $addFields: buildTotalsField },
{ $group: groupSumTotals },
{ $project: { _id: 0 } },
];
This aggregation operation has a similar logic to the one in appV6R0, with the only difference being the implementation of the $addFields stage.
{ $addFields: buildTotalsField }:
- A similar implementation to the one in appV6R0.
- The difference lies in extracting the value of the year (YYYY) from the _id field and the month and day (MMDD) from the field name.
- The following JavaScript code is logically equivalent to the real aggregation pipeline code.
const YYYY = _id.slice(-6, -2).toString(); // Get year from _id
const items_array = Object.entries(items); // Convert the object to an array of [key, value]
const totals = items_array.reduce(
  (accumulator, [MMDD, status]) => {
    let [MM, DD] = [MMDD.slice(0, 2), MMDD.slice(2, 4)];
    let statusDate = new Date(`${YYYY}-${MM}-${DD}`);
    if (statusDate >= reportStartDate && statusDate < reportEndDate) {
      accumulator.a += status.a || 0;
      accumulator.n += status.n || 0;
      accumulator.p += status.p || 0;
      accumulator.r += status.r || 0;
    }
    return accumulator;
  },
  { a: 0, n: 0, p: 0, r: 0 }
);
Indexes
No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.
Initial Scenario Statistics
Collection Statistics
To evaluate the performance of appV6R1, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:
| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV5R3 | 33,429,492 | 11.96GB | 385B | 3.24GB | 1 | 1.11GB |
| appV6R0 | 95,350,319 | 11.1GB | 125B | 3.33GB | 1 | 3.13GB |
| appV6R1 | 33,429,366 | 8.19GB | 264B | 2.34GB | 1 | 1.22GB |
Event Statistics
To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total Data Size and Index Size by the 500 million events.
| Collection | Data Size/events | Index Size/events | Total Size/events |
|---|---|---|---|
| appV5R3 | 25.7B | 2.4B | 28.1B |
| appV6R0 | 23.8B | 6.7B | 30.5B |
| appV6R1 | 17.6B | 2.6B | 20.2B |
In the previous Initial Scenario Statistics analysis, we assumed that document size would scale linearly with the bucketing range. However, this assumption proved inaccurate: the average document size in appV6R1 is approximately twice as large as in appV6R0, even though it stores three times more data. That is already a win for this new implementation.
Since appV6R1 buckets data by quarter at the document level and by day within the items sub-document, a fair comparison would be with appV5R3, the best-performing version so far. From the tables above, we observe a significant improvement in Document Size and consequently Data Size when transitioning from appV5R3 to appV6R1. Specifically, there was a 31.4% reduction in Document Size. From an index size perspective, there was no change, as both versions bucket events by quarter.
Load Test Results
Executing the load test for appV6R1 and plotting it alongside the results for appV5R3 and the Desired rates, we have the following results for Get Reports and Bulk Upsert.
Get Reports Rates
For the first three-quarters of the test, both versions have similar rate values, but, for the final quarter, appV6R1 has a notable edge over appV5R3.
Get Reports Latency
As happened in the rates graph, both versions have similar values for the first three-quarters, with appV6R1 being better than appV5R3 for the final quarter.
Bulk Upsert Rates
Both versions have very similar rate values throughout the test, with appV6R1 achieving better values than appV5R3 in the final 20 minutes, though still not reaching the desired rate.
Bulk Upsert Latency
Even though both versions have similar rate values, we can see that appV6R1 has considerably better latency values than appV5R3, being almost two times faster for the last three quarters of the test.
Issues and Improvements
Looking at the Get Reports graphs in the last Load Test Results, we're still not able to reach the desired rates for this functionality. One way we could try to improve these operations is by using our well-known old friend, the Computed Pattern.
Applying the Computed Pattern in the current version would be the same improvement attempted from appV5R3 to appV5R4, which, instead of improving the performance, made it worse. Why would this solution work this time? The only way to know is to try, but before cracking our fingers and starting the implementation, it's always a good idea to make a sanity check and see if there is at least one good reason to believe that this time things will be different (cough, cough).
When we applied the Computed Pattern from appV5R3 to appV5R4, we got an 8.2% increase in the document size and a slight degradation in Bulk Upsert performance, with no performance gains in Get Reports. From appV5R3 to appV6R1, we got a 31.4% reduction in the document size, so it could make sense to trade some of this reduction in favor of storing some pre-computed values. Another point is that the Bulk Upsert functionality in appV6R1 has its best performance so far, so maybe the extra cost of pre-computing the document totals for this version is not as big of a deal as it was for appV5R4.
With these two "maybes" and a scientific spirit of always trying to test things to see where they'll break, let's give the Computed Pattern another chance.
Application Version 6 Revision 2 (appV6R2): A Dynamic Bucket and Computed Document
Introduction
As discussed in the previous Issues and Improvements section, in this revision we'll give the Computed Pattern another try and pre-compute the status totals for each document. This implementation is practically identical to the one tried in appV5R4, with the only difference being that we are using a Dynamic Schema for the items field instead of an array.
Schema
The application implementation presented above would have the following TypeScript document schema, named SchemaV6R1:
export type SchemaV6R1 = {
_id: Buffer;
totals: {
a?: number; // Quarter total approved
n?: number; // Quarter total noFunds
p?: number; // Quarter total pending
r?: number; // Quarter total rejected
};
items: Record<
string,
{
a?: number; // Daily total approved
n?: number; // Daily total noFunds
p?: number; // Daily total pending
r?: number; // Daily total rejected
}
>;
};
Bulk Upsert
Based on the specifications, the following bulk updateOne operation is used for each event generated by the application:
const MMDD = getMMDD(event.date); // Extract the month (MM) and day(DD) from the `event.date`
const operation = {
updateOne: {
filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
update: {
$inc: {
"totals.a": event.approved,
"totals.n": event.noFunds,
"totals.p": event.pending,
"totals.r": event.rejected,
[`items.${MMDD}.a`]: event.approved,
[`items.${MMDD}.n`]: event.noFunds,
[`items.${MMDD}.p`]: event.pending,
[`items.${MMDD}.r`]: event.rejected,
},
},
upsert: true,
},
};
This updateOne has almost the same logic as the one for appV6R1, with the difference being that we also increment the totals field to pre-compute the quarter totals for the document. From a logic perspective, this operation is equal to the Bulk Upsert of appV5R4; from an implementation perspective, it's much easier to write and understand; and from an execution perspective, it's less costly, as it has fewer stages and operations. This simplicity may also contribute to the Computed Pattern performing better here than it did in appV5R4.
Get Reports
To fulfill the Get Reports operation, five aggregation pipelines are required, one for each date interval. Each pipeline follows the same structure, differing only in the filtering criteria in the $match stage:
const pipeline = [
{ $match: docsFromKeyBetweenDate },
{ $addFields: buildTotalsField },
{ $group: groupSumTotals },
{ $project: { _id: 0 } },
];
This aggregation operation has a similar logic to the one in appV5R4, because of the pre-computed totals field, and to the one in appV6R1, because the items field is of type document. The difference compared to appV6R1 lies only in the $addFields stage. The complete code for this aggregation pipeline is quite complex, so we will present only pseudocode for it here.
{ $addFields: buildTotalsField }:
- A similar implementation to the one in appV6R1.
- The main difference is that if the quarter's date range is within the limits of the report's date range, we can use the pre-computed totals instead of calculating the value through a $reduce operation.
- The following JavaScript code is logically equivalent to the real aggregation pipeline code.
let totals;
if (documentQuarterWithinReportDateRange) {
  // Use pre-computed quarterly totals
  totals = document.totals;
} else {
  // Fall back to item-level aggregation
  const YYYY = _id.slice(-6, -2).toString(); // Get year from _id
  const items_array = Object.entries(items); // Convert the object to an array of [key, value]
  totals = items_array.reduce(
    (accumulator, [MMDD, status]) => {
      let [MM, DD] = [MMDD.slice(0, 2), MMDD.slice(2, 4)];
      let statusDate = new Date(`${YYYY}-${MM}-${DD}`);
      if (statusDate >= reportStartDate && statusDate < reportEndDate) {
        accumulator.a += status.a || 0;
        accumulator.n += status.n || 0;
        accumulator.p += status.p || 0;
        accumulator.r += status.r || 0;
      }
      return accumulator;
    },
    { a: 0, n: 0, p: 0, r: 0 }
  );
}
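The documentQuarterWithinReportDateRange flag above is a stand-in for a date comparison along these lines, assuming the quarter boundaries are derived from the _id and that the quarter end is exclusive, matching the item-level check:

// A quarter bucket may use its pre-computed totals only when the whole quarter
// falls inside the report window; otherwise, the item-level fallback runs.
const documentQuarterWithinReportDateRange =
  quarterStartDate >= reportStartDate && quarterEndDate <= reportEndDate;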
Indexes
No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.
Initial Scenario Statistics
Collection Statistics
To evaluate the performance of appV6R2, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:
| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV5R3 | 33,429,492 | 11.96GB | 385B | 3.24GB | 1 | 1.11GB |
| appV6R1 | 33,429,366 | 8.19GB | 264B | 2.34GB | 1 | 1.22GB |
| appV6R2 | 33,429,207 | 9.11GB | 293B | 2.8GB | 1 | 1.26GB |
Event Statistics
To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total Data Size and Index Size by the 500 million events.
| Collection | Data Size/events | Index Size/events | Total Size/events |
|---|---|---|---|
| appV5R3 | 25.7B | 2.4B | 28.1B |
| appV6R1 | 17.6B | 2.6B | 20.2B |
| appV6R2 | 19.6B | 2.7B | 22.3B |
As expected, we had an 11.2% increase in the Document Size by adding a totals field to each document of appV6R2. Compared to appV5R3, we still have a 23.9% reduction in the Document Size. Let's go to the Load Test Results and see if the trade-off between storage and computation cost is worth it.
Load Test Results
Executing the load test for appV6R2 and plotting it alongside the results for appV6R1 and Desired rates, we have the following results for Get Reports and Bulk Upsert.
Get Reports Rates
We can clearly see that appV6R2 has better rates than appV6R1 throughout the test, but still not reaching the top rate of 250 reports per second.
Get Reports Latency
As happened in the rates graph, appV6R2 provides lower latency than appV6R1 throughout the test.
Bulk Upsert Rates
Both versions have very similar rate values throughout the test, with appV6R2 being slightly better than appV6R1 in the final 20 minutes, but still not able to reach the desired rate.
Bulk Upsert Latency
Even though appV6R2 had better rate values than appV6R1, when looking at their latency it's not possible to pick a winner, with appV6R2 being better in the first and final quarters and appV6R1 being better in the second and third quarters.
Performance Summary
The two "maybes" from the previous Issues and Improvements lived up to their promise, and appV6R2 delivered better performance than appV6R1. This is the redemption of the Computed Pattern applied at the document level. This revision is one of my favorites because it shows that the same optimization on very similar applications can lead to different results. In our case, the difference was caused by the application being heavily bottlenecked by disk throughput.
Issues and Improvements
Let's tackle the last improvement at the application level. Those paying close attention throughout the application versions may have already questioned it: in every Get Reports section, we have "To fulfill the Get Reports operation, five aggregation pipelines are required, one for each date interval." Do we really need to run five aggregation pipelines to generate the reports document? Isn't there a way to calculate everything in just one operation? The answer is "Yes," there is.
The reports document is composed of the fields oneYear, threeYears, fiveYears, sevenYears, and tenYears, where, until now, each one was generated by its own aggregation pipeline. Generating the reports this way is a waste of processing power because we are doing part of the calculation multiple times. For example, to calculate the status totals for tenYears, we also have to calculate the status totals for the other fields, as, from a date range perspective, they are all contained in the tenYears date range.
So, for our next application revision, we'll condense the five Get Reports aggregation pipelines into one, avoiding wasting processing power on repeated calculations.
Application Version 6 Revision 3 (appV6R3): Getting Everything at Once
Introduction
As discussed in the previous Issues and Improvements section, in this revision, we'll improve the performance of our application by changing the Get Reports functionality to generate the reports document using only one aggregation pipeline instead of five.
The rationale behind this improvement is that when we generate the tenYears totals, we have also calculated the other totals: oneYear, threeYears, fiveYears, and sevenYears. As an example, when we make a request to Get Reports with the key ...0001 and the date 2022-01-01, the totals will be calculated with the following date ranges:
- oneYear: From 2021-01-01 to 2022-01-01
- threeYears: From 2020-01-01 to 2022-01-01
- fiveYears: From 2018-01-01 to 2022-01-01
- sevenYears: From 2016-01-01 to 2022-01-01
- tenYears: From 2013-01-01 to 2022-01-01
As we can see from the list above, the date range for tenYears includes all the other date ranges.
Although we have successfully implemented the Computed Pattern in the previous revision, appV6R2, and got better results than appV6R1, we won't be using it as a base for this revision. There were two reasons for that:
- Based on the results of our previous implementation of the Computed Pattern at the document level, from appV5R3 to appV5R4, I didn't expect it to get better results.
- The implementation of the Get Reports to build the reports document through just one aggregation pipeline while also using the pre-computed totals field generated by the Computed Pattern would require a lot of work, and by the time of the latest versions of this series, I just wanted to finish it.
So, this revision will be built on top of appV6R1.
Schema
The application implementation presented above would have the following TypeScript document schema, named SchemaV6R0:
export type SchemaV6R0 = {
_id: Buffer;
items: Record<
string,
{
a?: number;
n?: number;
p?: number;
r?: number;
}
>;
};
Bulk Upsert
Based on the specifications, the following bulk updateOne operation is used for each event generated by the application:
const YYYYMMDD = getYYYYMMDD(event.date); // Extract the year(YYYY), month(MM), and day(DD) from the `event.date`
const operation = {
updateOne: {
filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
update: {
$inc: {
[`items.${YYYYMMDD}.a`]: event.approved,
[`items.${YYYYMMDD}.n`]: event.noFunds,
[`items.${YYYYMMDD}.p`]: event.pending,
[`items.${YYYYMMDD}.r`]: event.rejected,
},
},
upsert: true,
},
};
This updateOne has almost exactly the same logic as the one for appV6R1. The difference is that the names of the fields in the items document will be created based on year, month, and day (YYYYMMDD) instead of just month and day (MMDD). This change was made to reduce the complexity of the Get Reports aggregation pipeline.
Get Reports
To fulfill the Get Reports operation, only one aggregation pipeline is required:
const pipeline = [
{ $match: docsFromKeyBetweenDate },
{ $addFields: buildTotalsField },
{ $group: groupCountTotals },
{ $project: format },
];
This aggregation operation has a similar logic to the one in appV6R1, with the main difference being the implementation of the $addFields stage.
{ $addFields: buildTotalsField }
- It has a similar logic to the previous revision, where we first convert the items document into an array using $objectToArray and then use the reduce function to iterate over the array, accumulating the status totals.
- The difference lies in the initial value and the logic of the reduce function.
- The initial value in this case is an object/document with one field for each of the report date ranges. Each of these fields is also an object/document, with its fields being the possible statuses set to zero, as this is the initial value.
- The logic checks which date range the item falls in and, based on that, increments the totals. If the item isInOneYearDateRange(...), it is also in all the other date ranges: three, five, seven, and ten years. If the item isInThreeYearsDateRange(...), it is also in all the other wider date ranges: five, seven, and ten years.
- The following JavaScript code is logically equivalent to the real aggregation pipeline code. Senior developers could argue that this implementation could be less verbose or more optimized, but due to how MongoDB aggregation pipeline operators are specified, this is how it was implemented.
const itemsArray = Object.entries(items); // Convert the object to an array of [key, value]
const totals = itemsArray.reduce(
  (totals, [YYYYMMDD, status]) => {
    const YYYY = YYYYMMDD.slice(0, 4); // Get year
    const MM = YYYYMMDD.slice(4, 6); // Get month
    const DD = YYYYMMDD.slice(6, 8); // Get day
    let statusDate = new Date(`${YYYY}-${MM}-${DD}`);
    if (isInOneYearDateRange(statusDate)) {
      totals.oneYear = incrementTotals(totals.oneYear, status);
      totals.threeYears = incrementTotals(totals.threeYears, status);
      totals.fiveYears = incrementTotals(totals.fiveYears, status);
      totals.sevenYears = incrementTotals(totals.sevenYears, status);
      totals.tenYears = incrementTotals(totals.tenYears, status);
    } else if (isInThreeYearsDateRange(statusDate)) {
      totals.threeYears = incrementTotals(totals.threeYears, status);
      totals.fiveYears = incrementTotals(totals.fiveYears, status);
      totals.sevenYears = incrementTotals(totals.sevenYears, status);
      totals.tenYears = incrementTotals(totals.tenYears, status);
    } else if (isInFiveYearsDateRange(statusDate)) {
      totals.fiveYears = incrementTotals(totals.fiveYears, status);
      totals.sevenYears = incrementTotals(totals.sevenYears, status);
      totals.tenYears = incrementTotals(totals.tenYears, status);
    } else if (isInSevenYearsDateRange(statusDate)) {
      totals.sevenYears = incrementTotals(totals.sevenYears, status);
      totals.tenYears = incrementTotals(totals.tenYears, status);
    } else if (isInTenYearsDateRange(statusDate)) {
      totals.tenYears = incrementTotals(totals.tenYears, status);
    }
    return totals;
  },
  {
    oneYear: { a: 0, n: 0, p: 0, r: 0 },
    threeYears: { a: 0, n: 0, p: 0, r: 0 },
    fiveYears: { a: 0, n: 0, p: 0, r: 0 },
    sevenYears: { a: 0, n: 0, p: 0, r: 0 },
    tenYears: { a: 0, n: 0, p: 0, r: 0 },
  },
);
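The incrementTotals helper referenced above is not shown in the original pipeline; a minimal equivalent for this sketch would be:

// Accumulates one item's status counters into a running totals object,
// treating missing statuses as zero.
const incrementTotals = (totals, status) => ({
  a: totals.a + (status.a || 0),
  n: totals.n + (status.n || 0),
  p: totals.p + (status.p || 0),
  r: totals.r + (status.r || 0),
});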
Indexes
No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.
Initial Scenario Statistics
Collection Statistics
To evaluate the performance of appV6R3, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:
| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV6R1 | 33,429,366 | 8.19GB | 264B | 2.34GB | 1 | 1.22GB |
| appV6R2 | 33,429,207 | 9.11GB | 293B | 2.8GB | 1 | 1.26GB |
| appV6R3 | 33,429,694 | 9.53GB | 307B | 2.56GB | 1 | 1.19GB |
Event Statistics
To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total Data Size and Index Size by the 500 million events.
| Collection | Data Size/events | Index Size/events | Total Size/events |
|---|---|---|---|
| appV6R1 | 17.6B | 2.6B | 20.2B |
| appV6R2 | 19.6B | 2.7B | 22.3B |
| appV6R3 | 20.5B | 2.6B | 23.1B |
Because we are adding the year (YYYY) information to the name of each items document field, we got a 16.3% increase in document size when compared to appV6R1 and a 4.8% increase when compared to appV6R2. This increase in document size may be compensated by gains in the Get Reports function, as we saw when going from appV6R1 to appV6R2.
Load Test Results
Executing the load test for appV6R3 and plotting it alongside the results for appV6R2, we have the following results for Get Reports and Bulk Upsert.
Get Reports Rate
We have a huge improvement here when going from appV6R2 to appV6R3: for the first time, the application was able to reach all the desired rates in one phase.
Get Reports Latency
The latency also got huge improvements, with the peak value being reduced by 71% in the first phase, 67% in the second phase, 47% in the third phase, and 30% in the fourth phase.
Bulk Upsert Rate
As had happened in the previous version, the application was able to reach all the desired rates.
Bulk Upsert Latency
Here we have one of the biggest gains of this series: the latency went from being measured in seconds to being measured in milliseconds. We went from a peak of 1.8 seconds to 250ms in the first phase, from 2.3 seconds to 400ms in the second phase, from 2 seconds to 600ms in the third phase, and from 2.2 seconds to 800ms in the fourth phase.
Issues and Improvements
The main bottleneck in our MongoDB server is still the disk throughput. As mentioned in the previous Issues and Improvements, that was the last improvement at the application level, so how can we extract more from our current hardware?
If we take a closer look at the MongoDB documentation, we'll find out that by default it uses block compression with the snappy compression library for all collections. Before the data is written to disk, it'll be compressed using the snappy library to reduce its size and speed up the writing process.
Would it be possible to use a different, more effective compression library to reduce the size of the data even further and, as a consequence, reduce the load on the server's disk? Yes, it is, and in the next application revision, we will use the zstd compression library instead of the default snappy compression library.
Application Version 6 Revision 4 (appV6R4): The zstd Compression Algorithm
Introduction
As discussed in the previous Issues and Improvements section, the performance gains of this version will come from changing the algorithm of the collection block compressor. By default, MongoDB uses snappy, which we will change to zstd to get better compression at the expense of more CPU usage.
All the schemas, functions, and code in this version are exactly the same as in appV6R3.
To create a collection that uses the zstd compression algorithm, the following command can be used.
db.createCollection("<collection-name>", {
storageEngine: { wiredTiger: { configString: "block_compressor=zstd" } },
});
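If you want to confirm that the collection was created with the expected compressor, the WiredTiger creation string returned by the collection statistics includes it; for example, in mongosh:

// The creationString contains "block_compressor=zstd" for a zstd collection.
const stats = db.getCollection("<collection-name>").stats();
print(stats.wiredTiger.creationString);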
Schema
The application implementation presented above would have the following TypeScript document schema, named SchemaV6R0:
export type SchemaV6R0 = {
_id: Buffer;
items: Record<
string,
{
a?: number;
n?: number;
p?: number;
r?: number;
}
>;
};
Bulk Upsert
Based on the specifications, the following bulk updateOne operation is used for each event generated by the application:
const YYYYMMDD = getYYYYMMDD(event.date); // Extract the year(YYYY), month(MM), and day(DD) from the `event.date`
const operation = {
updateOne: {
filter: { _id: buildId(event.key, event.date) }, // key + year + quarter
update: {
$inc: {
[`items.${YYYYMMDD}.a`]: event.approved,
[`items.${YYYYMMDD}.n`]: event.noFunds,
[`items.${YYYYMMDD}.p`]: event.pending,
[`items.${YYYYMMDD}.r`]: event.rejected,
},
},
upsert: true,
},
};
This updateOne has exactly the same logic as the one for appV6R3.
Get Reports
Based on what was presented in the Introduction, we have the following aggregation pipeline to generate the reports document.
const pipeline = [
{ $match: docsFromKeyBetweenDate },
{ $addFields: buildTotalsField },
{ $group: groupCountTotals },
{ $project: format },
];
This pipeline has exactly the same logic as the one for appV6R3.
Indexes
No additional indexes are required, maintaining the single _id index approach established in the appV4 implementation.
Initial Scenario Statistics
Collection Statistics
To evaluate the performance of appV6R4, we inserted 500 million event documents into the collection using the schema and Bulk Upsert function described earlier. For comparison, the tables below also include statistics from previous comparable application versions:
| Collection | Documents | Data Size | Document Size | Storage Size | Indexes | Index Size |
|---|---|---|---|---|---|---|
| appV6R3 | 33,429,694 | 9.53GB | 307B | 2.56GB | 1 | 1.19GB |
| appV6R4 | 33,429,372 | 9.53GB | 307B | 1.47GB | 1 | 1.34GB |
Event Statistics
To evaluate the storage efficiency per event, the Event Statistics are calculated by dividing the total Storage Size and Index Size by the 500 million events.
| Collection | Storage Size/events | Index Size/events | Total Storage Size/events |
|---|---|---|---|
| appV6R3 | 5.5B | 2.6B | 8.1B |
| appV6R4 | 3.2B | 2.8B | 6.0B |
As the application implementation of appV6R4 is the same as appV6R3, the values for Data Size, Document Size, and Index Size are the same. The difference lies in the Storage Size, which represents the Data Size after compression. Going from snappy to zstd decreased the Storage Size by a jaw-dropping 43%. Looking at the Event Statistics, there was a 26% reduction in the storage required to register each event, going from 8.1 bytes to 6 bytes. These considerable reductions in size will probably translate to better performance in this version, as our main bottleneck is disk throughput.
Load Test Results
Executing the load test for appV6R4 and plotting it alongside the results for appV6R3, we have the following results for Get Reports and Bulk Upsert.
Get Reports Rate
Even though we weren't able to reach all the desired rates, we got another huge improvement when going from appV6R3 to appV6R4. We could almost consider that, in this revision, we were also able to reach the desired rates in the first, second, and third quarters.
Get Reports Latency
The latency also got huge improvements, with the peak value being reduced by 30% in the first phase, 57% in the second phase, 61% in the third phase, and 57% in the fourth phase.
Bulk Upsert Rate
As had happened in the previous version, the application was able to reach all the desired rates.
Bulk Upsert Latency
Here we also got considerable improvements, with the peak value being reduced by 48% in the first phase, 39% in the second phase, 43% in the third phase, and 47% in the fourth phase.
Issues and Improvements
Although this is the last version and revision of the series, there is still room for improvement. For those willing to try them by themselves, here are the ones that I was able to think of:
- Use the Computed Pattern in appV6R4.
- Optimize the aggregation pipeline logic for Get Reports in appV6R4.
- Change the zstd compression level from its default value of 6 to a higher value.
Conclusion
This final part of "The Cost of Not Knowing MongoDB" series has explored the ultimate evolution of MongoDB application optimization, demonstrating how revolutionary design patterns and infrastructure-level improvements can transcend traditional performance boundaries. The journey through appV6R0 to appV6R4 represents the culmination of sophisticated MongoDB development practices, achieving performance levels that seemed impossible with the baseline appV1 implementation.
Series Transformation Summary
From Foundation to Revolution:
The complete series showcases a remarkable transformation across three distinct optimization phases:
- Part 1 (appV1-appV4): Document-level optimizations achieving 51% storage reduction through schema refinement, data type optimization, and strategic indexing
- Part 2 (appV5R0-appV5R4): Advanced pattern implementation with the Bucket and Computed patterns, delivering 89% index size reduction and first-time achievement of target rates
- Part 3 (appV6R0-appV6R4): Revolutionary Dynamic Schema Pattern with infrastructure optimization, culminating in sub-second latencies and comprehensive target rate achievement
Performance Evolution:
The progression reveals exponential improvements across all metrics:
- Get Reports Latency: From 6.5 seconds (appV1) to 200-800ms (appV6R4) - a 92% improvement
- Bulk Upsert Latency: From 62 seconds (appV1) to 250-800ms (appV6R4) - a 99% improvement
- Storage Efficiency: From 128.1B per event (appV1) to 6.0B per event (appV6R4) - a 95% reduction
- Target Rate Achievement: From consistent failures to sustained success across all operational phases
Architectural Paradigm Shifts
The Dynamic Schema Pattern Revolution:
appV6R0 through appV6R4 introduced the most sophisticated MongoDB design pattern explored in this series. The Dynamic Schema Pattern fundamentally redefined data organization by:
- Eliminating Array Overhead: Replacing MongoDB arrays with computed object structures to minimize storage and processing costs
- Single-Pipeline Optimization: Consolidating five separate aggregation pipelines into one optimized operation, reducing computational overhead by 80%
- Infrastructure-Level Optimization: Implementing zstd compression, achieving 43% additional storage reduction over default snappy compression
Query Optimization Breakthroughs:
The implementation of intelligent date range calculation within aggregation pipelines eliminated redundant operations while maintaining data accuracy. This approach demonstrates senior-level MongoDB development by leveraging advanced aggregation framework capabilities to achieve both performance and maintainability.
Critical Technical Insights
Performance Bottleneck Evolution:
Throughout the series, we observed how optimization focus shifted as bottlenecks were resolved:
- Initial Phase: Index size and query inefficiency dominated performance
- Intermediate Phase: Document retrieval count became the limiting factor
- Advanced Phase: Aggregation pipeline complexity constrained throughput
- Final Phase: Disk I/O emerged as the ultimate hardware limitation
Pattern Application Maturity:
The series demonstrates the progression from junior to senior MongoDB development practices:
- Junior Level: Schema design without understanding indexing implications (appV1)
- Intermediate Level: Applying individual optimization techniques (appV2-appV4)
- Advanced Level: Implementing established MongoDB patterns (appV5RX)
- Senior Level: Creating custom patterns and infrastructure optimization (appV6RX)
Production Implementation Guidelines
When to Apply Each Pattern:
Based on the comprehensive analysis, the following guidelines emerge for production implementations:
- Document-Level Optimizations: Essential for all MongoDB applications, providing 40-60% improvement with minimal complexity
- Bucket Pattern: Optimal for time-series data with 10:1 or greater read-to-write ratios
- Computed Pattern: Most effective in read-heavy scenarios with predictable aggregation requirements
- Dynamic Schema Pattern: Reserved for high-performance applications where development complexity trade-offs are justified
Infrastructure Considerations:
The zstd compression implementation in appV6R4 demonstrates that infrastructure-level optimizations can provide substantial benefits (40%+ storage reduction) with minimal application changes. However, these optimizations require careful CPU utilization monitoring and may not be suitable for CPU-constrained environments.
The True Cost of Not Knowing MongoDB
This series reveals that the "cost" extends far beyond mere performance degradation:
Quantifiable Impacts:
- Resource Utilization: Up to 20x more storage requirements for equivalent functionality
- Infrastructure Costs: Potentially 10x higher hardware requirements due to inefficient patterns
- Developer Productivity: Months of optimization work that could be avoided with proper initial design
- Scalability Limitations: Fundamental architectural constraints that become exponentially expensive to resolve
Hidden Complexities:
More critically, the series demonstrates that MongoDB's apparent simplicity can mask sophisticated optimization requirements. The transition from appV1 to appV6R4 required a deep understanding of:
- Aggregation framework internals and optimization strategies
- Index behavior with different data types and query patterns
- Storage engine compression algorithms and trade-offs
- Memory management and cache utilization patterns
Final Recommendations
For Development Teams:
- Invest in MongoDB Education: The performance differences documented in this series justify substantial training investments
- Establish Pattern Libraries: Codify successful patterns like those demonstrated to prevent anti-pattern adoption
- Implement Performance Testing: Regular load testing reveals optimization opportunities before they become production issues
- Plan for Iteration: Schema evolution is inevitable; design systems that accommodate architectural improvements
For Architectural Decisions:
- Start with Fundamentals: Proper indexing and schema design provide the foundation for all subsequent optimizations
- Measure Before Optimizing: Each optimization phase in this series was guided by comprehensive performance measurement
- Consider Total Cost of Ownership: The development complexity of advanced patterns must be weighed against performance requirements
- Plan Infrastructure Scaling: Understanding that hardware limitations will eventually constrain software optimizations
Closing Reflection
The journey from appV1 to appV6R4 demonstrates that MongoDB mastery requires understanding not just the database itself, but the intricate relationships between schema design, query patterns, indexing strategies, aggregation frameworks, and infrastructure capabilities. The 99% performance improvements documented in this series are achievable, but they demand dedication to continuous learning and sophisticated engineering practices.
For organizations serious about MongoDB performance, this series provides both a roadmap for optimization and a compelling case for investing in advanced MongoDB expertise. The cost of not knowing MongoDB extends far beyond individual applications—it impacts entire technology strategies and competitive positioning in data-driven markets.
The patterns, techniques, and insights presented throughout this three-part series offer a comprehensive foundation for building high-performance MongoDB applications that can scale efficiently while maintaining operational excellence. Most importantly, they demonstrate that with proper knowledge and application, MongoDB can deliver extraordinary performance that justifies its position as a leading database technology for modern applications.