This article was written by Darshan Jayarama.
I was recently packing for a short trip, and I hate carrying two or three pieces of luggage: one for my laptop, another for clothes. So a person like me has definitely used a "vacuum sealer," which lets me pack as many clothes as I want into a single bag.
During the trip, I had an idea: which "vacuum sealers" do we have in MongoDB to store data efficiently and reduce the bill? Here, I have jotted down a few.
Online Archive:
Does your dataset contain cold data? For a supply chain company, data older than three months is considered infrequently accessed. For an e-commerce site or a courier company, orders that have already been delivered are rarely accessed by users. You can identify such documents with a rule, such as "lastAccess older than 30 days," and let Online Archive move them into cloud object storage. Worried about how to access the data when needed? Relax, it's all covered: archived data remains queryable with no access restrictions. Storage costs drop from $4/GB/month to $0.02/GB/month.
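Before creating an archive rule, it helps to estimate how much cold data you actually have. A minimal mongosh sketch, assuming a hypothetical orders collection with a deliveredAt date field (both names are illustrative):

```javascript
// mongosh: count documents that would qualify for archiving
// under a "delivered more than 90 days ago" rule.
const cutoff = new Date(Date.now() - 90 * 24 * 60 * 60 * 1000); // 90 days ago
db.orders.countDocuments({ deliveredAt: { $lt: cutoff } });
```

The archiving rule itself is configured on the Atlas side (UI or Admin API), using the same date-based criteria.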
Compact:
In a write-intensive application where you frequently delete data, the storage watermark stays high. You can run the command below to check:
db.COLLNAME.stats().freeStorageSize
The above command outputs how many bytes are available for reuse. WiredTiger reuses this space over time on its own, but if you need to reclaim that storage right away, you can run compact on the collection.
We can run compact with the following syntax:
use mydb // switch to the database
db.runCommand( { compact: "myCollection" } ) // mention the collection name
Since compact has several other options, I recommend visiting the official documentation of compact.
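To decide which collections are worth compacting, you can survey the free space in every collection at once. A quick sketch to run in mongosh against the current database:

```javascript
// Print the reusable (free) bytes for each collection,
// so you can target compact at the worst offenders.
db.getCollectionNames().forEach(name => {
  const free = db.getCollection(name).stats().freeStorageSize || 0;
  print(`${name}: ${free} bytes reusable`);
});
```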
Resync Secondary:
Running compact is fine if you've got just a handful of collections. But when you're dealing with many collections with high watermarks, especially large ones, running compact on each one isn't ideal.
Instead, you can resync the members in a rolling fashion so that every member releases that space back to the operating system and utilizes storage efficiently.
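At a high level, a rolling resync replaces each secondary's data files one member at a time, so the replica set stays available throughout. A rough sketch, assuming a systemd-managed mongod with its dbPath at /var/lib/mongodb (adjust both for your deployment):

```shell
# Repeat for ONE secondary at a time; never take down a majority.
systemctl stop mongod          # 1. stop the secondary
rm -rf /var/lib/mongodb/*      # 2. empty its dbPath
systemctl start mongod         # 3. restart; the member runs an initial sync
# 4. wait until rs.status() shows the member back in SECONDARY,
#    then move on to the next member.
```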
TTL Index:
So far, we have discussed reactive approaches to storage utilization: we prepare a plan once storage hits the ceiling. But is there a proactive approach? Yes, and this is where TTL indexes come into play. If you know the data is no longer needed after a certain time, create a TTL index, and expired documents will be deleted automatically.
This approach is useful for applications that generate large volumes of logs. Set a TTL index of three days, and each log entry will be removed from the collection once it expires.
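For example, to expire log entries three days after they were written, create the TTL index on a date field (collection and field names here are illustrative):

```javascript
// mongosh: delete each document ~259200 seconds (3 days) after createdAt.
// createdAt must hold a BSON date for TTL expiry to apply.
db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 259200 });
```

Note that the TTL background thread runs periodically (roughly once a minute), so deletion is not instantaneous.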
Query Optimization:
Being a query enthusiast, I circle back to query optimization, because a single bad query can eat up your IOPS, network, and compute.
- Tune the query.
- Create a compound index that covers the query and the projection.
- Reduce network round-trips and IOPS.
- If possible, precompute the value and store it in a collection.
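As an example of the second point, here is a sketch of a covered query in mongosh: the compound index serves both the filter and the projection, so MongoDB never has to fetch the documents themselves (collection and field names are illustrative):

```javascript
// Index covers the filter (status), the sort (orderDate), and the projection.
db.orders.createIndex({ status: 1, orderDate: -1 });

// Excluding _id and projecting only indexed fields keeps the query covered.
db.orders.find(
  { status: "DELIVERED" },
  { _id: 0, status: 1, orderDate: 1 }
).sort({ orderDate: -1 });
```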
Don't know where to start? Use (and abuse) the Atlas Performance Advisor for index recommendations, inefficient queries, and schema advice. Or visit my previous blogs:
- How schema anti-patterns in MongoDB can cost you $$$$
- Whistleblower for Database: Setup your internal informant who exposes performance
Flex Cluster:
Planning to run some dev/API tests? Go with a Flex cluster rather than an M10+ cluster for non-prod deployments. You get the same API, and you can essentially pause/resume it when not needed.
Conclusion:
MongoDB gives you plenty of storage-saving options, but only smart DBAs use them. Your cluster may be bleeding $$$ right now. Online Archive, TTL indexes, compaction, and proper sizing can cut your bill dramatically, by as much as 60% in some cases.
Pick three techniques, deploy them, and watch the bill shrink. Simple as that. 💰
