MongoDB is a popular document-oriented NoSQL database known for its flexibility, scalability, and high performance. However, to achieve optimal performance from MongoDB, you need to follow some key optimization strategies. In this article, we will explore some tips for optimizing MongoDB.
Index appropriately
Indexing allows MongoDB to efficiently execute queries by rapidly searching indexed fields only. Without indexes, MongoDB would have to scan every document of a collection to select matching documents.
Proper indexes are critical for fast query performance. You should create indexes on fields that are frequently queried.
// Create index on `name` field
db.products.createIndex({name: 1})
// Create compound index on `name` and `price`
db.products.createIndex({name: 1, price: -1})
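Indexes are not free: each one consumes RAM and adds overhead to every write, so it is worth auditing what already exists before adding more. Listing a collection's indexes in the shell:

```javascript
// List all indexes defined on the collection
db.products.getIndexes()
```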
Use covered queries
Covered queries are queries where an index contains every field the query filters on and returns. This allows MongoDB to answer the query entirely from the index, without looking up the documents at all.
// Covered query (requires an index on {price: 1, name: 1})
db.products.find(
  {price: {$gt: 30}},
  {name: 1, price: 1, _id: 0} // projection must exclude _id
)
Here the index contains both the name and price fields and the projection excludes _id, so no document lookups are needed.
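You can verify that a query is covered with explain(): a covered query reports zero documents examined. A quick check in the shell, assuming the index described above exists:

```javascript
// A covered query reads index entries only, never the documents
db.products.find(
  {price: {$gt: 30}},
  {name: 1, price: 1, _id: 0}
).explain("executionStats")
// executionStats.totalDocsExamined should be 0 for a covered query
```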
Embed related data
MongoDB's flexible schema allows embedding related data directly in documents. Embedding means related data arrives in a single query, with no application-level joins. It works best when the embedded data is bounded in size; a document cannot exceed 16 MB, so unbounded arrays (such as ever-growing comment lists) belong in a separate collection instead.
// Embed 'comments' array in product document
{
  name: "Product 1",
  price: 100,
  comments: [
    {user: "user1", text: "Nice!"},
    {user: "user2", text: "Lovely"}
  ]
}
Now retrieving comments just needs a single query on products collection, avoiding joins.
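For instance, both reading and appending comments are single operations against the products collection (a sketch based on the document shape above):

```javascript
// One query returns a product's comments
db.products.findOne(
  {name: "Product 1"},
  {comments: 1, _id: 0}
)

// Appending a comment is a single update, not an insert into another collection
db.products.updateOne(
  {name: "Product 1"},
  {$push: {comments: {user: "user3", text: "+1"}}}
)
```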
Use sharding for horizontal scaling
Sharding distributes data across multiple servers called shards. This provides horizontal scalability and improves read/write throughput.
// Enable sharding for 'products' collection
sh.enableSharding("mydb")
sh.shardCollection("mydb.products", {name: "hashed"})
Sharding routes reads/writes intelligently to appropriate shards based on the shard key.
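A hashed key like the one above spreads writes evenly but turns range queries into scatter-gather across all shards. If range queries on the key are common, a ranged shard key may fit better. A sketch, where `mydb.catalog` and its `category` field are hypothetical examples for illustration:

```javascript
// Range-based compound shard key: keeps related documents on the same shard
sh.shardCollection("mydb.catalog", {category: 1, name: 1})

// Inspect chunk distribution across the shards
sh.status()
```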
These are some key techniques for optimizing MongoDB performance: proper indexing, covered queries, embedding related data, and sharding help you realize the full potential of MongoDB.
Use connection pooling
MongoDB connections can be expensive to establish and tear down repeatedly. Opening a new connection for every database operation can result in significant performance overhead.
Connection pooling helps mitigate this by maintaining a pool of connections that can be reused, rather than opening and closing connections constantly.
In MongoDB, connection pooling is handled by the driver. Here is example code using the official Node.js driver:
// Create MongoClient with connection pooling (Node.js driver 3.x;
// the option was renamed maxPoolSize in driver 4.x+)
const MongoClient = require('mongodb').MongoClient;
const client = new MongoClient(uri, {
  poolSize: 10, // maintain up to 10 connections
  ssl: true,
  auth: {
    user: 'user',
    password: 'pass'
  }
});
// Get a connected client from the pool
async function run() {
  await client.connect(); // establishes the connection pool
  const db = client.db('mydb');
  // Reuse connections from the pool
  await db.collection('customers').insertOne({name: 'John'});
  await db.collection('orders').find({}).toArray();
  await client.close();
}
run().catch(console.error);
The key points are:
- Set poolSize to control the maximum number of connections
- Use client.db() to get database handles backed by the pool
- The driver handles reusing connections efficiently
- Don't forget to call client.close() to clean up

In this way, we can reduce connection overhead and improve MongoDB performance using connection pooling.
Use replication for redundancy
Replication provides redundancy and high availability by maintaining multiple copies of data. MongoDB replicates data across replica sets which contain primary and secondary nodes.
Here is an example replica set configuration:
// Initiate the replica set from the mongo shell
rs.initiate({
  _id: "replSet",
  members: [
    {_id: 0, host: "mongodb1.example.com", priority: 2},
    {_id: 1, host: "mongodb2.example.com"},
    {_id: 2, host: "mongodb3.example.com"}
  ]
})
This defines a replica set with:
- Primary node (priority 2) at mongodb1.example.com
- Two secondary nodes at mongodb2 and mongodb3
- Member IDs 0, 1, and 2

To use this in a Node.js app:
// Connect to replica set
const MongoClient = require('mongodb').MongoClient;
const uri = "mongodb://mongodb1.example.com,mongodb2.example.com,mongodb3.example.com/?replicaSet=replSet";
MongoClient.connect(uri, function(err, client) {
  if (err) throw err;
  const db = client.db("mydb");
  // Read/write to primary
  db.collection("customers").find(...);
  db.collection("orders").insertOne(...);
});
This connects to the replica set and directs reads/writes automatically to the primary node. The driver handles failover if the primary goes down.
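If slightly stale reads are acceptable, secondaries can also share the read load via a read preference in the connection string (a sketch; readPreference is a standard MongoDB connection-string option):

```javascript
// Prefer secondaries for reads; fall back to the primary if none are available
const uri = "mongodb://mongodb1.example.com,mongodb2.example.com,mongodb3.example.com" +
            "/?replicaSet=replSet&readPreference=secondaryPreferred";
```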
Read more: Mongo indexes you should know
Happy coding!