MongoDB is a popular document-oriented NoSQL database known for its flexibility, scalability, and high performance. However, to achieve optimal performance from MongoDB, you need to follow some key optimization strategies. In this article, we will explore some tips for optimizing MongoDB.
Index appropriately
Indexing allows MongoDB to efficiently execute queries by rapidly searching indexed fields only. Without indexes, MongoDB would have to scan every document of a collection to select matching documents.
Proper indexes are critical for fast query performance. You should create indexes on fields that are frequently queried.
// Create index on `name` field
db.products.createIndex({name: 1})
// Create compound index on `name` and `price`
db.products.createIndex({name: 1, price: -1})
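Indexes are not free: each one consumes RAM and adds overhead to every write, so it is worth auditing what already exists before adding more. Listing a collection's indexes in the shell:

```javascript
// List all indexes defined on the collection
db.products.getIndexes()
```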
Use covered queries
Covered queries are queries where an index contains every field the query filters on and returns. This allows MongoDB to answer the query entirely from the index, without looking up the documents at all.
// Covered query (requires an index on {price: 1, name: 1})
db.products.find(
  {price: {$gt: 30}},
  {name: 1, price: 1, _id: 0} // projection must exclude _id
)
Here the index contains both the name and price fields and the projection excludes _id, so no document lookups are needed.
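You can verify that a query is covered with explain(): a covered query reports zero documents examined. A quick check in the shell, assuming the index described above exists:

```javascript
// A covered query reads index entries only, never the documents
db.products.find(
  {price: {$gt: 30}},
  {name: 1, price: 1, _id: 0}
).explain("executionStats")
// executionStats.totalDocsExamined should be 0 for a covered query
```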
Embed related data
MongoDB's flexible schema allows embedding related data directly in documents. Embedding means related data arrives in a single query, with no application-level joins. It works best when the embedded data is bounded in size; a document cannot exceed 16 MB, so unbounded arrays (such as ever-growing comment lists) belong in a separate collection instead.
// Embed 'comments' array in product document
{
  name: "Product 1",
  price: 100,
  comments: [
    {user: "user1", text: "Nice!"},
    {user: "user2", text: "Lovely"}
  ]
}
Now retrieving comments just needs a single query on products collection, avoiding joins.
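For instance, both reading and appending comments are single operations against the products collection (a sketch based on the document shape above):

```javascript
// One query returns a product's comments
db.products.findOne(
  {name: "Product 1"},
  {comments: 1, _id: 0}
)

// Appending a comment is a single update, not an insert into another collection
db.products.updateOne(
  {name: "Product 1"},
  {$push: {comments: {user: "user3", text: "+1"}}}
)
```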
Use sharding for horizontal scaling
Sharding distributes data across multiple servers called shards. This provides horizontal scalability and improves read/write throughput.
// Enable sharding for 'products' collection
sh.enableSharding("mydb")
sh.shardCollection("mydb.products", {name: "hashed"})
Sharding routes reads/writes intelligently to appropriate shards based on the shard key.
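A hashed key like the one above spreads writes evenly but turns range queries into scatter-gather across all shards. If range queries on the key are common, a ranged shard key may fit better. A sketch, where `mydb.catalog` and its `category` field are hypothetical examples for illustration:

```javascript
// Range-based compound shard key: keeps related documents on the same shard
sh.shardCollection("mydb.catalog", {category: 1, name: 1})

// Inspect chunk distribution across the shards
sh.status()
```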
These are some key techniques for optimizing MongoDB performance: proper indexing, covered queries, embedding related data, and sharding help you realize the full potential of MongoDB.
Use connection pooling
MongoDB connections can be expensive to establish and tear down repeatedly. Opening a new connection for every database operation can result in significant performance overhead.
Connection pooling helps mitigate this by maintaining a pool of connections that can be reused, rather than opening and closing connections constantly.
In MongoDB, connection pooling is handled by the driver. Here is example code using the official Node.js driver:
// Create MongoClient with connection pooling (Node.js driver 3.x;
// the option was renamed maxPoolSize in driver 4.x+)
const MongoClient = require('mongodb').MongoClient;
const client = new MongoClient(uri, {
  poolSize: 10, // maintain up to 10 connections
  ssl: true,
  auth: {
    user: 'user',
    password: 'pass'
  }
});
// Get a connected client from the pool
async function run() {
  await client.connect(); // establishes the connection pool
  const db = client.db('mydb');
  // Reuse connections from the pool
  await db.collection('customers').insertOne({name: 'John'});
  await db.collection('orders').find({}).toArray();
  await client.close();
}
run().catch(console.error);
The key points are:
- Set poolSize to control the maximum number of connections
- Use client.db() to get database handles backed by the pool
- The driver handles reusing connections efficiently
- Don't forget to call client.close() to clean up

In this way, we can reduce connection overhead and improve MongoDB performance using connection pooling.
Use replication for redundancy
Replication provides redundancy and high availability by maintaining multiple copies of data. MongoDB replicates data across replica sets which contain primary and secondary nodes.
Here is an example replica set configuration:
// Initiate the replica set from the mongo shell
rs.initiate({
  _id: "replSet",
  members: [
    {_id: 0, host: "mongodb1.example.com", priority: 2},
    {_id: 1, host: "mongodb2.example.com"},
    {_id: 2, host: "mongodb3.example.com"}
  ]
})
This defines a replica set with:
- Primary node (priority 2) at mongodb1.example.com
- Two secondary nodes at mongodb2 and mongodb3
- Member IDs 0, 1, and 2

To use this in a Node.js app:
// Connect to replica set
const MongoClient = require('mongodb').MongoClient;
const uri = "mongodb://mongodb1.example.com,mongodb2.example.com,mongodb3.example.com/?replicaSet=replSet";
MongoClient.connect(uri, function(err, client) {
  if (err) throw err;
  const db = client.db("mydb");
  // Read/write to primary
  db.collection("customers").find(...);
  db.collection("orders").insertOne(...);
});
This connects to the replica set and directs reads/writes automatically to the primary node. The driver handles failover if the primary goes down.
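If slightly stale reads are acceptable, secondaries can also share the read load via a read preference in the connection string (a sketch; readPreference is a standard MongoDB connection-string option):

```javascript
// Prefer secondaries for reads; fall back to the primary if none are available
const uri = "mongodb://mongodb1.example.com,mongodb2.example.com,mongodb3.example.com" +
            "/?replicaSet=replSet&readPreference=secondaryPreferred";
```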
Read more: Mongo indexes you should know
Happy coding!