This article was written by Karen Zhang.
MongoDB is built for scale. Its distributed architecture combines replica sets, sharding, and intelligent query routing to handle high traffic and large datasets without requiring you to endlessly upgrade a single server.
This article covers how each of these mechanisms works, when to apply them, and what you need to get right to avoid performance problems in production.
Horizontal Scaling in MongoDB
1. The limits of vertical scaling
When a database starts slowing down, the instinct is to add more hardware: more CPU cores, more RAM, faster storage. This is vertical scaling, and it works up to a point.
The problem is that a single server can only get so large. Beyond a certain threshold, cost climbs steeply while each upgrade buys less performance. More importantly, a single powerful server is still a single point of failure: if it goes down, your entire application goes with it.
Horizontal scaling solves this differently. Instead of making one machine larger, you distribute the work across multiple machines. MongoDB is designed for this from the ground up.
2. Replica sets for availability and read scalability
The foundation of MongoDB’s horizontal architecture is the replica set — a group of MongoDB instances that all hold the same data, kept in sync through continuous replication.
A replica set consists of:
- Primary: The only member that accepts write operations. All writes go here.
- Secondaries: Members that replicate from the primary’s oplog (operation log) and maintain a copy of the data.
- Arbiter (optional): Participates in elections but holds no data. Used to break ties in smaller deployments.
```
Primary ──oplog──► Secondary 1
        ├─oplog──► Secondary 2
        └─oplog──► Secondary 3 (hidden, used for backups)
```
When the primary becomes unavailable, the remaining members automatically elect a new primary — typically within a few seconds — with no manual intervention required. For deployment architecture options, refer to replica set deployment architectures.
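As a concrete sketch, a replica set like the one diagrammed above could be initiated from mongosh. The hostnames and the `rs0` set name are placeholders — substitute your own:

```javascript
// Run against the first member; the hidden backup member must have priority 0.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-1:27017" },
    { _id: 1, host: "mongodb-2:27017" },
    { _id: 2, host: "mongodb-3:27017", hidden: true, priority: 0 }
  ]
})

// Verify replication state and member roles:
rs.status().members.map(m => ({ name: m.name, state: m.stateStr }))
```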
3. Handling concurrent reads and writes at scale
By default, all reads and writes go through the primary. This keeps things simple and guarantees you’re always reading the latest data. For many applications, this is the right choice.
For read-heavy workloads where some replication lag is acceptable — analytics dashboards, reporting queries, batch exports — you can offload reads to secondaries. This distributes read traffic across members and reduces pressure on the primary.
Please note that secondary reads are not appropriate for anything requiring read-your-own-writes consistency. If a user writes a record and immediately reads it back, routing that read to a secondary may return stale data.
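When you need read-your-own-writes but still want to read from secondaries, MongoDB's causally consistent sessions are the standard tool. A minimal sketch (database and document names are illustrative; the guarantee requires majority read and write concerns):

```javascript
// A causally consistent session orders this read after the preceding write,
// even if the read is served by a secondary.
const session = db.getMongo().startSession({ causalConsistency: true })
const orders = session.getDatabase("mydb").orders

orders.insertOne({ _id: "order_1", status: "pending" })
orders.findOne({ _id: "order_1" })  // observes the insert above

session.endSession()
```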
4. Traffic distribution using read preferences
MongoDB gives you five read preference modes to control where read operations are sent:

| Mode | Behavior |
|---|---|
| primary | All reads from the primary (default). |
| primaryPreferred | Primary when available, fall back to a secondary. |
| secondary | Always read from a secondary. |
| secondaryPreferred | Secondary when available, fall back to the primary. |
| nearest | Member with the lowest network latency, regardless of role. |

Note that nearest does not guarantee reads stay within a region: latency measurements change over time, and server selection can cross region boundaries. When approximate locality is enough, nearest works well for multi-region deployments — users in Europe read from European replicas, users in Asia from Asian replicas, with MongoDB's server selection handling this automatically based on measured latency. For strict regional routing, combine a read preference with replica set tags (for example, readPref('secondary', [{ region: 'us-east' }])) rather than relying on nearest alone.
```javascript
// Route reads to a secondary tagged for a specific region
db.orders.find({ status: "completed" }).readPref("secondary", [{ region: "eu-west" }])
```
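The same routing can also be set cluster-wide in the connection string rather than per query. A sketch with placeholder hostnames:

```javascript
// Drivers accept read preference in the connection URI; maxStalenessSeconds
// excludes secondaries lagging more than the given number of seconds (minimum 90).
const uri = "mongodb://host1:27017,host2:27017/mydb" +
  "?replicaSet=rs0&readPreference=secondaryPreferred&maxStalenessSeconds=120"
```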
For a detailed breakdown of when each mode is appropriate, refer to read preference use cases.
Sharding for Large Datasets
1. When and why sharding is required
Replica sets solve availability and read scalability, but they do not solve storage capacity or write throughput. Every member holds a complete copy of the data, so adding members adds redundancy, not capacity — each new node must store the full dataset. And write throughput remains bounded by what a single primary can handle.
Sharding addresses this by distributing data across multiple replica sets, each holding a subset of the data. You should consider sharding when:
- Your dataset is too large for a single server to store efficiently
- Write throughput has saturated your primary node
- Your working set no longer fits in memory, causing excessive disk I/O
Do not shard prematurely. A well-indexed, well-resourced replica set handles far more load than most applications will ever generate. Sharding adds operational complexity, and reversing it is painful. Only shard when you have clear evidence you need it. Review operational restrictions in sharded clusters before committing.
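Before deciding, measure. A rough mongosh sketch of the signals above — dataset size and working-set pressure — using fields that db.stats() and serverStatus() report:

```javascript
// Dataset and index size on this node:
const s = db.stats()
print(`data: ${s.dataSize} bytes, indexes: ${s.indexSize} bytes`)

// Rough working-set pressure: how full the WiredTiger cache is.
// Sustained utilization near 100% alongside heavy disk I/O suggests
// the working set no longer fits in memory.
const cache = db.serverStatus().wiredTiger.cache
const used = cache["bytes currently in the cache"]
const max = cache["maximum bytes configured"]
print(`cache utilization: ${(100 * used / max).toFixed(1)}%`)
```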
2. Sharded cluster components
A sharded MongoDB cluster is made up of three types of components:
Shards: Each shard is a replica set that holds a subset of your sharded data. With three shards, each holds roughly one-third of the collection. Because each shard is itself a replica set, you retain fault tolerance at the shard level.
Config servers: A replica set that stores the cluster’s metadata — specifically, the mapping of which chunk ranges live on which shard. Config servers must always be available; if they go down, the cluster cannot route new operations.
mongos: A lightweight query router that sits between your application and the shards. Your application connects to mongos exactly as it would a standalone MongoDB instance — the sharding complexity is invisible. mongos caches routing metadata from the config servers and forwards operations to the appropriate shard or shards.
```
Application
     │
     ▼
mongos (query router) ◄──metadata── Config Servers (replica set)
     │
     ├──► Shard 1 (replica set: primary + 2 secondaries)
     ├──► Shard 2 (replica set: primary + 2 secondaries)
     └──► Shard 3 (replica set: primary + 2 secondaries)
```
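Because mongos speaks the same wire protocol as a standalone mongod, connecting to and inspecting a sharded cluster is straightforward. A sketch with placeholder hostnames:

```javascript
// Connect to any mongos (multiple hosts give the driver failover options):
//   mongosh "mongodb://mongos-1:27017,mongos-2:27017/mydb"

sh.status()                          // shards, databases, chunk distribution
db.adminCommand({ listShards: 1 })   // raw shard membership
```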
3. How data is distributed across shards
MongoDB distributes data at the collection level using data partitioning with chunks. When you shard a collection, you specify a shard key — a field or combination of fields that MongoDB uses to divide data into chunks and assign those chunks to shards.
Each chunk covers a contiguous range of shard key values. As data grows, MongoDB’s balancer process monitors chunk distribution and migrates chunks between shards automatically to maintain an even distribution.
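A minimal sketch of sharding a collection and then inspecting how its chunks ended up distributed (database, collection, and field names are illustrative):

```javascript
sh.enableSharding("mydb")
sh.shardCollection("mydb.orders", { user_id: 1 })

// After data loads and the balancer has run, print per-shard
// data size and document counts for the collection:
db.getSiblingDB("mydb").orders.getShardDistribution()
```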
Shard Key Design and Performance
1. Importance of shard key selection
Shard key selection is the single most consequential architectural decision in a sharded deployment. The shard key determines how data is distributed across shards, which queries can be routed efficiently, and whether write traffic is balanced or concentrated on a single shard.
Please note that changing a shard key after the fact is extremely disruptive. MongoDB 5.0 added the reshardCollection command, but resharding rewrites the entire collection and is a costly operation. Get this right before you shard.
2. Range vs. hashed shard keys
Range sharding partitions data into contiguous ranges of shard key values. This makes range queries efficient — a query filtering on a date range, for example, can be routed to one or two shards instead of all of them.
The downside is that monotonically increasing shard keys (timestamps, auto-incrementing IDs) cause all new writes to land on the same shard — the one holding the highest range. This creates a write hotspot and defeats the purpose of sharding.
Hashed sharding applies a hash function to the shard key value before distribution. This scatters writes evenly across shards, eliminating hotspots. The trade-off is range queries — because the hash destroys value locality, a range query must be sent to all shards. Note that hashed sharding distributes writes evenly only when the shard key has high cardinality. A hashed boolean field, for example, still concentrates data on two shards at most.
```javascript
// Range shard key — efficient range queries, risk of write hotspots on monotonic fields
sh.shardCollection("mydb.events", { timestamp: 1 })

// Hashed shard key — even write distribution, no range query efficiency
sh.shardCollection("mydb.events", { _id: "hashed" })
```
3. Avoiding write hotspots and uneven distribution
A good shard key satisfies three criteria:
- High cardinality: Enough distinct values to distribute data across many chunks. A boolean field, for example, can only ever produce two chunks regardless of how many documents you have.
- Low frequency: No single value that dominates the dataset. If 40% of your documents share the same shard key value, 40% of your data lands on one shard.
- Non-monotonically increasing (for write-heavy workloads): Avoid timestamps or sequential IDs as standalone shard keys. They route all new writes to the maximum chunk, creating a hotspot.
When no single field satisfies all three criteria, a compound shard key can help. Pairing a high-cardinality field with a monotonically increasing field — like { user_id: 1, timestamp: 1 } — can give you even distribution while preserving some query locality.
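Sketching the compound approach above (collection and field names are illustrative):

```javascript
// user_id provides cardinality and spreads writes across shards;
// timestamp preserves per-user time locality, so one user's recent
// events cluster into few chunks.
sh.shardCollection("mydb.events", { user_id: 1, timestamp: 1 })

// A query on both fields — or on the user_id prefix alone — can be targeted:
db.getSiblingDB("mydb").events.find({
  user_id: "user_123",
  timestamp: { $gte: ISODate("2024-01-01") }
})
```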
Query Routing and Scalability
1. Targeted vs. scatter-gather queries
When mongos receives a query, it inspects the filter to determine whether the shard key is included.
- Targeted query: The filter includes the shard key. mongos routes the query directly to the shard — or the small subset of shards — holding the relevant chunk ranges. This is fast and efficient.
- Scatter-gather query: The filter does not include the shard key. mongos must broadcast the query to every shard, collect all results, and merge them. Latency is bounded by the slowest shard, and cost scales with the number of shards.
```javascript
// Targeted — shard key (user_id) is included in the filter
db.orders.find({ user_id: "user_123", status: "pending" })

// Scatter-gather — no shard key, every shard is queried
db.orders.find({ status: "pending" })
```
This is why your access patterns should drive shard key selection. Before sharding, audit your most frequent and most latency-critical queries. The shard key should align with what those queries filter on.
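You can verify targeting with explain(): on a sharded collection, the winning plan reports how the query was dispatched. The stage names below are from MongoDB's explain output for sharded queries:

```javascript
// SINGLE_SHARD — the filter included the shard key and was targeted.
// SHARD_MERGE  — results were merged from multiple shards (scatter-gather).
const plan = db.orders.find({ user_id: "user_123" }).explain("queryPlanner")
print(plan.queryPlanner.winningPlan.stage)
```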
2. Indexing considerations in sharded environments
Each shard maintains its own independent set of indexes. When you create an index on a sharded collection, MongoDB creates it on every shard. A few rules apply:
- Unique indexes must include the shard key as a prefix. There is no global index spanning all shards, so MongoDB can only enforce uniqueness within the chunk ranges each shard owns — which produces a correct global constraint only when the shard key leads the unique index. Non-unique indexes have no such restriction.
- The shard key is automatically indexed. You do not need to create an index on it separately.
- Compound indexes that lead with the shard key field work well for targeted queries and can serve as covered indexes, eliminating document fetches entirely.
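For example, if orders is sharded on user_id, a unique constraint must lead with it (field names are illustrative):

```javascript
// Allowed: the shard key (user_id) is a prefix of the unique index.
db.orders.createIndex({ user_id: 1, order_number: 1 }, { unique: true })

// Rejected on a sharded collection: MongoDB cannot enforce global
// uniqueness on a field that is not prefixed by the shard key.
// db.orders.createIndex({ email: 1 }, { unique: true })
```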
3. Impact on latency and throughput
Targeted queries in a sharded cluster can be faster than the equivalent query on a single large server, because each shard operates on a smaller dataset. Scatter-gather queries, however, are slower than on a single server — the overhead of fan-out, parallel execution, and result merging adds latency that doesn’t exist when querying one node.
The practical implication: sharding improves throughput and capacity, but it does not automatically improve query latency. Latency improves only when your queries are targeted. Design your shard key accordingly.
Elasticity and Fault Tolerance
1. Automatic balancing and chunk migration
As data grows and new shards are added, chunks become unevenly distributed. MongoDB’s background balancer detects this and migrates chunks between shards automatically to restore balance.
Chunk migration is a live operation. MongoDB copies data to the destination shard, then atomically updates the routing table in the config servers. Reads and writes continue normally throughout — your application sees no interruption.
For write-heavy workloads, migration traffic can compete with application traffic. You can restrict balancing to a scheduled window during off-peak hours:
```javascript
use config
db.settings.updateOne(
  { _id: "balancer" },
  { $set: { activeWindow: { start: "02:00", stop: "06:00" } } },
  { upsert: true }
)
```
2. Scaling out with minimal downtime
Adding a new shard to a running cluster is an online operation. You provision the new replica set, register it with sh.addShard(), and the balancer begins migrating chunks to it automatically. No application configuration changes are needed — mongos picks up the updated routing table from the config servers.
```javascript
sh.addShard("rs3/mongodb-shard3-1:27017,mongodb-shard3-2:27017,mongodb-shard3-3:27017")
```
If your access patterns change significantly and your current shard key is no longer appropriate, MongoDB’s resharding feature lets you migrate to a new shard key without rebuilding the cluster from scratch — a significant improvement over earlier versions of MongoDB, where the shard key was effectively permanent.
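Resharding is driven by a single admin command (MongoDB 5.0 and later; namespace and key are illustrative). It builds the collection under the new key in the background while continuing to accept writes, so expect sustained I/O for the duration:

```javascript
db.adminCommand({
  reshardCollection: "mydb.orders",
  key: { user_id: 1 }  // the new shard key
})
```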
3. High availability through replica sets
Every layer of a sharded cluster is replicated:
- Each shard is a replica set. If a shard’s primary fails, its replica set holds an automatic election and promotes a secondary within seconds.
- The config servers are a replica set. Losing one config server does not take the cluster offline.
- mongos instances are stateless. You can run multiple mongos instances behind a load balancer, and losing one affects only the connections it was actively handling.
There is no single point of failure at any layer of the architecture. For a detailed look at failure scenarios and how MongoDB handles each one, refer to sharded cluster high availability.
Summary and Next Steps
MongoDB’s scaling approach is layered. Replica sets provide read scalability and fault tolerance as a baseline. Sharding extends this to write throughput and storage capacity. mongos keeps the distributed complexity invisible to application code, and automatic balancing ensures the cluster adapts to growth without manual intervention.
Key takeaways for designing scalable MongoDB workloads:
- Do not shard until you have evidence you need to. A well-tuned replica set handles more than most applications require. Shard only when you’ve identified a clear bottleneck that sharding solves.
- Let your access patterns drive shard key selection. The queries you run most frequently should determine the shard key — not your schema or your data model in isolation.
- Use hashed sharding for write-heavy workloads, range sharding when range queries matter. Know which trade-off your application can live with before you commit.
- Every scatter-gather query is a cost that scales with your cluster size. Design for targeted queries wherever possible.
- Use read preferences deliberately. Secondary reads are a useful tool for the right workloads, but they introduce replication lag — make sure that trade-off is acceptable for the feature using them.
For hands-on practice, MongoDB University offers free courses on sharding and replica set administration. The official sharding documentation is also one of the more thorough references available for production deployment considerations.