DEV Community

Mouloud hasrane

Understanding Replication and Sharding in System Design

When discussing system design, terms like sharding, partitioning, replication, and master-slave relationships frequently come up. If these concepts seem confusing, don't worry! This article will clarify the differences between replication and sharding to help you build a solid foundation in distributed database systems.
P.S. The images are from the book Designing Data-Intensive Applications. Be sure to check it out.

Replication

Replication involves creating multiple instances of a database that contain the same data as the main instance. This allows requests to be distributed among replicas, improving system performance and availability. But how do we decide which requests go to which instance? Replication can be categorized into different types, but in practice, two are most commonly used:

1. Master-Slave Replication (Leader-Follower Replication)

In this model, a leader (master) database handles both read and write operations, while followers (slaves) handle only read operations. Any write operation made to the leader is replicated to the followers, either synchronously or asynchronously (we’ll discuss these approaches in detail in a future article).

Leader follower replication

Why is this useful?

Many systems are read-heavy rather than write-heavy, meaning they perform far more read operations than writes. By distributing read queries across multiple followers, we can significantly reduce the load on the leader instance, improving performance and scalability.
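
To make the routing rule concrete, here is a minimal sketch in Python. The ReplicatedStore class and its methods are hypothetical names invented for this illustration, plain dictionaries stand in for real database nodes, and replication is done synchronously only to keep the example short.

```python
import itertools
from typing import Optional


class ReplicatedStore:
    """Toy leader-follower setup: plain dicts stand in for database nodes."""

    def __init__(self, follower_count: int) -> None:
        self.leader: dict = {}
        self.followers: list = [{} for _ in range(follower_count)]
        # Round-robin iterator over follower indexes for spreading reads.
        self._next_follower = itertools.cycle(range(follower_count))

    def write(self, key: str, value: str) -> None:
        # Every write goes to the leader first...
        self.leader[key] = value
        # ...and is then copied to each follower (synchronously here).
        for follower in self.followers:
            follower[key] = value

    def read(self, key: str) -> tuple:
        # Reads are served by the followers in turn, keeping load off the leader.
        index = next(self._next_follower)
        value: Optional[str] = self.followers[index].get(key)
        return index, value


store = ReplicatedStore(follower_count=2)
store.write("user:1", "Alice")
print(store.read("user:1"))  # served by follower 0
print(store.read("user:1"))  # served by follower 1
```

A real system would replicate over the network, and with asynchronous replication a follower could briefly return stale data.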

2. Master-Master Replication (Leader-Leader Replication)

Now, imagine our system has grown to the point where we need multiple data centers worldwide to minimize latency. If we rely on leader-follower replication, choosing a single leader across all data centers would lead to slow write operations for users far from that leader.

Leader-leader replication

With leader-leader replication, each data center has its own leader. When a write is performed at one leader, it is propagated to all other leaders, which then replicate it to their own followers. In other words, each leader acts as a follower of whichever leader received the write, and then as a leader that forwards the new data to its own followers.

Challenges:

While this approach improves performance and fault tolerance, it introduces challenges such as write conflicts, where two leaders handle conflicting writes at the same time. Handling such conflicts requires careful resolution strategies, such as conflict-free replicated data types (CRDTs) or last-write-wins (LWW) mechanisms.
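
As a rough illustration of last-write-wins, the sketch below resolves two conflicting writes to the same key. The Write class, its fields, and the sample values are assumptions made up for this example. Real systems differ in how they attach timestamps and break ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float  # hypothetical clock reading attached by the leader
    leader_id: str    # used only to break timestamp ties deterministically


def last_write_wins(a: Write, b: Write) -> Write:
    """Return the write that every leader should keep for a contested key."""
    # Comparing (timestamp, leader_id) gives all leaders the same answer,
    # regardless of the order in which they received the two writes.
    return max(a, b, key=lambda w: (w.timestamp, w.leader_id))


from_europe = Write("alice@old.example", timestamp=100.0, leader_id="eu")
from_asia = Write("alice@new.example", timestamp=100.5, leader_id="asia")

winner = last_write_wins(from_europe, from_asia)
print(winner.value)  # alice@new.example
```

Note that the losing write is silently discarded, and clock differences between data centers can make the "last" write an arbitrary one. That is the main trade-off of LWW.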

Sharding

Unlike replication, which creates copies of the same database, sharding (also called partitioning) splits the database into smaller subsets, called shards, each containing a portion of the data. This allows for better scalability by distributing requests across multiple shards.

How do we determine which shard a request should go to?

Two common techniques for sharding are:

1. Key Range Sharding

In this approach, data is divided into shards based on a predefined range of values. Think of it like an encyclopedia where different volumes contain different alphabetic sections of words.

Key range sharding

Example: User IDs from 1–1000 go to Shard A, 1001–2000 go to Shard B, and so on.

Advantage: Simple to implement, and range queries are efficient because adjacent keys live on the same shard.

Disadvantage: Can lead to hot spots—if a specific range receives more requests than others, that shard will become overloaded.
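
The User ID example above can be sketched as a lookup over sorted range boundaries. This is only an illustration: Shard C and its upper bound of 3000 are assumptions added to round out the example, and shard_for_user is a hypothetical helper name.

```python
import bisect

# Inclusive upper bound of each shard's key range, in ascending order.
# Follows the example above: 1-1000 -> Shard A, 1001-2000 -> Shard B.
# Shard C and its bound are assumptions added for this sketch.
UPPER_BOUNDS = [1000, 2000, 3000]
SHARDS = ["Shard A", "Shard B", "Shard C"]


def shard_for_user(user_id: int) -> str:
    if not 1 <= user_id <= UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every shard's range")
    # bisect_left finds the first bound that is >= user_id.
    return SHARDS[bisect.bisect_left(UPPER_BOUNDS, user_id)]


print(shard_for_user(1))     # Shard A
print(shard_for_user(1000))  # Shard A
print(shard_for_user(1001))  # Shard B
```

Because neighboring IDs map to the same shard, a scan over IDs 1 to 500 touches only Shard A. The flip side is that a burst of traffic for one range lands entirely on one shard.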

2. Hash-Based Sharding

Here, a hash function is applied to a data attribute (e.g., User ID) to determine which shard should store the data. A well-designed hash function distributes data evenly across shards.

Hash-based sharding

For example, with three shards: if hash(UserID) % 3 == 0, the request is sent to Shard A; if the result is 1, it goes to Shard B; and if the result is 2, it goes to Shard C.

Advantage: Spreads keys evenly across shards, which greatly reduces hot spots (although a single very popular key can still overload its shard).

Disadvantage: Can make it difficult to perform range queries, as data is spread across multiple shards unpredictably.
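
Here is a small sketch of the hash(UserID) % 3 rule in Python. The helper names stable_hash and shard_for_user are hypothetical, and SHA-256 is just one possible choice of hash function. It is used here because Python's built-in hash() is randomized per process for strings, so it would not give the same shard on every machine.

```python
import hashlib

SHARDS = ["Shard A", "Shard B", "Shard C"]


def stable_hash(key: str) -> int:
    # A digest gives the same value on every machine and every run,
    # unlike the built-in hash(), which is salted per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for_user(user_id: int) -> str:
    return SHARDS[stable_hash(str(user_id)) % len(SHARDS)]


# Count how 3,000 consecutive user IDs spread across the shards.
counts = {name: 0 for name in SHARDS}
for user_id in range(1, 3001):
    counts[shard_for_user(user_id)] += 1
print(counts)  # roughly 1,000 per shard
```

Consecutive IDs end up scattered across all three shards, which is exactly why a range query such as "users 1 to 500" has to ask every shard.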

Rebalancing Shards

As data grows and system demands change, sharded databases must be rebalanced to maintain efficiency. This involves redistributing data and queries among nodes while minimizing disruption. A good rebalancing process should:

Prevent overload on specific shards.

Maintain system availability for read/write operations.

Minimize data movement to reduce network and disk I/O overhead.
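
To see why data movement matters, the sketch below counts how many keys change shards when a simple hash-modulo scheme (as in the hash(UserID) % 3 example) grows from three shards to four. The key names and the shard_index helper are assumptions made up for this illustration.

```python
import hashlib


def shard_index(key: str, shard_count: int) -> int:
    # Stable hash of the key, reduced modulo the number of shards.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


keys = [f"user:{n}" for n in range(10_000)]

# Count keys whose shard changes when a fourth shard is added.
moved = sum(1 for key in keys if shard_index(key, 3) != shard_index(key, 4))
print(f"{moved / len(keys):.0%} of keys would have to move")
```

With a uniform hash, a key keeps its shard only when its hash gives the same remainder modulo 3 and modulo 4, which happens about a quarter of the time. So roughly three quarters of the data would move, which is one reason rebalancing strategies try not to tie a key's location directly to the number of shards.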

We will discuss different rebalancing strategies and common pitfalls in a future article.

Why bother using any of these techniques?

As the system scales up, a single database instance will no longer be able to handle all of the requests efficiently. Response times will increase, which hurts the user experience and, in turn, sales and user count. You might think: no problem, let's just get a more powerful machine that can handle those requests. But as we keep scaling up, the hardware becomes costly and hard to find, and eventually you reach a point where your database already runs on the most powerful hardware out there. For that reason, we use these strategies to keep response times and the overall experience smooth as our app scales.

Final Thoughts

Now that you have a fundamental understanding of replication and sharding, you can explore their trade-offs and edge cases in greater detail. Each technique has its own benefits and challenges, and the best approach depends on your system’s requirements.

Stay tuned for more in-depth articles on advanced replication strategies, handling conflicts in distributed databases, and optimizing sharding techniques!

Happy learning! 🚀
