DEV Community

丁久

Posted on • Originally published at dingjiu1989-hue.github.io

Database Sharding: Strategies and Trade-offs

This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.

What is Sharding?

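Sharding splits a database horizontally across multiple servers. Each shard holds a subset of the rows, so storage and write throughput can scale out with the number of shards, as long as most queries can be routed to a single shard.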

Key-Based Sharding

Hash the shard key to determine the target shard:

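import hashlib

class KeyBasedShardManager:
    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        # Shard is assumed to be defined elsewhere (for example, a connection wrapper).
        self.shards = [Shard(i) for i in range(num_shards)]

    def get_shard(self, shard_key):
        # SHA-256 is stable across processes, unlike Python's salted built-in hash().
        hash_val = int(hashlib.sha256(str(shard_key).encode()).hexdigest(), 16)
        shard_id = hash_val % self.num_shards
        return self.shards[shard_id]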
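To see how evenly this scheme spreads keys, the following minimal sketch counts how many keys land on each shard. It reuses the same SHA-256 modulo mapping as a standalone function; the `user-<n>` key format and the helper names `shard_id_for` and `distribution` are assumptions made for illustration.

```python
import hashlib
from collections import Counter

def shard_id_for(shard_key, num_shards):
    """Map a key to a shard index with a stable SHA-256 hash, then modulo."""
    digest = hashlib.sha256(str(shard_key).encode()).hexdigest()
    return int(digest, 16) % num_shards

def distribution(keys, num_shards):
    """Count how many keys land on each shard index."""
    return Counter(shard_id_for(k, num_shards) for k in keys)

# Hypothetical key format, used only to generate sample data.
keys = [f"user-{i}" for i in range(10_000)]
counts = distribution(keys, 4)
for shard, n in sorted(counts.items()):
    print(f"shard {shard}: {n} keys")
```

With a uniform hash, each of the four shards should receive close to a quarter of the keys. A skewed result usually points to a skewed workload (a few very hot keys) rather than to the hash function.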

Range-Based Sharding

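Partition by contiguous ranges of the shard key. The PostgreSQL example below uses declarative partitioning, which splits a table on a single server; the same range boundaries can be used to route rows to separate servers: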

CREATE TABLE orders (
    id BIGSERIAL,
    order_date DATE,
    total DECIMAL(10,2),
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2026_01 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
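When the ranges live on separate servers, the application or a proxy has to do the routing that the partitioned table does here. A minimal sketch of that lookup, assuming monthly boundaries and hypothetical shard names modeled on the partition name above:

```python
import bisect
from datetime import date

# Hypothetical boundaries: shard i holds dates in [BOUNDS[i], BOUNDS[i + 1]).
BOUNDS = [date(2026, 1, 1), date(2026, 2, 1), date(2026, 3, 1), date(2026, 4, 1)]
SHARD_NAMES = ["orders_2026_01", "orders_2026_02", "orders_2026_03"]

def shard_for_date(order_date):
    """Return the shard whose half-open range contains order_date."""
    # bisect_right makes the lower bound inclusive and the upper bound exclusive,
    # matching the FROM ... TO semantics of the partition definition.
    idx = bisect.bisect_right(BOUNDS, order_date) - 1
    if idx < 0 or idx >= len(SHARD_NAMES):
        raise KeyError(f"no shard covers {order_date}")
    return SHARD_NAMES[idx]

print(shard_for_date(date(2026, 1, 15)))
print(shard_for_date(date(2026, 2, 1)))
```

Range lookups keep neighboring values together, which makes range scans cheap, but recent data tends to concentrate writes on the newest shard.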

Directory-Based Sharding

Use a lookup table for shard mapping:

class DirectoryShardManager:
    def __init__(self):
        # Lookup table: shard key -> shard id
        self.directory = {}

    def map_key_to_shard(self, shard_key, shard_id):
        self.directory[shard_key] = shard_id

    def get_shard(self, shard_key):
        # Returns None when the key has no mapping yet
        return self.directory.get(shard_key)
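The flexibility of a directory comes from being able to repoint a single key without touching any other. The sketch below illustrates that idea with an in-memory table; the `ShardDirectory` class, its `move` method, and the tenant keys are hypothetical, and a real deployment would need to persist the directory and copy the data before repointing.

```python
class ShardDirectory:
    """In-memory lookup table from shard key to shard id (illustrative only)."""

    def __init__(self):
        self._key_to_shard = {}

    def assign(self, shard_key, shard_id):
        self._key_to_shard[shard_key] = shard_id

    def lookup(self, shard_key):
        # Fail loudly instead of returning None for an unmapped key.
        if shard_key not in self._key_to_shard:
            raise KeyError(f"no shard mapped for key {shard_key!r}")
        return self._key_to_shard[shard_key]

    def move(self, shard_key, new_shard_id):
        """Repoint one key to a new shard and return the old shard id."""
        old_shard_id = self.lookup(shard_key)
        self._key_to_shard[shard_key] = new_shard_id
        return old_shard_id

directory = ShardDirectory()
directory.assign("tenant-a", 0)
directory.assign("tenant-b", 0)
previous = directory.move("tenant-b", 1)  # tenant-a stays where it is
print(directory.lookup("tenant-a"), directory.lookup("tenant-b"), previous)
```

The trade-off is that every request now depends on the directory, so it has to be highly available and is usually cached.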

Rebalancing

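When shards are added or removed, data must be redistributed. With plain modulo hashing, changing the shard count remaps most keys; consistent hashing limits the movement to roughly the share of keys owned by the added or removed shard. Tools like Vitess and Citus provide resharding and rebalancing tooling that automates much of this process.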
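As a rough illustration of why consistent hashing reduces data movement, the sketch below builds a small hash ring with virtual nodes and compares it with modulo hashing when a fifth shard is added. The `ConsistentHashRing` class, the shard names, and the choice of 100 virtual nodes are assumptions for this example, not the scheme used by any particular tool.

```python
import bisect
import hashlib

def stable_hash(value):
    """Stable integer hash of a string, based on SHA-256."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, shard_names, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (ring_point, shard_name)
        for name in shard_names:
            self.add_shard(name)

    def add_shard(self, name):
        # Each shard owns many points on the ring to smooth out the distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (stable_hash(f"{name}#{i}"), name))

    def get_shard(self, key):
        if not self._ring:
            raise LookupError("ring has no shards")
        point = stable_hash(str(key))
        # First ring point at or after the key's point, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (point,)) % len(self._ring)
        return self._ring[idx][1]

keys = [f"user-{i}" for i in range(10_000)]  # hypothetical key format

ring = ConsistentHashRing([f"shard-{i}" for i in range(4)])
before = {k: ring.get_shard(k) for k in keys}
ring.add_shard("shard-4")
moved_ring = [k for k in keys if ring.get_shard(k) != before[k]]

# Same keys under modulo hashing, going from 4 to 5 shards.
moved_mod = [k for k in keys if stable_hash(k) % 4 != stable_hash(k) % 5]

print(f"consistent hashing moved {len(moved_ring) / len(keys):.0%} of keys")
print(f"modulo hashing moved {len(moved_mod) / len(keys):.0%} of keys")
```

On the ring, only keys that now belong to the new shard move, which is about one fifth of them here. Under modulo hashing, a key stays put only when its hash gives the same remainder for both shard counts, so roughly four fifths of the keys move.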

Conclusion

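Choose key-based sharding for even distribution, range-based sharding when queries scan contiguous ranges such as time-series data, and directory-based sharding when individual keys must be placed or moved explicitly. Pick a shard key that has high cardinality and appears in most queries, so that load spreads evenly and most requests touch a single shard. Plan for rebalancing from the start, and avoid cross-shard queries where possible.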

See also: Database Testing Strategies for Developers, Database Normalization Explained, Database Migration Tools and Strategies.
