The Complete Guide to Database Sharding Strategies in 2026
When vertical scaling hits its ceiling, sharding is the next step. But getting it wrong means migrating data under pressure with zero margin for error.
When to Shard
Signs you need sharding:
- Single primary DB CPU > 70% sustained
- DB disk I/O at capacity
- Replication lag > 1 second
- Table exceeds 500GB with heavy writes
Sharding too early adds complexity; too late means downtime.
Horizontal Sharding Patterns
1. Range-Based Sharding
Split by ID ranges or date:
-- Shard by user_id range
CREATE TABLE users_0 (CHECK (user_id BETWEEN 0 AND 999999)) INHERITS (users);
CREATE TABLE users_1 (CHECK (user_id BETWEEN 1000000 AND 1999999)) INHERITS (users);
Pros: Simple, sequential reads are efficient
Cons: Hot spots on newer shards, uneven distribution
2. Hash-Based Sharding
def get_shard(user_id: int, num_shards: int = 4) -> int:
return hash(str(user_id)) % num_shards
# Consistent hashing for reshard without full remapping
def consistent_hash(key: str, nodes: list, replicas: int = 150) -> str:
circle = {}
for node in nodes:
for i in range(replicas):
circle[hash(f"{node}:{i}")] = node
sorted_keys = sorted(circle.keys())
return circle[min(sorted_keys, key=lambda k: abs(k - hash(key)))]
Pros: Even distribution, no hot spots
Cons: Range queries span multiple shards
3. Directory-Based Sharding
Lookup table maps keys to shards:
CREATE TABLE shard_directory (
entity_type VARCHAR(50),
entity_id BIGINT,
shard_id INT,
PRIMARY KEY (entity_type, entity_id)
);
Pros: Flexible, can change assignment without rebalancing
Cons: Directory becomes a single point of failure
Cross-Shard Queries
The hardest problem. Solutions:
- Scatter-gather: Query all shards, merge results (slow)
- Denormalization: Store redundant copies (complex updates)
- Search engine: Use Elasticsearch for queries, DB for writes
- Read replicas: Dedicated reporting replica that receives all data
Rebalancing Without Downtime
Live resharding:
# Dual-write during transition
def write_sharded(key, value, phase="migrate"):
if phase in ("migrate", "write"):
old_shard = get_old_shard(key)
write_to_shard(old_shard, key, value)
if phase in ("migrate", "write"):
new_shard = get_new_shard(key)
write_to_shard(new_shard, key, value)
PostgreSQL Sharding Extensions
- Citus: Native horizontal sharding for PostgreSQL
- PostgreSQL FDW: Foreign data wrappers for cross-shard queries
- pg_partman: Automatic partition management
Conclusion
Sharding is a last resort. Exhaust caching, read replicas, and vertical scaling first. When you do shard, start with hash-based and plan for at least 2x your expected data.
Need a simpler scaling path? Systeme.io handles infrastructure scaling automatically β so you can focus on your application logic.
Top comments (0)