DEV Community

ZNY
ZNY

Posted on

The Complete Guide to Database Sharding Strategies in 2026

The Complete Guide to Database Sharding Strategies in 2026

When vertical scaling hits its ceiling, sharding is the next step. But getting it wrong means migrating data under pressure with zero margin for error.

When to Shard

Signs you need sharding:

  • Single primary DB CPU > 70% sustained
  • DB disk I/O at capacity
  • Replication lag > 1 second
  • Table exceeds 500GB with heavy writes

Sharding too early adds complexity; too late means downtime.

Horizontal Sharding Patterns

1. Range-Based Sharding

Split by ID ranges or date:

-- Shard by user_id range
CREATE TABLE users_0 (CHECK (user_id BETWEEN 0 AND 999999)) INHERITS (users);
CREATE TABLE users_1 (CHECK (user_id BETWEEN 1000000 AND 1999999)) INHERITS (users);
Enter fullscreen mode Exit fullscreen mode

Pros: Simple, sequential reads are efficient
Cons: Hot spots on newer shards, uneven distribution

2. Hash-Based Sharding

def get_shard(user_id: int, num_shards: int = 4) -> int:
    return hash(str(user_id)) % num_shards

# Consistent hashing for reshard without full remapping
def consistent_hash(key: str, nodes: list, replicas: int = 150) -> str:
    circle = {}
    for node in nodes:
        for i in range(replicas):
            circle[hash(f"{node}:{i}")] = node
    sorted_keys = sorted(circle.keys())
    return circle[min(sorted_keys, key=lambda k: abs(k - hash(key)))]
Enter fullscreen mode Exit fullscreen mode

Pros: Even distribution, no hot spots
Cons: Range queries span multiple shards

3. Directory-Based Sharding

Lookup table maps keys to shards:

CREATE TABLE shard_directory (
    entity_type VARCHAR(50),
    entity_id BIGINT,
    shard_id INT,
    PRIMARY KEY (entity_type, entity_id)
);
Enter fullscreen mode Exit fullscreen mode

Pros: Flexible, can change assignment without rebalancing
Cons: Directory becomes a single point of failure

Cross-Shard Queries

The hardest problem. Solutions:

  1. Scatter-gather: Query all shards, merge results (slow)
  2. Denormalization: Store redundant copies (complex updates)
  3. Search engine: Use Elasticsearch for queries, DB for writes
  4. Read replicas: Dedicated reporting replica that receives all data

Rebalancing Without Downtime

Live resharding:

# Dual-write during transition
def write_sharded(key, value, phase="migrate"):
    if phase in ("migrate", "write"):
        old_shard = get_old_shard(key)
        write_to_shard(old_shard, key, value)
    if phase in ("migrate", "write"):
        new_shard = get_new_shard(key)
        write_to_shard(new_shard, key, value)
Enter fullscreen mode Exit fullscreen mode

PostgreSQL Sharding Extensions

  • Citus: Native horizontal sharding for PostgreSQL
  • PostgreSQL FDW: Foreign data wrappers for cross-shard queries
  • pg_partman: Automatic partition management

Conclusion

Sharding is a last resort. Exhaust caching, read replicas, and vertical scaling first. When you do shard, start with hash-based and plan for at least 2x your expected data.

Need a simpler scaling path? Systeme.io handles infrastructure scaling automatically — so you can focus on your application logic.

Top comments (0)