Splitting data across servers
Day 121 of 149
👉 Full deep-dive with code examples
The Library Card Catalog Analogy
One huge card catalog is hard to use:
- Long lines at one catalog
- If it breaks, no one can find books
Split by letters:
- A-F catalog in room 1
- G-M catalog in room 2
- N-Z catalog in room 3
Search is faster, and if one catalog goes down, you can still use the others (but you lose access to that section until it comes back).
Sharding splits your database into pieces across servers!
Why Shard?
One server has limits:
- Storage → Runs out of disk space
- Speed → Too many queries to handle
- Memory → Can't fit all data in RAM
Sharding spreads the load across multiple servers.
How It Works
Split data by some key:
Users A-H → Server 1
Users I-P → Server 2
Users Q-Z → Server 3
Each piece is a "shard." Each shard handles its portion.
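That routing logic can be sketched in a few lines of Python. The server names and letter ranges below are just placeholders, not a real deployment:

```python
# Hypothetical range-based router: pick a shard from the
# first letter of the username.
SHARD_RANGES = {
    ("a", "h"): "server-1",
    ("i", "p"): "server-2",
    ("q", "z"): "server-3",
}

def shard_for(username):
    first = username[0].lower()
    for (lo, hi), server in SHARD_RANGES.items():
        if lo <= first <= hi:
            return server
    raise ValueError(f"no shard covers {username!r}")

print(shard_for("Alice"))  # server-1
print(shard_for("Zara"))   # server-3
```

In a real system this lookup lives in a routing layer (or a library on every app server) so each query lands on the right machine.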
Sharding Strategies
By key range:
- Users 1-1000 → Shard 1
- Users 1001-2000 → Shard 2
By hash:
- hash(user_id) % 3 → determines shard
- Often spreads data more evenly
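A minimal sketch of hash-based shard selection. It uses `hashlib` rather than Python's built-in `hash()`, because `hash()` is randomized per process for strings and a shard key must hash the same way everywhere:

```python
import hashlib

NUM_SHARDS = 3

def shard_for(user_id):
    # Deterministic hash of the key, then modulo the shard count.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

The same `user_id` always maps to the same shard, and the hash scatters neighboring IDs across shards instead of clumping them.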
By geography:
- US users → US shard
- EU users → EU shard
The Trade-offs
Complexity:
- Queries across shards are hard
- Need to route requests correctly
Availability:
- Sharding doesn't automatically make you highly available
- If a shard goes down, data on that shard is unavailable unless you also use replication
Joins:
- Can't easily join data across shards
- Might need to restructure queries
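One common restructuring is a scatter-gather query: ask every shard, then merge the results in the application. A toy sketch, with shards simulated as in-memory dicts of `name -> age`:

```python
# Toy shards: in a real system each dict would be a query
# against a separate database server.
shards = [
    {"alice": 30, "bob": 25},
    {"carol": 41},
    {"dave": 19, "erin": 35},
]

def find_users_older_than(age):
    results = []
    for shard in shards:  # fan out to every shard
        results.extend(name for name, a in shard.items() if a > age)
    return sorted(results)  # merge in the application

print(find_users_older_than(28))  # ['alice', 'carol', 'erin']
```

This works, but it is slower than a single-server query and the merge logic (sorting, pagination, aggregation) now lives in your code instead of the database.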
Rebalancing:
- Adding a new shard? You need to move existing data
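A quick way to see why rebalancing hurts: with plain modulo hashing, going from 3 to 4 shards remaps most keys, and every remapped key means moving its data. A small sketch (the `shard_for` helper here is hypothetical):

```python
import hashlib

def shard_for(user_id, num_shards):
    # Deterministic hash of the key, modulo the shard count.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Count how many of 10,000 keys land on a different shard
# when the cluster grows from 3 shards to 4.
moved = sum(
    shard_for(uid, 3) != shard_for(uid, 4)
    for uid in range(10_000)
)
print(f"{moved / 10_000:.0%} of keys move")  # typically around 75%
```

Schemes like consistent hashing exist precisely to shrink that fraction, so adding a shard only moves the keys that shard takes over.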
In One Sentence
Database Sharding splits your data across multiple servers so each handles a smaller piece, allowing your database to scale beyond one machine's limits.
🔗 Enjoying these? Follow for daily ELI5 explanations!
Making complex tech concepts simple, one day at a time.