DEV Community

Sreekar Reddy
Sreekar Reddy

Posted on • Originally published at sreekarreddy.com

🍕 Database Sharding Explained Like You're 5

Splitting data across servers

Day 121 of 149

👉 Full deep-dive with code examples


The Library Card Catalog Analogy

One huge card catalog is hard to use:

  • Long lines at one catalog
  • If it breaks, no one can find books

Split by letters:

  • A-F catalog in room 1
  • G-M catalog in room 2
  • N-Z catalog in room 3

Search is faster, and if one catalog goes down, you can still use the others (but you lose access to that section until it comes back).

Sharding splits your database into pieces across servers!


Why Shard?

One server has limits:

  • Storage → Runs out of disk space
  • Speed → Too many queries to handle
  • Memory → Can't fit all data in RAM

Sharding spreads the load across multiple servers.


How It Works

Split data by some key:

Users A-H → Server 1
Users I-P → Server 2
Users Q-Z → Server 3
Enter fullscreen mode Exit fullscreen mode

Each piece is a "shard." Each shard handles its portion.


Sharding Strategies

By key range:

  • Users 1-1000 → Shard 1
  • Users 1001-2000 → Shard 2

By hash:

  • hash(user_id) % 3 → determines shard
  • Often spreads data more evenly

By geography:

  • US users → US shard
  • EU users → EU shard

The Trade-offs

Complexity:

  • Queries across shards are hard
  • Need to route requests correctly

Availability:

  • Sharding doesn't automatically make you highly available
  • If a shard goes down, data on that shard is unavailable unless you also use replication

Joins:

  • Can't easily join data across shards
  • Might need to restructure queries

Rebalancing:

  • Adding new shard? Need to move data

In One Sentence

Database Sharding splits your data across multiple servers so each handles a smaller piece, allowing your database to scale beyond one machine's limits.


🔗 Enjoying these? Follow for daily ELI5 explanations!

Making complex tech concepts simple, one day at a time.

Top comments (0)