Sharding - Architecture Series: Part 5
WHAT is Sharding?
Sharding = horizontally splitting one huge database into many smaller databases (shards), each living on its own server.
Each shard stores a slice of the whole dataset and handles a slice of total traffic.
Visual:
Single DB (Overloaded)   →   Sharded DB (Distributed)
┌─────────────────────┐     ┌───────────┐ ┌───────────┐ ┌───────────┐
│ 1TB Data            │     │ Shard 1   │ │ Shard 2   │ │ Shard 3   │
│ 15K QPS             │     │ Users A-F │ │ Users G-M │ │ Users N-Z │
│ Slow / Choking      │     │ 5K QPS    │ │ 5K QPS    │ │ 5K QPS    │
└─────────────────────┘     └───────────┘ └───────────┘ └───────────┘
WHEN Do You Need Sharding? (Red Flags)
- Dataset too large for a single server (from hundreds of GB to TB scale)
- QPS (queries/sec) exceeding hardware limits
- One table growing to billions of rows
- Vertical scaling becoming too expensive
- Read/write traffic causing slow queries
When "add more RAM/CPU" stops helping, it's sharding time.
Real Example: How Instagram Shards
Instagram has 1B+ users and petabytes of posts, reels, and feed data.
In simplified form, they shard on the user ID with a simple modulo hash:
shard_id = user_id % 1000
So:
user_id 123456 → 123456 % 1000 → Shard #456
Everything related to that user (posts, followers, comments) lives on Shard 456 for as long as the shard count stays the same.
Why do they use hashing?
- Even load distribution when user IDs are spread uniformly (far fewer hot shards)
- No manual range management
- Each user always hits the same shard, so a lookup touches a single server → FAST
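To make the routing rule concrete, here is a minimal sketch of modulo-based shard selection. The shard count of 1000 comes from the example above; `shardFor` and `countPerShard` are hypothetical helper names, and the perfectly even spread only appears because the demo uses sequential IDs.

```javascript
// Hypothetical helpers illustrating shard_id = user_id % 1000.
const SHARD_COUNT = 1000; // taken from the example above

// Map a numeric user ID to a shard number in [0, SHARD_COUNT).
function shardFor(userId) {
  if (!Number.isInteger(userId) || userId < 0) {
    throw new RangeError("userId must be a non-negative integer");
  }
  return userId % SHARD_COUNT;
}

// Count how many IDs from a list land on each shard.
function countPerShard(userIds) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (const id of userIds) {
    counts[shardFor(id)] += 1;
  }
  return counts;
}

// 100,000 sequential IDs are exactly 100 full cycles of 1000 shards.
const ids = Array.from({ length: 100000 }, (_, i) => i + 1);
const counts = countPerShard(ids);

console.log(shardFor(123456));                          // 456
console.log(Math.min(...counts), Math.max(...counts));  // 100 100
```

Real ID spaces are rarely this tidy (deleted accounts, very active users), which is why it is worth measuring the distribution instead of assuming it.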
Sharding Strategies (Choose Your Weapon)
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Range | ID 1–100K → Shard 1 | Easy | Hotspots (popular ranges) |
| Hash (Instagram) | ID % N | Balanced | Range queries must hit every shard |
| Consistent Hash | Hash ring | Minimal reshuffling | Complex |
Quick snippets:
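// Hash sharding: spread IDs across 1000 shards (numbered 0-999)
function getShardByHash(userId) {
  return userId % 1000;
}
// Range sharding: IDs 1-100,000 → shard 1, the next 100,000 → shard 2, ...
function getShardByRange(userId) {
  return Math.floor((userId - 1) / 100000) + 1;
}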
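The third strategy in the table, consistent hashing, has no snippet above, so here is a rough sketch of a hash ring. Everything in it is an assumption made for illustration: the `HashRing` class, the shard names, 100 virtual nodes per shard, and the FNV-1a hash (chosen only because it needs no dependencies).

```javascript
// FNV-1a 32-bit hash over UTF-16 code units: small and dependency-free.
// Chosen only for the demo; any well-distributed hash would do.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

class HashRing {
  constructor(shardNames, virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    this.ring = []; // sorted list of { point, shard }
    for (const name of shardNames) this.addShard(name);
  }

  // Each shard gets many points on the ring to smooth out the distribution.
  addShard(name) {
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push({ point: fnv1a(`${name}#${v}`), shard: name });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // Walk clockwise: first ring point >= hash(key), wrapping to the start.
  getShard(key) {
    if (this.ring.length === 0) throw new Error("ring has no shards");
    const h = fnv1a(String(key));
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].shard;
  }
}

// Add a fourth shard and see which keys change owner.
const ring = new HashRing(["shard-1", "shard-2", "shard-3"]);
const keys = Array.from({ length: 10000 }, (_, i) => `user:${i}`);
const before = keys.map((k) => ring.getShard(k));
ring.addShard("shard-4");
const after = keys.map((k) => ring.getShard(k));
const moved = keys.filter((_, i) => before[i] !== after[i]).length;
console.log(`moved ${moved} of ${keys.length} keys`);
```

The useful property is that a key either stays where it was or moves to the newly added shard; nothing gets shuffled between the existing shards. That is the "minimal reshuffling" from the table.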
Architecture (How Apps Route to Shards)
┌──────────────────────┐     ┌───────────────────┐     ┌────────────┐
│ App Server (API)     │ ──▶ │ Shard Router      │ ──▶ │ Shard #456 │
│ user_id=123456       │     │ (calculates ID%N) │     │ User Data  │
└──────────────────────┘     └───────────────────┘     └────────────┘
If a query needs data from multiple shards, the routing layer fans the query out to each shard and aggregates the results.
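Here is one way the routing layer could look, as a sketch only. The shards are in-memory `Map`s standing in for real database connections, the shard count is shrunk to 4, and `saveUser`, `getUser`, and `findUsersByCountry` are hypothetical names, not part of any real router.

```javascript
// Hypothetical in-memory stand-ins for real database connections.
const SHARD_COUNT = 4; // small number to keep the demo readable
const shards = Array.from({ length: SHARD_COUNT }, () => new Map());

function shardFor(userId) {
  return userId % SHARD_COUNT;
}

// Single-shard write and read: the router picks exactly one shard.
async function saveUser(user) {
  shards[shardFor(user.id)].set(user.id, user);
}

async function getUser(userId) {
  return shards[shardFor(userId)].get(userId) ?? null;
}

// Cross-shard query: fan out to every shard in parallel, then merge.
async function findUsersByCountry(country) {
  const partials = await Promise.all(
    shards.map(async (shard) =>
      [...shard.values()].filter((u) => u.country === country)
    )
  );
  // Aggregation step: flatten the per-shard results and impose one order.
  return partials.flat().sort((a, b) => a.id - b.id);
}

async function demo() {
  await Promise.all([
    saveUser({ id: 1, country: "DE" }),
    saveUser({ id: 2, country: "US" }),
    saveUser({ id: 6, country: "DE" }),
  ]);
  const one = await getUser(6);                 // touches only shard 2
  const many = await findUsersByCountry("DE");  // touches all 4 shards
  return { one, many };
}

demo().then((result) => console.log(result));
```

Note the cost difference: `getUser` does one lookup no matter how many shards exist, while `findUsersByCountry` does work on every shard. That is the reason to pick a shard key that matches your most common query.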
Why Sharding Is So Powerful
- Near-linear scalability (add more shards → handle more users)
- Faster queries (smaller DB = smaller, faster indexes)
- Fault isolation (Shard 456 down ≠ whole app down)
- Geographic distribution (EU users on EU shards)
- Practically unbounded scaling (in theory)
This is how companies like Instagram, YouTube, TikTok, and Uber handle global scale.
The Dark Side of Sharding (Things people don't tell you)
- Cross-shard JOINs = slow and painful
- Rebalancing shards = data migration nightmare
- Monitoring 1000 shards = complex ops
- Schema changes = do it 1000×
- Picking the wrong shard key = disaster
This is why companies denormalize heavily to avoid cross-shard joins.
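To put a number on the rebalancing pain, this small sketch counts how many keys change shard when the shard count changes under plain `id % N` routing. `fractionMoved` is a hypothetical helper, and the shard counts are tiny on purpose.

```javascript
// Fraction of IDs in [0, totalKeys) whose shard changes when the
// shard count goes from oldShards to newShards under id % N routing.
function fractionMoved(totalKeys, oldShards, newShards) {
  let moved = 0;
  for (let id = 0; id < totalKeys; id++) {
    if (id % oldShards !== id % newShards) moved++;
  }
  return moved / totalKeys;
}

console.log(fractionMoved(100000, 4, 5)); // 0.8 (adding one shard moves 80% of keys)
console.log(fractionMoved(100000, 4, 8)); // 0.5 (doubling moves half of them)
```

Adding a single shard forces most of the data to migrate, while doubling moves only half. That is one reason teams using modulo routing tend to grow in powers of two or switch to consistent hashing.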
Pro Tips from Real Distributed Systems Engineers
1. Start with 64 or 256 shards, not thousands.
2. Hash your primary keys (best distribution).
3. Never shard on fields that change.
4. Build a routing layer between app and DB.
5. Avoid JOINs across shards; duplicate data instead.
6. Monitor shard imbalance regularly.
7. Plan for re-sharding from day 1.
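Tips 1, 6, and 7 fit together nicely if you separate logical shards from physical hosts. The sketch below assumes 256 logical shards and made-up host names (`db-host-a` and friends); `placement`, `hostFor`, `moveLogicalShard`, and `imbalance` are all hypothetical names for illustration.

```javascript
const LOGICAL_SHARDS = 256; // fixed for the life of the system

// Lookup table: logical shard number -> physical host (hypothetical names).
const placement = new Map();
for (let s = 0; s < LOGICAL_SHARDS; s++) {
  placement.set(s, s < 128 ? "db-host-a" : "db-host-b");
}

function logicalShardFor(userId) {
  return userId % LOGICAL_SHARDS;
}

function hostFor(userId) {
  return placement.get(logicalShardFor(userId));
}

// Re-sharding: once a logical shard's data has been copied to the new
// host, routing changes by updating one table entry. No key is rehashed.
function moveLogicalShard(shard, newHost) {
  placement.set(shard, newHost);
}

// Tip 6: ratio of the busiest host to the average; 1.0 is perfectly balanced.
function imbalance(rowCountsByHost) {
  const counts = Object.values(rowCountsByHost);
  const avg = counts.reduce((a, b) => a + b, 0) / counts.length;
  return Math.max(...counts) / avg;
}

console.log(logicalShardFor(123456)); // 64
console.log(hostFor(123456));         // db-host-a
moveLogicalShard(64, "db-host-c");
console.log(hostFor(123456));         // db-host-c
console.log(imbalance({ "db-host-a": 300, "db-host-b": 100 })); // 1.5
```

Because the user-to-logical-shard mapping never changes, growing the cluster becomes a matter of moving whole logical shards between hosts, and the imbalance ratio tells you which ones to move first.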
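Modern Solutions (2025)
These systems handle most of the sharding work for you:
- Vitess (sharding layer for MySQL, originally built at YouTube)
- PlanetScale (managed MySQL built on Vitess)
- YugabyteDB (PostgreSQL-compatible distributed SQL)
- CockroachDB (distributed SQL with ACID transactions and automatic range splitting)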
Final Summary
Sharding = breaking one big database into many small databases so your system can scale horizontally.
It gives:
- Near-linear horizontal scalability
- Faster queries on smaller per-shard datasets
- Higher total throughput
- Global distribution
- Instagram/Twitter-level architecture
BUT…
It requires planning, a routing layer, and avoiding cross-shard joins.
Missed Previous Parts? Catch Up Here!
If you've joined this series recently or missed any of the earlier deep-dives, no worries bro. I've linked all previous architecture topics below. Each part is designed to build your understanding step-by-step, from pagination to caching to sharding. Take your time, go through them in order, and you'll get a rock-solid grasp of real-world system design fundamentals.
Architecture Series - Index
| # | Topic |
|---|---|
| 1 | Pagination - Architecture Series: Part 1 |
| 2 | Indexing - Architecture Series: Part 2 |
| 3 | Virtualization - Architecture Series: Part 3 |
| 4 | Caching - Architecture Series: Part 4 |