Ayush Shrivastav

Why Production Databases Break Normalization (And Why That's Okay)

If you've taken any database course, you've been taught that normalization is the right way to design a schema. No duplicate data. Clean relationships. Every table with a single responsibility.

Then you get a job at a real company, open the codebase, and see this:

messages
--------
message_id
channel_id
user_id
user_name
user_avatar
message_body
created_at

user_name? user_avatar? Right there in the messages table?

Your first instinct might be to flag it as a bug. But here's the thing — this is not a mistake. Systems like this run at Slack, Discord, Instagram, and Twitter. These are deliberate decisions made by engineers who understand normalization perfectly. They just chose not to use it.

This piece explains why.


First, what normalization actually solves

Normalization isn't just an academic exercise. It solves a real problem: when the same information lives in more than one place, updates become dangerous.

Say you store a user's name in the users table and also inside each message they've sent. If the user changes their name, you now have to update two places. If one update fails halfway through, half your messages show the old name and half show the new one. That's a data consistency bug — and it's genuinely bad.

A normalized schema avoids this entirely:

users          messages
-----          --------
id             id
name           sender_id  → users.id
avatar         body

Now there's one source of truth. Loading a message with a username requires a JOIN:

SELECT m.body, u.name, u.avatar
FROM messages m
JOIN users u ON u.id = m.sender_id
WHERE m.channel_id = 42
ORDER BY m.created_at DESC
LIMIT 50;

This is correct, clean, and totally fine — until you're running it 300,000 times per second.


The actual bottleneck at scale

Here's something most database courses don't tell you: production systems are not evenly split between reads and writes. Most systems are overwhelmingly read-heavy.

Twitter's public architecture data puts this into perspective. At 150 million active users, they were handling roughly 300,000 read requests per second for home timelines, versus about 6,000 write requests per second for new tweets. That's a 50-to-1 read/write ratio. For something like a message inbox or social feed, it can be even more extreme.

When reads outnumber writes by that margin, every millisecond you can shave off a read multiplies across a huge volume. A JOIN that costs 10ms doesn't sound bad in isolation. But when you need to execute it 300,000 times a second across a distributed database cluster, that cost becomes the bottleneck.

And JOINs are expensive in ways that aren't obvious. They require two table scans (or index lookups), loading data from potentially different disk pages, merging the results in memory, and sorting them. At modest table sizes this is fine. At hundreds of millions of rows spread across a sharded cluster, the database is doing significant work for every single request.
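You can see that extra work directly in a query planner. Here's a toy illustration using SQLite's `EXPLAIN QUERY PLAN` (a sketch, not a benchmark — the table and index names are made up, and real engines differ in the details):

```python
import sqlite3

# Toy schema mirroring the normalized design above; sqlite3 stands in
# for whatever engine you actually run.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, avatar TEXT);
    CREATE TABLE messages (
        id         INTEGER PRIMARY KEY,
        channel_id INTEGER,
        sender_id  INTEGER REFERENCES users(id),
        body       TEXT,
        created_at INTEGER
    );
    CREATE INDEX msg_by_channel ON messages (channel_id, created_at);
""")

join_plan = db.execute("""
    EXPLAIN QUERY PLAN
    SELECT m.body, u.name FROM messages m
    JOIN users u ON u.id = m.sender_id
    WHERE m.channel_id = 42
""").fetchall()

flat_plan = db.execute("""
    EXPLAIN QUERY PLAN
    SELECT body FROM messages WHERE channel_id = 42
""").fetchall()

# The join plan contains an extra lookup step: for every message row found,
# the engine also has to search the users table. The flat query is one step.
print(len(join_plan), len(flat_plan))
```

Multiply that extra per-row lookup by 50 messages per page and 300,000 requests per second and the difference stops being academic.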


What denormalization actually is

Denormalization is the decision to intentionally store the same data in more than one place to make reads faster.

Going back to that messages table:

messages
--------
message_id
channel_id
user_id
user_name        ← duplicated from users table
user_avatar      ← duplicated from users table
message_body
created_at

Now loading a channel's message history looks like this:

SELECT message_id, user_name, user_avatar, message_body
FROM messages
WHERE channel_id = 42
ORDER BY created_at DESC
LIMIT 50;

No join. One table. One index scan. The database does a fraction of the work.

This is faster, cheaper in CPU and disk I/O, easier to cache, and simpler to scale horizontally — because no cross-table coordination is needed.


Discord: 4 trillion messages and zero joins

Discord is probably the most documented example of this pattern at scale.

When Discord launched in 2015, they stored messages in MongoDB. By late 2015, with 100 million messages in storage, queries were becoming slow and unpredictable as data grew beyond what fit in RAM. They migrated to Apache Cassandra, a database that forces you into denormalized design — there are no server-side JOINs in Cassandra at all.

Their message schema looked something like this:

CREATE TABLE messages (
    channel_id  bigint,
    bucket      int,
    message_id  bigint,
    author_id   bigint,
    content     text,
    PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

The key design decision is (channel_id, bucket) as the composite partition key. All messages in a channel, within a ~10-day time window (the "bucket"), live on the same database node. A query for "last 50 messages in channel X" hits exactly one node, reads from one partition, and returns. No joins, no cross-node coordination.
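Discord's engineering blog describes deriving the bucket from the timestamp embedded in the message's snowflake ID. A minimal sketch of that derivation (the epoch and bucket size follow the publicly documented values; treat the exact constants as an approximation):

```python
# Snowflake IDs embed a millisecond timestamp in their upper bits.
DISCORD_EPOCH = 1420070400000              # 2015-01-01 in Unix milliseconds
BUCKET_SIZE_MS = 1000 * 60 * 60 * 24 * 10  # 10-day windows

def make_bucket(snowflake_id: int) -> int:
    """Map a message ID to its time bucket (the partition-key component)."""
    timestamp_ms = (snowflake_id >> 22) + DISCORD_EPOCH
    return timestamp_ms // BUCKET_SIZE_MS

# All messages created within the same 10-day window share a bucket, so
# (channel_id, bucket) pins a channel's recent history to one partition.
```

Fetching "the last 50 messages" then just means reading the current bucket for the channel, stepping back one bucket at a time if it isn't full.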

By 2022, Discord had grown to 177 Cassandra nodes handling trillions of messages. JVM garbage collection pauses were causing unpredictable latency spikes. They migrated to ScyllaDB (a C++ reimplementation of Cassandra with no JVM), dropping from 177 nodes to 72 while also cutting p99 insert latency from a variable 5–70ms range down to a steady 5ms.

The schema design stayed fundamentally the same. Denormalized. Query-driven. No joins.


Twitter: precomputing your entire timeline

Twitter's approach to denormalization is even more aggressive.

In a normalized system, showing you your home timeline would work like this: query all the users you follow, fetch their recent tweets, then merge and sort by time. At scale this is a scatter-gather operation, issuing potentially thousands of small queries per timeline load. Completely impractical.

What Twitter does instead is called fan-out on write. When you post a tweet, a background process immediately pushes your tweet ID into the Redis timeline cache of every one of your followers. If you have 20,000 followers, that's 20,000 cache writes triggered by your single post.

The read path becomes trivially simple. When you open your timeline, Twitter reads your pre-built list from Redis. Your 800-tweet timeline is already there, pre-sorted, waiting. The read takes about 5 milliseconds.

This is denormalization taken to its logical extreme. The tweet ID is physically duplicated across N followers' data stores — not in the database sense, but in the cache sense. Write time is more expensive. Read time is nearly free.

One obvious problem: Lady Gaga had 31 million followers at one point. A single tweet from her would require 31 million cache writes. Twitter handles this with a hybrid approach — for accounts above a certain follower threshold, they skip the fan-out and instead merge the celebrity's tweets at read time. Most users get the precomputed timeline. Celebrities are handled as a special case.
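The hybrid can be sketched in a few lines. This is a toy in-memory model, not Twitter's code — the threshold value is hypothetical and plain Python containers stand in for Redis:

```python
from collections import defaultdict, deque

FANOUT_THRESHOLD = 10_000  # hypothetical cutoff; the real value isn't public
TIMELINE_CAP = 800         # entries kept per home timeline, per the text above

home_timelines = defaultdict(lambda: deque(maxlen=TIMELINE_CAP))  # Redis stand-in
celebrity_feeds = defaultdict(list)  # tweets NOT fanned out, merged at read time

def post_tweet(author_id, tweet_id, follower_ids):
    if len(follower_ids) > FANOUT_THRESHOLD:
        celebrity_feeds[author_id].append(tweet_id)   # skip the fan-out entirely
        return
    for follower in follower_ids:                     # fan-out on write: one
        home_timelines[follower].appendleft(tweet_id) # cache write per follower

def read_timeline(user_id, followed_celebrities):
    tweets = list(home_timelines[user_id])
    for celeb in followed_celebrities:                # celebrities merged on read
        tweets.extend(celebrity_feeds[celeb])
    return sorted(tweets, reverse=True)               # IDs are time-ordered

post_tweet(1, 100, follower_ids=[10, 11, 12])         # small account: fanned out
post_tweet(2, 200, follower_ids=list(range(20_001)))  # "celebrity": not fanned out
print(read_timeline(10, followed_celebrities=[2]))    # [200, 100]
```

Most reads stay a single cheap list fetch; only the handful of celebrity follows pay a merge cost at read time.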


Instagram's like count problem

Instagram ran into a specific version of this problem around engagement metrics.

In a normalized schema, counting likes looks like this:

SELECT COUNT(*) FROM likes WHERE post_id = 12345;

On a busy post, the likes table has millions of rows. Even with an index, counting them is work the database has to do on every request — and Instagram serves hundreds of millions of users who are all loading posts and seeing like counts constantly.

When a celebrity posted something, the spike in traffic would hammer this query on every load. The database struggled.

The fix was straightforward: add a like_count column directly to the posts table. When someone likes a post, you write to likes and you also increment like_count in posts. Now the read path for a like count is a single indexed lookup returning a single value.

The tradeoff: writes are slightly more complex, and the like_count column is technically derived data (it duplicates what you could compute from the likes table). But the query went from being expensive to being instant. That's a tradeoff Instagram very deliberately made.
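The double-write is the part worth getting right: both writes should live in one transaction so the derived counter can't drift from the source rows. A minimal sketch with SQLite standing in for the real database (table names follow the example above; the function is hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE likes (post_id INTEGER, user_id INTEGER, PRIMARY KEY (post_id, user_id));
""")
db.execute("INSERT INTO posts (id) VALUES (12345)")

def like(post_id: int, user_id: int) -> None:
    # One transaction: both writes commit together or roll back together,
    # so a failed insert (e.g. a duplicate like) never bumps the counter.
    with db:
        db.execute("INSERT INTO likes VALUES (?, ?)", (post_id, user_id))
        db.execute("UPDATE posts SET like_count = like_count + 1 WHERE id = ?",
                   (post_id,))

like(12345, 7)
like(12345, 8)
count = db.execute("SELECT like_count FROM posts WHERE id = 12345").fetchone()[0]
print(count)  # 2 — read back with a single primary-key lookup, no COUNT(*)
```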


Microservices make denormalization unavoidable

There's another context where denormalization isn't really optional: distributed microservice architectures.

If your user data lives in a User Service and your message data lives in a Message Service, there is no JOIN. You can't run a SQL query that spans two independently deployed, separately scaled services with separate databases.

You have a few options:

  1. Load the message, then make an HTTP call to the User Service to get the username. This adds a network round-trip to every message load — not great when loading 50 messages at once.
  2. Batch the calls. Slightly better but still adds latency and coupling.
  3. Store the username inside the message when it's created. No cross-service call needed at read time.

Option 3 is denormalization, and it's the most common approach in real microservice architectures. The Message Service stores enough user data to render messages without calling anyone else. When a user changes their name, old message records don't immediately reflect it, but that's an acceptable tradeoff for most systems: a chat message briefly showing your previous username is not a critical business failure.
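In code, option 3 is just a snapshot taken at write time. A hypothetical sketch of a Message Service model (the class and field names are made up for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    message_id: int
    user_id: int
    user_name: str  # snapshot copied from the User Service at creation time
    body: str

def create_message(message_id, user, body):
    # `user` is whatever the User Service returned when the message was
    # posted; we keep only the fields needed to render the message later.
    return Message(message_id, user["id"], user["name"], body)

user = {"id": 7, "name": "ayush"}
msg = create_message(1, user, "hello")

user["name"] = "ayush_s"  # rename happens later, in the User Service
print(msg.user_name)      # still "ayush": stale, not wrong, until propagation
```

Rendering the message now requires zero cross-service calls; the staleness window is the price.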


The real cost of denormalization

Denormalization is not free. The engineering tradeoff is real, and it's worth being honest about it.

Update complexity. If a user changes their avatar, and their avatar URL is embedded in thousands of message records, you now have to update all of those records. In some systems this is acceptable (the old avatar is just stale, not wrong). In others you need a propagation mechanism — typically an event stream that broadcasts the change and downstream services update their own copies.

Eventual consistency. In distributed systems, that propagation doesn't happen instantly. There's a window where your data is inconsistent — the users table has the new name, but message records still show the old one. Systems that use this pattern explicitly accept eventual consistency as a design choice.

Storage cost. Storing the same data multiple times costs more. At small scale this barely matters. At Discord's scale (petabytes of messages), duplicating even a small amount of data per message adds up.

Schema evolution. If a denormalized field changes meaning or format, you have to update it everywhere it's been copied — which might mean migrating billions of rows.
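The propagation mechanism mentioned above can be sketched as a tiny in-process event stream; a plain list stands in for Kafka and the consumer loop for a downstream service (all names here are illustrative):

```python
events = []  # stand-in for an event stream

users = {7: {"name": "ayush"}}                                    # source of truth
messages = [{"id": 1, "user_id": 7, "user_name": "ayush", "body": "hi"}]

def rename_user(user_id, new_name):
    users[user_id]["name"] = new_name          # source of truth updated first
    events.append(("user_renamed", user_id, new_name))

def run_consumer():
    # Until this runs, reads of `messages` still see the old name — that
    # window is exactly the eventual-consistency gap described above.
    while events:
        _, user_id, new_name = events.pop(0)
        for m in messages:
            if m["user_id"] == user_id:
                m["user_name"] = new_name

rename_user(7, "ayush_s")
print(messages[0]["user_name"])  # "ayush" — stale until the consumer catches up
run_consumer()
print(messages[0]["user_name"])  # "ayush_s"
```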


How production systems handle both

Most well-designed systems don't pick one approach and abandon the other. They use normalization where correctness matters most and denormalization where read performance matters most.

A common architecture:

Normalized database (source of truth)
           ↓
     Event stream (Kafka)
           ↓
  Denormalized read tables
           ↓
       API queries

The normalized database is where you write to. It's correct, consistent, and acts as the ground truth for your data. When something changes here, an event gets published.

Downstream services consume those events and update their own denormalized projections — tables or caches optimized for specific read patterns. These projections might live in Redis, Cassandra, Elasticsearch, or a separate PostgreSQL schema. They're not the source of truth, but they're what your APIs query because they're fast.

This pattern has a name — CQRS (Command Query Responsibility Segregation) — and it's what Netflix, LinkedIn, Uber, and most large-scale platforms use in one form or another. Netflix processes 140 million hours of viewing data per day through Kafka, with multiple downstream consumers each building their own optimized projection for recommendations, billing, and analytics.
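A minimal toy sketch of the CQRS shape, with in-memory structures standing in for the database, Kafka, and the projection store (every name here is invented for illustration):

```python
normalized_users = {}      # write model: source of truth
normalized_messages = []
event_log = []             # stand-in for the event stream

def handle_post_message(msg_id, user_id, body):
    # Command side: write to the normalized model, then publish an event.
    normalized_messages.append({"id": msg_id, "user_id": user_id, "body": body})
    event_log.append({"type": "message_posted", "id": msg_id,
                      "user_id": user_id, "body": body})

message_read_table = {}    # denormalized projection: msg_id -> render-ready row

def project(event):
    # Query side: consume events into a table shaped for the read path.
    if event["type"] == "message_posted":
        user = normalized_users[event["user_id"]]
        message_read_table[event["id"]] = {
            "body": event["body"],
            "user_name": user["name"],  # denormalized at projection time
        }

normalized_users[7] = {"name": "ayush"}
handle_post_message(1, 7, "hello")
for e in event_log:
    project(e)
print(message_read_table[1])
```

The API serves reads from `message_read_table` with a single key lookup; the normalized model stays clean for writes and for rebuilding the projection from scratch if its shape ever changes.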


The insight worth taking away

Database theory teaches you how to design schemas that are correct. Production engineering teaches you how to design schemas that are fast under real load.

Normalization is right in the sense that it prevents a class of consistency bugs. It belongs in your schema wherever data correctness is non-negotiable.

Denormalization is right in the sense that JOINs at scale are expensive, network round-trips add up, and precomputed data is faster than computed data. It belongs in your read paths, your caches, and your projections.

The engineers at Discord, Twitter, Instagram, and Netflix aren't ignoring normalization. They understand it well enough to know when breaking it is the right call.

That's the difference between knowing the rules and knowing when to apply them.


Real-world data referenced in this post comes from Discord's engineering blog (message storage architecture), Twitter's 2012 architecture documentation, Instagram's engineering posts on sharding and denormalization, and Netflix's TechBlog on viewing data storage.
