When you start building an app, everyone gives the same advice:
“Normalize your database.”
And they’re right… at the beginning.
But once your app starts scaling?
That same “clean design” can become your biggest bottleneck.
This is where denormalization (especially JSONB in PostgreSQL) becomes a powerful tool.
Let’s break this down with a real-world example.
Table of Contents
- The Scenario: A High-Traffic Food Delivery App
- Part 1: Normalization — The Clean Approach
- The Problem with JOINs at Scale
- Part 2: Denormalization (JSONB) — The Fast Approach
- Better Query Example (Real-World)
- The Trade-Off (This Is Important)
- When Should You Use JSONB
- When to Use Normalization
- Hybrid Approach
- JSONB Indexing
- Final Thoughts
The Scenario: A High-Traffic Food Delivery App
Imagine you built:
bitedash.com
At scale:
- Millions of users
- Millions of orders
- Heavy traffic during peak hours
Now a user opens:
“Order History”
Seems simple, right?
But under the hood, your database might be struggling.
Part 1: Normalization — The Clean Approach
Normalization is like a perfectly organized filing system.
Every piece of data has exactly one place.
Typical Schema
You design your database like this:
-
orders -
order_items -
products -
users
Everything is clean, structured, and reusable.
The Problem with JOINs at Scale
To show one "Order History" page, PostgreSQL has to:
- Join orders
- Join order_items
- Join products
- Join users
That’s a 4-table JOIN
At Small Scale
Works perfectly.
At Large Scale
When your database grows:
- Orders → 10 million rows
- Order_items → 50 million rows
Every request now requires expensive JOIN operations
The Cost
- High CPU usage
- Slower queries
- Increased latency
- Poor user experience
Mental Model
Normalization = Clean but computationally expensive at scale
Part 2: Denormalization (JSONB) — The Fast Approach
Now let’s flip the approach.
Instead of storing data separately…
Store everything together.
What is JSONB?
In PostgreSQL:
- Binary JSON format
- Faster than plain JSON
- Supports indexing
- Efficient querying
New Schema
Instead of multiple tables:
You store everything inside one table:
orders
With a column:
order_details (JSONB)
Example Data
{
"items": [
{ "name": "Pizza", "price": 15 },
{ "name": "Coke", "price": 2 }
],
"user_name": "John Doe",
"user_id": 123,
"total": 17.00
}
The Result
SELECT order_details FROM orders WHERE user_id = 123;
- No JOINs
- Single query
- Extremely fast
Why It’s Fast
- One row fetch
- No table stitching
- Less CPU work
Mental Model
JSONB = Pre-packaged data ready to serve
Better Query Example (Real-World)
Let’s say your product team asks:
“Show all orders where the user bought Pizza and total is greater than $10”
With Normalization
SELECT o.id, o.total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE p.name = 'Pizza'
AND o.total > 10;
Problem
- Multiple JOINs
- Large tables
- Slower at scale
With JSONB (use numeric cast for totals)
SELECT order_details
FROM orders
WHERE (order_details->>'total')::numeric > 10
AND order_details @> '{"items":[{"name":"Pizza"}]}';
More robust match using jsonb_path_exists (recommended for arrays of objects)
SELECT order_details
FROM orders
WHERE (order_details->>'total')::numeric > 10
AND jsonb_path_exists(
order_details,
'$.items[*] ? (@.name == "Pizza")'
);
jsonb_path_exists is more flexible for matching nested arrays/objects than the simple @> containment operator.
Add Index for Speed
-- GIN index for general JSONB containment/search
CREATE INDEX idx_orders_order_details_gin ON orders USING GIN (order_details);
-- Expression index for numeric total (fast range/greater-than queries)
CREATE INDEX idx_orders_total_numeric ON orders (((order_details->>'total')::numeric));
Result
- Fast filtering
- No joins
- Efficient queries
The Trade-Off (This Is Important)
| Feature | Normalization | JSONB (Denormalization) |
|---|---|---|
| Read Speed | Slower (JOINs) | Very fast (single read) |
| Write Speed | Faster (small rows) | Slower (large JSON writes) |
| Data Integrity | Strong (FK constraints) | Risk of inconsistency |
| Storage | Efficient | Larger (data duplication) |
There is no “perfect” solution. It’s always a trade-off.
Part 3: When Should You Use JSONB?
Use JSONB strategically, not everywhere
Use JSONB for: “Snapshots of the Past”
Example:
A user places an order.
Tomorrow:
- Pizza price changes
- Product name changes
If you rely on JOINs:
Old orders will show new values and its Wrong
Solution
Store order as JSONB:
- Captures exact state at that moment
- Keeps historical data correct
- Makes reads extremely fast
Benefits
- Accurate history
- Faster dashboards
- Better scalability
Part 4: When to Use Normalization
Normalization is still critical.
Use it for live, changing data.
Examples
- User accounts
- Passwords
- Emails
- Inventory
- Account balances
Why?
You need a single source of truth
You cannot afford:
- Duplicate data
- Inconsistent updates
Part 5: The Hybrid Approach (Best of Both Worlds)
At scale, real systems use both approaches together.
Example Architecture
- Users → Normalized
- Products → Normalized
- Orders → JSONB snapshot
Practical performance note
If you frequently filter by fields inside the JSONB (for example user_id or total), store those as first-class columns too:
- orders.user_id (bigint)
- orders.total (numeric)
Then keep order_details (JSONB) as the immutable snapshot. This gives you indexable scalars for fast filters/ranges and a JSON snapshot for state and flexible reads.
Result
- Fast reads
- Data consistency
- Scalability
Pro Tip: Indexing JSONB
You can still make JSONB queries fast.
CREATE INDEX idx_order_user_name
ON orders ((order_details->>'user_name'));
What this does:
- Fast search inside JSON
- Works like a normal column index
Combine GIN indexes for general JSONB containment with expression indexes for frequently filtered scalars.
Mental Model
- Normalization = Clean + consistent
- JSONB = Fast + flexible
Final Thoughts
Database design is always a balancing act.
Early stage:
Normalize everything
At scale:
Denormalize strategically
The best systems aren’t fully normalized or fully denormalized.
They’re carefully designed hybrids.
Top comments (0)