Abdullah al Mubin

Posted on Apr 21

PostgreSQL Design at Scale: Normalization vs JSONB (A Real-World Guide)

#architecture #database #performance #postgres

When you start building an app, everyone gives the same advice:

“Normalize your database.”

And they’re right… at the beginning.

But once your app starts scaling?

That same “clean design” can become your biggest bottleneck.

This is where denormalization (especially JSONB in PostgreSQL) becomes a powerful tool.

Let’s break this down with a real-world example.

The Scenario: A High-Traffic Food Delivery App
Part 1: Normalization — The Clean Approach
The Problem with JOINs at Scale
Part 2: Denormalization (JSONB) — The Fast Approach
Better Query Example (Real-World)
The Trade-Off (This Is Important)
When Should You Use JSONB
When to Use Normalization
Hybrid Approach
JSONB Indexing
Final Thoughts

The Scenario: A High-Traffic Food Delivery App

Imagine you built:

bitedash.com

At scale:

Millions of users
Millions of orders
Heavy traffic during peak hours

Now a user opens:

“Order History”

Seems simple, right?

But under the hood, your database might be struggling.

Part 1: Normalization — The Clean Approach

Normalization is like a perfectly organized filing system.

Every piece of data has exactly one place.

Typical Schema

You design your database like this:

orders
order_items
products
users

Everything is clean, structured, and reusable.

The Problem with JOINs at Scale

To show one "Order History" page, PostgreSQL has to:

Join orders
Join order_items
Join products
Join users

That’s a 4-table JOIN

At Small Scale

Works perfectly.

At Large Scale

When your database grows:

Orders → 10 million rows
Order_items → 50 million rows

Every request now requires expensive JOIN operations

The Cost

High CPU usage
Slower queries
Increased latency
Poor user experience

Mental Model

Normalization = Clean but computationally expensive at scale

Part 2: Denormalization (JSONB) — The Fast Approach

Now let’s flip the approach.

Instead of storing data separately…

Store everything together.

What is JSONB?

In PostgreSQL:

Binary JSON format
Faster than plain JSON
Supports indexing
Efficient querying

New Schema

Instead of multiple tables:

You store everything inside one table:

orders

With a column:

order_details (JSONB)

Example Data

{
  "items": [
    { "name": "Pizza", "price": 15 },
    { "name": "Coke", "price": 2 }
  ],
  "user_name": "John Doe",
  "user_id": 123,
  "total": 17.00
}

The Result

SELECT order_details FROM orders WHERE user_id = 123;

No JOINs
Single query
Extremely fast

Why It’s Fast

One row fetch
No table stitching
Less CPU work

Mental Model

JSONB = Pre-packaged data ready to serve

Better Query Example (Real-World)

Let’s say your product team asks:

“Show all orders where the user bought Pizza and total is greater than $10”

With Normalization

SELECT o.id, o.total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE p.name = 'Pizza'
AND o.total > 10;

Problem

Multiple JOINs
Large tables
Slower at scale

With JSONB (use numeric cast for totals)

SELECT order_details
FROM orders
WHERE (order_details->>'total')::numeric > 10
  AND order_details @> '{"items":[{"name":"Pizza"}]}';

More robust match using jsonb_path_exists (recommended for arrays of objects)

SELECT order_details
FROM orders
WHERE (order_details->>'total')::numeric > 10
  AND jsonb_path_exists(
        order_details,
        '$.items[*] ? (@.name == "Pizza")'
      );

jsonb_path_exists is more flexible for matching nested arrays/objects than the simple @> containment operator.

Add Index for Speed

-- GIN index for general JSONB containment/search
CREATE INDEX idx_orders_order_details_gin ON orders USING GIN (order_details);

-- Expression index for numeric total (fast range/greater-than queries)
CREATE INDEX idx_orders_total_numeric ON orders (((order_details->>'total')::numeric));

Result

Fast filtering
No joins
Efficient queries

The Trade-Off (This Is Important)

Feature	Normalization	JSONB (Denormalization)
Read Speed	Slower (JOINs)	Very fast (single read)
Write Speed	Faster (small rows)	Slower (large JSON writes)
Data Integrity	Strong (FK constraints)	Risk of inconsistency
Storage	Efficient	Larger (data duplication)

There is no “perfect” solution. It’s always a trade-off.

Part 3: When Should You Use JSONB?

Use JSONB strategically, not everywhere

Use JSONB for: “Snapshots of the Past”

Example:

A user places an order.

Tomorrow:

Pizza price changes
Product name changes

If you rely on JOINs:

Old orders will show new values and its Wrong

Solution

Store order as JSONB:

Captures exact state at that moment
Keeps historical data correct
Makes reads extremely fast

Benefits

Accurate history
Faster dashboards
Better scalability

Part 4: When to Use Normalization

Normalization is still critical.

Use it for live, changing data.

Examples

User accounts
Passwords
Emails
Inventory
Account balances

Why?

You need a single source of truth

You cannot afford:

Duplicate data
Inconsistent updates

Part 5: The Hybrid Approach (Best of Both Worlds)

At scale, real systems use both approaches together.

Example Architecture

Users → Normalized
Products → Normalized
Orders → JSONB snapshot

Practical performance note

If you frequently filter by fields inside the JSONB (for example user_id or total), store those as first-class columns too:

orders.user_id (bigint)
orders.total (numeric)

Then keep order_details (JSONB) as the immutable snapshot. This gives you indexable scalars for fast filters/ranges and a JSON snapshot for state and flexible reads.

Result

Fast reads
Data consistency
Scalability

Pro Tip: Indexing JSONB

You can still make JSONB queries fast.

CREATE INDEX idx_order_user_name 
ON orders ((order_details->>'user_name'));

What this does:

Fast search inside JSON
Works like a normal column index

Combine GIN indexes for general JSONB containment with expression indexes for frequently filtered scalars.

Mental Model

Normalization = Clean + consistent
JSONB = Fast + flexible

Final Thoughts

Database design is always a balancing act.

Early stage:

Normalize everything

At scale:

Denormalize strategically

The best systems aren’t fully normalized or fully denormalized.
They’re carefully designed hybrids.

Table of Contents

The Scenario: A High-Traffic Food Delivery App

Part 1: Normalization — The Clean Approach

Typical Schema

The Problem with JOINs at Scale

At Small Scale

At Large Scale

The Cost

Mental Model

Part 2: Denormalization (JSONB) — The Fast Approach

What is JSONB?

New Schema

Example Data

The Result

Why It’s Fast

Mental Model

Better Query Example (Real-World)

With Normalization

Problem

With JSONB (use numeric cast for totals)

More robust match using jsonb_path_exists (recommended for arrays of objects)

Add Index for Speed

Result

The Trade-Off (This Is Important)

Part 3: When Should You Use JSONB?

Use JSONB for: “Snapshots of the Past”

Solution

Benefits

Part 4: When to Use Normalization

Examples

Why?

Part 5: The Hybrid Approach (Best of Both Worlds)

Example Architecture

Practical performance note

Result

Pro Tip: Indexing JSONB

Mental Model

Final Thoughts