DEV Community: Vikas Maheshwari

One Big Table vs the star schema: I think everyone's arguing about the wrong thing

Vikas Maheshwari — Wed, 01 Jul 2026 10:53:14 +0000

Every few months the "One Big Table vs star schema" argument flares up again on data Twitter, and every time I watch two groups of smart people talk completely past each other. It took me a while to figure out why — and once I did, the whole debate kind of dissolved.

Short version: they're not disagreeing about modeling. They're optimizing for different things and not saying so out loud.

Let me lay it out.

The two camps

If you've somehow avoided this one so far: One Big Table (OBT) means you flatten your fact and all its dimensions into a single wide, denormalized table. No joins. Every order row already carries the customer name, the product category, the store region, the date attributes — everything, inline.

The star schema keeps facts and dimensions separate and joins them at query time. Fact in the middle, dimension tables around it, assembled per question.

That's the whole fork: OBT joins once, in the pipeline, ahead of time. The star joins every time, at query time.
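
To make that fork concrete, here's a minimal sketch using Python's built-in sqlite3 module. The table names (fact_orders, dim_product, dim_store, obt_orders) and the sample rows are made up for illustration, and SQLite is a row store rather than a columnar warehouse, so this shows the shape of the two approaches and nothing about their performance.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical star: one fact table and two dimensions.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, product_key INTEGER,
                          store_key INTEGER, net_amount REAL);

INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'),
                               (3, 'Desk', 'Furniture');
INSERT INTO dim_store VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_orders VALUES (1, 1, 1, 1200.0), (2, 2, 2, 800.0),
                               (3, 3, 1, 300.0), (4, 1, 2, 1100.0);

-- OBT: the join runs once, in the pipeline, and the flattened result is stored.
CREATE TABLE obt_orders AS
SELECT f.order_id, f.net_amount, p.product_name, p.category, s.region
FROM fact_orders f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_store   s ON f.store_key   = s.store_key;
""")

# Star: the join is repeated every time the question is asked.
star_sql = """
SELECT p.category, SUM(f.net_amount) AS revenue
FROM fact_orders f
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY p.category ORDER BY p.category
"""

# OBT: the same question is a single-table GROUP BY.
obt_sql = """
SELECT category, SUM(net_amount) AS revenue
FROM obt_orders
GROUP BY category ORDER BY category
"""

star_rows = con.execute(star_sql).fetchall()
obt_rows = con.execute(obt_sql).fetchall()
print(star_rows)
print(obt_rows)
```

Both queries return the same rows. The only thing that moved is where the join happens.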

Why OBT people love OBT

And honestly, they're not wrong:

No joins for the analyst. Every question is a single-table SELECT ... GROUP BY. Nobody fat-fingers a join. Self-serve BI users stop filing tickets.
Predictable performance. One wide columnar table scans fast and consistently, which matters a lot if your BI tool generates clumsy join SQL or you've got high dashboard concurrency.
It fits columnar storage. Wide, repetitive tables compress beautifully. "Electronics" repeated a million times costs almost nothing after compression. The old penalty for very wide tables mostly isn't a thing anymore, because a columnar engine reads only the columns a query actually touches.
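
If the compression point feels hand-wavy, here's a toy sketch of dictionary encoding, one of the techniques columnar formats commonly use. The category values and row count are invented, and the byte counts are a simplification: real engines typically layer bit-packing, run-length encoding, and general-purpose compression on top, so treat this as the idea rather than how any particular warehouse stores data.

```python
from array import array

# A low-cardinality column, as it would repeat down one million OBT rows.
categories = ["Electronics", "Furniture", "Grocery", "Toys"]
column = [categories[i % len(categories)] for i in range(1_000_000)]

# Naive cost: every row stores the full string.
raw_bytes = sum(len(value.encode("utf-8")) for value in column)

# Dictionary encoding: store each distinct string once, plus a one-byte code per row.
dictionary = sorted(set(column))
code_of = {value: code for code, value in enumerate(dictionary)}
codes = array("B", (code_of[value] for value in column))

dictionary_bytes = sum(len(value.encode("utf-8")) for value in dictionary)
encoded_bytes = dictionary_bytes + len(codes) * codes.itemsize

# The encoding is lossless: the original column can be rebuilt from the codes.
decoded = [dictionary[code] for code in codes]

ratio = encoded_bytes / raw_bytes
print(raw_bytes, encoded_bytes, round(ratio, 3))
```

Even this crude version cuts the column to a fraction of its naive size, and it does so regardless of how long the repeated strings are, because each row only carries a small code.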

I used to roll my eyes at OBT. I don't anymore. For the right job it's genuinely great.

Why it bites you

The costs are real too; they just show up later, which is the dangerous part:

It's built for the questions you already thought of. New question that needs an attribute you didn't flatten in? That's a pipeline change, not a new query. The star's flexibility is exactly the thing OBT trades away.
History gets awkward. Handling slowly changing dimensions in a flat table is clumsy compared to a proper dimension with validity dates.
Combinatorial sprawl. Because each OBT is shaped for a set of questions, teams build lots of them — one per dashboard — and now "revenue" is computed a dozen slightly-different ways across a dozen wide tables that quietly drift apart. Congrats, you've reinvented the "three different revenue numbers" problem, one big table at a time.
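
On the history point, here's roughly what "a proper dimension with validity dates" looks like: a minimal Type 2 slowly changing dimension sketched in sqlite3. The column names (valid_from, valid_to, customer_id) and the data are assumptions for illustration. Many real models store the dimension's surrogate key on the fact row at load time instead of doing the date-range join shown here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Type 2 dimension: one row per version of a customer, with a validity window.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key, one per version
    customer_id  TEXT,                 -- business key, stable across versions
    region       TEXT,
    valid_from   TEXT,                 -- inclusive, ISO date
    valid_to     TEXT                  -- exclusive, ISO date
);
CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, customer_id TEXT,
                          order_date TEXT, net_amount REAL);

-- Hypothetical history: customer C1 moved from North to South on 2026-03-01.
INSERT INTO dim_customer VALUES
    (1, 'C1', 'North', '2025-01-01', '2026-03-01'),
    (2, 'C1', 'South', '2026-03-01', '9999-12-31');
INSERT INTO fact_orders VALUES
    (1, 'C1', '2026-02-10', 100.0),
    (2, 'C1', '2026-04-05', 250.0);
""")

# Each order picks up the customer version that was valid on the order date.
# ISO-formatted date strings compare correctly as plain text.
as_of_sql = """
SELECT c.region, SUM(f.net_amount) AS revenue
FROM fact_orders f
JOIN dim_customer c
  ON  c.customer_id = f.customer_id
  AND f.order_date >= c.valid_from
  AND f.order_date <  c.valid_to
GROUP BY c.region ORDER BY c.region
"""
revenue_by_region = con.execute(as_of_sql).fetchall()
print(revenue_by_region)
```

The February order lands in North and the April order in South, and the customer's move is recorded in exactly one place. A flat table has to make that same call per row at load time, which is where the clumsiness comes from.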

Here's the comparison I wish someone had handed me earlier:

	Star schema	One Big Table
Joins at query time	Yes	None
Flexibility for new questions	High	Low (rebuild pipeline)
Query simplicity for analysts	Medium	High
Handling SCDs / history	Clean	Awkward
Risk at scale	Complexity	Definitions drift across many OBTs
Best as	Core model	Serving layer

Where I actually landed

The framing that fixed it for me: they're layers, not rivals.

Keep a star schema as your core model — the flexible, governed, single-source-of-truth layer where facts and dimensions live cleanly and history is handled properly. Then, where a specific high-traffic dashboard or a join-averse tool needs it, build a One Big Table on top as a serving layer — a denormalized projection derived from the star.

You get both properties without the trap. The star keeps flexibility and one definition of each metric. The OBT delivers join-free speed to the consumers who benefit. And because the OBT is derived, it can't invent its own private definition of revenue — it inherits the star's.
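
Here's a small sketch of that layering in sqlite3, under some assumptions: the names core_sales and obt_sales are invented, revenue is taken to be gross amount minus discount purely for the example, and the serving layer is a plain view for brevity where a real warehouse would more likely materialize it. The point is the convention that the OBT selects revenue from the core model instead of recomputing it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (sale_id INTEGER PRIMARY KEY, product_key INTEGER,
                          gross_amount REAL, discount REAL);

INSERT INTO dim_product VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO fact_sales VALUES (1, 1, 1000.0, 100.0), (2, 1, 500.0, 0.0), (3, 2, 300.0, 30.0);

-- Core layer: revenue is defined exactly once, here (assumed definition).
CREATE VIEW core_sales AS
SELECT sale_id, product_key, gross_amount - discount AS revenue
FROM fact_sales;

-- Serving layer: a flat projection that selects revenue rather than recomputing it.
CREATE VIEW obt_sales AS
SELECT s.sale_id, s.revenue, p.category
FROM core_sales s
JOIN dim_product p ON s.product_key = p.product_key;
""")

dashboard_sql = """
SELECT category, SUM(revenue) FROM obt_sales GROUP BY category ORDER BY category
"""
before = con.execute(dashboard_sql).fetchall()

# A correction lands in the core model; the serving layer needs no change of its own.
con.execute("UPDATE fact_sales SET discount = 50.0 WHERE sale_id = 2")
after = con.execute(dashboard_sql).fetchall()

print(before)
print(after)
```

The dashboard query never mentions discounts at all. If the definition of revenue changes, it changes in core_sales, and every OBT built this way follows.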

So I stopped asking "star or OBT?" and started asking "which layer am I building right now?" That question usually answers itself.

The one exception: if you genuinely have one dashboard and a small team, an OBT alone might be all you ever need. Don't build a star schema to feed a single report. Match the tool to the actual job, not the tribe.

I wrote a longer version of this with the full reasoning over on my site if you want it: One Big Table vs the Star Schema. It's part of a set of pieces I've been writing on dimensional modeling — the star vs snowflake one and normalization vs denormalization are the closest companions.

But mostly I'm curious about your experience: has anyone here run OBT as the only layer at real scale? What broke first — the rebuild-for-every-new-question cost, or the "wait, why does each dashboard have a different revenue number" problem? Or did it just... work, and I'm overthinking it?

Star Schema vs Snowflake Schema: Which to Use and When

Vikas Maheshwari — Thu, 25 Jun 2026 17:37:31 +0000

The difference between a star schema and a snowflake schema is smaller than the
debate around it suggests. Both are dimensional models — a central fact table
surrounded by dimensions — and the entire distinction is one decision: do you
keep each dimension in a single flat table (star), or normalize it into related
sub-tables (snowflake)? For analytics on a modern cloud warehouse, the star is
almost always the better default. Here's why, with a worked example and a diagram.

Star vs snowflake, at a glance

	Star schema	Snowflake schema
Dimensions	Denormalized — one flat table each	Normalized into sub-tables
Joins per query	Fewer (fact → dimension)	More (fact → dimension → sub-tables)
Query simplicity	High — easy to read and write	Lower — must traverse the hierarchy
Storage	Slightly more (repeated values)	Slightly less (values stored once)
Query speed (columnar)	Usually faster	Usually slower
Maintenance	Simpler	More tables to keep in sync
Best for	Most analytics on cloud warehouses	Very large or compliance-bound dimensions

The one real difference

In a star schema, each dimension is a single, wide, denormalized table — the
product dimension holds the product, its category, its brand, and its supplier all in
one place, even though "Electronics" repeats across many rows. In a snowflake
schema, you normalize that dimension into a branching hierarchy: product points to a
separate category table, which points to a department table, and so on. The single
dimension "snowflakes" out into smaller related tables, which is where the name comes
from.

                STAR SCHEMA                              SNOWFLAKE SCHEMA

                  dim_date                                    dim_date
                     |                                           |
 dim_customer -- fact_sales -- dim_product   dim_customer -- fact_sales -- dim_product
                     |                                           |              |
                 dim_store                                   dim_store     (category)
                                                                                |
                                                                          (department)

 Dimensions sit directly on                  A dimension (product) is normalized
 the fact table.                             into further sub-tables.

If you understand why dimensional models split measurements from context, you
already understand both: snowflaking is just normalization applied to the dimension
tables.

A worked example

Say you want sales by product category. In a star, category lives right on the
product dimension, so it's one join:

-- STAR: one join, category is on the dimension
SELECT p.category, SUM(f.net_amount) AS revenue
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY p.category;

In a snowflake, category has been normalized into its own table, so the same
question now traverses the hierarchy:

-- SNOWFLAKE: an extra hop to reach category
SELECT c.category, SUM(f.net_amount) AS revenue
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category;

Every level of normalization is another join the analyst must write and the engine
must execute. Multiply that across a real schema and the snowflake's "tidiness"
becomes a steady tax on every query.
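
To check that the two queries really are the same question, here is a minimal sketch that loads both shapes into an in-memory SQLite database and runs them side by side. The sample rows are invented, and the snowflaked product table is named dim_product_sf (a hypothetical name) only so both variants can coexist in one database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_key INTEGER, net_amount REAL);
INSERT INTO fact_sales VALUES (1, 1, 1200.0), (2, 2, 800.0), (3, 3, 300.0);

-- Star: category sits directly on the product dimension.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');

-- Snowflake: category is normalized out into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product_sf (product_key INTEGER PRIMARY KEY, product_name TEXT,
                             category_key INTEGER);
INSERT INTO dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
INSERT INTO dim_product_sf VALUES (1, 'Laptop', 10), (2, 'Phone', 10), (3, 'Desk', 20);
""")

# One join: fact -> dimension.
star = con.execute("""
SELECT p.category, SUM(f.net_amount) AS revenue
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY p.category ORDER BY p.category
""").fetchall()

# Two joins: fact -> dimension -> sub-table.
snowflake = con.execute("""
SELECT c.category, SUM(f.net_amount) AS revenue
FROM fact_sales f
JOIN dim_product_sf p ON f.product_key = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star)
print(snowflake)
```

The answers are identical. What differs is how many tables the analyst has to know about and how many joins the engine has to run to get there.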

When to use a star schema

For analytics on a columnar cloud warehouse — which is most analytics today —
default to the star. Denormalize your dimensions. The storage cost is negligible
because columnar engines compress repeated values away to almost nothing, queries are
dramatically simpler, and performance is typically better than the snowflake, not
worse. Optimizing for storage by normalizing is solving a 1998 problem with a 2026
bill.

When to use a snowflake schema

Reach for snowflaking only in specific cases, and even then only for the dimension
that needs it:

A dimension is genuinely enormous (tens of millions of rows) and a shared attribute is large and highly repetitive, so the storage saving is material.
A rapidly changing shared attribute is meaningfully cheaper and safer to update in one normalized place.
A compliance or governance rule forces a single authoritative table for an entity.
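
The second case is the easiest to see in code. The sketch below assumes a hypothetical supplier_rating attribute shared by 1,000 products from one supplier, and counts how many rows an update touches in each design. Table and column names are illustrative only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Denormalized (star): the supplier's rating is repeated on every product row.
CREATE TABLE dim_product_flat (product_key INTEGER PRIMARY KEY,
                               supplier_name TEXT, supplier_rating TEXT);

-- Normalized (snowflaked): the rating lives once, on the supplier.
CREATE TABLE dim_supplier (supplier_key INTEGER PRIMARY KEY,
                           supplier_name TEXT, supplier_rating TEXT);
CREATE TABLE dim_product_sf (product_key INTEGER PRIMARY KEY, supplier_key INTEGER);
INSERT INTO dim_supplier VALUES (1, 'Acme', 'B');
""")

# 1,000 products, all from the same supplier.
con.executemany("INSERT INTO dim_product_flat VALUES (?, 'Acme', 'B')",
                [(k,) for k in range(1, 1001)])
con.executemany("INSERT INTO dim_product_sf VALUES (?, 1)",
                [(k,) for k in range(1, 1001)])

# The same business change, applied to each design; rowcount reports rows modified.
flat_touched = con.execute(
    "UPDATE dim_product_flat SET supplier_rating = 'A' WHERE supplier_name = 'Acme'"
).rowcount
normalized_touched = con.execute(
    "UPDATE dim_supplier SET supplier_rating = 'A' WHERE supplier_name = 'Acme'"
).rowcount

print(flat_touched, normalized_touched)
```

One row versus a thousand is trivial at this size. It starts to matter when the dimension has tens of millions of rows and the attribute changes often, which is exactly the narrow case described above.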

Mixing is fine — a mostly-star model with one snowflaked dimension is a perfectly
reasonable, pragmatic design. You don't owe the schema purity.

The thing underneath the choice

"Star vs snowflake" is really a proxy for an older question: normalize for
write-efficiency, or denormalize for read-efficiency? A warehouse is overwhelmingly
read-heavy — written by a few pipelines, queried by everyone — so it should optimize
for reads, which means denormalizing, which means the star. (If you want the deeper
version of that trade-off, see normalization vs denormalization; if you want the even
more aggressive end of denormalization, see one big table vs the star schema.)

Pick the star by default. Snowflake a dimension only when you can name the specific
problem it solves. And don't lose an afternoon to the debate — it was only ever one
decision wearing two names.

FAQ

What is the difference between a star schema and a snowflake schema?
A star schema keeps each dimension in a single flat, denormalized table. A snowflake schema normalizes those dimensions into multiple related sub-tables. That one choice — denormalized versus normalized dimensions — is the entire distinction; the fact table is the same in both.

Which is faster, star schema or snowflake schema?
On modern columnar warehouses, usually the star. Denormalized dimensions mean fewer joins at query time, and columnar compression shrinks the repeated values that normalization was meant to eliminate, so the snowflake's storage saving rarely outweighs its extra join cost.

When should you use a snowflake schema?
When a dimension is genuinely enormous and a shared attribute is large and highly repetitive, when a rapidly changing shared attribute is cheaper to update in one normalized place, or when a compliance rule forces a single authoritative table. Even then, snowflake only the dimension that needs it.

Is the snowflake schema related to the Snowflake data warehouse?
No. The schema pattern is decades older than the vendor and unrelated to it — you can build star or snowflake schemas on any warehouse, including Snowflake, BigQuery, or Redshift.

This post was originally published on dataarchitect.studio, where I write about data architecture, dimensional modeling, and the lakehouse.