Star Schema vs Snowflake Schema: Which to Use and When

#sql #database #datascience #dataengineering

The difference between a star schema and a snowflake schema is smaller than the
debate around it suggests. Both are dimensional models — a central fact table
surrounded by dimensions — and the entire distinction is one decision: do you
keep each dimension in a single flat table (star), or normalize it into related
sub-tables (snowflake)? For analytics on a modern cloud warehouse, the star is
almost always the better default. Here's why, with a worked example and a diagram.

Star vs snowflake, at a glance

	Star schema	Snowflake schema
Dimensions	Denormalized — one flat table each	Normalized into sub-tables
Joins per query	Fewer (fact → dimension)	More (fact → dimension → sub-tables)
Query simplicity	High — easy to read and write	Lower — must traverse the hierarchy
Storage	Slightly more (repeated values)	Slightly less (values stored once)
Query speed (columnar)	Usually faster	Usually slower
Maintenance	Simpler	More tables to keep in sync
Best for	Most analytics on cloud warehouses	Very large or compliance-bound dimensions

The one real difference

In a star schema, each dimension is a single, wide, denormalized table — the
product dimension holds the product, its category, its brand, and its supplier all in
one place, even though "Electronics" repeats across many rows. In a snowflake
schema, you normalize that dimension into a branching hierarchy: product points to a
separate category table, which points to a department table, and so on. The single
dimension "snowflakes" out into smaller related tables, which is where the name comes
from.

        STAR SCHEMA                          SNOWFLAKE SCHEMA

         dim_date                                dim_date
            |                                        |
 dim_customer — fact_sales — dim_product   dim_customer — fact_sales — dim_product
            |                                        |                    |
         dim_store                                dim_store          (category)
                                                                         |
                                                                      (brand)

 Dimensions sit directly on        A dimension (product) is normalized
 the fact table.                    into further sub-tables.

If you understand why dimensional models split measurements from
context, you already understand both —
snowflaking is just normalization applied
to the dimension tables.

A worked example

Say you want sales by product category. In a star, category lives right on the
product dimension, so it's one join:

-- STAR: one join, category is on the dimension
SELECT p.category, SUM(f.net_amount) AS revenue
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY p.category;

In a snowflake, category has been normalized into its own table, so the same
question now traverses the hierarchy:

-- SNOWFLAKE: an extra hop to reach category
SELECT c.category, SUM(f.net_amount) AS revenue
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category;

Every level of normalization is another join the analyst must write and the engine
must execute. Multiply that across a real schema and the snowflake's "tidiness"
becomes a steady tax on every query.

When to use a star schema

For analytics on a columnar cloud warehouse — which is most analytics today —
default to the star. Denormalize your dimensions. The storage cost is negligible
because columnar engines compress repeated values away to almost nothing, queries are
dramatically simpler, and performance is typically better than the snowflake, not
worse. Optimizing for storage by normalizing is solving a 1998 problem with a 2026
bill.

When to use a snowflake schema

Reach for snowflaking only in specific cases, and even then only for the dimension
that needs it:

A dimension is genuinely enormous (tens of millions of rows) and a shared attribute is large and highly repetitive, so the storage saving is material.
A rapidly changing shared attribute is meaningfully cheaper and safer to update in one normalized place.
A compliance or governance rule forces a single authoritative table for an entity.

Mixing is fine — a mostly-star model with one snowflaked dimension is a perfectly
reasonable, pragmatic design. You don't owe the schema purity.

The thing underneath the choice

"Star vs snowflake" is really a proxy for an older question: normalize for
write-efficiency, or denormalize for read-efficiency? A warehouse is overwhelmingly
read-heavy — written by a few pipelines, queried by everyone — so it should optimize
for reads, which means denormalizing, which means the star. (If you want the deeper
version of that trade-off, see normalization vs
denormalization; if you want the even more
aggressive end of denormalization, see one big table vs the star
schema.)

Pick the star by default. Snowflake a dimension only when you can name the specific
problem it solves. And don't lose an afternoon to the debate — it was only ever one
decision wearing two names.

FAQ

What is the difference between a star schema and a snowflake schema?
A star schema keeps each dimension in a single flat, denormalized table. A snowflake schema normalizes those dimensions into multiple related sub-tables. That one choice — denormalized versus normalized dimensions — is the entire distinction; the fact table is the same in both.

Which is faster, star schema or snowflake schema?
On modern columnar warehouses, usually the star. Denormalized dimensions mean fewer joins at query time, and columnar compression shrinks the repeated values that normalization was meant to eliminate, so the snowflake's storage saving rarely outweighs its extra join cost.

When should you use a snowflake schema?
When a dimension is genuinely enormous and a shared attribute is large and highly repetitive, when a rapidly changing shared attribute is cheaper to update in one normalized place, or when a compliance rule forces a single authoritative table. Even then, snowflake only the dimension that needs it.

Is the snowflake schema related to the Snowflake data warehouse?
No. The schema pattern is decades older than the vendor and unrelated to it — you can build star or snowflake schemas on any warehouse, including Snowflake, BigQuery, or Redshift.

This post was originally published on dataarchitect.studio, where I write about data architecture, dimensional modeling, and the lakehouse.