DEV Community

Cover image for ⭐ Star vs. ❄️ Snowflake: Designing the Data Warehouse
De' Clerke
De' Clerke

Posted on • Edited on

⭐ Star vs. ❄️ Snowflake: Designing the Data Warehouse

Once you have a data warehouse, how you organize data inside it determines whether analysts can write their own queries or need a data engineer for every report. This is where dimensional modeling comes in, and the two dominant approaches are the star schema and the snowflake schema.


The Building Blocks: Facts and Dimensions

Every dimensional model starts with the same two concepts.

Fact tables store measurable, quantitative events. Think trades, sales, page views, transactions. They're wide, contain many rows, and hold foreign keys that link to dimension tables.

Dimension tables store the descriptive context. They answer who, what, where, and when. Think symbols, sectors, dates, locations.


Star Schema

In a star schema, each dimension table connects directly to the fact table with a single join. There are no relationships between dimension tables. The shape looks like a star with the fact table at the center.

-- Fact table: one row per trade
CREATE TABLE fact_trades (
    trade_id    BIGSERIAL PRIMARY KEY,
    symbol_id   INT REFERENCES dim_symbol(symbol_id),
    date_id     INT REFERENCES dim_date(date_id),
    close_price NUMERIC(12, 4),
    volume      BIGINT,
    market_cap  NUMERIC(18, 2)
);

-- Dimension: symbol (sector is embedded directly — denormalized)
CREATE TABLE dim_symbol (
    symbol_id   SERIAL PRIMARY KEY,
    ticker      VARCHAR(20),
    company     VARCHAR(100),
    sector      VARCHAR(50),
    market      VARCHAR(20)
);

-- Dimension: date (pre-populated calendar table)
CREATE TABLE dim_date (
    date_id    SERIAL PRIMARY KEY,
    trade_date DATE,
    year       INT,
    quarter    INT,
    month      INT,
    week       INT,
    is_weekday BOOLEAN
);
Enter fullscreen mode Exit fullscreen mode

A query joining these tables needs exactly two joins regardless of how many dimensions you add:

SELECT
    s.sector,
    d.year,
    SUM(f.volume)       AS total_volume,
    AVG(f.close_price)  AS avg_price
FROM fact_trades f
JOIN dim_symbol s ON f.symbol_id = s.symbol_id
JOIN dim_date   d ON f.date_id   = d.date_id
WHERE d.year = 2025
GROUP BY s.sector, d.year
ORDER BY total_volume DESC;
Enter fullscreen mode Exit fullscreen mode

That simplicity is the main advantage. BI tools like Tableau and Power BI are optimized for exactly this join pattern. Non-technical analysts can build reports on a star schema without understanding normalization.


Snowflake Schema

The snowflake schema extends the star schema by normalizing the dimension tables. Instead of embedding sector directly in dim_symbol, you break it out into its own table and reference it with a foreign key.

-- dim_symbol references dim_sector instead of embedding it
CREATE TABLE dim_symbol (
    symbol_id SERIAL PRIMARY KEY,
    ticker    VARCHAR(20),
    company   VARCHAR(100),
    sector_id INT REFERENCES dim_sector(sector_id),
    market    VARCHAR(20)
);

-- dim_sector is now a separate normalized table
CREATE TABLE dim_sector (
    sector_id   SERIAL PRIMARY KEY,
    sector_name VARCHAR(50),
    industry    VARCHAR(100)
);
Enter fullscreen mode Exit fullscreen mode

The same query now requires three joins:

SELECT
    sec.sector_name,
    d.year,
    SUM(f.volume)      AS total_volume,
    AVG(f.close_price) AS avg_price
FROM fact_trades f
JOIN dim_symbol s   ON f.symbol_id  = s.symbol_id
JOIN dim_sector sec ON s.sector_id  = sec.sector_id
JOIN dim_date   d   ON f.date_id    = d.date_id
WHERE d.year = 2025
GROUP BY sec.sector_name, d.year
ORDER BY total_volume DESC;
Enter fullscreen mode Exit fullscreen mode

The extra join is the tradeoff for cleaner data. If a sector name changes, you update one row in dim_sector. In the star schema, you'd update every row in dim_symbol that references that sector.


Comparison

Feature Star Schema Snowflake Schema
Join depth One join per dimension Multiple joins per normalized chain
Data redundancy Higher (denormalized) Lower (normalized)
Query speed Generally faster Can be slower at scale
Ease for analysts High Lower
Maintenance Bulk updates if dimension attributes change Single-row updates
Storage Higher Lower

Which One to Choose

Star schema works well for most analytical workloads. The query simplicity matters, especially when analysts write their own reports. Most dbt projects and self-serve analytics warehouses use star schema because fewer joins mean fewer mistakes.

Snowflake schema makes sense when dimension tables are very large and storage is a real cost concern, or when dimension attributes change frequently and you can't afford bulk updates. It also fits better when data integrity needs to be enforced strictly at the database level rather than in application code.

In practice, most modern data stacks land on star schema by default and normalize specific dimensions only when there's a clear reason. The storage savings from normalization are rarely significant compared to the operational cost of more complex queries.


Follow me on dev.to for more data engineering content, or browse the project code at github.com/declerke.

Top comments (0)