Cyrus Ndungu

Posted on Feb 1

A pianist’s take on Power BI: Schemas & data modelling made musical 🎹

#powerbi #data

Hi — I’m someone who spends more than a little time at the keyboard. When I arrange a tune I think about structure (intro, verse, chorus, bridge) and how the parts fit together so the melody breathes. Data modelling in Power BI is the same kind of craft: if the foundation is good, the report performs and the insights sing. Below I’ll walk you through schemas, fact & dimension tables, relationships, and why good modelling matters — in plain, friendly language with practical tips you can use right away.

What you’ll get from this article

Clear definitions of fact tables, dimension tables, star and snowflake schemas
How relationships work in Power BI (direction, cardinality, many-to-many)
Why modelling affects performance and correctness (real-world examples)
A practical, step-by-step recipe to design a clean Power BI model
Quick checklist and troubleshooting tips

Facts and dimensions — the melody and the harmony

Think of a report like a song:

The fact table is the melody — the events you measure (sales, clicks, shipments).
The dimension tables are the harmonies — the context (dates, customers, products, regions).

Example: a simple sales model

FactSales (the melody)

OrderID, OrderLineID, DateKey, CustomerKey, ProductKey, Quantity, Revenue

DimDate (harmony)

DateKey, FullDate, Month, Quarter, Year

DimCustomer (harmony)

CustomerKey, CustomerName, Segment, Region

DimProduct (harmony)

ProductKey, ProductName, Category, Brand

Key characteristics:

Fact = many rows, numeric measures, foreign keys to dims.
Dimension = relatively few rows, descriptive attributes, primary key.

When fact and dims are aligned by consistent keys and grain, queries are simple and correct.

Star schema — the classic pop song (simple & fast)

A star schema has one central fact table with dimension tables radiating out. It’s the most common and recommended pattern for Power BI.

Visual (ASCII):

DimDate
         |
DimCustomer — FactSales — DimProduct
         |
      DimRegion

Why star schema works well in Power BI

Fewer joins → faster queries in VertiPaq (the in-memory engine).
DAX measures are simpler because relationships are straightforward.
Good for aggregation (rollups by date, product, customer).
Easy for report consumers to understand.

When to use it: Most analytics and reporting scenarios where speed and simplicity matter.

Snowflake schema — the classical piece (normalized)

A snowflake schema normalizes dimensions into multiple tables. Example: DimProduct → DimCategory → DimSubcategory.

Why you might choose snowflake:

Less redundancy; easier to maintain when attributes change frequently across many items.
Smaller dimension tables (sometimes saving storage).

Why it can slow things down in Power BI:

More joins increase query complexity and can slow VertiPaq queries.
DAX can become more complex when traversing normalized hierarchies.

Rule of thumb: prefer star for analytical models in Power BI. Use snowflake only when normalization gives clear maintenance or governance benefits.

Relationships — the chord progressions of your model

Relationships tell Power BI how tables connect. Important concepts:

Cardinality: One-to-many (1:), many-to-one (:1), many-to-many (:). Most common is 1:* (dimension → fact).
Cross-filter direction: single or both. Single is safer and faster; both (bidirectional) can be convenient but may introduce ambiguity and performance issues.
Active vs inactive relationships: only active relationships filter by default. USERELATIONSHIP in DAX can activate an inactive relationship in a calculation.
Relationship keys: use surrogate numeric keys (integers) for best performance; avoid text-based keys for relationships if possible.

Examples (DAX)

Basic measure

Total Revenue = SUM(FactSales[Revenue])
Use an alternate date relationship

Total Sales by Ship Date =
CALCULATE(
[Total Revenue],
USERELATIONSHIP(DimDate[DateKey], FactSales[ShipDateKey])
)

Many-to-many: use a bridge table or composite model to avoid ambiguous filters and double counting.

Grain matters — set the right level for facts

The “grain” of a fact table defines what a single row represents:

Order line (one row per SKU per order)
Order header (one row per order)
Daily aggregated sales (one row per product per day)

If granularity is inconsistent across tables or measures, you’ll get wrong numbers (double counts, weird averages). Always:

Decide the grain early.
Keep the fact table at the lowest necessary grain for your reports.
Use aggregated tables for faster summary reports if needed.

Performance — why modelling makes or breaks speed

Power BI uses VertiPaq: a columnar, in-memory engine with dictionary encoding and compression. Good modelling optimizes those internals.

Practical performance rules

Remove unnecessary columns (they increase memory).
Prefer numeric surrogate keys — smaller dictionaries and faster joins.
Reduce cardinality where possible (high-cardinality columns are expensive).
Use star schema so queries join fewer tables.
Keep dimension attributes that you use in visuals; move rarely used attributes to a separate table.
Use Import mode for best performance; DirectQuery has runtime dependency on source and limits optimization.
Use incremental refresh for large fact tables.

Advanced tools

Aggregations: create pre-aggregated summary tables for high-level reports and let Power BI route queries to them.
Composite models & Dual storage mode: combine Import and DirectQuery, use dual to optimize lookup tables.
VertiPaq Analyzer or Power BI Performance Analyzer to find bottlenecks.

Concrete benefit: a well-designed star model can reduce dataset size drastically and cut query times from minutes to seconds.

Accuracy — avoid the false harmonies

Bad modelling doesn’t just slow you down — it misleads.

Common accuracy pitfalls

Duplicate or inconsistent dimension keys (e.g., “John Smith” vs “john smith”) → wrong joins and inflated counts.
Mixing granularities in measures (summing order lines then counting orders without DISTINCT) → double counts.
Using many-to-many without careful bridging → incorrect aggregations.
Relying on bi-directional filters to “fix” an issue — it may mask poor model design.

How to validate

Do spot checks: compare totals in your fact table vs a simple SUM in the model.
Verify DISTINCTCOUNT(OrderID) between source and model.
Use small test measures to assert expected behavior before building complex visuals.

Practical recipe: build a clean Power BI model (step-by-step)

In Power Query: clean & shape

- Remove unused columns.

- Standardize keys and text (trim, clean, proper case).

- Ensure dates are real date types.

- Aggregate if you can reduce grain safely.

- Create surrogate keys if necessary (e.g., ProductKey).

Create a Date table (essential)

- Generate a full date dimension with Year, Quarter, Month, Fiscal columns.

- Mark it as Date table in Model view (Modeling → Mark as Date table).

Load fact(s) and dimension(s)

- FactSales should have foreign keys to dims.

- Ensure keys are the right data type (whole number for IDs).

In Model view:

- Create one-to-many relationships from dimension → fact.

- Set cross-filter to single direction unless you have a specific reason.

- Hide technical key columns from report view.

Create measures (not calculated columns) where possible

- Measures calculate on the fly and are memory efficient.


Total Revenue = SUM(FactSales[Revenue])
Orders = DISTINCTCOUNT(FactSales[OrderID])
Average Order Value = DIVIDE([Total Revenue], [Orders])

Test for correctness

- Compare totals to source extracts.

- Validate a few sample customers/products/dates.

Optimize

- Remove unused columns/tables.

- Consider aggregations for very large datasets.

- Use incremental refresh for historical fact data.

Handling special cases

Slowly Changing Dimensions (SCD)

Type 1: overwrite attributes (current view only)
Type 2: store history with row-effective dates or version keys — useful when you need historical reporting at the same grain as facts.

Role-playing dimensions

Date can be order date, ship date, invoice date. Use separate foreign keys in fact and USERELATIONSHIP for alternate measures.

Many-to-many

Use a bridge (junction) table or composite model; avoid ad-hoc bidirectional relationships.

DirectQuery & Composite models

DirectQuery keeps data at source (good for real-time but slower).
Composite models allow mixing Import and DirectQuery to get the best of both worlds.

Quick checklist — tuneup before publishing

Grain of fact table defined and documented
Star schema (or justified snowflake) in place
Date table present and marked as such
Relationships 1:* with single direction (unless required)
Numeric surrogate keys used for joins
Unused columns removed & hidden from report view
Measures created for aggregation (not unnecessary calculated columns)
Test totals match source system for several samples
Performance validated (Performance Analyzer)

Troubleshooting common issues (quick tips)

Wrong totals? Check relationship direction and active relationships.
Duplicate counts? Check grain and use DISTINCTCOUNT.
Slow visuals? Remove high-cardinality columns from visuals, consider aggregation.
Many-to-many confusion? Introduce a bridge table and use measures carefully.

Final chord — why this matters

Good modelling is the sheet music for your data. When you model well:

Reports are fast and responsive (your audience stays engaged).
Numbers are correct and trustworthy (your stakeholders have confidence).
DAX stays readable and maintainable (you can iterate quickly).
Future changes are easier — like modulating into a new key without breaking the song.

Start simple: build a clean star schema, treat the date table as sacred, use measures, and optimize only where you need to. As a keyboard player, I know how freeing it feels to have the skeleton of a good chord progression — you can improvise wonders on top. The same is true for your data: solid structure unlocks creativity.

DEV Community

A pianist’s take on Power BI: Schemas & data modelling made musical 🎹

Top comments (0)