DEV Community

Cover image for A pianist’s take on Power BI: Schemas & data modelling made musical 🎹
Cyrus Ndungu
Cyrus Ndungu

Posted on

A pianist’s take on Power BI: Schemas & data modelling made musical 🎹

Hi — I’m someone who spends more than a little time at the keyboard. When I arrange a tune I think about structure (intro, verse, chorus, bridge) and how the parts fit together so the melody breathes. Data modelling in Power BI is the same kind of craft: if the foundation is good, the report performs and the insights sing. Below I’ll walk you through schemas, fact & dimension tables, relationships, and why good modelling matters — in plain, friendly language with practical tips you can use right away.


What you’ll get from this article

  • Clear definitions of fact tables, dimension tables, star and snowflake schemas

  • How relationships work in Power BI (direction, cardinality, many-to-many)

  • Why modelling affects performance and correctness (real-world examples)

  • A practical, step-by-step recipe to design a clean Power BI model

  • Quick checklist and troubleshooting tips


Facts and dimensions — the melody and the harmony

Think of a report like a song:

  • The fact table is the melody — the events you measure (sales, clicks, shipments).

  • The dimension tables are the harmonies — the context (dates, customers, products, regions).

Example: a simple sales model

FactSales (the melody)

  • OrderID, OrderLineID, DateKey, CustomerKey, ProductKey, Quantity, Revenue

DimDate (harmony)

  • DateKey, FullDate, Month, Quarter, Year

DimCustomer (harmony)

  • CustomerKey, CustomerName, Segment, Region

DimProduct (harmony)

  • ProductKey, ProductName, Category, Brand

Key characteristics:

  • Fact = many rows, numeric measures, foreign keys to dims.

  • Dimension = relatively few rows, descriptive attributes, primary key.

When fact and dims are aligned by consistent keys and grain, queries are simple and correct.


Star schema — the classic pop song (simple & fast)

A star schema has one central fact table with dimension tables radiating out. It’s the most common and recommended pattern for Power BI.

Visual (ASCII):

DimDate
         |
DimCustomer — FactSales — DimProduct
         |
      DimRegion
Enter fullscreen mode Exit fullscreen mode

Why star schema works well in Power BI

  • Fewer joins → faster queries in VertiPaq (the in-memory engine).

  • DAX measures are simpler because relationships are straightforward.

  • Good for aggregation (rollups by date, product, customer).

  • Easy for report consumers to understand.

When to use it: Most analytics and reporting scenarios where speed and simplicity matter.


Snowflake schema — the classical piece (normalized)

A snowflake schema normalizes dimensions into multiple tables. Example: DimProduct → DimCategory → DimSubcategory.

Why you might choose snowflake:

  • Less redundancy; easier to maintain when attributes change frequently across many items.

  • Smaller dimension tables (sometimes saving storage).

Why it can slow things down in Power BI:

  • More joins increase query complexity and can slow VertiPaq queries.

  • DAX can become more complex when traversing normalized hierarchies.

Rule of thumb: prefer star for analytical models in Power BI. Use snowflake only when normalization gives clear maintenance or governance benefits.


Relationships — the chord progressions of your model

Relationships tell Power BI how tables connect. Important concepts:

  • Cardinality: One-to-many (1:), many-to-one (:1), many-to-many (:). Most common is 1:* (dimension → fact).

  • Cross-filter direction: single or both. Single is safer and faster; both (bidirectional) can be convenient but may introduce ambiguity and performance issues.

  • Active vs inactive relationships: only active relationships filter by default. USERELATIONSHIP in DAX can activate an inactive relationship in a calculation.

  • Relationship keys: use surrogate numeric keys (integers) for best performance; avoid text-based keys for relationships if possible.

Examples (DAX)

  • Basic measure

    Total Revenue = SUM(FactSales[Revenue])

  • Use an alternate date relationship

    Total Sales by Ship Date =
    CALCULATE(
    [Total Revenue],
    USERELATIONSHIP(DimDate[DateKey], FactSales[ShipDateKey])
    )

Many-to-many: use a bridge table or composite model to avoid ambiguous filters and double counting.


Grain matters — set the right level for facts

The “grain” of a fact table defines what a single row represents:

  • Order line (one row per SKU per order)

  • Order header (one row per order)

  • Daily aggregated sales (one row per product per day)

If granularity is inconsistent across tables or measures, you’ll get wrong numbers (double counts, weird averages). Always:

  • Decide the grain early.

  • Keep the fact table at the lowest necessary grain for your reports.

  • Use aggregated tables for faster summary reports if needed.


Performance — why modelling makes or breaks speed

Power BI uses VertiPaq: a columnar, in-memory engine with dictionary encoding and compression. Good modelling optimizes those internals.

Practical performance rules

  • Remove unnecessary columns (they increase memory).

  • Prefer numeric surrogate keys — smaller dictionaries and faster joins.

  • Reduce cardinality where possible (high-cardinality columns are expensive).

  • Use star schema so queries join fewer tables.

  • Keep dimension attributes that you use in visuals; move rarely used attributes to a separate table.

  • Use Import mode for best performance; DirectQuery has runtime dependency on source and limits optimization.

  • Use incremental refresh for large fact tables.

Advanced tools

  • Aggregations: create pre-aggregated summary tables for high-level reports and let Power BI route queries to them.

  • Composite models & Dual storage mode: combine Import and DirectQuery, use dual to optimize lookup tables.

  • VertiPaq Analyzer or Power BI Performance Analyzer to find bottlenecks.

Concrete benefit: a well-designed star model can reduce dataset size drastically and cut query times from minutes to seconds.


Accuracy — avoid the false harmonies

Bad modelling doesn’t just slow you down — it misleads.

Common accuracy pitfalls

  • Duplicate or inconsistent dimension keys (e.g., “John Smith” vs “john smith”) → wrong joins and inflated counts.

  • Mixing granularities in measures (summing order lines then counting orders without DISTINCT) → double counts.

  • Using many-to-many without careful bridging → incorrect aggregations.

  • Relying on bi-directional filters to “fix” an issue — it may mask poor model design.

How to validate

  • Do spot checks: compare totals in your fact table vs a simple SUM in the model.

  • Verify DISTINCTCOUNT(OrderID) between source and model.

  • Use small test measures to assert expected behavior before building complex visuals.


Practical recipe: build a clean Power BI model (step-by-step)

In Power Query: clean & shape

- Remove unused columns.

- Standardize keys and text (trim, clean, proper case).

- Ensure dates are real date types.

- Aggregate if you can reduce grain safely.

- Create surrogate keys if necessary (e.g., ProductKey).
Enter fullscreen mode Exit fullscreen mode
  1. Create a Date table (essential)
- Generate a full date dimension with Year, Quarter, Month, Fiscal columns.

- Mark it as Date table in Model view (Modeling → Mark as Date table).
Enter fullscreen mode Exit fullscreen mode
  1. Load fact(s) and dimension(s)
- FactSales should have foreign keys to dims.

- Ensure keys are the right data type (whole number for IDs).
Enter fullscreen mode Exit fullscreen mode
  1. In Model view:
- Create one-to-many relationships from dimension → fact.

- Set cross-filter to single direction unless you have a specific reason.

- Hide technical key columns from report view.
Enter fullscreen mode Exit fullscreen mode
  1. Create measures (not calculated columns) where possible
- Measures calculate on the fly and are memory efficient.


Total Revenue = SUM(FactSales[Revenue])
Orders = DISTINCTCOUNT(FactSales[OrderID])
Average Order Value = DIVIDE([Total Revenue], [Orders])
Enter fullscreen mode Exit fullscreen mode
  1. Test for correctness
- Compare totals to source extracts.

- Validate a few sample customers/products/dates.
Enter fullscreen mode Exit fullscreen mode
  1. Optimize
- Remove unused columns/tables.

- Consider aggregations for very large datasets.

- Use incremental refresh for historical fact data.
Enter fullscreen mode Exit fullscreen mode

Handling special cases

Slowly Changing Dimensions (SCD)

  • Type 1: overwrite attributes (current view only)

  • Type 2: store history with row-effective dates or version keys — useful when you need historical reporting at the same grain as facts.

Role-playing dimensions

  • Date can be order date, ship date, invoice date. Use separate foreign keys in fact and USERELATIONSHIP for alternate measures.

Many-to-many

  • Use a bridge (junction) table or composite model; avoid ad-hoc bidirectional relationships.

DirectQuery & Composite models

  • DirectQuery keeps data at source (good for real-time but slower).

  • Composite models allow mixing Import and DirectQuery to get the best of both worlds.


Quick checklist — tuneup before publishing

  • Grain of fact table defined and documented

  • Star schema (or justified snowflake) in place

  • Date table present and marked as such

  • Relationships 1:* with single direction (unless required)

  • Numeric surrogate keys used for joins

  • Unused columns removed & hidden from report view

  • Measures created for aggregation (not unnecessary calculated columns)

  • Test totals match source system for several samples

  • Performance validated (Performance Analyzer)


Troubleshooting common issues (quick tips)

  • Wrong totals? Check relationship direction and active relationships.

  • Duplicate counts? Check grain and use DISTINCTCOUNT.

  • Slow visuals? Remove high-cardinality columns from visuals, consider aggregation.

  • Many-to-many confusion? Introduce a bridge table and use measures carefully.


Final chord — why this matters

Good modelling is the sheet music for your data. When you model well:

  • Reports are fast and responsive (your audience stays engaged).

  • Numbers are correct and trustworthy (your stakeholders have confidence).

  • DAX stays readable and maintainable (you can iterate quickly).

  • Future changes are easier — like modulating into a new key without breaking the song.

Start simple: build a clean star schema, treat the date table as sacred, use measures, and optimize only where you need to. As a keyboard player, I know how freeing it feels to have the skeleton of a good chord progression — you can improvise wonders on top. The same is true for your data: solid structure unlocks creativity.

Top comments (0)