SCHEMAS AND DATA MODELLING - POWER BI

#analytics #data #dataengineering #microsoft

Data modelling is a foundational aspect of Power BI that determines how data is structured, related, and queried for analysis. Data modelling defines the structural relationship between tables to ensure query performance and reporting accuracy. This article explores key concepts like star and snowflake schemas, fact and dimension tables, relationships, and why prioritizing good design is essential for optimal performance and reliable reporting.

Understanding Star Schema

The star schema is the standard for Power BI. It utilises a central fact table connected to multiple dimension tables.

Fact Tables: Contain quantitative metrics (e.g., sales, temperature) and foreign keys. They define the model's granularity.
Dimension Tables: Contain descriptive attributes such as business attributes (e.g., product names, dates). They provide the context for filtering and grouping.

Relationships in a star schema are typically one-to-many, flowing from the dimension table to the fact table. This configuration minimizes query complexity. Usually the fact tables represent the "many" while dimension tables represent the "one" aspect in the relationships.

Snowflake Schema

A snowflake schema normalises dimension tables into sub-dimensions (e.g., a product dimension split into separate tables for category, subcategory, and product details). While this reduces data redundancy at the source, it is inefficient for Power BI. This is because it results in more tables, longer filter propagation chains, increased model complexity, and poorer performance due to additional joins.
Snowflake schemas may be useful in specific scenarios (e.g., when source data is heavily normalised or storage is a major constraint)

Performance Impact: Increased table counts and longer filter propagation chains degrade speed.

Recommendation: Denormalise data into single-dimension tables to simplify the semantic model and improve usability.

Fact and Dimension Tables

Fact tables contain measurable data (e.g., sales orders, inventory levels) with foreign keys linking to dimensions and numeric columns for aggregation. They grow over time and define the model's granularity—ensuring consistency is critical to avoid inaccurate summaries.
Dimension tables hold descriptive attributes (e.g., product names, customer details, dates) with a unique key (often a surrogate key for handling changes like slowly changing dimensions). They are smaller and support hierarchies for drilling down in reports

Relationships in Power Bi

Relationships define how tables connect and how filters propagate.
Directionality: Single-direction or active relationships filters are preferred. Bi-directional and many-to-many relationships should be limited as they increase logic complexity and slow queries

Active vs. Inactive: Active relationships are the default path for filter propagation. Role-playing dimensions (e.g., multiple date types) should be handled via separate tables rather than complex inactive relationship chains.

Why Good Modelling Is Critical

Poor data modelling leads to bloated models, slow report rendering, inaccurate results (e.g., from inconsistent granularity), and a confusing user experience. A well-designed star schema optimises compression, reduces query complexity, and scales better for large datasets.
In contrast, flat tables or excessive snowflaking increase redundancy or joins, hurting refresh times and report interactivity. Best practices include using Power Query for transformations, surrogate keys, and avoiding unnecessary columns to keep models lean.
In summary, prioritising star schema principles in Power BI data modelling delivers faster, more reliable insights—making it a cornerstone for any serious BI implementation.

DEV Community