Schemas & Data Modelling in Power BI

#tutorial #productivity #discuss #learning

![ ](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4em6iuxplrbtrwmusi74.jpg

Data Modelling in Power BI

Data modelling is the process of structuring your tables and defining relationships so Power BI can:

Aggregate data correctly

Filter data efficiently

Produce accurate measures

Perform fast, even with large datasets

Think of it as designing the blueprint of your report before decorating it with visuals.

Star Schema Overview

Star schema is a mature modeling approach widely adopted by relational data warehouses.

It requires modelers to classify their model tables as either dimension or fact.

Dimension Tables

Dimension tables describe business entities—the things you model. Entities can include products, people, places, and concepts, including time itself.
The most consistent table in a star schema is the date dimension table. A dimension table contains a key column (or columns) that acts as a unique identifier, and other columns. Other columns support filtering and grouping your data.

Fact Tables

Fact tables store observations or events, and can be sales orders, stock balances, exchange rates, temperatures, and more.
A fact table contains dimension key columns that relate to dimension tables and numeric measure columns.
The dimension key columns determine the dimensionality of a fact table, while the dimension key values determine the granularity of a fact table. For example, consider a fact table designed to store sales targets that has two dimension key columns, Date and ProductKey.
It's easy to understand that the table has two dimensions. The granularity, however, can't be determined without considering the dimension key values.
In this example, consider that the values stored in the Date column are the first day of each month.
In this case, the granularity is at the month-product level.

Generally, dimension tables contain a relatively small number of rows. Fact tables, on the other hand, can contain many rows and continue to grow over time.

Fig. 1.1

Normalization vs. denormalization

To understand some star schema concepts described in this article, it's important to know two terms: normalization and denormalization.

Normalization is the term used to describe data that's stored in a way that reduces repetitious data. Consider a table of products that has a unique key value column, like the product key, and other columns that describe product characteristics, like product name, category, color, and size. A sales table is considered normalized when it stores only keys, like the product key. In the following image, notice that only the ProductKey column records the product.

Fig. 1.2

If, however, the sales table stores product details beyond the key, it's considered denormalized. In the following image, notice that the ProductKey and other product-related columns record the product.

Fig. 1.3

When you source data from an export file or data extract, it's likely that it represents a denormalized set of data. In this case, use Power Query to transform and shape the source data into multiple normalized tables.

As described in this article, you should strive to develop optimized Power BI semantic models with tables that represent normalized fact and dimension data. However, there's one exception where a snowflake dimension might be denormalized in order to produce a single model table.

Star schema relevance to Power BI semantic models

Star schema design and many related concepts introduced in this article are highly relevant to developing Power BI models that are optimized for performance and usability.

Consider that each Power BI report visual generates a query that's sent to the Power BI semantic model. Generally, queries filter, group, and summarize model data. A well-designed model, then, is one that provides tables for filtering and grouping, and tables for summarizing. This design fits well with star schema principles:

Dimension tables enable filtering and grouping.

Fact tables enable summarization.

There's no table property that modelers set to set the table type as dimension or fact. It's in fact determined by the model relationships. A model relationship establishes a filter propagation path between two tables, and it's the cardinality property of the relationship that determines the table type. A common relationship cardinality is one-to-many or its inverse many-to-one. The "one" side is always a dimension table while the "many" side is always a fact table.

Fig. 1.4

A well-structured model design includes tables that are either dimension tables or fact tables. Avoid mixing the two types together for a single table. We also recommend that you strive to deliver the right number of tables with the right relationships in place. It's also important that fact tables always load data at a consistent grain.

Lastly, it's important to understand that optimal model design is part science and part art. Sometimes you can break with good guidance when it makes sense to do so.

Snowflake Dimension

A snowflake dimension is a set of normalized tables for a single business entity. For example, Adventure Works classifies products by category and subcategory. Products are assigned to subcategories, and subcategories are in turn assigned to categories. In the Adventure Works relational data warehouse, the product dimension is normalized and stored in three related tables

Fig. 1.5
If you use your imagination, you can picture the normalized tables positioned outwards from the fact table, forming a snowflake design.

Fig. 1.6
In Power BI Desktop, you can choose to mimic a snowflake dimension design (perhaps because your source data does) or combine the source tables to form a single, denormalized model table. Generally, the benefits of a single model table outweigh the benefits of multiple model tables. The most optimal decision can depend on the volumes of data and the usability requirements for the model.

When you choose to mimic a snowflake dimension design:

Power BI loads more tables, which is less efficient from storage and performance perspectives. These tables must include columns to support model relationships, and it can result in a larger model size.
Longer relationship filter propagation chains need to be traversed, which might be less efficient than filters applied to a single table.
The Data pane presents more model tables to report authors, which can result in a less intuitive experience, especially when snowflake dimension tables contain only one or two columns.
It's not possible to create a hierarchy that comprises columns from more than one table.
When you choose to integrate into a single model table, you can also define a hierarchy that encompasses the highest and lowest grain of the dimension. Possibly, the storage of redundant denormalized data can result in increased model storage size, particularly for large dimension tables.

Fig 1.7