How to Implement Data Modelling in Power BI

#dataengineering #datascience #bigdata #beginners

In this article we will explore the fundamental concepts of data modelling and specifically how to implement data modelling within Power BI for effective data analysis.

Data Modelling is simply structuring or setting up your data in tables, relationships, and access to data for analysis scenarios.

Often you'll find data distributed in several sheets or systems since, that is how it's maintained. If lucky your data could be in a neat setup (Data Warehouse) where most of the structuring has already been done.

Case Scenario:

Here is some sample data, to help make sense of this concept:
Sample Chocolate Sales Data

Understanding the Tables

In our case, we have 5 tables.

Fact Table: One table that contains pointers to other tables, but has no specific information. In our case, the shipments table.
Dimension Tables: Each explains one dimension of data. What is happening from the perspective of each dimension/entity. These will be the rest of our tables.
Calendar Table: A special Dimension table found in most data that holds the time component of analysis, i.e., [Timeseries, Forecasting or Trend analysis].

Doing this in Power BI

Load Data: Load the data in Power BI by selecting Excel Workbook as the data source.
Automatic Modelling: Power BI automatically tries to match the tables and create a model; you can see this in the Model View.
Manual Relationship Setup: 🤔 However, not all the data was modelled automatically. To finish up, drag the 'Shipdate' field from the Shipment table onto 'cal_date' in the Calendar table to create a new relationship and complete the look.

Note: Notice that the schema now looks Star-shaped. This kind of model with one fact table and multiple dimension tables is called a Star Schema. The process of setting up this structure is Data Modelling.
Manage Relationships: You can view and edit all connections by clicking on Manage Relationships. This menu shows the relationship status, specifically whether they are Active or Inactive.

Table Characteristics

Relationship Design: The Products table shows each product in a row with more information from the Shipments table indicating a Many to One Relationship.
Fact Table Design: The shipments table has more records as compared to the dimension tables in our data.

Therefore, Fact tables are engineered to host many records and may also have fewer columns as compared to dimension tables.

Conclusion

Effective data modelling is the backbone of any powerful analysis. By transforming messy, multi-sheet data into a structured Star Schema, everything will feel fast and intuitive; when it's wrong, you spend more time fixing "broken" numbers than actually analyzing data.

DEV Community