Data Modelling and Scheming in PowerBI

#analytics #data #dataengineering #microsoft

Power BI is a data analytics and data visualization tool that can be useful to generate insights that will be used to make informed business decisions. Since is the source of these insights is the data, it has to be accurate, consistent and relevant in order to generate quality results that are usable in the decision making process. Data modelling and data scheming are concepts used to ensure the data quality is maintained, allowing for analysing and visualizing to be done quickly and efficiently. Data modelling is the process of structuring your data in tables, defining the relationships between the tables and defining calculations done data in the tables in form of expressions and formulas. Data scheming on the other hand is process of coming up with an architectural design pattern of your data tables to ensure analysis of data sourced from the different table is seamless and efficient.

A robust data model by the current standards has to have 2 types of tables:

A fact table
A dimension table

The fact table is the primary and central table in a data model. It used to store quantitative and measurable data like transactions and orders.
A dimensional table is one that has descriptive information about the data. Thereby giving more context about the data.

Relationships

Relationships are conceptual ideas that serve to indicate how data between different tables is related or connected. This is achieved through the use of a primary and a foreign key. The primary is located in dimensional tables while the foreign key is the fact table.
There are several types of relationships:

One-to-Many: This is the most common relationship, where one record in the dimension table relates to multiple records in the fact table.
One-to-One: Where one record in the dimension table relates to one record in the fact table.
Many-to-Many: Where more than one record in a table is related to more than one record in another table.

Schemas

Star Schema
This schema involves have a central table that serves as the fact table and other tables radiate from it and make up the dimensional tables. This organization forms a start like representation hence the name star schema.
Star schema’s strength lay in its simplicity and performance. The performance is guaranteed since few joins are needed to aggregate the data from multiple data tables. For these reasons, start schema stand to be the most widely used and recommended schema in Power BI.
Snowflake Schema

The snowflake schema extends the star schema by introducing more dimension tables. Instead of a single product dimension, you can break it into separate tables for products, categories, and subcategories.
Although this reduces data redundancy and mirrors traditional database design principles, it comes with trade-offs in Power BI like slower queries due to the necessity for more joins. The higher number of joins is as a result of the additional relationship chains that the engine must navigate, potentially slowing down the query performance. Additionally, it is hard to manage as it makes DAX calculations complicated and makes models more difficult for business users to understand.

Why Modeling Quality Matters

Performance: Poor data modelling not only slows down reports but can also produce incorrect results thereby reducing productivity. On the other hand, A well-designed model model minimizes the number of joins required for calculations, significantly speeding up report generation.
Accuracy and Reliability: A well-structured data model can also make automation a seamless process well defined relationships allow Power BI to efficiently handle daily, hourly, or scheduled data refreshes, preventing "timeout errors" common in un-modeled, flat, or chaotic datasets. Clear relationships and distinct fact/dimension tables also prevent calculation errors and ensure that filters apply correctly. Ambiguous relationships or a flat, denormalized table can lead to incorrect aggregations and misleading insights.
Usability and Maintainability: A logical and intuitive data model is easier for report developers to understand and build upon. It simplifies DAX calculations and reduces the likelihood of introducing errors. It also makes the model easier to maintain and extend as business requirements evolve.
Scalability: A well-designed schema can handle growing data volumes without a proportional drop in performance. This is vital as organizations collect more and more data over time.

In conclusion, poorly done data modelling and data scheming can result in reduced productivity arising from slow calculations, inaccurate calculations and delayed reports.

Since data is business specific, it is prudent that one organizes data tables into clear relationships that mirror the actual operations of the business to improve conciseness and reduce ambiguity.

DEV Community

Data Modelling and Scheming in PowerBI

Relationships

Schemas

Why Modeling Quality Matters

Top comments (0)