Mastering Data Modeling in Power BI: Schemas, Tables, and Relationships for Optimal Performance

#schemas #datascience #data

Currently, the world has become inseparable from data and that means we have to adopt and align ourselves too to survive. In order for us to survive we need to know what data is and what tools are used to manipulate data in the most efficient ways.
Enterprises all over are moving towards data, everybody wants to know what data is needed, where do we get it from and why is it important?
Well I joined data science to get these answers too, and when I began the course were introduce to a tool called Power BI.

So what is POWER BI??
In simple terms, this is a Business Intelligence tool-this is what "BI" in "power BI" stands for.
The term ‘Business Intelligence’ ascribes to technologies, software, and processes that gather unprocessed business data and transform it into informative data. Power BI has user-friendly exploitation and effectiveness that gives every person, data savvy or not, even the CEO, the opportunity to extract their full potential out of their data. You just gotta learn a few things about it first.

However, the true magic of Power BI lies not just in its visualization capabilities(which are superb btw) but in the underlying data model. Effective data modeling ensures that reports are accurate, performant, and scalable. This article delves into key concepts like star and snowflake schemas, fact and dimension tables and relationships in Power BI, while explaining why good modeling is essential for good reporting.

What is a schema?
It refers to the structure and organization of data within a data model. Schemas define how data is connected and related within the model, influencing the efficiency and performance of data queries and reports. Understanding schemas helps in designing best data models.

What is data modelling?
Data modeling is the process of identifying, organizing and defining the types of data a business collects and the relationships between them. It uses diagrams, symbols and textual definitions to visually represent how data is captured, stored and used. A well-designed data model helps to understand data requirements, ensure proper structure for reporting, align with business goals and maintain data integrity.

FACT TABLES VS DIMENSION TABLES

These two are the building blocks of data models in power BI.

Fact Tables

Are the heart of the model and store quantitative data i.e the data you want to analyze such as sales amounts, transaction counts. They contain figures. Fact tables are often large, containing huge chunks of data.

Dimension Tables

These provide context to the facts, describing the "who, what, when, where, and why" of your data. They contain descriptive attributes, such as product names, customer demographics, or date hierarchies (year, quarter, month). Dimension tables are usually smaller than fact tables.

The separation of these two highly increases efficiency in data querying.
It is from these two tables that different types of schemas have been developed such as the star and snowflake schema.

1. STAR SCHEMA

This Schema organizes data into a central fact table linked to multiple dimension tables, forming a layout that visually resembles a star. This structure makes analytical queries fast, simple, and efficient making it one of the most commonly used data modeling techniques in data warehousing.

Fact Table: The fact table sits at the center of the schema and stores the measurable, quantitative data used for analysis such as Sales amount, Units sold, Discount and profit.
NB: Each record in a fact table represents a business event (e.g., a sales transaction).
Dimension Tables: Dimension tables surround the fact table and contain descriptive attributes that add context to the facts such as Product details, Customer details and Employee or store information
These tables allow users to filter, and group the fact data for analysis (e.g., sales by region or by month or by product category).
Also note that the star schema does not have to be the shape of a star.
The picture below is a star schema as well, though, not in the shape of a star.

2. SNOWFLAKE SCHEMA

Now imagine if the dimension tables had branches. This means we have a primary source-the fact table- then the dimension tables then other branches attached to dimension tables making it look like the respiratory system or a root. This is the snowflake schema.
It is mainly used when hierarchies are so complex that a star schema would not be able to filter efficiently. This helps reduce data redundancy and develop a more detailed model compared to star schema.

Dimension tables are normalized into multiple related tables, creating a hierarchical or "snowflake" structure.

Fact table is still located at the center of the schema, surrounded by the dimension tables. However, each dimension table is further broken down into multiple related tables, creating a hierarchical structure that resembles a snowflake.

RELATIONSHIPS
This is the glue, the bond that hold everything together i.e the fact table connection to the dimension table and dimension table to others....
They define how tables interact in power BI. This can be done manually or through auto-detect function.
There are different types of table relationships namely:

1. One-to-One Relationship: Each row in the first table is related to only one row in the second table.
2. Many-to-One Relationship: **Many rows in the first table are related to one row in the second table.
**3. One-to-Many Relationship: One row in the first table is related to one or more rows in the second table.
4. Many-to-Many Relationship: Each row in the first table can be related to multiple rows in the second table. This type requires an intermediate table to link the two tables.

**THE IMPORTANCE OF GOOD MODELLING*

1. Reducing Redundancy: Normalized schemas like the snowflake schema help reduce data redundancy, leading to more efficient storage utilization.
2. Managing Large Datasets: Efficient schema design is crucial for managing and storing large volumes of data without compromising performance.
3. Optimizing Queries: Well-designed schemas improve query performance by reducing the complexity of joins and filtering operations.
4. Retrieval speed: Proper indexing of key columns in fact and dimension tables enhances data retrieval speed.

References