#Data analytics #Data Science #Power Bi

Schemas And Data Modelling in Power BI
Introduction
• This article explores data modeling concepts often used to achieve high performance and accurate data analytics in Power BI.
The article focuses on key schema types in Power BI and on how detailed data modelling improves reporting accuracy.
• Data modelling refers to the procedures used by data analysts, data scientists, and data engineers to structure data in tables based on defined relationships and a logical framework. The objective of data modelling is to achieve effective data cleansing, build accurate calculations, and prepare detailed business intelligence reports.
• A schema refers to the structure and defined relationships of data within a designed data model. Schemas shape how analysis queries interact with the underlying data, influencing dashboard load times and decision-making efficiency.
• The two primary database schemas in Power BI are the star and snowflake schemas.
Star Schema
• A star schema is a data modeling approach that consists of a central fact table connected directly to multiple dimension tables.
• Tables in a star schema are connected via one-to-many relationships. Every dimension table is on the ‘one’ side, while the fact table is on the ‘many’ side, as indicated in the table below.

In star schemas:
Dimension tables
• Represent business entities, the things being modelled, such as products, places, or people. Each dimension table has a key column that serves as a unique identifier; its other columns are used for filtering and grouping data.
• Dimension tables hold the descriptive attributes for the categorical keys referenced in the fact table.
• The key column of a dimension table does not contain duplicates.
Fact tables
• The fact table is the main table of the data model.
• Fact tables store quantitative transactional data, such as sales orders, quantities sold, and related details.
• Fact tables contain dimension key columns, which relate directly to the dimension tables, and numeric measure columns. The dimension key columns determine the table's dimensionality, while the dimension key values determine its granularity.
• Fact tables are likely to contain repeated key values, since the same product or customer can appear in many transactions.
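To make the fact and dimension roles concrete, here is a minimal sketch in plain Python rather than in Power BI itself. The table names, columns, and values (`dim_product`, `fact_sales`, `product_key`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical dimension table: one row per product, keyed by a unique product_key.
dim_product = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Desk", "category": "Furniture"},
    3: {"name": "Phone", "category": "Electronics"},
}

# Hypothetical fact table: one row per sale; the same product_key may repeat.
fact_sales = [
    {"product_key": 1, "quantity": 2, "amount": 2000.0},
    {"product_key": 2, "quantity": 1, "amount": 150.0},
    {"product_key": 1, "quantity": 1, "amount": 1000.0},
    {"product_key": 3, "quantity": 3, "amount": 1800.0},
]


def total_by_category(facts, products):
    """Group the numeric measure by a dimension attribute, looked up via the key column."""
    totals = defaultdict(float)
    for row in facts:
        category = products[row["product_key"]]["category"]
        totals[category] += row["amount"]
    return dict(totals)


sales_by_category = total_by_category(fact_sales, dim_product)
print(sales_by_category)
```

The fact rows carry only the key and the measures, while the grouping attribute (`category`) lives in the dimension table, which mirrors the one-to-many layout described above.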

Key Concepts for Star Schemas
Normalization
• Splitting data into multiple related tables reduces duplication and improves data integrity.
• “One fact, one place.”
• The term describes how data is stored so that there is no unnecessary repetition.
• For example, a sales table that stores only a product key, rather than the product's name and other details, is considered normalized.
Denormalization
• The process of combining tables to reduce joins and simplify analysis.
• “Put related data together.”
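The two ideas can be sketched as a round trip in plain Python. This is only an illustration under assumed data: the `flat_sales` rows and the helper names `normalize` and `denormalize` are hypothetical, and Power BI would normally do this kind of reshaping in Power Query rather than in code like this.

```python
# Hypothetical flat (denormalized) sales rows: product details repeat on every row.
flat_sales = [
    {"product": "Laptop", "category": "Electronics", "amount": 1000.0},
    {"product": "Desk", "category": "Furniture", "amount": 150.0},
    {"product": "Laptop", "category": "Electronics", "amount": 1000.0},
]


def normalize(rows):
    """Split flat rows into a product dimension and a fact table that stores only keys."""
    key_of = {}       # product name -> surrogate key
    dim_product = {}  # surrogate key -> product attributes
    fact_sales = []
    for row in rows:
        name = row["product"]
        if name not in key_of:
            key_of[name] = len(key_of) + 1
            dim_product[key_of[name]] = {"product": name, "category": row["category"]}
        fact_sales.append({"product_key": key_of[name], "amount": row["amount"]})
    return dim_product, fact_sales


def denormalize(dim_product, fact_sales):
    """Join the dimension attributes back onto each fact row."""
    return [
        {**dim_product[fact["product_key"]], "amount": fact["amount"]}
        for fact in fact_sales
    ]


dim_product, fact_sales = normalize(flat_sales)
round_trip = denormalize(dim_product, fact_sales)
```

After `normalize`, each product's details are stored once ("one fact, one place"), and `denormalize` puts the related data back together at the cost of repeating it on every row.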
Snowflake Schema
• A snowflake schema is a data modelling approach in which a central fact table is connected to multiple dimension tables, with one or more dimension tables subdivided into sub-dimension tables.
• A snowflake schema consists of a single fact table and multiple dimension tables.
• Snowflake schemas are distinct because their dimension tables are normalized, i.e., they are broken down into smaller sub-tables.
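As a rough sketch of what snowflaking a dimension looks like, the plain Python example below moves the category out of the product dimension into its own sub-dimension table. All table names, keys, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical sub-dimension: one row per category.
dim_category = {
    10: {"category": "Electronics"},
    20: {"category": "Furniture"},
}

# Hypothetical product dimension, normalized: it stores a category_key, not the category name.
dim_product = {
    1: {"name": "Laptop", "category_key": 10},
    2: {"name": "Desk", "category_key": 20},
    3: {"name": "Phone", "category_key": 10},
}

# Hypothetical fact table.
fact_sales = [
    {"product_key": 1, "amount": 1200.0},
    {"product_key": 2, "amount": 300.0},
    {"product_key": 3, "amount": 800.0},
    {"product_key": 1, "amount": 1200.0},
]


def category_of(product_key):
    """Follow two hops: fact -> product dimension -> category sub-dimension."""
    category_key = dim_product[product_key]["category_key"]
    return dim_category[category_key]["category"]


totals = defaultdict(float)
for row in fact_sales:
    totals[category_of(row["product_key"])] += row["amount"]
snowflake_totals = dict(totals)
print(snowflake_totals)
```

Compared with a star layout, reaching the category now takes an extra lookup through the sub-dimension, which is the trade-off a snowflake schema makes in exchange for less repeated data.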

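Key Difference Between Star Schema and Snowflake Schema
• In a star schema, every dimension table connects directly to the fact table. In a snowflake schema, one or more dimension tables are normalized into sub-dimension tables.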

Relationships
• Relationships determine how Power BI connects and interacts with tables.
• A relationship definition shows how tables are connected using key columns.
• Typically, Power BI employs one-to-many relationships, where dimension tables are on the ‘one’ side and fact tables are on the ‘many’ side.
• Relationships are characterized by filter directions that control how filters propagate between tables.
• Proper relationships ensure accurate aggregations and consistent reports.
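The relationship ideas above can be sketched in plain Python: checking that a key is unique on the ‘one’ side, then letting a filter on the dimension flow to the fact table. This is an illustration only, not how Power BI's engine is implemented, and the tables and helper names (`dim_customer`, `fact_orders`, `is_unique_on_one_side`, `filter_facts`) are hypothetical.

```python
from collections import Counter

# Hypothetical dimension ('one' side) and fact ('many' side) tables.
dim_customer = [
    {"customer_key": 1, "region": "East"},
    {"customer_key": 2, "region": "West"},
    {"customer_key": 3, "region": "East"},
]
fact_orders = [
    {"customer_key": 1, "amount": 100.0},
    {"customer_key": 2, "amount": 250.0},
    {"customer_key": 1, "amount": 50.0},
    {"customer_key": 3, "amount": 75.0},
]


def is_unique_on_one_side(dim_rows, key):
    """A one-to-many relationship needs each key value to appear once in the dimension."""
    counts = Counter(row[key] for row in dim_rows)
    return all(n == 1 for n in counts.values())


def filter_facts(dim_rows, fact_rows, key, **criteria):
    """Propagate a filter from the 'one' side to the 'many' side via the key column."""
    selected_keys = {
        row[key]
        for row in dim_rows
        if all(row[col] == value for col, value in criteria.items())
    }
    return [row for row in fact_rows if row[key] in selected_keys]


east_orders = filter_facts(dim_customer, fact_orders, "customer_key", region="East")
east_total = sum(row["amount"] for row in east_orders)
print(is_unique_on_one_side(dim_customer, "customer_key"), east_total)
```

Here the filter flows in a single direction, from the dimension to the fact table, which corresponds to the common case of filtering a report by a dimension attribute and aggregating the matching fact rows.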
Why Good Modelling Is Critical for Performance and Accurate Reporting

• A well-designed model enhances query performance.
• A star schema helps Power BI compress data more effectively.
• Separating fact and dimension tables minimizes duplicated data and improves model efficiency.
• Correct relationships ensure accurate aggregations.
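To see why a clean ‘one’ side matters for accurate aggregations, the sketch below uses a generic join in plain Python. It is not a reproduction of Power BI's behavior; the tables and the `joined_total` helper are hypothetical and only show how a duplicated dimension key can inflate a total when tables are joined naively.

```python
# Hypothetical fact table.
fact_sales = [
    {"product_key": 1, "amount": 100.0},
    {"product_key": 2, "amount": 50.0},
]

# Hypothetical dimension with unique keys, and a copy with one key duplicated by mistake.
clean_dim = [
    {"product_key": 1, "name": "Laptop"},
    {"product_key": 2, "name": "Desk"},
]
dirty_dim = clean_dim + [{"product_key": 1, "name": "Laptop"}]


def joined_total(facts, dim):
    """Sum amounts after a naive join: each fact row appears once per matching dimension row."""
    return sum(
        fact["amount"]
        for fact in facts
        for dim_row in dim
        if dim_row["product_key"] == fact["product_key"]
    )


correct_total = joined_total(fact_sales, clean_dim)
inflated_total = joined_total(fact_sales, dirty_dim)
print(correct_total, inflated_total)
```

With unique keys the total matches the fact table, while the duplicated key counts one sale twice, which is the kind of error a well-formed one-to-many relationship is meant to prevent.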
