Schemas and Data modelling in Power BI.

#analytics #architecture #data #dataengineering

In Power BI, schemas and data modelling are about how you structure tables and define relationships so your data is accurate, fast, and easy to analyze.

A schema describes how tables are organized and related. It is framework that defines how data is structured, organized, and related within a system.

Data modelling is the process of connecting multiple data sources, defining relationships between them, and creating calculations to transform raw data into a structured model.

Dimensional schemas

Star schema.
Snowflake schema.

Star schema.

The Star Schema is the industry standard data modelling approach for Power BI. It gets its name from its visual layout: a central fact table surrounded by dimension tables, resembling a star.

Components of a Star schema

Fact table stores quantitative or measurable data.
Dimension table stores descriptive data e.g. date, product name.
Relationships describe how fact tables and dimension tables are connected and how filters flow between them. Features of Dimension Tables.

Fields.
Primary Keys
Every record in a dimension table is uniquely identified by its main key.

Attributes
These are descriptive fields that provide context.
Features of Fact Tables

Foreign Keys
The keys connect to dimension tables and offer context for the measures.

Measures
The provided figures are numerical data that quantify business performance indicators, such as sales income and units sold.

Relationships in star schema

One-to-Many - one row from a table is linked to many rows in another table.
One-to-One - Each row in one table matches one row in another table, and vice versa.
Many-to-One - many rows in a table relate to one row in another table.
Many-to-Many - multiple rows from one table match multiple rows in another table.

Normalization and Denormalization.

Normalization is the term used to describe data that's stored in a way that reduces redundancy.
Denormalization is the process of flattening related dimension data into single wide tables to improve analytical performance and simplify reporting.

Snowflake schema.
A snowflake schema is a variation of the star schema where dimension tables are normalized into multiple related tables instead of being kept as one wide table.

Key Features of snowflake schema

Normalized dimension tables - dimension tables are normalized into third normal form (3NF) or higher splitting them into smaller and granular sub tables.
Hierarchical structure - dimensions are organized into hierarchical layers (e.g. product - category -manufacturer) which naturally represents complex business relationships.
Reduced data redundancy - by eliminating repeated values in dimension tables, this schema significantly reduces the risk of data inconsistencies.

Components of snowflake schema.

Fact table is the core of the schema storing the primary quantitative data or measures of business processes. It contains multiple foreign keys that link directly to the primary dimension tables. It captures events at their most atomic level, typically resulting in the largest table in the warehouse.
Normalized dimension tables that surround the fact table and provide descriptive context for the recorded facts. Unlike the star schema these tables are normalized meaning repeated descriptive values are moved into separate tables reducing redundancy.
Lookup function that acts as the first level of description for the fact table.
Sub dimension tables are formed by further splitting or snowflaking the main dimension tables.
Hierarchical layers represent multiple levels of a hierarchy. e.g. a product dimension may link to a category table which then links to a manufacturer table.
Maintenance enhance data integrity because updates to a category name that only need to happen in one record rather than multiple records.

Key fields in Snowflake schema.

Primary keys are unique identifiers in dimension and sub dimension tables e.g. ProductID.
Foreign Keys are fields in the fact table or dimension tables that reference a primary key in a related table to form a join.
Surrogate keys are system generated integers used as keys instead of natural business keys to improve performance.

Importance of good modelling.

Proper modelling prevents logical errors in relationships that can lead to double counting or missing data.
Clear naming conventions and hidden technical fields make the model easier for non technical users to navigate.
Simple relationships between a central fact table and surrounding dimension tables reduce ambiguity.
Centralized dimension tables ensure every visual uses the same definitions preventing different charts from showing conflicting totals for the same metrics.
A modular model allows you to add new data sources or dimensions without rebuilding the entire system.

DEV Community

Schemas and Data modelling in Power BI.

Top comments (0)