David

Posted on Feb 2

Schemas and Data Modelling in Power B.I

#analytics #data #dataengineering #microsoft

Data modeling

is the architectural backbone of every successful Power BI solution. Whether building an executive dashboard or a complex analytics engine, the structural integrity of the data determines the performance, flexibility, and reporting accuracy

Schemas fall into two primary categories: Star and Snowflake, following general practices of dimensional modeling used widely in analytics and BI. Each defines a different way to model facts and dimensions, balancing simplicity, performance, storage efficiency, and modeling flexibility.

The Star Schema is usually the recommended approach because it, prioritizes read performance over write efficiency.

Star schema

The star schema involves structuring your data into multiple tables such that you have:
Fact tables contain quantitative attributes that you want to measure.
Dimension tables that contain descriptive attributes that group and categorize data from fact tables.

Fact tables

A fact table is the main table that records events (transactions) that happened, so duplicates are normal because the same type of event can occur many times. It’s also the table that changes/updates often compared to dimensions.
eg FactTable -Visits (VisitID, PatientID, DoctorID, FacilityID, DateID, VisitCost, LengthOfStay)

Dimension table

A dimension table is a table that holds information about categorical fields in the fact table and you use to slice and group the facts.
It’s typically smaller than the fact table.
eg Dimension - Patient, Doctor, Facility, Date

Relationships and cardinality

In Power BI, relationships are typically built by matching one column from one table to one column in another table. That means linking each dimension key to the corresponding key in the visits fact table, e.g., Patient[PatientID] → FactVisits[PatientID], Doctor[DoctorID] → FactVisits[DoctorID], Facility[FacilityID] → FactVisits[FacilityID], and Date[DateID] → FactVisits[DateID].

If two tables need to match on more than one field (for example, if you had a scenario where uniqueness is defined by both PatientID and DateID together), you would create a combined key such as PatientDateKey = PatientID & "-" & DateID in both tables and then relate using that single key column.

Cardinality describes how many rows in one table can match rows in another: one-to-one, one-to-many, many-to-one, or many-to-many.If a many-to-many relationship is unavoidable, use a Bridge Table (also known as an Associative Entity) to convert the logic into two one-to-many relationships.

Snowflake Schema

A snowflake schema is a data model where dimension tables are normalized into multiple related tables.

Instead of keeping everything in one large dimension table (like a star schema), you split descriptive data into smaller tables to reduce duplication.
It’s called “snowflake” because dimensions branch out into sub-dimensions, creating a layered structure.

Structure

Fact Table: Stores measurable event/transaction data (numeric values) at a specific grain.
Dimension Tables: Split into multiple connected tables to represent hierarchies.

Example (healthcare):

Patient → Ward → Department → Hospital
Doctor → Specialty → Department

Fact Table: Visits

VisitID
PatientID
DoctorID
WardID
DateID
VisitCost, LengthOfStay

Dimension Tables (normalized)

Patient (PatientID, FullName, WardID)
Ward (WardID, WardName, DepartmentID)
Department (DepartmentID, DepartmentName, HospitalID)
Hospital (HospitalID, HospitalName)

Relationships

Still mainly one-to-many relationships.
The difference is that dimensions connect through multiple levels (Patient → Ward → Department → Hospital), so filter paths are longer than in a star schema.

Good data modelling is critical because it directly dictates the efficiency, accuracy, and usability of your reports. A well-structured model with clear relationships and reduced redundancy improves query performance, allowing dashboards to load faster even as data scales. Furthermore, by enforcing data integrity constraints and defining logical connections between entities, a strong model ensures that calculations, filters, and aggregations remain accurate across the entire system, preventing errors that could lead to misleading insights.

DEV Community