Tom Chege

Posted on Mar 29 • Edited on Mar 31

Understanding Data Modelling in Power BI: Joins, Relationships, and Schemas Explained

#data #powerfuldevs #newbie

Introduction

Working with raw data often introduces challenges such as duplication, inconsistency, and difficulty in extracting meaningful insights.

Data modelling in Power BI addresses these challenges by structuring data into related tables, enabling efficient querying, accurate calculations, and scalable reporting.

This article explores:

SQL joins and their behaviour
Power BI relationships and how they differ from joins
Data modelling schemas
Practical implementation in Power BI

We’ll use a simple school dataset to ground these concepts.

1. Sample Dataset

We will work with three core tables:

Students (Dimension Table)

StudentID	Name	Age	Class
101	Alice	14	Form 2
102	Brian	15	Form 3
103	Catherine	14	Form 2
104	David	16	Form 4

Test Scores (Fact Table)

ScoreID	StudentID	Subject	Score
1	101	Math	85
2	101	English	78
3	102	Math	92
4	105	Science	88

⚠️ Note: Student 105 does not exist in the Students table—this represents a referential integrity issue.

📊 View Dataset Online

Explore the dataset directly: 👉here!

Teachers (Dimension Table)

TeacherID	Name	Subject
T1	Mr. Kim	Math
T2	Ms. Asha	English
T3	Mr. Otieno	Science

Data Roles

Fact Table: Test Scores (transactional/measurable data)
Dimension Tables: Students, Teachers (descriptive attributes)

2. What is Data Modelling?

Data modelling is the process of:

Organizing data into structured tables and defining relationships that enable efficient querying and accurate analysis.

In analytical systems like Power BI, this typically involves:

Separating facts from dimensions
Establishing keys (e.g., StudentID)
Designing a schema optimized for performance

3. Joins vs Relationships

Joins (Power Query Layer)

A join is a data transformation operation that combines rows from two or more tables based on a matching condition.

Key Characteristics:

Produces a new table
Data is physically merged
Can introduce duplication

Example:

Joining Students and Test Scores results in repeated student records for each score.

Relationships (Model Layer)

A relationship is a logical link between tables based on matching columns.

Key Characteristics:

Tables remain separate
Data is combined at query time
Enables efficient filtering and aggregation via DAX

Critical Difference

Aspect	Join	Relationship
Type	Physical transformation	Logical connection
Location	Power Query	Model View
Data duplication	Yes	No
Performance impact	Higher memory usage	Optimized

4. SQL Joins Explained (with Examples)

Joins determine how rows from two tables are combined based on a matching condition.

We will use:

Students table (left table)
Test Scores table (right table)

INNER JOIN

Definition

Returns only rows where a match exists in both tables.

SQL Example

SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
INNER JOIN TestScores t
ON s.StudentID = t.StudentID;

Result

StudentID	Name	Subject	Score
101	Alice	Math	85
101	Alice	English	78
102	Brian	Math	92

Explanation

Student 103 and 104 are excluded (no scores)
Score with StudentID 105 is excluded (no matching student)

LEFT JOIN

Definition

Returns all rows from the left table, with matching rows from the right.

SQL Example

SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
LEFT JOIN TestScores t
ON s.StudentID = t.StudentID;

Result

StudentID	Name	Subject	Score
101	Alice	Math	85
101	Alice	English	78
102	Brian	Math	92
103	Catherine	NULL	NULL
104	David	NULL	NULL

Explanation

All students are included
Missing scores appear as NULL
Useful for identifying missing activity

RIGHT JOIN

Definition

Returns all rows from the right table and matching rows from the left.

SQL Example

SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
RIGHT JOIN TestScores t
ON s.StudentID = t.StudentID;

Result

StudentID	Name	Subject	Score
101	Alice	Math	85
101	Alice	English	78
102	Brian	Math	92
NULL	NULL	Science	88

Explanation

Score for StudentID 105 appears
Missing student info is NULL
Highlights data inconsistencies

FULL OUTER JOIN

Definition

Returns all rows from both tables, matched where possible.

SQL Example

SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
FULL OUTER JOIN TestScores t
ON s.StudentID = t.StudentID;

Result

StudentID	Name	Subject	Score
101	Alice	Math	85
101	Alice	English	78
102	Brian	Math	92
103	Catherine	NULL	NULL
104	David	NULL	NULL
NULL	NULL	Science	88

Explanation

Combines LEFT and RIGHT JOIN behavior
Shows all matched and unmatched records

LEFT ANTI JOIN

Definition

Returns rows from the left table with no match in the right.

SQL Equivalent

SELECT s.StudentID, s.Name
FROM Students s
LEFT JOIN TestScores t
ON s.StudentID = t.StudentID
WHERE t.StudentID IS NULL;

Result

StudentID	Name
103	Catherine
104	David

Explanation

These students have no test scores
Useful for identifying inactivity

RIGHT ANTI JOIN

Definition

Returns rows from the right table with no match in the left.

SQL Equivalent

SELECT t.StudentID, t.Subject
FROM TestScores t
LEFT JOIN Students s
ON s.StudentID = t.StudentID
WHERE s.StudentID IS NULL;

Result

StudentID	Subject
105	Science

Explanation

This score has no valid student
Indicates a data quality issue

Key Insight

Each join answers a specific business question:

Join Type	Business Question	Example from Our Dataset
INNER	What data exists in both tables?	Which students have recorded test scores? → Alice and Brian
LEFT	What records are missing related data?	Which students are in the system but have not taken any tests? → Catherine and David
RIGHT	What data exists on the secondary side?	Are there any test scores without valid student records? → Score for StudentID 105
FULL	What does the complete dataset look like?	What is the full picture of students and scores, including missing or unmatched data?
LEFT ANTI	What records in the main table have no matches?	Which students have no test scores at all? → Catherine and David
RIGHT ANTI	What records are invalid or inconsistent?	Which test scores do not belong to any student? → StudentID 105

5. Power BI Relationships

Cardinality

Defines how rows relate between tables.

Type	Description
1:M	One dimension record relates to many fact records
M:M	Multiple matches on both sides
1:1	Unique mapping

👉 Best practice:
Use 1:M from Dimension → Fact

Cross-Filter Direction

Defines how filters propagate between tables.

Type	Behaviour
Single	One-way filtering
Both	Bi-directional filtering

⚠️ Overusing bi-directional filtering can introduce ambiguity.

Active vs Inactive Relationships

Only one relationship can be active between two tables
Inactive relationships can be used via DAX (USERELATIONSHIP)

6. Fact vs Dimension Tables

Fact Table

Contains measurable data
Often large
Example: Test Scores

Dimension Table

Contains descriptive attributes
Used for filtering and grouping
Example: Students

7. Schemas

Star Schema

          Students
             |
Teachers — Test Scores — Subjects

Characteristics:

Central fact table
Denormalized dimensions
Optimized for performance

Snowflake Schema

Students
   |
Address
   |
Region
   |
Test Scores

Characteristics:

Normalized dimensions
Reduced redundancy
Increased complexity

Flat Table (Denormalized Model)

All data stored in one table.

Trade-offs:

Simpler structure
Poor scalability
High duplication

8. Implementation in Power BI

Power Query (Data Preparation)

Clean data
Perform joins
Handle missing values

Model View (Data Modelling)

Create relationships
Define cardinality
Configure filters

Best Practice Workflow

Load data
Clean in Power Query
Avoid unnecessary joins
Build relationships in Model View
Validate using visuals

9. Common Modelling Issues

Data duplication from excessive joins
Incorrect cardinality
Ambiguous filter paths
Missing or inconsistent keys

Conclusion and Key Takeaways

Data modelling is the foundation of effective analysis in Power BI. It defines how data is structured, interpreted, and ultimately how insights are derived.

Throughout this article, we’ve seen how:

Joins are used to reshape and prepare data by combining tables in Power Query
Relationships allow Power BI to connect tables logically, enabling efficient analysis without duplication
Schemas, particularly the star schema, provide a scalable structure for organizing fact and dimension tables
Data integrity (e.g., handling missing or invalid records like StudentID 105) directly impacts the quality of insights

Practical Takeaways

Use Power Query for data cleaning and controlled joins
Use Model View to define relationships and control how data interacts
Prefer 1:M relationships from dimension to fact tables
Default to single-direction filtering unless there is a clear need for bi-directional filtering
Adopt a star schema for most analytical scenarios
Treat missing or unmatched data as valuable signals, not just errors