DEV Community

Tom Chege
Tom Chege

Posted on

Understanding Data Modelling in Power BI: Joins, Relationships, and Schemas Explained

Introduction

Working with raw data often introduces challenges such as duplication, inconsistency, and difficulty in extracting meaningful insights.

Data modelling in Power BI addresses these challenges by structuring data into related tables, enabling efficient querying, accurate calculations, and scalable reporting.

This article explores:

  • SQL joins and their behaviour
  • Power BI relationships and how they differ from joins
  • Data modelling schemas
  • Practical implementation in Power BI

We’ll use a simple school dataset to ground these concepts.


1. Sample Dataset

We will work with three core tables:

Students (Dimension Table)

StudentID Name Age Class
101 Alice 14 Form 2
102 Brian 15 Form 3
103 Catherine 14 Form 2
104 David 16 Form 4

Test Scores (Fact Table)

ScoreID StudentID Subject Score
1 101 Math 85
2 101 English 78
3 102 Math 92
4 105 Science 88

⚠️ Note: Student 105 does not exist in the Students tableβ€”this represents a referential integrity issue.

πŸ“Š View Dataset Online

Explore the dataset directly: πŸ‘‰here!


Teachers (Dimension Table)

TeacherID Name Subject
T1 Mr. Kim Math
T2 Ms. Asha English
T3 Mr. Otieno Science

Data Roles

  • Fact Table: Test Scores (transactional/measurable data)
  • Dimension Tables: Students, Teachers (descriptive attributes)

2. What is Data Modelling?

Data modelling is the process of:

Organizing data into structured tables and defining relationships that enable efficient querying and accurate analysis.

In analytical systems like Power BI, this typically involves:

  • Separating facts from dimensions
  • Establishing keys (e.g., StudentID)
  • Designing a schema optimized for performance

3. Joins vs Relationships

Joins (Power Query Layer)

A join is a data transformation operation that combines rows from two tables based on a matching condition.

Key Characteristics:

  • Produces a new table
  • Data is physically merged
  • Can introduce duplication

Example:

Joining Students and Test Scores results in repeated student records for each score.


Relationships (Model Layer)

A relationship is a logical link between tables based on matching columns.

Key Characteristics:

  • Tables remain separate
  • Data is combined at query time
  • Enables efficient filtering and aggregation via DAX

Critical Difference

Aspect Join Relationship
Type Physical transformation Logical connection
Location Power Query Model View
Data duplication Yes No
Performance impact Higher memory usage Optimized

4. SQL Joins Explained (with Examples)

Joins determine how rows from two tables are combined based on a matching condition.

We will use:

  • Students table (left table)
  • Test Scores table (right table)

INNER JOIN

Definition

Returns only rows where a match exists in both tables.

SQL Example

SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
INNER JOIN TestScores t
ON s.StudentID = t.StudentID;
Enter fullscreen mode Exit fullscreen mode

Result

StudentID Name Subject Score
101 Alice Math 85
101 Alice English 78
102 Brian Math 92

Explanation

  • Student 103 and 104 are excluded (no scores)
  • Score with StudentID 105 is excluded (no matching student)

LEFT JOIN

Definition

Returns all rows from the left table, with matching rows from the right.

SQL Example

SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
LEFT JOIN TestScores t
ON s.StudentID = t.StudentID;
Enter fullscreen mode Exit fullscreen mode

Result

StudentID Name Subject Score
101 Alice Math 85
101 Alice English 78
102 Brian Math 92
103 Catherine NULL NULL
104 David NULL NULL

Explanation

  • All students are included
  • Missing scores appear as NULL
  • Useful for identifying missing activity

RIGHT JOIN

Definition

Returns all rows from the right table and matching rows from the left.

SQL Example

SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
RIGHT JOIN TestScores t
ON s.StudentID = t.StudentID;
Enter fullscreen mode Exit fullscreen mode

Result

StudentID Name Subject Score
101 Alice Math 85
101 Alice English 78
102 Brian Math 92
NULL NULL Science 88

Explanation

  • Score for StudentID 105 appears
  • Missing student info is NULL
  • Highlights data inconsistencies

FULL OUTER JOIN

Definition

Returns all rows from both tables, matched where possible.

SQL Example

SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
FULL OUTER JOIN TestScores t
ON s.StudentID = t.StudentID;
Enter fullscreen mode Exit fullscreen mode

Result

StudentID Name Subject Score
101 Alice Math 85
101 Alice English 78
102 Brian Math 92
103 Catherine NULL NULL
104 David NULL NULL
NULL NULL Science 88

Explanation

  • Combines LEFT and RIGHT JOIN behavior
  • Shows all matched and unmatched records

LEFT ANTI JOIN

Definition

Returns rows from the left table with no match in the right.

SQL Equivalent

SELECT s.StudentID, s.Name
FROM Students s
LEFT JOIN TestScores t
ON s.StudentID = t.StudentID
WHERE t.StudentID IS NULL;
Enter fullscreen mode Exit fullscreen mode

Result

StudentID Name
103 Catherine
104 David

Explanation

  • These students have no test scores
  • Useful for identifying inactivity

RIGHT ANTI JOIN

Definition

Returns rows from the right table with no match in the left.

SQL Equivalent

SELECT t.StudentID, t.Subject
FROM TestScores t
LEFT JOIN Students s
ON s.StudentID = t.StudentID
WHERE s.StudentID IS NULL;
Enter fullscreen mode Exit fullscreen mode

Result

StudentID Subject
105 Science

Explanation

  • This score has no valid student
  • Indicates a data quality issue

Key Insight

Each join answers a specific business question:

Join Type Business Question Example from Our Dataset
INNER What data exists in both tables? Which students have recorded test scores? β†’ Alice and Brian
LEFT What records are missing related data? Which students are in the system but have not taken any tests? β†’ Catherine and David
RIGHT What data exists on the secondary side? Are there any test scores without valid student records? β†’ Score for StudentID 105
FULL What does the complete dataset look like? What is the full picture of students and scores, including missing or unmatched data?
LEFT ANTI What records in the main table have no matches? Which students have no test scores at all? β†’ Catherine and David
RIGHT ANTI What records are invalid or inconsistent? Which test scores do not belong to any student? β†’ StudentID 105

5. Power BI Relationships

Cardinality

Defines how rows relate between tables.

Type Description
1:M One dimension record relates to many fact records
M:M Multiple matches on both sides
1:1 Unique mapping

πŸ‘‰ Best practice:
Use 1:M from Dimension β†’ Fact


Cross-Filter Direction

Defines how filters propagate between tables.

Type Behaviour
Single One-way filtering
Both Bi-directional filtering

⚠️ Overusing bi-directional filtering can introduce ambiguity.


Active vs Inactive Relationships

  • Only one relationship can be active between two tables
  • Inactive relationships can be used via DAX (USERELATIONSHIP)

6. Fact vs Dimension Tables

Fact Table

  • Contains measurable data
  • Often large
  • Example: Test Scores

Dimension Table

  • Contains descriptive attributes
  • Used for filtering and grouping
  • Example: Students

7. Schemas

Star Schema

          Students
             |
Teachers β€” Test Scores β€” Subjects
Enter fullscreen mode Exit fullscreen mode

Characteristics:

  • Central fact table
  • Denormalized dimensions
  • Optimized for performance

Snowflake Schema

Students
   |
Address
   |
Region
   |
Test Scores
Enter fullscreen mode Exit fullscreen mode

Characteristics:

  • Normalized dimensions
  • Reduced redundancy
  • Increased complexity

Flat Table (Denormalized Model)

All data stored in one table.

Trade-offs:

  • Simpler structure
  • Poor scalability
  • High duplication

8. Implementation in Power BI

Power Query (Data Preparation)

  • Clean data
  • Perform joins
  • Handle missing values

Model View (Data Modelling)

  • Create relationships
  • Define cardinality
  • Configure filters

Best Practice Workflow

  1. Load data
  2. Clean in Power Query
  3. Avoid unnecessary joins
  4. Build relationships in Model View
  5. Validate using visuals

9. Common Modelling Issues

  • Data duplication from excessive joins
  • Incorrect cardinality
  • Ambiguous filter paths
  • Missing or inconsistent keys

Conclusion and Key Takeaways

Data modelling is the foundation of effective analysis in Power BI. It defines how data is structured, interpreted, and ultimately how insights are derived.

Throughout this article, we’ve seen how:

Joins are used to reshape and prepare data by combining tables in Power Query
Relationships allow Power BI to connect tables logically, enabling efficient analysis without duplication
Schemas, particularly the star schema, provide a scalable structure for organizing fact and dimension tables
Data integrity (e.g., handling missing or invalid records like StudentID 105) directly impacts the quality of insights

Practical Takeaways

  • Use Power Query for data cleaning and controlled joins
  • Use Model View to define relationships and control how data interacts
  • Prefer 1:M relationships from dimension to fact tables
  • Default to single-direction filtering unless there is a clear need for bi-directional filtering
  • Adopt a star schema for most analytical scenarios
  • Treat missing or unmatched data as valuable signals, not just errors

Top comments (0)