Introduction
Working with raw data often introduces challenges such as duplication, inconsistency, and difficulty in extracting meaningful insights.
Data modelling in Power BI addresses these challenges by structuring data into related tables, enabling efficient querying, accurate calculations, and scalable reporting.
This article explores:
- SQL joins and their behaviour
- Power BI relationships and how they differ from joins
- Data modelling schemas
- Practical implementation in Power BI
Weβll use a simple school dataset to ground these concepts.
1. Sample Dataset
We will work with three core tables:
Students (Dimension Table)
| StudentID | Name | Age | Class |
|---|---|---|---|
| 101 | Alice | 14 | Form 2 |
| 102 | Brian | 15 | Form 3 |
| 103 | Catherine | 14 | Form 2 |
| 104 | David | 16 | Form 4 |
Test Scores (Fact Table)
| ScoreID | StudentID | Subject | Score |
|---|---|---|---|
| 1 | 101 | Math | 85 |
| 2 | 101 | English | 78 |
| 3 | 102 | Math | 92 |
| 4 | 105 | Science | 88 |
β οΈ Note: Student 105 does not exist in the Students tableβthis represents a referential integrity issue.
π View Dataset Online
Explore the dataset directly: πhere!
Teachers (Dimension Table)
| TeacherID | Name | Subject |
|---|---|---|
| T1 | Mr. Kim | Math |
| T2 | Ms. Asha | English |
| T3 | Mr. Otieno | Science |
Data Roles
- Fact Table: Test Scores (transactional/measurable data)
- Dimension Tables: Students, Teachers (descriptive attributes)
2. What is Data Modelling?
Data modelling is the process of:
Organizing data into structured tables and defining relationships that enable efficient querying and accurate analysis.
In analytical systems like Power BI, this typically involves:
- Separating facts from dimensions
- Establishing keys (e.g.,
StudentID) - Designing a schema optimized for performance
3. Joins vs Relationships
Joins (Power Query Layer)
A join is a data transformation operation that combines rows from two tables based on a matching condition.
Key Characteristics:
- Produces a new table
- Data is physically merged
- Can introduce duplication
Example:
Joining Students and Test Scores results in repeated student records for each score.
Relationships (Model Layer)
A relationship is a logical link between tables based on matching columns.
Key Characteristics:
- Tables remain separate
- Data is combined at query time
- Enables efficient filtering and aggregation via DAX
Critical Difference
| Aspect | Join | Relationship |
|---|---|---|
| Type | Physical transformation | Logical connection |
| Location | Power Query | Model View |
| Data duplication | Yes | No |
| Performance impact | Higher memory usage | Optimized |
4. SQL Joins Explained (with Examples)
Joins determine how rows from two tables are combined based on a matching condition.
We will use:
- Students table (left table)
- Test Scores table (right table)
INNER JOIN
Definition
Returns only rows where a match exists in both tables.
SQL Example
SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
INNER JOIN TestScores t
ON s.StudentID = t.StudentID;
Result
| StudentID | Name | Subject | Score |
|---|---|---|---|
| 101 | Alice | Math | 85 |
| 101 | Alice | English | 78 |
| 102 | Brian | Math | 92 |
Explanation
- Student 103 and 104 are excluded (no scores)
- Score with StudentID 105 is excluded (no matching student)
LEFT JOIN
Definition
Returns all rows from the left table, with matching rows from the right.
SQL Example
SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
LEFT JOIN TestScores t
ON s.StudentID = t.StudentID;
Result
| StudentID | Name | Subject | Score |
|---|---|---|---|
| 101 | Alice | Math | 85 |
| 101 | Alice | English | 78 |
| 102 | Brian | Math | 92 |
| 103 | Catherine | NULL | NULL |
| 104 | David | NULL | NULL |
Explanation
- All students are included
- Missing scores appear as NULL
- Useful for identifying missing activity
RIGHT JOIN
Definition
Returns all rows from the right table and matching rows from the left.
SQL Example
SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
RIGHT JOIN TestScores t
ON s.StudentID = t.StudentID;
Result
| StudentID | Name | Subject | Score |
|---|---|---|---|
| 101 | Alice | Math | 85 |
| 101 | Alice | English | 78 |
| 102 | Brian | Math | 92 |
| NULL | NULL | Science | 88 |
Explanation
- Score for StudentID 105 appears
- Missing student info is NULL
- Highlights data inconsistencies
FULL OUTER JOIN
Definition
Returns all rows from both tables, matched where possible.
SQL Example
SELECT s.StudentID, s.Name, t.Subject, t.Score
FROM Students s
FULL OUTER JOIN TestScores t
ON s.StudentID = t.StudentID;
Result
| StudentID | Name | Subject | Score |
|---|---|---|---|
| 101 | Alice | Math | 85 |
| 101 | Alice | English | 78 |
| 102 | Brian | Math | 92 |
| 103 | Catherine | NULL | NULL |
| 104 | David | NULL | NULL |
| NULL | NULL | Science | 88 |
Explanation
- Combines LEFT and RIGHT JOIN behavior
- Shows all matched and unmatched records
LEFT ANTI JOIN
Definition
Returns rows from the left table with no match in the right.
SQL Equivalent
SELECT s.StudentID, s.Name
FROM Students s
LEFT JOIN TestScores t
ON s.StudentID = t.StudentID
WHERE t.StudentID IS NULL;
Result
| StudentID | Name |
|---|---|
| 103 | Catherine |
| 104 | David |
Explanation
- These students have no test scores
- Useful for identifying inactivity
RIGHT ANTI JOIN
Definition
Returns rows from the right table with no match in the left.
SQL Equivalent
SELECT t.StudentID, t.Subject
FROM TestScores t
LEFT JOIN Students s
ON s.StudentID = t.StudentID
WHERE s.StudentID IS NULL;
Result
| StudentID | Subject |
|---|---|
| 105 | Science |
Explanation
- This score has no valid student
- Indicates a data quality issue
Key Insight
Each join answers a specific business question:
| Join Type | Business Question | Example from Our Dataset |
|---|---|---|
| INNER | What data exists in both tables? | Which students have recorded test scores? β Alice and Brian |
| LEFT | What records are missing related data? | Which students are in the system but have not taken any tests? β Catherine and David |
| RIGHT | What data exists on the secondary side? | Are there any test scores without valid student records? β Score for StudentID 105 |
| FULL | What does the complete dataset look like? | What is the full picture of students and scores, including missing or unmatched data? |
| LEFT ANTI | What records in the main table have no matches? | Which students have no test scores at all? β Catherine and David |
| RIGHT ANTI | What records are invalid or inconsistent? | Which test scores do not belong to any student? β StudentID 105 |
5. Power BI Relationships
Cardinality
Defines how rows relate between tables.
| Type | Description |
|---|---|
| 1:M | One dimension record relates to many fact records |
| M:M | Multiple matches on both sides |
| 1:1 | Unique mapping |
π Best practice:
Use 1:M from Dimension β Fact
Cross-Filter Direction
Defines how filters propagate between tables.
| Type | Behaviour |
|---|---|
| Single | One-way filtering |
| Both | Bi-directional filtering |
β οΈ Overusing bi-directional filtering can introduce ambiguity.
Active vs Inactive Relationships
- Only one relationship can be active between two tables
- Inactive relationships can be used via DAX (
USERELATIONSHIP)
6. Fact vs Dimension Tables
Fact Table
- Contains measurable data
- Often large
- Example: Test Scores
Dimension Table
- Contains descriptive attributes
- Used for filtering and grouping
- Example: Students
7. Schemas
Star Schema
Students
|
Teachers β Test Scores β Subjects
Characteristics:
- Central fact table
- Denormalized dimensions
- Optimized for performance
Snowflake Schema
Students
|
Address
|
Region
|
Test Scores
Characteristics:
- Normalized dimensions
- Reduced redundancy
- Increased complexity
Flat Table (Denormalized Model)
All data stored in one table.
Trade-offs:
- Simpler structure
- Poor scalability
- High duplication
8. Implementation in Power BI
Power Query (Data Preparation)
- Clean data
- Perform joins
- Handle missing values
Model View (Data Modelling)
- Create relationships
- Define cardinality
- Configure filters
Best Practice Workflow
- Load data
- Clean in Power Query
- Avoid unnecessary joins
- Build relationships in Model View
- Validate using visuals
9. Common Modelling Issues
- Data duplication from excessive joins
- Incorrect cardinality
- Ambiguous filter paths
- Missing or inconsistent keys
Conclusion and Key Takeaways
Data modelling is the foundation of effective analysis in Power BI. It defines how data is structured, interpreted, and ultimately how insights are derived.
Throughout this article, weβve seen how:
Joins are used to reshape and prepare data by combining tables in Power Query
Relationships allow Power BI to connect tables logically, enabling efficient analysis without duplication
Schemas, particularly the star schema, provide a scalable structure for organizing fact and dimension tables
Data integrity (e.g., handling missing or invalid records like StudentID 105) directly impacts the quality of insights
Practical Takeaways
- Use Power Query for data cleaning and controlled joins
- Use Model View to define relationships and control how data interacts
- Prefer 1:M relationships from dimension to fact tables
- Default to single-direction filtering unless there is a clear need for bi-directional filtering
- Adopt a star schema for most analytical scenarios
- Treat missing or unmatched data as valuable signals, not just errors
Top comments (0)