Kendi

Posted on Mar 28

Power BI Data Modeling Explained Simply: Joins, Relationships, and Schemas

#analytics #beginners #data #database

Here is something nobody tells you when you start learning Power BI: the visuals are the easy part. Anyone can drag a bar chart onto a canvas. What separates a report that works from one that lies to you is what happens before you touch a single visual — the data model.

Get the model right and your numbers are accurate, your reports are fast, and your DAX measures are simple. Get it wrong and no amount of formatting or fancy visuals will fix it. You will just have a beautifully designed wrong answer.

This article covers everything you need to build models that actually work.

What is Data Modeling?

Data modeling is the process of organizing your tables and defining how they connect to each other so that analysis is accurate, efficient, and scalable.

In Power BI specifically, your data model determines three things:

Whether your calculations are correct
How fast your report runs
How easily someone can explore the data

A weak model produces incorrect aggregations, duplicate counts, and reports that take forever to load. A strong model makes all of that disappear.

Part 1: SQL Joins — Combining Tables in Power Query

Joins are how you physically combine two tables into one based on a shared column called a key. In Power BI, joins happen in Power Query during data preparation — before anything reaches your model.

Let us use a consistent example throughout. You have two tables:

Staff Table
| StaffID | StaffName |
|---------|-----------|
| S01 | James |
| S02 | Amina |
| S03 | Peter |

Training Table
| StaffID | Course |
|---------|--------|
| S01 | Excel |
| S02 | Power BI |
| S04 | SQL |

Notice: Peter (S03) has no training record. The SQL course (S04) has no matching staff member. This mismatch is exactly what each join type handles differently.

INNER JOIN — Only What Matches in Both

Returns rows that have a match in both tables. Anything without a match on either side is excluded.

Result:
| StaffID | StaffName | Course |
|---------|-----------|--------|
| S01 | James | Excel |
| S02 | Amina | Power BI |

Peter is excluded — no training record. The SQL course is excluded — no matching staff.

Use case: A clean attendance report showing only staff with confirmed training records.

LEFT JOIN — Keep Everything on the Left

Returns all rows from the left table. Rows from the right table are included only where a match exists. No match means null.

Result:
| StaffID | StaffName | Course |
|---------|-----------|--------|
| S01 | James | Excel |
| S02 | Amina | Power BI |
| S03 | Peter | null |

Peter stays — but his Course is null because no training record exists for him.

Use case: All staff members, flagging those who have not yet completed any training.

RIGHT JOIN — Keep Everything on the Right

Mirror of the LEFT JOIN. All rows from the right table are kept. Left table rows are included only where a match exists.

Result:
| StaffID | StaffName | Course |
|---------|-----------|--------|
| S01 | James | Excel |
| S02 | Amina | Power BI |
| S04 | null | SQL |

The SQL course stays — but StaffName is null because S04 does not exist in the Staff table.

Use case: All training courses offered, identifying any assigned to staff members who no longer exist in the system.

FULL OUTER JOIN — Everything from Both Tables

Returns every row from both tables. Where there is no match, nulls fill the gap on the missing side. Nothing is excluded.

Result:
| StaffID | StaffName | Course |
|---------|-----------|--------|
| S01 | James | Excel |
| S02 | Amina | Power BI |
| S03 | Peter | null |
| S04 | null | SQL |

Use case: A full audit comparing your HR system against your training system — surfacing every mismatch in both directions at once.

LEFT ANTI JOIN — Only the Unmatched Left Rows

Returns only rows from the left table that have no match in the right table. The opposite of an INNER JOIN in a sense.

Result:
| StaffID | StaffName |
|---------|-----------|
| S03 | Peter |

Only Peter — the staff member with no training record.

Use case: Finding staff members who have not attended any training. A compliance check.

RIGHT ANTI JOIN — Only the Unmatched Right Rows

Returns only rows from the right table that have no match in the left table.

Result:
| StaffID | Course |
|---------|--------|
| S04 | SQL |

Only the SQL course — assigned to a StaffID that does not exist.

Use case: Identifying orphaned records in a system — training records pointing to staff members who no longer exist.

How to Create Joins in Power Query

Go to Home tab → Transform Data to open Power Query Editor
Select the table you want as your left table
Click Home tab → Merge Queries
Select the second table from the dropdown
Click the matching column in each table to set the key
Choose your Join Kind from the dropdown
Click OK
Click the expand icon on the new merged column to select which columns to bring in

Join Summary Table

Join Type	Keeps from Left	Keeps from Right	Best For
Inner	Matching only	Matching only	Clean matched data
Left	All rows	Matching only	Enrich left, flag gaps
Right	Matching only	All rows	Enrich right, flag gaps
Full Outer	All rows	All rows	Full system audit
Left Anti	Unmatched only	Nothing	Find missing references
Right Anti	Nothing	Unmatched only	Find orphaned records

Part 2: Power BI Relationships — Joins vs Relationships

This distinction trips up almost everyone starting out.

A join physically merges two tables into one. The result is a single flat table. You use this during data preparation in Power Query.

A relationship keeps tables separate and creates a logical filter connection between them. The data stays in its own table. Power BI uses the relationship to know how filters should flow when someone interacts with a report.

In most professional Power BI models, relationships handle the core structure. Joins are reserved for specific data preparation steps where you genuinely need to flatten or enrich a table before loading it into the model.

	Joins	Relationships
Where	Power Query	Model View
Tables	Physically combined	Stay separate
Performance	Heavier	More efficient
Flexibility	Static	Dynamic
Use for	Data preparation	Analysis

How to Create Relationships in Power BI

Method 1 — Drag and drop in Model View:

Click the Model View icon on the left sidebar
Find the key column in one table
Drag it onto the matching key column in the other table
A relationship line connects the two tables

Method 2 — Manage Relationships:

Go to Modeling tab → Manage Relationships
Click New
Select both tables and their matching columns
Set cardinality and cross-filter direction
Click OK

Cardinality — What Kind of Relationship Is It?

One-to-Many (1:M) — Use This as Your Default

One row in the first table matches many rows in the second.

Example: One Department can have many Employees. The Departments table lists each department once. The Employees table has that department ID repeated across many rows.

This is the most common, most efficient, and most reliable relationship type in Power BI. Build your model around 1:M relationships wherever possible. In Model View it shows as 1 on one side and ***** on the other.

Many-to-Many (M:M) — Use With Real Caution

Many rows in both tables can match each other.

Example: Doctors and Hospitals. One doctor works at multiple hospitals. One hospital employs multiple doctors.

Power BI supports M:M natively but it introduces ambiguous filtering and can produce incorrect totals. Where possible, resolve M:M by introducing a bridge table — a third table that breaks the relationship into two clean 1:M relationships.

One-to-One (1:1) — Rare and Usually Unnecessary

Each row in one table matches exactly one row in the other.

If you have a 1:1 relationship, ask yourself honestly whether these tables should just be merged. They usually should.

Active vs Inactive Relationships

Power BI allows only one active relationship between any two tables. Active relationships are what visuals and DAX measures use by default. Additional relationships between the same tables must be inactive — shown as dashed lines in Model View.

When do you need inactive relationships?

A Date table connected to a fact table with multiple date columns is the classic case. Say your Orders table has OrderDate, ShipDate, and DeliveryDate. You can only have one active relationship to your Date table. The others are inactive.

To use an inactive relationship in a measure, activate it temporarily with USERELATIONSHIP:

Sales by Ship Date = CALCULATE(
    SUM(Orders[Revenue]),
    USERELATIONSHIP(Orders[ShipDate], Calendar[Date])
)

This measure calculates revenue filtered by ShipDate instead of the default OrderDate — without permanently changing the model.

Cross-Filter Direction — Which Way Do Filters Flow?

Single direction: Filters flow from the "one" side to the "many" side only. This is the default and the right choice for most relationships.

Both directions (bidirectional): Filters flow both ways simultaneously.

Bidirectional sounds helpful but it creates real problems — ambiguous filter paths, performance degradation, and measures that produce incorrect results in ways that are very hard to diagnose. Start with single direction always. Only consider bidirectional after testing confirms you genuinely need it and the behavior is correct.

Part 3: Fact Tables and Dimension Tables

Every professional data model is built around this distinction.

Fact Tables

A fact table stores measurable events — things that happened. Each row is one transaction or event.

Contains numbers you want to measure: revenue, quantity, duration, count
Contains foreign keys that point to dimension tables
Typically long — many rows
Typically narrow — few columns

Examples: Hospital admissions, e-commerce orders, bank transactions, flight bookings

Dimension Tables

A dimension table stores descriptive context about those events — the who, what, where, and when.

Contains descriptive attributes: names, categories, locations, dates
Contains a primary key that the fact table references
Typically short — few rows
Typically wide — many columns

Examples: Patients, Products, Branches, Calendar

The practical rule: Your slicers and filters come from dimension tables. Your aggregations and measures come from fact tables. Keep them separate.

Part 4: Schemas — The Overall Shape of Your Model

Star Schema — Start Here Every Time

A central fact table connects directly to surrounding dimension tables. One hop from any dimension to the fact. No chains.

              [Calendar]
                  |
[Customer] — [Sales Fact] — [Product]
                  |
              [Branch]

Clean, fast, easy to maintain. Power BI's calculation engine is specifically optimized for this structure. Your DAX measures will be simpler and your reports will be faster on a star schema than any other structure. This is the recommended starting point for almost every Power BI model.

Snowflake Schema — Normalized but Complex

Dimension tables are broken into sub-dimensions connected in chains.

[Region] ← [Branch] — [Sales Fact] — [Product] → [Category] → [Department]

Instead of a single Branch dimension with Region included as a column, Region becomes its own separate table connected to Branch.

This reduces data redundancy — useful in enterprise data warehouses where storage and integrity matter. In Power BI reporting however, the additional complexity adds joins, slows queries, and makes the model harder to navigate. If your source data arrives in a snowflake structure, consider flattening dimension chains in Power Query before loading into the model.

Flat Table (Denormalized) — Simple but Limited

Everything — facts and dimensions — in a single table. No relationships needed.

OrderDate	CustomerName	City	ProductName	Category	Revenue
01/03/2026	James	Nairobi	Laptop	Electronics	85000
02/03/2026	Amina	Mombasa	Phone	Electronics	42000

Fast to build, easy to understand. But "Electronics" is repeated in every electronics row. Change a category name and you are updating thousands of cells. Performance degrades quickly as row count grows.

Use flat tables for: Quick one-off analysis, very small datasets, early prototyping before building a proper model. Avoid them for anything that will be maintained, updated, or scaled.

Schema Comparison

Schema	Structure	Performance	Complexity	Best For
Star	Fact + direct dimensions	Excellent	Low	Most Power BI reporting
Snowflake	Fact + chained dimensions	Good	Medium	Enterprise warehouses
Flat	Single table	Poor at scale	Very low	Small, simple, one-off

Part 5: Role-Playing Dimensions

A role-playing dimension is one dimension table used multiple times in the same model, each time in a different context.

The Date table is the most common example. A Hospital fact table might have AdmissionDate, DischargeDate, and SurgeryDate — all referencing the same Calendar dimension but each representing a different point in time.

Solution 1 — One active, rest inactive:
Create one active relationship between Calendar and AdmissionDate. Create inactive relationships to DischargeDate and SurgeryDate. Use USERELATIONSHIP in DAX when you need the inactive ones.

Discharges This Month = CALCULATE(
    COUNT(Admissions[PatientID]),
    USERELATIONSHIP(Admissions[DischargeDate], Calendar[Date])
)

Solution 2 — Duplicate the dimension table:
Create three separate Calendar tables — AdmissionCalendar, DischargeCalendar, SurgeryCalendar — each with its own active relationship. More relationships to maintain but no inactive relationship complexity in DAX.

For most scenarios, Solution 1 is cleaner.

Part 6: Common Modeling Issues

Many-to-Many without a bridge table
Connecting two fact tables directly produces unreliable totals. Fix by introducing a shared dimension table that both relate to through 1:M relationships.

Bidirectional filtering everywhere
Slows the model and creates ambiguous results. Default to single direction. Only use Both direction after deliberate testing.

Circular relationships
Table A → Table B → Table C → Table A creates a loop Power BI cannot resolve. Fix by identifying and removing the redundant relationship.

No dedicated Date table
Using date columns from fact tables directly breaks time intelligence functions. Always create a continuous Date dimension table, mark it as a Date table in Power BI (right-click table in Model View → Mark as Date Table), and use it as the single source for all date filtering.

Loading unnecessary columns
Every column you load into the model consumes memory. In Power Query, remove columns you will not use before loading. Keep the model lean.

The Recommended Workflow

Step 1 — Load your data into Power Query

Step 2 — Clean and prepare using joins and transformations in Power Query

Step 3 — Build the model in Model View using 1:M relationships in a star schema

Step 4 — Create a Date table and mark it appropriately

Step 5 — Write DAX measures on top of the clean model

Following this sequence means your measures are built on a solid foundation. Skipping to DAX before the model is right is the most common reason Power BI reports produce numbers nobody trusts.

Final Thought

The most important insight in data modeling is also the simplest one: keep facts and dimensions separate, connect them with 1:M relationships, and structure them as a star.

Everything else in this article — the join types, cardinality options, schema variations, role-playing dimensions — is either building on that foundation or explaining what happens when you deviate from it.

Master the star schema with clean 1:M relationships first. You will handle 90% of real-world Power BI modeling scenarios with that alone.

Part of my data science learning journey.