Martin Kamau

Posted on Mar 29

Understanding Data Modelling in Power BI: Joins, Relationships, and Schemas Explained

#powerbi #beginners #analysis #data

I'm still fairly new to Power BI, so when the topic of data modelling came up, I won't lie — it sounded way more complicated than it turned out to be. Once I started working with an actual dataset, things began to click. For this article, I'm using a dataset of 500 Kenyan farmers tracking crops like Maize, Tea, Coffee, and Beans across counties like Meru, Kiambu, Kisumu, and Nakuru. Having something real to point at made all the difference.
Here's what I learned.

What is Data Modelling?

Data modelling is the process of connecting different data sources, defining their relationships, and organizing them so that your Power BI visuals are accurate and make sense.
Think of it as the plumbing behind your dashboard. The charts are the taps — but if the pipes are badly set up, everything that comes out will be wrong.
In our Kenya crops dataset, all the information lives in one flat file — farmer names, counties, crop types, profits, everything on a single row. Data modelling is how we organize that into a structure Power BI can actually work with.

1. Fact Tables vs Dimension Tables

These two table types are the foundation of everything else.
Fact Tables hold the numbers you want to measure and analyze.
From our dataset, that's things like:

Yield (Kg)
Revenue (KES)
Cost of Production (KES)
Profit (KES)

Dimension Tables hold the descriptive context — the who, what, and where.
From our dataset, those would be:

Counties (Kiambu, Meru, Nakuru...)
Crop Types (Maize, Tea, Coffee...)
Seasons (Long Rains, Short Rains, Dry Season)
Soil Types (Clay, Loam, Sandy...)

Fact TableDimension TableContainsNumbers (Revenue, Yield, Profit)Labels (County, Crop, Season)Row countMany rowsFewer, unique rowsRelationship sideMany (*)One (1)

2. SQL Joins in Power Query

Joins combine rows from two separate tables based on a shared column. In Power BI, you do this inside Power Query → Home → Merge Queries.
Imagine our Kenya crops data was split into two tables:
Farmers Table
FarmerIDNameCounty1Farmer 1Kiambu2Farmer 2Meru3Farmer 3Nakuru
Harvests Table
FarmerIDCrop TypeProfit (KES)1Potatoes2,501,0892Coffee4,421,83099Maize90,000
Notice: Farmer 3 has no harvest record. FarmerID 99 has no matching farmer. Each join type handles this situation differently.

Inner Join — Only Matching Rows
Returns rows that exist in both tables. Farmer 3 disappears. FarmerID 99 disappears. Only Farmers 1 and 2 survive.

Use it when: You only want complete, fully matched records.

Power Query: Merge Queries → Join Kind: Inner

Left Join — Keep All of the Left Table
Returns everything from the left table, plus whatever matches from the right. Farmer 3 stays but with blank harvest columns.

Use it when: Your left table is your master list and you don't want to lose any records — even ones with missing data on the other side.

Power Query: Merge Queries → Join Kind: Left Outer

Right Join — Keep All of the Right Table
The mirror of Left Join. Everything from the right table comes through. FarmerID 99 now shows up.

Use it when: Your right table is the primary one. Most analysts just swap the tables and do a Left Join instead — same result, easier to read.

Power Query: Merge Queries → Join Kind: Right Outer

Full Outer Join — Keep Everything
Returns every row from both tables. Farmer 3 shows up. FarmerID 99 shows up. Blanks fill where there's no match.

Use it when: You need the full picture — which farmers have no harvests, and which harvests have no registered farmer.

Power Query: Merge Queries → Join Kind: Full Outer

Left Anti Join — Left Rows with No Match
Returns only rows from the left table that have no match on the right. Here, that's just Farmer 3.

Use it when: You want to find gaps — farmers in your registry with zero harvest records.

Power Query: Merge Queries → Join Kind: Left Anti

Right Anti Join — Right Rows with No Match
Returns only rows from the right table with no match on the left. Here, that's FarmerID 99.

Use it when: You want to catch "ghost" records — harvest entries that can't be linked to any registered farmer.

Power Query: Merge Queries → Join Kind: Right Anti

All 6 Joins at a Glance
Join TypeWhat You GetKenya Crops ExampleInnerMatching rows onlyFarmers who have harvest recordsLeftAll left + right matchesAll farmers, harvest data where it existsRightAll right + left matchesAll harvests, farmer data where it existsFull OuterEverything from bothFull audit — nothing excludedLeft AntiLeft with no right matchFarmers with no harvest on recordRight AntiRight with no left matchHarvests with no registered farmer

3. Power BI Relationships

This is where a lot of beginners (myself included) get confused — because joins and relationships look similar but are completely different things.

A join (Power Query) physically merges two tables into one wider table
A relationship (Model View) keeps tables separate but creates a link between them so filters can travel across

Think of a join as stapling two documents together. A relationship is more like a reference note saying "this file connects to that one over there."
You create relationships in Model View — the third icon on the left sidebar in Power BI Desktop.

Cardinality — How Many Rows Match?
One-to-Many (1:M) is the most common type. One county maps to many harvest records. One crop type maps to many farmers growing it.
County (1) ────────── Harvests (*)
"Meru" → Farmer 2 — Coffee
→ Farmer 10 — Cassava
→ Farmer 13 — Potatoes
Many-to-Many (M:M) means multiple rows on both sides match multiple rows on the other. This can cause double-counting errors, so it's best avoided where possible.
One-to-One (1:1) means each row on one side matches exactly one row on the other. Rare in practice.

💡 Nearly all your relationships should be 1:M. If you're creating something else, it's worth stopping to double check why.

Cross-Filter Direction
This controls which direction a filter travels through a relationship.
Single (recommended): Filter flows from the dimension table (1 side) to the fact table (* side) only.
County slicer → "Nakuru" selected
↓
Harvests table filters to only Nakuru records ✓
Both: Filters flow in both directions. Can create unexpected results in more complex models. Stick with Single unless you have a specific reason to change it.

Active vs Inactive Relationships
Power BI only allows one active relationship between two tables at a time.
Our Harvests table has both a Planting Date and a Harvest Date. If you connect a Date dimension to both columns, only one can be active. The other becomes inactive — shown as a dotted line in Model View. Inactive relationships can be activated in specific calculations using a DAX function called USERELATIONSHIP().

How to Create a Relationship
Drag and drop:

Open Model View (left sidebar)
Click and drag "County" from your Counties table
Drop it onto "County" in the Harvests table
A line appears — double-click it to check the settings Manage Relationships dialog:
Model View → Home → Manage Relationships
Click New
Select both tables and their matching columns
Choose Cardinality and Cross-filter direction
Click OK

4. Schema Types

A schema is just how your tables are arranged in the model. There are three main types.

Star Schema ⭐ — The One You Want
One fact table in the center. Dimension tables surrounding it, each connected directly to it.
[Date]
|
[Counties] — [Harvests] — [Crops]
|
[Farmers]
This is Power BI's preferred setup. It's fast, straightforward to maintain, and makes writing calculations much easier.
Use it: Almost always. When in doubt, build a star schema.

Snowflake Schema ❄️ — The Complicated Version
Like a star, but dimension tables are broken into smaller sub-tables.
For example, instead of one Crops table with Crop Type and Crop Variety columns, you'd have a separate Variety table that links to the Crops table, which then links to Harvests.
[Crop Variety] → [Crop Type] → [Harvests]
This saves storage space but makes the model harder to work with.
Use it: Only if your source data is already structured this way and you can't easily reshape it. Otherwise, flatten it back into a star using Power Query.

Flat Table (DLAT) — Everything in One Sheet
This is exactly what our Kenya_Crops_Cleaned_Final.csv is — one table, 20 columns, 500 rows, all in a single sheet. No joins or relationships needed.
Simple to load and easy to understand, but "Meru" repeats in dozens of rows, "Tea" repeats hundreds of times. At large scale, this gets slow and hard to update.
Use it: For small datasets or quick work. Not the best long-term approach.

Schema Comparison
SchemaBest ForPower BI PerformanceStarMost analytical workExcellentSnowflakeAlready-normalized data sourcesGood but more complexFlat TableSmall data, quick reportsSlow at scale

5. A Few More Things Worth Knowing

Role-Playing Dimensions
Our Harvests table has two date columns: Planting Date and Harvest Date. Ideally you'd connect a Date dimension to both — but Power BI only allows one active relationship at a time.
The easiest fix for beginners: create two separate Date tables in Power Query using Reference — one called Date_Planting, another called Date_Harvest. Connect each one to its matching column. Each gets its own active relationship and you don't need to touch any DAX.

Common Modelling Issues
IssueWhat It Looks LikeFixCircular dependencyPower BI auto-disables one of your relationshipsRemove the relationship that's creating the loopAmbiguous filter pathTwo routes between the same tablesKeep only one active path between any two tablesUnintended M:M* on both ends of a relationship lineDeduplicate the key column in the dimension tableBroken time intelligenceDate-based measures return wrong valuesRight-click your Date table → Mark as Date Table

Joins vs Relationships — When to Use Which
UseWhenJoins (Power Query)You want to combine tables into one before loadingRelationships (Model View)You want tables to stay separate but connected in the model

Wrapping Up

Data modelling is the part of Power BI that makes everything else work — or not work. Once you understand that fact tables hold your numbers, dimension tables hold your descriptive labels, and relationships are the channels filters travel through, the rest starts to fall into place.
For our Kenya crops dataset: it comes as a flat table, which is fine to start. But in a real project, the goal would be to split it into a star schema — Harvests at the center, with Counties, Crops, Farmers, and Dates connected around it via 1:M relationships.
That's the target. Now at least we know what we're aiming for.