DEV Community

Ibrahim0695
Ibrahim0695

Posted on

Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained

INTRODUCTION
Following the first week of messing around with ‘backend’ data at LuxDevHQ, this week, we were given a chance to mingle with the data analytics group again. For what you might ask, well, Power BI. Honestly, I didn’t think I would enjoy ‘frontend’ data like this! A lot has been learnt, but I think the most intriguing part of Power BI is data modelling, because to me it has a sort of likeness to data engineering in that it improves performance, reduces errors, and makes analysis easier. Who wouldn’t want data that is organized in structures that allows for fast and accurate analysis and visualization... I mean who?

In this article I’m planning to dive into data modelling in BI, placing focus on joins, relationships and schemas.

So what is data modelling?
It is simply the process of connecting different data sources, defining how they relate to one another(tables), and organizing them into a structure that provides easy analysis. A good way to think of it is the way movie shops, categorizes and indexes its movies and series in a way that is easy to retrieve and view.

SQL Joins in Power BI simplified

SQL joins are implemented through the "Merge Queries" feature in the Power Query Editor, which combines rows from two tables based on a shared column.

Power Bi supports several join types, below are their description and a simple example to get you up to speed;
image of joins

Inner join; Returns only rows where the key exists in both tables.

inner join
Targeted Marketing: A list of customers who have both an active account and a registered email address.

Left Outer(left join); Keeps all rows from the "Left" table and adds matching data from the "Right.

left join image

Sales Report: All products in your catalog, including those that haven't sold yet (which show as null).

Right Outer( right join); Keeps all rows from the "Right" table and adds matching data from the Left.

right join

Audit: All transactions in a bank ledger, matching them to customer profiles (useful if some transactions lack profile data).

Full Outer JOIN; Returns all rows from both tables, matching them where possible.

full join
Consolidated Inventory: Merging stock lists from two different warehouses to see every item available in the company.

Left Anti JOIN; Returns rows from the Left table that do not have a match in the Right.

left anti image

Inventory Cleanup: Finding products in your warehouse that have never appeared in a sales record.

Right Anti JOIN; Returns rows from the Right table that do not have a match in the Left.

right anti

Error Tracking: Finding sales records that are missing a corresponding "Product ID" in your master product list.

How to Create Joins in Power BI:

Open Power BI;

  1. - Click Transform Data on the Home ribbon to open the Power Query Editor.
  2. - Select your primary table from the "Queries" pane on the left.
  3. - In the Home ribbon, click Merge Queries.
  4. - In the dialog box, select the second table from the dropdown.
  5. - Crucial Step: Click the matching column in both table previews (e.g., ProductID).
  6. - At the bottom, choose your Join Kind (e.g., Left Outer).
  7. - Once merged, click the **Expand **icon (two arrows) in the new column header to choose which columns from the second table you want to pull in.

Relationships in BI

In Power BI, Relationships are the secret sauce that allows you to connect different tables so they can work together as one. Without them, you would just have a bunch of isolated lists that can’t speak to each other.

Imagine you have a list of Sales and a list of Products. A relationship tells Power BI that the "Product ID" in your Sales list is the same as the "Product ID" in your Product list.

The Core Concept: Fact vs. Dimension Tables
Before looking at the lines connecting tables, you need to know the two types of tables:

  1. - Fact Tables(The "What happened"): These are long tables that store events or transactions (e.g., Sales, Temperatures, Stock levels). They usually have many numbers.
  2. - Dimension Tables(The "Context"): These are shorter, unique lists that describe things (e.g., a list of every Product you sell, a list of every Store, or a Calendar).

TYPES OF RELATIONSHIPS

1. Cardinality (The "How Many")

When you drag a line between two tables in the Model View, Power BI looks at how many times a value appears on each side.

One-to-Many(1:M) The "Gold Standard"
This is the most common and healthiest relationship.

Example: One Product (in your Dimension table) can be sold Many times (in your Sales/Fact table).

One-to-One (1:1)
Every row in Table A matches exactly one row in Table B.
This is rare and usually means the to tables should probably just be merged into one.

Many-to-Many(M:M)
This happens when values repeat in both tables.

Example: A list of Students and a list of Classes (One student has many classes; one class has many students).

Warning: Beginners should avoid this if possible, as it can make your numbers "double-count" or act strangely.

2. Cross-Filter Direction (The "Flow")

The arrow in the middle of the relationship line tells Power BI which way the "filtering power" flows.

Single Direction(One-way arrow): The filter flows from the "One" side to the "Many" side. If you click a "Category" in a slicer, your "Sales" total updates. This is the safest and fastest setting.

Both Directions (Two-way arrow): The filter flows both ways. While it sounds helpful, it can cause performance lag and confusing results in large models.

3. Active vs. Inactive Relationships

Power BI only allows one active path between two tables.

Active (Solid Line): This is the "Main Highway." Power BI uses this for all your charts by default.

Inactive (Dotted Line): This is a "Backroad." It exists, but Power BI won't use it unless you write a specific piece of code (DAX) to tell it to take that path. This is common if you have both an Order Date and a Ship Date in the same table.

How to Create a Relationship (Step-by-Step)

  • Click the Model View icon on the far left of Power BI Desktop (it looks like three small boxes connected by lines).
  • Find the column that exists in both tables (e.g., CustomerID).
  • Click and drag the column name from one table and drop it directly onto the same column name in the other table.
  • A line will appear. You can double-click that line to open the "Edit Relationship" window if you need to change the Cardinality or Filter Direction.

** Managing Relationships**

  • Model View: Click the Model icon (the third one down on the left-hand pane).
  • Drag & Drop: Simply click a field in one table and drag it onto the corresponding field in another. Power BI will attempt to guess the cardinality.
  • The Dialog Box: Double-click any relationship line (or click Manage Relationships in the top ribbon) to manually toggle it between Active/Inactive or change the Cross-filter direction.

Schemas

A **Schema **is simply the way you arrange your tables and the relationships between them. Think of it like a map: it shows where the data lives and how a filter in one place travels to update a number in another.

To understand schemas like relationships, you first need to know the two types of tables that live inside them:

Fact Tables: The "What happened." These are long lists of numbers (Sales, Temperature readings, Stock levels).

Dimension Tables: The "Who, Where, and When." These provide context (Product names, Date calendars, Store locations).

Here are the three most common ways to organize these tables:

1. The Star Schema (The Gold Standard)
The Star Schema is the most recommended layout for Power BI. It features one Fact table in the center, directly connected to several Dimension tables. It looks like a star because the dimensions radiate out from the middle.

**Why use it? **It is the fastest for Power BI to calculate. Because every dimension is only "one jump" away from the fact table, your reports will be snappy and your DAX formulas will be simpler.

Tip: If you aren't sure which one to use,** always try to build a Star Schema.**

2. The Snowflake Schema
A Snowflake Schema is just a Star Schema where some of the "arms" (dimensions) are broken down into even smaller tables. For example, instead of having a "Product" table with the Category and Sub-category inside it, you have three separate tables: Product → Sub-Category → Category.

Why use it? It is very organized and saves space because it doesn't repeat words as often.

The Downside: It’s "colder" and slower. Power BI has to "jump" through multiple tables to get an answer (e.g., to find Sales by Category, it has to go from Category to Sub-Category to Product to Sales). This can slow down large reports.

3. The Flat Table (DLAT)
This isn't really a "schema" in the traditional sense; it’s just one giant table that contains everything—the sales, the product names, the dates, and the customer addresses all in one spreadsheet-style view.

Why use it? Beginners often start here because it’s how Excel works. It’s easy to understand at first glance.

The Downside: It is very inefficient. If you have 1 million sales of the same "Red Hammer," Power BI has to store the words "Red Hammer" 1 million times. This makes your file size huge and makes "Time Intelligence" (like comparing this month to last month) very difficult to calculate.

How to Build These in Power BI
You manage these schemas in the Model View (the icon on the far left that looks like three little boxes connected by lines).

Drag and Drop: Click a column (like ProductID) in one table and drag it onto the matching column in another table to create the "arm" of your schema.

Check the Direction: Look for the arrow on the line. In a good Star Schema, the arrow should point away from the Dimension and toward the Fact table. This means the Dimension is "filtering" the Fact.

Common Modeling Issues to Watch For

  • Role-Playing Dimensions: This happens when a dimension (like Date) needs to filter the Fact table multiple times. Solution: Use Inactive Relationships and the DAX function USERELATIONSHIP.
  • - Many-to-Many Hazards: Relating two tables on a non-unique key can lead to "Double Counting." Solution: Use a "Bridge Table" with unique values to sit between them.
  • - Circular Dependencies: When filters loop back on themselves. Power BI will block these to prevent infinite calculation loops

Conclusion

It is crucial that before you throw data into Power BI, model it first This is go along way in avoiding confusion and lag.

Top comments (0)