Amo InvAnalysis

Posted on Mar 29

Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained

#analytics #beginners #data #microsoft

According to Microsoft, data modeling in Power BI refers to the process of connecting data from various sources, defining their relationships with each other and structuring it all for efficient analysis and visualization.

You might have read all this and gone, so what? Why data modeling in Power BI? Imagine that you have a massive box full of many random receipts from your purchases. If someone asked you to determine how much you spent on fuel, or say fruits on a certain month, how would you go about it?

If you were to manually sift through it all, not only would it be tedious, but also there is a high chance your answer may not be 100 percent accurate. However, if you happened to have a system of organizing those receipts into folders for each week or month, and a way to link it all together, you would probably find a more accurate answer easily.

This is all what data modeling in Power BI is, or at least a simplified version of it. You can think of data modeling as the act of connecting the receipts or raw data from various sources into folders, connecting them or in other words, defining their relationships; with the end goal of structuring it all for purposes of analysis and visualization.

By the end of this article, you will understand data modeling in Power BI. And to keep things light, you can expect this article to take you through joins, relationships and schemas in Power BI with lots of analogies relating to day-to-day stuff you encounter. Don't let the technical jargon scare you, dive in!

Foundations First: Fact vs. Dimension Tables

Before diving into joins and relationships, you need to understand the two fundamental table types in data modeling.
Fact Tables (Verbs) contain the measurements or metrics of your business—the things that happen.

Think of them as the action tables. Sales transactions, website clicks, insurance claims, or M-Pesa transactions are all fact tables. They answer questions like "how much?" or "how many?" These tables typically contain numeric values you want to aggregate: revenue, quantity sold, claim amounts, or transaction counts.

Dimension Tables (Nouns) contain the descriptive attributes that give context to your facts. These are the "who, what, where, when" tables. Customer information, product catalogs, date calendars, or regional data are dimension tables. They provide the categories you use to slice and analyze your fact tables by product category, by customer location, by month, or by sales region.

The relationship between these is straightforward: dimension tables describe the context, fact tables record what happened in that context. A sales fact table records transactions; dimension tables tell you which customer made the purchase, what product they bought, and when it occurred.

SQL Joins in Power Query

Going by the earlier definition of data modeling in Power BI, joins, also called merging is a critical component in bringing together all pieces of the puzzle. In this case, stitching together two separate tables or data sources to come up with a unified table based on a related column between them.

The Six Types of Joins

There are different types of joins in SQL and Power Query. Here's a quick rundown using a relatable example.
Assume you're a teacher preparing for a school Math competition.

As part of this competition, you have a list of all students at the school (List A) and a list of students taking part in the competition (List B). Here's how the various joins work:

Inner Join (Matching Pairs Only) ** returns only students who appear on both lists, i.e., students who are both enrolled at the school AND competing. If a student is enrolled but not competing, they're excluded. If someone is on the competitor list but not enrolled (perhaps from another school), they're also excluded. This is the most restrictive join.

Left Outer Join (The Default Join) ** is more generous. It retains all names from List A (all students), and where there's a match, brings in competition data from List B. Students not competing show up with blank/null values in the competition columns. This is Power Query's default merge type because it preserves all records from your primary table.

Right Outer Join ** is the complete opposite of the left outer join. It retains all names on the competitor list, and in case of a match, brings the student name from List A. The rest result in nulls if there isn't a match. This join is less common in Power BI workflows.

Full Outer Join ** is for when you want everyone on both lists in your results, regardless of whether they match or not. This is perfect for creating some sort of master directory. Every student appears, every competitor appears, with nulls filling in where data doesn't exist on one side.

Left Anti Join ** shows who doesn't match. In our example, a left anti join would show non-competitor students and would help in identifying potential students to recruit for the competition. This is useful for finding gaps or customers who haven't made purchases, products without sales, or claims without follow-up.

Right Anti Join ** shows competitors who are not students. This will be helpful in identifying non-students trying to compete essentially flagging data quality issues or exceptions that need investigation.

Power BI Relationships (The Model View)

While joins in Power Query physically combine tables into one, relationships in Power BI's Model View keep tables separate but connected. This is more efficient for large datasets and more flexible for analysis.

When you load data into Power BI, you can switch to Model View to see all your tables and the lines connecting them. These lines are relationships, and they tell Power BI how tables relate to each other without duplicating data.

Cardinality: "Who Connects to Whom"

Cardinality defines how many records in one table can relate to records in another table. Understanding this is critical for accurate analysis.

One-to-Many (1:M) ** is the most common relationship type. One record in the dimension table relates to many records in the fact table. For example, one customer can have many transactions, one product can appear in many sales records, or one county can contain many customers. This is the backbone of star schema design and should be your default relationship type.

Many-to-Many (M:M) ** occurs when multiple records in one table can relate to multiple records in another. For instance, students and classes; one student takes many classes, and one class has many students. While Power BI supports this relationship type, it can create performance issues and ambiguous filter paths. Use sparingly and only when necessary.

One-to-One (1:1) ** means one record in each table matches exactly one record in the other. This is rare and usually indicates that your tables should be combined. An example might be employee basic information in one table and employee salary information in another (separated for security reasons). While valid in specific scenarios, question whether you actually need two tables.

Cross-Filter Direction

This determines how filters flow between related tables.

Single Direction is the default and recommended setting. Filters flow from the "one" side to the "many" side. When you select a customer, it filters their transactions. When you select a product, it filters sales of that product. This is predictable, performs well, and avoids circular dependencies.

Both Directions (Bi-directional filter) allows filters to flow both ways. Select a transaction, and it filters back to show only relevant customers. While this sounds convenient, it can create ambiguous filter paths, performance issues, and unexpected results. Microsoft recommends avoiding bi-directional filtering except in specific many-to-many scenarios. If you think you need it, reconsider your data model first.

Active vs. Inactive Relationships

Power BI allows only one active relationship between any two tables. All other relationships become inactive, shown as dashed lines in Model View.

Active relationships filter automatically. Inactive relationships require explicit activation in DAX using the USERELATIONSHIP function.

This is useful when you have multiple date fields; order date, ship date, delivery date, all relating to the same calendar table. One relationship is active by default; you activate others in specific measures when needed.

Schemas: Designing the Architecture

How you arrange your fact and dimension tables matters. The schema you choose affects query performance, model complexity, and how easily users can understand your data.

The Star Schema (Highly Recommended/Standard)
The star schema is the gold standard for Power BI data modeling. One central fact table connects directly to multiple dimension tables, forming a star pattern when viewed in Model View.

For example, a sales fact table sits in the center, connected directly to Customer, Product, Date, and Store dimension tables. Each dimension connects only to the fact table, not to each other. This design is simple, performs exceptionally well, and is easy for users to understand. Microsoft explicitly recommends star schema for Power BI implementations.

The benefits are significant: fast query performance because Power BI's engine is optimized for this pattern, easy DAX calculations since relationships are straightforward, and simple troubleshooting when things go wrong.

The Snowflake Schema

The snowflake schema extends the star by normalizing dimension tables. Instead of one Product dimension, you might have Product, Category, and Subcategory tables linked together.

While this reduces data redundancy and looks elegant from a database design perspective, it creates problems in Power BI. Additional table joins slow query performance. More complex relationship paths make DAX harder to write and maintain. Users struggle to understand multi-hop relationships.

Unless you're working with extremely large dimension tables where normalization significantly reduces data size, avoid snowflake schemas in Power BI. The performance cost outweighs the storage savings.

The Flat Table (Denormalized Large Aggregated Table)

A flat table combines everything into one massive table all facts and dimensions merged together. Every row contains complete information: transaction amount, customer name, product details, date information, everything.

This approach seems simple at first. No relationships to manage, no joins to configure. However, it creates serious problems: massive data redundancy (customer names repeated millions of times), poor performance as table size explodes, difficult maintenance when dimension attributes change, and inefficient memory usage.

Flat tables have their place, small datasets for quick analysis or pre-aggregated summary tables. But for proper data modeling in Power BI, stick with star schema.

Getting Your Hands Dirty: Creating All These in Power BI

Theory is important, but implementation is where understanding solidifies. Here's how to go about it all:

How to Join (Merge) in Power Query
Open Power BI Desktop and load your data sources. Click "Transform Data" to open Power Query Editor.

Select the first table you want to merge, then click "Merge Queries" in the Home ribbon.

Choose the second table from the dropdown. Select the matching columns in both tables—these are your join keys.

Choose your join type from the options provided (Left Outer is default). Click OK.

A new column appears with "Table" values. Click the expand icon next to the column header, select which columns to bring from the second table, and click OK. The tables are now merged.

Click "Close & Apply" to load the result into Power BI.

How to Create Relationships in Model View

Switch to Model View using the icon on the left sidebar. You'll see all loaded tables displayed as boxes.

To create a relationship, click and drag a field from one table to the matching field in another table. Power BI automatically detects the relationship type and cardinality based on the data.

A line appears connecting the tables. Click the line to see relationship properties: cardinality (1:M, M:M, 1:1), cross-filter direction (single or both), and active/inactive status.

Using "Manage Relationships"

For more control, click "Manage Relationships" in the Home ribbon. This opens a dialog showing all relationships in your model.

Click "New" to create a relationship manually. Select the two tables and the columns that should relate them.

Choose cardinality and filter direction. Check "Make this relationship active" if needed. Click OK.

From this dialog, you can also edit existing relationships, delete relationships, or toggle them between active and inactive.

This is particularly useful when you have multiple relationships between the same tables and need to control which one is active.

Final Thoughts

Data modeling isn't just a technical exercise, it's the foundation of every insight your Power BI reports will generate.

A well-designed model makes DAX calculations straightforward, query performance fast, and reports easy to maintain. A poorly designed model creates endless headaches: slow refreshes, incorrect results, and DAX measures that take hours to write.

Start with star schema unless you have compelling reasons to deviate. Use one-to-many relationships with single-direction filtering as your default. Merge tables in Power Query when you need to physically combine data, but use relationships in Model View whenever possible to keep your model flexible and performant.

The receipt box analogy holds true: organization upfront saves massive time later. Invest effort in proper data modeling before building a single visual. Your future self and everyone who uses your reports will thank you for it.

References
[1] Microsoft Power BI Documentation - Data Modeling - https://docs.microsoft.com/en-us/power-bi/transform-model/desktop-modeling-view
[2] Microsoft Power BI - Bi-directional Relationships Guidance - https://docs.microsoft.com/en-us/power-bi/guidance/relationships-bidirectional-filtering
[3] Microsoft Power BI - Star Schema Guidance - https://docs.microsoft.com/en-us/power-bi/guidance/star-schema