DEV Community

Venus-Kennedy
Venus-Kennedy

Posted on

Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained.

Introduction

Data modeling is the foundation of any successful Power BI report. Without a proper model, even the best visualizations can produce misleading or inaccurate insights. In this article, I’ll break down data modeling in a simple, practical way—covering SQL joins, Power BI relationships, schemas, and common modeling challenges.

What is Data Modeling?
Data modeling is a visual representation of data structures, relationships, and business rules. This process is done to better understand and organize information. It is a crucial step in database design to ensure readability and efficiency.

NOTE: This process is different from “machine learning modeling," where predictive models are trained and evaluated.

*Understanding Joins
*

Joins are operations that combine two tables based on matching values in one or more columns. Let us walk through each join type using a simple example. Imagine you have two tables:

Sales table with columns: Date, Customer ID, Amount

Customers table with columns: CustomerID, CustomerName, City

The CustomerID column exists in both tables and serves as the key that connects them.

Inner Join
An inner join returns only the rows where both tables have a matching value. If a sale has a CustomerID that does not exist in the Customers table, that sale is dropped. If a customer has no matching sales, that customer is also dropped. You obtain only the intersection of both tables. This method is best to use when you only care about records that exist in both tables. For example, this shows only customers who have actually made a purchase.

Left Outer Join
A left outer join returns all rows from the left (first) table and the matching rows from the right (second) table. If a row in the left table has no match in the right table, the right-side columns come back as null.

When to use it: This option is the most common join type. Use it when you want to keep all your primary records (like all sales) and just add supplementary information from another table (like customer names). You do not want to lose sales data just because a customer record is missing.

Right Outer Join
A right outer join is the mirror of a left outer join. It returns all rows from the right table and the matching rows from the left table. Unmatched rows on the left side come back as null.

When to use it: A right join is less common than a left join because you can usually just swap the table order. But it is useful when you want to keep all records from a reference table (like all customers) even if they have no transactions yet.

Full Outer Join
A full outer join returns all rows from both tables. Where there is a match, the columns are filled in from both sides. Where there is no match on either side, you receive nulls.

When to use it: When you need a complete picture of both datasets. For instance, you can reconcile two lists to see which records exist in one, the other, or both.

Left anti-join
A left anti join returns only the rows from the left table that have no matching rows in the right table. It is essentially the opposite of an inner join for the left side.

When to use it: Finding orphaned records. For example, you can check all sales where the CustomerID does not exist in the Customers table. "This is great for data quality checks and identifying gaps in your data.

Right anti-join
A right anti-join returns only the rows from the right table that have no matching rows in the left table. When to use it: The flip side of a left anti. Say you want to see all customers who have never made a purchase. Useful for things like identifying dormant customers or products that have never been sold.

How to Create Joins in Power BI (Power Query)

  1. Open Power BI Desktop

  2. Click "Transform Data."

  3. Select a table

  4. Click "Merge Queries."

  5. Choose:

  6. Second table

  7. Matching columns

  8. Join type (Inner, Left, Right, etc.)

  9. Click OK.

  10. Expand the merged column

*Understanding Relationships in Power BI
*

Joins and relationships are not the same thing, even though they both involve connecting tables. Joins (in Power Query) physically combine two tables into one before the data reaches the model. The result is a single merged table. Relationships (in the data model) keep tables separate but define a logical connection between them. Power BI uses these connections at query time to filter and aggregate data across tables without merging them together. This is the preferred approach for data modeling in Power BI because it keeps your model clean, efficient, and flexible.

*Cardinality Types
*

Cardinality describes how many rows on one side of a relationship match rows on the other side.

One-to-Many (1:M) is the most common and recommended type. One row in the lookup table matches many rows in the data table. For example, one customer can have many orders. The "one" side has unique values (like CustomerID), and the "many" side can have duplicates (like CustomerID in a Sales fact table).

Many-to-One (M:1) is just the reverse direction of one-to-many. It is the same relationship, just described from the other table's perspective.

One-to-One (1:1) means each row in one table matches exactly one row in the other table. This is uncommon and often signals that the two tables could be combined into one. It sometimes shows up when you split a wide table for organizational reasons.

Many-to-Many (M:M) means both columns can contain duplicate values. This is the trickiest type and should be used sparingly. It is sometimes necessary when you are working with tables at different levels of granularity. For example, if you have sales targets stored at the product category level, but your product dimension is at the individual product level.

Cross-Filter Direction
Cross-filter direction controls how filters flow between related tables.

"Single" means filters flow in one direction only, from the "one" side to the "many" side. This is the default and recommended setting for most relationships. When you filter on a dimension table (like selecting a specific customer), that filter flows through to the fact table (showing only that customer's sales). But filtering the fact table does not flow back to filter the dimension table.

"Both" (bidirectional) means filters flow in both directions. This can be useful in specific scenarios, like when you need a slicer on a fact table to filter a dimension. But bidirectional filtering can cause performance issues and create ambiguous filter paths, so use it only when you genuinely need it.

Active vs. Inactive Relationships
Power BI only allows one active relationship between any two tables at a time. In the Model view, active relationships show as solid lines, while inactive relationships appear as dashed lines.

Why would you have an inactive relationship? The most common reason is role-playing dimensions. Suppose you have a Date table and a Sales table, but your Sales table has three date columns: OrderDate, ShipDate, and DeliveryDate. You can create three relationships from the Date table to the Sales table, but only one can be active at a time. The other two become inactive.

How to Create Relationships in Power BI
There are several ways to create and manage relationships in Power BI Desktop:

Method 1: Drag and Drop in Model View
Click the Model view icon on the left sidebar of Power BI Desktop (it looks like a diagram).

You will see all your tables displayed as cards with their columns listed.

To create a relationship, simply click on a column in one table and drag it to the matching column in another table.

Power BI will auto-detect the cardinality and cross-filter direction. You can double-click the relationship line to edit these settings.

Method 2: Manage Relationships Dialog
Go to the Modeling tab in the ribbon and click Manage Relationships.

This opens a dialog that lists all existing relationships. You can see the tables, columns, cardinality, and whether each relationship is active.

Click New to create a relationship. Select your two tables and the columns to join on. Set the cardinality and cross-filter direction.

Click OK to save.

Method 3: Autodetect
In the Manage Relationships dialog, click Autodetect.

Power BI will scan your tables for columns with matching names and data and attempt to create relationships automatically.

This is convenient, but always review the results. Autodetect can create incorrect relationships if column names are coincidentally similar.

Method 4: Model Explorer (Properties Pane)
In the Model view, you can also right-click on the Relationships section in the Model Explorer pane and select New relationship. This opens a streamlined Properties pane where you fill in the details. This approach generates fewer queries, which matters when working with large datasets or DirectQuery connections.

Fact Tables vs. Dimension Tables
Understanding the difference between fact tables and dimension tables is fundamental to building a good data model.

Fact tables store your transactional or event data. These are the tables with the numbers you want to analyze: sales amounts, order quantities, revenue figures, and website clicks. Fact tables tend to be tall (many rows) and relatively narrow (fewer columns). Each row represents a single event or transaction. Fact tables contain foreign keys that link to dimension tables.

Dimension tables store the descriptive context around your facts. They answer the "who, what, where, and when" questions: customer names, product categories, store locations, and dates. Dimension tables tend to be wide (many descriptive columns) and shorter (fewer rows compared to facts). Each row represents a unique entity, and the key column has unique values.

A simple way to think about it: if you are summing, counting, or averaging a column, it probably belongs in a fact table. If you are filtering or grouping by a column, it probably belongs in a dimension table.

*Schemas: Star vs. Snowflake and Flat Table
*

What is a Schema?
A schema is the blueprint for the logical relationships among data. It shows how a table is related to another. For a single source table, it shows the properties of each column.

Common Schemas
When you load data into Power BI (from a file or from an online server), they may be represented as multiple tables. But you may also have a task that involves analyzing only one big table. Here are some common schemas that you might experience as a data analyst or business intelligence analyst.

  1. Flat Schema
    It is a single table with columns and rows. It has no relationship with any other tables. Although it is straightforward (no need to join tables), it can cause performance issues for a larger dataset and it cannot manage complex relationships between fields (or columns).

  2. Star Schema
    It represents different tables, with dimension tables connected to a central facts table. In a star schema, the facts table contains the quantitative data while the dimension tables are supplementary descriptive information about the data in the facts table. Below is an example of a star schema. The Sales facts table is connected to four dimension tables (Products, Region, Date, Salesperson).

Star schema is often used in data warehousing for analytical and reporting purposes. However, because the dimension tables are denormalized (i.e. values for some columns may repeat), star schema can lead to increased storage requirements. It may also be limited in analyzing more complex relationships between data in some business cases.

  1. Snowflake Schema Snowflake schema extends the star schema by normalizing the dimension tables, breaking them into multiple related tables. There is still a central facts table connected to smaller dimension tables, which are connected to even smaller dimension tables.

The purpose of normalizing data tables is to minimize redundancy, improve data integrity, and facilitate database maintenance. However, query performance might be adversely affected as there are more relationships than in the Star schema.

*Common Data Modeling Issues
*

  1. Ambiguous Relationships

Occurs when multiple paths exist between tables.

  1. Circular Relationships

Loops in relationships causing errors.

  1. Many-to-Many Confusion

Leads to incorrect aggregations.

  1. Poor Performance

Caused by:

Flat tables
Too many joins
Bidirectional filters

*Best Practices
*

  1. Use Star Schema
  2. Keep relationships simple
  3. Avoid unnecessary joins
  4. Use single-direction filtering where possible
  5. Separate facts and dimensions

Conclusion

Data modeling is what makes Power BI powerful. Understanding joins, relationships, and schemas allows you to build accurate, scalable, and high-performing dashboards.

If you get the model right, everything else DAX, visuals, and insights becomes much easier.

Top comments (0)