DEV Community

Cover image for Connecting Data the Right Way: Modeling, Relationships, and Schema Design in Power BI
David Mwandairo
David Mwandairo

Posted on

Connecting Data the Right Way: Modeling, Relationships, and Schema Design in Power BI

As a data professional, one norm that I have had to accept is that raw data rarely comes in one neat package. Most of the times, it'll be scattered, messy, and it lives across different tables that naturally don't talk to each other. Have you ever stayed up late looking at several spreadsheets trying to figure out how they relate to each other? Well, the good folks at Microsoft built Power BI to solve this. Power BI gives you the tools to create relationships between different sets of messy data. Before building beautiful dashboards with Power BI, however, you will need to understand what is happening beneath the surface.

This article examines the importance of data modelling, relationships, schemas and joins in Power BI. Understanding how these concepts work together and getting them right makes all the difference in creating insightful dashboards.

Beyond the Spreadsheet: Giving Your Data a Voice through Meaningful Relationships

Data modelling involves structuring and organizing your data in a way that Power BI can make sense of it. It's similar to setting up a filing system before you start working. Without having a sensible structure in place, your reports will either break or produce results that are untrustworthy.

In Power BI, the data model resides in the Model View, where all your tables and the connections between them are visibly laid out. A good data model answers one simple question: how does each piece of data relate to every other piece?

In this article, we will use one relatable example. Imagine you own an online bookstore and your data lives in three different tables:

  • Customers - stores customer names, emails and locations.
  • Orders - stores order IDs, dates, amounts, and customer references.
  • Books - stores book titles, authors, genres, and prices.

Each table is useful on its own but limited. Structuring a good data model will connect them in a way that Power BI will understand the full story, for example who bought what, when and for how much.

Using Joins in Power BI to Merge Tables

A join is an operation that combines rows from two tables based on a related column they share. This happens in Power BI within the Power Query Editor, which is the data transformation workspace before the data enters the model. To perform a join in Power Query, you navigate to:

Home > Merge Queries
Enter fullscreen mode Exit fullscreen mode

You then select the two tables that you would like to combine and choose the matching columns. For our bookstore, you would match the Customer_ID column in the Orders table with the Customer_ID column in the Customers table.

There are six types of joins in Power BI and picking the right one is critical:
Note: The left table is the primary table you will choose and make reference to. The right table will be the table you'd like to establish a connection with.

Inner Join

This join returns only the rows where there is a match in both tables. In our bookstore data, only customers who have placed at least one order will be seen in the results

Left Outer Join

This join returns all rows from the left table and only the matching rows from the right table. From our bookstore data, it will return all the customers, even the ones with no orders yet.

Right Outer Join

This is the opposite of a left outer join. It returns all rows from the right table and only the matching rows from the left. From our bookstore example, it will return all the orders even if the customer record is missing.

Full Outer Join

This join returns everything from both tables whether they match or not and missing values will appear as null. From our data, it will return every customer and every order regardless of a match.

Left Anti Join

This join returns only the rows from the left that have no match in the right table. From our bookstore data, it will return the customers who have never placed an order, thus can be used in targeting inactive users.

Right Anti Join

This join returns the rows from the right that have no match in the left table. In our case, it will return the orders with no associated customer record.

Selecting the wrong type of join can subtly corrupt you reports, therefore, always pause and ask yourself what you actually need to see.

Connecting the Dots: How Relationships Shape Power BI Data Models

While joins are necessary in connecting tables during the data preparation stage, relationships connect tables at the model level, thus allowing them to interact during analysis. In the Model View, relationships are created by dragging a column from one table and dropping it onto the matching column in another table. The relationship between them will be represented by a line that Power BI will draw between the two tables.

Types of Relationships

One-to-Many (1:*)

In this relationship type, one record in table X relates to multiple records in table Y, but each record in table Y relates to only one record in table X. For example, one customer can make many orders but each order belongs to one customer.

Many-to-Many (:)

Several records in table X can relate to multiple records in table Y. This relationship is complex and can return unanticipated results if not handled carefully even though it is supported in Power BI. It requires special attention to execute. For example, one book can appear in many orders and a single order can contain many books.

One-to-One (1:1)

Each record in table X matches exactly on record in table Y. For example, each customer has exactly one loyalty profile record.

Cross-Filter Direction

Filter direction controls how filters flow between tables when interacting with reports. They are classified into:

  • Single Direction - Filters flow in one direction i.e. from the lookup table to the data table. This is the usual default.
  • Bidirectional - Filters flow in both directions. If overused, this can produce confusing results.

How Schema Design Shapes Your Data Model

A schema is the basic layout and structure of a data model. The schema you choose will affect the performance and readability of your Power BI reports. There are two types of schema used in Power BI:

Star Schema ⭐

This schema consists of one central Fact Table enveloped by multiple Dimension Tables. The fact table holds the numerical, measurable data while the dimension table holds the descriptive, categorical data. The figure below shows a representation of the star schema:

        [Customers]
             |
[Books] — [Orders] — [Dates]
             |
         [Locations]
Enter fullscreen mode Exit fullscreen mode
Table Type Example Contains
Fact Table Orders Order amounts, quantities, revenue
Dimension Table Customers Names, emails, locations
Dimension Table Books Titles, authors, genres, prices
Dimension Table Dates Day, month, quarter, year

The star schema is fast and easy to understand. Power BI's engine is optimized to work with this structure, hence reports will load faster and DAX calculations will work more efficiently.

Snowflake Schema ❄️

This schema further extends the star schema by breaking down the dimension tables into more related tables. It's structure resembles that of a snowflake because of its branching structure. Its structure is as shared below:

[Author Details] — [Books] — [Orders] — [Customers] — [Customer Segments]
Enter fullscreen mode Exit fullscreen mode

The snowflake schema is known to add complexity to a data model even though it may save on storage space and reduce data redundancy. Filters, in effect, have to move through more relationships to get to their destination, thus slowing down performance and making the data model hard to maintain.
Pro-tip: Stick with the star schema when starting out with Power BI because it is the most used in Power BI projects.

Why Do We Need to Know This?

Charts and visuals are exciting to create when we need to showcase our Power BI projects, but a poorly structured data model will produce bad results. Some of the effects of a poorly structured data model include; filters behaving strangely and numbers not adding up when using DAX functions.

A solid foundation in the creation of good data models will save you from hours of debugging. - David Mwandairo

Getting your joins, relationships and schema right from the beginning results to:

  • Simpler DAX calculations.
  • Faster report performance.
  • Dashboards that are easier to scale and maintain.
  • Accurate and trustworthy numbers.

The data model is the foundation of a good Power BI report. When you build it well, everything else will become considerably easier.

Conclusion

Power BI is a capable Business Intelligence tool. It just requires you to understand the fundamentals and it will treat you well. Joins, modelling and relationships aren't the most exciting topics but understanding them well will separate data professionals that prepare reports which look good from those who prepare reports that actually work.

To perfect the Star Schema concept, you should ensure that you use intentional joins in your data. The one-to-many relationships that you build will have to be clean and always ask yourself whether you data reflects real-world logic. Once that foundation is solid, the dashboards and insights will take care of themselves.

Top comments (0)