Odhiambo

Posted on Mar 30

Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained

#microsoft #datascience

Many times while working with relational data, there is need to get context information from different data tables. Understanding how to perform modelling when working with such situations is essential. Let us look at some data modelling concepts when working with Power Bi.

Joins

In Power Bi, joins are called Merge queries. This is when two tables are joined together depending on matching values from columns. At least one column in the two tables must have matching values eg. an id appearing both in users table and a similar matching id value appearing in subscriptions table. The names of the two columns do not need to be same, but the underlying values must match.

In most cases, this involves a primary key from one table and a foreign key in the other table. To perform joins in Power Bi, we select Merge queries command from the home tab.

Image from learn.microsoft.com

Several types of joins are available in Power Bi:

inner join: returns only matching rows from both the left and right tables

Image from learn.microsoft.com
left outer join: returns all the rows from left table but only the rows from the right table that have a match

Image from learn.microsoft.com
right outer join: returns all the rows from right table but only the rows from the left table that have a match

Image from learn.microsoft.com
full outer join: returns all the rows from both the left and right tables

Image from learn.microsoft.com
left anti join: returns only records from left table that do not have matching rows in the right table

Image from learn.microsoft.com
right anti join: returns only records from right table that do not have matching rows in the left table

Image from learn.microsoft.com
fuzzy merge: this is only supported over text columns. It aims to provide a more standardadized way of representing text data in a columns eg. where distinct names are misspelled eg Vollvo instead of Volvo. It it preceded by another join type, typically, left outer join.

Sample goal of fuzzy join from learn.microsoft.com
cross join: results in a Cartesian-like result from the two tables
you first select a table of interest then click on custom column command from the Add column tab on the ribbon.

image from learn.microsoft.com

Enter any name for the new custom column in the dialogue window that appears. In the Custom column formula* enter the name of the other table or query. Here it is called Colors

Image from learn.microsoft.com

Select ok and also expand the new column and click ok again to view final result.

Image from learn.microsoft.com

Relationships

There are 4 main types of relationships called cardinality
1. Many to one(*:1): It is the default. A column can have many instances of a value and only have one instance of the same value in the other table, known as a lookup table.
2. One to one(1:1): A columns can have only one instance of a value and also only one instance of the value in the other table.
3. One to many(1:*): A columns has only one instance of a value but can have many instances in the other table
4. Many to many(:): There are no unique constraints on values in the tables.

Active relationships in Power Bi are shown by solid continuous lines while inactive relationships are shown by dotted lines. Only one active path exists between tables and this is used to filter data for visuals.

Joins vs relationships

From the foregoing, we can see that joins physically merge tables while relationships only show linkages between tables.

Schemas

In Power Bi, schemas describe how your data is organized and structured. A well designed data model improves the performance of your queries as well as usability of in DAX.

A few concepts are worth noting when dealing with schemas in Power Bi:

Dimension tables: These tables describe the things being modeled such as products or places.
Fact tables: They store observations or events and contain dimension key columns.

1. Star schema

Tables are classified either as dimension of fact tables.

star schema from radacad.com

Normalizing tables in star schema involves storing data in fact tables by avoiding repetition.

2. Snowflake

It is some sort of a normalized version of star schema. Dimension tables are broken down further into sub-dimensions and end up forming branch-like structure.

Image from mmsqltips.com

3. Flat Table (DLAT)

Data is entered and consolidated into one single table. The table is usually very wide and has no relationship with other tables.

Image by Firat on medium.com

It is called DLAT(denormalized large aggregation table)

Role playing dimensions

These occur when a single dimension table can be used to filter a fact table. A fact table of sales, for instance, can be filtered based on the dates, such as order_date, shipping_date to show these specific analysis for dates.

Common modelling issues in Power Bi

Some common issues you might run into when working with Power Bi include the following:

Overloading the model: Trying to load everything instead of just what is needed from the slow. This might result in slow query processes. Reduce the number of columns used and remove any unnecessary field to improve the model.
Overusing the calculation to create new columns instead of just creating a new measure can lead to excessive use of RAM. It is recommended to use measures for aggregation and only use calculated columns when necessary.
Ignoring data types: Data should always be in the correct types to improve on the model. It is advisable to first transform the data in Power Query before loading.
Misconfiguring the table relationships can lead to circular dependencies or broken relationships. Particularly, using many-to-many relationships inaccurately can lead to double counts and eventual inaccurate data.

DEV Community

Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained

Joins

Image from learn.microsoft.com

Image from learn.microsoft.com

Image from learn.microsoft.com

Image from learn.microsoft.com

Image from learn.microsoft.com

Image from learn.microsoft.com

Image from learn.microsoft.com

Sample goal of fuzzy join from learn.microsoft.com

image from learn.microsoft.com

Image from learn.microsoft.com

Image from learn.microsoft.com

Relationships

Joins vs relationships

Schemas

1. Star schema

star schema from radacad.com

2. Snowflake

Image from mmsqltips.com

3. Flat Table (DLAT)

Image by Firat on medium.com

Role playing dimensions

Common modelling issues in Power Bi

Top comments (0)