DEV Community

Cover image for Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained.
Leah Kivuti
Leah Kivuti

Posted on

Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained.

Two weeks ago I had no idea what Power BI is, what data modeling means, or why people keep talking about joins and relationships. Everything sounded complicated and honestly, a bit overwhelming. But as I started learning step by step, I realized these concepts are actually very logical and easy to understand when explained in a simple way.

Let us start with joins, because that is where most people begin when working with data.

When working with data, we often have multiple tables that need to be connected. For example, you might have a customers table and a sales table. The customers table contains details like customer names, while the sales table contains transactions such as what was bought and how much was spent. These tables are connected using a common column like Customer ID.

This is where joins come in.

A join helps us combine data from two tables based on a common column. But not all joins behave the same way. Each type of join answers a different kind of question.

A left join keeps all the records from the left table and then checks if there is a match in the right table. If there is a match, it brings in the data. If there is no match, it still keeps the record from the left table and leaves the other side blank. This is useful when you want a complete list from one table and just want to see which ones have matching data in another table.

A right join does the opposite. It keeps all records from the right table and checks for matches in the left table. Even if there is no match, the records from the right table are still included.

A full outer join keeps everything from both tables. Whether there is a match or not, all records are included. This is useful when you want to see all possible data from both sides and identify where matches exist and where they do not.

An inner join is strict. It only keeps records that exist in both tables. If there is no match, the data is removed. This is useful when you only care about matching records.

Then we also have anti joins.

A left anti join shows records that exist in the left table but do not have a match in the right table. For example, customers who have never made a purchase.

A right anti-join shows records that exist in the right table but do not have a match in the left table. For example, transactions that are not linked to any known customer.

So every time we want to answer a question using joins, we must tell the system exactly how to connect the tables. We have to choose the type of join and apply it manually depending on what we are trying to achieve.

This is how things work in SQL and in many traditional data systems. However, Power BI works a bit differently, and this is where things become more interesting.

In Power BI, we do not usually write joins every time we build a report. Instead, we define relationships between tables once, and Power BI automatically uses those relationships whenever we create visuals.

A relationship is simply a connection between two tables using a common column. Once you define that connection, Power BI understands how data in one table relates to data in another table. This is what allows you to combine data from different tables in a single visual.

For example, imagine you are building a bar chart that shows customer names and total quantity purchased. The customer name comes from the customers table, while the quantity comes from the sales table. These fields are not in the same table, but because there is a relationship between them using Customer ID, Power BI is able to bring them together in one visual. Without that relationship, this would not be possible.

Relationships also allow filtering across tables. If you select a specific customer, the sales data automatically updates to show only that customer’s transactions. This happens because Power BI understands how the tables are connected. This brings out a key difference between joins and relationships.

Joins are temporary. You create them each time you need them, and they physically combine tables into one.

Relationships are more permanent. Once you define them, they stay in your model and are used automatically whenever you build reports, unless you remove them.

Understanding this difference is very important because it changes how you think about working with data in Power BI.

Instead of repeatedly combining tables, you focus on building a clean data model where tables are properly connected. Once that is done, everything else becomes much easier.
After understanding joins and relationships, the next important concept in data modeling is cardinality.

Cardinality describes the nature of the relationship between two tables. In simple terms, it explains how data repeats across tables. It helps Power BI understand how tables interact and how filters behave when you are building reports.

Most of the time, especially when working with fact and dimension tables, you will deal with one-to-many relationships. This is the most common type.

A one-to-many relationship means one record in one table is linked to many records in another table. For example, one customer can make many purchases. So in this case, the customer's table has one record for that customer, but the sales table can have multiple records for the same customer. When you look at it from the other direction, it becomes many-to-one, meaning many sales belong to one customer. It is the same relationship, just viewed differently.

The second type is one-to-one. This is less common and happens when each record in one table matches exactly one record in another table. A good example is an employee and their ID. One employee has one ID, and that ID belongs to only one employee. There is no duplication on either side.

The third type is many-to-many. This happens when multiple records in one table relate to multiple records in another table. For example, students and courses. One student can take many courses, and one course can have many students. These relationships can be more complex and should be handled carefully. In most cases, it is better to structure your data in a way that avoids many-to-many relationships where possible.

Now, let us connect this to fact and dimension tables.

A fact table stores transactions or events. It tells you what happened. For example, a sales table records each purchase, including values like quantity and total sales. Each row represents a single transaction.

A dimension table provides context to those transactions. It answers questions like who made the purchase, what was bought, and where it happened. For example, customers, products, and stores are all dimension tables because they describe the data in the sales table.

In most well-structured models, you will find one-to-many relationships between dimension tables and the fact table. One customer links to many sales, one product links to many sales, and one store links to many sales.

Now, let us look at how relationships are created in PowerBI.

Power BI can automatically detect relationships if columns have the same names. However, this is not always accurate. For example, it might try to connect two columns named city from different tables, even when they are not actually related. This can create incorrect relationships. Because of this, it is always important to review and create relationships manually.

To create or manage relationships, go to Manage Relationships in Power BI. From there, you can add a new relationship by selecting the two tables and choosing the common column. You also define the cardinality and confirm the connection.

Another way is to use Model View, where you can drag one column from a table and drop it onto a matching column in another table. It is also important to know that relationships can be active or inactive.

An active relationship is the one Power BI uses by default when building visuals. An inactive relationship exists in the model, but is not used unless you specifically activate it in your calculations. You can have multiple relationships between two tables, but only one can be active at a time.

One practical tip is to ensure your column names are clear and consistent. For example, renaming columns like ID to Customer ID, Product ID, or Store ID makes it easier to identify and create correct relationships.

In summary, cardinality helps define how tables are connected, fact tables store what happened, dimension tables describe what happened, and relationships bring everything together in Power BI. When these are set up correctly, your reports become much easier to build and understand.

Now, let us talk about filter direction, which is very important when working with relationships.

When you create a relationship, Power BI needs to know how filters should flow between tables. This is called cross-filter direction. In most cases, especially in a proper data model, filters flow from dimension tables to the fact table.

This means if you filter by customer, product, or store, it will affect the sales table. For example, if you select a specific customer, the sales data will automatically update to show only transactions related to that customer. The same happens if you filter by product or store. This is why relationships are so powerful.

For example, you can create a chart that shows the total quantity by customer name. The quantity comes from the sales table, while the customer name comes from the customers table. Because there is a relationship between them, Power BI is able to combine these fields correctly.

If the relationship does not exist, the visual will not work as expected. You might only see totals that do not break down properly, or Power BI might not know how to connect the data at all.

The same applies when using slicers.

Slicers are simply filters in Power BI. You can create slicers using fields like customer name, product name, or store name. When your relationships are set correctly, selecting a value in one slicer will affect all related visuals in your report.

For example, selecting a customer should update charts showing product sales and store performance. If this is not happening, it usually means your relationships are missing, incorrect, or inactive.

Power BI also allows two types of cross-filter direction.

Single direction is the default and most recommended. Filters flow in one direction, usually from dimension tables to the fact table. This keeps your model simple and avoids confusion.

Both directions allow filters to move in both directions between tables. This can be useful in some advanced scenarios, especially with many-to-many relationships, but it should be used carefully because it can create ambiguity and unexpected results.

In summary, after creating relationships, you must ensure they are correct, active, and properly configured. Renaming columns helps avoid confusion, removing incorrectly detected auto relationships improves accuracy, and understanding cross-filter direction ensures your data behaves as expected.

When all these are set correctly, your visuals, filters, and reports will work smoothly and give you accurate insights.

Now that we understand joins, relationships, cardinality, and filter direction, we can define data modeling more clearly.

Data modeling is the process of organizing your data into structured tables and defining how those tables relate to each other so that analysis becomes easy and meaningful. It is not just about connecting data, but about designing how that data should be structured from the beginning.

You decide what goes into the fact table, what belongs in dimension tables, and how everything connects. That structure is what we call a data model.

There are different ways to structure this model, and these are called schemas. Let us start with the simplest one, the flat table.

A flat table is a single table that contains all the data in one place. There are no separate fact or dimension tables, and there are no relationships. Everything is stored together.

Flat tables are very common because this is how data is usually collected or exported. For example, when you download data from a system or scrape data from the web, it often comes as one large table. They are easy to read and understand because everything is in one place. You can quickly look at the table and understand what is happening without needing to connect multiple tables.

However, flat tables are not ideal for analysis, especially when the data becomes large. One major issue is data duplication. The same values, such as customer names, product names, or locations, are repeated many times. This increases the size of the data and reduces performance.

Another issue is maintenance. If something changes, for example, a store location or a product name, you have to update it in many rows. This is time-consuming and can lead to inconsistencies. Flat tables can also become very large and slow as more data is added. This makes them harder to manage and less efficient for reporting.

Because of these limitations, we move from flat tables to more structured models called schemas.

Schemas help us organize data properly by separating it into fact and dimension tables and connecting them using relationships. This reduces duplication, improves performance, and makes analysis easier.

The two main schemas used in data modeling are the star schema ** and the **snowflake schema, which we will explore next.

A schema is simply the structure of how your tables are organized and connected. It defines how your data is arranged so that it is easy to understand and analyze.

The most common and most recommended schema in Power BI is the star schema.

A star schema is a structure where you have one central fact table, and then multiple dimension tables connected directly to it. When you look at it in Model View, it actually looks like a star, with the fact table at the center and the dimension tables spreading out around it.

For example, you can have a sales table at the center. This is your fact table because it stores transactions. Then you connect it directly to dimension tables like customers, products, and stores. All these dimension tables link directly to the sales table, not to each other.

As long as you have one central fact table and multiple dimension tables connected directly to it, you have a star schema. This is the most commonly used schema because it is simple, clear, and efficient.

First, it reduces data duplication compared to a flat table. Instead of repeating customer names, product names, or store details many times, you store them once in their own tables.

Second, it is easy to understand. You can quickly look at the model and identify the fact table and the dimension tables without confusion.

Third, it is easier to build and works very well with Power BI. It improves performance and makes reporting faster.

Because of these advantages, most professional Power BI models are built using the star schema.

In a star schema, the fact table stores events or transactions, while the dimension tables provide context. Relationships connect the tables, and cardinality defines how they relate.

However, there is another type of schema called the snowflake schema.

A snowflake schema is a more advanced version of the star schema. It starts as a star schema, but then the dimension tables are further broken down into smaller related tables.

This process of breaking down tables is called normalization.

Normalization means organizing data in such a way that reduces duplication and improves data consistency. Instead of storing the same information repeatedly in one table, you split it into multiple tables and connect them using IDs.

For example, instead of having a customer's table that contains customer name, city, county, region, and country all in one place, you can break it down.

You can have a customers' table with customer ID and city ID. Then you create a cities table that contains city details. That cities table can link to a counties table, and the counties table can link to a regions table.

So instead of repeating city, county, and region information for every customer, you store each piece of information once and connect them using relationships.

This creates a chain-like structure.

The fact table connects to a dimension table, which connects to another dimension table, and so on. This branching structure is what forms a snowflake schema.

Compared to the star schema, the snowflake schema has some advantages. It reduces data duplication even further. It improves data consistency because changes are made in one place. It works well for hierarchical data, such as city to county to region to country.

However, it also has disadvantages.

It is more complex to understand because it has more tables. It is harder to build and manage. It can slow down reporting if not designed properly. It is more difficult to debug when something goes wrong.

Because of this, the snowflake schema is usually used in more advanced scenarios, especially when dealing with very large datasets or hierarchical data.

In most cases, especially for beginners and for building dashboards, the star schema is the best choice.

To summarize, schemas define how your data model is structured. A star schema is simple, fast, and widely used, with one fact table connected directly to dimension tables. A snowflake schema is more complex, where dimension tables are further broken down into smaller related tables through normalization.

Understanding these two schemas is important because they form the foundation of how you design efficient and scalable data models in Power BI.

Common mistakes to avoid when building data models

Some common mistakes often happen when building data models, especially when you are still learning how Power BI works. One of the most frequent issues is missing relationships between tables. When tables are not properly connected, your visuals may not respond as expected. For example, if you have a patient's table and a visits table but you forget to relate them using a patient ID, selecting a specific patient will not correctly filter the visit records, leading to confusing or incomplete results.

Another common mistake is using the wrong type of relationship. For instance, if you incorrectly set up a many-to-many relationship when a one-to-many relationship would work, your results may become duplicated or inaccurate. A simple example is when one patient can have many visits, but each visit belongs to only one patient. If this is modeled incorrectly, Power BI may count or display values multiple times, which affects your analysis.

Keeping unnecessary columns in the fact table is also a problem. In a well-designed model, descriptive details such as patient name, doctor specialty, or department should live in dimension tables, not in the fact table. If you keep repeating these details in the fact table, you increase redundancy and make the dataset harder to manage. For example, instead of storing a patient’s name in every visit record, you store a patient ID in the fact table and keep the name in the patients table. This keeps the model clean and avoids repetition.

Another issue is overusing bi-directional filters. While bi-directional relationships can be useful in certain scenarios, using them everywhere can make your model complex and slow. For example, if you allow filters to flow both ways between a doctor's table and a visits table unnecessarily, a single filter might unexpectedly affect multiple tables in ways you did not intend, leading to confusing results. In most cases, a single direction from the dimension tables to the fact table is sufficient and safer.

Some people also overcomplicate their models by creating unnecessary snowflake schemas when a simpler star schema would work better. For example, instead of keeping patient, doctor, and department information in separate chains of tables, you could keep them as straightforward dimension tables directly connected to the fact table. Adding too many layers of related tables can make the model harder to understand and slower to query without adding real analytical value.

In simple terms, good data modeling is about keeping things clear, organized, and purposeful. When your tables are properly related, when each piece of information is stored in the right place, and when the schema is simple and efficient, your Power BI reports become easier to build, faster to run, and more reliable. For example, with a well-structured model of patients, doctors, diagnoses, and payments connected to a central visits table, you can easily answer questions like how many patients a doctor has seen, which payment methods are most common, or which diagnoses occur most frequently, without running into confusion or errors.

Power BI has significantly changed how I approach data and problem-solving. It has helped me move from viewing data as raw and disconnected information to seeing it as a structured and meaningful system that can tell a story. Through learning data modeling, relationships, and visualization, I am now able to organize datasets, connect different sources of information, and build dashboards that clearly communicate insights.

It has also improved my analytical thinking, making me more careful about how data is structured and how different pieces relate to each other. Overall, Power BI has strengthened both my technical skills and my confidence in working with data to support better decisions.

Top comments (0)