Ever tried to build a puzzle where half the pieces were in a different room?
That is how it feels working with raw data before understanding Data Modelling.
In the real world, data is not stored in one sheet, it is scattered across dozens different files. A major chain for example, has separate files for their stores, customers, sales and products. If you tried to cram all that information into one massive table, managing it would be difficult to manage!
So what is Data Modelling?
Data modelling is the process of organizing data into structured tables and defining how those tables relate to each other. In professional environments, data is organized into 2 main types;
- Facts tables - These store measurable events (the metrics). A sales table for instance, is a fact because it records every transaction, indicating how much was sold and when. Fact tables are usually long and full of numbers.
2.Dimensions tables - These provide descriptive contexts, such as customer names, color and product category. They are relatively smaller, and give meaning to the facts.
Tables are connected using Joins (in Power Query) or Relationships (in the model view).
In this article, we will look at;
- SQL Joins in Power Query
- Power BI Relationships
- Schemas
- Role Playing Dimensions
- Common Modelling Issues to watch out for
SQL JOINS IN POWER QUERY
Before data gets to the model, it has to be cleaned, this happens in Power Query using Joins.
A Join is a way of combining data from two tables based on a common column. It is a rule, that tells the computer how to merge two tables into one physical table.
Our scenario; An online retail store with two lists;
• Table A (Customers) – With Customer Name, Customer ID and Customer Location
• Table B (Orders) - Order ID, Customer ID, Order Amount, Order Date
There are 6 main kinds of Joins;
How to do it Step-By-Step (The steps are common for ALL joins.)
In Power Query:
- Click your main table (e.g. Customer List)
- Click Merge Queries in the Home Ribbon
- Select your second table, (e.g. Orders List in our case)
- Click on the matching columns in both tables (e.g. Customer ID)
- Choose your Join Kind from the dropdown.
POWER BI RELATIONSHIPS
Relationships define how tables are connected in a data model, so they can interact with each other in reports. Relationships connect tables while keeping them separate, and are created in the Model View.
Cardinality (The Match Ratio)
Cardinality describes the nature of the relationship between two tables. It answers the question How many
**_One-to-Many (1:M)_**
One record in one table relates to many records in another. For example; where one customer has multiple orders or where one bank account has multiple transactions.
Many-to-Many (M:M)
Multiple records in both tables relate to each other. These can cause double-counting and incorrect numbers. Use bridge tables if necessary.
One-to-One (1:1)
Where one record matches exactly one other in another table. This is pretty rare for relationships, and sometimes means that two tables should be merged instead.
Direction and Activity
Cross filter directions: Determines how filters flow between tables.
Single: Filters flow one way, from dimension to fact. A good example is how filtering customers will affect Orders, but orders won’t filter customers.
Both: Filters flow both ways. It is useful but can easily create confusion or incorrect results.
Active Vs Inactive Relationships
In Power BI, you can only have ONEactive relationship between two tables at a time.
Active filters are the connections that are always on.
Inactive filters are the connections that are always off, unless they are required for a certain calculation.
PowerBI allows you to have one active relationship, and one or more inactive relationships at any given time.
How to do it step-by-step
- Go to model view
- Click a column in your dimension table and drag it to the matching column in your fact table; A line appears
- Double click the line to check Cardinality and Direction
- You can make the relationship active or inactive
Differences Between Joins and Relationships
SCHEMAS
Now that we know how to connect tables, it is important to also know how to organize them. In data modelling, tables follow a layout, i.e. a Schema.
Every good model will separate data into either fact tables or dimensions tables.
There are three main table designs;
A Flat Table (DLAT)
This is where both your fact and dimension tables are in one place, as would appear in Excel.
While this could be used for one off analysis or very small datasets, it is not advisable on big models, because as data grows, the table becomes very slow and very hard to manage.
It also has a higher probability of duplicate data.
How to get it;
Use the Merge Queries in Power Query to merge all your tables together into one single, massive table.
The Star Schema
This is a model with one fact table at the center, and dimension tables surround it.
It is the industry standard, and results in a clean, fast and simple layout.
It is easy to understand and hard to break, and preferred in over 90% of reports.
It is advisable to use this, unless you have a reason not to.
How to get it;
- Load your Fact Table and Dimension Tables separately into Power Query.
- Enter Model View: Click the _Model View _icon on the far left sidebar..
- Click and drag a unique ID from your Dimension table (e.g., Product ID) onto the matching column in your Fact table.
- Drag the Fact table to the center and pan out your Dimension tables around it.
- Check for cardinality.
Snowflake Schema
This is a star schema, whose dimensions are further broken down to other dimensions. This is called normalization.
It is used when there are complex relationships such as Country>County>Sub-county.
While it can save on space due to less repetition, it becomes more complicated and reports are slower.
How to get it;
- Start Like a Star: Follow the steps above for your main Fact table.
- Split a Dimension for example instead of loading Products directly into the Fact link, load Products and then a separate Categories table.
- Connect hierarchically, i.e. Fact → Products, then Products → Categories.
ROLE PLAYING DIMENSIONS
Sometimes a single dimension table has to play more than one role. A great example is a date table, that you use to filter both Order dates, and Shipping Dates. In this case, the problem is handled using ‘inactive relationships.
Managing relationships this way allows for a clean model, as it lets you slice data differently, and there's no need to clone the table twice.
COMMON MODELLING ISSUES TO WATCH OUT FOR.
As you start building, here are a few issues to avoid;
- Many-to-Many relationships (M:M) - This happens when you try to connect two tables that don't have a unique "one" side. It causes Power BI to get confused and often results in numbers that look correct but are actually completely wrong.
- Blanks in Keys– If your ID column has empty cells, they could break the relationship silently. That is why you should clean your data in Power Query before loading.
- Flat tables- Don’t try to keep everything in one table just because it feels like excel, you may have a hard time navigating or getting anything done.






Top comments (0)