Magichu Njoroge

Posted on Jun 30

UNDERSTANDING DATA MODELING, SCHEMAS, RELATIONSHIPS, AND JOINS.

#beginners #database #microsoft #tutorial

Data Modeling in Power BI: Schemas, Relationships, and Joins Explained

Learning Power BI was a very exciting part of this journey, and my first time in Model view almost changed that. I genuinely had no idea what I was looking at. But with good tutoring, practice, and patience, I finally got the hang of it. Almost... Every time I hit a wall, I write about it, both for the next person and for my own deeper understanding. This is data modeling as I understand it.

What Is Data Modeling?

Data modeling is the process of creating visual representations of the connections between data structures, with information about the individual attributes contained within those data structures.

In simple terms, a data model answers three questions:

What tables do I have?
How do they relate to each other?
Which direction do filters travel between them?

Get this right and your visuals update instantly, your numbers are accurate, and adding new data is easy. Get it wrong and you end up with reports that show inflated totals, filters that don't work, and dashboards that are a nightmare to maintain.
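To make the "inflated totals" problem concrete, here is a minimal sketch in plain Python (not Power BI itself). The rows and amounts are made up for illustration. It shows how one accidentally duplicated key in a lookup table makes a joined total larger than the real one.

```python
# Hypothetical data: two sales rows and a product lookup with a duplicated key.
sales = [
    {"SaleID": "S001", "ProductID": "P10", "SalesAmount": 100},
    {"SaleID": "S002", "ProductID": "P20", "SalesAmount": 50},
]
products = [
    {"ProductID": "P10", "ProductName": "Rice"},
    {"ProductID": "P10", "ProductName": "Rice"},   # accidental duplicate row
    {"ProductID": "P20", "ProductName": "Sugar"},
]

# Naive join: every sale is paired with every product row sharing its key.
joined = [
    {**sale, **product}
    for sale in sales
    for product in products
    if sale["ProductID"] == product["ProductID"]
]

true_total = sum(row["SalesAmount"] for row in sales)
joined_total = sum(row["SalesAmount"] for row in joined)

print(true_total)    # 150
print(joined_total)  # 250, because sale S001 was matched twice
```

The fix is not in the sum but in the model: the lookup side of the relationship needs one row per key.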

A well-designed data model helps you:

Understand data requirements.
Ensure proper structure for reporting.
Align with organizational goals.
Maintain data integrity.

The Two Types of Tables: Fact and Dimension

The models you build in Power BI are made of two kinds of tables. Understanding the difference between them is the single most important foundation in data modeling.

Fact Tables

The fact table is at the center of the star schema and stores the core transactional data you want to analyze, such as sales records, orders, or financial transactions. Each row in the fact table is unique and contains keys that link it to related dimension tables.

Fact tables are usually:

Very long (thousands or millions of rows)
Full of numbers (quantities, amounts, counts)
Connected to multiple other tables via ID columns

Dimension Tables

A dimension table is connected to the fact table and stores the context around the data: the who, what, when, and where. Customers. Products. Dates. Stores.

Dimension tables are usually:

Short but wide (fewer rows, more descriptive columns)
Full of text and categories
The tables you filter and slice by in your reports

┌─────────────────────────────────────────────────────────────┐
│                  FACT vs DIMENSION                          │
├──────────────────────────┬──────────────────────────────────┤
│     FACT TABLE           │     DIMENSION TABLE              │
│     (Sales_Fact)         │     (Dim_Product)                │
├──────────────────────────┼──────────────────────────────────┤
│  SaleID       (PK)       │                                  │
│  CustomerID   (FK)       │                                  │
│  ProductID    (FK) ──────┼──▶ ProductID   (PK)              │
│  DateID       (FK)       │  ProductName                     │
│  StoreID      (FK)       │  Category                        │
│  Quantity                │  Price                           │
│  SalesAmount             │  Supplier                        │
├──────────────────────────┼──────────────────────────────────┤
│  Rows: Millions          │  Rows: Hundreds or Thousands     │
│  Updates: Daily          │  Updates: Rarely                 │
│  Contains: Numbers       │  Contains: Descriptions          │
└──────────────────────────┴──────────────────────────────────┘

A good way to remember it: the fact table is what happened. The dimension table is everything around what happened.
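The split between "what happened" and "everything around it" can be sketched in a few lines of Python. This is only an illustration: the column names follow the diagram above, and the values are made up. The fact rows supply the numbers to add up, and the dimension supplies the label to group by.

```python
from collections import defaultdict

# Dimension table: one row per product, descriptive columns (made-up values).
dim_product = {
    "P10": {"ProductName": "Rice", "Category": "Grains"},
    "P20": {"ProductName": "Sugar", "Category": "Baking"},
    "P30": {"ProductName": "Flour", "Category": "Baking"},
}

# Fact table: one row per sale, mostly keys and numbers.
fact_sales = [
    {"SaleID": 1, "ProductID": "P10", "Quantity": 2, "SalesAmount": 300},
    {"SaleID": 2, "ProductID": "P20", "Quantity": 1, "SalesAmount": 150},
    {"SaleID": 3, "ProductID": "P30", "Quantity": 4, "SalesAmount": 400},
    {"SaleID": 4, "ProductID": "P10", "Quantity": 1, "SalesAmount": 150},
]

# Sum the fact numbers, grouped by an attribute looked up in the dimension.
sales_by_category = defaultdict(int)
for sale in fact_sales:
    category = dim_product[sale["ProductID"]]["Category"]
    sales_by_category[category] += sale["SalesAmount"]

print(dict(sales_by_category))  # {'Grains': 450, 'Baking': 550}
```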

Schemas: The Blueprint of Your Model

This is where most beginners, myself included, get lost first.
A schema refers to the structure and organization of data within a data model. Schemas define how data is connected and related within the model. There are two main types of schemas one interacts with in Power BI.

The Star Schema

The star schema is the recommended approach for Power BI. One fact table sits at the center. All dimension tables connect directly to it. When you step back and look at it, it resembles a star; the fact table is the center, and the dimensions are the points radiating outward.

                    ┌─────────────────┐
                    │   Dim_Date      │
                    │─────────────────│
                    │ DateID    (PK)  │
                    │ Day             │
                    │ Month           │
                    │ Quarter         │
                    │ Year            │
                    └────────┬────────┘
                             │ 1
                             │
       ┌─────────────────────┼─────────────────────┐
       │ 1                   │ N                   │ 1
┌──────┴──────────┐  ┌───────▼──────────┐  ┌──────┴──────────┐
│  Dim_Customer   │  │   Fact_Sales     │  │  Dim_Product    │
│─────────────────│  │──────────────────│  │─────────────────│
│ CustomerID (PK) ├─▶│ SaleID    (PK)   │◀─┤ ProductID  (PK) │
│ FullName        │  │ CustomerID (FK)  │  │ ProductName     │
│ Email           │  │ ProductID  (FK)  │  │ Category        │
│ County          │  │ DateID     (FK)  │  │ UnitPrice       │
│ Phone           │  │ StoreID    (FK)  │  │ Supplier        │
└─────────────────┘  │ Quantity         │  └─────────────────┘
                     │ SalesAmount      │
       ┌─────────────│ Discount         │
       │ 1           └──────────────────┘
┌──────┴──────────┐
│   Dim_Store     │
│─────────────────│
│ StoreID    (PK) │
│ StoreName       │
│ Town            │
│ Region          │
└─────────────────┘

Why is the star schema the gold standard?

It simplifies queries by clearly defining relationships between facts and dimensions.
It keeps descriptive attributes in the dimension tables instead of repeating them on every fact row.
It performs well on large datasets and complex analytics.
Anyone can look at the model and understand it.
Adding a new dimension table later is straightforward.
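The "simple queries" point is easier to see outside Power BI. Below is a small sketch using Python's built-in sqlite3 module, with a cut-down version of the star above (two dimensions only, made-up rows). It is not how Power BI stores data; it only shows that in a star every dimension is exactly one join away from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dim_Product (ProductID TEXT PRIMARY KEY, ProductName TEXT, Category TEXT);
    CREATE TABLE Dim_Store   (StoreID TEXT PRIMARY KEY, StoreName TEXT, Region TEXT);
    CREATE TABLE Fact_Sales (
        SaleID      INTEGER PRIMARY KEY,
        ProductID   TEXT REFERENCES Dim_Product(ProductID),
        StoreID     TEXT REFERENCES Dim_Store(StoreID),
        SalesAmount REAL
    );
    INSERT INTO Dim_Product VALUES ('P10', 'Rice', 'Grains'), ('P20', 'Sugar', 'Baking');
    INSERT INTO Dim_Store   VALUES ('ST1', 'Store A', 'Central'), ('ST2', 'Store B', 'Coast');
    INSERT INTO Fact_Sales  VALUES (1, 'P10', 'ST1', 300), (2, 'P20', 'ST1', 150), (3, 'P10', 'ST2', 200);
""")

# One join per dimension, all of them directly against the fact table.
rows = conn.execute("""
    SELECT p.Category, s.Region, SUM(f.SalesAmount)
    FROM Fact_Sales AS f
    JOIN Dim_Product AS p ON p.ProductID = f.ProductID
    JOIN Dim_Store   AS s ON s.StoreID   = f.StoreID
    GROUP BY p.Category, s.Region
    ORDER BY p.Category, s.Region
""").fetchall()

for row in rows:
    print(row)
```

Adding a third dimension would mean one more table and one more JOIN line, with nothing else changing.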

The Snowflake Schema

In a snowflake schema, a dimension table is split into multiple related sub-tables. It is an extension of the star schema. A product dimension, for example, might split into a separate category table and a separate supplier table.

┌──────────────────┐        ┌──────────────────┐
│  Dim_Category    │        │  Dim_Supplier    │
│──────────────────│        │──────────────────│
│ CategoryID  (PK) │        │ SupplierID  (PK) │
│ CategoryName     │        │ SupplierName     │
└────────┬─────────┘        └────────┬─────────┘
         │ 1                         │ 1
         │                           │
         ▼ N                         ▼ N
┌──────────────────────────────────────────────┐
│              Dim_Product                     │
│──────────────────────────────────────────────│
│ ProductID    (PK)                            │
│ ProductName                                  │
│ CategoryID   (FK) ──▶ Dim_Category           │
│ SupplierID   (FK) ──▶ Dim_Supplier           │
│ UnitPrice                                    │
└───────────────────────┬──────────────────────┘
                        │ 1
                        │
                        ▼ N
               ┌────────────────────┐
               │    Fact_Sales      │
               │────────────────────│
               │ SaleID      (PK)   │
               │ ProductID   (FK)   │
               │ CustomerID  (FK)   │
               │ SalesAmount        │
               └────────────────────┘

┌──────────────────────┬────────────┬──────────────┐
│                      │  STAR      │  SNOWFLAKE   │
├──────────────────────┼────────────┼──────────────┤
│ Query Speed          │ Faster     │ Slower       │
│ Ease of Use          │ Simpler    │ More complex │
│ Power BI Performance │ Ideal      │ Not ideal    │
│ Storage              │ More space │ Less space   │
└──────────────────────┴────────────┴──────────────┘

Stick with the star schema in Power BI. It is what Microsoft recommends and what most professional models use.
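If your source data arrives snowflaked, one common way to get back to a star is to fold the sub-tables into the dimension before it reaches the model. In Power BI that would typically happen in Power Query; the Python below is only a sketch of the idea, with made-up rows and a hypothetical helper called flatten_product.

```python
# Snowflaked product dimension: category and supplier live in their own tables.
dim_category = {"C1": "Grains", "C2": "Baking"}
dim_supplier = {"SU1": "Supplier A", "SU2": "Supplier B"}
dim_product_snowflake = [
    {"ProductID": "P10", "ProductName": "Rice", "CategoryID": "C1", "SupplierID": "SU1"},
    {"ProductID": "P20", "ProductName": "Sugar", "CategoryID": "C2", "SupplierID": "SU2"},
    {"ProductID": "P30", "ProductName": "Flour", "CategoryID": "C2", "SupplierID": "SU1"},
]

def flatten_product(row):
    """Replace the CategoryID/SupplierID keys with the names they point to."""
    return {
        "ProductID": row["ProductID"],
        "ProductName": row["ProductName"],
        "Category": dim_category[row["CategoryID"]],
        "Supplier": dim_supplier[row["SupplierID"]],
    }

# Star-style dimension: one wide table, no further hops needed.
dim_product_star = [flatten_product(row) for row in dim_product_snowflake]

for row in dim_product_star:
    print(row)
```

The category name "Baking" is now stored twice, which is the "more space" trade-off in the table above.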

Relationships: Connecting Your Tables

A relationship defines how two tables are linked, which lets you analyze and visualize data across multiple tables seamlessly. There are several types of relationships, discussed later in this section. In Model view they appear as lines between tables, with a 1 or an asterisk (*) at each end showing the cardinality.

Primary Keys and Foreign Keys

Every relationship is built on two column types:

Primary Key (PK) — uniquely identifies every row in a table. No duplicates. Example: ProductID in the Products table.
Foreign Key (FK) — a column in another table that references that primary key. Example: ProductID in the Sales table, pointing back to which product was sold.

  Dim_Product                        Fact_Sales
┌──────────────────────┐           ┌──────────────────────┐
│ ProductID  ← PK      │1─────────N│ ProductID  ← FK      │
│ ProductName          │           │ SaleID               │
│ Category             │           │ CustomerID           │
│ UnitPrice            │           │ Quantity             │
└──────────────────────┘           │ SalesAmount          │
                                   └──────────────────────┘
One product can appear in many sales rows: this is 1:N
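Two checks follow directly from these definitions: a primary key column should have no duplicates, and every foreign key value should exist as a primary key on the other side. Here is a minimal sketch in plain Python; duplicate_keys and orphan_keys are hypothetical helper names, and the rows are made up.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Return key values that appear more than once (a valid PK has none)."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

def orphan_keys(fact_rows, dim_rows, key):
    """Return FK values in fact_rows that have no matching PK in dim_rows."""
    known = {row[key] for row in dim_rows}
    return sorted({row[key] for row in fact_rows} - known)

dim_product = [{"ProductID": "P10"}, {"ProductID": "P20"}]
fact_sales = [
    {"SaleID": "S001", "ProductID": "P10"},
    {"SaleID": "S002", "ProductID": "P20"},
    {"SaleID": "S003", "ProductID": "P99"},
]

print(duplicate_keys(dim_product, "ProductID"))           # []
print(orphan_keys(fact_sales, dim_product, "ProductID"))  # ['P99']
```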

Types of relationships

One-to-Many (1:N) — the most common

One row in the dimension table connects to many rows in the fact table. This is the backbone of every star schema.

  Dim_Customer                     Fact_Sales
┌─────────────────┐              ┌─────────────────┐
│ CustomerID: C01 │──────────────│ SaleID: 1001    │
│ Name: Wanjiru   │    1 : N     │ CustomerID: C01 │
└─────────────────┘       │      ├─────────────────┤
                          │      │ SaleID: 1002    │
                          └─────▶│ CustomerID: C01 │
                                 ├─────────────────┤
                                 │ SaleID: 1003    │
                                 │ CustomerID: C01 │
                                 └─────────────────┘
Wanjiru appears once as a customer but has made three purchases.

One-to-One (1:1) — rare

Each row in one table matches exactly one row in another. Used mostly when splitting a very wide table for performance reasons or, as in the example below, to keep sensitive columns in a separate table.

  Dim_Employee               Dim_EmployeePrivate
┌──────────────────┐        ┌────────────────────────┐
│ EmployeeID: E01  │────────│ EmployeeID: E01        │
│ Name: Kamau      │  1:1   │ NationalID: 12345678   │
│ Department: IT   │        │ EmergencyContact: ...  │
└──────────────────┘        └────────────────────────┘
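The cardinality of a relationship comes down to whether the key repeats on each side. As a rough sketch (classify_cardinality is a hypothetical helper, not a Power BI function), you can check this on two lists of key values. Keys that repeat on both sides give the many-to-many case.

```python
def classify_cardinality(left_keys, right_keys):
    """Classify a relationship by whether the key repeats on each side."""
    left_unique = len(set(left_keys)) == len(left_keys)
    right_unique = len(set(right_keys)) == len(right_keys)
    if left_unique and right_unique:
        return "1:1"
    if left_unique:
        return "1:N"
    if right_unique:
        return "N:1"
    return "N:N"

# Customer keys are unique; the same customer repeats in the sales rows.
print(classify_cardinality(["C01", "C02"], ["C01", "C01", "C01"]))  # 1:N

# Each employee has exactly one row in each table.
print(classify_cardinality(["E01", "E02"], ["E01", "E02"]))          # 1:1

# The key repeats on both sides.
print(classify_cardinality(["A", "A", "B"], ["A", "B", "B"]))        # N:N
```

This only looks at the shape of the keys. It does not check that every key on one side has a match on the other.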

Many-to-Many (N:N) — handle carefully

Many rows in Table A match many rows in Table B. Power BI can
handle this, but it often leads to ambiguous results. The clean solution is a bridge table.

  Dim_Student              Dim_Course
┌───────────────┐         ┌──────────────────┐
│ StudentID     │         │ CourseID         │
│ StudentName   │         │ CourseName       │
└───────┬───────┘         └────────┬─────────┘
        │ 1                        │ 1
        │                          │
        ▼ N    Bridge_Enrollment   ▼ N
        └─────▶┌─────────────────┐◀┘
               │ StudentID  (FK) │
               │ CourseID   (FK) │
               │ EnrolledDate    │
               └─────────────────┘
The bridge table resolves the many-to-many into two 1:N links.
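Here is the same bridge idea as a small Python sketch. The table and column names follow the diagram; the student and course values, and the two lookup functions, are made up for illustration. Each question is answered by going through the bridge rows, never directly from students to courses.

```python
dim_student = {"S1": "Achieng", "S2": "Kamau"}
dim_course = {"C1": "Statistics", "C2": "Databases"}

# One row per (student, course) pair: each side relates to the bridge as 1:N.
bridge_enrollment = [
    {"StudentID": "S1", "CourseID": "C1"},
    {"StudentID": "S1", "CourseID": "C2"},
    {"StudentID": "S2", "CourseID": "C2"},
]

def courses_for(student_id):
    """Names of the courses a student is enrolled in, via the bridge."""
    return sorted(
        dim_course[row["CourseID"]]
        for row in bridge_enrollment
        if row["StudentID"] == student_id
    )

def students_in(course_id):
    """Names of the students enrolled in a course, via the bridge."""
    return sorted(
        dim_student[row["StudentID"]]
        for row in bridge_enrollment
        if row["CourseID"] == course_id
    )

print(courses_for("S1"))  # ['Databases', 'Statistics']
print(students_in("C2"))  # ['Achieng', 'Kamau']
```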

Joins: What Happens Behind the Scenes

When Power BI evaluates a visual that pulls data from multiple tables, it effectively performs a join: it combines rows from two tables based on a shared column. Inside the model you do not write that join yourself, because the relationship does it for you (Power Query's Merge Queries is the place where you pick a join kind explicitly). Knowing what type of join is happening helps you understand why some rows appear and others don't.

Inner Join — Only Matching Rows

Returns only the rows that have a match in both tables. If a row in one table has no match in the other, it does not appear in the results.

  Fact_Sales             Dim_Product
┌───────────────────┐   ┌───────────────────┐
│ SaleID │ProductID │   │ ProductID │ Name  │
│ S001   │  P10     │   │ P10       │ Rice  │
│ S002   │  P20     │   │ P20       │ Sugar │
│ S003   │  P99     │   └───────────────────┘
└───────────────────┘

        INNER JOIN on ProductID
        ▼
┌──────────────────────────────────┐
│ SaleID  │ ProductID │ Name       │
│ S001    │ P10       │ Rice       │
│ S002    │ P20       │ Sugar      │
└──────────────────────────────────┘
Sale S003 disappears — ProductID P99 has no match in Dim_Product
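The same inner join can be reproduced with Python's built-in sqlite3 module, using the exact rows from the diagram. This is plain SQL for illustration, not something you would type into Power BI.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fact_Sales  (SaleID TEXT, ProductID TEXT);
    CREATE TABLE Dim_Product (ProductID TEXT, Name TEXT);
    INSERT INTO Fact_Sales  VALUES ('S001', 'P10'), ('S002', 'P20'), ('S003', 'P99');
    INSERT INTO Dim_Product VALUES ('P10', 'Rice'), ('P20', 'Sugar');
""")

# INNER JOIN keeps only the sales whose ProductID exists in Dim_Product.
inner_rows = conn.execute("""
    SELECT f.SaleID, f.ProductID, p.Name
    FROM Fact_Sales AS f
    INNER JOIN Dim_Product AS p ON p.ProductID = f.ProductID
    ORDER BY f.SaleID
""").fetchall()

print(inner_rows)  # [('S001', 'P10', 'Rice'), ('S002', 'P20', 'Sugar')]
```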

Left Join — Keep Everything from the Left

Returns all rows from the left table, with matching data from the right table. Non-matching rows get a blank value rather than being dropped.

        LEFT JOIN on ProductID
        ▼
┌──────────────────────────────────┐
│ SaleID  │ ProductID │ Name       │
│ S001    │ P10       │ Rice       │
│ S002    │ P20       │ Sugar      │
│ S003    │ P99       │ (blank)    │ ← kept, but no product name
└──────────────────────────────────┘
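For comparison, here is the left join on the same rows, again with Python's built-in sqlite3 module. The unmatched sale is kept, and its product name comes back as None, which is how Python shows a SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fact_Sales  (SaleID TEXT, ProductID TEXT);
    CREATE TABLE Dim_Product (ProductID TEXT, Name TEXT);
    INSERT INTO Fact_Sales  VALUES ('S001', 'P10'), ('S002', 'P20'), ('S003', 'P99');
    INSERT INTO Dim_Product VALUES ('P10', 'Rice'), ('P20', 'Sugar');
""")

# LEFT JOIN keeps every row of Fact_Sales (the left table), matched or not.
left_rows = conn.execute("""
    SELECT f.SaleID, f.ProductID, p.Name
    FROM Fact_Sales AS f
    LEFT JOIN Dim_Product AS p ON p.ProductID = f.ProductID
    ORDER BY f.SaleID
""").fetchall()

for row in left_rows:
    print(row)
# ('S001', 'P10', 'Rice')
# ('S002', 'P20', 'Sugar')
# ('S003', 'P99', None)
```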

In Power BI, a relationship between two tables behaves much like this left join, with the fact table on the left: fact rows whose key has no match in the dimension table are not dropped. They are kept and grouped under a blank dimension value, so the totals still include them.
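A rough Python imitation of that blank grouping is below. The sales amounts are made up, and the "(Blank)" label is just a stand-in chosen for this sketch. Unmatched sales are collected under it, so the grand total stays the same as the total of the fact table.

```python
from collections import defaultdict

dim_product = {"P10": "Rice", "P20": "Sugar"}
fact_sales = [
    {"SaleID": "S001", "ProductID": "P10", "SalesAmount": 100},
    {"SaleID": "S002", "ProductID": "P20", "SalesAmount": 50},
    {"SaleID": "S003", "ProductID": "P99", "SalesAmount": 70},  # no such product
]

# Group sales by product name; unmatched keys fall into a "(Blank)" bucket.
totals = defaultdict(int)
for sale in fact_sales:
    name = dim_product.get(sale["ProductID"], "(Blank)")
    totals[name] += sale["SalesAmount"]

print(dict(totals))          # {'Rice': 100, 'Sugar': 50, '(Blank)': 70}
print(sum(totals.values()))  # 220
```

A blank row in a report is therefore a useful warning sign: it usually means the dimension table is missing a key.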

Cross-Filter Direction: Which Way Do Filters Travel?

When you click a value in one visual, say "Nairobi" on a map, Power BI filters every other visual on the page. The direction in which that filter travels between tables is controlled by the cross-filter direction setting on each relationship.

Single Direction (Default)

Filters flow from the dimension table toward the fact table only.
This is the safe, recommended default.

  Dim_Product                        Fact_Sales
┌─────────────────────┐             ┌──────────────────────┐
│ Category = "Flour"  │────filter──▶│ Shows only Flour     │
│                     │  one way    │ sales rows           │
└─────────────────────┘             └──────────────────────┘
                     Filter does NOT travel back
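One way to picture a single-direction filter is as two steps: pick the matching keys in the dimension, then keep only the fact rows with those keys. The sketch below does exactly that in plain Python. The rows are made up and filter_sales_by_category is a hypothetical helper, not how Power BI is implemented internally.

```python
dim_product = [
    {"ProductID": "P10", "Category": "Flour"},
    {"ProductID": "P20", "Category": "Sugar"},
    {"ProductID": "P30", "Category": "Flour"},
]
fact_sales = [
    {"SaleID": 1, "ProductID": "P10", "SalesAmount": 100},
    {"SaleID": 2, "ProductID": "P20", "SalesAmount": 50},
    {"SaleID": 3, "ProductID": "P30", "SalesAmount": 70},
    {"SaleID": 4, "ProductID": "P20", "SalesAmount": 30},
]

def filter_sales_by_category(category):
    """Dimension -> fact: keep sales whose product is in the chosen category."""
    allowed = {p["ProductID"] for p in dim_product if p["Category"] == category}
    return [sale for sale in fact_sales if sale["ProductID"] in allowed]

flour_sales = filter_sales_by_category("Flour")
print([sale["SaleID"] for sale in flour_sales])          # [1, 3]
print(sum(sale["SalesAmount"] for sale in flour_sales))  # 170
```

Nothing in this function ever changes dim_product, which mirrors the "filter does not travel back" note in the diagram.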

Bidirectional

Filters flow both ways. This sounds useful but can create circular filter paths and slow your report down significantly.

  Dim_Product          ◀────────────▶        Fact_Sales
┌─────────────────────┐  both ways  ┌──────────────────────┐
│ ProductName         │◀───────────▶│ SalesAmount          │
└─────────────────────┘             └──────────────────────┘
Use only when you have a specific, tested reason to do so
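A typical case for the reverse direction is a question like "which products were actually sold in this store?". There the filter has to travel from the fact table back into a dimension. The sketch below shows the idea in plain Python with made-up rows; products_sold_in is a hypothetical helper, and the StoreID column follows the star diagram earlier in the post.

```python
dim_product = {"P10": "Rice", "P20": "Sugar", "P30": "Flour"}
fact_sales = [
    {"SaleID": 1, "StoreID": "ST1", "ProductID": "P10"},
    {"SaleID": 2, "StoreID": "ST1", "ProductID": "P20"},
    {"SaleID": 3, "StoreID": "ST2", "ProductID": "P10"},
]

def products_sold_in(store_id):
    """Fact -> dimension: keep only products that appear in the filtered sales."""
    sold = {sale["ProductID"] for sale in fact_sales if sale["StoreID"] == store_id}
    return sorted(name for product_id, name in dim_product.items() if product_id in sold)

print(products_sold_in("ST1"))  # ['Rice', 'Sugar']
print(products_sold_in("ST2"))  # ['Rice']
```

Flour never shows up because it has no sales rows. This is the extra behaviour you are asking for when you switch a relationship to both directions.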

One thing that caught me off guard while learning this was assuming that because Power BI drew the relationship lines automatically, the model was correct. Sometimes it had connected the wrong columns. Always double-check the automatically detected relationships before going any further; it saves a lot of rework later.

Quick Reference Cheat Sheet

┌─────────────────────────────────────────────────────────────┐
│           POWER BI DATA MODELING CHEAT SHEET                │
├─────────────────────┬───────────────────────────────────────┤
│ CONCEPT             │ WHAT IT MEANS                         │
├─────────────────────┼───────────────────────────────────────┤
│ Fact Table          │ Stores transactions — the numbers     │
│ Dimension Table     │ Stores context — the descriptions     │
│ Star Schema         │ Fact at center, dimensions around it  │
│ Snowflake Schema    │ Dimensions split into sub-tables      │
│ Primary Key (PK)    │ Uniquely identifies every row         │
│ Foreign Key (FK)    │ References a PK in another table      │
│ 1:N Relationship    │ One dimension row → many fact rows    │
│ N:N Relationship    │ Needs a bridge table to resolve       │
│ Inner Join          │ Only rows that match in both tables   │
│ Left Join           │ All rows from left + matches on right │
│ Single Filter       │ Filters flow dimension → fact         │
│ Bidirectional       │ Filters flow both ways (use carefully)│
└─────────────────────┴───────────────────────────────────────┘

As someone still early in this journey, I'd appreciate comments, suggestions, and corrections, as they are my real learning curve and the most valuable part of my journey.
