DEV Community: Frederick M

Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained.

Frederick M — Wed, 01 Apr 2026 13:20:00 +0000

If you’ve ever felt confused about joins vs relationships, or why your Power BI report is giving incorrect totals, this is where data modeling comes in.

This guide breaks it down simply, with real-world examples and practical steps inside Power BI.

1. What is Data Modeling?

Data modeling is how you structure your data so Power BI can:

Understand relationships between tables
Aggregate data correctly
Perform fast and accurate calculations

Think of it like this:

Data modeling = organizing your data into a clean, logical system before analysis.

2. Joins in Power BI (Power Query)

Joins happen before data is loaded, inside Power Query. They physically combine tables.

Where to create joins:

Go to Home → Transform Data to open the Power Query Editor
Select a table
Click Merge Queries
Choose another table and matching column(s)
Select the join type (the Join Kind dropdown)

Types of Joins (with real-life examples)

1. INNER JOIN

Returns only matching records.

Example:

Customers table
Orders table

Only customers who placed orders appear.

Customers      Orders
A              A
B              B
C              -

Result:
A, B

2. LEFT JOIN (Most Common)

Returns all records from the left table, plus matching records from the right table.

Example:
All customers, even those without orders.

Result:
A (order)
B (order)
C (null)

3. RIGHT JOIN

Opposite of LEFT JOIN.

Returns all records from the right table.

4. FULL OUTER JOIN

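Returns all records from both tables, matching them where possible.

Example:
Customers A, B, C and orders for A, B, D (D has no customer record).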

Result:
A (match)
B (match)
C (left only)
D (right only)

5. LEFT ANTI JOIN

Returns rows in the left table that have no match in the right table.

Example:
Customers who NEVER ordered.

6. RIGHT ANTI JOIN

Returns rows in the right table that have no match in the left table.

When to use joins

Use joins when:

You need to combine data permanently
You're shaping raw data before modeling
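The six join kinds above can be compared side by side on the A/B/C/D example. The following is a minimal Python sketch, not Power Query code: it assumes each key appears at most once per table (so sets are enough), and the function name join_keys is hypothetical. It only shows which keys survive each join kind.

```python
# Hypothetical key sets: customer IDs present in each table.
customers = {"A", "B", "C"}   # left table
orders = {"A", "B", "D"}      # right table (D has no customer record)


def join_keys(left, right, kind):
    """Return the sorted keys that survive a given join kind."""
    kinds = {
        "inner": left & right,        # keys present in both tables
        "left outer": left,           # every left key, matched or not
        "right outer": right,         # every right key, matched or not
        "full outer": left | right,   # every key from either table
        "left anti": left - right,    # left keys with no match on the right
        "right anti": right - left,   # right keys with no match on the left
    }
    return sorted(kinds[kind])


for kind in ("inner", "left outer", "right outer",
             "full outer", "left anti", "right anti"):
    print(f"{kind:12} -> {join_keys(customers, orders, kind)}")
```

With real tables, duplicate keys on either side produce one output row per matching pair, which the set-based sketch deliberately leaves out.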

3. Relationships in Power BI

Relationships are created after loading data. They do NOT merge tables, just connect them.

Where to create relationships:

Go to Model View
Drag a column from one table to another OR
Go to Manage Relationships → New

Types of Relationships

1. One-to-Many (1:M), Most Common

One side = unique values (Dimension)
Many side = repeated values (Fact)

Example:

Customers (unique IDs)
Orders (many orders per customer)

2. Many-to-Many (M:M)

Both sides contain duplicates.

Use with care: it can produce ambiguous filter results.

3. One-to-One (1:1)

Rare. Both tables have unique keys.

Cardinality (IMPORTANT)

Defines how tables relate:

1 → Many
Many → Many
1 → 1

Cross Filter Direction

Controls how filters flow:

Single Direction

Filters flow one way (recommended)

Both Direction

Filters flow both ways (can cause confusion if misused)
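Single-direction filtering can be pictured as a filter on the one side (Customers) reaching the many side (Orders) through the shared key. The Python sketch below is an analogy only, not how Power BI's engine is implemented; the tables and the region and amount columns are hypothetical.

```python
# Dimension table: one row per customer (the "one" side).
customers = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
]

# Fact table: many rows per customer (the "many" side).
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 100},
    {"order_id": 11, "customer_id": 1, "amount": 50},
    {"order_id": 12, "customer_id": 2, "amount": 70},
]


def total_sales_by_region(region):
    """Filter the dimension, then let that filter flow to the fact table."""
    # Step 1: the filter is applied on the dimension table.
    keys = {c["customer_id"] for c in customers if c["region"] == region}
    # Step 2: only fact rows whose key survived the filter are aggregated.
    return sum(o["amount"] for o in orders if o["customer_id"] in keys)


print(total_sales_by_region("East"))
print(total_sales_by_region("West"))
```

The tables are never merged: the fact rows are selected at query time through the key, which is the behaviour a relationship provides.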

Active vs Inactive Relationships

Active: Used by default
Inactive: Requires DAX (USERELATIONSHIP)

Example:

Order Date (active)
Ship Date (inactive)
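As a rough analogy in Python (not DAX), an active relationship behaves like a default argument and an inactive one like an explicit override. The sample rows and the function name total_for_day are hypothetical; in Power BI the override would be written with USERELATIONSHIP inside a measure.

```python
from datetime import date

# Hypothetical fact rows carrying two date columns.
sales = [
    {"amount": 100, "order_date": date(2026, 1, 5), "ship_date": date(2026, 1, 7)},
    {"amount": 40, "order_date": date(2026, 1, 5), "ship_date": date(2026, 1, 5)},
    {"amount": 60, "order_date": date(2026, 1, 7), "ship_date": date(2026, 1, 9)},
]


def total_for_day(day, date_column="order_date"):
    """Sum amounts for one day.

    The default column plays the role of the active relationship;
    passing "ship_date" plays the role of activating the inactive one.
    """
    return sum(row["amount"] for row in sales if row[date_column] == day)


print(total_for_day(date(2026, 1, 5)))                # by order date
print(total_for_day(date(2026, 1, 5), "ship_date"))   # by ship date
```

The same day gives different totals depending on which date column drives the filter, which is why only one relationship can be active at a time.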

4. Joins vs Relationships (Critical Difference)

Joins	Relationships
Done in Power Query	Done in Model View
Combines tables	Keeps tables separate
Static	Dynamic
Increases table size	More efficient

Rule of thumb:

Use relationships whenever possible. Avoid unnecessary joins.

5. Fact vs Dimension Tables

Fact Table

Contains measurable data
Large
Example:
- Sales Amount
- Quantity

Dimension Table

Contains descriptive data
Smaller
Example:
- Customer Name
- Product Category

6. Data Modeling Schemas

1. Star Schema (BEST PRACTICE ⭐)

        Customers
            |
Products — Sales — Dates

Fact table in center
Dimensions around it

Advantages:

Fast
Simple
Scalable

2. Snowflake Schema

Dimensions are normalized into multiple tables.

Product → Category → Department

Pros:

Less redundancy

Cons:

More complex
Slower
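A snowflaked chain such as Product → Category → Department can be collapsed into a single star-style product dimension by following the keys. This is a minimal Python sketch under assumed table shapes; every name and ID below is hypothetical. In Power BI the same flattening would normally be done with merges in Power Query.

```python
# Hypothetical snowflaked tables, keyed by ID.
departments = {10: "Electronics"}
categories = {
    1: {"name": "Phones", "department_id": 10},
    2: {"name": "Laptops", "department_id": 10},
}
products = [
    {"product_id": 100, "name": "Phone X", "category_id": 1},
    {"product_id": 101, "name": "Laptop Y", "category_id": 2},
]


def flatten_products(products, categories, departments):
    """Build one denormalized product dimension from the three tables."""
    flat = []
    for product in products:
        category = categories[product["category_id"]]
        flat.append({
            "product_id": product["product_id"],
            "name": product["name"],
            "category": category["name"],
            "department": departments[category["department_id"]],
        })
    return flat


product_dim = flatten_products(products, categories, departments)
for row in product_dim:
    print(row)
```

The flattened table repeats "Electronics" on every row, which is the redundancy a snowflake schema avoids and a star schema accepts in exchange for fewer relationships.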

3. Flat Table (Denormalized)

Everything in one table.

Pros:

Simple to start

Cons:

Poor performance
Hard to maintain

7. Role-Playing Dimensions

A single dimension used multiple times.

Example:
Date table used as:

Order Date
Ship Date
Delivery Date

Solution:

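Duplicate the Date table, one copy per role
Create a separate relationship for each copy
Or keep a single Date table and switch between inactive relationships with USERELATIONSHIP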

8. Common Data Modeling Issues

1. Many-to-Many confusion

Fix:

Introduce a bridge table
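A bridge table holds one row per valid pair, turning one many-to-many link into two one-to-many links. The Python sketch below uses a hypothetical joint-account scenario (the names, accounts, and balances are invented for illustration) to show why totals stay correct when they are computed from the fact side only.

```python
# Hypothetical fact data: balance per account.
accounts = {"ACC1": 100, "ACC2": 250}

# Bridge table: one row per (customer, account) pair.
# ACC1 is a joint account held by both Ann and Ben.
account_customer = [
    ("Ann", "ACC1"),
    ("Ben", "ACC1"),
    ("Ben", "ACC2"),
]


def balance_for_customer(customer):
    """Follow the bridge from a customer to the accounts they hold."""
    held = {acc for cust, acc in account_customer if cust == customer}
    return sum(accounts[acc] for acc in held)


def total_balance():
    """The grand total counts each account once, not once per holder."""
    return sum(accounts.values())


print(balance_for_customer("Ann"))
print(balance_for_customer("Ben"))
print(total_balance())
```

The per-customer balances add up to more than the grand total because the joint account belongs to two people; the bridge keeps that overlap explicit instead of double counting it.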

2. Circular relationships

Fix:

Remove unnecessary relationships

3. Incorrect totals

Fix:

Check relationship direction & cardinality

4. Duplicate keys in dimension table

Fix:

Ensure uniqueness
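Before relying on a column as the key of a dimension table, it helps to check that its values are unique. A minimal Python sketch of that check follows; the sample rows and the helper name duplicate_keys are hypothetical. In Power Query, Remove Duplicates on the key column serves a similar purpose.

```python
from collections import Counter

# Hypothetical dimension rows; customer_id 2 appears twice.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 2, "name": "Ben K."},
]


def duplicate_keys(rows, key):
    """Return the key values that occur more than once, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


print(duplicate_keys(customers, "customer_id"))
```

An empty result means the column can safely sit on the one side of a one-to-many relationship.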

9. Step-by-Step Workflow (Practical)

Step 1: Load Data

Get Data → Import tables

Step 2: Clean Data (Power Query)

Remove duplicates
Fix data types
Create joins only if necessary

Step 3: Build Relationships

Go to Model View
Connect tables using keys

Step 4: Validate Model

Check:
- Cardinality
- Filter direction
- Active relationships

Step 5: Create Measures (DAX)

Example:

Total Sales = SUM(Sales[Amount])

Final Thoughts

If you remember nothing else, remember this:

Use Star Schema
Prefer relationships over joins
Keep dimension tables clean and unique
Avoid many-to-many unless necessary

Once your model is clean, everything else (DAX, visuals, performance) becomes easier.

How Linux is Used in Real-World Data Engineering

Frederick M — Fri, 27 Mar 2026 18:47:32 +0000

Linux is the backbone of modern data engineering. From running ETL pipelines on cloud servers to managing distributed systems like Hadoop and Spark, proficiency with the Linux command line is non‑negotiable. In this guide, we’ll walk through a realistic data‑engineering workflow on an Ubuntu server – the kind of tasks you’ll perform daily when managing data pipelines, securing sensitive files, and organising project assets.

We’ll cover:

Secure login to a remote server
Structuring a data project with version‑aware directories
Creating and manipulating data files (CSV, logs, scripts)
Copying, moving, renaming, and cleaning up files
Setting correct permissions to protect sensitive data
Navigating the file system and re‑using command history

1. Logging into a Linux Server

In the real world, data engineers rarely work on their local laptop. Most tasks happen on remote servers (on‑premises or in the cloud). The first step is to connect securely to the server over SSH and enter the password when prompted.

ssh root@143.110.224.135

After a successful login, the server's welcome message and a shell prompt appear.

Once logged in, it’s good practice to confirm you are using the correct account. Data pipelines often run under dedicated service accounts, so knowing your user context matters.

whoami      # displays the current username

Next, verify your current working directory with pwd. This is where you will start creating folders and files.

pwd          # prints the current working directory

2. Folder and File Creation

A well‑organised directory structure is vital for any data project. Let’s create a main folder named after ourselves and inside it create subfolders for raw data, processed data, logs, and scripts.
We will then confirm the folders were created using ls.

mkdir ~/fredrickMDataEngineering   # make a new folder
cd ~/fredrickMDataEngineering  # enter the folder
mkdir raw_data processed_data logs scripts # make multiple folders
ls   # check current working directory content
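Pipelines often create this layout from a script rather than by hand. The following is a minimal Python sketch of the same folder structure using pathlib; it builds the tree inside a temporary directory so it is safe to run anywhere, and the function name create_project is hypothetical.

```python
import tempfile
from pathlib import Path

# Same subfolders as the mkdir command above.
SUBFOLDERS = ("raw_data", "processed_data", "logs", "scripts")


def create_project(root):
    """Create the project folder and its subfolders; return their names."""
    root = Path(root)
    for name in SUBFOLDERS:
        # parents=True creates the project folder itself on the first pass;
        # exist_ok=True makes the function safe to run more than once.
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    created = create_project(Path(tmp) / "fredrickMDataEngineering")
    print(created)
```

Pointing create_project at a path under the home directory would reproduce the mkdir commands above.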

Now we need to create files that simulate real data engineering assets. Create a placeholder file inside each folder using the touch command:

# Raw data
touch raw_data/raw_data.csv

# Processed data
touch processed_data/cleaned_data.csv

# Logs
touch logs/logs.csv

# Scripts
touch scripts/scripts.csv

3. File Operations: Copy, Move, Rename, Delete

Data engineering involves frequent file manipulation: backing up raw data, moving files between stages, versioning assets, and cleaning up obsolete files.

Let's start by backing up the raw CSV file with cp before processing it:

cp raw_data/raw_data.csv raw_data/raw_data_backup.csv

Let's move a file to simulate data flow:

mv processed_data/cleaned_data.csv logs/

Rename the processed file to indicate a version:

mv logs/cleaned_data.csv logs/cleaned_data_v1.csv

Delete files that are no longer needed:

rm raw_data/raw_data_backup.csv
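The same copy, move, rename, and delete sequence can be scripted. Below is a minimal Python sketch using shutil and pathlib; it works in a temporary directory with hypothetical file contents, and it records the file tree after each step so the effect of every operation is visible.

```python
import shutil
import tempfile
from pathlib import Path


def list_files(project):
    """Return every file under project as a sorted list of relative paths."""
    return sorted(
        p.relative_to(project).as_posix() for p in project.rglob("*") if p.is_file()
    )


def run_file_operations(project):
    """Copy, move, rename, and delete a file, recording the tree each time."""
    raw, logs = project / "raw_data", project / "logs"
    raw.mkdir(parents=True)
    logs.mkdir()
    data = raw / "raw_data.csv"
    data.write_text("id,amount\n1,100\n")   # hypothetical sample content
    steps = {}

    backup = raw / "raw_data_backup.csv"
    shutil.copy2(data, backup)               # like: cp source backup
    steps["copy"] = list_files(project)

    moved = logs / backup.name
    shutil.move(str(backup), str(moved))     # like: mv file other_folder/
    steps["move"] = list_files(project)

    versioned = moved.with_name("raw_data_backup_v1.csv")
    moved.rename(versioned)                  # like: mv old_name new_name
    steps["rename"] = list_files(project)

    versioned.unlink()                       # like: rm file
    steps["delete"] = list_files(project)
    return steps


with tempfile.TemporaryDirectory() as tmp:
    steps = run_file_operations(Path(tmp) / "project")
    for name, files in steps.items():
        print(name, files)
```

Unlike rm on the command line, unlink raises an error if the file is missing, which is usually what a pipeline wants.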

4. Managing Permissions for Security

In production, sensitive files (e.g., credentials, raw PII data) must have strict permissions. Here’s how we secure our project.

Let's make the main directory accessible only by its owner, so that no one else can read, write, or execute anything in it. The -R flag applies the change recursively to every file and folder inside the target directory.

chmod -R 700 ~/fredrickMDataEngineering

We will also restrict sensitive files so that only the owner can read and write them (600):

chmod 600 raw_data/raw_data.csv

To confirm the permissions are set, list the current working directory in long format:

ls -l

Our ETL script needs execute permission to run. This command makes the script executable:

chmod +x scripts/scripts.csv
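The octal modes used above map directly to the permission strings that ls -l prints. The following is a minimal Python sketch, assuming a POSIX system such as the Ubuntu server in this guide: it decodes 700 and 600 with the standard stat module and applies 600 to a throwaway file in a temporary directory. The helper name describe is hypothetical.

```python
import os
import stat
import tempfile
from pathlib import Path


def describe(mode, is_dir=False):
    """Render an octal mode the way ls -l shows it."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | mode)


print(describe(0o700, is_dir=True))   # owner: read, write, execute
print(describe(0o600))                # owner: read, write

with tempfile.TemporaryDirectory() as tmp:
    secret = Path(tmp) / "raw_data.csv"
    secret.touch()
    os.chmod(secret, 0o600)           # like: chmod 600 raw_data.csv
    applied = stat.S_IMODE(secret.stat().st_mode)
    print(oct(applied))
```

Each octal digit covers owner, group, and others in turn, with read = 4, write = 2, and execute = 1, so 7 means all three and 6 means read plus write.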

5. Navigation and Command History

Data engineers constantly move between directories. Use relative and absolute paths to navigate:

cd ~/fredrickMDataEngineering/scripts   # go to scripts folder (absolute path)
cd ../logs                              # move across to the sibling logs folder (relative path)
cd ~                                    # back to home

Hidden files (e.g., .env for environment variables) are common in data projects. View them with:

ls -a

Your terminal history is a goldmine: it helps you reproduce exact commands and track what was done in the terminal. View it with this command:

history

To re‑run a previous command (e.g., command number 2104), use:

!2104

6. Why These Skills Matter

What you just practised is a miniature version of real‑world data engineering:

Structured folders mirror how data lakes or data warehouses are organised.
File operations (copy, move, rename) simulate the stages of an ETL pipeline, from ingestion to transformation to archiving.
Permissions protect sensitive data and ensure only authorised users (or processes) can modify critical files.
Scripts automate repetitive tasks, and command history allows you to audit or replay steps.