Introduction
In Power BI projects, many reporting issues, such as slow performance, incorrect totals, or overly complex DAX formulas, often stem from one root cause: poor data modeling. While visuals and measures usually get most of the attention, the data model is the true foundation of any reliable analytical solution.
Data modeling is the process of structuring data in a way that supports efficient analysis, accurate relationships, and meaningful insights. In Power BI, this typically means organizing data into fact tables and dimension tables, following proven data warehousing principles.
As Ralph Kimball explains in The Data Warehouse Toolkit:
_Dimensions provide the 'who, what, when, where, why, and how' context surrounding business process events._
This article provides a practical, beginner-friendly guide to data modeling in Power BI, using Sales and FactBudget CSV datasets as working examples. The explanations are guided by concepts from Ralph Kimball’s The Data Warehouse Toolkit and practical demonstrations inspired by the Pragmatic Works Power BI Data Modeling video.
Why Data Modeling Is Important in Power BI
- Improved report performance
- Simpler and more readable DAX measures
- Accurate filtering and aggregations
- Easier maintenance and scalability
- Consistent business logic across reports
Power BI is not just a visualization tool; it is also an analytical engine. Without a proper model, even the best visuals can produce misleading results.
Business Scenario and Datasets Used
To demonstrate practical data modeling concepts, this article uses two related fact tables: the Sales table, which holds actual transactional sales data, and the FactBudget table, which holds planned or budgeted values.
These datasets allow analysis of actual performance versus planned targets, which is a common real-world business scenario. By modeling these tables correctly, we can compare revenue against budget, calculate variances, and evaluate performance trends over time.
Both fact tables share common descriptive data such as products, dates, and markets, making them ideal for demonstrating star and snowflake schemas in Power BI.
How We Arrive at a Data Model in Power BI
Before any relationships are created in the Model view, a good data model begins in Power Query. This is where raw data is shaped, cleaned, and organized into fact and dimension tables. The steps below describe a practical, repeatable approach used in real-world Power BI projects.
Step 1: Load the Source Data
The process starts by loading the source files:
- Open Power BI Desktop
- Select Get Data
- Choose the appropriate source (e.g., Text/CSV, Excel, database)
In this example, we load:
- Sales data (actual transactions)
- FactBudget data (planned or budget figures)
Once loaded, we select Transform Data to open Power Query.
Step 2: Identify Facts and Dimensions
With the data visible in Power Query, the next step is to determine which tables represent business processes (facts) and which attributes describe those processes (dimensions).
The Sales and FactBudget tables are kept as fact tables, while descriptive fields such as Product, Date, Market, or Department are candidates for dimension tables.
This approach follows Kimball’s principle of separating measurements from descriptive context.
Step 3: Create Dimension Tables from Fact Data
Rather than importing separate dimension files, Power BI allows dimensions to be created directly from fact tables.
For each required dimension:
- Duplicate the fact table (Right-click → Duplicate)
- Rename the duplicated table (e.g., DimProduct, DimMarket)
- Deselect all columns, then select only the columns relevant to that dimension
- Use Remove Duplicates to ensure one unique row per dimension member
For example:
- DimProduct keeps only Product-related columns
- DimMarket keeps only Market or Region columns
This results in clean, compact dimension tables that connect efficiently to fact tables.
(Screenshot: Power Query showing duplicated table and removed columns)
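For readers who prefer to see the idea in code, the same dimension can be sketched as a DAX calculated table instead of a Power Query duplicate. This is only an illustration; the column names below are assumptions, not taken from the sample files.

```dax
-- Hypothetical sketch: DimProduct built as a DAX calculated table
-- (assumes the Sales fact table has ProductKey and ProductName columns)
DimProduct =
DISTINCT (
    SELECTCOLUMNS (
        Sales,
        "ProductKey", Sales[ProductKey],
        "ProductName", Sales[ProductName]
    )
)
```

The Power Query route shown above is still preferable in practice, because the shaping happens before the data reaches the model.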
Step 4: Create a Proper Date Dimension (DimDate)
Using a dedicated Date dimension is a core data warehousing best practice. While Power BI allows the use of date columns directly from fact tables, this approach is limited and not recommended for analytical models.
A true Date dimension allows analysts to answer questions such as:
- Is this date a weekday or weekend?
- What month or quarter does it belong to?
- How do holidays affect performance?
To achieve this, we create a Date table using Power Query, based on a function by Devin Knight.
Adding the Date Dimension Using Power Query
The steps are:
- Go to Transform Data → Power Query Editor
- Select Home → New Source → Blank Query
- Open Advanced Editor
- Paste the Date dimension function code
- Rename the query (e.g., fnDimDate)
After saving the function:
- Right-click the function → Invoke
- Provide a Start Date and End Date
- Power BI generates a full Date dimension table
This approach is based on the method described by Devin Knight: Creating a Date Dimension with Power Query (devinknightsql.com)
Why Use a Date Dimension Instead of a Fact Date Column?
Relying on a raw date column from a fact table limits analytical capability. A dedicated Date dimension:
- Enables advanced time intelligence
- Provides consistent filtering across multiple fact tables
- Allows identification of weekdays, weekends, holidays, and fiscal periods
- Improves model clarity and reusability
This is why both Sales and FactBudget tables connect to the same DimDate table in the model.
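As a small illustration of what the shared DimDate enables, a year-to-date measure becomes a one-liner once DimDate is marked as a date table. The measure and column names below are assumptions.

```dax
-- Illustrative time-intelligence measure using the shared DimDate table
-- (assumes a Sales[SalesAmount] column; names are hypothetical)
Sales YTD =
TOTALYTD ( SUM ( Sales[SalesAmount] ), DimDate[Date] )
```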
Understanding Fact Tables
Fact tables store quantitative, measurable data generated by business processes. In this example:
Sales Fact Table
The Sales table contains transactional metrics such as:
- Sales amount
- Quantity sold
- Revenue
- Profit
The grain of the Sales table is defined at the level of a specific transaction, typically by product, date, and market.
FactBudget Table
The FactBudget table stores planned or forecasted metrics such as:
- Budgeted sales amount
- Budgeted revenue
Unlike transactional data, budget data is often recorded at a higher level (for example, monthly or by department), which influences how it is modeled.
Dimension Tables: Providing Business Context
Dimension tables contain descriptive attributes that give meaning to numeric facts. According to Ralph Kimball:
“Dimensions provide the descriptive context for facts.”
Common dimensions used in this model include:
- DimDate – when the transaction occurred
- DimProduct – what was sold
- DimCustomer / Market – who bought it and where
Dimensions answer critical business questions:
- Who made the purchase?
- What product was sold?
- When did it occur?
- Where did it happen?
- How or why did it occur (channel, promotion, etc.)?
Date Dimension Theory and Implementation
Using a dedicated Date dimension is a best practice in data modeling. Instead of relying on raw date columns from fact tables, a Date table provides:
- Consistent time-based analysis
- Support for time intelligence functions
- Clear relationships across multiple fact tables
Typical Date Attributes
A Date dimension commonly includes:
- DateKey (e.g., YYYYMMDD)
- Full date
- Year
- Quarter
- Month number
- Month name
In Power BI, a Date table can be generated using DAX. This article includes a Date table created using custom date code, which is then related to both the Sales and FactBudget tables.
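For reference, a minimal DAX version of such a Date table might look like the sketch below. The 2020–2025 date range is an assumption for illustration; the article's model uses the Power Query function described earlier.

```dax
-- Minimal sketch of a Date dimension generated in DAX
-- (date range is an assumption; adjust to cover the fact data)
DimDate =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2020, 1, 1 ), DATE ( 2025, 12, 31 ) ),
    "DateKey", VALUE ( FORMAT ( [Date], "YYYYMMDD" ) ),
    "Year", YEAR ( [Date] ),
    "Quarter", "Q" & QUARTER ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month Name", FORMAT ( [Date], "MMMM" )
)
```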
(Screenshot: Date table and its relationship to fact tables)
Surrogate Keys and Relationship Design
In analytical models, surrogate (index) keys are preferred over textual fields. Examples include:
- DateKey
- ProductKey
- CustomerKey
Surrogate keys improve:
- Performance
- Relationship consistency
- Integration across multiple fact tables
They are especially useful when combining data from different systems or when natural keys are inconsistent or complex.
Star Schema: Sales Data Model
The star schema is the most common and recommended modeling pattern in Power BI.
In the Sales model:
- The Sales fact table sits at the center
- Dimension tables surround it
- Relationships are one-to-many
- Filters flow from dimensions to facts
This structure simplifies reporting and ensures efficient query performance.
(Screenshot: Sales star schema in Power BI Model View)
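To see that filter flow in action, consider a measure that filters the fact table through a dimension attribute. The table and column names here are hypothetical.

```dax
-- Illustrative measure: the filter on DimProduct flows across the
-- one-to-many relationship into the Sales fact table
Bike Sales =
CALCULATE (
    SUM ( Sales[SalesAmount] ),      -- aggregate from the fact table
    DimProduct[Category] = "Bikes"   -- filter applied on the dimension
)
```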
Snowflake Schema: FactBudget Model
A snowflake schema occurs when dimension tables are further normalized into additional related tables.
In the FactBudget model:
- Budget data may link to higher-level entities such as departments or regions
- These dimensions connect to other dimension tables rather than directly to the fact table
While snowflake schemas add complexity, they are sometimes necessary, particularly for planning and budgeting data.
(Screenshot: Snowflake structure for FactBudget)
Integrating Star and Snowflake Schemas
Power BI allows multiple fact tables to coexist within a single model when they share common dimensions.
In this example:
- Sales uses a star schema
- FactBudget uses a snowflake schema
- Both connect through shared dimensions such as Date and Product
This integration enables:
- Actual vs budget comparisons
- Variance analysis
- Performance tracking across time and products
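As a sketch of what these comparisons might look like in DAX, the measures below compare the two fact tables through their shared dimensions. All table, column, and measure names are illustrative.

```dax
-- Illustrative actual-vs-budget measures over shared dimensions
-- (assumes Sales[SalesAmount] and FactBudget[BudgetAmount] columns)
Total Sales = SUM ( Sales[SalesAmount] )
Total Budget = SUM ( FactBudget[BudgetAmount] )
Budget Variance = [Total Sales] - [Total Budget]
Budget Variance % = DIVIDE ( [Budget Variance], [Total Budget] )
```

Because both fact tables filter through the same DimDate and DimProduct, these measures stay correct in any visual sliced by date or product.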
Why Linux Matters for Data Engineering
Most data engineering tools run on Linux. Tools such as Apache Hadoop, Spark, Kafka, Airflow, Docker, and Kubernetes are primarily designed for Linux environments.
Here’s why Linux matters:
- Server dominance – Most servers in the cloud (AWS, Azure, GCP) run on Linux.
- Performance and stability – Linux handles large-scale data processing efficiently.
- Automation-friendly – Powerful command-line tools for scripting and scheduling jobs.
- Open source – Free, customizable, and widely supported.
As a data engineer, you will often:
- Connect to Linux servers via SSH
- Edit configuration files
- Run data processing scripts
- Monitor logs and system resources
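A typical session might look like the following sketch. The host name, script, and log path are placeholders, not details from a real environment.

```bash
# Connect to a remote Linux server over SSH (placeholder host)
ssh damaris@data-server.example.com

# Edit a pipeline configuration file
nano etl_config.conf

# Run a data processing script (placeholder script name)
python3 process_sales.py

# Follow a log file as new entries arrive (placeholder path)
tail -f /var/log/etl/pipeline.log
```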
Understanding Linux is therefore a core skill.
Understanding the Linux Terminal
The terminal, also called the command line or shell, allows you to interact with the Linux system by typing commands.
Example of a terminal prompt:
damaris@ubuntu:~$
This shows:
- Username: damaris
- Machine name: ubuntu
- Current directory: ~ (home directory)
Basic Linux Commands for Beginners
1. Check the current directory
To print the current working directory, use:
pwd
Output: /home/damaris
2. List files and folders
To list the contents of a directory, use:
ls # list files and folders
ls -l # long format, showing permissions, owner, size, and date
3. Create a directory
Use the mkdir command to create a directory.
Example:
mkdir data_projects
4. Navigate Between Directories
Use the cd command to move between directories.
Let's navigate into the data_projects directory we created earlier:
cd data_projects
To move up one directory level, use cd ..
Example:
cd data_projects
ls #lists files in the directory
cd ..
5. Create a File
Use the touch command to create a file.
Example:
touch sample.txt
6. View File Content
To view the contents of a file, use the cat command:
cat sample.txt
Why Text Editors Matter in Data Engineering
As a data engineer, you will constantly edit SQL scripts, Python files, shell scripts, and configuration files (YAML, JSON, .conf).
Linux provides powerful terminal-based text editors. The most common are Vi/Vim and Nano.
Using the Nano Editor in Ubuntu (Detailed Beginner Guide)
What Is Nano?
Nano is a simple, beginner-friendly text editor that runs inside the Ubuntu terminal. Unlike Vi/Vim, Nano does not use modes, which makes it much easier for new Linux users to learn and use.
For data engineers, Nano is commonly used to:
- Edit configuration files (.conf, .yaml, .json)
- Write quick notes or scripts
- Modify ETL pipeline settings
- Edit files on remote Linux servers
Nano is preinstalled on most Ubuntu systems, so you don’t need to install anything.
Opening the Terminal in Ubuntu
Before using Nano, you need to open the terminal.
You can do this in any of the following ways:
- Press Ctrl + Alt + T
- Search for Terminal in the Applications menu
You will see something like:
damaris@ubuntu:~$
This means you are in your home directory.
Creating a File Using Nano
To create a new file using Nano, type:
nano data_notes.txt
Then press Enter.
What Happens Next?
- If the file does not exist, Nano creates it.
- If the file already exists, Nano opens it for editing.
You will now see the Nano editor screen.
Understanding the Nano Editor Interface
When Nano opens, the screen has three main parts:
1. Main Editing Area (Center)
This is where you type your text.
Example:
This file contains notes for our data engineering project.
Source: MySQL
Destination: Data Warehouse
2. Status Bar (Bottom)
At the bottom of the screen, you’ll see something like:
^G Get Help ^O Write Out ^W Where Is ^K Cut ^X Exit
The ^ symbol means the Ctrl key.
So:
- ^O means Ctrl + O
- ^X means Ctrl + X
This shortcut list is one of Nano’s biggest advantages.
3. File Name Display (Top)
At the top, Nano shows the file name you are editing:
GNU nano 6.2 data_notes.txt
Typing Text in Nano
Nano starts in editing mode immediately.
You can begin typing right away without pressing any special keys.
Example:
ETL Pipeline Notes
------------------
Extract data from PostgreSQL
Transform data using Python
Load data into the warehouse
There is no insert mode or command mode like in Vi.
Saving a File in Nano
To save your work:
Press Ctrl + O (Write Out)
Nano will ask:
File Name to Write: data_notes.txt
Press Enter
Your file is now saved.
Exiting Nano
To exit Nano:
Ctrl + X
If You Have Unsaved Changes
Nano will ask:
Save modified buffer?
- Press Y → Save changes
- Press N → Exit without saving
- Press Ctrl + C → Cancel exit
Opening an Existing File with Nano
To edit an existing file:
nano data_notes.txt
This opens the file so you can modify it.
Editing Text in Nano
Moving the Cursor
You can move around using:
- Arrow keys ↑ ↓ ← →
- Page Up / Page Down
Deleting Text
- Backspace → Delete previous character
- Delete key → Delete next character
Cutting and Pasting Text
Cut a Line
Ctrl + K
This cuts the entire line.
Paste a Line
Ctrl + U
This pastes the last cut text.
Searching for Text in Nano
To search within a file:
Ctrl + W
Type the word you want to find and press Enter.
Example:
warehouse
Practical Example: Editing a Configuration File
Imagine you are a data engineer editing a pipeline configuration file.
**Step 1: Open the file**
nano etl_config.conf
**Step 2: Add configuration details**
source_database=mysql
source_host=localhost
destination=warehouse
batch_size=500
**Step 3: Save and exit**
Ctrl + O → Enter
Ctrl + X
Viewing the File from Terminal
After exiting Nano, you can confirm the file content using:
cat etl_config.conf
Output:
source_database=mysql
source_host=localhost
destination=warehouse
batch_size=500
Common Nano Shortcuts (Beginner Must-Know)
| Shortcut | Action |
| --- | --- |
| Ctrl + O | Save file |
| Ctrl + X | Exit Nano |
| Ctrl + K | Cut line |
| Ctrl + U | Paste |
| Ctrl + W | Search |
| Ctrl + G | Help |