Practical Data Modeling in Power BI: Star and Snowflake Schemas Explained

Introduction

In Power BI projects, many reporting issues such as slow performance, incorrect totals, or overly complex DAX formulas stem from one root cause: poor data modeling. While visuals and measures usually get most of the attention, the data model is the true foundation of any reliable analytical solution.

Data modeling is the process of structuring data in a way that supports efficient analysis, accurate relationships, and meaningful insights. In Power BI, this typically means organizing data into fact tables and dimension tables, following proven data warehousing principles.

As Ralph Kimball explains in *The Data Warehouse Toolkit*:

“Dimensions provide the 'who, what, when, where, why, and how' context surrounding business process events.”

This article provides a practical, beginner-friendly guide to data modeling in Power BI, using Sales and FactBudget CSV datasets as working examples. The explanations are guided by concepts from Ralph Kimball’s The Data Warehouse Toolkit and practical demonstrations inspired by the Pragmatic Works Power BI Data Modeling video.

Why Data Modeling Is Important in Power BI

  • Improved report performance
  • Simpler and more readable DAX measures
  • Accurate filtering and aggregations
  • Easier maintenance and scalability
  • Consistent business logic across reports

Power BI is not just a visualization tool; it is also an analytical engine. Without a proper model, even the best visuals can produce misleading results.

Business Scenario and Datasets Used

To demonstrate practical data modeling concepts, this article uses two related fact tables: the Sales table, which represents actual transactional sales data, and the FactBudget table, which represents planned or budgeted values.

These datasets allow analysis of actual performance versus planned targets, which is a common real-world business scenario. By modeling these tables correctly, we can compare revenue against budget, calculate variances, and evaluate performance trends over time.

Both fact tables share common descriptive data, such as products, dates, and markets, making them ideal for demonstrating star and snowflake schemas in Power BI.

How We Arrive at a Data Model in Power BI

Before any relationships are created in the Model view, a good data model begins in Power Query. This is where raw data is shaped, cleaned, and organized into fact and dimension tables. The steps below describe a practical, repeatable approach used in real-world Power BI projects.

Step 1: Load the Source Data

The process starts by loading the source files:

Open Power BI Desktop

Select Get Data

Choose the appropriate source (e.g., Text/CSV, Excel, database)

In this example, we load:

Sales data (actual transactions)

FactBudget data (planned or budget figures)

Once loaded, we select Transform Data to open Power Query.

Step 2: Identify Facts and Dimensions

With the data visible in Power Query, the next step is to determine which tables represent business processes (facts) and which attributes describe those processes (dimensions).

The Sales and FactBudget tables are kept as fact tables, while descriptive fields such as Product, Date, Market, or Department are candidates for dimension tables.

This approach follows Kimball’s principle of separating measurements from descriptive context.

Step 3: Create Dimension Tables from Fact Data

Rather than importing separate dimension files, Power BI allows dimensions to be created directly from fact tables.

For each required dimension:

Duplicate the fact table (Right-click → Duplicate)

Rename the duplicated table (e.g., DimProduct, DimMarket)

Unselect all columns, then select only the columns relevant to that dimension

Use Remove Duplicates to ensure one unique row per dimension member

For example:

DimProduct keeps only Product-related columns

DimMarket keeps only Market or Region columns

This results in clean, compact dimension tables that connect efficiently to fact tables.
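
As a rough sketch, the M behind a duplicated-and-trimmed DimProduct query might look like the following (the Sales query name and the Product and Category columns are assumptions for illustration, not part of the original demo):

```
// Sketch of the applied steps behind a DimProduct query.
// "Sales" is the duplicated fact query; column names are assumed.
let
    Source = Sales,
    KeptColumns = Table.SelectColumns(Source, {"Product", "Category"}),
    Deduped = Table.Distinct(KeptColumns)
in
    Deduped
```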

(Screenshot: Power Query showing duplicated table and removed columns)

Step 4: Creating a Proper Date Dimension (DimDate)

Using a dedicated Date dimension is a core data warehousing best practice. While Power BI allows the use of date columns directly from fact tables, this approach is limited and not recommended for analytical models.

A true Date dimension allows analysts to answer questions such as:

Is this date a weekday or weekend?

What month or quarter does it belong to?

How do holidays affect performance?

To achieve this, we create a Date table using Power Query, based on a function by Devin Knight.

Adding the Date Dimension Using Power Query

The steps are:

Go to Transform Data → Power Query Editor

Select Home → New Source → Blank Query

Open Advanced Editor

Paste the Date dimension function code

Rename the query (e.g., fnDimDate)

After saving the function:

Right-click the function → Invoke

Provide a Start Date and End Date

Power BI generates a full Date dimension table

This approach is based on the method described by Devin Knight: Creating a Date Dimension with Power Query (devinknightsql.com)
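
For orientation, a minimal date-dimension function in M might look like the sketch below. This is not Devin Knight's actual code, only a stripped-down illustration of the same pattern:

```
// Minimal sketch of a date-dimension function (illustrative only).
let
    CreateDateTable = (StartDate as date, EndDate as date) as table =>
        let
            DayCount = Duration.Days(EndDate - StartDate) + 1,
            DateList = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
            AsTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
            Typed = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
            WithYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
            WithQuarter = Table.AddColumn(WithYear, "Quarter", each Date.QuarterOfYear([Date]), Int64.Type),
            WithMonthName = Table.AddColumn(WithQuarter, "Month Name", each Date.MonthName([Date]), type text),
            WithDayName = Table.AddColumn(WithMonthName, "Day of Week", each Date.DayOfWeekName([Date]), type text)
        in
            WithDayName
in
    CreateDateTable
```

Invoking the function with a Start Date and End Date produces one row per calendar day, ready to be renamed to DimDate.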

Why Use a Date Dimension Instead of a Fact Date Column?

Relying on a raw date column from a fact table limits analytical capability. A dedicated Date dimension:

Enables advanced time intelligence

Provides consistent filtering across multiple fact tables

Allows identification of weekdays, weekends, holidays, and fiscal periods

Improves model clarity and reusability

This is why both Sales and FactBudget tables connect to the same DimDate table in the model.
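
Once both fact tables filter through the same DimDate, time-intelligence measures become straightforward. A minimal sketch, assuming a Sales[Revenue] column:

```
Sales YTD =
TOTALYTD ( SUM ( Sales[Revenue] ), DimDate[Date] )

Sales LY =
CALCULATE ( SUM ( Sales[Revenue] ), SAMEPERIODLASTYEAR ( DimDate[Date] ) )
```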

Understanding Fact Tables

Fact tables store quantitative, measurable data generated by business processes. In this example:

Sales Fact Table

The Sales table contains transactional metrics such as:

Sales amount

Quantity sold

Revenue

Profit

The grain of the Sales table is defined at the level of a specific transaction, typically by product, date, and market.

FactBudget Table

The FactBudget table stores planned or forecasted metrics such as:

Budgeted sales amount

Budgeted revenue

Unlike transactional data, budget data is often recorded at a higher level (for example, monthly or by department), which influences how it is modeled.

Dimension Tables: Providing Business Context

Dimension tables contain descriptive attributes that give meaning to numeric facts. According to Ralph Kimball:

“Dimensions provide the descriptive context for facts.”

Common dimensions used in this model include:

DimDate – when the transaction occurred

DimProduct – what was sold

DimCustomer / Market – who bought it and where

Dimensions answer critical business questions:

Who made the purchase?

What product was sold?

When did it occur?

Where did it happen?

How or why did it occur (channel, promotion, etc.)?

Date Dimension Theory and Implementation

Using a dedicated Date dimension is a best practice in data modeling. Instead of relying on raw date columns from fact tables, a Date table provides:

Consistent time-based analysis

Support for time intelligence functions

Clear relationships across multiple fact tables

Typical Date Attributes

A Date dimension commonly includes:

DateKey (e.g., YYYYMMDD)

Full date

Year

Quarter

Month number

Month name

In Power BI, a Date table can be generated using DAX. This article includes a Date table created using custom date code, which is then related to both the Sales and FactBudget tables.
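
As a hedged example, a DAX-generated Date table might look like the following (the date range and column list are arbitrary choices for illustration):

```
DimDate =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2020, 1, 1 ), DATE ( 2025, 12, 31 ) ),
    "DateKey", YEAR ( [Date] ) * 10000 + MONTH ( [Date] ) * 100 + DAY ( [Date] ),
    "Year", YEAR ( [Date] ),
    "Quarter", "Q" & QUARTER ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month Name", FORMAT ( [Date], "MMMM" ),
    "Is Weekend", WEEKDAY ( [Date] ) IN { 1, 7 }
)
```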

(Screenshot: Date table and its relationship to fact tables)

Surrogate Keys and Relationship Design

In analytical models, surrogate (index) keys are preferred over textual fields. Examples include:

DateKey

ProductKey

CustomerKey

Surrogate keys improve:

Performance

Relationship consistency

Integration across multiple fact tables

They are especially useful when combining data from different systems or when natural keys are inconsistent or complex.
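
If the source data lacks ready-made keys, a common workaround is deriving an integer YYYYMMDD key in Power Query. A minimal sketch, assuming a Sales query with a Date column:

```
// Adds an integer DateKey (YYYYMMDD) to the assumed Sales query
let
    Source = Sales,
    AddedDateKey = Table.AddColumn(
        Source,
        "DateKey",
        each Date.Year([Date]) * 10000 + Date.Month([Date]) * 100 + Date.Day([Date]),
        Int64.Type
    )
in
    AddedDateKey
```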

Star Schema: Sales Data Model

The star schema is the most common and recommended modeling pattern in Power BI.

In the Sales model:

The Sales fact table sits at the center

Dimension tables surround it

Relationships are one-to-many

Filters flow from dimensions to facts

This structure simplifies reporting and ensures efficient query performance.

(Screenshot: Sales star schema in Power BI Model View)

Snowflake Schema: FactBudget Model

A snowflake schema occurs when dimension tables are further normalized into additional related tables.

In the FactBudget model:

Budget data may link to higher-level entities such as departments or regions

These dimensions connect to other dimension tables rather than directly to the fact table

While snowflake schemas add complexity, they are sometimes necessary, particularly for planning and budgeting data.

(Screenshot: Snowflake structure for FactBudget)

Integrating Star and Snowflake Schemas

Power BI allows multiple fact tables to coexist within a single model when they share common dimensions.

In this example:

Sales uses a star schema

FactBudget uses a snowflake schema

Both connect through shared dimensions such as Date and Product

This integration enables:

Actual vs budget comparisons

Variance analysis

Performance tracking across time and products
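
A few illustrative DAX measures for this scenario (column names such as Sales[Revenue] and FactBudget[BudgetedRevenue] are assumptions based on this example):

```
Total Sales = SUM ( Sales[Revenue] )

Total Budget = SUM ( FactBudget[BudgetedRevenue] )

Budget Variance = [Total Sales] - [Total Budget]

Budget Variance % = DIVIDE ( [Budget Variance], [Total Budget] )
```

DIVIDE is used instead of the / operator so that periods with no budget return blank rather than an error.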

Linux for Data Engineers

Most data engineering tools run on Linux. Tools such as Apache Hadoop, Spark, Kafka, Airflow, Docker, and Kubernetes are primarily designed for Linux environments.

Here’s why Linux matters:

  • Server dominance – Most servers in the cloud (AWS, Azure, GCP) run on Linux.
  • Performance and stability – Linux handles large-scale data processing efficiently.
  • Automation-friendly – Powerful command-line tools for scripting and scheduling jobs.
  • Open source – Free, customizable, and widely supported.

As a data engineer, you will often:

  • Connect to Linux servers via SSH
  • Edit configuration files
  • Run data processing scripts
  • Monitor logs and system resources

Understanding Linux is therefore a core skill.
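
A typical session might look something like the sketch below (the host name and file paths are hypothetical, for illustration only):

```
# Hypothetical host and paths, for illustration only
ssh analyst@etl-server.example.com     # connect to a remote Linux server
nano /etc/etl/pipeline.conf            # edit a configuration file
python3 /opt/jobs/run_pipeline.py      # run a data processing script
tail -f /var/log/etl/pipeline.log      # follow a log file in real time
df -h                                  # check disk usage
```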

Understanding the Linux Terminal

The terminal, also called the command line or shell, allows you to interact with the Linux system by typing commands.
Example of a terminal prompt:

```
damaris@ubuntu:~$
```

This shows:

  • Username: damaris
  • Machine name: ubuntu
  • Current directory: ~ (home directory)

Basic Linux Commands for Beginners

1. Check the current directory

To check the current directory in the terminal, use the following command:

```
pwd
```

Output: /home/damaris

2. List files and folders

To list files in the terminal, use:

```
ls       # list files and folders
ls -l    # long-format listing showing permissions, owner, size, and date
```

3. Create a directory

Use the mkdir command to create a directory.

Example:

```
mkdir data_projects
```

4. Navigate Between Directories

Use the cd command to navigate between directories.
For example, to enter the data_projects directory we created earlier:

```
cd data_projects
```

To move back out of a directory, use cd ..
Example:

```
cd data_projects
ls      # list files in the directory
cd ..
```

5. Create a File

Use the touch command to create a file.
Example:

```
touch sample.txt
```

6. View File Content

To view the contents of a file, use the cat command.

```
cat sample.txt
```

Why Text Editors Matter in Data Engineering

As a data engineer, you will constantly edit SQL scripts, Python files, shell scripts, and configuration files (YAML, JSON, .conf).

Linux provides powerful terminal-based text editors. The most common are **Vi/Vim** and **Nano**.

Using the Nano Editor in Ubuntu (Detailed Beginner Guide)

What Is Nano?

Nano is a simple, beginner-friendly text editor that runs inside the Ubuntu terminal. Unlike Vi/Vim, Nano does not use modes, which makes it much easier for new Linux users to learn and use.

For data engineers, Nano is commonly used to:

  • Edit configuration files (.conf, .yaml, .json)
  • Write quick notes or scripts
  • Modify ETL pipeline settings
  • Edit files on remote Linux servers

Nano is preinstalled on most Ubuntu systems, so you don’t need to install anything.

Opening the Terminal in Ubuntu

Before using Nano, you need to open the terminal.

You can do this in any of the following ways:

Press Ctrl + Alt + T

Search for Terminal in the Applications menu

You will see something like:

```
damaris@ubuntu:~$
```

This means you are in your home directory.

Creating a File Using Nano

To create a new file using Nano, type:

```
nano data_notes.txt
```

Then press Enter.

What Happens Next?

If the file does not exist, Nano creates it.

If the file already exists, Nano opens it for editing.

You will now see the Nano editor screen.

Understanding the Nano Editor Interface

When Nano opens, the screen has three main parts:

1. Main Editing Area (Center)

This is where you type your text.

Example:

```
This file contains notes for our data engineering project.
Source: MySQL
Destination: Data Warehouse
```

2. Status Bar (Bottom)

At the bottom of the screen, you’ll see something like:

```
^G Get Help   ^O Write Out   ^W Where Is   ^K Cut   ^X Exit
```

The ^ symbol means the Ctrl key.

So:

^O means Ctrl + O

^X means Ctrl + X

This shortcut list is one of Nano’s biggest advantages.

3. File Name Display (Top)

At the top, Nano shows the file name you are editing:

```
GNU nano 6.2         data_notes.txt
```

Typing Text in Nano

Nano starts in editing mode immediately.

You can begin typing right away without pressing any special keys.

Example:

```
ETL Pipeline Notes
------------------
Extract data from PostgreSQL
Transform data using Python
Load data into the warehouse
```

There is no insert mode or command mode like in Vi.

Saving a File in Nano

To save your work:

Press Ctrl + O (Write Out)

Nano will ask:

File Name to Write: data_notes.txt

Press Enter

Your file is now saved.

Exiting Nano

To exit Nano:

Ctrl + X

If You Have Unsaved Changes

Nano will ask:

Save modified buffer?

Press **Y** → Save changes

Press **N** → Exit without saving

Press **Ctrl + C** → Cancel exit

Opening an Existing File with Nano

To edit an existing file:

```
nano data_notes.txt
```

This opens the file so you can modify it.

Editing Text in Nano

Moving the Cursor

You can move around using:

  • Arrow keys ↑ ↓ ← →
  • Page Up / Page Down

Deleting Text

  • Backspace → Delete previous character
  • Delete key → Delete next character

Cutting and Pasting Text

Cut a Line
Ctrl + K

This cuts the entire line.

Paste a Line

Ctrl + U

This pastes the last cut text.

Searching for Text in Nano

To search within a file:
Ctrl + W

Type the word you want to find and press Enter.

Example:

```
warehouse
```

Practical Example: Editing a Configuration File

Imagine you are a data engineer editing a pipeline configuration file.
**Step 1: Open the file**

```
nano etl_config.conf
```

**Step 2: Add configuration details**

```
source_database=mysql
source_host=localhost
destination=warehouse
batch_size=500
```

**Step 3: Save and exit**

Ctrl + O → Enter

Ctrl + X

Viewing the File from Terminal

After exiting Nano, you can confirm the file content using:

```
cat etl_config.conf
```

Output:

```
source_database=mysql
source_host=localhost
destination=warehouse
batch_size=500
```

Common Nano Shortcuts (Beginner Must-Know)

| Shortcut | Action |
| --- | --- |
| Ctrl + O | Save file |
| Ctrl + X | Exit Nano |
| Ctrl + K | Cut line |
| Ctrl + U | Paste |
| Ctrl + W | Search |
| Ctrl + G | Help |
