DEV Community

Dreez

Data Preparation in Power BI: Cleaning, Transforming, and Loading Data for Real-World Analytics

Introduction

Before any dashboard delivers value, there’s a quiet but critical phase every analyst must master: data preparation.

In Microsoft Power BI, this happens inside the Power Query Editor, where raw, messy, inconsistent data is shaped into something reliable and analysis-ready.

If you’re building a career in data analytics, this is not optional.

Dashboards don’t fail because of visuals; they fail because of poor data preparation.

In real-world scenarios, data arrives:

  • In different formats (Excel, CSV, databases)
  • With missing values and inconsistencies
  • Without proper structure for analysis

Your role as an analyst is to clean, transform, and structure that data before it ever reaches a report.

What You’ll Learn

This guide walks you through the core transformation techniques in Power BI, with practical explanations and examples you can apply immediately:

  • Cleaning messy datasets
  • Creating calculated and conditional columns
  • Reshaping data (pivot/unpivot)
  • Combining datasets (merge/append)
  • Profiling data quality
  • Handling nulls and duplicates

💡 Career Insight:

Power Query skills are among the most underrated yet most in-demand competencies in analytics roles.

1. Working with Power Query Editor

How to Access

  • Open Power BI Desktop
  • Click Home → Transform Data

Image1

This will take you directly to the Power Query Editor.

Image2

Key Components

  1. Queries Pane → List of tables
  2. Data Preview Grid → Your working dataset
  3. Applied Steps → Tracks every transformation
  4. Ribbon Tools → Transform, Add Column, View

💡 Every action you take is recorded, making your workflow repeatable and automated.

2. Identifying Column Headers and Names

When transforming and cleaning data in Power Query, identifying the column headers and names is the first step.

Once your data is loaded into Microsoft Power BI, the Power Query Editor treats every row as part of the dataset by default. That’s why identifying the correct column headers and ensuring they are properly named is a critical first step. It helps structure your data correctly and makes further transformation and analysis much easier to understand and manage.

In most datasets, the column headers are typically located in the first row. Recognizing this early is an important step in preparing your data for transformation and reshaping. Once identified, these values should be promoted to serve as the official column headers within the table in Microsoft Power BI.

Doing this ensures your dataset is properly structured, making it easier to apply further transformations, perform analysis, and maintain clarity throughout your workflow.

As shown in the image, the highlighted top row holds the default headers that the Power Query Editor assigned automatically (Column1, Column2, and so on). The actual column headers for the dataset sit in the row beneath it, which is the first row of data.

This means that row needs to be promoted to replace the default headers so that each column is accurately named, ensuring proper data structure and making subsequent transformations more meaningful and easier to apply.

Image1

A. How to Promote First Row to Header

  1. Select the drop-down button next to Column1
  2. Select Use First Row as Headers.
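
To make the effect of promoting headers concrete, here is a minimal sketch in plain Python rather than Power Query itself. The sample rows and the `promote_headers` helper are hypothetical and exist only for illustration; the point is that the first row stops being data and becomes the column names.

```python
# Rows as loaded, before any header is promoted (hypothetical sample data).
raw_rows = [
    ["Order ID", "Region", "Quantity"],  # the real column names sit in the first row
    ["1001", "West", "5"],
    ["1002", "East", "3"],
]


def promote_headers(rows):
    """Use the first row as column names and return the remaining rows as dicts."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


table = promote_headers(raw_rows)
print(table[0])
```

After promotion, the table has one row fewer and every value can be addressed by a meaningful column name.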

Image2

B. How to Rename Column Header

  1. Click on the column name
  2. Select Transform from the ribbon
  3. Click on the Rename option
  4. Edit the name as desired.

Image 3

3. Creating Index Columns

In Microsoft Power BI, column indexing involves adding a new column that assigns a unique sequential number to each row in a dataset. This is especially useful during data preparation and transformation in the Power Query Editor.

Use cases:

  • Creating unique row identifiers: An index column provides a simple way to assign a unique ID to each row, which is helpful for tracking records and creating relationships between tables, especially when the dataset lacks a primary key.

  • Ranking or ordering data: Indexing allows you to establish a clear order within your dataset. This can be used to rank values, maintain sorting positions, or create custom sequences for analysis.

How to Add an Index Column

  1. Go to the Add Column tab
  2. Select Index Column
  3. Choose From 0, From 1, or Custom

Note:

From 0: The numbering starts at 0 and increases sequentially (0, 1, 2, 3…).
From 1: The numbering starts at 1 and continues sequentially (1, 2, 3, 4…).
Custom: This option allows you to define your own starting point and increment (step value), giving you full control over how the sequence is generated.
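
The three options differ only in the starting value and the step. As an illustration in plain Python (not Power Query's own formula language), the hypothetical `add_index_column` helper below mimics them on made-up rows:

```python
def add_index_column(rows, start=0, step=1, name="Index"):
    """Return new rows, each with a sequential index value added."""
    return [{**row, name: start + i * step} for i, row in enumerate(rows)]


# Hypothetical sample rows.
rows = [{"Product": "Laptop"}, {"Product": "Mouse"}, {"Product": "Monitor"}]

from_zero = add_index_column(rows)                   # "From 0"
from_one = add_index_column(rows, start=1)           # "From 1"
custom = add_index_column(rows, start=100, step=10)  # "Custom"

print([r["Index"] for r in custom])
```

Because each row receives a different number, the index can serve as a row identifier when no natural key exists.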

Image4

💡 Useful when your dataset lacks a natural key.

4. Adding Column from Examples

In Microsoft Power BI, “Column from Examples” is a smart feature in the Power Query Editor that automatically creates a new column based on patterns you provide.

What it means:
You simply type a few example values of the result you want, and Power BI detects the pattern and generates the transformation for the entire column.

How it works (conceptually):

  • You provide sample outputs based on existing columns
  • Power Query analyzes the pattern
  • It applies the detected logic to all rows automatically

Use cases:

  • Extracting specific text (e.g., first names from full names)
  • Combining columns (e.g., first name + last name)
  • Formatting values (e.g., dates, codes, or labels)

Why it’s useful:
It reduces the need to write complex formulas manually, making data transformation faster, more intuitive, and beginner-friendly.

How to Add Column from Examples

  1. Select the column to use as the source
  2. Click on the Add Column tab in the ribbon
  3. Select Column from Examples
  4. Click the From Selection option

Image1

  5. Double-click on the new column provided, enter the desired example format, and click OK.

Image2

Power BI then replicates the format down every row of the column, as shown below.

Image3

5. Grouping and Aggregation

In Microsoft Power BI, grouping and aggregation are key techniques used in the Power Query Editor to summarize and organize data for meaningful analysis.

Grouping involves combining rows that share common values in one or more columns, while aggregation applies calculations (like sum, count, or average) to those grouped rows.

How it works (conceptually):

  • Select a column to group by (e.g., Region, Product, or Category)
  • Apply an aggregation (e.g., total sales, number of transactions)
  • Power Query produces a summarized table based on those groups

Common aggregation operations:

  1. Sum: Total of numeric values
  2. Count: Number of rows
  3. Average: Mean value
  4. Min/Max: Smallest or largest value

How to Use Aggregation and Grouping

  1. Select the column to group by
  2. Click on Transform
  3. Click Group By
  4. Enter the desired name for the output column
  5. Choose the desired aggregation operation (for example, Sum)
  6. Choose the column to aggregate
  7. Click OK.

In the example shown below, we chose the Sum aggregation.

Note: We aim to aggregate sales (quantity sold) by year from the sample data used.

Image1

Image2

💡 This creates summarized tables for analysis.

Image3

Note: In Microsoft Power BI, data transformations such as grouping and aggregation do not alter the original dataset but create a refined version through applied steps in the Power Query Editor. Once these transformations are completed, selecting “Close & Apply” loads the processed data into the data model. This transformed dataset is then used in the Report View to build visuals, ensuring that analysis is based on clean, structured, and meaningful data rather than the raw source.
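
The grouping logic in this section can be sketched in plain Python to show what a Sum aggregation by year produces. This is an illustration only, not what Power Query runs internally; the sample rows, the `group_sum` helper, and the output column name are assumptions.

```python
from collections import defaultdict

# Hypothetical sample rows: one record per transaction.
sales = [
    {"Year": 2019, "Quantity Sold": 5},
    {"Year": 2019, "Quantity Sold": 3},
    {"Year": 2020, "Quantity Sold": 7},
]


def group_sum(rows, key, value):
    """Group rows by `key` and sum `value` within each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return [{key: k, f"Total {value}": total} for k, total in totals.items()]


summary = group_sum(sales, "Year", "Quantity Sold")
print(summary)
```

The source list is left untouched and a new, smaller table is returned, which mirrors the idea that grouping creates a refined version of the data rather than altering the original.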

6. Pivoting and Unpivoting Columns

In Microsoft Power BI, pivoting and unpivoting are essential data shaping techniques in the Power Query Editor. They reorganize how your data is structured, either by spreading values across columns or by consolidating them into rows.

Hint: Pivot turns rows into columns, Unpivot turns columns into rows

A. Unpivoting Columns
Unpivoting converts multiple columns into rows of one column.

Image2

What it does: Unpivot Columns turns multiple columns into attribute-value pairs and creates a more flexible, normalized structure.

Use cases:

  • Preparing data for analysis and visualization
  • Making datasets compatible with charts and DAX calculations
  • Converting wide data into a long format

Why It Matters

  • Prepares data for visualization
  • Converts wide data → long format (best for Power BI)

💡 Most dashboards rely on unpivoted (normalized) data.

How to Unpivot Columns

  1. Multi-select the columns to unpivot
  2. Click on Transform
  3. Click on the Unpivot Columns drop-down
  4. Select Unpivot Columns
Image1

Result
Image2
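
As a rough illustration of the wide-to-long reshaping, the plain Python sketch below turns every non-identifier column into an attribute-value pair. The sample data, the `unpivot` helper, and the "Attribute"/"Value" output names are assumptions chosen for this example.

```python
# Hypothetical wide table: one column per measure.
wide = [
    {"Year": 2019, "Product Sold": 120, "Profit": 450},
    {"Year": 2020, "Product Sold": 150, "Profit": 600},
]


def unpivot(rows, id_columns):
    """Turn every column not listed in id_columns into Attribute/Value rows."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for column, value in row.items():
            # Empty (None) cells produce no output row in this sketch.
            if column not in id_columns and value is not None:
                long_rows.append({**ids, "Attribute": column, "Value": value})
    return long_rows


long_rows = unpivot(wide, ["Year"])
print(len(long_rows))
```

Two wide rows with two measure columns become four long rows, one per year and measure.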

B. Pivot Column

Pivoting does the opposite of unpivoting:

  • Converts row values from one column into separate column headers
  • Aggregates values where necessary

Example:
Turning a “Month” column into separate columns like Jan, Feb, Mar with their corresponding sales values.

Image1

Use cases:

  • Creating summarized reports
  • Converting long data into a wide format
  • Preparing data for dashboard visuals

How to Pivot Columns

In the transformed data from the Unpivot section, the "product sold" and "profit" labels have been unpivoted into a single column. This makes it a suitable scenario for pivoting.

Image2

  1. Select the column whose values should become headers
  2. Click on Transform
  3. Click on Pivot Column
  4. Select Value from the Values Column drop-down
  5. Click OK.

Image2

Power BI converts the distinct column values into headers as shown below:

Image3
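
The reverse reshaping can be sketched the same way in plain Python. The `pivot` helper and the sample rows are hypothetical; summing repeated entries is just one possible choice of aggregation for the case where a group holds more than one value.

```python
# Hypothetical long table: one row per year and attribute.
long_rows = [
    {"Year": 2019, "Attribute": "Product Sold", "Value": 120},
    {"Year": 2019, "Attribute": "Profit", "Value": 450},
    {"Year": 2020, "Attribute": "Product Sold", "Value": 150},
    {"Year": 2020, "Attribute": "Profit", "Value": 600},
]


def pivot(rows, key, names, values):
    """Spread the distinct entries of `names` into columns, one row per `key`."""
    wide = {}
    for row in rows:
        target = wide.setdefault(row[key], {key: row[key]})
        # Sum when the same key/name pair appears more than once.
        target[row[names]] = target.get(row[names], 0) + row[values]
    return list(wide.values())


wide_rows = pivot(long_rows, "Year", "Attribute", "Value")
print(wide_rows[0])
```

Each distinct value in the attribute column becomes its own header, and the four long rows collapse back into two wide rows.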

7. Merging and Appending Queries in Microsoft Power BI

When working with multiple datasets in the Power Query Editor, merging and appending are essential techniques used to combine data in meaningful ways.

A. Merging Queries (Horizontal Combination)
Merging combines two tables based on a common column (like an ID), similar to a SQL join.

Use cases:

  • Adding customer details to a sales table
  • Combining related datasets from different sources
  • Enriching data with additional attributes

For this example, we will use the “Employees Demographics” table as the primary dataset. The objective is to merge it with the “Employee Salary” table in order to incorporate salary information into the dataset.

How to Merge Queries

  1. Select a primary table (employee_demographics)
  2. Click on Home Tab
  3. Click on Merge Queries on the far right of the Home Tab
  4. Choose either "Merge Queries" or "Merge Queries as New" based on the preferred result. In this case we chose "Merge Queries as New"

Image1

  5. Choose a related table (employee_salary)

Image2

  6. Match them using a common field (e.g., Employee ID)

Image3

  7. Expand the merged column to bring in needed field(s) and click OK

Image4

Result

Image5

Note:
Types of joins available:

  • Inner join
  • Left outer join (most common)
  • Right outer join
  • Full outer join
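
To show what a left outer join means in practice, here is a small sketch in plain Python. It is not how Power Query performs the merge; the employee rows and the `left_outer_join` helper are made up, and the sketch assumes each Employee ID appears at most once in the salary table.

```python
# Hypothetical tables sharing an "Employee ID" column.
demographics = [
    {"Employee ID": 1, "Name": "Ada"},
    {"Employee ID": 2, "Name": "Ben"},
    {"Employee ID": 3, "Name": "Cy"},
]
salaries = [
    {"Employee ID": 1, "Salary": 50000},
    {"Employee ID": 2, "Salary": 62000},
]


def left_outer_join(left, right, key, fields):
    """Keep every left row; add `fields` from the matching right row, or None."""
    lookup = {row[key]: row for row in right}  # assumes `key` is unique in `right`
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        merged.append({**row, **{field: match.get(field) for field in fields}})
    return merged


merged = left_outer_join(demographics, salaries, "Employee ID", ["Salary"])
print(merged[2])
```

Every row of the primary table survives the merge. The employee without a salary record keeps an empty Salary value, whereas an inner join would have dropped that row.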

B. Appending Queries (Vertical Combination)
Appending queries is the process of combining two or more tables by stacking them on top of each other, resulting in a single table with more rows.

What it does:

  1. Adds rows from one table to another
  2. Keeps the column structure consistent
  3. Matches columns based on their names

Use cases:

  • Combining monthly or yearly datasets into one table
  • Merging data from different departments or locations
  • Building a complete dataset from multiple sources

Key idea:

  • Ensure the tables have similar or identical column structures
  • Appending increases the number of rows, not columns, which makes it ideal for consolidating similar datasets into a single, comprehensive table for analysis.

How to Append Queries

For this example, we will use two similarly structured tables, CodeSphere Hub Sales - 2019 and CodeSphere Hub Sales - 2020.

  1. Select the primary table (CodeSphere Hub Sales - 2019).
  2. Click on the Home Tab
  3. Select Append Queries on the far right of the Home Tab.
  4. Choose either "Append Queries" or "Append Queries as New" based on the preferred result. In this case we chose "Append Queries as New"

Image1

  5. Select the other table to append. In this case we are using CodeSphere Hub Sales - 2020.

Image2

  6. Click OK.

Image3

Result

Image4

Note: The new query can be renamed and used in further steps.
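
Appending can be pictured as stacking rows while matching columns by name. The plain Python sketch below is an illustration with invented rows and a hypothetical `append_tables` helper; a column present in only one table is left empty for the rows that lack it.

```python
def append_tables(*tables):
    """Stack tables on top of each other, matching columns by name."""
    columns = []  # column names in first-seen order
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    # Rows missing a column receive None for it.
    return [{col: row.get(col) for col in columns} for table in tables for row in table]


# Hypothetical yearly tables; the 2020 table has one extra column.
sales_2019 = [{"Product": "Laptop", "Quantity": 5}]
sales_2020 = [{"Product": "Mouse", "Quantity": 9, "Discount": 0.1}]

combined = append_tables(sales_2019, sales_2020)
print(combined)
```

The row count of the result is the sum of the inputs' row counts, which is why identical column structures give the cleanest outcome.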

8. Date and Time Transformations in Microsoft Power BI

Date and time transformations in the Power Query Editor involve modifying, extracting, or formatting date/time values to make them more useful for analysis and reporting. They let you:

A. Change formats (e.g., from text to date)
B. Extract specific components
C. Perform calculations using dates

A. Changing Data Types: This involves converting columns into proper Date, Time, or Date/Time formats to enable accurate analysis.

How to

  1. Select the date-related field
  2. Click on the data type icon to the left of the header name
  3. Select the date type that matches the field entries.

Image1

B. Extracting Date Parts
Break down a date into components such as: Year, Quarter, Month, Day, Day of Week.

How to

  1. Select the date column/field
  2. Click on the Add Column tab
  3. Click on the Date (calendar) icon
  4. Select the component to be extracted

Image1

Result

Image2

Note: Power BI created the extracted field "Month" based on the component we selected.

C. Calculating Durations
This transformation involves finding the differences between dates (e.g., number of days between two events).

Case Study: In this scenario, the goal is to calculate the time difference between the stock date and the transaction date using the CodeSphere Hub Sales – 2019 dataset.

How to

  1. Ensure correct data types: both columns (Stock Date and Transaction Date) should be set to the Date data type.

Select column >> Transform >> Data Type >> Date

Image1

  2. Add a custom column: go to Add Column tab → Custom Column.

Enter a formula like:

= [Transaction Date] - [Stock Date]

Image2

And click OK.

  3. Convert to number of days: the result will be in duration format. Select the resulting column → go to Transform tab → Duration → Days.

Image3
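
The same date arithmetic can be checked outside Power BI. The plain Python sketch below uses invented dates and mirrors the two stages above: subtracting two dates yields a duration, and the whole number of days is then extracted from it.

```python
from datetime import date

# Hypothetical rows with both columns already typed as dates.
rows = [
    {"Stock Date": date(2019, 1, 10), "Transaction Date": date(2019, 3, 1)},
    {"Stock Date": date(2019, 6, 1), "Transaction Date": date(2019, 6, 15)},
]

for row in rows:
    duration = row["Transaction Date"] - row["Stock Date"]  # a timedelta (duration)
    row["Days"] = duration.days  # whole days only

print([row["Days"] for row in rows])
```

If either column were still stored as text, the subtraction would fail, which is why setting the Date data type comes first.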

9. Adding Prefix and Suffix in Microsoft Power BI

Adding a prefix or suffix is a simple text transformation in the Power Query Editor used to modify column values by attaching additional text at the beginning or end.

What it means
Prefix: Text added before existing values
Suffix: Text added after existing values

Use cases

  • Creating standardized IDs (e.g., EMP-001)
  • Adding country or region codes
  • Formatting labels for better readability

For example, this simple transformation can be used to standardize values such as employee IDs or country codes.

  • Prefix “ID-” → ID-12345
  • Suffix “-NG” → 12345-NG

However, in this case study, we will apply a suffix to the Days column created from our earlier date transformation. This means each value will be formatted to include a label, such as 180-days, to improve clarity and readability.

How to

  1. Select the column you want to modify
  2. Go to Transform tab → Format
  3. Choose Add Prefix (text appears before the values) or Add Suffix (text appears after the values)

Image1

  4. Enter the desired text and click OK.

Image2

Result

Image3
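
For illustration, the prefix and suffix transformations amount to simple text concatenation. The helpers below are hypothetical and written in plain Python; the values reuse the examples from this section.

```python
def add_prefix(values, prefix):
    """Attach `prefix` before each value, converting the value to text."""
    return [f"{prefix}{value}" for value in values]


def add_suffix(values, suffix):
    """Attach `suffix` after each value, converting the value to text."""
    return [f"{value}{suffix}" for value in values]


print(add_prefix([12345], "ID-"))
print(add_suffix([180, 45], "-days"))
```

One consequence worth keeping in mind: once a label is attached, the values are text rather than numbers, so any numeric calculation on the Days column is best done before adding the suffix.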

10. Data Profiling Techniques in Microsoft Power BI

Data profiling involves examining your dataset to understand its structure, quality, and overall condition before analysis. In the Power Query Editor, it helps you quickly detect errors, inconsistencies, and patterns.

There are three major data profiling tools that help professionals detect issues before analysis:

  1. Column quality: Helps you quickly identify data issues in each column. It displays the valid values, errors, and empty values.

  2. Column distribution: Shows how values are spread within a column, which is useful for spotting duplicates or unusual patterns. It includes:

     • Distinct values
     • Unique values
     • Frequency distribution (histogram view)

  3. Column profile: Provides detailed statistics such as:

     • Minimum and maximum values
     • Average
     • Count of values
     • Data type distribution

How to Enable Data Profiling
Go to View tab in Power Query Editor
Enable:

  • Column Quality
  • Column Distribution
  • Column Profile

Image1
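
The difference between distinct and unique values is easy to mix up, so here is a small plain Python sketch of the idea. The `profile_column` helper and the sample column are assumptions, and treating both None and empty text as "empty" is a simplification made for this example.

```python
from collections import Counter


def profile_column(values):
    """Compute a few profiling statistics for one column."""
    non_empty = [v for v in values if v is not None and v != ""]
    counts = Counter(non_empty)
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        "distinct": len(counts),  # how many different values occur
        "unique": sum(1 for n in counts.values() if n == 1),  # values occurring exactly once
    }


# Hypothetical column with a repeated value and two empty cells.
region = ["West", "East", "West", None, "North", ""]
stats = profile_column(region)
print(stats)
```

Here "West" counts toward the distinct values but not toward the unique ones, because it appears twice. A gap between the two numbers is a quick hint that a column contains duplicates.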

Use cases

  • Detecting missing or incorrect data
  • Identifying duplicates or outliers
  • Understanding data patterns before transformation

Key Insight

Data profiling helps you trust your data before working with it. By identifying issues early, you can clean and transform your dataset more effectively, leading to more accurate and reliable analysis.

11. Handling Missing or Null Values in Microsoft Power BI

Missing or null values are common in real-world datasets, and handling them properly is essential to ensure accurate analysis and reliable reporting.

What are Null Values?
Null values represent missing, undefined, or empty data in a column. If not addressed, they can lead to incorrect calculations and misleading insights.

Common Techniques for Handling Null Values

A. Removing Null Values: This removes rows that contain nulls. It works best when missing data is minimal and not critical.

Using Column Filter (Most Common)

  • Select the column you want to check
  • Click the filter dropdown on the column header
  • Uncheck (null) or (blank)
  • Click OK

Image1

B. Replacing Null Values: This involves replacing null values with default values (e.g., 0, “Unknown”) or calculated values (e.g., average, median).

Using the Replace Value Feature

  1. Select the column
  2. Click on the Transform ribbon
  3. Select Replace Values

Image 1

  4. Input the value to find and the value to replace with.
  5. Click OK.

Image2

Note: The "replace with" value data type must align with the column data type.
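
Techniques A and B can be compared side by side in a short plain Python sketch. The rows are invented, and the mean is used only as one example of a calculated replacement value.

```python
# Hypothetical rows with one missing quantity.
rows = [
    {"Product": "Laptop", "Quantity": 5},
    {"Product": "Mouse", "Quantity": None},
    {"Product": "Monitor", "Quantity": 2},
]

# A. Remove rows where the column is null.
without_nulls = [r for r in rows if r["Quantity"] is not None]

# B. Replace nulls with a default value.
with_default = [
    {**r, "Quantity": 0 if r["Quantity"] is None else r["Quantity"]} for r in rows
]

# B. Replace nulls with a calculated value (mean of the known values).
known = [r["Quantity"] for r in rows if r["Quantity"] is not None]
mean = sum(known) / len(known)
with_mean = [
    {**r, "Quantity": mean if r["Quantity"] is None else r["Quantity"]} for r in rows
]

print(len(without_nulls), with_default[1]["Quantity"], with_mean[1]["Quantity"])
```

Removing rows shrinks the dataset, while replacing keeps every row but changes totals and averages, so the choice depends on how much the missing values matter to the analysis.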

C. Fill Down / Fill Up: Fill-Down and Fill-Up are simple yet powerful techniques in the Power Query Editor used to handle missing (null) values by propagating existing values within a column.

  • Fill Down: Uses the value from the row above
  • Fill Up: Uses the value from the row below

They are useful for structured datasets with repeated categories.

Steps:

  1. Select the column containing null values
  2. Go to Transform tab
  3. Choose Fill → Down or Fill → Up, depending on which is appropriate for the analysis that follows.

Image1

Result

Image2
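
The propagation behind Fill Down and Fill Up can be sketched in plain Python as follows. The `fill_down` and `fill_up` helpers and the sample column are hypothetical.

```python
def fill_down(values):
    """Replace each None with the closest non-empty value above it."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def fill_up(values):
    """Replace each None with the closest non-empty value below it."""
    return fill_down(values[::-1])[::-1]


# Hypothetical category column where each label is written once per block.
category = ["Fruit", None, None, "Dairy", None]

print(fill_down(category))
print(fill_up(category))
```

A null with no value above it (for Fill Down) or below it (for Fill Up) stays empty, as the last entry of the Fill Up result shows.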

Conclusion

Data preparation is where raw data becomes valuable, reliable, and actionable.

In Power BI, the Power Query Editor gives you everything you need to:

  • Clean messy datasets
  • Transform structures
  • Combine multiple sources
  • Validate data quality

But beyond tools, what truly matters is your mindset.

A great analyst doesn’t just load data; they question it, shape it, and refine it.

Top comments (1)

toshihiro shishido

Data prep is where most analytics projects die quietly. The hardest part for ecom data is reconciling pre-discount vs post-discount revenue — Power BI and Looker default to whatever the source schema delivered, which varies by platform. AOV silently shifts 10-15% depending on which view wins.