Power Query is the backbone of data preparation in Power BI. Before you can build stunning dashboards or write complex DAX, your data needs to be clean, consistently shaped, and properly related. In this guide, we'll use the real CodeSphere Hub dataset, comprising sales transactions, booking records, product info, and calendar data, to demonstrate every essential Power Query technique.
The dataset includes 163,000+ sales rows spanning 2015–2020, a messy hotel bookings CSV full of nulls and inconsistencies, a product catalogue, and several lookup tables. Exactly the kind of real-world mess Power Query was built to handle.
Working with Power Query Editor
The Power Query Editor is where all data transformation happens before data is loaded into Power BI's model. You access it by selecting Transform Data after importing a source, or from the Home ribbon in Power BI Desktop.
Key Interface Areas
Query Panel (left): Lists all loaded tables. Our project has 12 queries including Sales, Data Cleaning, Calendar, and all lookup tables.
Data Preview (center): Shows a sample of your data. Columns are typed (date, text, number) and clickable for column-level operations.
Applied Steps (right): Every action you take is recorded here as a step you can edit, reorder, or delete. This is Power Query's version of version control.
Formula Bar: Shows the M code behind each step; toggle it on from the View tab.
Data Profiling Techniques
Before cleaning, you need to understand your data. Power Query has three built-in profiling views that surface quality issues instantly. Open them from View → Column Quality, Column Distribution, Column Profile.
Column Quality
Shows three metrics as a percentage bar at the top of each column: Valid (green), Error (red), and Empty/Null (gray). In the Data Cleaning table, you can immediately see that guest_name and payment_status have significant null rates.
Column Distribution
Shows value frequency as a histogram under each column. Instantly reveals skewed distributions, outlier spikes, or unexpected cardinalities. For example, the listing_city column's distribution would show the "LA"/"los angeles"/"Los Angeles" fragmentation.
Column Profile
The most detailed view — select a single column to see a stats panel: min, max, average, count, distinct count, null count, and a value frequency chart. Profiling runs on the top 1,000 rows by default; switch it to the entire dataset from the status bar at the bottom.
Handling Missing and Null Values
The Data Cleaning table is a goldmine of real-world data quality issues — nulls in guest_name and payment_status, negative nights values, inconsistent boolean representations (True/False/Yes/No/n), and extra whitespace in guest names. Let's tackle each systematically.
1. Removing Rows with Nulls
For columns where a null makes a row useless (e.g., missing booking_id), remove the row entirely.
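In the GUI, filter the column and untick null, or use Home → Remove Rows. The step Power Query records behind the scenes looks roughly like this M sketch (assuming the Data Cleaning query and the booking_id column from our dataset):

```m
// Keep only rows whose booking_id is present
#"Removed Null Bookings" = Table.SelectRows(
    #"Data Cleaning",
    each [booking_id] <> null and [booking_id] <> ""
)
```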
2. Replacing Nulls with Default Values
For optional fields, replace nulls with a meaningful default rather than deleting the row.
Common default values:
- Replace with 0, the mean, or the median for numerical fields
- Replace with "Unknown" for categorical fields
- Replace with business-defined defaults

To apply one: select the column, right-click, and choose Replace Values.
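Behind the Replace Values dialogue, Power Query generates a Table.ReplaceValue step. A sketch for payment_status (the "Unknown" default is our choice, not a value from the raw data):

```m
// Replace null payment_status values with a business-defined default
#"Replaced Nulls" = Table.ReplaceValue(
    #"Data Cleaning",
    null,
    "Unknown",
    Replacer.ReplaceValue,
    {"payment_status"}
)
```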
3. Removing Duplicates
Power Query can deduplicate on one or multiple key columns. For booking data, booking_id should be unique.
- Select the key column(s), e.g., booking_id.
- Home → Remove Rows → Remove Duplicates, or right-click the column → Remove Duplicates.
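The generated M is a single Table.Distinct step. Note that only the first occurrence of each key survives, so sort the table first if you care which duplicate is kept:

```m
// Deduplicate on the booking_id key; earlier rows win
#"Removed Duplicates" = Table.Distinct(#"Data Cleaning", {"booking_id"})
```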
Creating Index Columns
Index columns provide unique row identifiers, essential for building relationships, tracking row positions, and creating surrogate keys. Our CodeSphere Hub Sales table doesn't have a guaranteed unique row key, making an index column critical.
- Open the target query: select Sales in the Queries panel on the left.
- Navigate to Add Column → Index Column: in the ribbon, click the Add Column tab, then click the dropdown arrow on Index Column.
- Choose your start value: select From 1 (1-based) for human-readable IDs, or From 0 for zero-based indexing used in joins.
- Rename the column: double-click the new Index column header and rename it.
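In M, all of these steps collapse into one Table.AddIndexColumn call (the Sales_ID name is our choice; the trailing type argument is accepted in recent Power Query versions):

```m
// 1-based surrogate key for the Sales table: start at 1, increment by 1
#"Added Index" = Table.AddIndexColumn(Sales, "Sales_ID", 1, 1, Int64.Type)
```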
Conditional Columns and Logic
Conditional columns let you create new derived columns based on if/else logic, without writing complex DAX measures. In the Data Cleaning table, we can classify bookings by payment status and validity.
- Go to Add Column → Conditional Column.
- Set up the first condition, e.g., if nights is less than 0, output "Invalid".
- Add an else-if clause, e.g., if payment_status equals null, output "Pending".
- Set the else (default) value, e.g., "Valid".
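The dialogue writes the if/then/else chain for you. A sketch that flags the issues called out earlier (negative nights, null payment_status); the output labels are illustrative:

```m
// Classify each booking: negative nights are invalid, missing payment is pending
#"Added Booking Status" = Table.AddColumn(
    #"Data Cleaning",
    "Booking_Status",
    each if [nights] < 0 then "Invalid"
         else if [payment_status] = null then "Pending"
         else "Valid",
    type text
)
```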
Column from Examples
This is one of Power Query's most powerful AI-assisted features. You provide 1–3 example output values, and Power Query infers the transformation rule automatically. It's ideal when you know what output you want but don't know the exact M formula.
- Select the column: click the column header to highlight it.
- Add Column → Column from Examples → From Selection: a new editable column panel appears on the right.
- Type your example output in the first cell: for "los angeles" type Los Angeles, for "LA" type Los Angeles, for "San Fran" type San Francisco.
- Press Enter after each example: Power Query suggests a formula. Keep adding examples until the entire column looks correct, then click OK.
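The inferred formula appears above the preview as you type. For the city cleanup, the rule Power Query lands on might look something like the sketch below; the exact expression depends on which examples you provide:

```m
// One possible inferred rule for normalising listing_city
each if [listing_city] = "LA" or [listing_city] = "los angeles" then "Los Angeles"
     else if [listing_city] = "San Fran" then "San Francisco"
     else [listing_city]
```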
Grouping and Aggregation
Group By is Power Query's equivalent of SQL's GROUP BY; it lets you summarise data by collapsing rows into aggregated results. This is useful before loading data, reducing model size and query load time.
Common aggregation functions include:
Sum
Count
Average
Minimum
Maximum
- Select the Sales query: make sure you're working on the right table.
- Home → Group By: the Group By dialogue opens.
- Configure grouping: Group By column: Region_Key. New column name: TotalQty. Operation: Sum. Column: Quantity_Sold.
- Add a second aggregation (optional): click Add aggregation. New column: TotalTransactions. Operation: Count Rows.
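The whole dialogue compiles to a single Table.Group step. A sketch using the column names above:

```m
// Summarise Sales by region: total quantity sold and row count per Region_Key
#"Grouped Rows" = Table.Group(
    Sales,
    {"Region_Key"},
    {
        {"TotalQty", each List.Sum([Quantity_Sold]), type number},
        {"TotalTransactions", each Table.RowCount(_), Int64.Type}
    }
)
```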
Pivoting and Unpivoting Columns
The Pivot Demo.csv in our dataset is a perfect real-world example; it contains years as column headers (2015, 2016, 2017…) with metrics as rows. This "wide" format is common in exports but difficult to work with in Power BI's data model, which needs a "long" format.
Unpivoting: Wide → Long (Most Common)
- Load the Pivot Demo table: it has columns Metric, 2015, 2016, 2017, 2018, 2019, 2020.
- Select the year columns (2015–2020): hold Ctrl and click each year column header.
- Transform → Unpivot Columns: Power Query creates an Attribute column (the year) and a Value column (the number).
- Rename columns: rename Attribute to Year and Value to Amount.
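A more robust alternative to selecting the year columns is to right-click Metric → Unpivot Other Columns: it records only the columns to keep fixed, so a new 2021 column in next year's export gets unpivoted automatically. The generated step, roughly:

```m
// Keep Metric fixed; turn every other column header into a Year/Amount pair
#"Unpivoted" = Table.UnpivotOtherColumns(#"Pivot Demo", {"Metric"}, "Year", "Amount")
```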
Merging and Appending Queries
Our dataset has two separate annual sales files that need to be stacked into one table, plus lookup tables (Regions, Gender, Categories) that need to be joined onto the facts. Power Query covers both cases: Append stacks rows, Merge joins columns.
Appending Queries (Union / Stack Rows)
Use Append when you want to combine tables with the same structure — like stacking 2019 and 2020 sales into one table.
- Home → Append Queries → Append Queries as New: creates a new query instead of modifying an existing one.
- Select tables to combine: with two files, the "Two tables" option is enough; add Sales 2019 and Sales 2020 to the list (switch to "Three or more tables" when stacking more files).
- Click OK: Power Query stacks the rows. Columns are matched by name; mismatches produce nulls.
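The generated step is Table.Combine over a list of tables (the query names here assume the two files load as Sales 2019 and Sales 2020):

```m
// Stack the annual files; columns are matched by name, not position
#"Appended Sales" = Table.Combine({#"Sales 2019", #"Sales 2020"})
```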
Merging Queries (SQL-style JOIN)
Use Merge to bring lookup columns into your fact table. Let's enrich our Sales table with Region names from the Regions lookup.
- Select the Sales query → Home → Merge Queries.
- Select the right table: choose Regions, then click Region_Key in both tables to set the join key.
- Join Kind: select Left Outer (all Sales rows + matching Region info).
- Expand the nested table column: click the expand icon on the new Regions column, uncheck all but Region and Country, and uncheck "Use original column name as prefix".
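Under the hood this is two Applied Steps, a Table.NestedJoin followed by a Table.ExpandTableColumn, roughly:

```m
// Left-outer join Sales to Regions on Region_Key...
#"Merged Queries" = Table.NestedJoin(
    Sales, {"Region_Key"},
    Regions, {"Region_Key"},
    "Regions", JoinKind.LeftOuter
),
// ...then expand just the Region and Country columns, without the prefix
#"Expanded Regions" = Table.ExpandTableColumn(
    #"Merged Queries", "Regions", {"Region", "Country"}
)
```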
Date and Time Transformations
The Calendar table contains date-time strings in ISO format, like 2015-01-01T09:53:24. Proper date handling is critical for time intelligence in DAX.
Key Date Operations
Change Type to Date: Select the Date_and_Time column → Transform → Data Type → Date (the time portion is dropped).
Extract Year/Month/Day: Add Column → Date → Year / Month / Day. Creates integer columns useful for slicers.
Extract Month Name: Add Column → Date → Month → Name of Month. Returns "January", "February", etc.
Extract Day of Week: Add Column → Date → Day → Day of Week. Returns 0 (Sunday) through 6 (Saturday) — or Name of Day for text.
Calculate Date Difference: Use a Custom Column with an M formula to find the duration between dates.
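As M, the extraction steps chain together like the sketch below. Column names assume the Calendar table above; the last line uses hypothetical CheckIn/CheckOut columns purely to illustrate a date difference:

```m
// Convert the ISO date-time to a date, then derive parts from it
#"Changed Type" = Table.TransformColumnTypes(Calendar, {{"Date_and_Time", type date}}),
#"Added Year" = Table.AddColumn(#"Changed Type", "Year", each Date.Year([Date_and_Time]), Int64.Type),
#"Added Month Name" = Table.AddColumn(#"Added Year", "Month_Name", each Date.MonthName([Date_and_Time]), type text),
#"Added Day of Week" = Table.AddColumn(#"Added Month Name", "Day_Of_Week", each Date.DayOfWeek([Date_and_Time]), Int64.Type),
// Subtracting two dates yields a duration; Duration.Days extracts whole days
#"Added Nights" = Table.AddColumn(#"Added Day of Week", "Nights_Stayed", each Duration.Days([CheckOut] - [CheckIn]), Int64.Type)
```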
Adding Prefix and Suffix Using Power Query
Adding prefixes or suffixes to values standardises identifiers, helps distinguish keys across tables, and makes data more readable. Our Invoice_ID column already uses "INV-" prefixes — let's see how that's done and how to apply it to other columns.
Format → Add Prefix/Suffix (GUI)
- Select the column (e.g., Region_Key). If it's numeric, change its type to Text first, since the Format menu is only available for text columns.
- Transform tab → Format → Add Prefix.
- Type the prefix, e.g., REG-. Result: 1 → REG-1, 10 → REG-10.
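The equivalent M is one Table.TransformColumns step; because Region_Key is numeric, Text.From converts it before concatenation:

```m
// Prefix each Region_Key with "REG-", producing a text column
#"Added Prefix" = Table.TransformColumns(
    Sales,
    {{"Region_Key", each "REG-" & Text.From(_), type text}}
)
```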
You Now Have a Clean, Model-Ready Dataset
By working through these techniques, you've transformed a collection of raw, messy CSVs into a structured, consistent, analysis-ready data model. Here's what was accomplished:
- Loaded different tables and datasets
- Added index columns as surrogate keys for relationship management
- Created conditional columns to flag data quality issues and categorise data
- Unpivoted the wide Pivot Demo table into a proper long format
- Merged sales data with the Regions lookup and the other dimension tables
- Appended 2019 and 2020 sales files into a unified table
- Extracted date parts from the Calendar table for time intelligence
- Identified and resolved 7 data quality issues in the booking dataset
With this foundation in Power Query, your DAX measures will be simpler, your reports faster, and your insights more reliable. Happy modelling!