Working with data is challenging. Data is rarely in the right format, in the right place, at the right time.
If you are new to PowerBI or upgrading your analysis from Excel getting familiar with the transformation tools will make it easier to tell data stories and for your end-user to gain insight.
Load the data
1 - Rename the query
2 - Rename columns
3 - Remove rows and columns
4 - Remove rows with a filter
5 - Remove duplicates
6 - Merge columns
7 - Replace values
8 - Format text
9 - Change data types
10 - Add a custom column
Before we can get started shaping and transforming our data we need to either upload it or connect to the data source. PowerBI supports many databases and has connectors to support platforms like Google Analytics and Salesforce.
For this example we'll be using video game sales data in a csv file. When the data is loaded and we then select ‘transform data’ to be taken to the Data View. From here we can shape our data using tools on the ribbon or right-clicking to access all the options we need.
The query can be renamed in the Query Settings pane on the right-hand side of the canvas. I’ve changed this to ‘Sales’ from the file name that was loaded by default.
The Query Settings pane is also home to the ‘Applied Steps’ list. Every change you make to your data creates a new step on the list. This makes it easy to undo any unwanted changes.
If you’re loading in multiple datasets from different sources using a descriptive name makes it easy to find what you’re looking for. This is even more important when you are loading data from a database. If you have datasets named ‘Query1’, ‘Query2’, and ‘Query3’ you and your end-user are going to have a tough time finding what they’re looking for.
By right clicking the column header you can rename your column to something that’s more user friendly or descriptive.
Like a descriptive name for a dataset, a descriptive column name makes it easier to find what you’re looking for. This is especially important when loading datasets with technical names for columns. Have some empathy for your end-user and they’re more likely to trust your data.
You can find options to remove rows and columns under the ‘Home’ tab on the ribbon. There are options to remove a single row or column or remove all but your selected row or column.
By removing unnecessary columns it is easier for our end-user to get to what they need. Scrolling through unnecessary columns is especially frustrating if you have multiple datasets to search through.
We can remove specific rows that aren’t needed in our dataset using a filter. The menu can help remove columns that are empty, start or end with a specific value.
Just like the other steps to clean the dataset, this will make it easier to navigate for our end-user and help them answer their questions quicker.
By right clicking and selecting 'remove duplicates' we can remove any duplicates in the dataset.
This is useful when loading in an unfamiliar dataset that hasn’t come from a trusted source especially if the dataset is large. This could be done in Excel but this gets more difficult with large datasets. This could also be done in a database but if you don’t have permissions to load in data this may not be an option for you.
By right clicking and selecting 'merge columns' we can merge columns using a separator and renaming it to something useful.
This is useful for merging a First and Last Name column, or columns that would be better together. This can also be done with a formula but right clicking makes this task quicker.
You can find the replace values option on the ‘Transform’ tab of the ribbon. The advanced menu gives options to insert special characters or match the entire cell contents.
Datasets may have been loaded with typos that need cleaning up or values that aren’t as descriptive as they could be. This is especially useful if product codes or jargon have been introduced and plain English would make it easier for end-users to understand.
Formatting options can be found in the ‘Transform’ tab of the ribbon and can be used to trim, clean, and change the case of text.
Cleansing text fields makes it easier to read and perform further transformations. The ‘Trim’ option is particularly useful to remove any leading or trailing spaces.
Clicking on the icon next to the column header shows the data type transformation options.
This a quick way to change the data type if it has been loaded incorrectly. Changing a number which has been loaded as text means we can start performing calculations. Making sure dates are stored as dates means we can use these as filters in dashboards.
The custom column option can be found on the ‘Add column’ tab of the ribbon. A column can be created and named here using a formula to build the logic.
This can be useful to add calculations on existing columns, adding text to an existing value, or displaying part of a date. Just remember to check the data type before you start trying to add values together.
These are just some of the beginner-friendly options available to cleanse and shape data in PowerBI. There are plenty of other tools available to wrangle your data both within the GUI and the M language but this is a great place to start.
No matter which data source and tools you are using it's important to keep the user of your data and visualisations in mind. Having empathy for how the data will be used should guide how you transform your dataset.