Derrick Kimanthi

Data Cleaning, DAX and Dashboards in Power BI.

INTRODUCTION:

Having messy data is completely normal, but working with it can be overwhelming. Luckily, Power BI makes data cleaning and analysis fast and easy. In this article I will explain how analysts turn messy data into insights using three key Power BI tools: Power Query, Data Analysis Expressions (DAX), and dashboards.

What is Data Cleaning?

Data cleaning is the process of detecting and correcting inaccurate, incomplete, and poorly formatted data within a dataset.

Loading Data in Power BI.

After opening Power BI, the first page prompts you to connect to a data source.

[Image: Landing page]
You can choose from various data sources, including Excel workbooks and SQL databases.
After loading your data, you need to spot the errors and inconsistencies in it before cleaning.
To do this, switch to the table view, which makes the data easier to scan for problems.
[Image: Table view, with the view selector circled in red]

How to Identify Messy Data.

Identifying errors and inconsistencies is key in the data cleaning process. Below are common signs of messy data.

Signs of Messy Data:

  • Dates stored as text.
  • Numbers stored as text.
  • Single columns holding multiple values.
  • Nulls and blanks.
  • Duplicate rows.

Implementing Power Query in Power BI

In Power BI, data is transformed in the Power Query Editor. To open it:

  • Go to the Home tab.
  • Click the Transform data button.

Power Query includes a feature that helps you assess the errors in your dataset quickly, which makes the cleaning process more efficient.
To activate it:

  • Go to the View tab.
  • Check the Column quality checkbox.

[Image: Column quality]
With column quality enabled, each column header shows the percentage of valid, error, and empty values, so any errors in the data are visible at a glance.

Step 1: Fixing Data Types.

Go through your data and make sure each column has the correct data type.
For example:

  • Date should be a Date.
  • Quantity should be a Whole number.
  • Revenue should be a Decimal number.

To do this:

  • Right-click the column header.
  • Select Change Type.
  • Choose the appropriate data type from the list.
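
Power Query records this step for you, so no code is needed. Still, it can help to see what changing a type actually does to the values. The snippet below is only a rough sketch in plain Python, not what Power BI runs internally; the column names and sample rows are made up for illustration.

```python
from datetime import date, datetime

# Hypothetical rows as they might arrive from a CSV file: every value is text.
raw_rows = [
    {"Date": "2024-01-15", "Quantity": "3", "Revenue": "149.97"},
    {"Date": "2024-02-01", "Quantity": "10", "Revenue": "20.5"},
]

def fix_types(row):
    """Convert each field to the type its column should hold."""
    return {
        "Date": datetime.strptime(row["Date"], "%Y-%m-%d").date(),  # Date
        "Quantity": int(row["Quantity"]),                           # Whole number
        "Revenue": float(row["Revenue"]),                           # Decimal number
    }

typed_rows = [fix_types(row) for row in raw_rows]
print(typed_rows[0])
```

Once the values are real dates and numbers, sorting, date filtering and arithmetic behave as expected instead of treating everything as text.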

Step 2: Handling Missing Values.

When replacing missing values, don't just guess: a guessed value distorts the original data. Instead, apply a clear business rule for each column so the data stays consistent.
For example:

  • For a blank CITY field, you can replace the blank with "Unknown" or "Not Given".
  • For a blank DISCOUNT field, you can replace the null with 0. This keeps row-level calculations that use the column from returning null.

To do this:

  • Go to the Transform tab.
  • Select Replace Values.
  • Enter the value to find and the value to replace it with.
  • Click OK.
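
The idea of one explicit rule per column can be sketched in plain Python. This is an illustration of the logic only, assuming hypothetical City and Discount columns, with None standing in for a null cell.

```python
# Hypothetical rows; None stands in for a null cell, "" for a blank one.
rows = [
    {"City": "Nairobi", "Discount": 0.1},
    {"City": None, "Discount": None},
    {"City": "", "Discount": 0.05},
]

# Business rules from the text: one explicit default per column.
DEFAULTS = {"City": "Unknown", "Discount": 0}

def fill_missing(row, defaults):
    """Return a copy of the row with nulls and blanks replaced by the column default."""
    filled = dict(row)
    for column, default in defaults.items():
        if filled.get(column) in (None, ""):
            filled[column] = default
    return filled

clean_rows = [fill_missing(row, DEFAULTS) for row in rows]

# With no nulls left, arithmetic on the column works for every row.
total_discount = sum(row["Discount"] for row in clean_rows)
print(clean_rows)
```

Keeping the defaults in one place makes the rule easy to review, which is the point of using business logic rather than guesses.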

Step 3: Cleaning Text Data.

To clean text data in Power BI:

  • Click the Transform tab.
  • Select Format.

The list contains several text formatting tools, including:

  1. Trim: removes leading and trailing spaces from text.
  2. Lowercase: converts all text to lowercase.
  3. Uppercase: converts all text to uppercase.
  4. Capitalize Each Word: converts text to proper case.
  5. Clean: removes non-printable characters.
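
To make the difference between these tools concrete, here is a small sketch of rough Python equivalents. These are approximations of the Format options, not their exact implementation, and the sample value is invented.

```python
def clean(text):
    """Drop non-printable characters, similar in spirit to the Clean option."""
    return "".join(ch for ch in text if ch.isprintable())

# Hypothetical messy value: padding spaces, mixed case, a bell control character.
raw = "  naiROBI\x07 cbd \n"

trimmed = raw.strip()        # Trim: remove leading and trailing whitespace
cleaned = clean(trimmed)     # Clean: remove non-printable characters
lower = cleaned.lower()      # Lowercase
upper = cleaned.upper()      # Uppercase
proper = cleaned.title()     # Capitalize Each Word

print(repr(cleaned), lower, upper, proper)
```

Note that trimming does not touch the space between the two words; only the padding at the ends is removed.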

Step 4: Removing Duplicates.

Duplicate rows lead to inaccurate results in calculations.
You can remove duplicates based on a single column or on multiple columns.
To do this for a single column:

  • Right-click the column header.
  • Select Remove Duplicates.

To do this for multiple columns:

  • Hold Ctrl and click the headers of the columns you want.
  • Go to the Home tab.
  • Select Remove Rows.
  • Select Remove Duplicates.
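
The difference between single-column and multi-column deduplication is easier to see in a small sketch. The Python below assumes the first occurrence of each key is the one kept, and the table and column names are hypothetical.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key_columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical orders table with one exact repeat.
orders = [
    {"OrderID": 1, "Customer": "Amina", "Product": "Laptop"},
    {"OrderID": 1, "Customer": "Amina", "Product": "Laptop"},
    {"OrderID": 2, "Customer": "Amina", "Product": "Mouse"},
]

by_customer = remove_duplicates(orders, ["Customer"])          # single column
by_order = remove_duplicates(orders, ["OrderID", "Product"])   # multiple columns

print(len(by_customer), len(by_order))
```

Deduplicating on Customer alone throws away a genuine second order, so pick the columns that together identify a real duplicate.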

Step 5: Splitting Columns.

How you split a column in Power BI depends on how the values are combined.

1. First scenario: Splitting by a delimiter.
A delimiter is a character used to separate values within text. Common delimiters include the comma (,), the vertical bar (|) and the semicolon (;). For example, in Laptop|Electronics|LT001 the delimiter is the vertical bar (|).
To split this:

  • Go to the Home tab.
  • Select Split Column.
  • Choose the "By Delimiter" option.
  • Select the delimiter. In this case it is the vertical bar (|).
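
As a quick illustration of what the split produces, here is the same example in plain Python. The names given to the three new columns are my own assumption; Power BI will generate its own column names.

```python
# Hypothetical column holding three values joined by a vertical bar.
rows = ["Laptop|Electronics|LT001", "Desk|Furniture|DK014"]

def split_by_delimiter(value, delimiter="|"):
    """Split one cell into parts and strip stray spaces around each part."""
    return [part.strip() for part in value.split(delimiter)]

# Hypothetical names for the three resulting columns.
columns = ("Product", "Category", "Code")
split_rows = [dict(zip(columns, split_by_delimiter(value))) for value in rows]

print(split_rows[0])
```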

2. Second scenario: Splitting by letter case.
If your dataset has joined values such as "FirstNameLastName", you can split by letter case.
There are two options:

  • Lowercase to Uppercase: splits text at each point where a lowercase letter is followed by an uppercase letter.
  • Uppercase to Lowercase: splits text at each point where an uppercase letter is followed by a lowercase letter.
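
The two options behave quite differently on the same kind of text, which a short regular-expression sketch can show. This is a plain Python approximation that only handles unaccented English letters, and the sample values are hypothetical.

```python
import re

def split_lower_to_upper(text):
    """Split wherever a lowercase letter is followed by an uppercase letter."""
    return re.split(r"(?<=[a-z])(?=[A-Z])", text)

def split_upper_to_lower(text):
    """Split wherever an uppercase letter is followed by a lowercase letter."""
    return re.split(r"(?<=[A-Z])(?=[a-z])", text)

print(split_lower_to_upper("JaneDoe"))
print(split_upper_to_lower("IDnumber"))
```

For names written like "JaneDoe", lowercase to uppercase is usually the option you want, since it breaks exactly where the second word starts.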

3. Third scenario: Splitting by digit/non-digit.
This method works best for separating text from numbers, especially when there is no consistent delimiter.
For example: "Product123"

  • Digit to Non-Digit: splits the text wherever a digit is followed by a non-digit, e.g. "123Product" becomes "123" and "Product".

  • Non-Digit to Digit: splits the text wherever a non-digit is followed by a digit, e.g. "Product123" becomes "Product" and "123".
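
The direction of the transition matters, so here is a minimal sketch of both options using Python regular expressions. It only illustrates the rule; the function names are my own.

```python
import re

def split_non_digit_to_digit(text):
    """Split wherever a non-digit character is followed by a digit."""
    return re.split(r"(?<=\D)(?=\d)", text)

def split_digit_to_non_digit(text):
    """Split wherever a digit is followed by a non-digit character."""
    return re.split(r"(?<=\d)(?=\D)", text)

print(split_non_digit_to_digit("Product123"))
print(split_digit_to_non_digit("123Product"))
print(split_digit_to_non_digit("Product123"))
```

The last call leaves "Product123" in one piece, because that value never goes from a digit to a non-digit.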

Key takeaway:

Data cleaning is a dynamic process: each dataset has different requirements. With practice, you will build the skills to handle these challenges easily.

DATA ANALYSIS EXPRESSIONS (DAX)

DAX (Data Analysis Expressions) is the formula language in Power BI used to create calculations on top of your data model.
DAX can build:

  • Measures: dynamic calculations that are evaluated only when a visual needs them, using the filters applied to that visual.
  • Calculated columns: new columns added to a table, with the calculation evaluated row by row.
  • Calculated tables: new tables derived from existing tables using a DAX formula.

Common DAX Functions.

DAX has a variety of functions, organized into categories for different uses.

  1. Aggregation functions: these functions compute a single value from the values in a column.

They include:

  • SUM: adds all numeric values in a column.
  • AVERAGE: returns the average of the values in a column.
  • COUNT: counts the non-blank values in a column.
  • MIN: returns the smallest value in a column.
  • MAX: returns the largest value in a column.
  • PRODUCT: returns the product of the numbers in a column.
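
If you have never used these functions, it may help to see the same calculations on a plain list. The Python below is an analogy rather than DAX, on a made-up Revenue column, and it skips blanks the way these aggregations do.

```python
import math
from statistics import mean

# Hypothetical Revenue column; None stands in for a blank cell.
revenue = [2.0, 5.0, None, 10.0]

# Blanks are left out before aggregating.
values = [v for v in revenue if v is not None]

total = sum(values)            # like SUM
average = mean(values)         # like AVERAGE
count = len(values)            # like COUNT
smallest = min(values)         # like MIN
largest = max(values)          # like MAX
product = math.prod(values)    # like PRODUCT

print(total, count, smallest, largest, product)
```

Each line collapses the whole column into one number, which is why these functions are a natural fit for KPI cards.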

2. Logical functions: these compare values and test conditions. Most of them return TRUE or FALSE.

They include:

  • IF: checks a condition and returns one value if it is TRUE, otherwise another value.
  • AND: returns TRUE if both conditions are TRUE.
  • OR: returns TRUE if at least one of the conditions is TRUE.
  • NOT: reverses a logical value, e.g. TRUE becomes FALSE and FALSE becomes TRUE.

3. Filter functions: these are used to control the filter context in which a calculation is evaluated.
They include:

  • ALL: returns all the rows in a table, or all the values in a column, ignoring any filters applied.
  • CALCULATE: evaluates an expression in a modified filter context.
  • FILTER: returns a table containing only the rows that meet a condition.
  • ORDERBY: defines the columns that determine the sort order.

4. Date and time functions: these build date and time values and extract parts of them.
They include:

  • DATE: returns the specified date in datetime format.

  • DAY: returns the day of the month from a date.

  • DATEDIFF: returns the difference between two dates in a chosen unit, such as days, months or years.

  • YEAR: returns the year of a date as a four-digit number.

  • QUARTER: returns the quarter as a number from 1 to 4.

  • TIME: converts hours, minutes and seconds given as numbers into a time in datetime format.

  • WEEKNUM: returns the week number for the given date according to the return type value.
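
For readers who know Python better than DAX, the sketch below shows rough counterparts using the standard datetime module. The dates are invented, and the comments name the DAX function each line resembles rather than claiming identical behaviour.

```python
from datetime import date, time

order_date = date(2024, 8, 19)     # like DATE(2024, 8, 19)
ship_date = date(2024, 9, 2)

year = order_date.year                          # like YEAR
day = order_date.day                            # like DAY
quarter = (order_date.month - 1) // 3 + 1       # like QUARTER
days_between = (ship_date - order_date).days    # like DATEDIFF with a DAY interval
cutoff = time(14, 30, 0)                        # like TIME(14, 30, 0)

print(year, day, quarter, days_between, cutoff)
```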

5. Iterator functions: these evaluate an expression row by row over a table, then combine the results.
Common iterator functions:

  • SUMX: iterates through a table, evaluates an expression for each row, then sums the results.

  • AVERAGEX: evaluates an expression for each row, then returns the average of the results.

  • MAXX/MINX: evaluate an expression for each row, then return the maximum or minimum result.

  • RANKX: ranks items based on an expression evaluated for each row.
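
Row-by-row evaluation is the key idea here, and a small Python analogy makes it visible. The Sales table and the line_total expression are hypothetical, and the rankx helper is my own sketch that ranks highest first and lets ties share a rank.

```python
from statistics import mean

# Hypothetical Sales table.
sales = [
    {"Product": "Laptop", "Quantity": 2, "Price": 500.0},
    {"Product": "Mouse", "Quantity": 10, "Price": 20.0},
    {"Product": "Desk", "Quantity": 1, "Price": 300.0},
]

def line_total(row):
    """The per-row expression: Quantity multiplied by Price."""
    return row["Quantity"] * row["Price"]

sumx = sum(line_total(row) for row in sales)        # like SUMX
averagex = mean(line_total(row) for row in sales)   # like AVERAGEX
maxx = max(line_total(row) for row in sales)        # like MAXX
minx = min(line_total(row) for row in sales)        # like MINX

def rankx(table, expression):
    """Rank each row by the expression, highest first; ties share a rank."""
    results = [expression(row) for row in table]
    return [1 + sum(other > value for other in results) for value in results]

ranks = rankx(sales, line_total)

print(sumx, averagex, maxx, minx, ranks)
```

A plain SUM cannot do this, because Quantity multiplied by Price has to be worked out for each row before anything is added up.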

DASHBOARDS IN POWER BI.

A dashboard is a single-page view used to visualize key insights drawn from the data. Dashboards are used to monitor key performance indicators (KPIs) at a glance.

Characteristics of a Good Dashboard
A good dashboard should be:

  • Relevant: focus on the KPIs that align with the business requirements and the goals of the analysis.

  • Clear: use clear visuals and appropriate chart types for easier understanding.

  • Consistent: keep the same colors and fonts across the entire page, chosen to suit your audience.

  • Interactive: add slicers and filters so users can explore the data easily.

Components of a Power BI Dashboard:

  1. Title: should be in a large font, placed at the top of the page.

2. Key Performance Indicators (KPIs): these are high-level, critical numbers that gauge performance.
They are placed near the top, usually right after the title, in big, bold characters.

Examples of KPIs:

  • Total Sales.
  • Total Profit.
  • Total Revenue.

3. Charts (visual insights): these are graphical representations of data.

They Include:

  • Card Visual: displays a single value, such as the result of a calculation.

  • Multi-row Card: displays multiple KPI values together.

  • Column and Bar Charts: compare values across categories, using vertical bars (column chart) or horizontal bars (bar chart).

  • Pie Chart: shows how a total is divided among categories, one slice per category.

  • Line Chart: shows trends over time.

  • Area Chart: a line chart with the area under the line filled in.

  • Funnel Chart: shows values across sequential stages.

  • Doughnut Chart: similar to a pie chart, but with a hole in the middle.

  • Table: shows data in rows and columns.

  • Matrix: works like a pivot table, summarizing values across row and column groups.

  • Q&A Visual: lets users ask questions about the data in natural language and answers with a visual.

  • Stacked Column Chart: stacks multiple series in one column per category, showing how each part contributes to the total.

  • Stacked Bar Chart: similar to a stacked column chart, but displayed horizontally.

  • Scatter Chart: shows the relationship between two variables.

  • Waterfall Chart: shows how a running total rises and falls as values are added or subtracted, for example over a period of time.

  • Combo Chart: a combination of a column chart and a line chart.

  • Map: displays how data is distributed across geographic regions.

  • Python Visual: uses a Python script to create custom charts.

  • R Visual: uses an R script to create custom charts.

Creating Dashboards in Power BI:

Creating a dashboard involves arranging multiple visuals (such as charts and KPIs) together. This lets you communicate key insights and support decision making.

A dashboard should:

  • Fit on one page without scrolling.

  • Show critical metrics.

  • Be interactive.

  • Update automatically.

Layout and Structure:

  • Title: placed at the top.

  • KPIs: positioned in the top row.

  • Charts: displayed in the middle.

  • Filters and Slicers: aligned at the side (left or right).

Illustration of a Power BI Dashboard:

[Image: Dashboard]

What to Avoid.

Avoid the following when creating dashboards:

  • Cluttering the page with too many charts (3-6 are enough).

  • Repeating charts that show the same data.

  • Adding too many decorative colors (stick to a consistent palette that suits your audience).

  • Putting raw data or large tables on your dashboard.

  • Including calculated columns in your dashboard.
