Introduction
The most common data analytics tool that is popular with beginners is Microsoft Excel. It is simple to operate, versatile, and strong enough to carry out simple data analysis functions without possessing any programming skills.
In this article, I present MS Excel as an analytics tool used in data analysis and how it may be utilized to analyse data using straightforward examples. This tutorial will assist the beginners who are new to Excel and wish to know how it can assist in the analysis of data.
Data Analysis
Data analysis is the process of collecting, cleaning, analysing data to discover important information, trends, and patterns in order to make informed decision.
Data Collection
Data collection is the process of obtaining both primary and secondary data for the purpose of analysis. This can either be in the field (primary) or already stored data from databases (Secondary data).
Once collected, data is stored in a tool(Excel) for analysis. This is called data entry. For example, say you are a HR analyst, and you wish to analyse employee data using Excel. You have to first obtain the employee data from the relevant databases and store it in an excel file as shown below:
Data Cleaning
Data cleaning involves ridding data of any inconsistencies which would otherwise negatively impact the analysis process. This can involve:
- Data formatting
- Getting rid of duplicates (Mainly done in the unique identifier Column)
- Handling Outliers
- Handling Blanks/Missing Values
Data Formatting
This is the process of correcting any formatting problem.
For instance in the HR dataset, the Hire date Column the date data doesn't currently fit in the Hire date column. See Below:
This is fixed by extending the Hire date Column. This is done by right clicking the + sign at the furthest end of the column.
The number formatting also comes into play, For example, the Employee ID column in the HR dataset is initially formatted into number format rather than text format.
How do you know this? - In excel, text is always aligned to the left, while Date and number formats are always aligned to the right for each cell in a column.
To fix this, the Format option is utilized in excel. Select the Employee_ID column Go to:
Home>Font>Click the furthest Arrow>Format Cells>Text
Getting rid of duplicates
This is done by first conditional formatting the Unique Identifier column, where you highlight the duplicates. Once the duplicates are visualized you can use logical arguments to suggest how to deal with duplicates.
- Select the Unique Identifier Column:
- Home>Styles>Conditional Formatting>Highlight Cell Rules>Duplicate Values
- and then highlight with your preferred colour.
Once highlighted, the next step if applying you criteria. For instance, In our example for the HR dataset, we can choose to keep the oldest Unique Identifier (Basically, the first ID entry and delete subsequent duplicate entries).
Handling Blanks/Missing Values
Missing Values can be as a result of data entry problems, or unavailable data for that particular Cell. Say for example you have Sales data set that contains countries and their respective cities. If the a particular country doesn't have data in the city column. How do we handle this?
- We can fill the empty cells with unavailable to prevent adding incorrect values/Cities for a particular set of data.
Next, for values in a particular column, we can fill empty cells using the averages, median or mode of other columns. This is decided by what the data represents. E.g., ratings(median), Salary(average, median) e.t.c.
This is done by first computing the arithmetic's of these particular columns:
= Average ([Cell range])
= Median ([Cell range])
= Mode ([Cell range])
Handling Outliers
1. Identify Outliers
- Boxplots (most common)
- Z-scores (values > ±3 are often outliers)
- IQR method
Lower bound = Q1 − 1.5 × IQR
Upper bound = Q3 + 1.5 × IQR
2. Verify the Outliers
- Before removing anything, check:
- Is it a data entry error? (extra zero, wrong unit)
- Is it a valid but extreme value?
3. Remove Outliers
- Use this only if the value is an error or irrelevant.
- Delete the row
- Filter values outside acceptable ranges
4. Transform the Data
- Reduce the impact instead of removing:
- Log transformation
- Square root transformation
Data Analysis
This is the process of obtaining imortant information/insights from cleaned and transformed data
1. Descriptive Statistics (Start Here)
This summarizes your data so you understand it.
In Excel:
Assume your cleaned prices are in Column B.
Mean (Average)
=AVERAGE(B:B)
Median
=MEDIAN(B:B)
Minimum & Maximum
=MIN(B:B)
=MAX(B:B)
Standard Deviation
=STDEV(B:B)
Interpretation example:
“The average product price is KSh 1,666, with most prices clustered around the mean, as indicated by a standard deviation of …”
2. Identify Outliers (IQR Method – Practical)
This shows whether any prices are unusually high or low.
Step-by-step:
Q1
= QUARTILE(B:B,1)
Q3
= QUARTILE(B:B,3)
IQR
= Q3-Q1
Lower Bound
=Q1-1.5*IQR
Upper Bound
=Q3+1.5*IQR
Any price below the lower bound or above the upper bound is an outlier.
Pivot Tables and Visualizations
PIVOT TABLES
Pivot tables are used to summarize data in excel. Pivot tables come in hand when one wants to group dataset using a particular criteria or to aggregate a given set of grouped data. For example, You may want to calculate the total revenue by a given region, or the average salaries of employees by Region. This can be easily done using pivot tables.
To create one, you select the entire data range>Insert> Pivot table>From table Range.
This inserts the pivot table in a new worksheet or a pre-created worksheet
The next steps involves summarizing the desired data criteria. For example, Average Salary by departments as shown below:
Visualization
This can either be done by raw data, or data from the pivot table. From raw data, a column or columns is/are selected, and a chart is inserted.
This are some of the charts that can be created using raw data in excel:
The charts can also be created using the pivot tables, by heading to:
Pivot table analyze > pivot charts,
and selecting the desired chart.
Dashboards
Lastly, once the visualizations are complete, one can decide to present the entire visual, KPI's on one panel. This is called a dashboard. The charts are clearly and systematically arranged to tell a story about the data. The dashboards are also important since all the required information about your data can be placed in one place which makes it easier for presentation:
See an example below:
The above dashboard clearly and visually presents a story about the employees data.
Conclusion
To sum up, Microsoft Excel is an effective and easy-to-use data analysis tool. It enables one to gather, clean, analyze, and visualize information without having to know how to program. Excel is useful because it allows users to discover valuable insights and trends through its data formatting capabilities, missing values, duplicates, descriptive statistics, pivot tables, and charts. Dashboards also improve knowledge as they display critical information in a single transparent display. Altogether, Excel is a great place to start with when someone wishes to learn more about data analysis and how to use simple and practical tools to make an informed decision.









Top comments (0)