Introduction
In today's fast-paced business environments, data is considered the cornerstone of decision-making, policy formulation, and other organizational needs. MS Excel is a robust spreadsheet developed by Microsoft for organizing, analyzing, and visualizing data in rows and columns. In the data science and analytics domain, MS Excel is critical for analyzing and managing data to generate insights that enhance decision-making. Excel's polarity is characterized by its ease of use, flexibility, automation, and visualization.
Ways Excel Is Used in Real-World Data Analysis
Across the data science and analytics domain, MS Excel is frequently employed in the following ways;
a) Data Cleaning and Preprocessing
At the beginning of every data science and analytics project, data cleaning is required, and MS Excel is the primary tool. Typical Excel features and functions applied during data cleaning include Text to Columns, Remove Duplicates, Find and Replace, and Power Query.
b) Exploratory Data Analysis
Before performing data science and analytics activities, it is crucial to understand the dataset at hand, its structure, and trends. MS Excel features Pivot Tables, Pivot Charts, and Slicers that provide instant aggregation, sorting, and visualizations.
c) Data Analysis and Reporting
Modern organizations and businesses operate based on insights generated from data. MS Excel features such as pivot tables, charts, and conditional formatting help data analysts analyze and visualize data for clear, actionable insights that enhance decision-making.
MS Excel Features or Formulas
The typical MS Excel features and formulas employed in the data science and analytics domain include the following.
Data Cleaning Functions
| Function | Purpose | Example | Result |
|---|---|---|---|
UPPER() |
Converts text to uppercase | =UPPER("john") |
JOHN |
LOWER() |
Converts text to lowercase | =LOWER("JOHN") |
john |
PROPER() |
Capitalizes the first letter of each word | =PROPER("john doe") |
John Doe |
TRIM() |
Removes extra spaces from text | =TRIM(" John Doe ") |
John Doe |
LEFT() |
Extracts characters from the left side | =LEFT("John",2) |
Jo |
RIGHT() |
Extracts characters from the right side | =RIGHT("John",2) |
hn |
MID() |
Extracts characters from the middle of the text | =MID("John",2,2) |
oh |
LEN() |
Returns the number of characters in a text string | =LEN("John") |
4 |
FIND() |
Returns the position of a character or substring (case-sensitive) | =FIND("o", "John") |
2 |
SEARCH() |
Returns the position of a character or substring (not case-sensitive) | =SEARCH("o", "JOHN") |
2 |
SUBSTITUTE() |
Replaces specific text within a string | =SUBSTITUTE("John Doe", "Doe", "Smith") |
John Smith |
REPLACE() |
Replaces text based on position | =REPLACE("John Doe",6,3,"Smith") |
John Smith |
CONCAT() |
Combines multiple text strings | =CONCAT("John"," ", "Doe") |
John Doe |
TEXTJOIN() |
Combines text with a specified delimiter | =TEXTJOIN(", ", TRUE, "John", "Doe") |
John, Doe |
Statistical Functions
| Function | Purpose | Example | Result |
|---|---|---|---|
AVERAGE() |
Calculates the mean value | =AVERAGE(B2:B10) |
Average of values |
MEDIAN() |
Returns the middle value | =MEDIAN(B2:B10) |
Median value |
MODE() |
Returns the most frequent value | =MODE(B2:B10) |
Most common value |
MIN() |
Returns the smallest value | =MIN(B2:B10) |
Minimum value |
MAX() |
Returns the largest value | =MAX(B2:B10) |
Maximum value |
COUNT() |
Counts cells containing numbers | =COUNT(B2:B10) |
Number of numeric cells |
COUNTA() |
Counts non-empty cells | =COUNTA(B2:B10) |
Number of filled cells |
COUNTBLANK() |
Counts empty cells | =COUNTBLANK(B2:B10) |
Number of blank cells |
STDEV.S() |
Calculates sample standard deviation | =STDEV.S(B2:B10) |
Sample variability |
STDEV.P() |
Calculates population standard deviation | =STDEV.P(B2:B10) |
Population variability |
VAR.S() |
Sample variance | =VAR.S(B2:B10) |
Sample variance |
VAR.P() |
Population variance | =VAR.P(B2:B10) |
Population variance |
LARGE() |
Returns the nth largest value | =LARGE(B2:B10,1) |
Largest value |
SMALL() |
Returns the nth smallest value | =SMALL(B2:B10,1) |
Smallest value |
RANK() |
Returns the rank of a value | =RANK(B2,B2:B10) |
Position in list |
PERCENTILE() |
Returns a percentile value | =PERCENTILE(B2:B10,0.75) |
75th percentile |
Personal Reflection
As a beginner data scientist, learning MS Excel as a foundational tool has changed how I interact with data. While previously I thought it was all about big tools like SQL and Python, I have come to appreciate that Excel is a source of clean datasets and can also be used at scale to generate insights that improve decision-making.
Top comments (0)