DEV Community

Cover image for How Excel is Used in Real-World Data Analysis
Ruto Kipkirui Robert
Ruto Kipkirui Robert

Posted on

How Excel is Used in Real-World Data Analysis

Introduction

In today's fast-paced business environments, data is considered the cornerstone of decision-making, policy formulation, and other organizational needs. MS Excel is a robust spreadsheet developed by Microsoft for organizing, analyzing, and visualizing data in rows and columns. In the data science and analytics domain, MS Excel is critical for analyzing and managing data to generate insights that enhance decision-making. Excel's polarity is characterized by its ease of use, flexibility, automation, and visualization.

Ways Excel Is Used in Real-World Data Analysis

Across the data science and analytics domain, MS Excel is frequently employed in the following ways;

a) Data Cleaning and Preprocessing

At the beginning of every data science and analytics project, data cleaning is required, and MS Excel is the primary tool. Typical Excel features and functions applied during data cleaning include Text to Columns, Remove Duplicates, Find and Replace, and Power Query.

b) Exploratory Data Analysis

Before performing data science and analytics activities, it is crucial to understand the dataset at hand, its structure, and trends. MS Excel features Pivot Tables, Pivot Charts, and Slicers that provide instant aggregation, sorting, and visualizations.

c) Data Analysis and Reporting

Modern organizations and businesses operate based on insights generated from data. MS Excel features such as pivot tables, charts, and conditional formatting help data analysts analyze and visualize data for clear, actionable insights that enhance decision-making.

MS Excel Features or Formulas

The typical MS Excel features and formulas employed in the data science and analytics domain include the following.

Data Cleaning Functions

Function Purpose Example Result
UPPER() Converts text to uppercase =UPPER("john") JOHN
LOWER() Converts text to lowercase =LOWER("JOHN") john
PROPER() Capitalizes the first letter of each word =PROPER("john doe") John Doe
TRIM() Removes extra spaces from text =TRIM(" John Doe ") John Doe
LEFT() Extracts characters from the left side =LEFT("John",2) Jo
RIGHT() Extracts characters from the right side =RIGHT("John",2) hn
MID() Extracts characters from the middle of the text =MID("John",2,2) oh
LEN() Returns the number of characters in a text string =LEN("John") 4
FIND() Returns the position of a character or substring (case-sensitive) =FIND("o", "John") 2
SEARCH() Returns the position of a character or substring (not case-sensitive) =SEARCH("o", "JOHN") 2
SUBSTITUTE() Replaces specific text within a string =SUBSTITUTE("John Doe", "Doe", "Smith") John Smith
REPLACE() Replaces text based on position =REPLACE("John Doe",6,3,"Smith") John Smith
CONCAT() Combines multiple text strings =CONCAT("John"," ", "Doe") John Doe
TEXTJOIN() Combines text with a specified delimiter =TEXTJOIN(", ", TRUE, "John", "Doe") John, Doe

Statistical Functions

Function Purpose Example Result
AVERAGE() Calculates the mean value =AVERAGE(B2:B10) Average of values
MEDIAN() Returns the middle value =MEDIAN(B2:B10) Median value
MODE() Returns the most frequent value =MODE(B2:B10) Most common value
MIN() Returns the smallest value =MIN(B2:B10) Minimum value
MAX() Returns the largest value =MAX(B2:B10) Maximum value
COUNT() Counts cells containing numbers =COUNT(B2:B10) Number of numeric cells
COUNTA() Counts non-empty cells =COUNTA(B2:B10) Number of filled cells
COUNTBLANK() Counts empty cells =COUNTBLANK(B2:B10) Number of blank cells
STDEV.S() Calculates sample standard deviation =STDEV.S(B2:B10) Sample variability
STDEV.P() Calculates population standard deviation =STDEV.P(B2:B10) Population variability
VAR.S() Sample variance =VAR.S(B2:B10) Sample variance
VAR.P() Population variance =VAR.P(B2:B10) Population variance
LARGE() Returns the nth largest value =LARGE(B2:B10,1) Largest value
SMALL() Returns the nth smallest value =SMALL(B2:B10,1) Smallest value
RANK() Returns the rank of a value =RANK(B2,B2:B10) Position in list
PERCENTILE() Returns a percentile value =PERCENTILE(B2:B10,0.75) 75th percentile

Personal Reflection

As a beginner data scientist, learning MS Excel as a foundational tool has changed how I interact with data. While previously I thought it was all about big tools like SQL and Python, I have come to appreciate that Excel is a source of clean datasets and can also be used at scale to generate insights that improve decision-making.

Top comments (0)