Kahuthu Muriuki
How Excel is Used in Real-World Data Analysis

Introduction

Excel is not always the flashiest choice. There are more specialised tools — Power BI for dashboards, Python for large-scale processing, SQL for database queries. I use several of them. But Excel remains the place where raw data first lands, where quick checks happen, and where non-technical stakeholders can engage with findings without needing a login or a training session.


What Is Excel, and Why Does It Still Matter?

Microsoft Excel is a spreadsheet application that organises data in rows and columns, supports calculation through formulas and functions, and produces visual output through charts and pivot tables. What keeps it relevant despite faster, more capable alternatives is its low barrier to entry and near-universal presence in organisations. Every finance department, every compliance team, and every operations desk I have worked in has had Excel open.

For anyone entering data-related work, Excel builds the right instincts — understanding data types, spotting inconsistencies, and thinking about how records relate to each other. The skills transfer directly to more advanced tools later on.


Where Excel Shows Up in Real Work

Financial Reporting and Reconciliation

In financial services, Excel is where P&L summaries are assembled, cash flow projections are modelled, and month-end variance analyses are performed. Functions like SUMIF, VLOOKUP, and nested IF statements do most of the heavy lifting.
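As a sketch only: assuming a working sheet where column A holds account codes, column B cost centres, and column C actual amounts, with budgeted figures on a separate Budget sheet, a month-end variance check leans on exactly these functions:

=SUMIF(B:B,"Operations",C:C)        - total actuals posted against one cost centre
=VLOOKUP(A2,Budget!A:C,3,FALSE)     - pull the budgeted figure for the account in A2
=IF(C2>VLOOKUP(A2,Budget!A:C,3,FALSE),"Over budget","Within budget")

The column layout and the Budget sheet name are illustrative, not taken from any particular workbook.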

Compliance Tracking and KYC Management

A well-built Excel tracker for compliance work typically uses data validation to restrict entries to approved values, conditional formatting to flag overdue reviews in red, and DATEDIF formulas to calculate how many days remain before a document expires. These are not complex features, but they prevent errors that cost time and create regulatory exposure.
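For illustration, assuming expiry dates sit in column E, the days-remaining column in such a tracker can be built like this. Note that DATEDIF returns a #NUM! error when the start date falls after the end date, hence the wrapped version:

=DATEDIF(TODAY(),E2,"d")                            - days until the document in E2 expires
=IF(E2<TODAY(),"Expired",DATEDIF(TODAY(),E2,"d"))   - same, but readable once the date has passed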


Data Cleaning: The Part Nobody Talks About Enough

In practice, most data arriving in Excel is not clean. Client names have inconsistent spacing, dates are stored as text, and duplicate entries appear across merged files from different systems. Cleaning this data before any analysis is not optional — it directly determines whether the conclusions drawn are accurate.

Features Regularly Used

TRIM and CLEAN

TRIM removes extra spaces that accumulate when data is exported from systems like Salesforce or Zendesk. CLEAN removes non-printable characters that sometimes appear in data pulled from legacy platforms. Both are quiet functions that prevent downstream errors.

=TRIM(A2)      — removes leading, trailing and repeated internal spaces
=CLEAN(A2)     — removes non-printable characters from text

Find and Replace — Ctrl+H

Particularly useful when a field has inconsistent values — for example, a country column where the same country appears as KE, Kenya, and kenya. A few Find and Replace passes standardise the field before any counting or filtering happens.
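When the same corrections recur every month, a formula version is repeatable where manual passes are not. A minimal sketch for the country example above, with the raw values assumed to be in column A:

=IF(UPPER(A2)="KE","Kenya",PROPER(A2))    - maps KE, Kenya and kenya to a single spelling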

Text to Columns

When a field contains combined data — a full name in one cell, a date and a reference number separated by a hyphen — Text to Columns splits it into usable parts. I have used this frequently when pulling client data from onboarding systems that concatenate fields.
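In Excel 365, TEXTSPLIT does the same job as a formula, which keeps the split live when the source changes. A sketch, assuming a hypothetical combined reference like INV-2024-0417 in A2:

=TEXTSPLIT(A2,"-")    - spills INV, 2024 and 0417 into adjacent cells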

Remove Duplicates

In compliance work, duplicate partner records are a real risk. The Remove Duplicates function, combined with conditional formatting to highlight matches first, is a reliable way to check data quality before running analysis.
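A COUNTIF flag makes the duplicates visible before anything is removed; assuming partner IDs occupy A2:A1000, a helper column (or a conditional formatting rule) can use:

=COUNTIF($A$2:$A$1000,A2)>1    - TRUE for any ID that appears more than once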

Data Validation

The best way to reduce cleaning work is to prevent bad data from entering in the first place. Data Validation restricts cells to approved values, specific number ranges, or particular date formats. When I build input templates for teams, data validation is always included.
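Beyond dropdown lists, Data Validation also accepts custom formula rules. As one illustrative rule, applied to a review-date column E, this rejects dates that are implausibly old or in the future:

=AND(E2>=DATE(2020,1,1),E2<=TODAY())    - accepts only dates between 2020 and today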


Transforming Data into Something Usable

Once data is clean, it often needs reshaping before it can be analysed. Transformation in Excel covers formatting for readability, converting data types, and restructuring how records are organised.

Formatting for Readability

Column widths that cut off values, missing headers, inconsistent number formats — these are not cosmetic issues. They slow down work and cause misreadings. Bolding header rows, applying consistent number formatting, and using freeze panes to keep headers visible while scrolling are small habits that save time across large datasets.

Data Type Conversion

Numbers stored as text are a persistent problem in exported data. They look correct, but will not respond to SUM or AVERAGE. The VALUE function converts them. Similarly, dates stored as text need conversion before date functions will work correctly.

=VALUE(A2)       — converts text that looks like a number into an actual number
=DATEVALUE(A2)   — converts text that looks like a date into a date serial number

Text Standardisation

In datasets containing client names or country entries, inconsistent capitalisation creates grouping errors. PROPER, UPPER, and LOWER standardise text fields so that pivot tables and COUNTIF formulas group records correctly.

=PROPER(A2)    — capitalises the first letter of each word
=UPPER(A2)     — converts all text to capitals
=LOWER(A2)     — converts all text to lower case

Analysing Data: From Raw Numbers to Decisions

The analytical stage is where the work becomes visible to others. In financial and compliance roles, this means producing numbers that someone will act on — a funding decision, a risk escalation, a process change. The formulas used here need to be correct, and the logic behind them needs to be defensible.

Core Statistical Functions

The foundational functions cover the majority of day-to-day analytical needs in operations and finance work:

  • =AVERAGE(range) — mean value across a set, used for KPI benchmarking
  • =MEDIAN(range) — middle value, more reliable than average when outliers are present
  • =COUNT(range) — counts numeric entries; COUNTA counts all non-empty cells
  • =SUM(range) — total of a range; SUMIF adds a condition
  • =MAX(range) and =MIN(range) — identify the ceiling and floor of a dataset
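Applied to a hypothetical column of transaction values in C2:C500, with a status column in B, the same functions read:

=AVERAGE(C2:C500)                     - mean transaction value
=MEDIAN(C2:C500)                      - middle value, robust to outliers
=SUMIF(B2:B500,"Escalated",C2:C500)   - total value of escalated cases only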

Lookup Functions

VLOOKUP and its more capable successor XLOOKUP are essential when reconciling data across multiple sources — matching partner IDs against a reference table, pulling account names from a separate register, or checking whether a client appears on a restricted list.

=XLOOKUP(lookup_value, lookup_array, return_array)

XLOOKUP is the version worth learning now. It handles left-hand lookups, returns custom values when no match is found, and is less sensitive to column order changes than VLOOKUP.
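A fuller call uses the optional fourth argument to replace the #N/A error with a readable result; the sheet name and columns here are assumptions:

=XLOOKUP(A2,Register!A:A,Register!B:B,"Not on register")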

Conditional Logic

IF statements allow analysis to respond to conditions in the data. In compliance work, this might mean classifying clients into risk tiers based on transaction volume, or flagging records where a required document field is empty.

=IF(D2>1000000,"High Risk",IF(D2>100000,"Medium Risk","Low Risk"))

IFS simplifies the logic when there are multiple conditions to evaluate, avoiding deeply nested IF functions that become difficult to read and maintain.
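The nested example above, rewritten with IFS and the same thresholds (the final TRUE acts as the catch-all default):

=IFS(D2>1000000,"High Risk",D2>100000,"Medium Risk",TRUE,"Low Risk")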

Pivot Tables

Pivot tables are the fastest route from a large dataset to a summary. In a compliance tracker with hundreds of partner records, a pivot table can show the count of partners by risk tier, by country, and by document status in seconds — without touching the original data.

The key discipline is keeping the source data clean and consistently structured. A pivot table is only as reliable as the data feeding it. When I build reporting templates, the source data tab and the pivot summary tab are always separate, and the source data has validated, consistent entries.

Pivot tables are also the starting point for dashboards — once the summary logic is working correctly in a pivot, the chart built from it will update automatically when new data is added.


Formula Quick Reference

| Category    | Formula              | What It Does                          | Where I Use It                        |
| ----------- | -------------------- | ------------------------------------- | ------------------------------------- |
| Statistical | =AVERAGE(range)      | Calculates mean values                | Client KPI dashboards                 |
| Statistical | =COUNT / COUNTA      | Counts entries in a range             | Onboarding completion tracking        |
| Lookup      | =VLOOKUP / XLOOKUP   | Retrieves data across tables          | Matching partner records in KYC files |
| Logical     | =IF / IFS            | Returns results based on a condition  | Risk-tier classification              |
| Text        | =TRIM / PROPER       | Cleans and normalises text            | Standardising client name fields      |
| Date        | =DATEDIF / TODAY()   | Calculates date intervals             | Monitoring SLA and review deadlines   |
| Aggregation | =SUMIF / COUNTIF     | Conditional sum or count              | Flagging overdue compliance cases     |

Data Visualisation: Making Findings Accessible

Not everyone who needs to understand the data wants to scroll through a spreadsheet. Charts and dashboards translate findings into a form that supports faster decisions in meetings, presentations, and reports.

Chart Types and When to Use Them

Bar and Column Charts — suited for comparing values across categories, such as monthly onboarding volumes across different markets or contrasting resolution times across client segments.

Line Charts — best for showing change over time. In operational reporting, line charts work well for tracking weekly ticket volumes, daily transaction counts, or month-on-month revenue trends.

Pie Charts — useful for showing proportional composition, such as the share of total cases by risk category. They become harder to read with more than five or six segments.

Scatter Plots — helpful when exploring relationships between two variables, for example, whether higher transaction values correlate with longer onboarding times.

Building a Dashboard

A functional Excel dashboard connects pivot tables and charts into a single view, using slicers and dropdown controls to filter without modifying the underlying data. Keep the layout consistent, limit the number of metrics on a single page, and make the filters obvious to someone who did not build it.


Closing Thoughts

Excel rewards the person who takes the time to understand it properly. The gap between someone who uses Excel to store numbers and someone who uses it to drive decisions is not about knowing more functions — it is about understanding the data, asking the right questions, and building outputs that other people can trust and act on.

The tool itself is not the skill; the skill is knowing how to move from raw data to a conclusion that holds up under scrutiny. That is a habit worth building early, and Excel is still one of the better places to build it.
