Ephantus Macharia

Posted on Mar 27

From confusion to clarity;How Excel-Data analysis has Transformed my skills

#data #excell #productivity #beginners

Introduction

The Dataset That Changed Everything
I will be honest. When I first opened the Jumia Kenya product dataset, I had no idea where to begin. There were 115 rows of product data, but the prices were buried inside text strings like "KSh 1,525", the ratings were written as "4.5 out of 5", the review counts were all negative numbers, and a full 50 per cent of the rows had no rating information at all. It looked less like a dataset and more like a problem waiting to punish me.
That experience, the confusion, the slow process of fixing each issue one by one, and the moment when the data finally came alive is exactly what this article is about. Learning Excel data analysis did not just teach me a set of formulas. It changed the way I think, the way I approach problems, and the way I trust my own conclusions. This is my story of how that happened, told through the real data I cleaned, interpreted, and turned into a working dashboard.

Step One:

Working on Messy Data

What Real-World Data Taught Me First

Real-world data is rarely clean I learned that the hard way!
Data analysis means looking at clean and beautiful tables and extracting insights or so I thought!
Well, I was wrong most of the work happens before even a single chart is drawn!

The 6 Problems I Found in the Jumia Dataset
Prices were stored as text every price had a "KSh" prefix attached to it.
Ratings were written as sentences "4.5 out of 5" instead of just "4.5".
Review counts were negative every review count was entered as a minus value.
One product had a price range "KSh 1,620 – KSh 1,980" instead of a single value.
Percentage Discounts were stored as text "38%" instead of a real number.
58 products had no rating at all half of the products were blank for ratings!

What This Taught Me

Every single problem needed a deliberate solution in Excel.
Not only did I learn how to solve these problems, but I also learned how to solve them permanently!
The order in which I solved these problems also taught me how to diagnose the problem before even touching the data!.

=VALUE(SUBSTITUTE(SUBSTITUTE(A2,"KSh",""),",",""))   // Strip KSh and commas from price text, then convert to a true number
=VALUE(LEFT(A2,3))   // Extract just the numeric rating from "4.5 out of 5."

=ABS(A2)   // Convert negative review counts to positive values
=IF(ISBLANK(A2),"No Rating",IF(A2<3,"Poor",IF(A2<4.5,"Average","Excellent")))   //

What Actually Surprised Me About Data Cleaning

The difficulty was not the surprising part. Each fix is straightforward once you know the function
What surprised me is the amount of change the data underwent after the fixes The column of seemingly random characters, the wall of meaningless text, suddenly looked like: ✅ Sortable ✅ Calculable ✅ Chart-ready

The Real Meaning of Data Cleaning

Data cleaning is not about correcting errors. It is about transforming noise into a signal
The moment the data underwent its transformation is the moment I understood the true meaning of cleaning
No explanation or book can do justice to the experience of going through the transformation yourself

Skill Gained: What I Do Differently Now

I instinctively inspect every new dataset for the following six things before I do anything else
It takes five minutes to inspect
It saves hours of confusion
This is not something I learned by reading about it. This is something I learned by going through the experience once.

Step Two:

Creating Meaning with Formulas Data Enrichment

Clean data tells you what exists. Enriched data tells you what it means
This is where Excel formulas started feeling genuinely powerful
I was no longer just correcting errors; I was creating knowledge

New Columns I Added to the Dataset
Discount Amount (KES)

Formula: Current Price subtracted from Old Price
Reveals the real savings in shillings, not just a percentage
Why it matters:

64% off a KES 199 item = only KES 354 saved
39% off a KES 3,750 drill = KES 2,393 saved
Percentage figures alone were hiding this distinction entirely

Rating Category

Used an IFS formula to classify every product into a clear tier:

Poor — rating below 3
Average — rating between 3 and 4.4
Excellent — rating of 4.5 and above

Turns a raw number into a label anyone can read instantly

Discount Category

Grouped every product into one of three discount tiers:

Low Discount — below 20%
Medium Discount — between 20% and 40%
High Discount — above 40%

Why These Columns Mattered

Rating Category and Discount Category became the foundation of almost every comparison in the final analysis
Without them, grouping and comparing products would have required manual sorting every single time
With them, a single AVERAGEIF or COUNTIF formula answers any group-level question instantly

=IFS(D2<3,"Poor",D2<4.5,"Average",D2>=4.5,"Excellent
=IFS(C2<20,"Low Discount",C2<=40,"Medium Discount",C2>40,"High Discount")

This step taught me something important about data analysis: the raw data rarely tells the whole story. The enriched data does. A number like 3.7 says very little on its own. The label "Average" is placed alongside it, in context with 114 other products.

Step Three:

Charts

After cleaning and enriching the data, I ran a full descriptive analysis using AVERAGE, COUNTIF, AVERAGEIF, and CORREL functions. But the moment the analysis truly came alive was when I built the visualizations. The charts below were produced directly from the cleaned Jumia dataset, and each one taught me something that the tables had kept hidden.

This chart showed me immediately that 65 out of 115 products more than half carry a discount above 40%. At first, I assumed this meant they were the best-performing products.

The rating category chart was the most visually striking finding of the entire analysis. The grey "No Rating" segment representing 50% of all products dominates the chart. This is not just a design choice; it is a data quality alarm. Half the products in this dataset have never been reviewed. Any conclusion I draw about ratings applies only to the other half, and I must clearly state this every time I present findings. Learning to read that caveat into a chart and to communicate it honestly felt like a genuine step forward as an analyst.

The top 10 discount chart delivered a surprise. The highest-discounted products are not expensive electronics or premium appliances. They are small everyday items: a bottle opener, a keychain, crochet needles, and a pillowcase. The product with the single highest discount in the entire dataset (64% off) costs just KES 199. That is a powerful reminder that percentage discounts and absolute value are entirely different things, a lesson I learned from the data, not from a textbook

Average rating and average reviews by discount category

Finding that medium-discount products outperform high-discount ones on both measures
This final chart is the one I am most proud of, because it contradicts the most natural assumption in the entire dataset. I expected high-discount products to have the most reviews and the highest ratings. More discounts should mean more buyers, and more buyers should mean more reviews. The data said the opposite. Medium-discount products (20–40% off) had an average rating of 4.28 and 15.3 reviews. High-discount products rated only 3.61 and averaged 11.1 reviews. The correlation between discount percentage and reviews was just −0.14, essentially zero. Higher discounts do not drive customer engagement. Product quality does.

Products Analysed-115
Avg Current Price-1174
Avg Discount-36.96%
Avg Rating-3.89/5

How This Has Made Me a Better Analyst and a Better Thinker
Working through this project from raw CSV to finished dashboard gave me five concrete skills that I did not have before, and that I now use every time I open a spreadsheet.
1) Skills This Project Built

Diagnose the first scan for broken data before touching anything
Write self-explanatory formulas SUBSTITUTE, VALUE, ABS, IFS, AVERAGEIF
Trust the data, not your expectations correlation was −0.14, not what I hoped
Charts are not decoration every chart reveals what the table could not
Declare data gaps honestly 50% missing ratings must be stated. not hidden

The Spreadsheet That Taught Me to Think

Here I Started vs Where I finished
Started, finished 115 rows of messy data. A fully formatted Excel dashboard. Did not know VLOOKUP. Can clean, enrich, analyse, and visualise. Assumed data was neat. Know how to diagnose and fix real problems. Trusted percentages. Know how to check the numbers behind them

What This Project Actually Taught Me

Not just which function to use, but when and why
Not just how to build charts, but how to read and explain them
Not just Excel skills, but thinking skills

Things That Made It Real

Excel gave me the tools
The Jumia dataset gave me the practice
This course gave me the framework

I will carry these skills into every dataset, every report, and every decision I face from here on.

DEV Community