Introduction: The Dataset That Changed Everything
I will be honest. When I first opened the Jumia Kenya product dataset, I had no idea where to begin. There were 115 rows of product data, but the prices were buried inside text strings like "KSh 1,525", the ratings were written as "4.5 out of 5", the review counts were all negative numbers, and a full 50 per cent of the rows had no rating information at all. It looked less like a dataset and more like a problem waiting to punish me.
That experience — the confusion, the slow process of fixing each issue one by one, and the moment when the data finally came alive — is exactly what this article is about. Learning Excel data analysis did not just teach me a set of formulas. It changed the way I think, the way I approach problems, and the way I trust my own conclusions. This is my story of how that happened, told through the real data I cleaned, interpreted, and turned into a working dashboard.
Step One: Working on Messy Data
The first thing data analysis teaches you is that real-world data is rarely clean. Before I started this course, I assumed data analysis meant looking at neat tables and drawing insights. I was wrong. The majority of the work — and the majority of the learning — happens before a single chart is drawn.
The Jumia dataset had six specific problems I had to fix. Prices stored as text with "KSh" prefixes. Ratings written as full English phrases. Review counts entered as negative numbers. One product with a price range instead of a single value. Discount percentages stored as text with the "%" sign attached. And 58 products with no rating data at all. Every one of these problems required a deliberate Excel solution, and solving each one built a skill I now carry permanently.
=VALUE(SUBSTITUTE(SUBSTITUTE(A2,"KSh",""),",","")) // Strip "KSh" and commas from the price text, then convert to a true number
=VALUE(LEFT(A2,FIND(" ",A2)-1)) // Extract the numeric rating before the first space in "4.5 out of 5"; unlike LEFT(A2,3), this also handles a whole-number rating such as "5 out of 5"
=ABS(A2) // Convert negative review counts to positive values
=IF(ISBLANK(A2),"No Rating",IF(A2<3,"Poor",IF(A2<4.5,"Average","Excellent"))) // Classify ratings, giving missing values their own label
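The four formulas above cover the price, rating, and review-count fixes. For the two remaining text problems, a minimal sketch might look like the following. The exact text of the price-range cell is not shown in this article, so the format "KSh 1,200 - KSh 1,500" and the choice to keep the lower bound are assumptions made purely for illustration.
=VALUE(SUBSTITUTE(A2,"%","")) // Remove the "%" sign so "64%" becomes the number 64 (VALUE("64%") on its own would return 0.64)
=VALUE(TRIM(SUBSTITUTE(SUBSTITUTE(LEFT(A2,FIND("-",A2)-1),"KSh",""),",",""))) // Assumed range format: keep the text before the hyphen, strip "KSh" and commas, then convert to a number
Whether to keep the lower bound, the upper bound, or the midpoint of a range is a judgment call, and whichever is chosen belongs in the data cleaning log.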
What surprised me most was not the difficulty of these fixes — each one is straightforward once you know the function. What surprised me was how much the data changed once they were applied. The column that had looked like a wall of meaningless text suddenly became a sortable, calculable, chart-ready list of numbers. That transformation — from noise to signal — is what data cleaning actually means. And experiencing it firsthand is something no textbook explanation can fully replicate.
Skill Gained: I now automatically scan any new dataset for these six issues before doing anything else. It takes five minutes and saves hours of confusion later.
Step Two: Creating Meaning with Formulas
Once the data was clean, the next skill I developed was enrichment — the process of deriving new columns that reveal information the raw data does not directly show. This is where Excel formulas began to feel genuinely powerful to me, because I was no longer just correcting errors. I was creating knowledge.
I added three new columns to the dataset. First, a Discount Amount (KES) column calculated by subtracting the current price from the original price. It revealed that a 64% discount on an item now selling for KES 199 saves only KES 354, while a 39% discount on a drill now selling for KES 3,750 saves KES 2,393. Percentage figures alone had been hiding this distinction entirely.
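As a rough sketch of that column, assuming the original price sits in column A and the current price in column B (hypothetical column letters, not taken from the workbook):
=A2-B2 // Discount Amount (KES): original price minus current price
=ROUND((A2-B2)/A2*100,0) // Cross-check: the discount percentage implied by the two prices
For the KES 199 item, a saving of KES 354 implies an original price of about KES 553, and 354 ÷ 553 ≈ 0.64, which agrees with the 64% discount quoted above.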
Second, a Rating Category column using an IFS formula to classify products as Poor (below 3), Average (3 to 4.4), or Excellent (4.5 and above). Third, a Discount Category column grouping products into Low, Medium, and High discount tiers. These two columns became the foundation of almost every comparison in my final analysis.
=IFS(D2<3,"Poor",D2<4.5,"Average",D2>=4.5,"Excellent")
=IFS(C2<20,"Low Discount",C2<=40,"Medium Discount",C2>40,"High Discount")
This step taught me something important about data analysis: the raw data rarely tells the whole story. The enriched data does. A number like 3.7 says very little on its own. The label "Average" placed alongside it, in context with 114 other products, says far more.
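One caveat is worth sketching here, as an assumption about how the sheet is laid out rather than a description of it. If the cleaned rating column (D in the formula above) still has empty cells for the unrated products, Excel treats an empty cell as 0 in a comparison, so D2<3 is TRUE and a plain IFS would label that product "Poor". A guarded variant checks for the empty case first:
=IF(D2="","No Rating",IFS(D2<3,"Poor",D2<4.5,"Average",TRUE,"Excellent")) // Empty ratings get their own label; TRUE acts as the catch-all final condition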
Step Three: Charts
After cleaning and enriching the data, I ran a full descriptive analysis using AVERAGE, COUNTIF, AVERAGEIF, and CORREL functions. But the moment the analysis truly came alive was when I built the visualizations. The charts below were produced directly from the cleaned Jumia dataset, and each one taught me something that the tables had kept hidden.
The discount category chart showed me immediately that 65 out of 115 products, more than half, carry a discount above 40%. At first, I assumed this meant they were the best-performing products, an assumption the final chart would overturn.
The rating category chart was the most visually striking finding of the entire analysis. The grey "No Rating" segment — representing 50% of all products — dominates the chart. This is not just a design choice; it is a data quality alarm. Half the products in this dataset have never been reviewed. Any conclusion I draw about ratings applies only to the other half, and I must clearly state this every time I present findings. Learning to read that caveat into a chart — and to communicate it honestly — felt like a genuine step forward as an analyst.
The top 10 discount chart delivered a surprise. The highest-discounted products are not expensive electronics or premium appliances. They are small everyday items — a bottle opener, a keychain, crochet needles, a pillow case. The product with the single highest discount in the entire dataset (64% off) costs just KES 199. That is a powerful reminder that percentage discounts and absolute value are entirely different things — a lesson I learned from the data, not from a textbook.
Figure 4: Average rating and average reviews by discount category. Medium-discount products outperform high-discount ones on both measures.
This final chart is the one I am most proud of, because it contradicts the most natural assumption in the entire dataset. I expected high-discount products to have the most reviews and the highest ratings: more discounts should mean more buyers, and more buyers should mean more reviews. The data said the opposite. Medium-discount products (20–40% off) had an average rating of 4.28 and averaged 15.3 reviews. High-discount products had an average rating of only 3.61 and averaged 11.1 reviews. The correlation between discount percentage and review count was just −0.14, a very weak negative relationship. In this dataset, higher discounts do not drive customer engagement. If anything, the better-rated products are the ones that attract more reviews.
Products Analysed: 115
Avg Current Price: KES 1,174
Avg Discount: 36.96%
Avg Rating: 3.89/5
Correlation (Discount vs Reviews): −0.14
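Summary figures like these can be reproduced with a handful of formulas. The sketch below assumes 115 data rows in rows 2 to 116, with current price in column B, discount % in column C, rating in column D, review count in column E, and Discount Category in column F. Columns C and D follow the formulas shown earlier; the other column letters are illustrative assumptions, not the workbook's actual layout.
=AVERAGE(B2:B116) // Average current price
=AVERAGE(D2:D116) // Average rating; AVERAGE skips empty cells, so unrated products are left out
=COUNTIF(F2:F116,"High Discount") // Number of products in the high-discount tier
=AVERAGEIF(F2:F116,"Medium Discount",D2:D116) // Average rating within the medium-discount tier
=CORREL(C2:C116,E2:E116) // Correlation between discount % and review count
Swapping D2:D116 for E2:E116 in the AVERAGEIF line gives the average review count per tier instead.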
How This Has Made Me a Better Analyst — and a Better Thinker
Working through this project from raw CSV to finished dashboard gave me five concrete skills that I did not have before, and that I now use every time I open a spreadsheet.
1) I know how to diagnose a dataset before touching it. I look for text-formatted numbers, missing values, inconsistent categories, and impossible values. This takes a few minutes and prevents hours of errors.
2) I can build formulas that are self-explanatory. Using SUBSTITUTE, VALUE, ABS, IFS, and AVERAGEIF in combination means my spreadsheets document their own logic — anyone can click on a cell and understand what it is doing.
3) I have learned to separate what the data says from what I expected it to say. The correlation result of −0.14 was not what I hoped to find. But it was the truth, and presenting it honestly is more valuable than confirming a comfortable assumption.
4) I now understand that a chart is not decoration. Every chart in this article was chosen because it revealed something the table could not — the dominance of high-discount products, the alarming data gap in ratings, the counterintuitive performance of medium-discount items.
5) I have learned to communicate data quality honestly. The 50% missing rating data is not something to hide. It is something to declare prominently — because a reader who does not know about it cannot properly evaluate the conclusions.
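The diagnostic checks in point 1 can be sketched as a few counting formulas run on the raw sheet before any cleaning. The ranges below are placeholders for whichever columns hold prices, ratings, and review counts, not the actual layout of my workbook.
=SUMPRODUCT(--ISTEXT(B2:B116)) // How many values in the price column are stored as text rather than numbers
=COUNTBLANK(D2:D116) // How many ratings are missing
=COUNTIF(E2:E116,"<0") // How many review counts are impossible negative values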
“Excel did not just teach me to use formulas. It taught me to be honest about what the data can and cannot tell me.”
Conclusion: The Spreadsheet That Taught Me to Think
The Jumia dataset I cleaned, enriched, and analysed for this assignment started as 115 rows of messy, inconsistent, half-missing data. It ended as a fully formatted Excel dashboard with four original charts, a complete data cleaning log, descriptive analysis across five sheets, and a set of findings that challenged my assumptions at every turn.
That journey, from confusion to clarity, is what data analysis education is really about: not memorising which function does what, but developing the habit of questioning your data, verifying your assumptions, and following the evidence wherever it leads. Excel gave me the tools. The Jumia dataset gave me the practice. And this course gave me the framework to put it all together.
I started this programme not knowing what VLOOKUP was. I am finishing it knowing how to clean a real-world dataset, derive meaningful categories from raw numbers, identify counterintuitive patterns using correlation, build charts that tell honest visual stories, and present analysis with the appropriate caveats about data quality. Those are not just Excel skills. They are thinking skills. And I will carry them into every dataset, every report, and every decision I face from here on.


