DEV Community

Cover image for Building Dashboard: A Beginner's Case Study with E‑commerce Data
Yaw Opoku Mensah Baffoe
Yaw Opoku Mensah Baffoe

Posted on

Building Dashboard: A Beginner's Case Study with E‑commerce Data

I wanted to learn and grow as a data analyst by analysing a well known multi tabled dataset, to grow my portfolio and prove to myself and others my skills as a data analyst. I learned a lot along the way, here’s the step by step process I used, the tools involved, and the surprising insight I found.

Getting the data and defining the goal

The O-list dataset from Kaggle is one of the first big datasets I've ever seen or used, but I never went in depth like I would for this. I decided to use it because it had lots of data with multiple tables, it was far beyond the scope of skills that I thought clients would expect from me. I picked overarching questions that I thought would push me to develop as a data analyst.

  • Which product categories should Olist prioritize?
  • When should marketing be scheduled?

Cleaning and Organizing the Data with SQL

I used PostgreSQL to clean and organise the dataset. I made tables and uploaded the correct schema into the server and I used pgAdmin to query my database. Throughout the project I used smaller scale queries and excel to validate my SQL logic.

Calculating key metrics

I joined various tables depending on what question I was looking for. For example, when I wanted to know the top performing products, I joined the order_items, orders and products tables, and used aggregate functions when needed.

SELECT SUM(oi.price) AS total_revenue, n_trns.product_category_name_english,
SUM(oi.freight_value)
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN category_name_translation n_trns ON p.product_category_name = n_trns.product_category_name
GROUP BY n_trns.product_category_name_english
ORDER BY total_revenue DESC
LIMIT 10;

Enter fullscreen mode Exit fullscreen mode

I calculated other key statistics such as the most popular hours of business, and the order funnel, showing where the business would lose customers in the ordering process.

SELECT '1. Placed' as stage, COUNT(order_id) as count FROM orders WHERE order_purchase_timestamp IS NOT NULL
UNION ALL
SELECT '2. Approved' as stage, COUNT(order_id) as count FROM orders WHERE order_approved_at IS NOT NULL
UNION ALL
SELECT '3. Shipped' as stage, COUNT(order_id) as count FROM orders WHERE order_delivered_carrier_date IS NOT NULL
UNION ALL
SELECT '4. Delivered' as stage, COUNT(order_id) as count FROM orders WHERE order_status = 'delivered'
ORDER BY count DESC;

Enter fullscreen mode Exit fullscreen mode

Building the dashboard

The dashboard is organized into two pages: a high-level overview, and a deep dive into product performance.


In the top KPI cards show key metrics, like total revenue, and the week in which the most orders were made. The page also contains key graphs like the order funnel and the hour order count statistic.

The next page showcases the statistics the items are generating for the business. I researched and made an equation for theoretical profit based on the profit that businesses in the same sector are likely to make, making it 25% of the revenue. I also showcase the top performing items and the percentage of revenue each item brings in for the business. I also showcase the overall growth of the business overtime.

The Key Insight & Why It Matters

The first page shows that only roughly 6% of customers return for repeat purchases, this was easily my most surprising find. Why? Because a healthy business will retain roughly 30% of customers, higher end-businesses will want to retain closer to 40%, but despite the massive growth and revenue, the business is failing to bring customers back onto their website or into their shops. This would showcase a massive retention opportunity for the business.

Conclusion and takeaways

This project taught me that data analysis isn't just about writing SQL queries or building charts. It's about asking the right questions and translating numbers into business decisions.

The O-list dataset gave me a sandbox to practice every step of the analyst workflow. Cleaning messy data, joining multiple tables, validating results, designing a dashboard, and most importantly, finding an insight that actually matters.

That insight only 6% of customers return for a second purchase is not just a number. It's a signal. It tells the business that their marketing budget might be focused on acquiring new customers when the real opportunity is keeping the ones they already have.

For me, this project was proof. Proof that I can take a raw dataset, turn it into something useful, and deliver a clear recommendation. That's what I want to do for e‑commerce brands: help them see what's hidden in their data so they can grow smarter.

If you're an e‑commerce owner struggling to understand why customers don't come back, or a fellow data analyst just starting out, I'd love to connect. You can find my dashboard here and my portfolio on GitHub.

Thanks for reading. Let me know what you think.

Olist Sales Analysis

datastudio.google.com

Top comments (0)