DEV Community

Mutheu
Mutheu

Posted on • Updated on

AWESOME CHOCOS EDA USING SQL WORKBENCH.

One of the main skills one should have to become a successful data analyst is Structured Query Laanguage or SQL, this is a query language used in analysing large datasets.If you are a beginner,the best place to begin is knowing the basics of SQL, the SELECT, FROM and WHERE clause statements and maybe advance to JOINS, AGGREGATE FUNCTIONS such as SUM,COUNT,AVG,MAX,MIN etc. CTE(Common Table Expressions) the use of WITH.
Enter fullscreen mode Exit fullscreen mode

In this article, I will talk about a dataset of Awesome Chocolates.You can find this dataset here;(https://files.chandoo.org/sql/awesome-chocolates-data.sql)
The dataset contains 4 tables; Geo, People, Products and Sales.
The data was clean;meaning there was no null values ,duplicates and no formatting was needed but I'll soon do an article about data cleaning using SQL.

  1. I queried a show table; to show me all the tables in Awesome Chocolates --

show tables;

```# Tables_in_awesome chocolates
'geo'
'people'
'products'
'sales'
The results



2. I wanted to see the table fields in each table:

For geo table:
`SELECT *
FROM geo;
`



Enter fullscreen mode Exit fullscreen mode

GeoID, Geo, Region

'G6', 'UK', 'Europe'
'G5', 'Australia', 'APAC'
'G4', 'New Zealand', 'APAC'
'G3', 'Canada', 'Americas'
'G2', 'USA', 'Americas'
'G1', 'India', 'APAC'



The results
-- Do the same for the rest of the tables i.e people, products and sales.
PERFOMING EDA( EXPLOTARY DATA ANALYSIS)
a.)The first question I answered was what top 10 products are making the most sales;

`SELECT p.Product, s.Amount,p.Category,s.PID
FROM products AS P
INNER JOIN sales AS s
ON p.pid = s.pid
ORDER BY s.Amount DESC
LIMIT 10;`



Enter fullscreen mode Exit fullscreen mode

Product, Amount, Category, PID

'Almond Choco', '27146', 'Bars', 'P03'
'Spicy Special Slims', '25207', 'Bites', 'P10'
'Peanut Butter Cubes', '24633', 'Bites', 'P22'
'After Nines', '24451', 'Bites', 'P11'
'Organic Choco Syrup', '24367', 'Other', 'P16'
'Milk Bars', '23912', 'Bars', 'P01'
'99% Dark & Pure', '23268', 'Bars', 'P08'
'Smooth Sliky Salty', '23184', 'Bars', 'P20'
'Milk Bars', '22897', 'Bars', 'P01'
'Almond Choco', '22715', 'Bars', 'P03'



b.)Which top 10 regions had the highest sales
`SELECT g.GeO, s.Amount, g.GeoID
FROM Geo AS g
INNER JOIN sales AS s
ON g.GeoID = s.GeoID
ORDER BY s.Amount DESC
LIMIT 10;`



Enter fullscreen mode Exit fullscreen mode

GeO, Amount, GeoID

'New Zealand', '27146', 'G4'
'Australia', '25207', 'G5'
'Australia', '24633', 'G5'
'India', '24451', 'G1'
'USA', '24367', 'G2'
'New Zealand', '23912', 'G4'
'New Zealand', '23268', 'G4'
'New Zealand', '23184', 'G4'
'India', '22897', 'G1'
'Canada', '22715', 'G3'



The Results
c.)Top 10 salesperson by sales
`SELECT pp.Salesperson, s.Amount, pp.SPID
FROM people AS pp
INNER JOIN sales AS s
ON pp.spid = s.spid
ORDER BY s.Amount DESC
LIMIT 10;`



Enter fullscreen mode Exit fullscreen mode

Salesperson, Amount, SPID

'Van Tuxwell', '27146', 'SP23'
'Madelene Upcott', '25207', 'SP18'
'Gunar Cockshoot', '24633', 'SP03'
'Madelene Upcott', '24451', 'SP18'
'Madelene Upcott', '24367', 'SP18'
'Roddy Speechley', '23912', 'SP24'
'Rafaelita Blaksland', '23268', 'SP17'
'Andria Kimpton', '23184', 'SP09'
'Dotty Strutley', '22897', 'SP14'
'Gunar Cockshoot', '22715', 'SP03'



The results

Conclusion: Almond choco and Milk bars should be in plenty because they are the most sold products, they appear twice in the top 10 sold products.
New Zealand and Australia are most leading geographies in terms of sale amount.

The whole sql dataset queries can be found on my GitHub account:
[](https://github.com/mutheu-eng/SQL-PROJECTS)  

You can find more content on Data Analytics by following me on Linkedin.
[](naomi-mutheu39001430)





Enter fullscreen mode Exit fullscreen mode

Top comments (0)