
somtochukwu ibuodinma


Short review of a sample sales dataset for retail outlets

Introduction

This dataset was originally written by Maria Carina Roldán and modified by Gus Segura. It collates sales data for companies in countries spanning North America, Europe and Asia, covering the years 2003 to 2005. This review takes a cursory look at the dataset with the aim of gleaning initial insights. It fulfils one of the requirements of the HNG 11 internship program; see https://hng.tech/internship to learn more. The dataset can be found at https://www.kaggle.com/datasets/kyanyoga/sample-sales-data?resource=download

Structure of the dataset

This dataset has 25 columns and 2823 rows. Of the 25 variables, 16 are categorical and 9 are numerical.
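As a rough illustration of how such a count could be reproduced, here is a minimal sketch using only the Python standard library. The rule it applies (a column is numerical when every non-empty value parses as a number) is my own assumption and not necessarily how the figures above were obtained. The tiny in-memory sample stands in for the real CSV, and the helper names are hypothetical.

```python
import csv
import io


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_columns(rows):
    """Label a column 'numerical' if all its non-empty values parse as numbers."""
    kinds = {}
    for name in rows[0].keys():
        values = [r[name] for r in rows if r[name] not in ("", None)]
        numeric = bool(values) and all(is_number(v) for v in values)
        kinds[name] = "numerical" if numeric else "categorical"
    return kinds


# Tiny in-memory sample standing in for the downloaded file (values are illustrative).
sample = io.StringIO(
    "ORDERNUMBER,ORDERDATE,STATE\n"
    "10107,2/24/2003 0:00,NY\n"
    "10121,5/7/2003 0:00,\n"
)
rows = list(csv.DictReader(sample))
kinds = classify_columns(rows)

print(len(rows), "rows,", len(kinds), "columns")
print(kinds)
```

A heuristic like this would treat identifier-like columns such as postal codes as numerical whenever they contain only digits, so the result still needs a manual check. For the real file, replace the sample with `open(path, newline="")`; an explicit `encoding` argument may also be needed, depending on how the CSV was saved.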

Observations

An initial inspection produced the following observations, including some anomalies:

  1. The ORDERDATE variable is not consistently formatted. Cell F53, for example, holds the number 38598 instead of a date.
  2. The dataset spans three years (2003-2005).
  3. Some phone numbers are wrongly formatted. For example, cells O13-O15 do not follow the standard US phone number format.
  4. There are 76 empty cells in the POSTALCODE column and 1486 in the STATE column, and 16 postal codes for Ireland are inaccurate (they contain only the number 2).
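Two of the checks above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the rows are made up, the helper names are hypothetical, and the date conversion assumes that 38598 is a spreadsheet serial number in Excel's 1900 date system, which the dataset itself does not confirm.

```python
from datetime import date, timedelta


def count_blank(rows, column):
    """Count rows whose value in `column` is missing or only whitespace."""
    return sum(1 for r in rows if not (r.get(column) or "").strip())


def excel_serial_to_date(serial):
    """Convert an Excel (1900 date system) serial number to a date.

    Using 1899-12-30 as day zero is the usual workaround for Excel's
    fictitious 29 February 1900; it holds for serials from 61 onwards.
    """
    return date(1899, 12, 30) + timedelta(days=int(serial))


# Made-up rows standing in for the real data.
rows = [
    {"STATE": "NY", "POSTALCODE": "10022"},
    {"STATE": "", "POSTALCODE": "  "},
    {"STATE": None, "POSTALCODE": "2"},
]

print(count_blank(rows, "STATE"))
print(count_blank(rows, "POSTALCODE"))
print(excel_serial_to_date(38598))
```

Under this convention 38598 maps to 2005-09-03. Whether that is the intended order date would still need to be checked against the neighbouring rows.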

Insights and Trends

Based on the preliminary observations, certain insights and trends emerged. Notably, the US placed the most orders, followed by Spain and France. Together, these three countries accounted for more than 50% of the total orders placed.

Figure 1: A chart of Country versus total orders placed.
Similarly, Classic Cars and Vintage Cars were the most ordered product lines, together accounting for 56% of the overall products ordered.

Figure 2: A graph of Product Line and the Quantity Ordered
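The percentage shares quoted for countries and product lines can be computed with a small grouping helper. The sketch below is an assumption about the method, not a record of how the charts were produced: it counts one order per row for countries and sums a quantity column for product lines. The column names COUNTRY, PRODUCTLINE and QUANTITYORDERED are assumed from the Kaggle file, and the rows are invented.

```python
from collections import defaultdict


def share_by_group(rows, group_col, value_col=None):
    """Return each group's percentage share, largest first.

    With value_col=None every row counts as 1; otherwise the numeric
    value in value_col is summed per group.
    """
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_col]] += float(r[value_col]) if value_col else 1.0
    grand = sum(totals.values())
    if not grand:
        return {}
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {name: round(100 * total / grand, 1) for name, total in ranked}


# Invented rows for illustration only.
rows = [
    {"COUNTRY": "USA", "PRODUCTLINE": "Classic Cars", "QUANTITYORDERED": "30"},
    {"COUNTRY": "USA", "PRODUCTLINE": "Vintage Cars", "QUANTITYORDERED": "20"},
    {"COUNTRY": "Spain", "PRODUCTLINE": "Classic Cars", "QUANTITYORDERED": "30"},
    {"COUNTRY": "France", "PRODUCTLINE": "Trains", "QUANTITYORDERED": "20"},
]

by_country = share_by_group(rows, "COUNTRY")
by_line = share_by_group(rows, "PRODUCTLINE", "QUANTITYORDERED")
print(by_country)
print(by_line)
```

If each row in the file is an order line rather than a whole order, counting distinct order numbers would give different shares than counting rows, so the choice should be stated alongside any result.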

In addition, using 2003 as the base year, sales rose by roughly 34% in 2004 and then dropped sharply, by 49%, in 2005.

Figure 3: Sales Trend from 2003-2005
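The base-year comparison is simple arithmetic: each year's change is (year total - base total) / base total x 100. Here is a minimal sketch, assuming the file has YEAR_ID and SALES columns (names taken from the Kaggle file, not from this review) and using invented numbers rather than the real totals.

```python
from collections import defaultdict


def sales_by_year(rows, year_col="YEAR_ID", sales_col="SALES"):
    """Sum the sales column per year."""
    totals = defaultdict(float)
    for r in rows:
        totals[int(r[year_col])] += float(r[sales_col])
    return dict(totals)


def change_vs_base(totals, base_year):
    """Percentage change of each other year's total relative to base_year."""
    base = totals[base_year]
    return {
        year: round(100 * (total - base) / base, 1)
        for year, total in sorted(totals.items())
        if year != base_year
    }


# Invented rows for illustration only.
rows = [
    {"YEAR_ID": "2003", "SALES": "1000"},
    {"YEAR_ID": "2004", "SALES": "900"},
    {"YEAR_ID": "2004", "SALES": "600"},
    {"YEAR_ID": "2005", "SALES": "500"},
]

totals = sales_by_year(rows)
changes = change_vs_base(totals, 2003)
print(changes)
```

Note that both percentages are measured against the base year, so a drop reported for 2005 is relative to 2003 and not to 2004.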

Finally, many companies did not sell at the Manufacturer’s Suggested Retail Price (MSRP); only about 27 orders were sold at the MSRP.
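One way to arrive at a count like this is to compare the unit price with the MSRP row by row. The sketch below assumes columns named PRICEEACH and MSRP (taken from the Kaggle file) and a small tolerance for rounding; both the tolerance and the helper name are my own choices, and the rows are invented.

```python
import math


def count_sold_at_msrp(rows, price_col="PRICEEACH", msrp_col="MSRP", tol=0.005):
    """Count rows whose unit price equals the MSRP within a small tolerance."""
    return sum(
        1
        for r in rows
        if math.isclose(float(r[price_col]), float(r[msrp_col]), abs_tol=tol)
    )


# Invented rows for illustration only.
rows = [
    {"PRICEEACH": "95.70", "MSRP": "95"},
    {"PRICEEACH": "100.00", "MSRP": "100"},
    {"PRICEEACH": "81.35", "MSRP": "95"},
    {"PRICEEACH": "62.0", "MSRP": "62"},
]

matches = count_sold_at_msrp(rows)
print(matches, "of", len(rows), "rows priced at MSRP")
```

An exact comparison (`==`) would also work if both columns are stored with the same precision; the tolerance just guards against small floating-point differences.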

Conclusion

In conclusion, several insights can be gleaned from the selected dataset, but only after a thorough cleaning and standardization process to avoid wrong results. This involves asking the right questions of the data, dealing with blanks, removing duplicates, and handling wrong formats and inaccurate data. Further analytical procedures also need to be applied to provide better insights. Hopefully, more experience will be garnered from the HNG premium network: https://hng.tech/premium
