DEV Community

Isaac


EXPLORATORY DATA ANALYSIS

Just as the name suggests, exploratory data analysis (EDA) is a method of data analysis that highlights the key insights in a dataset and uses data visualization tools to display them.

Most datasets you come across will need some preparation before you can use them for the analysis itself. This mainly includes removing rows or columns with missing values, removing duplicate values, and applying any other data cleaning practices the dataset needs. First we will explore the pre-EDA activities and then the post-EDA activities.

PRE-EDA ACTIVITIES

1. Data Cleaning
Raw data is just that, 'raw', and needs to be 'cooked' before it's actually ready for consumption :). It is often said that data analysis is about 80% data cleaning, which makes this the most crucial step. It needs a lot of attention to make sure the data being used is free from anomalies. As stated earlier, data cleaning mainly involves removing or correcting inaccurate records. The major data cleaning practices include:

  • Parsing -> Converting data to the format required by the application in use

  • Duplicate Elimination -> Removal of repeated entries for the same key in the dataset

  • Removal or Updating of missing values -> Some entries might be blank and may need to be removed or filled in systematically, so that the data stays as true a representation of the collected dataset as possible. If a row or column has too many missing values you may consider removing it, but if the number is small you can fill the gaps in, e.g. with the median or the mode

  • Data Transformation -> This involves reformatting how the data is represented into a form that can be used by the analytics tool.

Data cleaning is mainly done with programming languages such as Python (using NumPy and pandas), or with tools such as MS Excel, Tableau and many more.
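The four practices listed above can be sketched in pandas roughly as follows. This is only a minimal illustration under stated assumptions: the dataset, its column names (`customer_id`, `signup_date`, `age`, `city`) and the `clean` helper are all made up for the example, and the choice of median for numbers and mode for categories is just one of the options mentioned above.

```python
import io

import pandas as pd

# Hypothetical raw data: column names and values are invented for illustration.
RAW_CSV = """customer_id,signup_date,age,city
1,2023-01-05,34,Nairobi
2,2023-02-11,,Mombasa
2,2023-02-11,,Mombasa
3,2023-03-20,29,
4,2023-04-02,41,nairobi
"""


def clean(raw_csv: str) -> pd.DataFrame:
    # Parsing: read the text and convert signup_date into a datetime column.
    df = pd.read_csv(io.StringIO(raw_csv), parse_dates=["signup_date"])

    # Duplicate elimination: keep the first row for each customer_id.
    df = df.drop_duplicates(subset="customer_id", keep="first")

    # Data transformation (text): normalise casing so "nairobi" and "Nairobi"
    # count as the same city before we compute the mode.
    df["city"] = df["city"].str.title()

    # Missing values: fill numeric gaps with the median, categorical gaps with the mode.
    df["age"] = df["age"].fillna(df["age"].median())
    df["city"] = df["city"].fillna(df["city"].mode().iloc[0])

    # Data transformation (types): age has no gaps left, so store it as an integer.
    df["age"] = df["age"].astype(int)

    return df.reset_index(drop=True)


cleaned = clean(RAW_CSV)
print(cleaned)
```

If a column had too many gaps to fill sensibly, `df.dropna()` (for rows) or `df.drop(columns=[...])` (for a whole column) would be the removal route instead of `fillna`.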

POST-EDA ACTIVITIES

Once the data is clean, the EDA itself can be graphical or non-graphical. Non-graphical EDA may consist of a table summarising, say, measures of central tendency such as the mean, mode and median, or the quartiles of the data. Graphical EDA can contain pie charts, histograms, line graphs, funnels, scatter plots or any other graphical representation of the data.

NON-GRAPHICAL EDA
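As a rough sketch of a non-graphical summary, the snippet below computes the mean, median, mode and quartiles of a single column with pandas. The `scores` values are hypothetical and exist only to make the example runnable.

```python
import pandas as pd

# Hypothetical sample: exam scores invented for illustration.
scores = pd.Series([52, 67, 67, 70, 74, 81, 88, 95], name="score")

# Measures of central tendency plus the first and third quartiles.
summary = pd.Series(
    {
        "mean": scores.mean(),
        "median": scores.median(),
        "mode": scores.mode().iloc[0],  # mode() can return several values; take the first
        "q1": scores.quantile(0.25),
        "q3": scores.quantile(0.75),
    },
    name="score",
)
print(summary)
```

For a quick overview of every numeric column in a DataFrame at once, `df.describe()` returns a similar table (count, mean, standard deviation, min, quartiles and max).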

Graphical EDA
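A minimal sketch of graphical EDA, assuming matplotlib is installed: a histogram to show how one variable is distributed and a scatter plot to show how two variables relate. The data, the variable names and the output file name `eda_plots.png` are all hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical data invented for illustration.
hours_studied = [1, 2, 2, 3, 4, 5, 6, 7]
scores = [52, 67, 67, 70, 74, 81, 88, 95]

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(9, 4))

# Histogram: how the scores are distributed across 5 equal-width bins.
counts, bin_edges, _ = ax_hist.hist(scores, bins=5)
ax_hist.set_title("Distribution of scores")
ax_hist.set_xlabel("Score")
ax_hist.set_ylabel("Count")

# Scatter plot: relationship between hours studied and score.
ax_scatter.scatter(hours_studied, scores)
ax_scatter.set_title("Score vs. hours studied")
ax_scatter.set_xlabel("Hours studied")
ax_scatter.set_ylabel("Score")

fig.tight_layout()
fig.savefig("eda_plots.png")  # write both charts to a single image file
```

In a notebook or an interactive session you would typically drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.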
