
Marvin Ewarn Okwaro

Exploratory Data Analysis, The Ultimate Guide (With Python)

What is EDA?
Exploratory Data Analysis (EDA) is a data analysis process used to understand a data set in depth, learn its different characteristics (often through data visualization), and find useful patterns in it.

Why perform EDA?
There are several reasons for performing EDA on a data set. These include:

  • Removing irregularities and unnecessary values from the data set, and identifying faulty points and noise in the data early
  • Preparing the data set for analysis
  • Allowing machine learning models to make better predictions from the data set
  • Getting more accurate results from the data set
  • Helping to choose a better machine learning model
  • Filtering out redundancies
  • Helping stakeholders know whether they are asking the right questions

There are three major steps involved in EDA:

  1. Understanding the data. Here we get to understand the variables in the data, and learn basic parameters such as the number of columns and rows.
  2. Cleaning the data. In this step, we remove outliers, irregularities and any non-useful fields that may affect the end model. Outliers are data points that fall outside, or differ significantly from, the main observations.
  3. Analyzing the relationship between variables. We can use tools such as a correlation matrix, which is a table of correlation coefficients between variables, with each cell showing the relationship between two variables.
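The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full workflow: the toy `rows` data set, the column names `age` and `income`, the 1.5 × IQR outlier rule, and the helper names `iqr_bounds` and `pearson` are all assumptions made for the example. It uses only the standard library so it runs without extra installs.

```python
import statistics

# Hypothetical toy data set: each dict is one row, each key is one column.
rows = [
    {"age": 23, "income": 31000},
    {"age": 35, "income": 52000},
    {"age": 41, "income": 61000},
    {"age": 29, "income": 43000},
    {"age": 52, "income": 78000},
    {"age": 38, "income": 900000},  # deliberately extreme value
]

# Step 1: understand the data (number of rows and columns, value ranges).
columns = list(rows[0].keys())
print(f"{len(rows)} rows, {len(columns)} columns: {columns}")
for col in columns:
    values = [row[col] for row in rows]
    print(col, "min:", min(values), "max:", max(values),
          "median:", statistics.median(values))

# Step 2: clean the data by dropping rows whose income lies outside
# the 1.5 * IQR fences (one common rule of thumb, assumed here).
def iqr_bounds(values):
    """Return the (lower, upper) fences based on the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

low, high = iqr_bounds([row["income"] for row in rows])
cleaned = [row for row in rows if low <= row["income"] <= high]
print(f"Kept {len(cleaned)} of {len(rows)} rows after outlier removal")

# Step 3: analyze relationships with a correlation matrix.
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Nested dict: corr_matrix[a][b] is the coefficient between columns a and b.
corr_matrix = {
    a: {b: pearson([r[a] for r in cleaned], [r[b] for r in cleaned])
        for b in columns}
    for a in columns
}
for a in columns:
    print(a, {b: round(corr_matrix[a][b], 3) for b in columns})
```

On real data sets the same steps are usually done with a data-frame library rather than hand-written helpers, but the logic is the same: inspect the shape, filter out extreme rows, then compare columns pairwise.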

Using Python To Perform EDA

Assuming you have Python installed, here are some of the steps and code samples used to perform EDA in the Python programming language.

Libraries
There are several libraries that may be used to perform EDA:
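Before starting, it can help to check which libraries are importable in the current environment. The sketch below does that with the standard library only. The four names in `CANDIDATE_LIBRARIES` are an assumption, chosen because they are commonly used for EDA, and the helper `find_available` is a hypothetical name introduced for this example.

```python
import importlib.util

# Assumed library names, chosen for illustration only.
CANDIDATE_LIBRARIES = ["pandas", "numpy", "matplotlib", "seaborn"]

def find_available(names):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

availability = find_available(CANDIDATE_LIBRARIES)
for name, installed in availability.items():
    status = "installed" if installed else "missing"
    print(f"{name}: {status}")
```

Any library reported as missing can usually be installed with `pip install <name>`.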
