DEV Community

Kipkosgei


Exploratory Data Analysis using Data Visualization Techniques.

Introduction

Think for a moment about the journey of data: you gather it, and eventually you share a final report with the domain experts or the company so that they can make informed decisions.

Data collection has its dos and don'ts. To make sense of the collected data and derive insights from it, we use an approach known as exploratory data analysis (EDA).

EDA helps you learn more about a data set and discover patterns, trends, and relationships between its variables. It also helps you find missing values, outliers, and anomalies, so that you end up with high-quality data.

We use statistical functions and tools to do all of this.
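Here is a minimal pandas sketch of two of these checks. It counts the missing values in each column and flags outliers with the common 1.5 * IQR rule (Tukey's fences). The small data set is made up for illustration. `iqr_outliers` is a hypothetical helper written for this example and is not part of pandas.

```python
import numpy as np
import pandas as pd

# Small illustrative data set (made-up values, not from a real source).
df = pd.DataFrame({
    "age": [23, 35, np.nan, 41, 29, 120],             # 120 looks suspicious
    "income": [42_000, 58_000, 61_000, np.nan, 39_000, 50_000],
    "city": ["Nairobi", "Eldoret", None, "Kisumu", "Nairobi", "Eldoret"],
})

# 1. Missing values: number of NaN/None entries per column.
missing = df.isna().sum()
print(missing)

# 2. Outliers: hypothetical helper applying the 1.5 * IQR rule.
def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)  # NaNs are skipped
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)  # NaN compares as False

age_outliers = df.loc[iqr_outliers(df["age"]), "age"]
print(age_outliers)
```

The IQR rule is only one convention. A flagged value is a prompt to investigate, and it is not automatically an error.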

What is EDA

According to Wikipedia, exploratory data analysis (EDA) is "an approach of analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods."

With EDA, you can examine data, find patterns, anomalies, and outliers, and fill in missing values. The result is data that is ready for drawing insights and for training machine learning models (both supervised and unsupervised).
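Filling in missing values (imputation) can be as simple as the sketch below. It assumes a pandas DataFrame with one numeric column and one categorical column, populated with made-up values. Numeric gaps are filled with the median and categorical gaps with the most frequent value. This is only one common strategy, and the right choice depends on your data.

```python
import numpy as np
import pandas as pd

# Illustrative data with gaps (made-up values).
df = pd.DataFrame({
    "age": [23, 35, np.nan, 41, 29],
    "city": ["Nairobi", "Eldoret", None, "Nairobi", None],
})

# Numeric column: fill with the median, which is robust to outliers.
df["age"] = df["age"].fillna(df["age"].median())

# Categorical column: fill with the most frequent value (the mode).
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])

print(df)
```

Assigning the result back to the column, rather than calling `fillna(..., inplace=True)` on a column selection, avoids pandas' chained-assignment pitfalls.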

Steps for performing EDA

  1. Understand the domain
  2. Import the right libraries and read the file
  3. Clean and prepare the data
  4. Engineer features
  5. Visualize and interpret the data
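The following minimal pandas sketch shows roughly how steps 2 to 5 might look. It assumes a small CSV with hypothetical columns (`order_id`, `price`, `quantity`, `region`). An in-memory `io.StringIO` stands in for a real file so that the example runs on its own. Plotting is left out, and a grouped summary stands in for the interpretation step.

```python
import io

import pandas as pd

# Step 2: import libraries and read the file.
# A real project would call pd.read_csv("your_file.csv") (hypothetical path);
# here an in-memory CSV with made-up rows stands in for it.
raw_csv = io.StringIO(
    "order_id,price,quantity,region\n"
    "1,10.0,3,North\n"
    "2,25.5,1,South\n"
    "2,25.5,1,South\n"   # exact duplicate row
    "3,,2,North\n"       # missing price
    "4,8.0,5,south\n"    # inconsistent casing
)
df = pd.read_csv(raw_csv)

# Step 3: clean and prepare the data.
df = df.drop_duplicates()                # remove exact duplicate rows
df["region"] = df["region"].str.title()  # normalise text labels ("south" -> "South")
df = df.dropna(subset=["price"])         # drop rows missing a key value

# Step 4: feature engineering, i.e. derive a new column from existing ones.
df["revenue"] = df["price"] * df["quantity"]

# Step 5: summarise the result so it can be interpreted (and later plotted).
summary = df.groupby("region")["revenue"].sum()
print(summary)
```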

Tools Used For EDA

There are several tools for doing EDA, including Tableau, Power BI, and Python libraries, among many others.

Python Libraries

  - NumPy
  - pandas
  - Matplotlib
  - Seaborn

Others

   Tableau
   Power BI

Statistical Functions and Techniques for EDA

 - Clustering and dimension reduction
 - Univariate visualization
 - Bivariate visualization
 - Multivariate visualization
 - K-means clustering
 - Predictive models

To better understand how these techniques work, read through IBM's explanation of EDA.
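K-means clustering is one technique from this list. Here is a minimal sketch of it written with NumPy only, using Lloyd's algorithm: each point is assigned to its nearest centroid, then each centroid moves to the mean of its points, and the two steps repeat. The `kmeans` function and the synthetic two-blob data are illustrative assumptions. In practice you would more likely use a library implementation such as scikit-learn's `KMeans`.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain k-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs of synthetic 2-D points.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
])
centroids, labels = kmeans(X, k=2)
print(np.round(centroids, 2))
```

K-means needs the number of clusters `k` up front, and its result depends on the initial centroids. That is why the sketch fixes a random seed.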

Data Visualization in EDA

Graphical techniques:

  Box plot, Histogram, Multi-vari chart,
  Run chart, Pareto chart,
  Scatter plot (2D/3D), Heat map,
  Bar chart
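Several of these charts can be drawn with Matplotlib alone. The sketch below builds a synthetic DataFrame with made-up columns (`height`, `weight`, `age`, `category`). It then draws a histogram, a box plot, a scatter plot, a correlation heat map, a bar chart and a Pareto chart on one figure. Seaborn offers higher-level equivalents, such as `sns.histplot`, `sns.boxplot` and `sns.heatmap`, if you prefer them.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data set with made-up columns (not real measurements).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "height": rng.normal(170, 10, n),
    "age": rng.integers(18, 70, n),
    "category": rng.choice(["A", "B", "C"], size=n, p=[0.5, 0.3, 0.2]),
})
df["weight"] = 0.9 * df["height"] - 90 + rng.normal(0, 5, n)

fig, axes = plt.subplots(2, 3, figsize=(15, 8))

# Univariate: histogram (shape of a distribution) and box plot (spread, outliers).
axes[0, 0].hist(df["height"], bins=20)
axes[0, 0].set_title("Histogram of height")
axes[0, 1].boxplot(df["weight"].to_numpy())
axes[0, 1].set_title("Box plot of weight")

# Bivariate: scatter plot of two numeric variables.
axes[0, 2].scatter(df["height"], df["weight"], s=10)
axes[0, 2].set(title="Height vs weight", xlabel="height", ylabel="weight")

# Multivariate: heat map of the correlation matrix of the numeric columns.
corr = df[["height", "weight", "age"]].corr()
im = axes[1, 0].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 0].set_xticks(range(len(corr.columns)))
axes[1, 0].set_xticklabels(corr.columns)
axes[1, 0].set_yticks(range(len(corr.columns)))
axes[1, 0].set_yticklabels(corr.columns)
axes[1, 0].set_title("Correlation heat map")
fig.colorbar(im, ax=axes[1, 0])

# Bar chart: number of rows per category.
counts = df["category"].value_counts().sort_index()
axes[1, 1].bar(list(counts.index), counts.to_numpy())
axes[1, 1].set_title("Count per category")

# Pareto chart: bars in descending order plus a cumulative-percentage line.
pareto = counts.sort_values(ascending=False)
cum_pct = pareto.cumsum() / pareto.sum() * 100
x = np.arange(len(pareto))
axes[1, 2].bar(x, pareto.to_numpy())
axes[1, 2].set_xticks(x)
axes[1, 2].set_xticklabels(pareto.index)
twin = axes[1, 2].twinx()  # second y-axis for the cumulative line
twin.plot(x, cum_pct.to_numpy(), color="red", marker="o")
twin.set_ylim(0, 105)
twin.set_ylabel("cumulative %")
axes[1, 2].set_title("Pareto chart of category")

fig.tight_layout()
fig.savefig("eda_plots.png")
```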

Dimensionality reduction:

Multidimensional scaling
Principal component analysis (PCA)
Multilinear PCA
Nonlinear dimensionality reduction (NLDR)
Iconography of correlations
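Principal component analysis, for example, can be sketched in a few lines of NumPy. You take the singular value decomposition (SVD) of the mean-centred data, and the rows of V^T are then the principal directions. The `pca` helper and the synthetic data below are illustrative assumptions and do not correspond to any library's API.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """PCA via SVD of the centred data. Returns (scores, components, explained_ratio)."""
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centred @ components.T            # project onto the top components
    explained_var = S**2 / (len(X) - 1)          # variance along each direction
    explained_ratio = explained_var[:n_components] / explained_var.sum()
    return scores, components, explained_ratio

# Synthetic 3-D data that varies almost entirely along one direction.
rng = np.random.default_rng(1)
t = rng.normal(size=(300, 1))
X = np.hstack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(300, 3))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)        # (300, 2)
print(np.round(ratio, 3))  # the first component should dominate
```

PCA is sensitive to scale. If your columns use different units, standardise them before running it.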

Typical quantitative techniques:

Median polish
Trimean
Ordination
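Two of these techniques are easy to sketch in NumPy. The trimean is a robust measure of centre, TM = (Q1 + 2 * median + Q3) / 4. Tukey's median polish fits a two-way table as overall + row effect + column effect + residual by repeatedly sweeping out row and column medians. The helper names (`trimean`, `median_polish`) and the 3 x 3 table are hypothetical examples with made-up values.

```python
import numpy as np

def trimean(x) -> float:
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return (q1 + 2 * q2 + q3) / 4

def median_polish(table, max_iter: int = 10, tol: float = 1e-6):
    """Tukey's median polish for a two-way table.

    Returns (overall, row_effects, col_effects, residuals) such that
    table[i, j] == overall + row_effects[i] + col_effects[j] + residuals[i, j].
    """
    resid = np.array(table, dtype=float)  # copy, so the input is untouched
    overall = 0.0
    row_eff = np.zeros(resid.shape[0])
    col_eff = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row_eff += row_med
        # Re-centre the column effects, moving their median into the overall term.
        delta = np.median(col_eff)
        col_eff -= delta
        overall += delta
        # Sweep column medians out of the residuals into the column effects.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col_eff += col_med
        # Re-centre the row effects in the same way.
        delta = np.median(row_eff)
        row_eff -= delta
        overall += delta
        # Stop once a full sweep barely changes anything.
        if max(np.abs(row_med).max(), np.abs(col_med).max()) < tol:
            break
    return overall, row_eff, col_eff, resid

values = [1, 2, 3, 4, 100]                 # one extreme value
print(np.mean(values), trimean(values))    # the mean is pulled up; the trimean is not

table = np.array([[14.0, 15.0, 14.0],
                  [7.0, 4.0, 7.0],
                  [8.0, 2.0, 10.0]])       # made-up two-way table
overall, row_eff, col_eff, resid = median_polish(table)
print(overall, row_eff, col_eff)
print(resid)
```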

Conclusion

This article does not go into much depth. It should, however, give you a general idea of which tools to have at hand and when to use them for the basic EDA tasks. EDA is essential for understanding your data, drawing insights from it, and obtaining clean data for machine learning (ML) models. Beyond this overview, Google is your friend.

  REMEMBER: "A picture is worth a thousand words"
