DEV Community

Kevin Kipkurui


Exploratory Data Analysis using Data Visualization Techniques

To understand data, we must first clean it and remove unnecessary noise as soon as the raw data arrives, so that it becomes usable for presenting findings and, later, for predicting useful features. The approach used for this is Exploratory Data Analysis (EDA): describing data with descriptive statistics and visualization to surface the aspects that matter for further analysis. Data visualization, in turn, is the technique of representing data in pictorial or graphical form, which helps data scientists analyze data visually for patterns or trends.

EDA is very helpful in uncovering missing values, anomalies, duplicates, outliers, and incorrect data types while cleaning and preparing raw datasets. This helps data scientists discover insights, spot patterns and trends, and test hypotheses and assumptions.
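
As a rough illustration of these checks, here is a minimal pandas sketch. The DataFrame and its column names (`age`, `salary`) are hypothetical, invented only to show one problem of each kind, and the 1.5 × IQR rule used for outliers is one common convention rather than the only option.

```python
import pandas as pd

# Hypothetical toy data with deliberate quality problems.
df = pd.DataFrame({
    "age": [25, 31, None, 31, 240],  # one missing value, one implausible value
    "salary": ["50000", "62000", "58000", "62000", "61000"],  # numbers stored as text
})

missing_per_column = df.isna().sum()         # missing values per column
duplicate_rows = int(df.duplicated().sum())  # rows identical to an earlier row
column_types = df.dtypes                     # "salary" is not a numeric type

# Flag outliers in "age" with the 1.5 * IQR rule (an assumed convention).
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(column_types)
print(outliers)
```

Whether a flagged row is an error or a genuine extreme value still has to be decided from knowledge of the data; the rule only points at candidates.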

EDA is commonly divided into four types:

  • Univariate non-graphical: This involves a data set with one variable; hence, no relationship can be analyzed from it. We can only describe that variable's own distribution, for example its center, spread, and outliers.

  • Univariate graphical: The most common methods are:

    • Stem-and-leaf plots – show the data values and the shape of the distribution.
    • Histograms – bar plots in which each bar represents the frequency count for a range of values.
    • Box plots – graphically showcase the five-number summary: minimum, first quartile, median, third quartile, and maximum.
  • Multivariate non-graphical: With more than one variable, this describes the relationship between two or more variables using statistics such as cross-tabulation or correlation.

  • Multivariate graphical: Graphical methods display the relationship between two or more variables. A grouped bar plot is most commonly used, where each group represents one level of one variable, and each bar within a group represents a level of the other variable.
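
The four types above can be sketched in a few lines of pandas and Matplotlib. This is only an illustration: the data set and its column names (`score`, `hours`, `group`, `passed`) are made up, and the particular statistics and plots chosen are one reasonable selection, not a fixed recipe.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: two numeric and two categorical variables.
df = pd.DataFrame({
    "score": [52, 61, 64, 70, 73, 75, 81, 88],
    "hours": [1, 2, 2, 3, 4, 4, 5, 6],
    "group": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "passed": ["no", "yes", "yes", "yes", "yes", "yes", "yes", "yes"],
})

# Univariate non-graphical: five-number summary of a single variable.
five_number = df["score"].quantile([0, 0.25, 0.5, 0.75, 1])

# Multivariate non-graphical: correlation and a cross-tabulation.
score_hours_corr = df["score"].corr(df["hours"])
counts = pd.crosstab(df["group"], df["passed"])

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Univariate graphical: histogram and box plot of one variable.
axes[0].hist(df["score"], bins=4)
axes[0].set_title("Histogram of score")
axes[1].boxplot(df["score"].to_numpy())
axes[1].set_title("Box plot of score")

# Multivariate graphical: grouped bar plot, one group of bars per "group"
# level and one bar per "passed" level inside each group.
counts.plot(kind="bar", ax=axes[2], title="passed by group")
fig.tight_layout()
```

Calling `plt.show()` in an interactive session, or `fig.savefig(...)` with a path of your choice, displays or stores the three panels.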

Exploratory Data Analysis relies on data visualization to represent and present data. To do this properly in Python, the following steps are required:

  1. Understanding the why: Identify the business question, need, or problem that the analysis is meant to address.

  2. Importing necessary libraries: Import the necessary libraries such as Pandas, Seaborn, and Matplotlib.

  3. Loading the dataset: Load the dataset that you want to visualize, based on the requirements.

  4. Data Cleaning: This involves handling missing values, removing outliers, and ensuring data quality across the data sets.

  5. Data Exploration and Summary: Examining summary statistics, visualizing data distributions, and identifying patterns or relationships.

  6. Feature Engineering: Transforming variables, creating new features, or selecting relevant variables for analysis.

  7. Data Visualization: Presenting insights through plots, charts, and graphs to communicate findings effectively about the data.
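
Steps 2 to 7 might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the in-memory CSV stands in for a real file, the column names (`region`, `units`, `unit_price`) are hypothetical, and filling missing values with the median is just one possible cleaning choice.

```python
# Step 2: import the necessary libraries.
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Step 3: load the dataset (an in-memory CSV stands in for a real file path).
raw_csv = io.StringIO(
    "region,units,unit_price\n"
    "north,10,2.5\n"
    "south,,3.0\n"
    "north,10,2.5\n"
    "east,7,4.0\n"
)
df = pd.read_csv(raw_csv)

# Step 4: clean. Drop exact duplicate rows, then fill missing units
# with the median of the remaining values.
df = df.drop_duplicates().reset_index(drop=True)
df["units"] = df["units"].fillna(df["units"].median())

# Step 5: explore with summary statistics.
summary = df.describe()

# Step 6: engineer a new feature from existing columns.
df["revenue"] = df["units"] * df["unit_price"]

# Step 7: visualize the result.
fig, ax = plt.subplots()
ax.bar(df["region"], df["revenue"])
ax.set_xlabel("region")
ax.set_ylabel("revenue")
ax.set_title("Revenue by region")

print(summary)
print(df)
```

Step 1, understanding the why, has no code counterpart: it decides which columns, cleaning rules, and plots are worth writing in the first place.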
