DEV Community

Edina Bwari


Exploratory Data Analysis using Data Visualization Techniques.

The better you know your data, the better your analysis will be. Data has to be explored before it can produce reliable results. Exploratory data analysis (EDA) is an approach to analyzing and summarizing data in order to gain insights and identify patterns or trends. It is often the first step in data analysis and is used to understand the structure of the data, detect outliers and anomalies, and inform the selection of appropriate statistical models.

Objectives of EDA.

  1. Confirm that the data makes sense in the context of the business problem.
  2. Uncover and resolve data quality issues such as missing data, duplicates, and incorrect values.
  3. Ensure that the results data scientists produce are valid and applicable to the desired business outcomes and goals.
  4. Help stakeholders confirm that they are asking the right questions.
  5. Answer questions about standard deviations, categorical variables, and confidence intervals.

Types of exploratory data analysis.

EDA can be classified into two categories, graphical and non-graphical, and each of these can be univariate or multivariate. This gives four types of EDA:
Univariate non-graphical.
The data being analyzed consists of just one variable, so this type of analysis doesn't deal with causes or relationships. The main purpose of univariate analysis is to describe the data and find patterns that exist within it.
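As a rough illustration of univariate non-graphical EDA, the sketch below summarizes a single numeric variable and flags outliers with the common 1.5 × IQR rule. The `ages` values are made up for this example, and the IQR rule is only one of several possible outlier checks.

```python
import pandas as pd

# Hypothetical sample: one numeric variable with an obvious outlier.
ages = pd.Series([23, 25, 25, 27, 29, 31, 34, 36, 90], name="age")

# Basic descriptive statistics for the single variable.
summary = {
    "count": int(ages.count()),
    "mean": ages.mean(),
    "median": ages.median(),
    "std": ages.std(),
    "min": ages.min(),
    "max": ages.max(),
}

# Flag values outside 1.5 * IQR from the quartiles (an assumed, common rule of thumb).
q1, q3 = ages.quantile(0.25), ages.quantile(0.75)
iqr = q3 - q1
outliers = ages[(ages < q1 - 1.5 * iqr) | (ages > q3 + 1.5 * iqr)]

print(summary)
print(outliers.tolist())
```

Here the mean is pulled well above the median by the single extreme value, which is the kind of pattern this type of analysis is meant to surface.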
Univariate graphical.
Graphical methods give a fuller picture of a single variable than summary statistics alone. Common types of univariate graphics include stem-and-leaf plots, histograms, and box plots.
Multivariate non-graphical.
Multivariate data arises from more than one variable. Multivariate non-graphical EDA techniques generally show the relationship between two or more variables of the data through cross-tabulation or statistics.
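A cross-tabulation can be sketched with `pd.crosstab`, which counts how often each combination of two categorical variables occurs. The `region` and `segment` columns below are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical data: two categorical variables per record.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South", "East"],
    "segment": ["Retail", "Online", "Retail", "Retail", "Online", "Online"],
})

# Rows are regions, columns are segments, cells are counts of co-occurrence.
table = pd.crosstab(df["region"], df["segment"])
print(table)
```

Each cell shows how many records fall into that region and segment, so combinations that never occur appear as 0.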
Multivariate graphical.
Multivariate graphical EDA uses graphics to display relationships between two or more variables. An example is a grouped bar plot (also called a grouped bar chart).
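One possible way to draw a grouped bar plot is to pivot the data so that each group becomes a column and then use the pandas plotting interface. The `sales` figures are invented for illustration, and the `Agg` backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook or interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: revenue by region and year.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "year": [2022, 2023, 2022, 2023],
    "revenue": [120, 150, 90, 110],
})

# One row per region, one column per year: pandas draws one bar per column in each group.
wide = sales.pivot(index="region", columns="year", values="revenue")

ax = wide.plot(kind="bar")
ax.set_ylabel("revenue")
ax.set_title("Revenue by region and year")
plt.tight_layout()
```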

Exploratory Data Analysis Tools.

In this article I will focus only on Python. Python offers a variety of libraries for exploratory data analysis, and several of them are strong visualization tools. Visualizing the data makes it easier to produce a clear report.
To use Python for EDA, you can follow these steps:
Step 1: Imports and Reading Data.

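import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Use the ggplot style for all matplotlib figures
plt.style.use('ggplot')
# Show up to 200 columns when printing a DataFrame
pd.set_option('display.max_columns', 200)
# Replace the placeholder with the path to your own CSV file
df = pd.read_csv('filename.data.csv')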

With these libraries imported and your data loaded, you're ready to start exploring the data and creating visualizations in your Python environment.
Step 2: Data Understanding.
This step involves getting a grasp of the data you're working with: its characteristics, structure, and content. Here are some ways to achieve data understanding using Python code.

  • DataFrame shape: df.shape
  • Head and tail: df.head(5), df.tail(5)
  • Data types: df.dtypes
  • Summary statistics: df.describe()
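The calls listed above can be tried on a small DataFrame. The one below is made up for illustration; in practice `df` would come from `pd.read_csv`. The final `isna().sum()` call is an extra check that is not in the list, included because it counts missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for a real CSV file.
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dan"],
    "age": [28, 35, np.nan, 41],
    "city": ["Nairobi", "Mombasa", "Nairobi", "Kisumu"],
})

print(df.shape)         # (number of rows, number of columns)
print(df.head(2))       # first two rows
print(df.tail(2))       # last two rows
print(df.dtypes)        # data type of each column
print(df.describe())    # summary statistics for numeric columns
print(df.isna().sum())  # missing values per column (extra check, not in the list above)
```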

Step 3: Data Preparation.
In this step, you transform and clean the raw data to make it suitable for analysis, for example by dropping irrelevant columns and rows and by identifying duplicated columns.
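A minimal sketch of these cleaning operations is shown below. The `raw` DataFrame, its column names, and the decision to treat `notes` as irrelevant are all assumptions for this example. Which columns and rows to drop depends on your own dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a repeated row, a missing value, and a duplicated column.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "price": [10.0, 12.5, 12.5, np.nan, 9.0],
    "notes": ["a", "b", "b", "c", "d"],              # assumed irrelevant here
    "price_copy": [10.0, 12.5, 12.5, np.nan, 9.0],   # same values as "price"
})

clean = (
    raw.drop(columns=["notes"])     # drop an irrelevant column
       .drop_duplicates()           # drop fully repeated rows
       .dropna(subset=["price"])    # drop rows with no price
       .reset_index(drop=True)
)

# Identify duplicated columns by checking for repeated rows in the transposed frame.
dup_cols = clean.columns[clean.T.duplicated()].tolist()
clean = clean.drop(columns=dup_cols)

print(dup_cols)
print(clean)
```

Dropping rows with missing values is only one option. Filling them in (imputation) may suit a dataset better.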

Step 4: Feature Understanding.
This step is univariate analysis: you examine each feature (variable or attribute) in your dataset on its own to understand how its values are distributed. This improves the interpretability of your analysis and informs any later selection or transformation of features for machine learning models. Typical outputs are plots of feature distributions, such as histograms, KDE plots, and boxplots.
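As a sketch of this step, the example below draws a histogram and a boxplot of one feature side by side using the pandas plotting interface. The `scores` values are invented, and the `Agg` backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook or interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical feature with one unusually high value.
scores = pd.Series([52, 55, 58, 60, 61, 63, 64, 66, 70, 95], name="score")

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: how the values are distributed across 5 equal-width bins.
scores.plot(kind="hist", bins=5, ax=ax_hist, title="Histogram of score")
ax_hist.set_xlabel("score")

# Boxplot: median, quartiles, and points beyond the whiskers.
scores.plot(kind="box", ax=ax_box, title="Boxplot of score")

fig.tight_layout()
```

A KDE plot can be drawn the same way with `scores.plot(kind="kde")`, which requires SciPy to be installed.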

Step 5: Feature Relationships.
Here, you focus on understanding how the different features (variables) in your dataset relate to each other. This step helps you uncover patterns, dependencies, and interactions between features, which can be valuable for model building, feature selection, and gaining insights from your data. Typical outputs are scatterplots, correlation heatmaps, pair plots, and group-by comparisons.
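The numbers behind a correlation heatmap and a group-by comparison can be computed with pandas alone, as in the sketch below. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: two numeric features and one grouping feature.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 64, 70, 74],
    "group": ["A", "B", "A", "B", "A", "B"],
})

# Pairwise Pearson correlation between the numeric features.
corr = df[["hours_studied", "score"]].corr()

# Group-by comparison: average score within each group.
group_means = df.groupby("group")["score"].mean()

print(corr)
print(group_means)
```

To visualize these, `sns.heatmap(corr, annot=True)` draws the correlation matrix as a heatmap, `sns.pairplot(df)` draws pairwise scatterplots, and `df.plot(kind="scatter", x="hours_studied", y="score")` draws a single scatterplot.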
