DEV Community

ram vnet

Statistics - Uni-variate Graphical Exploratory Data Analysis (EDA)

Uni-variate data involves only one variable (feature/column) at a time.

Definition of Univariate Data
Univariate data is data that contains only one variable (one feature or one characteristic) collected from multiple observations.

👉 The word “uni” means one.
👉 So, univariate = one variable.

Simple Definition:
Univariate data is a type of data where analysis is done on a single variable without considering relationships with other variables.

2️⃣ What is a Variable?
A variable is any measurable characteristic that can take different values.

Examples of Variables:
Age
Height
Salary
Marks
Temperature
Gender
If we analyze only one of these at a time, it becomes univariate data.

3️⃣ Examples of Univariate Data
Example 1: Student Marks

Student   Marks
A         75
B         82
C         60
D         90

✔ Only Marks is analyzed
✔ No comparison with other variables

➡ This is univariate numerical data

Example 2: Gender of Employees

Employee   Gender
1          Male
2          Female
3          Male

✔ Only Gender
➡ This is univariate categorical data

More examples of univariate data:
Age of customers
Salary of employees
Marks of students
Daily temperature
👉 No relationship with other variables is studied here.

2️⃣ What is Exploratory Data Analysis (EDA)?
EDA is the process of:

Understanding data
Summarizing data
Finding patterns, trends, and anomalies
Detecting outliers and errors
before applying machine learning or statistical models.

3️⃣ What is Uni-variate Graphical EDA?
Uni-variate Graphical EDA uses graphs and plots to visually analyze one variable.

Purpose:
✔ Understand data distribution
✔ Identify outliers
✔ Detect skewness
✔ Find data spread
✔ See frequency patterns

4️⃣ Why Use Graphical Methods?
Humans understand visuals faster than numbers
Easy to detect patterns & anomalies
Simplifies complex datasets
Essential first step in Data Science workflows

5️⃣ Types of Uni-variate Graphical EDA
Uni-variate graphical methods depend on data type:

Data Type     Common Graphs
Categorical   Bar Chart, Pie Chart
Numerical     Histogram, Box Plot, Density Plot
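The data-type mapping above can be sketched as a small lookup. This is only an illustration: the function name `suggest_plots` and the rule "all values are numbers" are assumptions made here, and numerically coded categories (for example 1 = Male, 2 = Female) would need to be handled by hand.

```python
from numbers import Number

# Mapping taken from the data-type list above
PLOTS_BY_TYPE = {
    "categorical": ["Bar Chart", "Pie Chart"],
    "numerical": ["Histogram", "Box Plot", "Density Plot"],
}

def suggest_plots(values):
    """Return (data_type, plots) for one variable (hypothetical helper)."""
    # bool is a subclass of int in Python, so booleans are treated as categories
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    data_type = "numerical" if is_numeric else "categorical"
    return data_type, PLOTS_BY_TYPE[data_type]

print(suggest_plots([75, 82, 60, 90]))
print(suggest_plots(["Male", "Female", "Male"]))
```

The first call uses the marks from Example 1 and the second uses the genders from Example 2.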

📌 A. Bar Chart (Categorical Data)
🔹 Definition:
A bar chart shows frequency or count of each category.

🔹 Example:
Gender = {Male, Female}
Department = {HR, IT, Sales}

🔹 Interpretation:
Height of bar → frequency
Taller bar → more observations

🔹 What We Learn:
✔ Most frequent category
✔ Least frequent category
✔ Class imbalance (important in ML)

🔹 Advantages:
Simple & clear
Best for discrete categories

🔹 Limitations:
Not suitable for continuous data
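A bar chart is just a picture of category counts, so a minimal sketch only needs to count the labels. The department list below is made-up sample data, and the bars are drawn as text so the example needs only the standard library.

```python
from collections import Counter

# Hypothetical department labels, one per employee
departments = ["IT", "HR", "IT", "Sales", "IT", "Sales", "IT", "HR", "IT", "Sales"]

counts = Counter(departments)

# One text bar per category, most frequent first
for category, count in counts.most_common():
    print(f"{category:<6}{'#' * count} ({count})")

most_frequent = counts.most_common(1)[0][0]
least_frequent = min(counts, key=counts.get)

# Ratio of the largest class to the smallest, a rough class-imbalance indicator
imbalance_ratio = max(counts.values()) / min(counts.values())
print(most_frequent, least_frequent, imbalance_ratio)
```

The same counts could be passed to any plotting library. The imbalance ratio is one possible indicator, not a standard threshold.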
📌 B. Pie Chart (Categorical Data)
🔹 Definition:
A pie chart shows the percentage contribution of each category.

🔹 Example:
Market share of companies

🔹 Interpretation:
Each slice represents a proportion
Total = 100%

🔹 What We Learn:
✔ Relative proportion
✔ Contribution comparison

🔹 Limitations:
❌ Difficult to read with many categories
❌ Not good for precise comparison

👉 In Data Science, bar charts are usually preferred over pie charts, because bar lengths are easier to compare than slice angles.
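The slice sizes of a pie chart are the category counts rescaled to percentages. Here is a small sketch with invented market-share data. The company names and unit counts are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical company labels, one per unit sold
sales = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 5

counts = Counter(sales)
total = sum(counts.values())

# Percentage share of each category (the size of each slice)
shares = {name: 100 * count / total for name, count in counts.items()}

for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

Printing the shares as a sorted list is often enough to make the comparison, which is one reason a bar chart of the same numbers tends to work well.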

📌 C. Histogram (Numerical Data)
🔹 Definition:
A histogram shows the frequency distribution of numerical data using bins.

🔹 Example:
Marks of students
Salary distribution

🔹 Key Components:
X-axis → Value ranges (bins)
Y-axis → Frequency

🔹 What We Learn:
✔ Data distribution shape
✔ Skewness (Left / Right / Symmetric)
✔ Central tendency
✔ Presence of outliers

🔹 Types of Distribution:
Normal (Bell-shaped)
Right-skewed (Positive skew)
Left-skewed (Negative skew)
Uniform

🔹 Importance in ML:
Some ML algorithms (for example, Gaussian Naive Bayes and linear discriminant analysis) assume normally distributed features, so checking the histogram shape first helps decide whether a transformation is needed.
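To make the idea of bins concrete, here is a minimal sketch that groups marks into bins of width 10 and draws the frequencies as text. The marks, the bin width, and the starting value are all assumptions. A different bin width can change how the shape looks.

```python
# Hypothetical student marks
marks = [35, 48, 52, 55, 58, 61, 63, 64, 66, 68, 70, 71, 73, 75, 78, 82, 85, 90]

def histogram(values, bin_width, start):
    """Count how many values fall into each bin [low, low + bin_width)."""
    counts = {}
    for v in values:
        low = start + ((v - start) // bin_width) * bin_width
        counts[low] = counts.get(low, 0) + 1
    # Sort by the lower edge of each bin; empty bins are not listed
    return dict(sorted(counts.items()))

bins = histogram(marks, bin_width=10, start=30)

# X-axis: value ranges (bins), Y-axis: frequency drawn as '#'
for low, count in bins.items():
    print(f"{low:>3}-{low + 9:<3} {'#' * count}")
```

Most marks pile up in the 60s and 70s with thinner tails on both sides, which is the kind of shape a histogram is meant to reveal.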

📌 D. Box Plot (Numerical Data)
🔹 Definition:
A box plot summarizes data using the five-number summary:

Minimum
Q1 (First Quartile)
Median
Q3 (Third Quartile)
Maximum

🔹 Visual Elements:
Box → IQR (Q3 - Q1)
Line inside box → Median
Whiskers → Range of the non-outlier values
Dots outside → Outliers

🔹 What We Learn:
✔ Data spread
✔ Median position
✔ Outliers
✔ Skewness

🔹 Advantages:
Excellent for detecting outliers
Compact summary

🔹 Limitations:
Doesn't show distribution shape clearly (for example, it hides multiple peaks)
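The numbers behind a box plot can be computed directly. The sketch below uses `statistics.quantiles` from the Python standard library and the common 1.5 × IQR rule for flagging outliers. The marks are invented, and the choice of `method="inclusive"` is an assumption, since plotting libraries may compute quartiles slightly differently.

```python
import statistics

# Hypothetical marks; 150 is a deliberately suspicious value
marks = [55, 60, 62, 65, 68, 70, 72, 75, 78, 150]

# Quartiles (Q1, median, Q3)
q1, median, q3 = statistics.quantiles(marks, n=4, method="inclusive")
iqr = q3 - q1

# 1.5 * IQR rule: points beyond these fences are drawn as separate dots
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [m for m in marks if m < lower_fence or m > upper_fence]

print("min:", min(marks), "Q1:", q1, "median:", median, "Q3:", q3, "max:", max(marks))
print("IQR:", iqr, "outliers:", outliers)
```

Here only 150 falls outside the fences, so it is the single point a box plot would draw as an outlier.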
📌 E. Density Plot (Numerical Data)
🔹 Definition:
Smooth curve showing probability density of data.

🔹 Difference from Histogram:
Histogram → bars
Density plot → smooth curve

🔹 What We Learn:
✔ Distribution shape
✔ Peaks (modes)
✔ Smooth visualization

🔹 Use Case:
Comparing distributions
Understanding continuous patterns
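One common way to obtain the smooth curve is kernel density estimation (KDE): place a small bell curve on every data point and average them. The sketch below is a hand-written Gaussian KDE. The sample data and the bandwidth of 0.5 are assumptions, and plotting libraries usually pick the bandwidth automatically.

```python
import math

# Hypothetical sample with two clusters (two modes)
data = [1.0, 1.2, 1.5, 1.7, 2.0, 5.0, 5.3, 5.5, 5.8, 6.0]

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density

density = gaussian_kde(data, bandwidth=0.5)

# Evaluate on a grid from -2.0 to 11.0 and draw a rough text curve
grid = [i * 0.5 for i in range(-4, 23)]
for x in grid:
    print(f"{x:5.1f} {'*' * round(density(x) * 100)}")
```

The text curve shows two peaks, one near each cluster. A smaller bandwidth makes the curve bumpier and a larger one smooths the peaks together.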
6️⃣ Skewness & Distribution Shape
Type           Meaning
Symmetric      Mean ≈ Median
Right Skewed   Mean > Median
Left Skewed    Mean < Median

👉 Important when choosing a feature transformation (for example, log or sqrt) to reduce skew.
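The mean-versus-median rule above can be sketched in a few lines, together with a check of what a log transformation does to a right-skewed variable. The salary values, the 5% tolerance for "symmetric", and both helper names are assumptions. The skewness coefficient used is the moment-based one (third central moment divided by the variance to the power 1.5).

```python
import math
import statistics

# Hypothetical salaries (in thousands); two large values create a long right tail
salaries = [30, 32, 35, 36, 38, 40, 42, 45, 120, 250]

def skew_direction(values, tolerance=0.05):
    """Classify skew by comparing mean and median (tolerance is an assumption)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if abs(mean - median) <= tolerance * abs(median):
        return "symmetric"
    return "right-skewed" if mean > median else "left-skewed"

def skewness(values):
    """Moment-based skewness coefficient: m3 / m2 ** 1.5."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

log_salaries = [math.log(s) for s in salaries]

print(skew_direction(salaries))
print(skewness(salaries), skewness(log_salaries))
```

For this sample the log-transformed values have a smaller skewness coefficient than the raw values, although the two large salaries still leave some right skew.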

7️⃣ Outliers in Uni-variate EDA
What are Outliers?
Outliers are extreme values that differ significantly from the other observations.

Detected Using:
Box plot
Histogram

Why Important?
❗ Outliers can distort:

Mean
Variance
ML model performance
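A quick way to see the distortion is to compute the same statistics with and without one extreme value. The marks below are made up for illustration.

```python
import statistics

# Hypothetical marks, then the same marks with one extreme value added
clean = [55, 60, 62, 65, 68, 70, 72, 75, 78]
with_outlier = clean + [150]

for label, values in [("clean", clean), ("with outlier", with_outlier)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    variance = statistics.pvariance(values)
    print(f"{label:<13} mean={mean:.2f} median={median} variance={variance:.2f}")

# How far each statistic moved because of the single outlier
mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)
```

The single value 150 pulls the mean up by several marks and inflates the variance many times over, while the median moves by only one mark.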
8️⃣ Role in Data Science & ML Pipeline
Uni-variate Graphical EDA helps to:
✔ Decide data cleaning strategy
✔ Choose transformations
✔ Identify feature issues
✔ Improve model accuracy

9️⃣ Real-World Example
Dataset: Student Marks
Histogram → Understand score distribution
Box plot → Detect very low/high scores
Bar chart → Grade distribution
👉 All of this is done before applying prediction models.
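As a rough sketch of the grade-distribution step, the marks from Example 1 can be converted to grades and counted. The grade names and cut-offs below are hypothetical and are not part of the original dataset.

```python
from collections import Counter
import statistics

# Marks from Example 1 (students A to D)
marks = {"A": 75, "B": 82, "C": 60, "D": 90}

def to_grade(mark):
    """Map a mark to a grade label (cut-offs are hypothetical)."""
    if mark >= 80:
        return "Distinction"
    if mark >= 60:
        return "Pass"
    return "Fail"

# Categorical view of a numerical variable: the counts a bar chart would show
grade_counts = Counter(to_grade(m) for m in marks.values())
for grade, count in grade_counts.most_common():
    print(f"{grade:<12}{'#' * count}")

# Quick numerical summary of the same variable
values = list(marks.values())
print("min:", min(values), "median:", statistics.median(values), "max:", max(values))
```

With only four students this is a toy example, but the same steps apply unchanged to a full class list.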

🔟 Summary
Uni-variate Graphical EDA:
Focuses on one variable
Uses visual tools
Helps understand distribution, spread, outliers, and skewness

Most Important Graphs:
✔ Bar Chart
✔ Histogram
✔ Box Plot
✔ Density Plot
