Uni-variate data involves only one variable (feature/column) at a time.
Definition of Univariate Data
Univariate data is data that contains only one variable (one feature or one characteristic) collected from multiple observations.
👉 The word “uni” means one.
👉 So, univariate = one variable.
Simple Definition:
Univariate data is a type of data where analysis is done on a single variable without considering relationships with other variables.
2️⃣ What is a Variable?
A variable is any measurable characteristic that can take different values.
Examples of Variables:
Age
Height
Salary
Marks
Temperature
Gender
If we analyze only one of these at a time, it becomes univariate data.
3️⃣ Examples of Univariate Data
Example 1: Student Marks

| Student | Marks |
| --- | --- |
| A | 75 |
| B | 82 |
| C | 60 |
| D | 90 |

✔ Only Marks is analyzed
✔ No comparison with other variables
➡ This is univariate numerical data
Example 2: Gender of Employees

| Employee | Gender |
| --- | --- |
| 1 | Male |
| 2 | Female |
| 3 | Male |

✔ Only Gender is analyzed
➡ This is univariate categorical data
Examples:
Age of customers
Salary of employees
Marks of students
Daily temperature
👉 No relationship with other variables is studied here.
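To make the two examples concrete, here is a minimal sketch that stores each variable as a plain Python list and summarizes it on its own. The variable names are only illustrative, and the values are the ones from Example 1 and Example 2.

```python
from collections import Counter
from statistics import mean

# Marks from Example 1 (numerical) and genders from Example 2 (categorical).
marks = [75, 82, 60, 90]
genders = ["Male", "Female", "Male"]

# A numerical variable can be summarized with a number such as the mean.
mean_marks = mean(marks)

# A categorical variable is summarized by counting each category.
gender_counts = Counter(genders)

print("Mean marks:", mean_marks)
print("Gender counts:", dict(gender_counts))
```

Each variable is looked at separately; nothing here relates marks to gender.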
2️⃣ What is Exploratory Data Analysis (EDA)?
EDA is the process of:
Understanding data
Summarizing data
Finding patterns, trends, and anomalies
Detecting outliers and errors
before applying machine learning or statistical models.
3️⃣ What is Uni-variate Graphical EDA?
Uni-variate Graphical EDA uses graphs and plots to visually analyze one variable.
Purpose:
✔ Understand data distribution
✔ Identify outliers
✔ Detect skewness
✔ Find data spread
✔ See frequency patterns
4️⃣ Why Use Graphical Methods?
Humans understand visuals faster than numbers
Easy to detect patterns & anomalies
Simplifies complex datasets
Essential first step in Data Science workflows
5️⃣ Types of Uni-variate Graphical EDA
Uni-variate graphical methods depend on the data type:

| Data Type | Common Graphs |
| --- | --- |
| Categorical | Bar Chart, Pie Chart |
| Numerical | Histogram, Box Plot, Density Plot |
📊 A. Bar Chart (Categorical Data)
🔹 Definition:
A bar chart shows the frequency or count of each category.
🔹 Example:
Gender = {Male, Female}
Department = {HR, IT, Sales}
🔹 Interpretation:
Height of bar → frequency
Taller bar → more observations
🔹 What We Learn:
✔ Most frequent category
✔ Least frequent category
✔ Class imbalance (important in ML)
🔹 Advantages:
Simple & clear
Best for discrete categories
🔹 Limitations:
Not suitable for continuous data
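The counts behind a bar chart can be sketched with the standard library alone. The department labels below are made-up sample data, and the bars are drawn as text so the example has no plotting dependency.

```python
from collections import Counter

# Hypothetical department labels, one per employee.
departments = ["IT", "HR", "IT", "Sales", "IT", "Sales"]

counts = Counter(departments)

# most_common() orders categories from most to least frequent.
for category, count in counts.most_common():
    print(f"{category:<6}{'#' * count} ({count})")

most_frequent = counts.most_common(1)[0][0]
least_frequent = counts.most_common()[-1][0]
print("Most frequent:", most_frequent)
print("Least frequent:", least_frequent)
```

With a plotting library such as matplotlib, the same category names and counts could be passed to `plt.bar` to draw real bars.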
📊 B. Pie Chart (Categorical Data)
🔹 Definition:
Shows the percentage contribution of each category.
🔹 Example:
Market share of companies
🔹 Interpretation:
Each slice represents a proportion
Total = 100%
🔹 What We Learn:
✔ Relative proportion
✔ Contribution comparison
🔹 Limitations:
❌ Difficult to read with many categories
❌ Not good for precise comparison
👉 In Data Science, bar charts are usually preferred over pie charts, because bar lengths are easier to compare than slice angles.
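A pie chart is just the category counts rescaled to percentages. The sketch below assumes a small invented dataset of ten purchases across three companies named A, B, and C.

```python
from collections import Counter

# Hypothetical data: which company each of ten customers bought from.
purchases = ["A"] * 5 + ["B"] * 3 + ["C"] * 2

counts = Counter(purchases)
total = sum(counts.values())

# Each slice is the category's share of the total, in percent.
shares = {company: 100 * n / total for company, n in counts.items()}

for company, share in shares.items():
    print(f"{company}: {share:.1f}%")

print("Total:", sum(shares.values()))
```

The shares always add up to 100%, which is why a pie chart only makes sense when the categories are parts of one whole.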
📊 C. Histogram (Numerical Data)
🔹 Definition:
A histogram shows the frequency distribution of numerical data using bins.
🔹 Example:
Marks of students
Salary distribution
🔹 Key Components:
X-axis → Value ranges (bins)
Y-axis → Frequency
🔹 What We Learn:
✔ Data distribution shape
✔ Skewness (Left / Right / Symmetric)
✔ Central tendency
✔ Presence of outliers
🔹 Types of Distribution:
Normal (Bell-shaped)
Right-skewed (Positive skew)
Left-skewed (Negative skew)
Uniform
🔹 Importance in ML:
Some ML algorithms (for example Gaussian Naive Bayes and linear discriminant analysis) assume roughly normally distributed features, so checking the shape first tells you whether a transformation may be needed.
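Binning is the step that turns raw numbers into a histogram. Here is a minimal sketch with invented marks and a bin width of 20; the bin width is an assumption, and a different choice would change how the shape looks.

```python
# Hypothetical marks for twelve students.
marks = [35, 42, 48, 55, 58, 61, 64, 67, 72, 75, 81, 93]

low = 20         # left edge of the first bin
bin_width = 20   # assumed width; try other values to see the shape change
n_bins = 4       # bins: 20-40, 40-60, 60-80, 80-100

frequencies = [0] * n_bins
for m in marks:
    # Integer division finds the bin; min() keeps a mark of 100 in the last bin.
    index = min((m - low) // bin_width, n_bins - 1)
    frequencies[index] += 1

# X-axis: bin range, Y-axis: frequency (drawn as a row of '#').
for i, f in enumerate(frequencies):
    left = low + i * bin_width
    print(f"{left:>3}-{left + bin_width:<3} {'#' * f}")
```

With matplotlib, `plt.hist(marks, bins=4)` would do the binning and drawing in one call, although its default bin edges are computed from the data range rather than fixed at 20.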
📊 D. Box Plot (Numerical Data)
🔹 Definition:
A box plot summarizes data using the five-number summary:
Minimum
Q1 (First Quartile)
Median
Q3 (Third Quartile)
Maximum
🔹 Visual Elements:
Box → IQR (Q3 - Q1)
Line inside box → Median
Whiskers → Range of the values that are not flagged as outliers
Dots outside → Outliers
🔹 What We Learn:
✔ Data spread
✔ Median position
✔ Outliers
✔ Skewness
🔹 Advantages:
Excellent for detecting outliers
Compact summary
🔹 Limitations:
Doesn’t show the distribution shape clearly (for example, multiple peaks are hidden)
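The numbers a box plot draws can be computed directly. This sketch uses invented marks and flags outliers with the common 1.5 × IQR rule; that rule is an assumption here (it is a widespread convention, not something defined above), and quartile values depend slightly on the interpolation method chosen.

```python
from statistics import quantiles

# Hypothetical marks; 98 is deliberately far from the rest.
marks = [52, 55, 58, 60, 61, 63, 65, 66, 68, 70, 98]

# Quartiles (n=4 gives the three cut points Q1, median, Q3).
q1, median, q3 = quantiles(marks, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [m for m in marks if m < lower_fence or m > upper_fence]

print("Five-number summary:", min(marks), q1, median, q3, max(marks))
print("IQR:", iqr)
print("Outliers:", outliers)
```

Plotting libraries typically apply the same idea: matplotlib's `plt.boxplot(marks)` draws the box, the median line, the whiskers, and any points beyond the whiskers.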
📊 E. Density Plot (Numerical Data)
🔹 Definition:
A smooth curve showing the probability density of the data.
🔹 Difference from Histogram:
Histogram → bars
Density plot → smooth curve
🔹 What We Learn:
✔ Distribution shape
✔ Peaks (modes)
✔ Smooth visualization
🔹 Use Case:
Comparing distributions
Understanding continuous patterns
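One common way to get that smooth curve is kernel density estimation (KDE), where a small bell curve is placed on every data point and the curves are averaged. The sketch below is a hand-rolled Gaussian KDE on invented temperatures; the function name `gaussian_kde` and the bandwidth of 1.5 are illustrative choices, not fixed rules.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density of `data` at a point x."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum of Gaussian bumps centered on each observation.
        return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm

    return density

# Hypothetical daily temperatures with two clusters (around 20 and 31).
temps = [18, 19, 20, 21, 22, 30, 31, 32]
density = gaussian_kde(temps, bandwidth=1.5)  # bandwidth picked by hand

# Text version of the curve: longer rows mean higher density.
for x in range(16, 35, 2):
    print(f"{x:>3} {'*' * round(density(x) * 200)}")

# Sanity check: the area under a density curve should be close to 1.
grid = [x / 2 for x in range(24, 77)]  # 12.0 to 38.0 in steps of 0.5
area = sum(density(x) for x in grid) * 0.5
print("Approximate area:", round(area, 3))
```

The two clusters show up as two peaks with a dip between them, which is the kind of detail a box plot would hide. A smaller bandwidth gives a bumpier curve and a larger one gives a smoother curve.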
6️⃣ Skewness & Distribution Shape

| Type | Meaning |
| --- | --- |
| Symmetric | Mean ≈ Median |
| Right-skewed | Mean > Median |
| Left-skewed | Mean < Median |

👉 Important for choosing feature transformations (log, sqrt), which are commonly used to reduce right skew.
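The mean-versus-median comparison is easy to check in code. This sketch uses invented salaries with one very large value, and a helper named `relative_gap` (a made-up name for illustration) that assumes a positive median.

```python
import math
from statistics import mean, median

# Hypothetical salaries (in thousands); one large value creates a long right tail.
salaries = [30, 32, 35, 38, 40, 45, 50, 250]

def relative_gap(values):
    """(mean - median) / median: positive hints at right skew, negative at left skew."""
    return (mean(values) - median(values)) / median(values)

gap_before = relative_gap(salaries)

# A log transform compresses large values, pulling the mean back toward the median.
log_salaries = [math.log(s) for s in salaries]
gap_after = relative_gap(log_salaries)

print("Mean:", mean(salaries), "Median:", median(salaries))
print("Relative gap before log:", round(gap_before, 3))
print("Relative gap after log: ", round(gap_after, 3))
```

The gap stays positive after the transform but becomes much smaller, so the log reduces the right skew here rather than removing it completely.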
7️⃣ Outliers in Uni-variate EDA
What are Outliers?
Extreme values that differ significantly from the other values.
Detected Using:
Box plot
Histogram
Why Important?
⚠️ Can distort:
Mean
Variance
ML model performance
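A quick way to see the distortion is to compute the same statistics with and without one extreme value. The marks below are invented, and the median is included only as a contrast because it reacts far less.

```python
from statistics import mean, median, pvariance

# Hypothetical marks, then the same marks plus one extreme low score.
marks = [60, 62, 65, 67, 70]
with_outlier = marks + [5]

for label, data in [("Without outlier", marks), ("With outlier", with_outlier)]:
    print(
        f"{label:<16} mean={mean(data):.2f} "
        f"variance={pvariance(data):.2f} median={median(data)}"
    )
```

One value out of six pulls the mean down by about ten marks and inflates the variance many times over, while the median barely moves.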
8️⃣ Role in Data Science & ML Pipeline
Uni-variate Graphical EDA helps to:
✔ Decide data cleaning strategy
✔ Choose transformations
✔ Identify feature issues
✔ Improve model accuracy
9️⃣ Real-World Example
Dataset: Student Marks
Histogram → Understand score distribution
Box plot → Detect very low/high scores
Bar chart → Grade distribution
👉 All of this is done before applying prediction models.
🔟 Summary
Uni-variate Graphical EDA:
Focuses on one variable
Uses visual tools
Helps understand:
Distribution
Spread
Outliers
Skewness
Most Important Graphs:
✔ Bar Chart
✔ Histogram
✔ Box Plot
✔ Density Plot