STATISTICS FOR DATA ANALYTICS - 15
Covariance and Correlation :-
x={2,4,6,8}
y={3,5,7,9}
What is the relationship between x and y?
The possible cases are :-
x ⬆️ y ⬆️
x ⬆️ y ⬇️
x ⬇️ y ⬆️
x ⬇️ y ⬇️
Covariance(x, y) = Σ (xᵢ - x̄)(yᵢ - ȳ) / (n - 1), where x̄ and ȳ are the means of x and y (sample form)
x ⬆️ y ⬆️ + ve covariance
x ⬇️ y ⬇️ + ve covariance
x ⬆️ y ⬇️ - ve covariance
x ⬇️ y ⬆️ - ve covariance
Covariance(x, x) = Variance(x), i.e. the spread of x
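The sign rules above can be checked on the x and y from the start of this post. This is a minimal sketch, assuming the sample form of covariance (dividing by n - 1). The helper name `sample_covariance` and the reversed list `y_reversed` are made up for illustration.

```python
from statistics import mean, variance


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (len(xs) - 1)


x = [2, 4, 6, 8]
y = [3, 5, 7, 9]
y_reversed = [9, 7, 5, 3]  # illustrative: y falls as x rises

cov_xy = sample_covariance(x, y)               # x and y rise together
cov_x_yrev = sample_covariance(x, y_reversed)  # x rises while y falls
cov_xx = sample_covariance(x, x)               # covariance of x with itself

print("cov(x, y)          =", cov_xy)
print("cov(x, y_reversed) =", cov_x_yrev)
print("cov(x, x)          =", cov_xx, "| variance(x) =", variance(x))
```

The first value comes out positive, the second negative, and the third matches `statistics.variance(x)`, which is the "spread" mentioned above.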
Advantages :-
It tells us the direction of the relationship between x and y (positive or negative).
Disadvantages :-
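Covariance has no fixed range. Its value depends on the units and scale of x and y, so its magnitude does not tell us how strong the relationship is.
To overcome this disadvantage we use correlation.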
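The scale problem can be seen in a small sketch. It assumes, purely for illustration, that x is converted to a unit 100 times smaller (say metres to centimetres). The helper names are hypothetical, and the correlation here is the covariance divided by the product of the two standard deviations.

```python
from math import sqrt
from statistics import mean


def sample_covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return sample_covariance(xs, ys) / sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )


x = [2, 4, 6, 8]
y = [3, 5, 7, 9]
x_scaled = [100 * v for v in x]  # assumed unit change, same underlying data

cov_original = sample_covariance(x, y)
cov_scaled = sample_covariance(x_scaled, y)
r_original = correlation(x, y)
r_scaled = correlation(x_scaled, y)

print("covariance :", cov_original, "->", cov_scaled)
print("correlation:", r_original, "->", r_scaled)
```

The covariance grows by a factor of 100 even though the relationship itself is unchanged, while the correlation stays the same.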
Correlation Technique :-
We choose a correlation coefficient based on :-
Linearity of the relationship.
Level of measurement of the variables, i.e. categorical or continuous.
Distribution of data.
Pearson Correlation Coefficient ( -1 to 1)
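The closer the value is to +1, the stronger the positive correlation.
The closer the value is to -1, the stronger the negative correlation. A value near 0 means little or no linear correlation.
It measures the strength and direction of the linear relationship between the two variables.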
Assumptions for using Pearson :-
Both variables should be continuous and numerical.
Data from both variables follow normal distributions.
Your data have no outliers.
Your data is from a random or representative sample.
You expect a linear relationship between the two variables.
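To see why the no-outliers assumption matters, here is a minimal sketch that computes Pearson's r by hand on the x and y from the top of this post, then again after adding one extreme point. The extra point (10, -20) and the function name `pearson_r` are invented for this illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r: co-deviation of xs and ys over the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    dx = [a - x_bar for a in xs]
    dy = [b - y_bar for b in ys]
    numerator = sum(p * q for p, q in zip(dx, dy))
    # Note: the denominator is zero (r is undefined) if either variable is constant.
    denominator = sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))
    return numerator / denominator


x = [2, 4, 6, 8]
y = [3, 5, 7, 9]
r_clean = pearson_r(x, y)  # perfectly linear data

# One made-up outlier appended to otherwise linear data.
x_out = x + [10]
y_out = y + [-20]
r_outlier = pearson_r(x_out, y_out)

print("r without outlier:", r_clean)
print("r with outlier   :", r_outlier)
```

On the clean data r is 1. With the single outlier added, r becomes negative, even though four of the five points still lie on a rising straight line.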
Spearman Rank Correlation ( -1 to 1)
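The closer the value is to +1, the stronger the positive (increasing) relationship.
The closer the value is to -1, the stronger the negative (decreasing) relationship. A value near 0 means little or no monotonic relationship.
It measures how monotonic the relationship is. The relationship can be linear or non-linear.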
This technique is used for feature selection in machine learning models.
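Spearman's coefficient can be sketched as Pearson's r applied to the ranks of the data instead of the raw values. The example below is an illustration only: the data (y = x cubed) is made up to be monotonic but not linear, and the helpers `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical names. Tied values are assumed to share the average of their ranks.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    dx = [a - x_bar for a in xs]
    dy = [b - y_bar for b in ys]
    return sum(p * q for p, q in zip(dx, dy)) / sqrt(
        sum(p * p for p in dx) * sum(q * q for q in dy)
    )


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r on the ranks of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print("Pearson r   :", r)
print("Spearman rho:", rho)
```

Because y always increases when x increases, the ranks match exactly and rho is 1, while Pearson's r is below 1 because the points do not lie on a straight line.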
Correlation coefficient | Type of relationship | Level of measurement of variable | Data distribution
Pearson’s r | Linear | Continuous variable | Normal distribution
Spearman’s rho | Monotonic (linear or non-linear) | Ordinal (ranked) or continuous variable | Any distribution
Follow me here, where I add a new post whenever I learn something new about this topic :- https://dev.to/nitinbhatt46
Thank you for your time.