Correlation and covariance are two commonly used statistical concepts, mainly used to measure the linear relationship between two variables in data. Covariance is used to determine how two variables vary together, whereas correlation is used to determine how strongly a change in one variable is associated with a change in the other. Although the two terms are closely related, they are different from one another. Read on to learn the difference between covariance and correlation.
Covariance:
It is an indicator of the degree to which two variables change with respect to one another, i.e., it measures the direction of the linear relationship between the two variables. Covariance values can lie anywhere in the range −∞ to +∞.
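The population covariance is computed as:
$$\operatorname{cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$
Here,
xi – values of the X variable
yi – values of the Y variable
x̄ – mean of the X variable
ȳ – mean of the Y variable
N – number of data points (use N − 1 instead of N for the sample covariance)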
Now let's see how to calculate the same in Python using built-in functions:
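A minimal sketch, assuming NumPy and a small hypothetical dataset (the article's original data and code are not available). It computes the sample covariance directly from the definition (sum of the products of deviations divided by N − 1) and compares it with NumPy's built-in `np.cov`:

```python
import numpy as np

# Hypothetical sample data (the article's original dataset is not available)
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
n = len(x)

# Sample covariance from the definition: sum of products of deviations / (N - 1)
manual_cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Built-in: np.cov returns the 2x2 matrix [[var(x), cov(x, y)], [cov(y, x), var(y)]]
# and divides by N - 1 by default
cov_matrix = np.cov(x, y)

print(manual_cov)  # 4.0
print(cov_matrix)
```

With these toy values the off-diagonal entries of `cov_matrix` equal the manual result, 4.0, and the diagonal entries, 10.0 and 2.5, are the sample variances of `x` and `y`.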
Here, the covariance of a variable with itself is simply its variance.
Correlation:
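Correlation measures both the strength and the direction of the linear relationship between two variables; in other words, it is a normalized version of covariance. Dividing the covariance by the product of the two variables' standard deviations scales the result to the range −1 to +1, which makes correlation values more interpretable than covariance values:
$$r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}$$
where σX and σY are the standard deviations of X and Y.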
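As a quick check, again assuming NumPy and hypothetical toy data, the sketch below divides the covariance by the product of the sample standard deviations and compares the result with `np.corrcoef`. Using `ddof=1` keeps the denominators consistent with `np.cov`; the N or N − 1 factor cancels in the ratio as long as it is used consistently.

```python
import numpy as np

# Hypothetical sample data
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Correlation = covariance / (std of x * std of y); ddof=1 matches np.cov's N - 1
cov_xy = np.cov(x, y)[0, 1]
manual_corr = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Built-in: np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal entry
corr_xy = np.corrcoef(x, y)[0, 1]

print(manual_corr, corr_xy)  # both approximately 0.8
```

For this data the correlation is 4 / √(10 × 2.5) = 4 / 5 = 0.8, inside the −1 to +1 range.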
As the formula shows, correlation is simply standardized covariance; let's run the same comparison in Python and look at the difference.
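A sketch of that comparison, assuming NumPy and hypothetical toy data. The helper `standardize` is an illustrative name, not a library function. The sketch also shows one plausible source of small decimal differences: standardizing with N in the denominator (`ddof=0`, which is what scikit-learn's `StandardScaler` uses) while `np.cov` divides by N − 1.

```python
import numpy as np

# Hypothetical sample data
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])


def standardize(v, ddof=1):
    """Hypothetical helper: z-scores (subtract the mean, divide by the std)."""
    return (v - v.mean()) / v.std(ddof=ddof)


# Correlation on the original data
corr_original = np.corrcoef(x, y)[0, 1]

# Covariance on standardized data; np.cov divides by N - 1, matching ddof=1
cov_standardized = np.cov(standardize(x), standardize(y))[0, 1]

# Z-scores computed with N (ddof=0) but covariance with N - 1:
# the result is inflated by a factor N / (N - 1)
cov_mixed = np.cov(standardize(x, ddof=0), standardize(y, ddof=0))[0, 1]

print(corr_original, cov_standardized, cov_mixed)
```

The first two values agree (about 0.8). The third is larger by the factor N / (N − 1), here 5/4, giving about 1.0; with larger datasets this factor is close to 1 (about 1.001 for N = 1000), so the mismatch shows up only in the decimals.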
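Here, the correlation computed on the original data matches the covariance computed on standardized data, apart from small differences in the decimal places (such differences can arise, for example, when the standardization and the covariance use different denominators, N versus N − 1). For applications such as PCA, we can therefore use either of them and obtain equivalent results. Alternatively, we can use NumPy's functions (both return a 2 × 2 matrix whose off-diagonal entry is the value for a and b):
Covariance: numpy.cov(a, b)
Correlation: numpy.corrcoef(a, b)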
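To illustrate the PCA point, here is a sketch assuming NumPy and a hypothetical 6 × 3 dataset, with PCA reduced to an eigen-decomposition via `np.linalg.eigh` (one common way to compute principal components, not necessarily the approach the article had in mind). `rowvar=False` tells NumPy that columns, not rows, are the variables. The correlation matrix of the original data and the covariance matrix of the standardized data are the same matrix, so they yield the same eigenvalues (explained variances).

```python
import numpy as np

# Hypothetical data: 6 observations (rows) of 3 variables (columns) on different scales
data = np.array([
    [1.0, 200.0, 0.5],
    [2.0, 180.0, 0.7],
    [3.0, 260.0, 0.4],
    [4.0, 300.0, 0.9],
    [5.0, 310.0, 0.6],
    [6.0, 350.0, 1.0],
])

# Option 1: correlation matrix of the original data
corr = np.corrcoef(data, rowvar=False)

# Option 2: covariance matrix of the standardized data (z-scores with N - 1)
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
cov_z = np.cov(z, rowvar=False)

# PCA as an eigen-decomposition of the symmetric matrix (eigenvalues in ascending order)
eigvals_corr, _ = np.linalg.eigh(corr)
eigvals_cov_z, _ = np.linalg.eigh(cov_z)

print(np.allclose(corr, cov_z))                  # True
print(np.allclose(eigvals_corr, eigvals_cov_z))  # True
```

Eigenvectors may differ in sign between runs, but the principal directions they describe are the same.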
Difference between Correlation and Covariance:
Covariance is affected by a change in scale, whereas correlation values are not influenced by a change in scale. Correlation is dimensionless: a unit-free and scale-free measure of the strength and direction of the linear relationship between two variables.
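A minimal sketch of the scale effect, assuming NumPy and hypothetical height and weight values: expressing height in centimetres instead of metres multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import numpy as np

# Hypothetical heights in metres and weights in kilograms
height_m = np.array([1.60, 1.70, 1.75, 1.80, 1.90])
weight_kg = np.array([55.0, 65.0, 70.0, 72.0, 85.0])

# The same heights expressed in centimetres (a change of scale by 100)
height_cm = height_m * 100

cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print(cov_cm / cov_m)   # approximately 100: covariance changes with the units
print(corr_m, corr_cm)  # equal: correlation is unit-free
```

Rescaling by a negative factor would flip the sign of the correlation, but its magnitude would stay the same.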
Conclusion:
Covariance and correlation are closely related, but they differ considerably when it comes to choosing between them. Most analysts prefer correlation because it is more interpretable and is not affected by the scale or units of the data.
For more info: https://www.excelr.com/blog/data-science/statistics-for-data-scientist/Correlation-vs-covariance#