In this blog post, we will explore how to calculate the correlation coefficient between two arrays. The correlation coefficient is a statistical measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to +1: values near +1 indicate a strong positive correlation, values near -1 a strong negative correlation, and values near 0 little or no linear relationship.
Understanding the Correlation Coefficient
The correlation coefficient r can be calculated using the following formula:
r = (n * Σxy - Σx * Σy) / sqrt((n * Σx² - (Σx)²) * (n * Σy² - (Σy)²))
Where:
- n is the number of pairs of scores,
- Σxy is the sum of the products of the paired scores,
- Σx and Σy are the sums of the individual scores,
- Σx² and Σy² are the sums of the squares of the scores.
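To make the symbols concrete, here is a minimal Python sketch that maps each term of the formula to one expression. The function name `pearson_r` and the two toy datasets are illustrative assumptions, chosen so that the result lands exactly on the ends of the -1 to +1 range.

```python
import math

def pearson_r(xs, ys):
    """Illustrative helper (hypothetical name): evaluates the formula term by term."""
    n = len(xs)                                  # n: number of pairs
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Σxy
    sum_x, sum_y = sum(xs), sum(ys)              # Σx and Σy
    sum_x2 = sum(x * x for x in xs)              # Σx²
    sum_y2 = sum(y * y for y in ys)              # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Toy data for illustration: y rises (or falls) exactly linearly with x.
print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))   # -1.0
```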
Example Calculation
Let's consider the following dataset:
X | Y |
---|---|
15 | 25 |
18 | 25 |
21 | 27 |
24 | 31 |
27 | 32 |
From this data, with n = 5:
- ΣX = 105
- ΣY = 140
- ΣXY = 3000
- ΣX² = 2295
- ΣY² = 3964
Substituting these sums into the formula gives r ≈ 0.953463.
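For readers who want to follow the arithmetic, the substitution for the dataset above can be written out step by step. This is a worked sketch derived from the table; the intermediate sums are computed here rather than quoted from the original text.

```latex
\begin{aligned}
n &= 5, \qquad \sum X = 105, \qquad \sum Y = 140, \\
\sum XY &= 375 + 450 + 567 + 744 + 864 = 3000, \\
\sum X^2 &= 225 + 324 + 441 + 576 + 729 = 2295, \\
\sum Y^2 &= 625 + 625 + 729 + 961 + 1024 = 3964, \\
r &= \frac{5 \cdot 3000 - 105 \cdot 140}
          {\sqrt{\left(5 \cdot 2295 - 105^2\right)\left(5 \cdot 3964 - 140^2\right)}}
   = \frac{300}{\sqrt{450 \cdot 220}}
   = \frac{300}{\sqrt{99000}} \approx 0.953463
\end{aligned}
```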
Example Inputs and Outputs
- Input 1:
  - X[] = {43, 21, 25, 42, 57, 59}
  - Y[] = {99, 65, 79, 75, 87, 81}

  Output 1: 0.529809
- Input 2:
  - X[] = {15, 18, 21, 24, 27}
  - Y[] = {25, 25, 27, 31, 32}

  Output 2: 0.953463
Python Program to Calculate the Correlation Coefficient
Here's how you can implement this in Python:
import math


# Function to calculate the correlation coefficient
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    for i in range(n):
        # Sum of elements in X
        sum_X += X[i]
        # Sum of elements in Y
        sum_Y += Y[i]
        # Sum of X[i] * Y[i]
        sum_XY += X[i] * Y[i]
        # Sum of squares of elements in X
        squareSum_X += X[i] * X[i]
        # Sum of squares of elements in Y
        squareSum_Y += Y[i] * Y[i]

    # Calculate the correlation coefficient
    corr = (n * sum_XY - sum_X * sum_Y) / math.sqrt(
        (n * squareSum_X - sum_X ** 2) * (n * squareSum_Y - sum_Y ** 2)
    )
    return corr


# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Size of the array
n = len(X)

# Calculate and print the correlation coefficient
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))
Output
When you run the code, you should see:
0.953463
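The program above assumes that both arrays have the same length and that neither is constant. If every value in X (or in Y) is identical, the denominator is zero and the division raises `ZeroDivisionError`. Below is a minimal sketch of a more defensive variant of the same formula; the name `safe_correlation` and the choice to raise `ValueError` are assumptions made for illustration, not part of the original program.

```python
import math

def safe_correlation(xs, ys):
    """Hypothetical defensive variant of the same sum-based formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    term_x = n * sum(x * x for x in xs) - sum_x ** 2   # n * Σx² - (Σx)²
    term_y = n * sum(y * y for y in ys) - sum_y ** 2   # n * Σy² - (Σy)²
    if term_x == 0 or term_y == 0:
        # A constant array has no spread, so the coefficient is undefined.
        raise ValueError("correlation is undefined when one array is constant")
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(term_x * term_y)

# Both example inputs from this post
print('{0:.6f}'.format(safe_correlation([15, 18, 21, 24, 27], [25, 25, 27, 31, 32])))          # 0.953463
print('{0:.6f}'.format(safe_correlation([43, 21, 25, 42, 57, 59], [99, 65, 79, 75, 87, 81])))  # 0.529809
```

The exact comparison with zero is reliable for integer inputs like the ones used here; for floating-point data a small tolerance may be a better fit.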
Complexity
- Time Complexity: O(n), where n is the size of the given arrays.
- Auxiliary Space: O(1).
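The constant auxiliary space comes from keeping only a handful of running totals, so the same computation can also consume pairs one at a time without holding the arrays in memory. The sketch below illustrates that idea; the name `correlation_from_pairs` is hypothetical, and it assumes the input yields at least two pairs with non-constant x and y values.

```python
import math

def correlation_from_pairs(pairs):
    """Hypothetical streaming variant: consumes (x, y) pairs in a single pass."""
    n = sum_x = sum_y = sum_xy = sum_x2 = sum_y2 = 0
    for x, y in pairs:
        # Only these six running totals are stored, regardless of input size.
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x * x
        sum_y2 += y * y
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

# zip() yields the pairs lazily; any iterable of (x, y) tuples would work.
pairs = zip([15, 18, 21, 24, 27], [25, 25, 27, 31, 32])
print('{0:.6f}'.format(correlation_from_pairs(pairs)))  # 0.953463
```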
This simple program calculates the correlation coefficient in a single pass, allowing you to analyze the relationship between two sets of data. Whether you're working on statistical analysis, financial data, or any other field, understanding correlation is crucial for making informed decisions based on your data.
For more content, follow me at https://linktr.ee/shlokkumar2303