DEV Community

Shlok Kumar

Program to Find the Correlation Coefficient

In this post, we will calculate the correlation coefficient between two arrays. The correlation coefficient is a statistical measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to +1: values near +1 indicate a strong positive relationship, values near -1 a strong negative one, and values near 0 little or no linear relationship.

Understanding the Correlation Coefficient

The Pearson correlation coefficient r can be calculated using the following formula:

r = (n * Σxy - (Σx)(Σy)) / sqrt((n * Σx² - (Σx)²) * (n * Σy² - (Σy)²))

Where:

  • n is the number of paired values,
  • Σxy is the sum of the products of the paired values,
  • Σx and Σy are the sums of the individual values,
  • Σx² and Σy² are the sums of the squares of the values.
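For readers who know the correlation coefficient in its mean-centered form, the following sketch shows why the sum-based formula above is equivalent. The notation (x̄ and ȳ for the means, the index i) is an assumption added here for illustration and does not appear in the original formula.

```latex
% Mean-centered definition of Pearson's r
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad \bar{x} = \frac{1}{n}\sum x, \quad \bar{y} = \frac{1}{n}\sum y

% Expanding the centered sums
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n},
\qquad
\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}

% Multiplying numerator and denominator by n
r = \frac{n \sum xy - (\sum x)(\sum y)}
         {\sqrt{\left(n \sum x^2 - (\sum x)^2\right)\left(n \sum y^2 - (\sum y)^2\right)}}
```

The last line is the formula used in this post; it needs only running sums, so the means never have to be computed first.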

Example Calculation

Let's consider the following dataset:

X Y
15 25
18 25
21 27
24 31
27 32

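From this data, with n = 5:

  • ΣX = 105
  • ΣY = 140
  • ΣXY = 3000
  • ΣX² = 2295
  • ΣY² = 3964

Substituting into the formula gives r = (5 * 3000 - 105 * 140) / sqrt((5 * 2295 - 105²) * (5 * 3964 - 140²)) = 300 / sqrt(450 * 220) ≈ 0.953463, a strong positive correlation.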

Example Inputs and Outputs

  • Input 1:
    • X[] = {43, 21, 25, 42, 57, 59}
    • Y[] = {99, 65, 79, 75, 87, 81}
  • Output 1: 0.529809

  • Input 2:
    • X[] = {15, 18, 21, 24, 27}
    • Y[] = {25, 25, 27, 31, 32}
  • Output 2: 0.953463
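As a quick independent check of the two outputs listed above, here is a minimal sketch that computes the same coefficient from deviations about the means instead of from raw sums. The helper name `pearson_centered` is hypothetical and is not part of the original program.

```python
import math


def pearson_centered(xs, ys):
    """Pearson r from deviations about the means (hypothetical helper name)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


print(f"{pearson_centered([43, 21, 25, 42, 57, 59], [99, 65, 79, 75, 87, 81]):.6f}")  # 0.529809
print(f"{pearson_centered([15, 18, 21, 24, 27], [25, 25, 27, 31, 32]):.6f}")  # 0.953463
```

Both printed values agree with the expected outputs to six decimal places, which suggests the listed results are consistent with the standard definition.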

Python Program to Calculate the Correlation Coefficient

Here's how you can implement this in Python:

import math

# Function to calculate the correlation coefficient
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    for i in range(n):
        # Sum of elements in X
        sum_X += X[i]
        # Sum of elements in Y
        sum_Y += Y[i]
        # Sum of X[i] * Y[i]
        sum_XY += X[i] * Y[i]
        # Sum of squares of elements in X
        squareSum_X += X[i] * X[i]
        # Sum of squares of elements in Y
        squareSum_Y += Y[i] * Y[i]

    # Calculate the correlation coefficient
    corr = (n * sum_XY - sum_X * sum_Y) / \
           math.sqrt((n * squareSum_X - sum_X ** 2) * (n * squareSum_Y - sum_Y ** 2))

    return corr

# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Size of the array
n = len(X)

# Calculate and print the correlation coefficient
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

Output

When you run the code, you should see:

0.953463
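The program assumes both arrays have the same length and that neither is constant; a constant array makes the denominator zero and raises ZeroDivisionError. The sketch below is one possible hardened variant using the same sum-based formula. The function name `correlation_coefficient_checked` and the choice to raise ValueError are assumptions for illustration, not part of the original program.

```python
import math


def correlation_coefficient_checked(xs, ys):
    """Pearson r via the sum-based formula, with basic input checks.

    The name and the specific checks are illustrative assumptions.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs are required")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # These two terms are n^2 times the variance of each input
    spread_x = n * sum_x2 - sum_x ** 2
    spread_y = n * sum_y2 - sum_y ** 2
    if spread_x <= 0 or spread_y <= 0:
        raise ValueError("correlation is undefined when either input is constant")

    return (n * sum_xy - sum_x * sum_y) / math.sqrt(spread_x * spread_y)


print('{0:.6f}'.format(correlation_coefficient_checked([15, 18, 21, 24, 27], [25, 25, 27, 31, 32])))  # 0.953463
```

The zero-spread check is exact for integer inputs. With floating-point inputs, rounding can leave a tiny nonzero value for a constant array, so a small tolerance may be more appropriate there.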

Complexity

  • Time Complexity: O(n), where n is the size of the given arrays.
  • Auxiliary Space: O(1).

This simple program calculates the correlation coefficient in a single pass over the data, letting you measure how strongly two sets of values move together. Whether you're working on statistical analysis, financial data, or any other field, understanding correlation helps you make informed decisions based on your data.

For more content, follow me at https://linktr.ee/shlokkumar2303
