DEV Community

Sum Byron

CHI-SQUARE TESTS AND DEGREES OF FREEDOM

Chi-Square Tests and Degrees of Freedom — Explained with Football

When analyzing data in sports like football (soccer), we often want to answer questions like:

  • Is there a relationship between a team's playing style and their win rate?
  • Do red cards occur more frequently in away games than home games?
  • Is possession (binned into categories such as low, medium, and high) independent of the final match outcome?

Questions like these are about counts in categories, and that is exactly what the Chi-Square Test is designed for.


📊 What is a Chi-Square Test?

The Chi-Square Test is a statistical method for testing whether there is a significant association between categorical variables. It compares the observed frequencies in a contingency table with the frequencies we would expect if the variables were independent.

🎯 Example: Home vs Away Red Cards

Let’s say we collect hypothetical data on red cards from 100 team appearances (50 home, 50 away):

             Red Card   No Red Card   Total
Home Team       20          30          50
Away Team       35          15          50
Total           55          45         100

You might ask: Is receiving a red card dependent on whether the team is playing home or away?

A Chi-Square Test of Independence helps us test that.


🧮 Chi-Square Formula

\[
\chi^2 = \sum \frac{(O - E)^2}{E}
\]

  • O = Observed frequency
  • E = Expected frequency

Expected values are calculated under the assumption of independence:

\[
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}
\]
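
As a minimal sketch of the two formulas above, the expected counts and the statistic for the red card table can be worked out by hand with NumPy. The variable names are illustrative choices, not part of any library.

```python
import numpy as np

# Observed counts from the red card table:
# rows = Home, Away; columns = Red Card, No Red Card
observed = np.array([[20, 30],
                     [35, 15]])

row_totals = observed.sum(axis=1, keepdims=True)   # [[50], [50]]
col_totals = observed.sum(axis=0, keepdims=True)   # [[55, 45]]
grand_total = observed.sum()                       # 100

# E_ij = (row total * column total) / grand total
expected = row_totals * col_totals / grand_total   # 27.5 and 22.5 in each row

# Chi-square statistic: sum of (O - E)^2 / E over all four cells
chi2_stat = ((observed - expected) ** 2 / expected).sum()

print(expected)
print(round(chi2_stat, 4))   # about 9.09
```

Every expected count here is well above 5, which is the usual rule of thumb for the chi-square approximation to be reasonable.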


🎓 Degrees of Freedom in Chi-Square Tests

To interpret a chi-square test, we need the degrees of freedom (df). This value determines the shape of the chi-square distribution used to calculate the p-value.

There are three common ways to calculate degrees of freedom depending on the context.


1. Contingency Table (Test of Independence)

Formula:

\[
df = (r - 1) \times (c - 1)
\]

  • r = number of rows (e.g., Home, Away)
  • c = number of columns (e.g., Red Card, No Red Card)

Football Example:

For the 2x2 table above:

\[
df = (2 - 1) \times (2 - 1) = 1
\]
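
If SciPy is available, the same degrees of freedom can be read off `scipy.stats.chi2_contingency`. This is a sketch rather than the only way to run the test. Note that SciPy applies Yates' continuity correction to 2x2 tables by default, so `correction=False` is passed here to match the plain formula.

```python
import numpy as np
from scipy.stats import chi2_contingency

# The 2x2 red card table: rows = Home, Away; columns = Red Card, No Red Card
table = np.array([[20, 30],
                  [35, 15]])

# correction=False disables Yates' continuity correction for 2x2 tables
chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=False)

print(dof)                                   # (2 - 1) * (2 - 1) = 1
print(round(chi2_stat, 2), round(p_value, 4))
```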


2. Goodness-of-Fit Test

This test checks whether an observed frequency distribution matches an expected one. It is often used when analyzing goal distribution patterns or shot attempts across zones.

Formula:

\[
df = k - 1
\]

  • k = number of categories (e.g., zones on the pitch: left, center, right)

Football Example:

Suppose you're testing shot distribution from 3 zones:

  • Left wing
  • Center
  • Right wing

Then:

\[
df = 3 - 1 = 2
\]
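
Here is a small sketch of that goodness-of-fit test using `scipy.stats.chisquare`. The shot counts are made up for illustration, and the null hypothesis (shots spread evenly across the three zones) is an assumption chosen for the example.

```python
from scipy.stats import chisquare

# Hypothetical shot counts per zone (illustrative numbers, not real data)
zones = ["Left wing", "Center", "Right wing"]
observed = [30, 50, 40]

# Assumed null hypothesis: shots are spread evenly across the zones
total = sum(observed)
expected = [total / len(zones)] * len(zones)   # 40 shots per zone

result = chisquare(f_obs=observed, f_exp=expected)
dof = len(zones) - 1                           # k - 1 = 2

print(dof, result.statistic, round(result.pvalue, 4))
```

With these numbers the statistic is 5.0 on 2 degrees of freedom, and the p-value is above 0.05, so the even-spread hypothesis would not be rejected.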


3. Adjusted Degrees of Freedom with Estimated Parameters

If you estimate parameters of the expected distribution (e.g., its mean or variance) from the same data before applying the test, you subtract the number of estimated parameters from the degrees of freedom.

Formula:

\[
df = k - 1 - p
\]

  • p = number of parameters estimated from the data

Football Example:

You’re testing whether shot conversions follow a particular type of distribution, but the distribution's mean (the shot conversion rate) is not known in advance, so you estimate it from your data.

If you had 4 zones and 1 parameter estimated:
\[
df = 4 - 1 - 1 = 2
\]
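
To make the adjustment concrete, here is a hedged sketch that swaps in a different scenario from the zones example: goals per match tested against a Poisson distribution whose mean is estimated from the data. The counts are invented, and the Poisson choice is an assumption for illustration. It keeps the same shape as above (4 categories, 1 estimated parameter), and the `ddof` argument of `scipy.stats.chisquare` handles the subtraction.

```python
import numpy as np
from scipy.stats import chisquare, poisson

# Hypothetical data: number of matches with 0, 1, 2, 3, 4, 5 goals scored
goals = np.arange(6)
matches = np.array([22, 35, 25, 12, 4, 2])
n = matches.sum()

# One parameter estimated from the data: the Poisson mean (p = 1)
lam = (goals * matches).sum() / n

# Merge 3, 4 and 5 goals into a single "3+" category so that k = 4
observed = np.array([matches[0], matches[1], matches[2], matches[3:].sum()])
probs = np.array([poisson.pmf(0, lam), poisson.pmf(1, lam),
                  poisson.pmf(2, lam), poisson.sf(2, lam)])  # sf(2) = P(X >= 3)
expected = n * probs

# ddof=1 makes SciPy use k - 1 - 1 degrees of freedom for the p-value
stat, p_value = chisquare(observed, expected, ddof=1)
dof = len(observed) - 1 - 1

print(dof, round(lam, 2))
```

Merging the sparse high-scoring categories also keeps every expected count above 5.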


⚠️ Interpreting the Result

Once you calculate your chi-square statistic and degrees of freedom:

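  • Use a chi-square distribution table or SciPy's scipy.stats.chi2.sf(statistic, df) to get the p-value.
  • If p < 0.05 (the conventional 5% significance level), reject the null hypothesis. For a test of independence, that means the data give evidence of a relationship; for a goodness-of-fit test, that the observed distribution differs from the expected one.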
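
The lookup and decision can be sketched in a few lines. The helper name `chi_square_p_value` is hypothetical, and the statistic of 5.0 with 2 degrees of freedom is just an illustrative input.

```python
from scipy.stats import chi2

def chi_square_p_value(statistic, dof):
    """Upper-tail probability of a chi-square distribution (hypothetical helper)."""
    return chi2.sf(statistic, dof)

alpha = 0.05                              # conventional significance level
p_value = chi_square_p_value(5.0, 2)      # illustrative statistic and df
reject_null = p_value < alpha

print(round(p_value, 4), reject_null)
```

Here the p-value is about 0.082, so the null hypothesis is not rejected at the 5% level.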

🧠 Final Whistle: Key Takeaways

  • Chi-Square Tests suit football questions about counts in categories, such as home vs. away, win/loss, or whether a red card was shown.
  • The degrees of freedom depend on the number of categories and whether you're estimating parameters.
  • Choose the correct formula based on your test type:
    • Independence: \((r - 1)(c - 1)\)
    • Goodness-of-fit: \(k - 1\)
    • Adjusted: \(k - 1 - p\)
