In experimental research and A/B testing, analysts frequently compare two independent groups on a binary outcome such as success/failure, conversion/no conversion, or alive/dead. The situation has three ingredients:
- Two independent groups (for example treatment and control)
- A binary outcome variable (for example death / survival)
- The comparison of proportions between groups
The data naturally form a 2×2 contingency table, and the central inferential question is whether any observed difference in proportions reflects a real effect or mere chance.
Two of the most prominent methods for answering this question are Fisher's Exact Test and the Chi-Square Test of Independence. While both tests address the same core hypothesis, they are founded on different statistical philosophies and assumptions. This article provides a practical guide to navigating this choice. We will demystify the underlying principles of each test, clarify their assumptions, and address common misconceptions.
To illustrate the practical application and differences between these tests, we will use a clinical trial example throughout this guide. Imagine a study comparing a new treatment to a control with a one-sided hypothesis (example adapted from MITx 6.419x):
| | Treatment | Control |
|---|---|---|
| Death | 39 | 63 |
| Survive | 30,961 | 30,937 |
| Total | 31,000 | 31,000 |
The observed mortality rates are:
- Treatment: $\hat{p}_T = 39 / 31{,}000 \approx 0.00126$ (0.126%)
- Control: $\hat{p}_C = 63 / 31{,}000 \approx 0.00203$ (0.203%)
- Risk difference: $\hat{p}_T - \hat{p}_C \approx -0.00077$ (about 0.077 percentage points lower in the treatment group)
Although the treatment group shows a lower mortality rate, we must employ statistical testing to determine whether this difference is statistically significant or likely to have occurred by chance under the null hypothesis.
Fisher's Exact Test
Fisher's exact test is a non-parametric method that calculates the exact probability of observing a table as extreme as, or more extreme than, the one observed, given the fixed marginal totals. It is ideal for small sample sizes or rare events, because it does not rely on large-sample approximations.
The hypergeometric probability of observing exactly $t$ events in the treatment group is

$$P(X = t) = \frac{\binom{n_T}{t}\binom{n_C}{k - t}}{\binom{n}{k}}$$

where $n_T$ and $n_C$ are the treatment and control sample sizes, $k$ is the total number of events, and $n = n_T + n_C$.
For one-sided testing of

$$H_0: p_T = p_C \quad \text{versus} \quad H_1: p_T < p_C,$$

we sum hypergeometric probabilities for tables with treatment-group counts less than or equal to the observed count.
```python
from scipy.stats import fisher_exact

# Table format: [[Deaths_T, Survived_T], [Deaths_C, Survived_C]]
table = [[39, 30961],
         [63, 30937]]

odds_ratio, p_value_fisher = fisher_exact(table, alternative='less')
print("Fisher's exact test (one-sided, p_T < p_C):")
print("  Odds ratio:", odds_ratio)
print("  p-value:", p_value_fisher)
```
Assumption: Fisher's test assumes that both row and column margins are fixed. Under this model the total number of events and the group sizes are treated as fixed, and the probability of the observed cell counts follows a hypergeometric distribution. Fisher’s test gives exact p-values under this conditioning assumption.
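Because Fisher's one-sided p-value is just a hypergeometric tail probability, it can be cross-checked directly against the hypergeometric CDF. The sketch below (using `scipy.stats.hypergeom`; variable names are illustrative) computes $P(X \le 39)$ under the fixed-margins model and compares it with `fisher_exact`:

```python
from scipy.stats import fisher_exact, hypergeom

deaths_t, deaths_c = 39, 63
n_t = n_c = 31_000
k = deaths_t + deaths_c   # total events (column margin)
n = n_t + n_c             # total sample size

# Under fixed margins, the treatment-group death count follows a
# hypergeometric distribution: population n, k "successes", sample n_t.
# One-sided p-value = P(X <= observed count).
p_hyper = hypergeom.cdf(deaths_t, n, k, n_t)

_, p_fisher = fisher_exact([[deaths_t, n_t - deaths_t],
                            [deaths_c, n_c - deaths_c]],
                           alternative='less')

print("Hypergeometric tail:", p_hyper)
print("fisher_exact:       ", p_fisher)
```

The two values agree, which makes the conditioning assumption concrete: Fisher's test is exactly a tail sum of the hypergeometric distribution.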
Barnard's test
Barnard's test treats each group as an independent binomial experiment and constructs an exact test without conditioning on the column margin. In many small-sample settings it is more powerful than Fisher's test because it uses the larger sample space of possible outcomes.
Assumption: Barnard's test models each group as an independent binomial with probabilities $p_T$ and $p_C$. It does not condition on the column margin (total events). As a result, Barnard's test is unconditional and often has higher power than Fisher's in small samples, because it does not restrict attention to tables with fixed column totals.
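SciPy (1.7+) provides `scipy.stats.barnard_exact`. Because the test enumerates the full unconditional sample space, it is only practical for small counts, so the sketch below uses a small hypothetical table rather than the 31,000-per-arm trial from our running example:

```python
from scipy.stats import barnard_exact, fisher_exact

# Small hypothetical 2x2 table: [[success_A, failure_A],
#                                [success_B, failure_B]]
# (Barnard's test enumerates all tables with fixed row totals,
#  so it is computationally feasible only for small samples.)
table = [[7, 3],
         [2, 8]]

res = barnard_exact(table, alternative='two-sided')
_, p_fisher = fisher_exact(table, alternative='two-sided')

print("Barnard p-value:", res.pvalue)
print("Fisher  p-value:", p_fisher)
```

In small-sample comparisons like this, Barnard's p-value is typically smaller than Fisher's, reflecting its higher power.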
The Chi-Square Test of Independence and the pooled Z-Test
The Pearson chi-square test and the pooled two-proportion z-test are asymptotic methods that test the hypothesis of equal proportions. For a 2×2 table they are mathematically equivalent: the chi-square statistic equals the square of the z-statistic.
Pooled proportion

$$\hat{p} = \frac{x_T + x_C}{n_T + n_C}$$

Pooled standard error

$$SE = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_T} + \frac{1}{n_C}\right)}$$

Z-statistic for the difference in proportions (treatment minus control)

$$z = \frac{\hat{p}_T - \hat{p}_C}{SE}$$
Note on one-sided versus two-sided p-values
The pooled z-test can produce a one-sided p-value directly from the z-statistic.
The chi-square routine typically returns a two-sided p-value based on the chi-square upper tail. When the observed effect is in the direction of the one-sided alternative (for example, a negative z-statistic when testing $H_1: p_T < p_C$), the one-sided p-value equals half the chi-square two-sided p-value.
- Treat counts as approximately normal via the central limit theorem when expected counts are sufficiently large. The pooled z-test uses a pooled variance estimate under the null. The chi-square compares observed to expected counts under independence.
- These methods are fast and accurate for moderate to large samples, but they are approximations. The usual rule of thumb is expected counts at least 5, but this is a heuristic. For rare events or very skewed margins, check approximations against exact methods or simulation.
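The z–chi-square equivalence and the p-value halving can be checked numerically. This sketch computes the pooled z-test by hand from the formulas above and compares it with `scipy.stats.chi2_contingency` (continuity correction disabled, since the identity $\chi^2 = z^2$ holds only without it):

```python
import numpy as np
from scipy.stats import norm, chi2_contingency

deaths_t, deaths_c = 39, 63
n_t = n_c = 31_000

# Pooled z-test, computed from the formulas above
p_t, p_c = deaths_t / n_t, deaths_c / n_c
p_pool = (deaths_t + deaths_c) / (n_t + n_c)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
z = (p_t - p_c) / se
p_one_sided = norm.cdf(z)  # one-sided p-value for H1: p_T < p_C

# Pearson chi-square on the same table, without continuity correction
table = [[deaths_t, n_t - deaths_t],
         [deaths_c, n_c - deaths_c]]
chi2, p_two_sided, dof, expected = chi2_contingency(table, correction=False)

print("z:", z, " z^2:", z**2, " chi2:", chi2)
print("one-sided z p-value:", p_one_sided)
print("chi-square p / 2:   ", p_two_sided / 2)
```

Since the observed z is negative (matching the alternative $p_T < p_C$), the one-sided z p-value coincides with half the chi-square two-sided p-value, as claimed in the note above.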