DEV Community

julie


I’m still hung up on p-value

If you are reading this, you are probably in the same boat as me: overwhelmed by all the statistical jargon. Why are there so many lettered terms (p-value, z-score, z-test, f-test, t-test, etc.)?? For now, we will just focus on the p-value.

First, let’s go over some definitions:

A null hypothesis is the idea we want to discredit. We assume it is true and then check whether the data are inconsistent with it, much like a proof by contradiction. Null means nothing, so the null hypothesis says there is no effect.

Example: There is no difference in test scores between students who studied and students who did not.

An alternative hypothesis is the contradiction of the null hypothesis. It's usually the hypothesis we want to show.

Example: Studying for an exam will increase your test score vs. not studying.

The p in p-value stands for probability. A p-value is the probability of getting data at least as “extreme” as what we observed, given that the null hypothesis is true. So, we would want to aim for a low p-value since we are trying to show the null hypothesis is false.

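In the case of the example above, let’s say the p-value = 0.002 with an alpha value of 0.05. Here we are saying that if studying truly made no difference, there would be only a 0.2% chance of seeing a difference in test scores at least as large as the one we observed. Since 0.002 is below 0.05, we reject the null hypothesis.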

Note: p-value does not tell us the probability that the null hypothesis is true or false! When we calculate our p-value, we have already assumed that our null is true.
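
To make the definition concrete, here is a minimal sketch of one way to estimate a right-sided p-value for the studying example, using a permutation test. This is not necessarily how the p-value of 0.002 above was obtained; the scores, the function name `permutation_p_value`, and its parameters are all made up for illustration. The idea: if the null is true, the "studied" and "did not study" labels are interchangeable, so we shuffle the labels many times and count how often the shuffled difference is at least as large as the observed one.

```python
import random

def permutation_p_value(studied, not_studied, n_shuffles=10_000, seed=0):
    """Estimate P(difference >= observed difference | null is true)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n = len(studied)
    observed = sum(studied) / n - sum(not_studied) / len(not_studied)

    pooled = list(studied) + list(not_studied)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under the null, group labels carry no information, so shuffle them.
        rng.shuffle(pooled)
        group_a, group_b = pooled[:n], pooled[n:]
        diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical test scores, invented for this example.
studied = [82, 88, 91, 79, 85, 90, 87, 84]
not_studied = [75, 80, 72, 78, 83, 70, 77, 74]

p_value = permutation_p_value(studied, not_studied)
print(f"estimated p-value: {p_value:.4f}")
```

Notice that the whole calculation happens in a world where the null is assumed true, which is why the result cannot be read as the probability that the null itself is true.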

Image description

This illustrates a right-sided p-value, where T is the test statistic and T_o is its observed value. [p-value = P(T ≥ T_o | null is true)]

Note: P-values are often two-sided. This allows us to reject the null hypothesis if our observed value deviates significantly from what the null predicts, whether it's higher or lower.
[For two-sided: p-value = P(|T| ≥ |T_o| | null is true)]
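
The two formulas above can be sketched in a few lines of Python. This assumes the test statistic T follows a standard normal distribution when the null is true (as in a z-test); other tests use other distributions. The function names and the observed value `t_o = 2.1` are hypothetical.

```python
from statistics import NormalDist

def right_sided_p(t_obs):
    """P(T >= t_obs | null is true), assuming T ~ N(0, 1) under the null."""
    return 1 - NormalDist().cdf(t_obs)

def two_sided_p(t_obs):
    """P(|T| >= |t_obs| | null is true), assuming T ~ N(0, 1) under the null."""
    return 2 * (1 - NormalDist().cdf(abs(t_obs)))

t_o = 2.1  # hypothetical observed test statistic
print(f"right-sided p-value: {right_sided_p(t_o):.4f}")
print(f"two-sided p-value:   {two_sided_p(t_o):.4f}")
```

For a positive observed value, the two-sided p-value is exactly double the right-sided one here, because the normal curve is symmetric and both tails count as "extreme".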

Alpha Value
In null hypothesis significance testing, p-values need a cutoff to limit the risk of drawing a false conclusion. This is known as the significance level, aka the alpha value. Generally, we use an alpha value of 0.05.

A p-value less than the alpha value is sufficient evidence to allow us to “reject” the idea that the null hypothesis is true. When we reject the null hypothesis, we consider our result to be statistically significant. Statistical significance means the result would be unlikely to come from random chance/luck in choosing the sample alone if the null were true.

For example, alpha value = 0.05 means that when the null is actually true, there is a 5% chance of wrongly rejecting it.
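
One way to see what alpha means is to simulate it. The sketch below is an illustration under stated assumptions: it repeatedly draws samples from a world where the null really is true (mean 0, known standard deviation 1), runs a two-sided z-test on each, and counts how often we reject. The function name `false_positive_rate` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist

def false_positive_rate(alpha=0.05, n=30, trials=5000, seed=1):
    """Fraction of tests that reject the null even though the null is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # The null is true by construction: every sample has true mean 0.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5  # z-statistic with known sigma = 1
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / trials

print(f"share of false rejections: {false_positive_rate():.3f}")
```

The printed share should land close to the chosen alpha, since alpha is by definition the rejection rate when the null holds.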

How to select the alpha value:
An alpha value is a probability, so it lies between 0 and 1.

It is just an arbitrary number but there are tradeoffs (so choose wisely):

  • If we select an alpha value that is too low, there is a higher risk of failing to reject the null hypothesis even though it is false.
  • If the alpha value is too high, there is a higher risk of rejecting the null hypothesis even though it is true.

Image description

This illustrates a type I error and a type II error.

Type I Error (aka false positive): Mistakenly rejecting the true null hypothesis, and accepting the alternative when it is false. We think we have detected an effect, but there isn’t one. The associated probability for this is alpha.

Type II Error (aka false negative): Mistakenly failing to reject a false null hypothesis when the alternative is true. There was an effect, but we didn’t see it. The associated probability for this is beta.
Beta is related to something called power. Power = 1 - beta. It's the probability of correctly rejecting a false null hypothesis.
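
Here is a rough sketch of how power and beta can be computed, and how they move with alpha. It assumes a right-sided z-test with a known standard deviation; the effect size of 0.5, the sample size of 20, and the function name `power_one_sided_z` are all made up for illustration.

```python
from statistics import NormalDist

def power_one_sided_z(effect, n, alpha=0.05, sigma=1.0):
    """Probability of rejecting the null when the true mean shift is `effect`."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)  # reject when z >= critical
    shift = effect * n ** 0.5 / sigma         # mean of z when the effect is real
    return 1 - std_normal.cdf(critical - shift)

for alpha in (0.10, 0.05, 0.01):
    power = power_one_sided_z(effect=0.5, n=20, alpha=alpha)
    print(f"alpha={alpha:.2f}  power={power:.3f}  beta={1 - power:.3f}")
```

Running this shows the tradeoff described earlier: lowering alpha protects against type I errors but raises beta, the chance of a type II error.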

Interpreting the p-value:

  • When p-value < alpha value: Reject the null hypothesis in favor of the alternative hypothesis
  • When p-value >= alpha value: Fail to reject the null hypothesis
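
The decision rule is simple enough to write as a tiny helper. This is just a sketch; the function name `decide` is hypothetical. Note that the wording of the second outcome is "fail to reject", never "accept".

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject only when p-value < alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p-value must be between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.002))  # the studying example: 0.002 < 0.05
print(decide(0.05))   # a p-value equal to alpha does not reject
```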

Is it possible to ever accept the null hypothesis?
There are only two options: either reject or fail to reject. This is because you can’t prove the null is true. Failing to reject doesn’t mean that there isn’t an effect or relationship. We just did not have enough evidence to conclude there is one. The absence of evidence IS NOT the evidence of absence.

Source(s):
First image
Second image
