P-Value (Significance Value)

#pvalue #significancevalue #statistics #datascience

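The p-value is a probability between 0 and 1 that quantifies how compatible our observed data are with the Null Hypothesis. More precisely, it is the probability of observing a result at least as extreme as the one we got, assuming the Null Hypothesis is true. The smaller the value, the stronger the evidence against the Null Hypothesis. A large value means the data are consistent with the Null Hypothesis, not that the Null Hypothesis has been proven true.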

Explanation

Say we know the mean traffic coming to our website. We then make some changes and want to know whether the mean traffic has changed. We start by establishing the Null Hypothesis and the Alternative Hypothesis. The Null Hypothesis is the default assumption, and the Alternative Hypothesis is what we want to find evidence for. In this case, the Null Hypothesis is that the average traffic doesn't change, and the Alternative Hypothesis is that the average traffic to the website increases.

First, we decide the significance level. Then, to check whether the Alternative Hypothesis holds, we take a sample and calculate its mean traffic. Next, we calculate the probability of getting a sample mean at least that extreme given that the Null Hypothesis is true; this is nothing but the p-value. If it's less than the significance level we reject the Null Hypothesis; if not, we fail to reject it. Rejecting the Null Hypothesis is the same as saying that the sample mean lies too far from the mean expected under the Null Hypothesis to be put down to chance. That distance is usually measured as a z-score (the number of standard errors from the mean, for a normal distribution); for example, a z-score of 3 is far enough out that such a result would be very rare if the Null Hypothesis were true.
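The steps above can be sketched in Python as a one-sided z-test. This is a minimal sketch, not a definitive implementation: the baseline mean, the daily visit counts, and the function name `one_sided_p_value` are made up for illustration, and the normal approximation is a simplification (for a sample this small, a t-test would usually be the safer choice).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sided_p_value(sample, null_mean):
    """Return (z, p) for the alternative 'the true mean is larger than null_mean'.

    p is the probability of a sample mean at least this large if the
    Null Hypothesis is true, using a normal approximation.
    """
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = (mean(sample) - null_mean) / standard_error
    p_value = 1.0 - NormalDist().cdf(z)  # upper-tail area beyond z
    return z, p_value


# Hypothetical numbers, assumed for illustration only.
baseline_mean = 1000.0  # average daily visits before the change
daily_visits = [1040, 990, 1075, 1010, 1060, 1025, 980, 1055, 1030, 1045,
                1000, 1070, 1015, 1050, 1035, 995, 1065, 1020, 1048, 1032]
alpha = 0.05  # significance level, chosen before looking at the data

z, p = one_sided_p_value(daily_visits, baseline_mean)
decision = "reject" if p < alpha else "fail to reject"
print(f"z = {z:.2f}, p-value = {p:.4g}: {decision} the Null Hypothesis")
```

Note that the significance level is fixed before the p-value is computed, which matches the order of the steps described above.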

Interpreting P-value

The closer the p-value is to zero, the stronger the evidence against the Null Hypothesis and in favour of the Alternative Hypothesis.

If the p-value is less than the significance level, which is usually 0.05, we say that the observed result is so far from the mean expected under the Null Hypothesis, and so extreme, that we reject the Null Hypothesis.

Here, the grey shaded area represents the rejection region. If the observed value falls in this area, its p-value is below the significance level, say 0.05, which is equivalent to saying that we reject the Null Hypothesis.
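To make the rejection region concrete, the sketch below computes where it starts on the z-score axis for a few significance levels. It assumes a one-sided (upper-tail) test on a standard normal distribution; the helper names `critical_z` and `upper_tail_area` are only illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def critical_z(alpha):
    """z-score where the upper-tail rejection region of area alpha begins."""
    return standard_normal.inv_cdf(1.0 - alpha)


def upper_tail_area(z):
    """One-sided p-value for an observed z-score."""
    return 1.0 - standard_normal.cdf(z)


for alpha in (0.20, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: reject when z > {critical_z(alpha):.3f}")
```

Saying "the p-value is below 0.05" and saying "the z-score is beyond the cutoff for 0.05" are two ways of describing the same shaded tail.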

The p-value doesn't tell us how large the effect is or how far our value is from the true value. It only tells us how surprising our result would be if the Null Hypothesis were true.
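A small sketch of why the p-value is not a measure of how big the change is: the same observed increase can give very different p-values depending on the sample size. The numbers (an increase of 5 visits, a standard deviation of 50) and the function name `p_value_for_increase` are assumptions for illustration, and a one-sided z-test with known standard deviation is assumed.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_increase(observed_increase, std_dev, n):
    """One-sided p-value for an observed increase in the sample mean."""
    standard_error = std_dev / sqrt(n)
    z = observed_increase / standard_error
    return 1.0 - NormalDist().cdf(z)


# Same observed increase, different sample sizes (hypothetical values).
p_small_sample = p_value_for_increase(observed_increase=5, std_dev=50, n=25)
p_large_sample = p_value_for_increase(observed_increase=5, std_dev=50, n=2500)

print(f"n = 25:   p-value = {p_small_sample:.4f}")
print(f"n = 2500: p-value = {p_large_sample:.2e}")
```

The increase is 5 visits in both cases; only the amount of data differs, so the p-value alone cannot be read as the size of the effect.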

Significance Level

Usually, the value of the significance level is 0.05. However, a domain expert may suggest a different value. It is also known as the decision threshold.

If the problem is not that sensitive and we can tolerate a greater number of false positives, we can use a larger significance level, such as 0.20. Similarly, for a sensitive problem such as predicting cancer, we will try to use a smaller significance level, such as 0.01.

Rejecting a Null Hypothesis at the 0.01 level means that there is less than a 1 in 100 chance of observing a result this extreme if the Null Hypothesis were true.
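The link between the significance level and false positives can be checked with a rough simulation: generate many samples for which the Null Hypothesis really is true and count how often it gets rejected anyway. This is a sketch under simplifying assumptions (standard normal data with known standard deviation 1, a one-sided z-test, a fixed random seed); the function name `false_positive_rate` and its default parameters are made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rate(alpha, trials=2000, n=30, seed=0):
    """Fraction of trials that reject a Null Hypothesis that is actually true."""
    rng = random.Random(seed)
    standard_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Data drawn with true mean 0, so the Null Hypothesis (mean = 0) holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n)  # known standard deviation of 1
        p_value = 1.0 - standard_normal.cdf(z)
        if p_value < alpha:
            rejections += 1
    return rejections / trials


for alpha in (0.20, 0.05, 0.01):
    rate = false_positive_rate(alpha)
    print(f"alpha = {alpha:.2f}: rejected a true Null Hypothesis in {rate:.1%} of trials")
```

Under these assumptions, the fraction of wrong rejections should land close to the chosen significance level, which is why a stricter level such as 0.01 suits sensitive problems.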