Statistics Challenge for Data Scientists
Hypothesis testing sounds scary, but it’s basically a mathematical way of asking:
“Is this thing really happening, or is it just random chance?”
You assume something is true → test it with sample data → decide if evidence is strong enough to reject it.
What is Hypothesis Testing?
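Think of it like a court case: H0 is the defendant, presumed innocent until the evidence (your data) says otherwise. The key terms: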
| Term | Meaning (Simple) |
|---|---|
| Null Hypothesis (H0) | Default assumption. “Nothing has changed.” |
| Alternative Hypothesis (H1) | Opposite claim. “Something has changed.” |
| p-value | Probability of seeing a result at least as extreme as yours if H0 were true. |
| Significance Level (α) | Cutoff chosen before the test (usually 0.05). If p < α → reject H0. |
| Test Statistic | A number calculated from data to judge the claim. |
Why Do We Use Hypothesis Testing?
You usually cannot measure an entire population. So you take a sample and check whether the pattern in the sample is strong enough to say something about the population.
Examples:
- Does a new medicine work better than the old one?
- Is the average salary different in two cities?
- Is customer churn related to subscription type?
- Are two features correlated?
👉 Today’s Focus: T-Test and Chi-Square Test
1️⃣ T-Test (Also called Student’s t-test)
What does it check?
It checks whether the means (averages) of two groups are different.
When do we use it?
Use the t-test when:
- The variable you are comparing is numerical
- The sample is small (rule of thumb: < 30); it also works for larger samples
- Population variance is unknown
Example (super simple)
You want to test if the average marks of two groups are different:
- Students in Class A
- Students in Class B
Use a t-test.
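As a rough sketch, the Class A vs Class B comparison could be run in Python with SciPy's `scipy.stats.ttest_ind`. The marks below are made-up numbers for illustration only, and `equal_var=False` (Welch's version of the t-test, which does not assume both classes have the same spread) is a choice made here, not something the example requires.

```python
from scipy import stats

# Hypothetical marks, invented purely for illustration
class_a = [72, 85, 78, 90, 66, 81, 77, 88]
class_b = [65, 70, 74, 68, 80, 62, 71, 69]

# Two-sample t-test; equal_var=False selects Welch's t-test
t_stat, p_value = stats.ttest_ind(class_a, class_b, equal_var=False)

alpha = 0.05  # significance level
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the class averages differ significantly")
else:
    print("Fail to reject H0: not enough evidence of a difference")
```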
What does T-test output mean?
If p < 0.05 → the difference is statistically significant (unlikely to be just chance).
If p ≥ 0.05 → not enough evidence of a difference; what you see could be due to chance.
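For intuition about where that p-value comes from: it is derived from a test statistic. Below is a minimal sketch of one common form, Welch's t statistic, using only Python's standard library. The function name `welch_t` and the sample marks are hypothetical, chosen for illustration.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1)
    standard_error = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_diff / standard_error

# Hypothetical marks, invented for illustration
class_a = [72, 85, 78, 90, 66, 81, 77, 88]
class_b = [65, 70, 74, 68, 80, 62, 71, 69]
print(f"t = {welch_t(class_a, class_b):.3f}")
```

The further t is from 0, the smaller the p-value: a big gap between the averages relative to the noise in the data is harder to explain by chance.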
2️⃣ Chi-Square (χ²) Test
What does it check?
It checks if two categorical variables are related.
Examples of categorical variables:
- Gender (Male/Female)
- Payment mode (UPI/Card/Cash)
- Pass/Fail
- Yes/No
When do you use Chi-square?
Use it when:
- Both variables are categories
- You want to test independence (“Are these two things connected or completely unrelated?”)
Example
You want to know if gender affects shopping preference.
| Gender | Likes Online | Likes Offline |
|---|---|---|
| Male | 35 | 25 |
| Female | 40 | 20 |
Use Chi-square.
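One way to run this test on the table above is SciPy's `scipy.stats.chi2_contingency`. This is a minimal sketch; the variable names are illustrative, and the counts are the ones from the example table.

```python
from scipy.stats import chi2_contingency

# Rows: Male, Female; columns: Likes Online, Likes Offline
observed = [[35, 25],
            [40, 20]]

# Returns the statistic, p-value, degrees of freedom, and the counts
# expected if gender and preference were independent.
# For 2x2 tables SciPy applies Yates' continuity correction by default.
chi2, p_value, dof, expected = chi2_contingency(observed)

print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")
print(expected)
```

For these particular counts the p-value comes out well above 0.05, so this table alone does not give evidence that gender and preference are related.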
Interpretation
- If p < 0.05 → the two variables are dependent (related). Example: Gender and shopping preference are associated.
- If p ≥ 0.05 → no evidence of a relationship; you cannot reject independence.
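To make "independent" concrete: the chi-square test compares the observed counts with the counts you would expect if the two variables were unrelated. Here is a minimal standard-library sketch for the gender table from the example. It computes the plain (uncorrected) Pearson statistic, and the `erfc` shortcut for the p-value is valid only for 1 degree of freedom, as in a 2x2 table. Variable names are illustrative.

```python
import math

observed = [[35, 25],   # Male:   online, offline
            [40, 20]]   # Female: online, offline

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence = row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(expected)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Libraries may report a slightly different statistic for 2x2 tables because some, SciPy included, apply a continuity correction by default.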
Summary Table (Easy to remember)
| Test | Use Case | Data Type | What It Checks |
|---|---|---|---|
| T-Test | Compare 2 groups’ means | Numerical | Difference in averages |
| Chi-Square | Check relation between categories | Categorical | Dependency / independence |
🧡 A Simple Visual View (Mental Model)
T-Test
Imagine two classrooms took the same exam.
You compare their average marks and ask:
“Is one class truly scoring higher, or is the difference just chance?”
Chi-Square
Imagine men and women choosing between online and offline shopping.
You ask:
“Is the choice different because of gender, or is it unrelated?”
Final Word
Hypothesis testing is not about proving you are right.
It is about checking whether the data strongly disagrees with the default assumption (H0).
If the disagreement is strong → H0 gets rejected.
I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!
Connect on LinkedIn: https://www.linkedin.com/in/chanchalsingh22/
Connect on YouTube: https://www.youtube.com/@Brains_Behind_Bots


