<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chanchal Singh</title>
    <description>The latest articles on DEV Community by Chanchal Singh (@brains_behind_bots).</description>
    <link>https://dev.to/brains_behind_bots</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3132277%2F81626fba-8d4f-4b07-bae4-cc27f3ff31ac.jpg</url>
      <title>DEV Community: Chanchal Singh</title>
      <link>https://dev.to/brains_behind_bots</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brains_behind_bots"/>
    <language>en</language>
    <item>
      <title>Day 5 : Is Your Model Actually Good? - Evaluation Metrics</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Thu, 22 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-5-is-your-model-actually-good-evaluation-metrics-1bm7</link>
      <guid>https://dev.to/brains_behind_bots/day-5-is-your-model-actually-good-evaluation-metrics-1bm7</guid>
      <description>&lt;p&gt;You prepare for an exam.&lt;/p&gt;

&lt;p&gt;You give a mock test.&lt;br&gt;
You get &lt;strong&gt;72 marks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now the real question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Did I pass or fail?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The real question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“How good is 72?”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Is it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better than before?&lt;/li&gt;
&lt;li&gt;Good enough?&lt;/li&gt;
&lt;li&gt;Just lucky?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s exactly what &lt;strong&gt;model evaluation&lt;/strong&gt; is about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why We Need Evaluation
&lt;/h2&gt;

&lt;p&gt;A model can always give predictions.&lt;/p&gt;

&lt;p&gt;But prediction alone means nothing.&lt;/p&gt;

&lt;p&gt;We must ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can I trust this model?&lt;/li&gt;
&lt;li&gt;Will it work on new data?&lt;/li&gt;
&lt;li&gt;Is it learning patterns or memorizing data?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Evaluation answers these questions.&lt;/p&gt;




&lt;h2&gt;
  
  
  R-squared (R²): The Most Popular Metric
&lt;/h2&gt;

&lt;p&gt;Imagine this.&lt;/p&gt;

&lt;p&gt;You’re trying to predict &lt;strong&gt;house prices&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Before using ML, your best guess is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“All houses cost around ₹50 lakh.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s your &lt;strong&gt;baseline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now your model predicts different prices for different houses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumc6e5qlbrkyhzzi9kk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumc6e5qlbrkyhzzi9kk0.png" alt="R square visualization" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;R² asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How much better is your model compared to this dumb guess?”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  R² in simple words
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;R² tells you how much of the problem your model explains.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwj50qj54d0k9k1p855w7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwj50qj54d0k9k1p855w7.png" alt="R² demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;R² = 0.80 → the model explains 80% of the pattern&lt;/li&gt;
&lt;li&gt;R² = 0.20 → the model explains very little&lt;/li&gt;
&lt;li&gt;R² = 1 → perfect (rare, and usually suspicious)&lt;/li&gt;
&lt;/ul&gt;
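&lt;p&gt;Here’s that idea as a tiny Python sketch (the prices and predictions are invented for illustration):&lt;/p&gt;

```python
# R² compares your model against the "dumb guess" (always predicting the mean)
actual    = [50, 60, 70, 80]   # true house prices, in lakh (made-up data)
predicted = [52, 58, 71, 79]   # your model's predictions (made-up data)

mean_price = sum(actual) / len(actual)

# How wrong the dumb guess is, as total squared error
ss_total = sum((a - mean_price) ** 2 for a in actual)

# How wrong your model is
ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))   # 0.98 -- the model explains almost all the variation
```

&lt;p&gt;A value near 1 means the model beats the baseline by a lot; a value near 0 means it barely helps.&lt;/p&gt;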




&lt;h2&gt;
  
  
  Important Truth About R²
&lt;/h2&gt;

&lt;p&gt;A high R² does &lt;strong&gt;not always mean a good model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It can overfit&lt;/li&gt;
&lt;li&gt;It can memorize&lt;/li&gt;
&lt;li&gt;It can fail on new data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why we never trust R² alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Residuals: Listening to the Model’s Mistakes
&lt;/h2&gt;

&lt;p&gt;Residual = actual value − predicted value.&lt;/p&gt;

&lt;p&gt;Think of residuals as the &lt;strong&gt;model’s complaints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If residuals look:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random → model is healthy&lt;/li&gt;
&lt;li&gt;Patterned → model is missing something&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Residual plots help us see:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Is the model behaving logically?”&lt;/p&gt;
&lt;/blockquote&gt;
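&lt;p&gt;In code, residuals are just one subtraction per data point (the numbers below are made up):&lt;/p&gt;

```python
# Residual = actual value minus predicted value, one per prediction
actual    = [100, 120, 140, 160]
predicted = [98, 123, 138, 161]

residuals = [a - p for a, p in zip(actual, predicted)]
print(residuals)   # [2, -3, 2, -1] -- scattered around zero, no pattern: healthy
```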




&lt;h2&gt;
  
  
  Standard Error (SE): How Confident Is the Model?
&lt;/h2&gt;

&lt;p&gt;Imagine two friends predicting house prices.&lt;/p&gt;

&lt;p&gt;Friend A:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Usually wrong by ₹5,000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Friend B:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Usually wrong by ₹50,000&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Who do you trust more?&lt;/p&gt;

&lt;p&gt;Standard Error tells you:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“On average, how far the predictions are from the truth.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lower SE = more reliable model.&lt;/p&gt;
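&lt;p&gt;One common way to estimate that “typical miss” is the root of the average squared residual (made-up numbers again):&lt;/p&gt;

```python
# Typical distance between prediction and truth:
# square the misses, average them, then take the square root
actual    = [100, 120, 140, 160]
predicted = [95, 125, 135, 165]   # every guess is off by 5

mean_squared_miss = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
standard_error = mean_squared_miss ** 0.5
print(standard_error)   # 5.0 -- on average, predictions miss by about 5
```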




&lt;h2&gt;
  
  
  Train vs Test Performance (Very Important)
&lt;/h2&gt;

&lt;p&gt;If:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training accuracy is very high&lt;/li&gt;
&lt;li&gt;Testing accuracy is low&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model memorized instead of learning.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is how we detect &lt;strong&gt;overfitting&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Like a student who learns answers by heart but fails when questions change.&lt;/p&gt;

&lt;p&gt;That’s overfitting in a nutshell: the model knows the past too well,&lt;br&gt;
but can’t handle anything new.&lt;/p&gt;
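&lt;p&gt;A toy way to see this: a “model” that only memorizes its training answers looks perfect on data it has seen and useless on anything new (everything below is invented):&lt;/p&gt;

```python
# A "memorizer": stores every training answer, has no idea otherwise
train_data = {1: 20, 2: 40, 3: 60}   # hours studied: marks (seen in training)
test_data  = {4: 80, 5: 100}         # new, unseen questions

def memorizer(hours):
    return train_data.get(hours, 0)  # no pattern learned, just a lookup

train_misses = [abs(marks - memorizer(h)) for h, marks in train_data.items()]
test_misses  = [abs(marks - memorizer(h)) for h, marks in test_data.items()]

print(sum(train_misses))   # 0 -- "perfect" on training data
print(sum(test_misses))    # 180 -- falls apart on anything new
```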




&lt;h2&gt;
  
  
  Tiny Real-Life Thought 🧠
&lt;/h2&gt;

&lt;p&gt;If someone always scores high in practice tests&lt;br&gt;
but fails in the real exam —&lt;/p&gt;

&lt;p&gt;you know something is wrong.&lt;/p&gt;

&lt;p&gt;Same with ML models.&lt;/p&gt;




&lt;h2&gt;
  
  
  3-Line Takeaway
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Evaluation tells you whether a model is trustworthy&lt;/li&gt;
&lt;li&gt;R² shows how much variation the model explains&lt;/li&gt;
&lt;li&gt;SE shows how reliable its predictions are&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next 👀
&lt;/h3&gt;

&lt;p&gt;Now the big question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why do some models fail even when metrics look good?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That leads us to:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 6 — Why Linear Regression Breaks (Assumptions &amp;amp; Multicollinearity)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Day 4 : How Machines Learn From Their Mistakes</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Tue, 20 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-4-how-machines-learn-from-their-mistakes-1ih</link>
      <guid>https://dev.to/brains_behind_bots/day-4-how-machines-learn-from-their-mistakes-1ih</guid>
      <description>&lt;p&gt;Imagine you are standing on a &lt;strong&gt;hill at night&lt;/strong&gt; 🌙.&lt;br&gt;
It’s dark. Fog everywhere.&lt;/p&gt;

&lt;p&gt;Your goal?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reach the &lt;strong&gt;lowest point&lt;/strong&gt; of the hill.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But there’s a problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can’t see the whole hill&lt;/li&gt;
&lt;li&gt;You can only see &lt;strong&gt;one step ahead&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what do you do?&lt;/p&gt;

&lt;p&gt;You take a small step &lt;strong&gt;downwards&lt;/strong&gt;.&lt;br&gt;
Then another.&lt;br&gt;
Then another.&lt;/p&gt;

&lt;p&gt;Slowly… you reach the bottom.&lt;/p&gt;

&lt;p&gt;That is &lt;strong&gt;Gradient Descent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg61mqkaidhusu5zlxeec.png" alt="Gradient Descent 3D Graph"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Problem Is Gradient Descent Solving?
&lt;/h2&gt;

&lt;p&gt;From Day 3, we learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every model makes mistakes&lt;/li&gt;
&lt;li&gt;Those mistakes are measured using &lt;strong&gt;loss&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now the big question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How does the model reduce this loss?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;By slowly adjusting itself in the right direction.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That adjustment process is called &lt;strong&gt;Gradient Descent&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Think Like the Model 🧠
&lt;/h2&gt;

&lt;p&gt;The model keeps asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Am I too high?”&lt;/li&gt;
&lt;li&gt;“Am I too low?”&lt;/li&gt;
&lt;li&gt;“Which direction reduces my mistake?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it &lt;strong&gt;moves step by step&lt;/strong&gt; to reduce loss.&lt;/p&gt;

&lt;p&gt;Not randomly.&lt;br&gt;
Not all at once.&lt;br&gt;
Slowly and carefully.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Actually Moving?
&lt;/h2&gt;

&lt;p&gt;Remember the straight line from Day 2?&lt;/p&gt;

&lt;p&gt;(&lt;a href="https://dev.to/brains_behind_bots/day-2-linear-regression-how-a-straight-line-learns-from-data-222l"&gt;Day 2 — Linear Regression: How a Straight Line Learns From Data&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;That line depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coefficients&lt;/li&gt;
&lt;li&gt;Intercept&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gradient Descent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tweaks these values&lt;/li&gt;
&lt;li&gt;Checks loss again&lt;/li&gt;
&lt;li&gt;Tweaks again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Until loss becomes &lt;strong&gt;as small as possible&lt;/strong&gt;.&lt;/p&gt;
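&lt;p&gt;That tweak-check-tweak loop can be sketched in a few lines of Python: fitting a line y = w * x + b to toy data by repeatedly stepping against the gradient of the mean squared error (the data and settings here are purely illustrative):&lt;/p&gt;

```python
# Gradient descent for a line: y = w * x + b
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # hidden truth: y = 2 * x

w, b = 0.0, 0.0             # start with a terrible line
learning_rate = 0.05
n = len(xs)

for _ in range(2000):
    # Gradients of mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # One small step downhill
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b

print(round(w, 2))   # close to 2.0 -- the slope of the hidden truth
```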




&lt;h2&gt;
  
  
  Learning Rate: Size of the Step 👣
&lt;/h2&gt;

&lt;p&gt;Now comes an important choice.&lt;/p&gt;

&lt;p&gt;How &lt;strong&gt;big&lt;/strong&gt; should each step be?&lt;/p&gt;

&lt;p&gt;That choice is called &lt;strong&gt;learning rate&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  If the learning rate is too big 🚀
&lt;/h3&gt;

&lt;p&gt;You jump too far.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Miss the bottom&lt;/li&gt;
&lt;li&gt;Bounce around&lt;/li&gt;
&lt;li&gt;Never settle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Like jumping down stairs instead of walking.&lt;/p&gt;




&lt;h3&gt;
  
  
  If the learning rate is too small 🐢
&lt;/h3&gt;

&lt;p&gt;You move very slowly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’ll reach the bottom&lt;/li&gt;
&lt;li&gt;But it’ll take forever&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Like taking baby steps on a long road.&lt;/p&gt;




&lt;p&gt;📌 &lt;strong&gt;Good learning rate = steady, confident steps&lt;/strong&gt;&lt;/p&gt;
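&lt;p&gt;You can feel this with a one-parameter toy loss, loss(p) = p ** 2, whose gradient is 2 * p (the step counts and rates below are arbitrary choices, just for the demo):&lt;/p&gt;

```python
# Walk downhill on the toy loss p ** 2, starting from p = 10
def final_position(learning_rate, steps=20):
    p = 10.0
    for _ in range(steps):
        p = p - learning_rate * 2 * p   # gradient of p ** 2 is 2 * p
    return p

print(final_position(0.1))     # steady steps: ends close to the bottom at 0
print(final_position(1.1))     # too big: overshoots and bounces further away
print(final_position(0.001))   # too small: after 20 steps, barely moved from 10
```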




&lt;h2&gt;
  
  
  Why Feature Scaling Matters Here
&lt;/h2&gt;

&lt;p&gt;Imagine walking downhill:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One step forward = 1 meter&lt;/li&gt;
&lt;li&gt;One step sideways = 1 kilometer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Movement becomes awkward.&lt;/p&gt;

&lt;p&gt;Same with data.&lt;/p&gt;

&lt;p&gt;If one feature is very large and another is very small:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gradient Descent struggles&lt;/li&gt;
&lt;li&gt;Learning becomes slow or unstable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feature scaling makes all features:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Speak the &lt;strong&gt;same language&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
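&lt;p&gt;One common fix is standardization: shift each feature to mean 0 and rescale by its spread, so every feature speaks the same language (toy numbers below):&lt;/p&gt;

```python
# Standardize a feature: subtract the mean, divide by the spread
def standardize(values):
    mean = sum(values) / len(values)
    spread = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / spread for v in values]

house_size_sqft = [800, 1200, 1600]   # big numbers
num_rooms       = [2, 3, 4]           # small numbers

print(standardize(house_size_sqft))
print(standardize(num_rooms))         # both land on the same scale
```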




&lt;h2&gt;
  
  
  When Gradient Descent Stops
&lt;/h2&gt;

&lt;p&gt;Gradient Descent stops when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loss stops decreasing&lt;/li&gt;
&lt;li&gt;Model is no longer improving&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That point is called:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Minimum loss&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the “bottom of the hill”.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tiny Thought Experiment 🧠
&lt;/h2&gt;

&lt;p&gt;Trying to lose weight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sudden extreme dieting ❌&lt;/li&gt;
&lt;li&gt;Slow, consistent effort ✅&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gradient Descent believes in &lt;strong&gt;consistency&lt;/strong&gt;, not shortcuts.&lt;/p&gt;




&lt;h2&gt;
  
  
  3-Line Takeaway
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Gradient Descent reduces loss step by step&lt;/li&gt;
&lt;li&gt;Learning rate controls step size&lt;/li&gt;
&lt;li&gt;Feature scaling helps learning move smoothly&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next 👀
&lt;/h3&gt;

&lt;p&gt;Now the question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do we know if the model we trained is actually good?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where &lt;strong&gt;evaluation metrics&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 5 — Is Your Regression Model Any Good? (Evaluation Metrics)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Day 3 — Errors &amp; Loss Functions: Measuring How Wrong a Model Is</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Mon, 19 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-3-errors-loss-functions-measuring-how-wrong-a-model-is-h29</link>
      <guid>https://dev.to/brains_behind_bots/day-3-errors-loss-functions-measuring-how-wrong-a-model-is-h29</guid>
      <description>&lt;p&gt;You’re trying to guess your &lt;strong&gt;monthly electricity bill&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You think:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Maybe around ₹1,500 this month.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The bill arrives.&lt;/p&gt;

&lt;p&gt;Actual bill: &lt;strong&gt;₹1,620&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You smile and say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Hmm… close, but not exact.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That &lt;strong&gt;gap&lt;/strong&gt; between what you guessed and what actually happened&lt;br&gt;
is called &lt;strong&gt;error&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  So, What Is Error Really?
&lt;/h2&gt;

&lt;p&gt;In simple human language:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Error is how far your guess is from reality.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predicted number → your guess&lt;/li&gt;
&lt;li&gt;Actual number → truth&lt;/li&gt;
&lt;li&gt;Difference → error&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every prediction has an error.&lt;br&gt;
Even humans make them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Errors Are Normal (And Not a Problem)
&lt;/h2&gt;

&lt;p&gt;Real life is not neat.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;People behave differently&lt;/li&gt;
&lt;li&gt;Weather changes&lt;/li&gt;
&lt;li&gt;Markets move randomly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So expecting &lt;strong&gt;perfect predictions&lt;/strong&gt; is unrealistic.&lt;/p&gt;

&lt;p&gt;Machine learning doesn’t try to be perfect.&lt;br&gt;
It tries to be &lt;strong&gt;less wrong every time&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Absolute Error: “Just Tell Me How Wrong I Am”
&lt;/h2&gt;

&lt;p&gt;Imagine your friend asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I don’t care if you guessed more or less.&lt;br&gt;
Just tell me how off you were.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That thinking is called &lt;strong&gt;Absolute Error&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You predicted too high → error&lt;/li&gt;
&lt;li&gt;You predicted too low → error&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only the &lt;strong&gt;size of the mistake&lt;/strong&gt; matters.&lt;/p&gt;
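&lt;p&gt;In code, absolute error is one function call (the bill amounts are made up):&lt;/p&gt;

```python
# Absolute error: size of the miss, ignoring direction
actual_bill = 1620
guess_high  = 1700   # guessed above the real bill
guess_low   = 1540   # guessed below it

print(abs(actual_bill - guess_high))   # 80
print(abs(actual_bill - guess_low))    # 80 -- same error either way
```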




&lt;h2&gt;
  
  
  One Guess Is Not Enough
&lt;/h2&gt;

&lt;p&gt;Now imagine this:&lt;/p&gt;

&lt;p&gt;You guessed the bill &lt;strong&gt;every month for a year&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Some months: Very close&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Some months: Way off&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now the question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Overall, how good are my guesses?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To answer that, we need a &lt;strong&gt;single score&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That score is called a &lt;strong&gt;loss&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvch8j0vdge34tdh21u04.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvch8j0vdge34tdh21u04.png" alt="Loss Function explanation with example" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Loss Function: The Model’s Report Card
&lt;/h2&gt;

&lt;p&gt;Think of a loss function like a &lt;strong&gt;report card&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It looks at &lt;strong&gt;all mistakes together&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Gives &lt;strong&gt;one number&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Lower number = better performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Models don’t feel emotions.&lt;br&gt;
They only understand numbers.&lt;/p&gt;

&lt;p&gt;Loss tells them:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“You’re doing okay”&lt;br&gt;
or&lt;br&gt;
“You’re doing badly — improve.”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Mean Squared Error: Why Big Mistakes Hurt More
&lt;/h2&gt;

&lt;p&gt;Now here’s the clever part.&lt;/p&gt;

&lt;p&gt;Imagine two mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One mistake of ₹50&lt;/li&gt;
&lt;li&gt;One mistake of ₹500&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which one should worry you more?&lt;/p&gt;

&lt;p&gt;Obviously, &lt;strong&gt;₹500&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Mean Squared Error (MSE) thinks the same way.&lt;/p&gt;

&lt;p&gt;It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Makes small mistakes small&lt;/li&gt;
&lt;li&gt;Makes big mistakes &lt;strong&gt;very big&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This forces the model to say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I must avoid big blunders.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s why MSE is widely used in linear regression.&lt;br&gt;
Not because it’s fancy.&lt;br&gt;
Because it matches human common sense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One-line memory hook:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"&lt;strong&gt;MSE shouts at big mistakes and whispers at small ones.&lt;/strong&gt;"&lt;/p&gt;
&lt;/blockquote&gt;
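&lt;p&gt;A quick sketch of why squaring “shouts at big mistakes”, using the ₹50 and ₹500 misses from above:&lt;/p&gt;

```python
# Compare a small miss and a big miss, before and after squaring
errors = [50, 500]   # rupees off, per prediction

mean_absolute_error = sum(abs(e) for e in errors) / len(errors)
mean_squared_error  = sum(e ** 2 for e in errors) / len(errors)

print(mean_absolute_error)   # 275.0 -- the big miss counts 10x the small one
print(mean_squared_error)    # 126250.0 -- squared, it counts 100x the small one
```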




&lt;h2&gt;
  
  
  How This Chooses the Best Line
&lt;/h2&gt;

&lt;p&gt;Remember the straight line for Linear Regression from Day 2?&lt;/p&gt;

&lt;p&gt;Linear regression:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tries many possible lines&lt;/li&gt;
&lt;li&gt;Calculates loss for each line&lt;/li&gt;
&lt;li&gt;Picks the line with &lt;strong&gt;lowest loss&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s how the “best line” is chosen.&lt;/p&gt;

&lt;p&gt;Not by looks.&lt;br&gt;
By &lt;strong&gt;least mistake&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tiny Thought Experiment 🧠
&lt;/h2&gt;

&lt;p&gt;If your predictions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always off by ₹20 → acceptable&lt;/li&gt;
&lt;li&gt;Sometimes off by ₹500 → dangerous&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Loss functions feel the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Takeaways (Remember These)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Error = mistake for one prediction&lt;/li&gt;
&lt;li&gt;Loss = overall mistake score&lt;/li&gt;
&lt;li&gt;MSE punishes big mistakes more&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next 👀
&lt;/h3&gt;

&lt;p&gt;Now the big question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How does the model actually reduce this loss?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where training begins.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 4 — Teaching the Model to Improve (Gradient Descent)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>ai</category>
      <category>beginners</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Day 2 — Linear Regression: How a Straight Line Learns From Data</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Sat, 17 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-2-linear-regression-how-a-straight-line-learns-from-data-222l</link>
      <guid>https://dev.to/brains_behind_bots/day-2-linear-regression-how-a-straight-line-learns-from-data-222l</guid>
      <description>&lt;p&gt;Riya is in school.&lt;br&gt;
Exams are coming.&lt;/p&gt;

&lt;p&gt;Her elder sister notices something interesting.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Study Hours&lt;/th&gt;
&lt;th&gt;Marks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 hour&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 hours&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3 hours&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The sister laughs and says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Arre, the more you study, the more marks you get — very predictable!”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Without knowing it, &lt;strong&gt;Riya’s sister just did Linear Regression&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  So… What Is Linear Regression Really?
&lt;/h2&gt;

&lt;p&gt;Forget the big name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linear Regression simply means:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Finding a straight-line relationship between input and output.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In normal human language:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input increases&lt;/li&gt;
&lt;li&gt;Output increases (or decreases)&lt;/li&gt;
&lt;li&gt;In a &lt;strong&gt;steady, predictable way&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That steady behavior is the key.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why a “Straight Line”?
&lt;/h2&gt;

&lt;p&gt;Because life is sometimes simple.&lt;/p&gt;

&lt;p&gt;Think about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More work experience → more salary&lt;/li&gt;
&lt;li&gt;Bigger house → higher price&lt;/li&gt;
&lt;li&gt;More units used → higher electricity bill&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your brain already expects a &lt;strong&gt;straight pattern&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Linear regression just &lt;strong&gt;draws that pattern using data&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is the Model Actually Doing?
&lt;/h2&gt;

&lt;p&gt;Imagine a board with many dots on it 📍&lt;br&gt;
Each dot is one real example.&lt;/p&gt;

&lt;p&gt;Linear regression’s job is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Let me draw ONE straight line that passes as close as possible to all these dots.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6uehz16b52356bs42hr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6uehz16b52356bs42hr.png" alt="Linear Regression graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not touching every dot.&lt;br&gt;
Not perfect.&lt;br&gt;
Just &lt;strong&gt;the best overall line&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s it. That’s the model.&lt;/p&gt;
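&lt;p&gt;That “best overall line” can be computed directly. Here is a minimal pure-Python sketch that fits Riya’s table using the least-squares formula (the three rows from the table above):&lt;/p&gt;

```python
# Riya's data: study hours vs marks
hours = [1, 2, 3]
marks = [20, 40, 60]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Least squares: slope = covariance(x, y) / variance(x)
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))
den = sum((x - mean_x) ** 2 for x in hours)
slope = num / den
intercept = mean_y - slope * mean_x

print(slope, intercept)       # 20.0 0.0
print(slope * 4 + intercept)  # 80.0 -- predicted marks for 4 hours of study
```

&lt;p&gt;With this toy data the line comes out as marks = 20 × hours, so 4 hours of study predicts 80 marks.&lt;/p&gt;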




&lt;h2&gt;
  
  
  Simple vs Multiple Linear Regression
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Simple Linear Regression
&lt;/h3&gt;

&lt;p&gt;One input → one output&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hours studied → Marks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Multiple Linear Regression
&lt;/h3&gt;

&lt;p&gt;Many inputs → one output&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;House size&lt;/li&gt;
&lt;li&gt;Number of rooms&lt;/li&gt;
&lt;li&gt;Location&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ House price&lt;/p&gt;

&lt;p&gt;Same idea.&lt;br&gt;
Just &lt;strong&gt;more information&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd0x5azit2anx5ce2li6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd0x5azit2anx5ce2li6.png" alt="Simple vs Multiple Linear regression graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
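&lt;p&gt;The “same idea, more information” point can be sketched with NumPy’s least-squares solver. The housing numbers below are made up for illustration; the prices are generated from known weights, so we can watch the fit recover them:&lt;/p&gt;

```python
import numpy as np

# Hypothetical housing data: each row is (size in 100 sq ft, rooms)
features = np.array([
    [10.0, 2.0],
    [20.0, 3.0],
    [30.0, 4.0],
    [40.0, 2.0],
])
# Prices generated as 100*size + 50*rooms + 10, so the fit should recover these weights
prices = np.array([1110.0, 2160.0, 3210.0, 4110.0])

# Add a column of ones so the intercept is learned as one extra coefficient
design = np.hstack([features, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(design, prices, rcond=None)

print(coef)  # approximately [100., 50., 10.]
```

&lt;p&gt;Simple regression is the one-column version of exactly this; multiple regression just stacks more columns into the design matrix.&lt;/p&gt;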




&lt;h2&gt;
  
  
  Coefficients — The Real Power
&lt;/h2&gt;

&lt;p&gt;Imagine an HR manager deciding your salary.&lt;/p&gt;

&lt;p&gt;She looks at two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your &lt;strong&gt;experience&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Your &lt;strong&gt;skills&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But she doesn’t treat them equally.&lt;/p&gt;

&lt;p&gt;Imagine this formula (don’t fear it):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Salary =&lt;br&gt;
(Experience × 5000) + (Skills × 3000) + Base Pay&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Those numbers &lt;strong&gt;5000&lt;/strong&gt; and &lt;strong&gt;3000&lt;/strong&gt; are called &lt;strong&gt;coefficients&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;She thinks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Experience adds &lt;strong&gt;a lot&lt;/strong&gt; of value.”&lt;/li&gt;
&lt;li&gt;“Skills add value too, but &lt;strong&gt;a little less&lt;/strong&gt;.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those hidden importance levels are exactly what the &lt;strong&gt;coefficients&lt;/strong&gt; capture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3rvz229t89qbz2b0b80.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu3rvz229t89qbz2b0b80.png" alt="HR deciding your salary based on various factors" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If something changes the salary &lt;strong&gt;more&lt;/strong&gt;, it gets a &lt;strong&gt;bigger number&lt;/strong&gt;.&lt;br&gt;
If it changes the salary &lt;strong&gt;less&lt;/strong&gt;, it gets a &lt;strong&gt;smaller number&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Just like cooking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Salt affects taste a lot&lt;/li&gt;
&lt;li&gt;Chili affects taste, but less&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why companies love linear regression.&lt;br&gt;
It doesn’t just predict a number — it &lt;strong&gt;explains why&lt;/strong&gt; that number makes sense.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bigger coefficient = bigger influence.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Simple.&lt;/p&gt;
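&lt;p&gt;The HR formula above translates directly into code. The base pay of 20000 is an assumed number for illustration, just like the two coefficients:&lt;/p&gt;

```python
# Hypothetical weights picked for illustration, not learned from data
EXPERIENCE_WEIGHT = 5000   # each year of experience adds 5000
SKILLS_WEIGHT = 3000       # each skill point adds value too, but a little less
BASE_PAY = 20000           # the intercept: salary at zero experience and zero skills

def salary(experience, skills):
    return experience * EXPERIENCE_WEIGHT + skills * SKILLS_WEIGHT + BASE_PAY

print(salary(2, 3))                  # 39000
print(salary(3, 3) - salary(2, 3))   # 5000: one extra year moves salary by its coefficient
```

&lt;p&gt;Notice how readable the prediction is: each input’s contribution is just its value times its coefficient. That is the “explains why” part.&lt;/p&gt;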




&lt;h2&gt;
  
  
  Intercept — The Starting Point
&lt;/h2&gt;

&lt;p&gt;What if someone has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0 experience&lt;/li&gt;
&lt;li&gt;0 skills&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Will salary be zero?&lt;/p&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;There’s usually a &lt;strong&gt;base salary&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That base value is called the &lt;strong&gt;intercept&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In simple words:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The intercept is the output value when every input is zero, the point where the line crosses the axis.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Linear Regression Is Everywhere
&lt;/h2&gt;

&lt;p&gt;Because it is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy to understand&lt;/li&gt;
&lt;li&gt;Fast to train&lt;/li&gt;
&lt;li&gt;Easy to explain to managers&lt;/li&gt;
&lt;li&gt;Very popular in interviews&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interview truth:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;They don’t care if you remember the formula.&lt;br&gt;
They care if you &lt;strong&gt;understand the behavior&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  When This Straight Line Becomes a Bad Idea
&lt;/h2&gt;

&lt;p&gt;Now imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Salary jumps suddenly&lt;/li&gt;
&lt;li&gt;Prices go up and down randomly&lt;/li&gt;
&lt;li&gt;Data looks like curves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trying to force a straight line there is like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Using a ruler to measure a circle.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It won’t work well.&lt;/p&gt;

&lt;p&gt;We’ll break this down properly later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tiny Brain Exercise 🧠
&lt;/h2&gt;

&lt;p&gt;Think about your monthly mobile bill.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More data used → higher bill&lt;/li&gt;
&lt;li&gt;Less data → lower bill&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You already expect a straight relationship.&lt;/p&gt;

&lt;p&gt;That expectation is &lt;strong&gt;linear regression intuition&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  3 Things You Must Remember
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Linear regression fits a &lt;strong&gt;straight line&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Coefficients show &lt;strong&gt;importance&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Intercept is the &lt;strong&gt;starting value&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next 👀
&lt;/h3&gt;

&lt;p&gt;Now that we have a line…&lt;/p&gt;

&lt;p&gt;Big question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do we know if this line is good or terrible?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where &lt;strong&gt;errors and loss functions&lt;/strong&gt; enter.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 3 — Errors &amp;amp; Loss Functions: Measuring How Wrong a Model Is&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>ai</category>
    </item>
    <item>
      <title>Day 1: Regression — The Art of Prediction</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Fri, 16 Jan 2026 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/day-1-regression-the-art-of-prediction-26aa</link>
      <guid>https://dev.to/brains_behind_bots/day-1-regression-the-art-of-prediction-26aa</guid>
      <description>&lt;h3&gt;
  
  
  Imagine this 👇
&lt;/h3&gt;

&lt;p&gt;You run a small &lt;strong&gt;chai stall&lt;/strong&gt; ☕.&lt;br&gt;
Every day people come and ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Bhaiya, aaj kitni chai bikegi?” (“Bhaiya, how much chai will sell today?”)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You think for a second and say:&lt;br&gt;
“Yesterday it was cold, more people came… today it’s sunny, maybe less.”&lt;/p&gt;

&lt;p&gt;Without knowing it, &lt;strong&gt;you are already doing regression&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  1️⃣ What is Regression?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Regression means:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Using past information to predict a number in the future.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s it. No fancy definition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Examples:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Predict house price&lt;/td&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predict salary&lt;/td&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predict temperature&lt;/td&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predict pass/fail&lt;/td&gt;
&lt;td&gt;❌ Not regression&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;👉 &lt;strong&gt;If the output is a NUMBER → it’s regression&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2️⃣ Why Do We Need Regression?
&lt;/h2&gt;

&lt;p&gt;Because humans:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guess roughly&lt;/li&gt;
&lt;li&gt;Forget patterns&lt;/li&gt;
&lt;li&gt;Get biased&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Machines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remember all data&lt;/li&gt;
&lt;li&gt;See patterns clearly&lt;/li&gt;
&lt;li&gt;Give consistent predictions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we let the &lt;strong&gt;machine learn from past data&lt;/strong&gt; and predict for us.&lt;/p&gt;




&lt;h2&gt;
  
  
  3️⃣ Input &amp;amp; Output
&lt;/h2&gt;

&lt;p&gt;Think of regression like a &lt;strong&gt;juice machine&lt;/strong&gt; &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Part&lt;/th&gt;
&lt;th&gt;ML Term&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fruits you put in&lt;/td&gt;
&lt;td&gt;Input / Features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Juice you get&lt;/td&gt;
&lt;td&gt;Output / Target&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Example:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inputs:&lt;/strong&gt; House size, number of rooms, location&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; House price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regression learns:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If inputs look like this → output is usually that”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  4️⃣ Regression vs Classification
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Regression&lt;/th&gt;
&lt;th&gt;Classification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Predicts numbers&lt;/td&gt;
&lt;td&gt;Predicts labels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Salary = ₹50,000&lt;/td&gt;
&lt;td&gt;Spam / Not Spam&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;House price&lt;/td&gt;
&lt;td&gt;Yes / No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temperature&lt;/td&gt;
&lt;td&gt;Pass / Fail&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;📌 &lt;strong&gt;Interview rule:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If output is continuous → Regression&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  5️⃣ Real-Life Use Cases
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Regression Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Finance&lt;/td&gt;
&lt;td&gt;Loan amount prediction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Healthcare&lt;/td&gt;
&lt;td&gt;Recovery time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real estate&lt;/td&gt;
&lt;td&gt;House prices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E-commerce&lt;/td&gt;
&lt;td&gt;Demand forecasting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weather&lt;/td&gt;
&lt;td&gt;Rainfall amount&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Regression is &lt;strong&gt;everywhere&lt;/strong&gt;, quietly working.&lt;/p&gt;




&lt;h2&gt;
  
  
  6️⃣ Supervised Learning
&lt;/h2&gt;

&lt;p&gt;Imagine a child is learning maths.&lt;/p&gt;

&lt;p&gt;The teacher:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shows a question&lt;/li&gt;
&lt;li&gt;Shows the &lt;strong&gt;correct answer&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Corrects mistakes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Slowly, the child learns:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“When I see this kind of question, the answer is usually this.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s &lt;strong&gt;supervised learning&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Now Apply This to Regression
&lt;/h3&gt;

&lt;p&gt;In regression, the &lt;strong&gt;machine is the child&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We give the machine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inputs&lt;/strong&gt; → house size, rooms, location&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correct output&lt;/strong&gt; → actual house price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the machine learns:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“When these inputs appear together, this is the price.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is called Supervised Learning because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model is &lt;strong&gt;not guessing blindly&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;We already &lt;strong&gt;know the right answers&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;We “supervise” the learning by correcting it&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Simple Rule to Remember
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If the data already has correct answers → it’s supervised learning&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Tiny Real-Life Analogy
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Situation&lt;/th&gt;
&lt;th&gt;Learning Type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Teacher checks homework&lt;/td&gt;
&lt;td&gt;Supervised&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Child learns alone by trial&lt;/td&gt;
&lt;td&gt;Unsupervised&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Regression = &lt;strong&gt;teacher checking homework&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Regression is a supervised learning algorithm because the model learns from labeled data where the correct output is already known.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Supervised learning = learning with answers&lt;/li&gt;
&lt;li&gt;Regression always learns this way&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  7️⃣ Tiny Intuition Practice
&lt;/h2&gt;

&lt;p&gt;Think about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your phone price&lt;/li&gt;
&lt;li&gt;Inputs: RAM, storage, brand&lt;/li&gt;
&lt;li&gt;Output: Price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your brain already does regression.&lt;br&gt;
ML just does it &lt;strong&gt;faster and better&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  8️⃣ 3-Line Takeaway (Remember This)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Regression predicts &lt;strong&gt;numbers&lt;/strong&gt;, not labels&lt;/li&gt;
&lt;li&gt;It learns patterns from &lt;strong&gt;past data&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;You already use regression in daily life&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What’s Coming Next
&lt;/h3&gt;

&lt;p&gt;Now that we know &lt;strong&gt;what regression is&lt;/strong&gt;, the next question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“How does a machine actually learn the best prediction?”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where &lt;strong&gt;Linear Regression&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Day 2: How a Straight Line Learns From Data&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day 9: Bootstrapping Made Simple: The Easiest Way to Understand Resampling</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Tue, 25 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-9-bootstrapping-made-simple-the-easiest-way-to-understand-resampling-4ob6</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-9-bootstrapping-made-simple-the-easiest-way-to-understand-resampling-4ob6</guid>
      <description>&lt;p&gt;What do you do when your dataset is small, you can’t collect more data, and every conclusion feels unreliable?&lt;/p&gt;

&lt;p&gt;Most beginners think the only answer is: “Get more data.”&lt;br&gt;
But statisticians discovered a smarter trick decades ago.&lt;/p&gt;

&lt;p&gt;They learned how to squeeze hundreds of new datasets out of one tiny dataset—&lt;br&gt;
without changing a single value in it.&lt;/p&gt;

&lt;p&gt;This trick is called Bootstrapping,&lt;br&gt;
and once you understand it, your confidence intervals, model stability, and estimates will instantly make more sense.&lt;/p&gt;

&lt;p&gt;Let’s break it down in the simplest way possible.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What is Resampling?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Resampling means:&lt;br&gt;
&lt;strong&gt;Taking samples from your existing data again and again to learn more about the population.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is used when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is small&lt;/li&gt;
&lt;li&gt;You can’t collect more data&lt;/li&gt;
&lt;li&gt;You want to estimate accuracy or uncertainty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two main types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bootstrapping&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A resampling method where you create many new datasets by sampling &lt;strong&gt;with replacement&lt;/strong&gt; to estimate a statistic’s accuracy and uncertainty.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Jackknife&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A resampling method where you repeatedly drop one data point at a time to estimate a statistic’s stability, bias, or variance.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What is Bootstrapping?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Imagine you have &lt;strong&gt;one small dataset&lt;/strong&gt;.&lt;br&gt;
Bootstrapping lets you create &lt;strong&gt;hundreds or thousands of new datasets&lt;/strong&gt; from it.&lt;/p&gt;

&lt;p&gt;How?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You randomly pick values from your original data WITH replacement&lt;/strong&gt;&lt;br&gt;
(meaning an item can repeat).&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Original data = [5, 8, 9, 6]&lt;/p&gt;

&lt;p&gt;A bootstrap sample could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[5, 9, 9, 6], or&lt;/li&gt;
&lt;li&gt;[8, 5, 8, 9]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each new sample has the &lt;strong&gt;same length&lt;/strong&gt; as the original.&lt;/p&gt;
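&lt;p&gt;Sampling with replacement is one line in Python’s standard library. A quick sketch with the same four values:&lt;/p&gt;

```python
import random

random.seed(42)  # only so the demo is reproducible

data = [5, 8, 9, 6]

# Sampling WITH replacement: every pick comes from the full list,
# so the same value can appear more than once
bootstrap_sample = random.choices(data, k=len(data))
print(bootstrap_sample)
```

&lt;p&gt;Run it a few times without the seed and you get a different bootstrap sample each time, always the same length as the original.&lt;/p&gt;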

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdyr8zy30ku9eljiwvxu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdyr8zy30ku9eljiwvxu.png" alt="Bootstrap demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why do this?&lt;/p&gt;

&lt;p&gt;Because it lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Estimate the &lt;strong&gt;true mean&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Estimate &lt;strong&gt;confidence intervals&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Measure &lt;strong&gt;uncertainty&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All without needing a large dataset.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why Do We Use Bootstrapping?&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Why Bootstrapping Helps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Estimate confidence intervals&lt;/td&gt;
&lt;td&gt;Works even with small sample sizes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test hypotheses&lt;/td&gt;
&lt;td&gt;No need for normal distribution assumption&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Assess model stability&lt;/td&gt;
&lt;td&gt;Train models on bootstrap samples&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Estimate error&lt;/td&gt;
&lt;td&gt;Helps measure variance and bias&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Bootstrapping is used widely in ML:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random Forest (bootstrap aggregation)&lt;/li&gt;
&lt;li&gt;Bagging models&lt;/li&gt;
&lt;li&gt;Model variance estimation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Super Simple Example&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Imagine you have &lt;strong&gt;only 10 students’ marks&lt;/strong&gt;.&lt;br&gt;
You want to estimate the true class average.&lt;/p&gt;

&lt;p&gt;But 10 students is too small.&lt;/p&gt;

&lt;p&gt;So you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Randomly pick 10 marks &lt;strong&gt;with replacement&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Calculate the average&lt;/li&gt;
&lt;li&gt;Repeat 1,000 times&lt;/li&gt;
&lt;li&gt;Look at all 1,000 averages&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These 1,000 averages show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How stable the average is&lt;/li&gt;
&lt;li&gt;What range it falls in&lt;/li&gt;
&lt;li&gt;How uncertain your estimate is&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps you say something like:&lt;/p&gt;

&lt;p&gt;"There is a 95% chance the true average lies between 72 and 79."&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why Bootstrapping Is So Powerful&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Works even for &lt;strong&gt;tiny datasets&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No assumptions about data shape&lt;/li&gt;
&lt;li&gt;Very easy to compute&lt;/li&gt;
&lt;li&gt;Used in many ML ensemble models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bootstrapping basically says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“If I could collect more data, this is what it might look like.”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Statistics Day 8: Understanding A/B Testing and Market Basket Analysis Without the Jargon</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Sat, 22 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-8-understanding-ab-testing-and-market-basket-analysis-without-the-jargon-19m</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-8-understanding-ab-testing-and-market-basket-analysis-without-the-jargon-19m</guid>
      <description>&lt;p&gt;Statistics Challenge for Data Scientists&lt;/p&gt;

&lt;p&gt;Today, we’ll understand two very practical ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A/B Testing – how to compare two options and choose the better one using data.&lt;/li&gt;
&lt;li&gt;Market Basket Analysis – how to find which items are often bought together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simple concepts, but still very useful for a data scientist.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. What is A/B Testing?
&lt;/h2&gt;

&lt;p&gt;A/B testing is like a fair competition between two versions of something to see which one works better.&lt;/p&gt;

&lt;p&gt;You create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version A&lt;/li&gt;
&lt;li&gt;Version B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you show A to some people, B to some other people, and compare results.&lt;/p&gt;

&lt;p&gt;We do this to answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which button gets more clicks?&lt;/li&gt;
&lt;li&gt;Which headline makes more people sign up?&lt;/li&gt;
&lt;li&gt;Which page keeps users longer?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Simple example
&lt;/h3&gt;

&lt;p&gt;Imagine you have a website with a “Sign Up” button.&lt;/p&gt;

&lt;p&gt;You are not sure which button color works better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version A: Red button&lt;/li&gt;
&lt;li&gt;Version B: Green button&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You do not just guess. You:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Show the red button to 50% of visitors (Group A).&lt;/li&gt;
&lt;li&gt;Show the green button to the other 50% (Group B).&lt;/li&gt;
&lt;li&gt;Count how many people clicked Sign Up in each group.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbwyb2yc2l7jix0o5z60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbwyb2yc2l7jix0o5z60.png" alt="A-B Testing demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example numbers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Visitors&lt;/th&gt;
&lt;th&gt;Sign Ups&lt;/th&gt;
&lt;th&gt;Conversion Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Green&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;120&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here, Version B (green) seems better because 12% &amp;gt; 8%.&lt;/p&gt;

&lt;p&gt;Then you use a statistical test (such as a two-proportion z-test or a chi-square test) to check:&lt;br&gt;
“Is this difference real, or could it be just random chance?”&lt;/p&gt;

&lt;p&gt;If the result is statistically significant (p &amp;lt; 0.05), you choose the better version with confidence.&lt;/p&gt;
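&lt;p&gt;Here is a minimal sketch of that check as a two-proportion z-test, using only the standard library and the numbers from the table above:&lt;/p&gt;

```python
from math import sqrt, erf

# Observed results from the example table
n_a, conv_a = 1000, 80    # red button
n_b, conv_b = 1000, 120   # green button

p_a, p_b = conv_a / n_a, conv_b / n_b
pooled = (conv_a + conv_b) / (n_a + n_b)

# Two-proportion z-test: how many standard errors apart are the two rates?
se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se

# Two-sided p-value from the normal CDF
cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
p_value = 2 * (1 - cdf)

print(round(z, 2), round(p_value, 4))
```

&lt;p&gt;For 80/1,000 vs 120/1,000 the z-score comes out around 2.98 and the p-value is well below 0.05, so the green button’s lead looks real rather than random.&lt;/p&gt;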

&lt;h3&gt;
  
  
  Key ideas in A/B testing (in simple words)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Simple meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Conversion&lt;/td&gt;
&lt;td&gt;The action we care about (click, signup, buy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversion rate&lt;/td&gt;
&lt;td&gt;Conversions ÷ total visitors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Significance&lt;/td&gt;
&lt;td&gt;The result is unlikely to be just random&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  2. What is Market Basket Analysis?
&lt;/h2&gt;

&lt;p&gt;Market Basket Analysis (MBA) is used to find which items are often bought together.&lt;/p&gt;

&lt;p&gt;It answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“If a customer buys X, what else are they likely to buy?”&lt;/li&gt;
&lt;li&gt;“Which items should we place together in the store?”&lt;/li&gt;
&lt;li&gt;“Which product combos should we recommend online?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is heavily used in retail and e-commerce.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simple example
&lt;/h3&gt;

&lt;p&gt;Imagine a small grocery shop.&lt;br&gt;
You collect data from different bills (transactions).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncy6olfwzi4iqg0zm4va.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncy6olfwzi4iqg0zm4va.png" alt="Market Basket Analysis" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example transaction data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bill No.&lt;/th&gt;
&lt;th&gt;Items Bought&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Bread, Butter, Milk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Bread, Eggs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Milk, Bread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Bread, Butter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Milk, Eggs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Bread, Milk, Butter&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;From this, you might notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bread appears in many bills.&lt;/li&gt;
&lt;li&gt;Bread and Butter appear together often.&lt;/li&gt;
&lt;li&gt;Bread and Milk also appear together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the shop learns:&lt;br&gt;
“If someone buys Bread, there is a good chance they will also buy Butter.”&lt;/p&gt;

&lt;p&gt;This is exactly what Market Basket Analysis is about.&lt;/p&gt;




&lt;h3&gt;
  
  
  Important terms in Market Basket Analysis
&lt;/h3&gt;

&lt;p&gt;Let’s say we are interested in the rule:&lt;/p&gt;

&lt;p&gt;“If a customer buys Bread, then they also buy Butter.”&lt;/p&gt;

&lt;p&gt;We write this as:&lt;br&gt;
Bread → Butter&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Support
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;How often do Bread and Butter appear together in all bills?&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total bills = 6&lt;/li&gt;
&lt;li&gt;Bills with Bread and Butter together: 3 (Bills 1, 4, 6)&lt;/li&gt;
&lt;li&gt;Support = 3/6 = 0.5 (50%)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Confidence
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;When Bread is bought, how often is Butter also bought?&lt;/li&gt;
&lt;li&gt;Bills with Bread: Bills 1, 2, 3, 4, 6 → 5 bills&lt;/li&gt;
&lt;li&gt;Bills with Bread and Butter: 3&lt;/li&gt;
&lt;li&gt;Confidence = 3/5 = 0.6 (60%)&lt;/li&gt;
&lt;li&gt;Interpretation: If someone buys Bread, there is a 60% chance they also buy Butter.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Lift
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;How much more likely is Butter bought when Bread is bought, compared to how often Butter is bought overall? (Lift = Confidence ÷ Support of Butter)&lt;/li&gt;
&lt;li&gt;If Lift &amp;gt; 1: Bread and Butter are positively associated (good combo).&lt;/li&gt;
&lt;li&gt;If Lift = 1: No special relationship.&lt;/li&gt;
&lt;li&gt;If Lift &amp;lt; 1: They appear together less than expected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You do not need to go deep into the formula right away.&lt;br&gt;
At beginner level, just remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support: How often together?&lt;/li&gt;
&lt;li&gt;Confidence: If A, how likely B?&lt;/li&gt;
&lt;li&gt;Lift: How strong is the relationship?&lt;/li&gt;
&lt;/ul&gt;
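&lt;p&gt;The three measures can be computed straight from the bill data. Here is a minimal Python sketch using the six transactions from the table above (the 0.5 support and 0.6 confidence match the worked numbers; the lift comes out to about 1.2, meaning Bread buyers are a bit more likely than the average customer to also buy Butter):&lt;/p&gt;

```python
# The six bills from the grocery example, as sets of items.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Eggs"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Eggs"},
    {"Bread", "Milk", "Butter"},
]

def support(items):
    """Fraction of bills that contain all the given items."""
    return sum(items <= bill for bill in transactions) / len(transactions)

# Rule: Bread -> Butter
support_both = support({"Bread", "Butter"})     # how often together
confidence = support_both / support({"Bread"})  # if Bread, how likely Butter
lift = confidence / support({"Butter"})         # strength of the association

print(support_both, confidence, lift)
```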




&lt;h3&gt;
  
  
  Where is Market Basket Analysis used?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Online stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Customers who bought this also bought…”&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Supermarkets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Placing chips near soft drinks&lt;/li&gt;
&lt;li&gt;Placing bread near butter and jam&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Food delivery apps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suggesting sides with a main dish (fries with burger, dessert with pizza)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Question it answers&lt;/th&gt;
&lt;th&gt;Data type mainly used&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A/B Testing&lt;/td&gt;
&lt;td&gt;Which version works better?&lt;/td&gt;
&lt;td&gt;Conversions, click rates etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Market Basket Analysis&lt;/td&gt;
&lt;td&gt;Which items are often bought together?&lt;/td&gt;
&lt;td&gt;Transactions (lists of items)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day 7 : Hypothesis Testing Made Super Simple</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Fri, 21 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-7-hypothesis-testing-made-super-simple-2b72</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-7-hypothesis-testing-made-super-simple-2b72</guid>
      <description>&lt;p&gt;&lt;em&gt;Statistics Challenge for Data Scientists&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Hypothesis testing sounds scary, but it’s basically a math way of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“Is this thing really happening, or is it just random chance?”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You assume something is true → test it with sample data → decide if evidence is strong enough to reject it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Hypothesis Testing?
&lt;/h2&gt;

&lt;p&gt;Think of it like a court case:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7u2twtijj375l2z07u2y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7u2twtijj375l2z07u2y.png" alt="Hypothesis Testing" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Meaning (Simple)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Null Hypothesis (H0)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Default assumption. “Nothing has changed.”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Alternative Hypothesis (H1)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opposite claim. “Something has changed.”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;p-value&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Probability of seeing a result at least this extreme, assuming H0 is true.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Significance Level (α)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cutoff (usually 0.05). If p &amp;lt; 0.05 → reject H0.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test Statistic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A number calculated from data to judge the claim.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why Do We Use Hypothesis Testing?
&lt;/h2&gt;

&lt;p&gt;You cannot test entire populations. So you take a &lt;strong&gt;sample&lt;/strong&gt; and check if the sample result is strong enough to represent the population.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does a new medicine work better than the old one?&lt;/li&gt;
&lt;li&gt;Is the average salary different in two cities?&lt;/li&gt;
&lt;li&gt;Is customer churn related to subscription type?&lt;/li&gt;
&lt;li&gt;Are two features correlated?&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;👉 Today’s Focus: T-Test and Chi-Square Test&lt;/p&gt;




&lt;h2&gt;
  
  
  1️⃣ &lt;strong&gt;T-Test&lt;/strong&gt; (Also called &lt;strong&gt;Student’s t-test&lt;/strong&gt;)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does it check?
&lt;/h3&gt;

&lt;p&gt;It checks whether the &lt;strong&gt;mean (average)&lt;/strong&gt; of two groups is different.&lt;/p&gt;

&lt;h3&gt;
  
  
  When do we use it?
&lt;/h3&gt;

&lt;p&gt;Use the &lt;strong&gt;t-test&lt;/strong&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The variables are &lt;strong&gt;numerical&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Sample size is &lt;strong&gt;small (&amp;lt; 30)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Population variance is &lt;strong&gt;unknown&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example (super simple)
&lt;/h3&gt;

&lt;p&gt;You want to test if &lt;strong&gt;average marks&lt;/strong&gt; of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Students in Class A&lt;/li&gt;
&lt;li&gt;Students in Class B
are different.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use a &lt;strong&gt;t-test&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6j6c9asygsg2e5fsz2xa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6j6c9asygsg2e5fsz2xa.png" alt="t-test hypothesis testing" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What does T-test output mean?
&lt;/h3&gt;

&lt;p&gt;If &lt;strong&gt;p &amp;lt; 0.05&lt;/strong&gt; → the difference is statistically significant (unlikely to be chance alone).&lt;br&gt;
If &lt;strong&gt;p ≥ 0.05&lt;/strong&gt; → the difference could easily be due to chance.&lt;/p&gt;
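&lt;p&gt;A quick sketch of this test in Python, assuming &lt;code&gt;scipy&lt;/code&gt; is available and using made-up marks for the two classes:&lt;/p&gt;

```python
from scipy import stats

# Hypothetical marks (small samples, population variance unknown).
class_a = [72, 78, 65, 80, 74, 69, 77]
class_b = [60, 64, 71, 58, 66, 63, 70]

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(class_a, class_b, equal_var=False)

if p_value < 0.05:
    print(f"p = {p_value:.4f} -> the difference is statistically significant")
else:
    print(f"p = {p_value:.4f} -> the difference may just be chance")
```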




&lt;h2&gt;
  
  
  2️⃣ &lt;strong&gt;Chi-Square (χ²) Test&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does it check?
&lt;/h3&gt;

&lt;p&gt;It checks if &lt;strong&gt;two categorical variables&lt;/strong&gt; are related.&lt;/p&gt;

&lt;p&gt;Examples of categorical variables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gender (Male/Female)&lt;/li&gt;
&lt;li&gt;Payment mode (UPI/Card/Cash)&lt;/li&gt;
&lt;li&gt;Pass/Fail&lt;/li&gt;
&lt;li&gt;Yes/No&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When do you use Chi-square?
&lt;/h3&gt;

&lt;p&gt;Use it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both variables are &lt;strong&gt;categories&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;You want to test &lt;strong&gt;independence&lt;/strong&gt;
(“Are these two things connected or completely unrelated?”)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;You want to know if &lt;strong&gt;gender affects shopping preference&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gender&lt;/th&gt;
&lt;th&gt;Likes Online&lt;/th&gt;
&lt;th&gt;Likes Offline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Male&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Female&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwp2h2pc81x940bjgd2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwp2h2pc81x940bjgd2w.png" alt="chi-square hypothesis testing" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;Chi-square&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interpretation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;If &lt;strong&gt;p &amp;lt; 0.05&lt;/strong&gt; → the two variables are &lt;strong&gt;dependent&lt;/strong&gt; (related).
Example: the data suggest gender is related to preference.&lt;/li&gt;
&lt;li&gt;If &lt;strong&gt;p ≥ 0.05&lt;/strong&gt; → no evidence of a relationship; treat the variables as &lt;strong&gt;independent&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
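&lt;p&gt;The table above can be tested with &lt;code&gt;scipy&lt;/code&gt;. Note that for these particular counts the p-value comes out above 0.05, so this small sample gives no significant evidence that gender and preference are related:&lt;/p&gt;

```python
from scipy.stats import chi2_contingency

# Contingency table from the example: rows = Male/Female, cols = Online/Offline.
observed = [[35, 25],
            [40, 20]]

chi2, p_value, dof, expected = chi2_contingency(observed)

if p_value < 0.05:
    print("Dependent: the variables appear related")
else:
    print("Independent: no evidence of a relationship")
```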




&lt;h2&gt;
  
  
  Summary Table (Easy to remember)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Data Type&lt;/th&gt;
&lt;th&gt;What It Checks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T-Test&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Compare 2 groups’ means&lt;/td&gt;
&lt;td&gt;Numerical&lt;/td&gt;
&lt;td&gt;Difference in averages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chi-Square&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Check relation between categories&lt;/td&gt;
&lt;td&gt;Categorical&lt;/td&gt;
&lt;td&gt;Dependency / independence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🧡 A Simple Visual View (Mental Model)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  T-Test
&lt;/h3&gt;

&lt;p&gt;Imagine two classrooms took the same exam.&lt;br&gt;
You compare their average marks and ask:&lt;br&gt;
“Is one class truly scoring higher, or is the difference just chance?”&lt;/p&gt;

&lt;h3&gt;
  
  
  Chi-Square
&lt;/h3&gt;

&lt;p&gt;Imagine men and women choosing between online and offline shopping.&lt;br&gt;
You ask:&lt;br&gt;
“Is the choice different because of gender, or is it unrelated?”&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Word
&lt;/h2&gt;

&lt;p&gt;Hypothesis testing is not about proving you are right.&lt;br&gt;
It is about checking whether the data strongly disagrees with the default assumption (H0).&lt;/p&gt;

&lt;p&gt;If the disagreement is strong → H0 gets rejected.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Statistics Day 6: Your First Data Science Superpower: Feature Selection with Correlation &amp; Variance</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Thu, 20 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-6-your-first-data-science-superpower-feature-selection-with-correlation-variance-5eeb</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-6-your-first-data-science-superpower-feature-selection-with-correlation-variance-5eeb</guid>
      <description>&lt;p&gt;Feature selection is one of the most important steps before building any machine learning model.&lt;/p&gt;

&lt;p&gt;And one of the simplest tools to do this is &lt;strong&gt;correlation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But correlation alone doesn’t tell the whole story.&lt;br&gt;
To use it correctly, you also need to understand &lt;strong&gt;variance&lt;/strong&gt;, &lt;strong&gt;standard deviation&lt;/strong&gt;, and a few other related statistical terms.&lt;/p&gt;

&lt;p&gt;This blog breaks everything down in the simplest way possible — no heavy maths, just practical understanding.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;1. What Is Correlation?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Correlation tells us &lt;strong&gt;how two numerical features move together&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If they grow together → &lt;strong&gt;positive correlation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If one grows while the other falls → &lt;strong&gt;negative correlation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If they don’t move in any clear pattern → &lt;strong&gt;zero correlation&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Correlation ranges from &lt;strong&gt;–1 to +1&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;+1&lt;/strong&gt; → perfectly move together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;–1&lt;/strong&gt; → perfectly opposite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0&lt;/strong&gt; → no relationship&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In feature selection, correlation helps you answer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Which features are actually related to the target?”&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;“Which features are repeating the same information?”&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;2. How Do We Use Correlation for Feature Selection?&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A. Select Features That Are Correlated With the Target&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you're predicting &lt;strong&gt;house price&lt;/strong&gt;, and &lt;code&gt;size_in_sqft&lt;/code&gt; has &lt;strong&gt;high correlation&lt;/strong&gt; with price, that feature is useful.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Correlation with Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Size (sqft)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.82&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No. of rooms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.65&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Age of house&lt;/td&gt;
&lt;td&gt;–0.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zip code&lt;/td&gt;
&lt;td&gt;0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;High correlation → strong predictive power.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cfzok14rrna4agk5kau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cfzok14rrna4agk5kau.png" alt="Correlation Heatmap" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
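&lt;p&gt;With &lt;code&gt;pandas&lt;/code&gt; this check is essentially one line. The numbers and column names below are invented for illustration:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical housing data.
df = pd.DataFrame({
    "size_sqft": [800, 950, 1100, 1300, 1500, 1700],
    "rooms":     [2, 2, 3, 3, 4, 4],
    "age_years": [30, 25, 20, 15, 10, 5],
    "price":     [100, 120, 140, 165, 185, 210],
})

# Correlation of every feature with the target column.
correlations = df.corr()["price"].drop("price")
print(correlations.sort_values(ascending=False))
```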




&lt;h3&gt;
  
  
  &lt;strong&gt;B. Remove Features That Are Highly Correlated With Each Other&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When two features are &lt;strong&gt;too similar&lt;/strong&gt;, they cause &lt;strong&gt;multicollinearity&lt;/strong&gt;, which confuses models (especially regression).&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;height&lt;/code&gt; and &lt;code&gt;total_floors&lt;/code&gt; → correlation 0.95&lt;/li&gt;
&lt;li&gt;They’re giving the same information.&lt;/li&gt;
&lt;li&gt;You keep &lt;strong&gt;only one&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes your model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simpler&lt;/li&gt;
&lt;li&gt;faster&lt;/li&gt;
&lt;li&gt;less noisy&lt;/li&gt;
&lt;li&gt;more stable&lt;/li&gt;
&lt;/ul&gt;
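&lt;p&gt;A common &lt;code&gt;pandas&lt;/code&gt;/&lt;code&gt;numpy&lt;/code&gt; idiom for this: look at the upper triangle of the absolute correlation matrix (so each pair is checked only once) and drop one column from every highly correlated pair. The data below is invented:&lt;/p&gt;

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "height":       [10, 20, 30, 40, 50],
    "total_floors": [3, 6, 9, 12, 15],   # a scaled copy of "height"
    "age":          [5, 40, 12, 33, 20],
})

corr = df.corr().abs()
# Keep only the strict upper triangle so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))

to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_reduced = df.drop(columns=to_drop)

print(to_drop)  # ["total_floors"]
print(df_reduced.columns.tolist())
```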




&lt;h2&gt;
  
  
  &lt;strong&gt;C. The Big Warning: Correlation Only Catches &lt;em&gt;Linear&lt;/em&gt; Relationships&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If a feature has a non-linear relationship with the target, correlation may say &lt;strong&gt;“0”&lt;/strong&gt;, even when the feature is useful.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Predicting salary based on experience — relationship grows but flattens → &lt;strong&gt;non-linear curve&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low correlation does not mean useless feature.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsesiz9q02o1dqzqz60c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsesiz9q02o1dqzqz60c.png" alt="High vs Low Correlation" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best practice:&lt;/strong&gt;&lt;br&gt;
Include the feature anyway and check feature importance using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random Forest&lt;/li&gt;
&lt;li&gt;XGBoost&lt;/li&gt;
&lt;li&gt;SHAP values&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;3. Variance — How Spread Out the Data Is&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Variance tells you &lt;strong&gt;how much the values are spread from the average&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low variance → values are almost the same&lt;/li&gt;
&lt;li&gt;High variance → wide variety of values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Values&lt;/th&gt;
&lt;th&gt;Variance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;50, 50, 50, 50&lt;/td&gt;
&lt;td&gt;Very low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10, 80, 120, 200&lt;/td&gt;
&lt;td&gt;Very high&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In feature selection:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features with extremely low variance (almost constant features) should be removed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mclopt21oymt98rn7ur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mclopt21oymt98rn7ur.png" alt="Variance graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A column with 99% “No” and 1% “Yes”&lt;/li&gt;
&lt;li&gt;Gives almost no information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is called &lt;strong&gt;low-variance filtering&lt;/strong&gt;.&lt;/p&gt;
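&lt;p&gt;A minimal sketch of low-variance filtering with &lt;code&gt;pandas&lt;/code&gt; (scikit-learn's &lt;code&gt;VarianceThreshold&lt;/code&gt; does the same job; the 0.5 cutoff here is an arbitrary illustration):&lt;/p&gt;

```python
import pandas as pd

df = pd.DataFrame({
    "almost_constant": [0, 0, 0, 0, 0, 1],      # mostly the same value
    "useful":          [10, 80, 120, 200, 55, 90],
})

variances = df.var()
low_variance_cols = variances[variances < 0.5].index
df_filtered = df.drop(columns=low_variance_cols)

print(df_filtered.columns.tolist())  # ["useful"]
```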




&lt;h2&gt;
  
  
  &lt;strong&gt;4. Standard Deviation — The More Interpretable Version of Variance&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Standard deviation (SD) is the &lt;strong&gt;square root of variance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why do we use SD?&lt;/p&gt;

&lt;p&gt;Because SD is in the &lt;strong&gt;same units as the data&lt;/strong&gt;, so it’s easier to interpret.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Variance = 2500&lt;/li&gt;
&lt;li&gt;SD = √2500 = 50&lt;/li&gt;
&lt;li&gt;Meaning: “On average, values are about 50 units away from the mean.”&lt;/li&gt;
&lt;/ul&gt;
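&lt;p&gt;The variance-2500 / SD-50 example can be checked in a couple of lines of &lt;code&gt;numpy&lt;/code&gt; (the four values below are chosen so the numbers come out exactly):&lt;/p&gt;

```python
import numpy as np

values = np.array([0, 100, 0, 100])  # mean = 50

variance = values.var()  # 2500.0
sd = values.std()        # 50.0 -- same units as the data

print(variance, sd)
```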

&lt;p&gt;In data science:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High SD → more spread&lt;/li&gt;
&lt;li&gt;Low SD → less spread&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SD is important in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;normal distribution&lt;/li&gt;
&lt;li&gt;Z-score normalization&lt;/li&gt;
&lt;li&gt;outlier detection&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;5. Practical Use Cases in Real Data Science&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A. Feature Engineering&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Remove highly correlated features&lt;/li&gt;
&lt;li&gt;Keep features that correlate with the target&lt;/li&gt;
&lt;li&gt;Remove low-variance features&lt;/li&gt;
&lt;li&gt;Treat outliers using SD&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;B. Model Stability (Regression Models)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;High correlation among features (multicollinearity):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inflates coefficients&lt;/li&gt;
&lt;li&gt;makes the model unstable&lt;/li&gt;
&lt;li&gt;reduces interpretability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correlation matrix&lt;/li&gt;
&lt;li&gt;Variance Inflation Factor (VIF)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;C. Detecting Outliers&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Using SD:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Any value more than 3 SD from the mean is often considered an outlier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps clean the dataset before modeling.&lt;/p&gt;
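&lt;p&gt;A small sketch of the 3-SD rule with &lt;code&gt;numpy&lt;/code&gt;, using made-up readings:&lt;/p&gt;

```python
import numpy as np

# 20 ordinary readings clustered near 50, plus one suspicious value.
data = np.array([48, 49, 50, 51, 52] * 4 + [150])

z_scores = (data - data.mean()) / data.std()
outliers = data[np.abs(z_scores) > 3]

print(outliers)  # [150]
```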




&lt;h3&gt;
  
  
  &lt;strong&gt;D. Normalization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Z-score = (value – mean) ÷ SD&lt;br&gt;
Used heavily in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KNN&lt;/li&gt;
&lt;li&gt;SVM&lt;/li&gt;
&lt;li&gt;Gradient descent-based models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because these models depend on &lt;strong&gt;distance&lt;/strong&gt;, standardization is essential.&lt;/p&gt;
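&lt;p&gt;In practice you would usually reach for scikit-learn's &lt;code&gt;StandardScaler&lt;/code&gt;, but the formula itself is short enough to apply by hand with &lt;code&gt;numpy&lt;/code&gt; (invented numbers):&lt;/p&gt;

```python
import numpy as np

# Two features on very different scales: income and age.
X = np.array([[30000.0, 25.0],
              [60000.0, 40.0],
              [90000.0, 55.0]])

# Z-score each column: (value - mean) / SD.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # [1, 1]
```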




&lt;h2&gt;
  
  
  &lt;strong&gt;6. Quick Summary Table&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Why It Matters for Feature Selection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Correlation&lt;/td&gt;
&lt;td&gt;How two features move together&lt;/td&gt;
&lt;td&gt;Helps identify useful or redundant features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variance&lt;/td&gt;
&lt;td&gt;How spread out the data is&lt;/td&gt;
&lt;td&gt;Remove near-constant features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard Deviation&lt;/td&gt;
&lt;td&gt;Average spread from the mean&lt;/td&gt;
&lt;td&gt;Used in scaling and outlier detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High Feature-to-Target Correlation&lt;/td&gt;
&lt;td&gt;Strong predictor&lt;/td&gt;
&lt;td&gt;Keep it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High Feature-to-Feature Correlation&lt;/td&gt;
&lt;td&gt;Redundant&lt;/td&gt;
&lt;td&gt;Remove one&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Low Correlation&lt;/td&gt;
&lt;td&gt;Not always useless&lt;/td&gt;
&lt;td&gt;Check with ML model importance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;7. Final Takeaways&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use correlation to &lt;strong&gt;pick predictive features&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Remove features that are &lt;strong&gt;too similar&lt;/strong&gt; to each other.&lt;/li&gt;
&lt;li&gt;Use variance and standard deviation to spot &lt;strong&gt;boring or noisy features&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Always validate with ML models because &lt;strong&gt;correlation misses non-linear relationships&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feature selection is not just theory — it’s one of the most practical skills in data science.&lt;/p&gt;

&lt;p&gt;If you understand correlation, variance, and SD, you're already ahead.&lt;/p&gt;




&lt;p&gt;Connect on Linkedin: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>datascience</category>
      <category>statistics</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day5: The Super-Simple Guide to Random Variables and Correlation for Data Science Beginners</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Wed, 19 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day5-the-super-simple-guide-to-random-variables-and-correlation-for-data-science-1e8d</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day5-the-super-simple-guide-to-random-variables-and-correlation-for-data-science-1e8d</guid>
      <description>&lt;p&gt;If you’re learning statistics for data science, you’ll hear words that sound very big: &lt;em&gt;random variables&lt;/em&gt;, &lt;em&gt;PDF&lt;/em&gt;, &lt;em&gt;correlation&lt;/em&gt;, and more.&lt;/p&gt;

&lt;p&gt;But don’t worry.&lt;br&gt;
Today, we’ll break everything down in simple language so even a 10-year-old can follow.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Random Variable?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;random variable&lt;/strong&gt; is just a number that comes from a random activity.&lt;/p&gt;

&lt;p&gt;Think of it like this:&lt;br&gt;
You do something uncertain → you get a number as a result.&lt;/p&gt;

&lt;p&gt;Example: Roll a die → you get 1, 2, 3, 4, 5, or 6.&lt;br&gt;
That number is your &lt;em&gt;random variable&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;There are two types:&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Discrete Random Variables
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Discrete&lt;/strong&gt; means you can &lt;em&gt;count&lt;/em&gt; the possible values.&lt;br&gt;
They come in separate chunks — no in-between values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number of chocolates in a box (you can’t have 4.6 chocolates)&lt;/li&gt;
&lt;li&gt;Number of students absent&lt;/li&gt;
&lt;li&gt;Die roll outcome (1–6)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters in data science?&lt;/strong&gt;&lt;br&gt;
You use discrete random variables when your feature takes clear, countable values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm191xav06a32xftdciz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm191xav06a32xftdciz.png" alt="demonstration of discrete and continuous random variables" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Continuous Random Variables
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Continuous&lt;/strong&gt; means the values can be &lt;em&gt;anything&lt;/em&gt; in a range — even decimals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Height (160.25 cm is possible)&lt;/li&gt;
&lt;li&gt;Temperature (34.7°C, 34.75°C…)&lt;/li&gt;
&lt;li&gt;Weight&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters?&lt;/strong&gt;&lt;br&gt;
Many ML models assume continuous data follows patterns like the &lt;strong&gt;normal distribution&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is a Normal Distribution?
&lt;/h2&gt;

&lt;p&gt;A normal distribution is the famous &lt;strong&gt;bell-shaped curve&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6swen6pj7wym9fe59fp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq6swen6pj7wym9fe59fp.png" alt="Normal distribution" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It looks like a hill that is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;highest in the middle&lt;/li&gt;
&lt;li&gt;smooth&lt;/li&gt;
&lt;li&gt;symmetric&lt;/li&gt;
&lt;li&gt;values near the mean are more common&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: Most people’s heights cluster around an average.&lt;br&gt;
Only a few are extremely short or extremely tall.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is the Probability Density Function (PDF)?
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;PDF&lt;/strong&gt; is simply a formula that tells us:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How likely is a value to appear in a continuous distribution?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For a normal distribution, the PDF looks complicated, but the meaning is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It helps us find probabilities for continuous values&lt;/li&gt;
&lt;li&gt;The highest point is at the mean (most likely)&lt;/li&gt;
&lt;li&gt;The sides go down smoothly (less likely)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You &lt;strong&gt;cannot&lt;/strong&gt; take one point and say “this value has 10% probability.”&lt;br&gt;
For continuous data, we talk about &lt;strong&gt;areas&lt;/strong&gt; under the curve.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5e5215effeis6nedxfv3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5e5215effeis6nedxfv3.png" alt="probability density function" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Think of the curve as a mountain.&lt;br&gt;
Probability = how much area lies under that mountain between two points.&lt;/p&gt;

&lt;p&gt;This helps in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;calculating confidence intervals&lt;/li&gt;
&lt;li&gt;computing z-scores&lt;/li&gt;
&lt;li&gt;understanding statistical tests&lt;/li&gt;
&lt;/ul&gt;
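&lt;p&gt;To make the “area under the mountain” idea concrete, here is a minimal Python sketch (the height numbers are made up for illustration): it computes the area under a normal curve between two points, building the normal CDF from the standard library’s &lt;code&gt;math.erf&lt;/code&gt;.&lt;/p&gt;

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, built from the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Made-up heights: mean 170 cm, standard deviation 14.14 cm
mu, sigma = 170, 14.14

# The probability of one exact value is 0, so we ask for an *area* instead:
# P(160 cm < height < 180 cm) = area under the curve between 160 and 180
p = normal_cdf(180, mu, sigma) - normal_cdf(160, mu, sigma)
print(round(p, 2))  # about 0.52, so roughly half of all heights
```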




&lt;h2&gt;
  
  
  Pearson's Correlation Coefficient (r)
&lt;/h2&gt;

&lt;p&gt;Pearson’s correlation tells us:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How strongly are two numerical variables related?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It gives a number between -1 and +1:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value (r)&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;+1&lt;/td&gt;
&lt;td&gt;Perfect positive relationship&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;No linear relationship&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;-1&lt;/td&gt;
&lt;td&gt;Perfect negative relationship&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gt7qamsci72pw5envwl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gt7qamsci72pw5envwl.png" alt="Pearson's correlation coefficient" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Height vs weight → positive correlation&lt;/li&gt;
&lt;li&gt;Age vs hours spent playing with toys → negative correlation&lt;/li&gt;
&lt;li&gt;Shoe size vs IQ → almost zero correlation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In simple terms:&lt;br&gt;
If one goes up and the other goes up too → positive.&lt;br&gt;
If one goes up and the other goes down → negative.&lt;/p&gt;
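&lt;p&gt;You don’t even need a library to compute r. Here is a minimal from-scratch sketch (the study-hours numbers are invented for illustration):&lt;/p&gt;

```python
# Pearson's r from scratch: covariance divided by the product of spreads.
def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

study_hours = [1, 2, 3, 4, 5]        # made-up data
test_scores = [52, 60, 68, 76, 84]   # rises exactly with study time

print(round(pearson_r(study_hours, test_scores), 4))  # 1.0, perfect positive
```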




&lt;h2&gt;
  
  
  Practical Use Cases
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Real-Life Use&lt;/th&gt;
&lt;th&gt;Data Science Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discrete RV&lt;/td&gt;
&lt;td&gt;Counting customers&lt;/td&gt;
&lt;td&gt;Classification features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continuous RV&lt;/td&gt;
&lt;td&gt;Measuring weight or speed&lt;/td&gt;
&lt;td&gt;Regression, clustering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PDF&lt;/td&gt;
&lt;td&gt;Finding chances in continuous data&lt;/td&gt;
&lt;td&gt;Hypothesis testing, probability models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pearson Correlation&lt;/td&gt;
&lt;td&gt;See if two things are linked&lt;/td&gt;
&lt;td&gt;Feature selection, EDA&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  When Are These Useful in Machine Learning?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Feature Engineering
&lt;/h3&gt;

&lt;p&gt;Correlation helps detect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;predictive features&lt;/li&gt;
&lt;li&gt;multicollinearity (when features are too similar)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Understanding Your Dataset
&lt;/h3&gt;

&lt;p&gt;Random variables and distributions help decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which visualization to use&lt;/li&gt;
&lt;li&gt;Which model suits the data&lt;/li&gt;
&lt;li&gt;Whether scaling/normalization is required&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Statistical Testing
&lt;/h3&gt;

&lt;p&gt;PDF + normal distribution help compute:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;z-scores&lt;/li&gt;
&lt;li&gt;p-values&lt;/li&gt;
&lt;li&gt;confidence intervals&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Simple Examples to Lock the Concepts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Example 1: Discrete
&lt;/h3&gt;

&lt;p&gt;Number of pets in a house:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;0, 1, 2, 3… Countable. No decimals.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example 2: Continuous
&lt;/h3&gt;

&lt;p&gt;Time taken to run 100 meters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;12.5s, 12.51s, 12.512s… Infinite possibilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example 3: Pearson Correlation
&lt;/h3&gt;

&lt;p&gt;Study time vs test score → high positive&lt;br&gt;
Ice cream sales vs temperature → positive&lt;br&gt;
Mobile use vs sleep → negative&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on LinkedIn: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day 4: Z-Score vs Min-Max Normalization — Making Data Fair for ML Models</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Sat, 15 Nov 2025 09:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-4-z-score-vs-min-max-normalization-making-data-fair-for-ml-models-1plc</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-4-z-score-vs-min-max-normalization-making-data-fair-for-ml-models-1plc</guid>
      <description>&lt;p&gt;Welcome back to the &lt;strong&gt;Statistics Challenge for Data Scientists!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Today, we’re learning something that makes our data fair — &lt;strong&gt;Normalization&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Normalization?
&lt;/h2&gt;

&lt;p&gt;Imagine you and your friend are running a race.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You run &lt;strong&gt;100 meters&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Your friend runs &lt;strong&gt;1 kilometer (1000 meters)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Can we directly compare who runs faster?&lt;br&gt;
Not really — because the &lt;strong&gt;units and scales are different.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s exactly what happens with data — some numbers are small (like &lt;em&gt;age&lt;/em&gt;), and some are huge (like &lt;em&gt;salary&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Normalization&lt;/strong&gt; means scaling data so that all values fit into a similar range and can be compared fairly.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Do We Need Normalization?
&lt;/h2&gt;

&lt;p&gt;Think of a teacher giving marks to students:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Math score: 100 marks&lt;/li&gt;
&lt;li&gt;Science score: 50 marks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we add them directly, Math will dominate because its maximum is higher.&lt;/p&gt;

&lt;p&gt;To treat both subjects fairly, we scale the marks — that’s &lt;strong&gt;normalization&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In data science, normalization helps machine learning models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Work faster&lt;/li&gt;
&lt;li&gt;Learn better&lt;/li&gt;
&lt;li&gt;Give fair importance to each feature&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Two Popular Normalization Methods
&lt;/h2&gt;

&lt;p&gt;Let’s understand the two most common types — &lt;strong&gt;Min-Max Normalization&lt;/strong&gt; and &lt;strong&gt;Z-Score Normalization&lt;/strong&gt;.&lt;/p&gt;


&lt;h3&gt;
  
  
  1. Min-Max Normalization (Feature Scaling)
&lt;/h3&gt;

&lt;p&gt;It squeezes all data values between &lt;strong&gt;0 and 1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X' = (X - Xmin) / (Xmax - Xmin)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Let’s say we have ages: 10, 20, 30, 40, 50.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimum = 10&lt;/li&gt;
&lt;li&gt;Maximum = 50&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For age = 30&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X' = (30 - 10) / (50 - 10) = 20 / 40 = 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, the normalized value is &lt;strong&gt;0.5&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmy1onaiz1gm60ac4zf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxmy1onaiz1gm60ac4zf2.png" alt="Min-max normalization demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
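&lt;p&gt;The same calculation in a few lines of Python, using the ages from the example above:&lt;/p&gt;

```python
# Min-max normalization: squeeze every value into the 0-to-1 range.
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [10, 20, 30, 40, 50]
print(min_max(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```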

&lt;p&gt;&lt;strong&gt;When to Use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When your data has a fixed range (like 0 to 100 marks).&lt;/li&gt;
&lt;li&gt;Best for distance-based algorithms (like KNN, K-Means); gradient-based models (like Neural Networks) also benefit.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Z-Score Normalization (Standardization)
&lt;/h3&gt;

&lt;p&gt;This method centers the data around &lt;strong&gt;mean = 0&lt;/strong&gt; and &lt;strong&gt;standard deviation = 1&lt;/strong&gt;.&lt;br&gt;
It shows &lt;strong&gt;how far each value is from the average&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Z = (X - μ) / σ
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;μ&lt;/strong&gt; = Mean of the data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;σ&lt;/strong&gt; = Standard deviation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
Let’s say heights (in cm): 150, 160, 170, 180, 190&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mean (μ) = 170&lt;/li&gt;
&lt;li&gt;Standard deviation (σ) = 14.14&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For height = 150&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Z = (150 - 170) / 14.14 = -1.41
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, 150 cm is &lt;strong&gt;1.41 standard deviations below the mean&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ndk9n3gqo7p6uuke7ei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ndk9n3gqo7p6uuke7ei.png" alt="z-score normalization demonstration" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
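&lt;p&gt;And the same heights in Python, using the standard library’s &lt;code&gt;statistics&lt;/code&gt; module (note: &lt;code&gt;pstdev&lt;/code&gt; is the &lt;em&gt;population&lt;/em&gt; standard deviation, which matches the 14.14 above):&lt;/p&gt;

```python
import statistics

# Z-score normalization of the heights from the example above.
heights = [150, 160, 170, 180, 190]
mu = statistics.mean(heights)       # 170
sigma = statistics.pstdev(heights)  # population SD, about 14.14

z_scores = [round((h - mu) / sigma, 2) for h in heights]
print(z_scores)  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```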

&lt;p&gt;&lt;strong&gt;When to Use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When data doesn’t have a fixed range.&lt;/li&gt;
&lt;li&gt;Works well with algorithms assuming normal distribution (like Linear Regression, Logistic Regression, PCA).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Min-Max vs Z-Score — Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Min-Max Normalization&lt;/th&gt;
&lt;th&gt;Z-Score Normalization&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Range&lt;/td&gt;
&lt;td&gt;0 to 1&lt;/td&gt;
&lt;td&gt;Can be negative or positive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Depends on&lt;/td&gt;
&lt;td&gt;Min &amp;amp; Max values&lt;/td&gt;
&lt;td&gt;Mean &amp;amp; Standard Deviation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sensitive to outliers&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Less sensitive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Bounded data (e.g. exam scores)&lt;/td&gt;
&lt;td&gt;Unbounded data (e.g. height, salary)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Normalization&lt;/strong&gt; makes data fair by bringing all features to a similar scale.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Min-Max&lt;/strong&gt; when data has clear limits (like percentages).&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Z-Score&lt;/strong&gt; when data spreads freely and you care about distance from average.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Recap Example
&lt;/h2&gt;

&lt;p&gt;Using the ages 10, 20, 30, 40, 50 from the Min-Max example (mean = 30, population standard deviation ≈ 14.14):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Original Value&lt;/th&gt;
&lt;th&gt;Min-Max (0-1)&lt;/th&gt;
&lt;th&gt;Z-Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;-1.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;+1.41&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;In short:&lt;/strong&gt;&lt;br&gt;
Normalization is like giving everyone the same playing field so that your machine learning model doesn’t play favorites!&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on LinkedIn: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Statistics Day 3: Understanding P-Value — The Heart of Hypothesis Testing</title>
      <dc:creator>Chanchal Singh</dc:creator>
      <pubDate>Fri, 14 Nov 2025 05:30:00 +0000</pubDate>
      <link>https://dev.to/brains_behind_bots/statistics-day-3-understanding-p-value-the-heart-of-hypothesis-testing-4l4p</link>
      <guid>https://dev.to/brains_behind_bots/statistics-day-3-understanding-p-value-the-heart-of-hypothesis-testing-4l4p</guid>
      <description>&lt;p&gt;Have you ever tried to prove a point to your friends?&lt;br&gt;
Maybe you said — “I think this coin is magic! It always lands on heads!”&lt;/p&gt;

&lt;p&gt;Your friends would say — “Really? Let’s test it!”&lt;/p&gt;

&lt;p&gt;That’s kind of how data scientists use P-Value — to check if something is truly special or just luck.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: The Simple Idea
&lt;/h2&gt;

&lt;p&gt;P-Value helps us decide whether what we see in data is real or just a coincidence.&lt;/p&gt;

&lt;p&gt;Let’s say you flip a coin 10 times.&lt;br&gt;
It lands on heads 9 times. 😮&lt;/p&gt;

&lt;p&gt;Now you wonder — “Is this coin really unfair, or did I just get lucky?”&lt;/p&gt;

&lt;p&gt;That’s when P-Value comes in.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkllz2h13aefci8b8yvi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkllz2h13aefci8b8yvi.png" alt="evaluating p-value using coin tossing" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: How P-Value Works
&lt;/h2&gt;

&lt;p&gt;Imagine a little helper called P-Val, who whispers to you how “surprising” your result is.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;If your coin result is...&lt;/th&gt;
&lt;th&gt;P-Val says...&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Very normal (like 5 heads, 5 tails)&lt;/td&gt;
&lt;td&gt;“That’s common!”&lt;/td&gt;
&lt;td&gt;Nothing special here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A bit unusual (like 7 heads, 3 tails)&lt;/td&gt;
&lt;td&gt;“Hmm, slightly surprising.”&lt;/td&gt;
&lt;td&gt;Could be luck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Super weird (like 9 heads, 1 tail)&lt;/td&gt;
&lt;td&gt;“Whoa! That’s rare!”&lt;/td&gt;
&lt;td&gt;Maybe the coin is unfair&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So, the smaller the P-Value, the more unusual your result is — and the more likely you’ve found something real!&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: The Magic Number — 0.05
&lt;/h2&gt;

&lt;p&gt;Scientists often use 0.05 (5%) as a magic line.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;P-Value&lt;/th&gt;
&lt;th&gt;What We Decide&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Less than 0.05&lt;/td&gt;
&lt;td&gt;“Wow! Probably something real happening here!”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;More than 0.05&lt;/td&gt;
&lt;td&gt;“Hmm, might just be luck.”&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So if your P-Value is 0.03, you’d say —&lt;br&gt;
👉 “This is rare! Maybe my coin is really unfair.”&lt;/p&gt;

&lt;p&gt;But if it’s 0.20, you’d say —&lt;br&gt;
👉 “That’s not rare enough. Probably just chance.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pby8f20l2nq4qjptksf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pby8f20l2nq4qjptksf.png" alt="p-value description" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: In Technical Terms
&lt;/h2&gt;

&lt;p&gt;Null Hypothesis (H₀) = Nothing special happening.&lt;/p&gt;

&lt;p&gt;Alternative Hypothesis (H₁) = Something special happening.&lt;/p&gt;

&lt;p&gt;P-Value tells us how likely our data would be if H₀ (nothing special) were actually true.&lt;/p&gt;

&lt;p&gt;So when P-Value is tiny, it means our result is too rare to be just chance, so we reject H₀.&lt;/p&gt;
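&lt;p&gt;For the coin example, you can compute this exactly. Here is a small illustrative sketch (a one-sided exact binomial test, which the post doesn’t spell out): if the coin were fair, how likely is a result at least as extreme as 9 heads in 10 flips?&lt;/p&gt;

```python
import math

# Exact one-sided p-value under H0 ("the coin is fair"):
# the chance of getting at least `heads` heads in `flips` flips.
def p_value_at_least(heads, flips):
    total = 2 ** flips  # all equally likely outcome sequences for a fair coin
    extreme = sum(math.comb(flips, k) for k in range(heads, flips + 1))
    return extreme / total

p = p_value_at_least(9, 10)
print(round(p, 4))  # 0.0107, below 0.05, so we'd reject H0
```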




&lt;h2&gt;
  
  
  Step 5: Real-Life Example
&lt;/h2&gt;

&lt;p&gt;Let’s say a company says —&lt;br&gt;
“Our new cookie recipe makes people 10% happier!” 🍪😁&lt;/p&gt;

&lt;p&gt;We test it on 100 people.&lt;br&gt;
If the P-Value comes out less than 0.05, it means —&lt;br&gt;
→ The happiness difference is real, not just random luck.&lt;/p&gt;

&lt;p&gt;If it’s higher than 0.05,&lt;br&gt;
→ Maybe the cookies are tasty… but not that special. 😅&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0ddrcmyjab5s5j0tdly.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0ddrcmyjab5s5j0tdly.png" alt="p-value demonstration by graph" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P-Value&lt;/td&gt;
&lt;td&gt;Tells how surprising your result is&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small P-Value (&amp;lt; 0.05)&lt;/td&gt;
&lt;td&gt;Rare → probably something real&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Big P-Value (&amp;gt; 0.05)&lt;/td&gt;
&lt;td&gt;Common → probably just luck&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Helps with&lt;/td&gt;
&lt;td&gt;Deciding if your finding is real or coincidence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🧭 Final Thought
&lt;/h2&gt;

&lt;p&gt;Think of P-Value like a surprise meter.&lt;br&gt;
It doesn’t prove anything 100%, but it helps you know whether your data is whispering “hey, look deeper!” or “nah, just a coincidence.”&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Connect on LinkedIn: &lt;a href="https://www.linkedin.com/in/chanchalsingh22/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/chanchalsingh22/&lt;/a&gt; &lt;br&gt;
Connect on YouTube: &lt;a href="https://www.youtube.com/@Brains_Behind_Bots" rel="noopener noreferrer"&gt;https://www.youtube.com/@Brains_Behind_Bots&lt;/a&gt;&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
