Feature selection is one of the most important steps before building any machine learning model.
And one of the simplest tools to do this is correlation.
But correlation alone doesn’t tell the whole story.
To use it correctly, you also need to understand variance, standard deviation, and a few other related statistical terms.
This blog breaks everything down in the simplest way possible — no heavy maths, just practical understanding.
1. What Is Correlation?
Correlation tells us how two numerical features move together.
- If they grow together → positive correlation
- If one grows while the other falls → negative correlation
- If they don’t move in any clear pattern → zero correlation
Correlation ranges from –1 to +1:
- +1 → they move together perfectly
- –1 → they move in perfectly opposite directions
- 0 → no linear relationship
In feature selection, correlation helps you answer:
“Which features are actually related to the target?”
“Which features are repeating the same information?”
2. How Do We Use Correlation for Feature Selection?
A. Select Features That Are Correlated With the Target
If you're predicting house price and size_in_sqft is strongly correlated with price, that feature is useful.
Example:
| Feature | Correlation with Price |
|---|---|
| Size (sqft) | 0.82 |
| No. of rooms | 0.65 |
| Age of house | –0.20 |
| Zip code | 0.05 |
High correlation → strong predictive power.
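Here's a minimal pandas sketch of this check (the column names and numbers are hypothetical, made up for illustration):

```python
import pandas as pd

# Hypothetical housing data: made-up columns and values
df = pd.DataFrame({
    "size_sqft": [800, 1200, 1500, 2000, 2400],
    "num_rooms": [2, 3, 3, 4, 5],
    "age":       [30, 12, 8, 5, 2],
    "price":     [150, 230, 280, 390, 460],
})

# Pearson correlation of every feature with the target
correlations = df.corr()["price"].drop("price")
print(correlations.sort_values(ascending=False))
```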
B. Remove Features That Are Highly Correlated With Each Other
When two features are too similar, they cause multicollinearity, which confuses models (especially regression).
Example:
- height and total_floors → correlation 0.95
- They're giving the same information.
- You keep only one.
This makes your model:
- simpler
- faster
- less noisy
- more stable
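If you want to automate this, one common recipe (just a sketch; the 0.9 cutoff is a judgment call, not a universal rule) is to scan the upper triangle of the correlation matrix so each pair is checked once:

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose |correlation| exceeds the threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair appears exactly once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```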
C. The Big Warning: Correlation Only Catches Linear Relationships
If a feature has a non-linear relationship with the target, its correlation may be close to 0 even when the feature is genuinely useful.
Example:
Predicting salary from experience: salary grows quickly at first and then flattens out, a non-linear curve.
Low correlation does not mean the feature is useless.
Best practice:
Include the feature anyway and check feature importance using:
- Random Forest
- XGBoost
- SHAP values
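A tiny synthetic demo (toy data, not the salary example above) shows the failure mode: a feature with a U-shaped relationship to the target has near-zero Pearson correlation, yet a random forest immediately ranks it as the important one:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 1000)             # useful feature, but non-linear
noise = rng.uniform(-3, 3, 1000)         # pure noise feature for contrast
y = x ** 2 + rng.normal(0, 0.1, 1000)    # U-shaped target

print(np.corrcoef(x, y)[0, 1])           # close to 0: correlation calls x "useless"

X = np.column_stack([x, noise])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.feature_importances_)        # x dominates, noise is near zero
```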
3. Variance — How Spread Out the Data Is
Variance tells you how much the values are spread from the average.
- Low variance → values are almost the same
- High variance → wide variety of values
Example:
| Values | Variance |
|---|---|
| 50, 50, 50, 50 | Very low |
| 10, 80, 120, 200 | Very high |
In feature selection:
Features with extremely low variance (almost constant features) should be removed.
Example:
- A column with 99% “No” and 1% “Yes”
- Gives almost no information
This is called low-variance filtering.
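scikit-learn has this built in as VarianceThreshold. A minimal sketch (the 0.05 threshold is an arbitrary choice for illustration):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical near-constant column: 99% "No" (0) and 1% "Yes" (1)
X = pd.DataFrame({
    "almost_constant": [0] * 99 + [1],
    "useful": list(range(100)),
})

selector = VarianceThreshold(threshold=0.05)
selector.fit(X)
print(selector.get_support())  # [False  True]: the near-constant column is dropped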
4. Standard Deviation — The More Interpretable Version of Variance
Standard deviation (SD) is the square root of variance.
Why do we use SD?
Because SD is in the same units as the data, so it’s easier to interpret.
Example:
- Variance = 2500
- SD = 50

So SD = 50 means: "On average, values are about 50 units away from the mean."
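In NumPy both are one call each; this toy array is chosen so the numbers match the example above:

```python
import numpy as np

values = np.array([50, 150, 50, 150])  # mean is 100, every value is 50 away

print(np.var(values))  # 2500.0  (population variance)
print(np.std(values))  # 50.0    (square root of the variance)
```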
In data science:
- High SD → more spread
- Low SD → less spread
SD is important in:
- normal distribution
- Z-score normalization
- outlier detection
5. Practical Use Cases in Real Data Science
A. Feature Engineering
- Remove highly correlated features
- Keep features that correlate with the target
- Remove low-variance features
- Treat outliers using SD
B. Model Stability (Regression Models)
High correlation among features (multicollinearity):
- inflates coefficients
- makes the model unstable
- reduces interpretability
Solution:
- Correlation matrix
- Variance Inflation Factor (VIF)
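With statsmodels, VIF takes only a few lines (a sketch with hypothetical columns; a common rule of thumb is that VIF above roughly 5–10 signals multicollinearity):

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical numeric feature matrix (no target column)
X = pd.DataFrame({
    "size_sqft": [800, 1200, 1500, 2000, 2400, 3000],
    "num_rooms": [2, 3, 3, 4, 5, 6],
    "age":       [30, 12, 8, 5, 2, 1],
})

vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)
```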
C. Detecting Outliers
Using SD:
- Any value more than 3 SD from the mean is often considered an outlier.

This helps clean the dataset before modeling.
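A minimal NumPy version of the 3-SD rule (synthetic data with one injected outlier):

```python
import numpy as np

rng = np.random.default_rng(0)
values = np.append(rng.normal(100, 10, 1000), [250.0])  # inject one outlier

mean, sd = values.mean(), values.std()
mask = np.abs(values - mean) <= 3 * sd  # keep points within 3 SD of the mean

print(values[~mask])    # the flagged outliers (includes the injected 250)
cleaned = values[mask]
```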
D. Normalization
Z-score = (value – mean) ÷ SD
Used heavily in:
- KNN
- SVM
- Gradient descent-based models
Because these models depend on distances (or on gradient scales), standardization is essential.
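The manual formula and scikit-learn's StandardScaler give the same result; here's a quick check on a toy column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[50.0], [150.0], [50.0], [150.0]])  # toy feature column

z_manual = (X - X.mean()) / X.std()       # (value - mean) / SD
z_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(z_manual, z_sklearn))   # True
```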
6. Quick Summary Table
| Concept | Meaning | Why It Matters for Feature Selection |
|---|---|---|
| Correlation | How two features move together | Helps identify useful or redundant features |
| Variance | How spread out the data is | Remove near-constant features |
| Standard Deviation | Average spread from the mean | Used in scaling and outlier detection |
| High Feature-to-Target Correlation | Strong predictor | Keep it |
| High Feature-to-Feature Correlation | Redundant | Remove one |
| Low Correlation | Not always useless | Check with ML model importance |
7. Final Takeaways
- Use correlation to pick predictive features.
- Remove features that are too similar to each other.
- Use variance and standard deviation to spot boring or noisy features.
- Always validate with ML models because correlation misses non-linear relationships.
Feature selection is not just theory — it’s one of the most practical skills in data science.
If you understand correlation, variance, and SD, you're already ahead.
Connect on LinkedIn: https://www.linkedin.com/in/chanchalsingh22/
Connect on YouTube: https://www.youtube.com/@Brains_Behind_Bots
I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!


