Chanchal Singh
Statistics Day 6: Your First Data Science Superpower: Feature Selection with Correlation & Variance

Feature selection is one of the most important steps before building any machine learning model.

And one of the simplest tools to do this is correlation.

But correlation alone doesn’t tell the whole story.
To use it correctly, you also need to understand variance, standard deviation, and a few other related statistical terms.

This blog breaks everything down in the simplest way possible — no heavy maths, just practical understanding.


1. What Is Correlation?

Correlation tells us how two numerical features move together.

  • If they grow together → positive correlation
  • If one grows while the other falls → negative correlation
  • If they don’t move in any clear pattern → zero correlation

Correlation ranges from –1 to +1:

  • +1 → they move together perfectly
  • –1 → they move in perfectly opposite directions
  • 0 → no linear relationship

In feature selection, correlation helps you answer:

“Which features are actually related to the target?”
“Which features are repeating the same information?”
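
Here's a minimal sketch of how this looks in pandas (the column names and numbers below are made up for illustration):

```python
import pandas as pd

# Toy data: made-up house sizes and prices
df = pd.DataFrame({
    "size_sqft": [800, 1200, 1500, 2000],
    "price":     [100, 155, 190, 260],
})

# Pearson correlation between the two columns, always between -1 and +1
print(df["size_sqft"].corr(df["price"]))  # close to +1: they grow together
```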


2. How Do We Use Correlation for Feature Selection?

A. Select Features That Are Correlated With the Target

If you're predicting house price and size_in_sqft has a high correlation with price, that feature is useful.

Example:

| Feature | Correlation with Price |
| --- | --- |
| Size (sqft) | 0.82 |
| No. of rooms | 0.65 |
| Age of house | –0.20 |
| Zip code | 0.05 |

High correlation → strong predictive power.
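
To rank features like this in pandas, here's a quick sketch (assuming a hypothetical housing DataFrame with a price column as the target):

```python
import pandas as pd

# Toy housing data (made-up numbers, for illustration only)
df = pd.DataFrame({
    "size_sqft": [800, 1200, 1500, 2000, 950],
    "rooms":     [2, 3, 3, 4, 2],
    "age":       [30, 10, 15, 5, 40],
    "price":     [100, 160, 195, 270, 110],
})

# Correlation of every feature with the target, strongest first
corr_with_target = df.corr()["price"].drop("price")
print(corr_with_target.abs().sort_values(ascending=False))
```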

[Image: Correlation heatmap]


B. Remove Features That Are Highly Correlated With Each Other

When two features are too similar, they cause multicollinearity, which confuses models (especially regression).

Example:

  • height and total_floors → correlation 0.95
  • They’re giving the same information.
  • You keep only one.

This makes your model:

  • simpler
  • faster
  • less noisy
  • more stable
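
One common recipe for this, sketched below with an arbitrary 0.9 threshold: scan the upper triangle of the correlation matrix and drop one feature from every highly correlated pair.

```python
import numpy as np
import pandas as pd

def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose |correlation| exceeds the threshold."""
    corr = X.corr().abs()
    # Upper triangle only, so each pair is examined exactly once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

# Toy example: height and total_floors carry the same information
X = pd.DataFrame({
    "height":       [10, 20, 30, 40],
    "total_floors": [3, 6, 9, 12],
    "age":          [30, 10, 15, 5],
})
print(drop_correlated(X).columns.tolist())  # ['height', 'age']
```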

C. The Big Warning: Correlation Only Catches Linear Relationships

If a feature has a non-linear relationship with the target, correlation may say “0”, even when the feature is useful.

Example:
Predicting salary based on experience — relationship grows but flattens → non-linear curve.

Low correlation does not mean useless feature.

[Image: High vs low correlation]

Best practice:
Include the feature anyway and check feature importance using:

  • Random Forest
  • XGBoost
  • SHAP values
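
As a sketch of that best practice, here's a check with scikit-learn's RandomForestRegressor on synthetic salary-vs-experience data (all values invented):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic non-linear data: salary grows with experience, then flattens
rng = np.random.default_rng(0)
experience = rng.uniform(0, 30, 200)
noise = rng.uniform(0, 1, 200)  # an irrelevant feature, for contrast
salary = 50 * np.log1p(experience) + rng.normal(0, 5, 200)

X = pd.DataFrame({"experience": experience, "noise": noise})
model = RandomForestRegressor(random_state=0).fit(X, salary)

# Tree-based importances pick up the non-linear signal correlation can miss
print(pd.Series(model.feature_importances_, index=X.columns))
```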

3. Variance — How Spread Out the Data Is

Variance tells you how much the values are spread from the average.

  • Low variance → values are almost the same
  • High variance → wide variety of values

Example:

| Values | Variance |
| --- | --- |
| 50, 50, 50, 50 | Very low |
| 10, 80, 120, 200 | Very high |

In feature selection:

Features with extremely low variance (almost constant features) should be removed.

[Image: Variance graph]

Example:

  • A column with 99% “No” and 1% “Yes”
  • Gives almost no information

This is called low-variance filtering.
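
scikit-learn ships this as VarianceThreshold; here's a minimal sketch (the threshold value itself is a judgment call):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# A near-constant column: 99 "No" values (0) and a single "Yes" (1)
X = pd.DataFrame({
    "almost_constant": [0] * 99 + [1],
    "useful":          list(range(100)),
})

selector = VarianceThreshold(threshold=0.01)
selector.fit(X)
print(selector.get_support())  # [False  True]: the near-constant column is dropped
```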


4. Standard Deviation — The More Interpretable Version of Variance

Standard deviation (SD) is the square root of variance.

Why do we use SD?

Because SD is in the same units as the data, so it’s easier to interpret.

Example:

  • Variance = 2500
  • SD = √2500 = 50, i.e. "On average, values are about 50 units away from the mean."

In data science:

  • High SD → more spread
  • Low SD → less spread

SD is important in:

  • normal distribution
  • Z-score normalization
  • outlier detection
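
A quick numpy sketch tying variance and SD together, reusing the spreads from the table in section 3:

```python
import numpy as np

same = np.array([50, 50, 50, 50])
spread = np.array([10, 80, 120, 200])

print(same.var(), same.std())      # 0.0 0.0 -> no spread at all
print(spread.var(), spread.std())  # variance in squared units, SD in data units
print(np.isclose(spread.std(), np.sqrt(spread.var())))  # True: SD = sqrt(variance)
```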

5. Practical Use Cases in Real Data Science

A. Feature Engineering

  • Remove highly correlated features
  • Keep features that correlate with the target
  • Remove low-variance features
  • Treat outliers using SD

B. Model Stability (Regression Models)

High correlation among features (multicollinearity):

  • inflates coefficients
  • makes the model unstable
  • reduces interpretability

Solution:

  • Correlation matrix
  • Variance Inflation Factor (VIF)
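
Here's a sketch of the VIF check with statsmodels (toy feature values; VIFs above roughly 5-10 are commonly read as a multicollinearity warning):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Toy feature matrix (made-up values)
X = pd.DataFrame({
    "size_sqft": [800, 1200, 1500, 2000, 950, 1700],
    "rooms":     [2, 3, 3, 4, 2, 4],
    "age":       [30, 10, 15, 5, 40, 8],
})

Xc = sm.add_constant(X)  # VIF is conventionally computed with an intercept term
vif = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
    index=X.columns,
)
print(vif)
```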

C. Detecting Outliers

Using SD:

  • Any value more than 3 SD from the mean is often treated as an outlier.

This helps clean the dataset before modeling.
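
A minimal sketch of that rule on synthetic data with one injected outlier:

```python
import numpy as np

# 500 ordinary values plus one obvious outlier
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(50, 2, 500), [120]])

z = (values - values.mean()) / values.std()
print(values[np.abs(z) > 3])  # [120.]: only the injected point is flagged
```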

D. Normalization

Z-score = (value – mean) ÷ SD
Used heavily in:

  • KNN
  • SVM
  • Gradient descent-based models

Because these models are sensitive to feature scale (through distance computations or gradient steps), standardization is essential.
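
Here's a sketch with scikit-learn's StandardScaler, which applies exactly that z-score formula column by column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[800.0], [1200.0], [1500.0], [2000.0]])  # e.g. size in sqft

X_scaled = StandardScaler().fit_transform(X)  # (value - mean) / SD per column
print(X_scaled.mean(), X_scaled.std())        # ~0 and 1 after standardization
```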


6. Quick Summary Table

| Concept | Meaning | Why It Matters for Feature Selection |
| --- | --- | --- |
| Correlation | How two features move together | Helps identify useful or redundant features |
| Variance | How spread out the data is | Remove near-constant features |
| Standard deviation | Average spread from the mean | Used in scaling and outlier detection |
| High feature-to-target correlation | Strong predictor | Keep it |
| High feature-to-feature correlation | Redundant | Remove one |
| Low correlation | Not always useless | Check with ML model importance |

7. Final Takeaways

  • Use correlation to pick predictive features.
  • Remove features that are too similar to each other.
  • Use variance and standard deviation to spot boring or noisy features.
  • Always validate with ML models because correlation misses non-linear relationships.

Feature selection is not just theory — it’s one of the most practical skills in data science.

If you understand correlation, variance, and SD, you're already ahead.


Connect on LinkedIn: https://www.linkedin.com/in/chanchalsingh22/
Connect on YouTube: https://www.youtube.com/@Brains_Behind_Bots

I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you're into learning AI in a beginner-friendly way, make sure to follow for more!
