Introduction
Linear regression is a fundamental technique in data science that models relationships between variables. In house price prediction, for example, we might use features such as house size, number of bedrooms, and location to estimate sale prices. While basic linear regression works well in simple scenarios, it often struggles with real-world complexities such as noisy data, correlated features, and overfitting. This article explores two powerful solutions to these problems: Ridge and Lasso regression.
1. Ordinary Least Squares (OLS) - The Foundation
What is OLS?
Ordinary Least Squares (OLS) is the standard method for training linear regression models. It works by finding the line (or hyperplane in higher dimensions) that minimizes the sum of squared differences between predicted and actual values.
Objective: Minimize the sum of squared residuals:
Loss = Σ(y_actual - y_predicted)²
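To make this concrete, here is a minimal sketch of fitting an OLS model with scikit-learn and computing the sum of squared residuals by hand (the toy house data below is made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: [size in sqft, bedrooms] -> price (hypothetical numbers)
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]])
y = np.array([245000, 312000, 279000, 308000, 419000])

ols = LinearRegression().fit(X, y)
y_pred = ols.predict(X)

# Sum of squared residuals: the quantity OLS minimizes
ssr = np.sum((y - y_pred) ** 2)
print("Coefficients:", ols.coef_)
print("Sum of squared residuals:", ssr)
```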
The Overfitting Problem
Imagine predicting house prices using not only relevant features (size, location) but also irrelevant ones (color of front door, street name). OLS will try to use all these features to fit the training data perfectly. This creates two problems:
- Unstable coefficients: Small changes in data cause large coefficient swings
- Poor generalization: The model memorizes training data noise instead of learning patterns
Example: If we include "house number" as a feature, OLS might find patterns that don't generalize to new houses.
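One way to see this instability is to fit OLS on two random halves of the same dataset after adding junk features. In the sketch below (synthetic data, purely illustrative), the coefficients on the irrelevant features typically swing noticeably between the two fits:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 60
size = rng.uniform(800, 3000, n)                # the one relevant feature
junk = rng.normal(size=(n, 5))                  # irrelevant "features" (house number, door color codes, ...)
price = 150 * size + rng.normal(0, 40000, n)    # price depends only on size, plus noise

X = np.column_stack([size, junk])

# Fit OLS on two random halves of the data and compare the junk-feature coefficients
idx = rng.permutation(n)
half1, half2 = idx[: n // 2], idx[n // 2:]
coef1 = LinearRegression().fit(X[half1], price[half1]).coef_
coef2 = LinearRegression().fit(X[half2], price[half2]).coef_

print("Junk-feature coefficients, half 1:", np.round(coef1[1:], 1))
print("Junk-feature coefficients, half 2:", np.round(coef2[1:], 1))
```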
2. The Power of Regularization
Solving the Overfitting Problem
Regularization adds a penalty term to the loss function that discourages overly complex models. Think of it as adding "training wheels" to prevent the model from overcomplicating itself.
Why Penalties Help:
- They shrink coefficients toward zero
- They reduce model variance
- They improve generalization to new data
3. Ridge Regression (L2 Regularization)
The Ridge Loss Function
Loss = Σ(y_actual - y_predicted)² + λ * Σ(coefficients²)
Where λ (lambda) controls regularization strength - higher λ means more penalty.
How L2 Penalty Works
Ridge adds the sum of squared coefficients to the loss. This:
- Shrinks all coefficients proportionally
- Never sets coefficients exactly to zero
- Works like a gentle pull toward zero
Why No Feature Selection?
Because the squared penalty becomes weaker and weaker as a coefficient approaches zero, Ridge keeps nudging small coefficients toward zero but never has enough incentive to set them exactly to zero. All features remain in the model, just with reduced influence, as the sketch below illustrates.
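Here is a minimal sketch of this shrinkage behavior using scikit-learn, which calls λ `alpha` (the synthetic data and alpha values are illustrative assumptions, not recommendations):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = StandardScaler().fit_transform(rng.normal(size=(100, 5)))
y = X @ np.array([3.0, 1.5, 0.5, 0.1, 0.0]) + rng.normal(0, 0.5, 100)

# Increasing lambda (alpha in scikit-learn) pulls every coefficient toward zero,
# but none of them become exactly zero
for alpha in [0.1, 10, 1000]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: {np.round(coefs, 3)}")
```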
4. Lasso Regression (L1 Regularization)
The Lasso Loss Function
Loss = Σ(y_actual - y_predicted)² + λ * Σ|coefficients|
How L1 Differs from L2
Instead of squaring coefficients, Lasso uses absolute values. This subtle change has dramatic effects:
- Creates "corner solutions" in optimization
- Can set coefficients exactly to zero
- Performs automatic feature selection
Why Zero Coefficients Matter
When a coefficient hits zero, that feature is completely removed from the model. Lasso automatically selects only the most important features - perfect for identifying which house characteristics truly matter.
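The same kind of sketch with Lasso (again on synthetic, illustrative data) shows coefficients being driven exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = StandardScaler().fit_transform(rng.normal(size=(100, 5)))
# Only the first two features carry any signal
y = X @ np.array([3.0, 1.5, 0.0, 0.0, 0.0]) + rng.normal(0, 0.5, 100)

# A moderately strong L1 penalty drives the useless coefficients to exactly zero
lasso = Lasso(alpha=0.5).fit(X, y)
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Selected features:", np.flatnonzero(lasso.coef_ != 0))
```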
5. Ridge vs Lasso: Key Differences
| Aspect | Ridge (L2) | Lasso (L1) |
|---|---|---|
| Feature Selection | No - keeps all features | Yes - can eliminate features |
| Coefficient Behavior | Shrinks evenly, never zero | Can shrink to exactly zero |
| Interpretability | All features remain, harder to interpret | Fewer features, simpler model |
| Best For | Many useful features | Few important features |
6. House Price Prediction Application
Scenario A: All Features Contribute
If we believe all of our features (size, bedrooms, distance, schools) genuinely affect price, Ridge regression is preferable. It will use all the available information while preventing any single feature from dominating the prediction.
Why Ridge? It preserves all features while controlling their influence.
Scenario B: Few Important Features
If many features are noisy or irrelevant (like "neighbor's car color"), Lasso regression excels. It will identify and keep only the truly important predictors while eliminating noise.
Why Lasso? It acts like a feature detective, separating signal from noise and giving us a simpler, more interpretable model.
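As a rough sketch of Scenario B, the comparison below fits both models on synthetic data where only three of twenty features carry signal (the dataset and alpha values are illustrative assumptions, not tuned recommendations):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n, p = 200, 20
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:3] = [40.0, 25.0, 10.0]          # only 3 of 20 features truly matter
y = X @ true_coef + rng.normal(0, 10, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)
lasso = Lasso(alpha=1.0).fit(X_tr, y_tr)

print("Ridge test R^2:", round(ridge.score(X_te, y_te), 3),
      "| nonzero coefficients:", np.sum(ridge.coef_ != 0))
print("Lasso test R^2:", round(lasso.score(X_te, y_te), 3),
      "| nonzero coefficients:", np.sum(lasso.coef_ != 0))
```

In a real project, λ would be chosen by cross-validation (for example with scikit-learn's RidgeCV or LassoCV) rather than fixed by hand.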
7. Model Evaluation Strategies
Detecting Overfitting
Train-Test Split Method:
- Split data into training (80%) and testing (20%) sets
- Train model on training data
- Compare performance:
  - Good: Similar performance on both sets
  - Overfit: Much better on training than testing
  - Underfit: Poor performance on both
Example: If your model predicts training houses perfectly but fails on new houses, it's overfitting.
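A minimal sketch of this check (synthetic data with many mostly irrelevant features, chosen so that the train/test gap is visible) might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 30))                  # many features, modest sample size
y = X[:, 0] * 50 + rng.normal(0, 50, 100)       # only the first feature matters

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# OLS typically scores noticeably higher on the training set than the test set here,
# while Ridge narrows that gap
for name, model in [("OLS", LinearRegression()), ("Ridge", Ridge(alpha=10))]:
    model.fit(X_tr, y_tr)
    print(f"{name}: train R^2 = {model.score(X_tr, y_tr):.2f}, "
          f"test R^2 = {model.score(X_te, y_te):.2f}")
```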
The Role of Residuals
Residuals (errors) = Actual price - Predicted price
What Residuals Tell Us:
- Patterned residuals: Model missing something (maybe non-linear relationships)
- Random residuals: Good model fit
- Large residuals: Poor predictions
Residual analysis helps diagnose whether our regularization is working properly.
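As a small sketch of residual analysis (synthetic data with a deliberately hidden non-linear term), correlating the residuals against a candidate transformation can reveal what the model is missing:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
size = rng.uniform(800, 3000, 100)
# True price has a quadratic term that our linear-in-size model will not capture
price = 100 * size + 0.05 * size**2 + rng.normal(0, 20000, 100)

model = Ridge(alpha=1.0).fit(size.reshape(-1, 1), price)
residuals = price - model.predict(size.reshape(-1, 1))

# If the residuals correlate with a transformed feature, the model is missing structure
# (here, the quadratic term we deliberately left out)
print("Correlation between residuals and size^2:",
      round(np.corrcoef(residuals, size**2)[0, 1], 2))
```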
Conclusion
Choosing between Ridge and Lasso depends on your problem context:
- Use Ridge when you believe most features contribute meaningfully
- Use Lasso when you suspect many features are irrelevant
- Use OLS only with few features and plenty of clean data
For house price prediction, Lasso often works well because only certain features (size, location, bedrooms) strongly influence prices, while others (exact age in days, specific street names) add mostly noise. Regularization techniques give us the control we need to build models that generalize well from training data to real-world predictions.
The goal isn't perfect training performance, but accurate predictions on houses we haven't seen before. Regularization helps us achieve this balance between complexity and generalizability.