House Price Prediction Context
Linear regression is a common method for predicting continuous values such as house prices.
While Ordinary Least Squares (OLS) is the standard approach for fitting linear models, it often struggles with real-world data that contains many features, correlated variables, or noise. Regularization techniques such as Ridge and Lasso regression improve model performance by controlling complexity and reducing overfitting.
- Ordinary Least Squares (OLS)
What is OLS and what objective does it minimize?
Ordinary Least Squares (OLS) is a method used to estimate the coefficients of a linear regression model by minimizing the sum of squared residuals—the squared differences between actual and predicted values.
Loss = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
Here, yᵢ represents the actual house price and ŷᵢ represents the predicted price.
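For concreteness, here is a minimal sketch of the OLS objective in Python. The dataset and its values are made up for illustration, and the fit uses NumPy's least-squares solver:

```python
import numpy as np

# Made-up training data: house size (sq ft) and price (in $1000s)
X = np.array([[1400.0], [1600.0], [1700.0], [1875.0], [2350.0]])
y = np.array([245.0, 312.0, 279.0, 308.0, 405.0])

# Add an intercept column and solve beta = argmin ||y - X beta||^2
X_design = np.hstack([np.ones((X.shape[0], 1)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# The sum of squared residuals is exactly the quantity OLS minimizes
y_hat = X_design @ beta
ols_loss = np.sum((y - y_hat) ** 2)
print(f"coefficients: {beta}, OLS loss: {ols_loss:.2f}")
```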
Why can OLS lead to overfitting in real-world datasets?
OLS can lead to overfitting when the model becomes too complex, especially in datasets with many features or when features are highly correlated. In such cases, the model captures noise rather than the underlying data patterns, resulting in poor generalization to unseen data.
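One quick way to see this instability is to fit OLS on two nearly identical (highly correlated) features. The sketch below uses synthetic data, so the exact numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
size = rng.uniform(1000, 3000, n)
# A second feature that is almost a copy of the first (highly correlated)
size_noisy = size + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), size, size_noisy])
# True price depends on size only, plus noise
y = 0.15 * size + rng.normal(0.0, 20.0, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# The two correlated columns can pick up large, offsetting coefficients,
# a sign the fit is chasing noise rather than a stable relationship.
print(beta)
```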
- Concept of Regularization
What problem does regularization solve in linear regression?
Regularization reduces overfitting and model instability by discouraging large coefficient values. It helps the model generalize better to new data, especially when features are numerous or correlated.
Why is adding a penalty term helpful?
Adding a penalty term constrains the magnitude of the coefficients. Keeping coefficients small reduces model complexity and leads to better generalization on new data; it also stabilizes the coefficient estimates, which is the mechanism behind both Ridge and Lasso.
Regularized Loss = OLS Loss + λ × Penalty
The parameter λ controls how strong the penalty is.
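As a sketch of this formula, the helper below (a hypothetical function, not a library API) computes the OLS loss plus a λ-weighted penalty, with the penalty type selectable:

```python
import numpy as np

def regularized_loss(y, y_hat, beta, lam, penalty="l2"):
    """OLS loss plus a lambda-weighted penalty on the coefficients."""
    ols_loss = np.sum((y - y_hat) ** 2)
    if penalty == "l2":      # Ridge: sum of squared coefficients
        pen = np.sum(beta ** 2)
    else:                    # Lasso ("l1"): sum of absolute coefficients
        pen = np.sum(np.abs(beta))
    return ols_loss + lam * pen

print(regularized_loss(np.array([3.0, 5.0]), np.array([2.5, 5.5]),
                       beta=np.array([1.0, -2.0]), lam=0.1))
```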
- Ridge Regression (L2 Regularization)
Ridge regression loss function:
Loss = Σ (yᵢ − ŷᵢ)² + λ Σ βⱼ²
How the L2 penalty affects coefficients
The L2 penalty shrinks all coefficients toward zero by penalizing their squared values. Larger coefficients receive stronger penalties, which stabilizes the model—especially when features are correlated.
Why Ridge regression does not perform feature selection
Ridge regression does not set coefficients exactly to zero. Instead, it reduces their magnitude. As a result, all features remain in the model, meaning Ridge does not perform feature selection.
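A short scikit-learn sketch on synthetic data shows this behavior; the alpha values (scikit-learn's name for λ) are arbitrary illustrations:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.5, 100)

for alpha in [0.1, 10.0, 1000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # Coefficients shrink toward zero as alpha grows, but none hit exactly zero
    print(alpha, np.round(model.coef_, 3))
```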
- Lasso Regression (L1 Regularization)
Lasso regression loss function:
Loss = Σ (yᵢ − ŷᵢ)² + λ Σ |βⱼ|
How the L1 penalty differs from L2
The L1 penalty uses the absolute value of coefficients rather than their squares. Because its pull toward zero stays constant even for small coefficients, whereas the L2 pull weakens as coefficients shrink, L1 exerts a stronger push toward exactly zero.
Why Lasso can set coefficients exactly to zero
Lasso regression can set some coefficients exactly to zero because the absolute-value penalty is not differentiable at zero. For a weak feature, the constant penalty pressure outweighs the small improvement in fit the feature provides, so the optimal coefficient lands exactly at zero and the feature is excluded from the model.
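A minimal scikit-learn sketch makes this concrete; the data is synthetic and the alpha value is an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first feature truly drives the target; the rest are noise
y = 3.0 * X[:, 0] + rng.normal(0.0, 0.5, 100)

model = Lasso(alpha=0.5).fit(X, y)
print(np.round(model.coef_, 3))  # weak features come out exactly 0.0
```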
- Comparison: Ridge vs. Lasso Regression
Feature Selection:
Ridge: Retains all features, making it less suitable for dimensionality reduction.
Lasso: Performs feature selection by driving some coefficients to zero, making it more suitable for simpler models.
Coefficient Behavior:
Ridge: Coefficients are shrunk but not zeroed out, which means all variables remain in the model.
Lasso: Coefficients can be zeroed out, allowing for the model to focus on a subset of features.
Model Interpretability:
Ridge: Models are more complex and can be harder to interpret due to all features being included.
Lasso: Models are simpler and more interpretable since irrelevant features can be excluded entirely.
- Application: House Price Prediction
Assume features include:
House size
Number of bedrooms
Distance to city
Number of nearby schools
Several noisy or weak features
a. Which regression technique would you choose if all features are believed to contribute to the price? Explain why.
If all features are believed to contribute to house prices, I would choose Ridge Regression. Since Ridge retains all features and simply shrinks their coefficients, it would provide a robust prediction without excluding any potentially valuable information.
b. Which technique would you choose if only a few features are truly important and others add noise? Explain why.
I would opt for Lasso Regression. Lasso's ability to reduce some coefficients to zero allows us to eliminate the irrelevant features, thus simplifying the model and improving its interpretability.
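As a sketch of how this choice plays out, the snippet below fits both models on synthetic data with the features listed above; all values and alpha settings are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 200
# Columns: size, bedrooms, distance to city, nearby schools, 3 noisy features
X = np.column_stack([
    rng.uniform(50, 300, n),    # house size (m^2)
    rng.integers(1, 6, n),      # number of bedrooms
    rng.uniform(0, 30, n),      # distance to city (km)
    rng.integers(0, 10, n),     # number of nearby schools
    rng.normal(size=(n, 3)),    # noisy / weak features
])
y = (2.0 * X[:, 0] + 15.0 * X[:, 1] - 3.0 * X[:, 2] + 5.0 * X[:, 3]
     + rng.normal(0.0, 25.0, n))

# Regularization penalizes coefficient size, so features should share a scale
X_scaled = StandardScaler().fit_transform(X)
print("Ridge:", np.round(Ridge(alpha=1.0).fit(X_scaled, y).coef_, 2))
print("Lasso:", np.round(Lasso(alpha=5.0).fit(X_scaled, y).coef_, 2))
```

Ridge keeps every coefficient nonzero, while Lasso tends to zero out the noisy columns entirely, mirroring the answers above.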
- Model Evaluation
Detecting Overfitting Using Training and Testing Data
To detect overfitting, the dataset is divided into two parts: a training set and a testing set. The model is trained using the training data and then evaluated on the testing data, which the model has not seen before. If the model shows very low error on the training data but significantly higher error on the testing data, this indicates overfitting. In contrast, similar performance on both datasets suggests that the model generalizes well. Evaluation metrics such as Mean Squared Error (MSE) or R² can be used to compare performance across the two datasets.
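This check takes only a few lines with scikit-learn; the data here is synthetic, deliberately using many features relative to the sample size so a train/test gap appears:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))  # many features relative to the sample size
y = X[:, 0] + rng.normal(0.0, 1.0, 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)

train_mse = mean_squared_error(y_tr, model.predict(X_tr))
test_mse = mean_squared_error(y_te, model.predict(X_te))
# A test MSE far above the training MSE is the classic overfitting signal
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```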
Role of Residuals in Evaluating Model Performance
Residuals are the differences between the actual house prices and the prices predicted by the model. Analyzing residuals helps determine how well the model fits the data. A good model produces residuals that are randomly scattered around zero, indicating that the model has captured the underlying relationship. If residuals show patterns, trends, or large outliers, this suggests issues such as missing important variables, non-linear relationships, or the presence of noise that the model has not handled well.
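Residual analysis can also be sketched briefly; the snippet below (synthetic data again) checks whether residuals center on zero and show no systematic relationship to the predictions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0.0, 0.5, 100)

model = LinearRegression().fit(X, y)
predictions = model.predict(X)
residuals = y - predictions

# A well-specified model leaves residuals centered on zero with no
# systematic relationship to its predictions.
print("mean residual:", round(residuals.mean(), 3))
print("corr(residuals, predictions):",
      round(float(np.corrcoef(residuals, predictions)[0, 1]), 3))
```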
OLS is effective for simple datasets but struggles with noise and high dimensionality. Ridge regression improves stability by shrinking coefficients, while Lasso regression improves simplicity by removing irrelevant features. In house price prediction, Ridge is ideal when all features matter, while Lasso is better when only a few key predictors drive prices.