Beatrice Njagi

Ridge Regression vs Lasso Regression

Explained Through a House Price Prediction Problem

Predicting house prices is a classic machine learning and statistics problem. Imagine you want to predict the price of a house using features such as its size, number of bedrooms, distance to the city center, nearby schools, and several other indicators. Linear regression is often the first model we try, but real-world data introduces challenges like overfitting and noise.

Ordinary Least Squares (OLS)

Ordinary Least Squares (OLS) is the standard form of linear regression. It estimates the relationship between input features (e.g., house size, bedrooms) and the target variable (house price) by fitting a straight line (or hyperplane) that best explains the data. OLS minimizes the sum of squared residuals, where a residual is the difference between the actual house price and the predicted price. In other words, OLS makes the predicted prices as close as possible to the actual prices.
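
Here is a minimal sketch of OLS with scikit-learn; the features, values, and prices below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [size_sqft, bedrooms, distance_km]
X = np.array([
    [1400, 3, 5.0],
    [1600, 3, 8.0],
    [1700, 4, 3.5],
    [1100, 2, 10.0],
    [2350, 4, 2.0],
])
y = np.array([245_000, 262_000, 310_000, 199_000, 405_000])  # prices

# Fit OLS: find the coefficients that minimize the sum of squared residuals
ols = LinearRegression().fit(X, y)
print("coefficients:", ols.coef_, "intercept:", ols.intercept_)

# A residual is the actual price minus the predicted price
residuals = y - ols.predict(X)
print("sum of squared residuals:", (residuals ** 2).sum())
```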

But this can lead to overfitting in real-world datasets. In real housing data, many features may be correlated (e.g., size and number of bedrooms), some features may be noisy or irrelevant, or the dataset may have limited samples.

In such cases, OLS has no built-in mechanism to control model complexity. If the data includes many weak, irrelevant, or noisy features, the model learns the training data too closely, capturing noise and random patterns instead of the true relationship. It then performs very well on training data but poorly on unseen (test) data.

To address these limitations, regularization techniques such as Ridge Regression (L2 regularization) and Lasso Regression (L1 regularization) are used.

Regularization

Regularization addresses the overfitting problem by discouraging overly complex models. In linear regression, complexity usually shows up as very large coefficients or high sensitivity to small changes in the data.

Regularization works by adding a penalty term to the model’s loss function that penalizes large coefficients. This penalty discourages the model from relying too heavily on any single feature, forcing it to learn simpler, more general patterns that perform better on new data. Let’s look at the two types of regularization below.
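
To make the penalty term concrete, here is a minimal NumPy sketch of the two penalized losses. `alpha` is an assumed tuning parameter for penalty strength; real libraries such as scikit-learn use slightly different scaling conventions, but the idea is the same:

```python
import numpy as np

def ols_loss(beta, X, y):
    # Plain OLS: sum of squared residuals
    return np.sum((y - X @ beta) ** 2)

def ridge_loss(beta, X, y, alpha):
    # L2 penalty: alpha times the sum of SQUARED coefficients
    return ols_loss(beta, X, y) + alpha * np.sum(beta ** 2)

def lasso_loss(beta, X, y, alpha):
    # L1 penalty: alpha times the sum of ABSOLUTE coefficients
    return ols_loss(beta, X, y) + alpha * np.sum(np.abs(beta))
```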

Ridge Regression (L2 Regularization)

Ridge adds a penalty that makes the model keep coefficients small. The penalty shrinks all coefficients toward zero, but never makes them exactly zero. Therefore, every feature stays in the model, just with reduced strength.

Because the penalty only shrinks coefficients, it doesn’t fully remove them. So Ridge is useful when you believe all features matter a little bit.
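
A quick sketch with scikit-learn’s Ridge, reusing `X` and `y` from the OLS sketch above (the `alpha` value is made up for illustration):

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize first: the penalty treats all coefficients equally,
# so features should be on comparable scales.
model = make_pipeline(StandardScaler(), Ridge(alpha=10.0))
model.fit(X, y)

# Every coefficient is shrunk toward zero, but none becomes exactly zero
print(model.named_steps["ridge"].coef_)
```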

Lasso Regression (L1 Regularization)

Lasso also adds a penalty, but this one can completely remove features. Lasso penalizes coefficients based on their absolute size, and this penalty can force some coefficients to become exactly zero. A zero coefficient means the feature is effectively removed from the model.

The L1 penalty makes it “cheaper” for the model to drop a weak feature than to keep shrinking it. For example, if the number of trees in a compound doesn’t really affect house price, Lasso sets its coefficient to 0 and removes it.
So Lasso is useful when you think only a few features truly matter.
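
A matching sketch with scikit-learn’s Lasso, again reusing `X` and `y` from the OLS sketch (the `alpha` value is made up, chosen large to exaggerate the effect):

```python
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(), Lasso(alpha=5_000.0))
model.fit(X, y)

# With a large enough alpha, weak features' coefficients become exactly 0
print(model.named_steps["lasso"].coef_)
```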

Summary: Ridge vs Lasso Regression

| Method | What It Does | Effect on Features | When to Use |
| --- | --- | --- | --- |
| Ridge (L2) | Shrinks coefficients | Keeps all features | When all features contribute a bit |
| Lasso (L1) | Shrinks and removes coefficients | Drops unimportant features | When only a few features matter |

Key Differences Between Ridge and Lasso Regression

| Aspect | Ridge Regression (L2) | Lasso Regression (L1) |
| --- | --- | --- |
| Regularization type | L2 | L1 |
| Feature selection | No | Yes |
| Coefficient behavior | Shrinks coefficients | Shrinks and sets some to zero |
| Best for | Many useful features | Few strong features |
| Interpretability | Moderate | High |

Application Scenario: House Price Prediction

You are using features such as:

• Size of the house
• Number of bedrooms
• Distance to the city
• Number of schools nearby
• Several noisy or weak features

Choosing the Right Model

a. If all features are believed to contribute, choose Ridge Regression

Why?
• Ridge reduces overfitting without removing features
• Ideal when many features have small but real effects

b. If only a few features are truly important, choose Lasso Regression

Why?
• Automatically removes noisy and irrelevant features
• Produces a simpler, more interpretable model
• Focuses on the strongest predictors of house price
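
In practice, the penalty strength alpha (and often the Ridge-vs-Lasso choice itself) is picked by cross-validation. A rough sketch, using synthetic data as a stand-in for a real housing dataset:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for housing data: 10 features, only 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

alphas = np.logspace(-2, 4, 50)  # candidate penalty strengths
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
lasso = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5))

# Fit both and compare on held-out data
for name, model in [("Ridge", ridge), ("Lasso", lasso)]:
    model.fit(X_train, y_train)
    print(name, "test R^2:", round(model.score(X_test, y_test), 3))
```

Whichever model scores better on the held-out data is usually the safer pick for that dataset.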
