Beatrice Njagi

Ridge Regression vs Lasso Regression

Explained Through a House Price Prediction Problem

Predicting house prices is a classic machine learning and statistics problem. Imagine you want to predict the price of a house using features such as its size, number of bedrooms, distance to the city center, nearby schools, and several other indicators. Linear regression is often the first model we try, but real-world data introduces challenges like overfitting and noise.

Ordinary Least Squares (OLS)

Ordinary Least Squares (OLS) is the standard form of linear regression. It estimates the relationship between input features (e.g., house size, bedrooms) and the target variable (house price) by fitting a straight line (or hyperplane) that best explains the data. OLS minimizes the sum of squared residuals, where a residual is the difference between the actual house price and the predicted price. In other words, OLS makes the predicted prices as close as possible to the actual prices.
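
Here is a minimal sketch of OLS with scikit-learn; the features, values, and prices below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [size_sqft, bedrooms, distance_km]
X = np.array([
    [1400, 3, 5.0],
    [1600, 3, 8.0],
    [1700, 4, 3.5],
    [1100, 2, 10.0],
    [2350, 4, 2.0],
])
y = np.array([245_000, 262_000, 310_000, 199_000, 405_000])  # prices

# Fit OLS: find the coefficients that minimize the sum of squared residuals
ols = LinearRegression().fit(X, y)
print("coefficients:", ols.coef_, "intercept:", ols.intercept_)

# A residual is the actual price minus the predicted price
residuals = y - ols.predict(X)
print("sum of squared residuals:", (residuals ** 2).sum())
```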

But this can lead to overfitting in real-world datasets. In real housing data, many features may be correlated (e.g., size and number of bedrooms), some features may be noisy or irrelevant, or the dataset may have limited samples.

In such cases, OLS has no built-in mechanism to control model complexity. If the data includes many weak, irrelevant, or noisy features, the model learns the training data too closely, capturing noise and random patterns instead of the true relationship. It then performs very well on training data but poorly on unseen (test) data.

To address these limitations, regularization techniques such as Ridge Regression (L2 regularization) and Lasso Regression (L1 regularization) are used.

Regularization

Regularization addresses the overfitting problem by discouraging overly complex models. In linear regression, complexity usually shows up as very large coefficients or high sensitivity to small changes in the data.

Regularization works by adding a penalty term to the model’s loss function that penalizes large coefficients. This penalty discourages the model from relying too heavily on any single feature, forcing it to learn simpler, more general patterns that perform better on new data. Let’s look at the two types of regularization below.
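
To make the penalty term concrete, here is a minimal NumPy sketch of the two penalized losses. `alpha` is an assumed tuning parameter for penalty strength; real libraries such as scikit-learn use slightly different scaling conventions, but the idea is the same:

```python
import numpy as np

def ols_loss(beta, X, y):
    # Plain OLS: sum of squared residuals
    return np.sum((y - X @ beta) ** 2)

def ridge_loss(beta, X, y, alpha):
    # L2 penalty: alpha times the sum of SQUARED coefficients
    return ols_loss(beta, X, y) + alpha * np.sum(beta ** 2)

def lasso_loss(beta, X, y, alpha):
    # L1 penalty: alpha times the sum of ABSOLUTE coefficients
    return ols_loss(beta, X, y) + alpha * np.sum(np.abs(beta))
```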

Ridge Regression (L2 Regularization)

Ridge adds a penalty that makes the model keep coefficients small. The penalty shrinks all coefficients toward zero, but never makes them exactly zero. Therefore, every feature stays in the model, just with reduced strength.

Because the penalty only shrinks coefficients, it doesn’t fully remove them. So Ridge is useful when you believe all features matter a little bit.
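
A quick sketch with scikit-learn’s Ridge, reusing `X` and `y` from the OLS sketch above (the `alpha` value is made up for illustration):

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize first: the penalty treats all coefficients equally,
# so features should be on comparable scales.
model = make_pipeline(StandardScaler(), Ridge(alpha=10.0))
model.fit(X, y)

# Every coefficient is shrunk toward zero, but none becomes exactly zero
print(model.named_steps["ridge"].coef_)
```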

Lasso Regression (L1 Regularization)

Lasso also adds a penalty, but this one can completely remove features. Lasso penalizes coefficients based on their absolute size, and this penalty can force some coefficients to become exactly zero. A zero coefficient means the feature is effectively removed from the model.

The L1 penalty makes it “cheaper” for the model to drop a weak feature than to keep shrinking it. For example, if the number of trees in a compound doesn’t really affect house price, Lasso sets its coefficient to 0 and removes it.
So Lasso is useful when you think only a few features truly matter.
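
A matching sketch with scikit-learn’s Lasso, again reusing `X` and `y` from the OLS sketch (the `alpha` value is made up, chosen large to exaggerate the effect):

```python
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(), Lasso(alpha=5_000.0))
model.fit(X, y)

# With a large enough alpha, weak features' coefficients become exactly 0
print(model.named_steps["lasso"].coef_)
```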

Summary: Ridge vs Lasso Regression

| Method | What It Does | Effect on Features | When to Use |
| --- | --- | --- | --- |
| Ridge (L2) | Shrinks coefficients | Keeps all features | When all features contribute a bit |
| Lasso (L1) | Shrinks and removes coefficients | Drops unimportant features | When only a few features matter |

Key Differences Between Ridge and Lasso Regression

| Aspect | Ridge Regression (L2) | Lasso Regression (L1) |
| --- | --- | --- |
| Regularization type | L2 | L1 |
| Feature selection | No | Yes |
| Coefficient behavior | Shrinks coefficients | Shrinks and sets some to zero |
| Best for | Many useful features | Few strong features |
| Interpretability | Moderate | High |

Application Scenario: House Price Prediction

You are using features such as:

• Size of the house
• Number of bedrooms
• Distance to the city
• Number of schools nearby
• Several noisy or weak features

Choosing the Right Model

a. If all features are believed to contribute, choose Ridge Regression

Why?
• Ridge reduces overfitting without removing features
• Ideal when many features have small but real effects

b. If only a few features are truly important, choose Lasso Regression

Why?
• Automatically removes noisy and irrelevant features
• Produces a simpler, more interpretable model
• Focuses on the strongest predictors of house price
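
In practice, the penalty strength alpha (and often the Ridge-vs-Lasso choice itself) is picked by cross-validation. A rough sketch, using synthetic data as a stand-in for a real housing dataset:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for housing data: 10 features, only 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

alphas = np.logspace(-2, 4, 50)  # candidate penalty strengths
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
lasso = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5))

# Fit both and compare on held-out data
for name, model in [("Ridge", ridge), ("Lasso", lasso)]:
    model.fit(X_train, y_train)
    print(name, "test R^2:", round(model.score(X_test, y_test), 3))
```

Whichever model scores better on the held-out data is usually the safer pick for that dataset.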
