Tamal Barman

What is Lasso Regression?

Lasso regression, also known as L1 regularization, is a linear regression technique that adds a penalty term to the ordinary least squares (OLS) cost function. The penalty term is the sum of the absolute values of the regression coefficients multiplied by a tuning parameter, lambda. The purpose of this penalty term is to shrink the coefficients towards zero, which helps to reduce overfitting and select a subset of the most important features.

Lasso regression can be used for feature selection, as it tends to set the coefficients of less important features to zero. This is because the penalty term encourages sparsity in the coefficient estimates, meaning that it favors solutions where many of the coefficients are exactly zero. As a result, Lasso regression is particularly useful when working with high-dimensional data sets where there may be many irrelevant features.

Compared to Ridge regression, which uses a penalty term based on the square of the regression coefficients, Lasso regression is more likely to result in a sparse model with only a subset of the features having non-zero coefficients. However, Lasso regression may not perform as well as Ridge regression in situations where all the features are relevant, as it can be more prone to producing biased estimates of the coefficients.
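
To make the contrast concrete, here is a minimal sketch (not part of the original article) that fits both models on synthetic data in which only a handful of features carry signal; the dataset and parameter values are purely illustrative:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 20 features, but only 5 are actually informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso typically sets many coefficients to exactly zero; Ridge only shrinks them
print('Non-zero Lasso coefficients:', np.sum(lasso.coef_ != 0))
print('Non-zero Ridge coefficients:', np.sum(ridge.coef_ != 0))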

Lasso Meaning:

In machine learning, Lasso or Lasso regression is a technique used for linear regression that adds a penalty term to the cost function. This penalty term is the sum of the absolute values of the regression coefficients multiplied by a tuning parameter, which is typically set through cross-validation. The purpose of this penalty term is to shrink the coefficients towards zero, thereby reducing the effect of less important features and selecting a subset of the most relevant features for the model.
The word “LASSO” stands for Least Absolute Shrinkage and Selection Operator.

Regularization:

Lasso regularization is particularly useful for feature selection in high-dimensional data sets, where there are many irrelevant features. It tends to set the coefficients of less important features to zero, which makes the resulting model simpler and easier to interpret. Compared to Ridge regression, which uses a penalty term based on the square of the regression coefficients, Lasso regression is more likely to produce sparse models with fewer non-zero coefficients.
The purpose of regularization in Lasso is to prevent overfitting, which occurs when the model fits the training data too closely and performs poorly on new data. Regularization achieves this by shrinking the coefficients towards zero, which reduces the effect of less important features and helps to select a subset of the most important features.
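
As a rough illustration of how the strength of the penalty controls this shrinkage, the sketch below (again using made-up synthetic data) fits a Lasso model for several values of the tuning parameter and counts how many coefficients remain non-zero:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Larger alpha (lambda) means stronger shrinkage, so more coefficients end up at exactly zero
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    print(f'alpha={alpha}: {np.sum(model.coef_ != 0)} non-zero coefficients')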

Lasso Regularization Technique:

Lasso regularization is a technique used in linear regression to prevent overfitting and improve the predictive performance of the model. It achieves this by adding a penalty term to the ordinary least squares (OLS) cost function, so that the objective becomes a combination of the sum of squared errors and the sum of the absolute values of the regression coefficients.

The Lasso penalty term is given by the formula:

lambda * (|b1| + |b2| + ... + |bp|)

where lambda is a tuning parameter that controls the degree of regularization, b1, b2, ..., bp are the regression coefficients, and p is the number of predictors or features in the model.

The purpose of the penalty term is to shrink the coefficients towards zero, which reduces the effect of less important features and helps to select a subset of the most important ones. A useful property of the Lasso penalty is that it promotes sparsity in the coefficient estimates, favoring solutions in which many of the coefficients are exactly zero.

Lasso regularization can be used for feature selection, as it tends to set the coefficients of less important features to zero. This is particularly useful when working with high-dimensional data sets where there may be many irrelevant features. Compared to Ridge regression, which uses a penalty term based on the square of the regression coefficients, Lasso regression is more likely to produce sparse models with fewer non-zero coefficients.
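
As a small numerical illustration (the coefficient values and lambda below are made up), the penalty can be computed directly from the formula and compared with the squared penalty used by Ridge:

import numpy as np

b = np.array([2.0, -0.5, 0.0, 1.5])  # hypothetical coefficients b1, ..., bp
lam = 0.1                            # tuning parameter lambda

lasso_penalty = lam * np.sum(np.abs(b))  # lambda * (|b1| + ... + |bp|) = 0.4
ridge_penalty = lam * np.sum(b ** 2)     # lambda * (b1^2 + ... + bp^2) = 0.65

print(lasso_penalty, ridge_penalty)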

L1 Regularization:

L1 regularization is simply another name for the Lasso penalty: the regularization term is the L1 norm of the coefficient vector, scaled by the tuning parameter lambda:

lambda * (|b1| + |b2| + ... + |bp|)

where b1, b2, ..., bp are the regression coefficients and p is the number of predictors or features in the model. Adding this term to the ordinary least squares (OLS) cost function shrinks the coefficients towards zero and sets many of them to exactly zero, which is what distinguishes the L1 penalty from the squared (L2) penalty used in Ridge regression.
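
In scikit-learn, this penalty strength is passed as the alpha argument of the Lasso estimator, which plays the role of lambda in the formula above:

from sklearn.linear_model import Lasso

# alpha corresponds to lambda; larger values mean stronger L1 regularization
model = Lasso(alpha=0.1)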

Mathematical equation of Lasso Regression:

The mathematical equation for Lasso regression can be written as:

minimize (1/(2m)) * ||y - Xβ||^2 + λ * ||β||_1

where:

  • y is the vector of response variables
  • X is the matrix of predictor variables
  • β is the vector of regression coefficients
  • m is the number of observations
  • λ is the regularization parameter, controlling the strength of the penalty term
  • ||β||_1 is the L1 norm of the coefficient vector, which is the sum of the absolute values of the coefficients

The first term in the equation is the ordinary least squares (OLS) cost function, which measures the difference between the predicted values of y and the actual values. The second term is the L1 penalty term, which is the sum of the absolute values of the regression coefficients multiplied by the regularization parameter λ.

The objective of Lasso regression is to find the values of the regression coefficients that minimize the cost function while simultaneously keeping the size of the coefficients small. The L1 penalty term encourages sparsity in the coefficient estimates, meaning that it tends to set many of the coefficients to zero. This results in a simpler model with fewer non-zero coefficients, which is easier to interpret and less prone to overfitting.
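
To make the two terms of the objective concrete, here is a small sketch that evaluates the cost for a given coefficient vector; the data, β, and λ are made-up toy values (scikit-learn's Lasso minimizes an objective of this same form, with alpha in place of λ):

import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0]])          # m = 3 observations, p = 2 predictors
y = np.array([3.0, 2.5, 4.0])
beta = np.array([1.0, 0.5])
lam = 0.1
m = X.shape[0]

ols_term = np.sum((y - X @ beta) ** 2) / (2 * m)  # (1/(2m)) * ||y - Xβ||^2
l1_term = lam * np.sum(np.abs(beta))              # λ * ||β||_1
print(ols_term + l1_term)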

Lasso Regression Example:

Import the required libraries:

import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error, r2_score

Load the data:

data = pd.read_csv('data.csv')

Split the data into k subsets:

kf = KFold(n_splits=5, shuffle=True, random_state=0)

Create the lasso regression model:

model = LassoCV(cv=kf, random_state=0)

Fit the model:

X = data.drop('response_variable', axis=1)
y = data['response_variable']
model.fit(X, y)

Predict and evaluate the model:

y_pred = model.predict(X)
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print(f'MSE: {mse:.2f}, R-squared: {r2:.2f}')

Identify the subset of predictors:

coef = pd.Series(model.coef_, index=X.columns)
selected_features = coef[coef != 0].index.tolist()
print(f'Selected features: {selected_features}')

Output:

0.7335508027883148

Here the Lasso regression model reaches an R-squared of roughly 0.73 on the given dataset, meaning it explains about 73% of the variance in the response variable (this is an R-squared score, not a classification accuracy).
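
To see which value of lambda the cross-validation actually selected, LassoCV exposes it as the alpha_ attribute of the fitted model:

print(f'Chosen regularization strength (lambda): {model.alpha_:.4f}')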
