source: https://godzilla.dev/learning/ai_quant_traders_series_7/
AI x Quant Trader Series - Day 7
The Swiss Army Knife of Linear Models: Lasso Regression
Reading time: ~15 minutes
Prerequisites: basic linear algebra, Python, NumPy
Focus: engineering intuition, quant usage (not ML hype)
Part 1: Introduction to Regularized Linear Models
We now move from data processing to one of the most important modeling tools in quantitative trading and applied machine learning: regularized linear models.
In real-world financial modeling, the main difficulty is rarely computation. Instead, it is almost always structure:
Too many features
Strong multicollinearity
Limited samples
High noise-to-signal ratio
A plain linear regression model can fit the data extremely well in-sample, yet fail catastrophically out-of-sample.
This is where Lasso regression becomes indispensable.
Part 2: From Linear Regression to Lasso
2.1 Ordinary Least Squares (OLS)
The objective function of ordinary least squares is:
$$\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2$$
OLS attempts to minimize prediction error only.
It places no constraint on model complexity.
As a result:
Coefficients become unstable when features are correlated
Noise features receive non-zero weights
Overfitting is almost guaranteed in high-dimensional settings
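The instability under correlated features can be seen in a small experiment. The sketch below is illustrative only: the data, the near-duplicate feature `x2`, and the helper `ols` are assumptions made for this example, not part of the original material. It fits OLS on two halves of the same dataset; the individual coefficients can differ widely between halves, while their sum stays close to the true value of 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 1e-3 * rng.standard_normal(n)    # near-duplicate of x1 (assumed for illustration)
X = np.column_stack([x1, x2])
y = x1 + 0.5 * rng.standard_normal(n)      # the true model uses x1 only


def ols(X, y):
    """Least-squares coefficients without an intercept (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Fit the same model on two halves of the sample.
beta_a = ols(X[:100], y[:100])
beta_b = ols(X[100:], y[100:])

# A large condition number signals that the columns are nearly collinear.
cond = np.linalg.cond(X)

print("first half :", beta_a, "sum =", beta_a.sum())
print("second half:", beta_b, "sum =", beta_b.sum())
print("condition number:", cond)
```

Only the combined effect of the two features is pinned down by the data. How that effect is split between them is driven largely by noise, which is the instability regularization is meant to control.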
2.2 Why Regularization Is Necessary
In quantitative finance, feature sets often include:
Dozens of technical indicators
Overlapping factors
Lagged signals
Many of these features carry redundant or spurious information.
Regularization explicitly penalizes complexity, forcing the model to prefer simpler and more stable solutions.
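Part 3: Lasso Regression - Core Idea
3.1 Objective Function
Lasso (Least Absolute Shrinkage and Selection Operator) modifies OLS by adding an L1 penalty:
$$\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
Where:
The first term measures fit quality
The second term penalizes coefficient magnitude
λ controls the strength of regularization (scikit-learn calls this parameter alpha)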
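The two terms described above can be written as a small function. This is a minimal sketch: the name `lasso_objective` and the toy arrays are assumptions for illustration. Note that scikit-learn's `Lasso` scales the fit term by 1/(2n), so its `alpha` is not numerically identical to the `lam` used here.

```python
import numpy as np


def lasso_objective(X, y, beta, lam):
    """Sum of squared residuals plus lam times the L1 norm of beta (illustrative helper)."""
    residual = y - X @ beta
    fit_term = np.sum(residual ** 2)           # fit quality
    penalty = lam * np.sum(np.abs(beta))       # coefficient magnitude
    return fit_term + penalty


# Toy data chosen so that beta = [1, 2] fits y exactly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

perfect_fit = lasso_objective(X, y, np.array([1.0, 2.0]), lam=0.5)   # only the penalty remains
all_zero = lasso_objective(X, y, np.zeros(2), lam=0.5)               # only the fit term remains
print(perfect_fit, all_zero)
```

The optimizer trades these two quantities against each other: a perfect fit still pays the penalty, and an all-zero model pays the full squared error.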
3.2 What Makes Lasso Different
Unlike Ridge regression (L2 regularization), Lasso drives some coefficients exactly to zero.
This leads to:
Automatic feature selection
Sparse models
Improved interpretability
From an engineering perspective:
Lasso is not just a regression model; it also acts as a built-in feature filter.
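Part 4: Intuition - Why Lasso Produces Sparsity
The L1 penalty is equivalent to constraining the coefficients to a diamond-shaped region (a cross-polytope in higher dimensions) whose corners lie on the coordinate axes.
The elliptical contours of the squared-error term often first touch this region at a corner or edge, where one or more coefficients are exactly zero.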
The practical consequence is simple:
Unimportant features are dropped entirely.
This behavior is extremely valuable in quant trading, where fewer signals often outperform noisy combinations.
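The same effect can be shown algebraically in the simplest case. Assuming orthonormal features and a fit term scaled by 1/2, the Lasso solution is the OLS coefficient passed through a soft-thresholding function, while the Ridge solution is a proportional rescaling. The sketch below uses hypothetical helper names and made-up coefficient values.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso update in the orthonormal case: shrink by t, then clip at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def ridge_shrink(z, t):
    """Ridge update in the orthonormal case: proportional shrinkage, never exactly zero."""
    return z / (1.0 + t)


ols_coefs = np.array([3.0, 0.05, -0.2, 1.5])    # illustrative OLS estimates
lasso_like = soft_threshold(ols_coefs, 0.1)
ridge_like = ridge_shrink(ols_coefs, 0.1)

print("lasso:", lasso_like)
print("ridge:", ridge_like)
```

Any coefficient whose magnitude is below the threshold is set to exactly zero by the L1 update, whereas the L2 update only makes it smaller. Real feature matrices are not orthonormal, so this is an intuition aid rather than a description of what the solver does step by step.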
Part 5: Implementing Lasso in Python
We now implement Lasso using scikit-learn.
Imports
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
5.1 Generate Example Data
import numpy as np
np.random.seed(42)
X = np.random.randn(100, 10)
true_beta = np.array([3, 0, 0, 1.5, 0, 0, 0, 2, 0, 0])
y = X @ true_beta + np.random.randn(100) * 0.5
5.2 Standardize Features
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
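Standardization matters for Lasso because the L1 penalty treats all coefficients alike, so a feature on a larger scale is penalized less for the same effect. With time-ordered data there is a further practical point: fitting the scaler on the full sample lets later observations influence earlier ones. A minimal sketch of the usual remedy follows, assuming a simple chronological 80/20 split; the variable names and synthetic data are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_all = rng.standard_normal((100, 10)) * 3.0 + 1.0    # synthetic features, rows in time order

split = 80                                             # assumed train/test boundary
X_train, X_test = X_all[:split], X_all[split:]

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)         # statistics learned from the training rows only
X_test_scaled = scaler.transform(X_test)               # the same statistics reused on later rows

print(X_train_scaled.mean(axis=0).round(6))
print(X_test_scaled.mean(axis=0).round(2))
```

The test-set means will generally not be exactly zero, which is expected: the test rows are scaled with training statistics, just as live data would be.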
5.3 Fit the Lasso Model
from sklearn.linear_model import Lasso
import pandas as pd
lasso = Lasso(alpha=0.1)
lasso.fit(X_scaled, y)
print(pd.Series(lasso.coef_).round(2))
Output (values rounded to two decimals):
0 2.85
1 0.00
2 0.00
3 1.42
4 0.00
5 0.00
6 0.00
7 1.95
8 0.00
9 0.00
dtype: float64
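The noise features are set exactly to zero, while the three true signals are retained. Their coefficients are shrunk slightly toward zero (2.85 vs. 3, 1.42 vs. 1.5, 1.95 vs. 2), which is the usual shrinkage bias of Lasso.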
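Because the true coefficients are known in a synthetic setup, the selection can be checked programmatically. The sketch below regenerates data of the same shape with `np.random.default_rng`, so its numbers will not match the output shown above; the names `missed` and `false_positives` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 10))
true_beta = np.array([3, 0, 0, 1.5, 0, 0, 0, 2, 0, 0])
y = X @ true_beta + rng.standard_normal(100) * 0.5

X_scaled = StandardScaler().fit_transform(X)
model = Lasso(alpha=0.1).fit(X_scaled, y)

# Indices of features with non-zero coefficients, fitted vs. true.
selected = set(np.flatnonzero(model.coef_).tolist())
true_support = set(np.flatnonzero(true_beta).tolist())

missed = true_support - selected             # real signals the model dropped
false_positives = selected - true_support    # noise features the model kept

print("selected:", sorted(selected))
print("missed:", sorted(missed), "false positives:", sorted(false_positives))
```

On real market data the true support is unknown, so this check is only available in simulation. It is still a useful way to calibrate how aggressive a given alpha is before applying it to live features.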
Part 6: The Role of Alpha (λ)
6.1 Effect of Regularization Strength
Small α → weak regularization → overfitting
Large α → aggressive shrinkage → underfitting
for a in [0.01, 0.1, 1.0]:
    model = Lasso(alpha=a)
    model.fit(X_scaled, y)
    # print alpha and the number of features kept (non-zero coefficients)
    print(a, (model.coef_ != 0).sum())
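Output (illustrative):
0.01 7
0.1 3
1.0 0
The count of retained features falls as α grows. Treat the exact numbers as illustrative: with scikit-learn's scaling of the objective, every coefficient reaches zero only once α exceeds the largest absolute covariance between a standardized feature and y, which is roughly 3 for this data. At α = 1.0 the strongest signals would therefore be expected to survive in shrunken form, so rerun the loop to confirm the counts.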
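The alpha at which every coefficient is driven to zero can be computed directly rather than found by trial. For scikit-learn's `Lasso`, whose fit term is scaled by 1/(2n), that value is the largest absolute inner product between a centered feature and the centered target, divided by n. The sketch below checks this on freshly generated synthetic data; `alpha_max` is a name chosen here, not a scikit-learn attribute.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
beta = np.array([3, 0, 0, 1.5, 0, 0, 0, 2, 0, 0])
y = X @ beta + 0.5 * rng.standard_normal(100)

# Lasso fits an intercept by default, so center X and y before computing the bound.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
alpha_max = np.max(np.abs(Xc.T @ yc)) / len(y)

# Just above the bound: nothing survives. Well below it: at least one feature does.
n_above = np.count_nonzero(Lasso(alpha=1.01 * alpha_max).fit(X, y).coef_)
n_below = np.count_nonzero(Lasso(alpha=0.5 * alpha_max).fit(X, y).coef_)

print("alpha_max:", alpha_max)
print("non-zero above:", n_above, "non-zero below:", n_below)
```

This bound gives a natural upper end for an alpha grid: values above it all produce the same empty model, so a search only needs to cover the range below it.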
6.2 Cross-Validation (Recommended)
from sklearn.linear_model import LassoCV
lasso_cv = LassoCV(cv=5)
lasso_cv.fit(X_scaled, y)
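print(lasso_cv.alpha_)  # regularization strength chosen by cross-validation
print(lasso_cv.coef_)   # coefficients refit on all data at that alpha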
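Cross-validation chooses α by out-of-sample error rather than in-sample fit, which makes the selected model less dependent on any single stretch of data. For time-ordered market data, use a chronological splitter such as scikit-learn's TimeSeriesSplit instead of the default K-fold, so that each validation fold comes after the data used to train on it.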
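A minimal sketch of cross-validating Lasso with chronological folds follows, assuming the rows are already in time order. The synthetic data and the `ordered` check are illustrative; `LassoCV` accepts a splitter object through its `cv` parameter.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))                     # rows assumed to be in time order
beta = np.array([3, 0, 0, 1.5, 0, 0, 0, 2, 0, 0])
y = X @ beta + 0.5 * rng.standard_normal(200)

tscv = TimeSeriesSplit(n_splits=5)

# Every training window ends before its validation window starts.
ordered = all(train.max() < test.min() for train, test in tscv.split(X))

lasso_cv = LassoCV(cv=tscv).fit(X, y)

print("folds are chronological:", ordered)
print("selected alpha:", lasso_cv.alpha_)
print("features kept:", np.count_nonzero(lasso_cv.coef_))
```

The features here are already on a unit scale. With real data, the scaler would normally be combined with the model in a pipeline so that scaling statistics are also computed per fold.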
Part 7: Limitations of Lasso
Lasso is not universally optimal:
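Unstable when features are highly correlated: it tends to keep one feature from a correlated group and drop the rest, and the choice can change between samples
Cannot model non-linear effects or interactions unless they are supplied as explicit features
Sensitive to outliers, because the fit term is a squared error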
Common remedies include:
Elastic Net (L1 + L2)
PCA + Lasso
Lasso for feature selection followed by non-linear models
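The first remedy can be illustrated on an extreme case: two identical features. The sketch below is an assumption-laden toy, not a benchmark. The duplicated column, the parameter values, and the tight `tol` are choices made so the behavior is easy to see. Lasso is expected to put essentially all the weight on one copy, while Elastic Net's L2 term spreads it across both.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
X = np.column_stack([x, x])                      # two perfectly correlated features
y = 2.0 * x + 0.5 * rng.standard_normal(200)

lasso = Lasso(alpha=0.1).fit(X, y)

# A tight tolerance so the solver runs until the two coefficients have equalized.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, tol=1e-10, max_iter=100_000).fit(X, y)

print("lasso      :", lasso.coef_)
print("elastic net:", enet.coef_)
```

With real, imperfectly correlated factors the contrast is less stark, but the tendency is the same: Elastic Net gives more stable weights across a correlated group, at the cost of a less sparse model.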