
Jenil Sheth

L1 L2 Regularization

Prompt:

Explain L1 and L2 regularization and also derive their derivatives.

Response:

Alright 🙂

I'll explain L1 and L2 regularization with both intuition and maths, and then we'll work out the derivatives as well.


1️⃣ Why do we need regularization?

Machine learning models (especially linear / deep models) run into the problem of overfitting:

the model learns the training data very well, but fails on new data.

👉 The idea behind regularization is:

“Stop the weights from becoming unnecessarily large”

This makes the model simpler and more generalizable.


2️⃣ Base Loss Function (without regularization)

Suppose we have linear regression:

\hat{y} = wx + b

and the loss function (MSE):

J(w) = \frac{1}{2m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2
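As a quick sketch of this base loss and its gradient (a NumPy illustration with made-up variable names, not code from the original post):

```python
import numpy as np

def mse_loss_and_grad(w, b, x, y):
    """Plain MSE loss J = 1/(2m) * sum((y - y_hat)^2) and its gradient w.r.t. w."""
    m = len(x)
    y_hat = w * x + b                      # predictions of the linear model
    loss = np.sum((y - y_hat) ** 2) / (2 * m)
    grad_w = np.sum((y_hat - y) * x) / m   # dJ/dw, no regularization yet
    return loss, grad_w
```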

Now let's add regularization.


🔹 L2 Regularization (Ridge Regression)

🔸 Definition

In L2 regularization we add a penalty on the square of the weights:

J(w) = \frac{1}{2m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2

where:

  • λ = regularization strength
  • Larger λ ⇒ stronger penalty ⇒ smaller weights
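As a small NumPy sketch of this ridge objective (continuing the example above; names like `lam` for λ are illustrative, not from the post):

```python
def ridge_loss(w, b, x, y, lam=0.1):
    """MSE plus the L2 penalty (lambda / 2m) * sum(w^2)."""
    m = len(x)
    y_hat = w * x + b
    mse = np.sum((y - y_hat) ** 2) / (2 * m)
    l2_penalty = (lam / (2 * m)) * np.sum(np.asarray(w) ** 2)  # penalizes large weights
    return mse + l2_penalty
```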

🔸 Intuition (feel it)

  • Large weights ⇒ heavy penalty
  • Weights move close to zero, but never become exactly zero
  • Gives a smooth solution

📌 That is why L2 = weight decay


🔸 Derivative of L2 Regularization

Penalty term:

\frac{\lambda}{2m} w^2

Derivative w.r.t. w:

\frac{d}{dw}\left(\frac{\lambda}{2m} w^2\right) = \frac{\lambda}{m} w

🔸 Final Gradient (L2)

\frac{\partial J}{\partial w} = \frac{1}{m}\sum (\hat{y} - y)\,x + \frac{\lambda}{m}w

👉 Gradient descent update:

w := w - \alpha \left( \frac{1}{m}\sum (\hat{y} - y)\,x + \frac{\lambda}{m}w \right)
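Putting the data gradient and the (λ/m) w penalty gradient together, one update step could look like this (a sketch continuing the NumPy example above; `alpha` and `lam` are illustrative hyperparameters):

```python
def l2_update(w, b, x, y, alpha=0.01, lam=0.1):
    """One gradient-descent step with L2 (ridge) regularization on w."""
    m = len(x)
    y_hat = w * x + b
    grad_w = np.sum((y_hat - y) * x) / m + (lam / m) * w  # data gradient + (lambda/m)*w
    grad_b = np.sum(y_hat - y) / m                        # bias is typically left unregularized
    return w - alpha * grad_w, b - alpha * grad_b
```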

🔹 L1 Regularization (Lasso Regression)

🔸 Definition

In L1 regularization, the penalty is the absolute value of the weights:

J(w) = \frac{1}{2m}\sum (y - \hat{y})^2 + \frac{\lambda}{m}\sum_{j=1}^{n} |w_j|
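The analogous lasso objective as a sketch (same illustrative names as above):

```python
def lasso_loss(w, b, x, y, lam=0.1):
    """MSE plus the L1 penalty (lambda / m) * sum(|w|)."""
    m = len(x)
    y_hat = w * x + b
    mse = np.sum((y - y_hat) ** 2) / (2 * m)
    l1_penalty = (lam / m) * np.sum(np.abs(w))  # absolute-value penalty drives weights to zero
    return mse + l1_penalty
```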

🔸 Intuition

  • Absolute value penalty
  • Pushes weights all the way to exactly zero
  • Does feature selection automatically ✨

📌 That is why L1 = sparse model


🔸 Derivative of L1 Regularization

Penalty term:

\frac{\lambda}{m}|w|

Derivative:

\frac{d}{dw}|w| = \begin{cases} +1 & w > 0 \\ -1 & w < 0 \\ \text{undefined} & w = 0 \end{cases}

So,

\frac{d}{dw}\left(\frac{\lambda}{m}|w|\right) = \frac{\lambda}{m}\,\text{sign}(w)

🔸 Final Gradient (L1)

\frac{\partial J}{\partial w} = \frac{1}{m}\sum (\hat{y} - y)\,x + \frac{\lambda}{m}\,\text{sign}(w)
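And the matching L1 (sub)gradient step as a sketch; note that np.sign returns 0 at w = 0, which is one common way to handle the undefined point:

```python
def l1_update(w, b, x, y, alpha=0.01, lam=0.1):
    """One (sub)gradient-descent step with L1 (lasso) regularization on w."""
    m = len(x)
    y_hat = w * x + b
    grad_w = np.sum((y_hat - y) * x) / m + (lam / m) * np.sign(w)  # sign(w) subgradient
    grad_b = np.sum(y_hat - y) / m
    return w - alpha * grad_w, b - alpha * grad_b
```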

3️⃣ L1 vs L2 (Quick Comparison)

Property           | L1                     | L2
-------------------|------------------------|-------------------------
Penalty            | Sum of abs(w_j)        | Sum of w_j^2
Weights            | Exact zero             | Small but non-zero
Feature selection  | ✅ Yes                 | ❌ No
Stability          | Less                   | More
Use when           | Many useless features  | All features important

4️⃣ Geometry Intuition (Interview gold ⭐)

  • L1 → Diamond-shaped constraint → hits axes → sparse solution
  • L2 → Circular constraint → smooth shrinkage

5️⃣ Deep Learning Context

  • In neural networks (see the sketch below):
    • L2 is the most common (called weight decay)
    • L1 is used when sparsity is needed
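For example, here is a minimal PyTorch sketch (the toy model, data, and hyperparameter values are made up for illustration): L2 is usually applied through the optimizer's weight_decay argument, while L1 is typically added to the loss by hand.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()

# L2 / weight decay: handled directly by the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

x, y = torch.randn(32, 10), torch.randn(32, 1)
pred = model(x)

# L1: add lambda * sum(|w|) to the loss manually
lam = 1e-3
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = criterion(pred, y) + lam * l1_penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()
```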

If you want, I can:

  • derive L1/L2 in vector form
  • explain Elastic Net (L1 + L2)
  • or walk through it with PyTorch / TensorFlow code 🔥

Tell me what you'd like next 😊

