Regularization
Regularization is a technique used in Machine Learning to reduce overfitting and improve model generalization. It helps prevent machine learning models from becoming too complex by adding a penalty term to the loss function.
When a model fits the training data too closely, it may perform well on that data but fail on new, unseen data. Regularization addresses this problem by controlling model complexity.
Why Regularization is Important
Regularization helps:
- Reduce overfitting
- Improve model generalization
- Simplify models
- Improve prediction accuracy on unseen data
- Reduce unnecessary feature impact
Regularization discourages machine learning models from simply memorizing the training data.
What is Overfitting?
Overfitting occurs when a model performs extremely well on training data but poorly on testing data.
Example
| Dataset | Accuracy |
|---|---|
| Training Data | 99% |
| Testing Data | 60% |
This indicates overfitting.
How Regularization Works
Regularization works by adding a penalty term to the model’s loss function so that the machine learning model does not become overly complex while learning from training data.
Normally, a machine learning model tries to minimize prediction error. In the process, it may assign very large coefficient values in order to fit the training data perfectly, which can lead to overfitting.
Regularization controls this behavior by penalizing large coefficient values.
Without Regularization:
Model Goal: Minimize Error Only

With Regularization:
Model Goal: Minimize Error + Keep Model Simple
General Formula
Loss = Error + Penalty
Where:
- Error → Prediction error made by the model
- Penalty → Additional cost added for large coefficients
The penalty discourages very large coefficient values and reduces model complexity.
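As a quick illustration, the penalized loss can be computed by hand. The numbers below are made up, and the penalty shown is an L2-style squared penalty with λ = 0.1:

import numpy as np

# Hypothetical predictions and coefficients (illustration only)
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])
coefs = np.array([2.0, -4.0])
lam = 0.1                                 # regularization strength (lambda)

error = np.mean((y_true - y_pred) ** 2)   # prediction error (MSE) = 0.5
penalty = lam * np.sum(coefs ** 2)        # penalty on large coefficients = 2.0
print("Loss =", error + penalty)          # Loss = Error + Penalty = 2.5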
Why Large Coefficients are a Problem
Suppose we are predicting house prices.
The model may learn an equation like:
Price = 5000 × Area + 900000 × Bedrooms
Here, the coefficient for Bedrooms is extremely large.
This means:
- The model depends too heavily on one feature
- Small changes in Bedrooms may produce huge prediction changes
- The model becomes unstable
- Overfitting may occur
Regularization reduces such excessively large coefficients.
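A minimal sketch of this instability, using made-up synthetic data: when two features are nearly identical, ordinary linear regression can produce large, opposite-signed coefficients, while an L2-regularized model (Ridge, covered below) shares the weight between them:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)                       # e.g. standardized Area
x2 = x1 + rng.normal(scale=0.01, size=100)      # nearly identical feature
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=1.0, size=100)    # target depends on x1 only

print(LinearRegression().fit(X, y).coef_)  # often large, opposite-signed values
print(Ridge(alpha=1.0).fit(X, y).coef_)    # both near 1.5, summing to about 3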
Example: Imagine a student preparing for an exam.
Without Regularization: The student memorizes every answer exactly. Result:
- Performs well on known questions
- Fails on slightly different questions
This is similar to: Overfitting
With Regularization: The student focuses on understanding concepts instead of memorizing everything. Result:
- Performs well even on new questions
- Better generalization
This is similar to: Regularization
Types of Regularization
1. L1 Regularization (Lasso)
2. L2 Regularization (Ridge)
3. Elastic Net Regularization
1. L1 Regularization (Lasso Regression)
L1 Regularization adds the absolute values of coefficients as a penalty term.
Formula:
Loss = RSS + λ ∑ |βᵢ|
Where:
- RSS = Residual Sum of Squares
- λ = Regularization parameter
- βᵢ = Model coefficients
Characteristics of L1 Regularization
- Can reduce coefficients to zero
- Performs feature selection
- Produces sparse models
Python Example — Lasso Regression
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# Synthetic regression data: 100 samples, 5 features
X, y = make_regression(
    n_samples=100,
    n_features=5,
    noise=10
)

# alpha is the regularization strength (lambda in the formula above)
model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)
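Since L1 can shrink coefficients exactly to zero, a quick follow-up check on the fitted model above counts how many features were dropped. Note that with this small alpha and fully informative synthetic data the count may well be zero; a larger alpha makes zeros more likely:

import numpy as np

# Coefficients driven exactly to zero correspond to removed features
print("Zeroed coefficients:", np.sum(model.coef_ == 0))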
2. L2 Regularization (Ridge Regression)
L2 Regularization adds squared coefficient values as a penalty term.
Formula:
Loss = RSS + λ ∑ βᵢ²
Characteristics of L2 Regularization
- Reduces coefficient magnitudes
- Does not make coefficients exactly zero
- Handles multicollinearity
Python Example — Ridge Regression
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

# Synthetic regression data: 100 samples, 5 features
X, y = make_regression(
    n_samples=100,
    n_features=5,
    noise=10
)

# alpha is the regularization strength (lambda)
model = Ridge(alpha=1.0)
model.fit(X, y)
print(model.coef_)
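By contrast, Ridge only shrinks magnitudes; continuing from the snippet above, the same check is expected to find no exact zeros:

import numpy as np

# Ridge shrinks coefficients but (almost) never to exactly zero
print("Zeroed coefficients:", np.sum(model.coef_ == 0))  # expected: 0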
3. Elastic Net Regularization
Elastic Net combines both L1 and L2 Regularization.
Formula:
Loss = RSS + λ₁ ∑ |βᵢ| + λ₂ ∑ βᵢ²
Benefits of Elastic Net
- Performs feature selection
- Handles correlated features
- Combines strengths of Ridge and Lasso
Python Example — Elastic Net
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression

# Synthetic regression data: 100 samples, 5 features
X, y = make_regression(
    n_samples=100,
    n_features=5,
    noise=10
)

# alpha sets the overall regularization strength
model = ElasticNet(alpha=0.1)
model.fit(X, y)
print(model.coef_)
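In scikit-learn, ElasticNet also takes an l1_ratio parameter (default 0.5) that sets the mix between the two penalties; as a sketch:

# l1_ratio near 1.0 behaves like Lasso, near 0.0 like Ridge
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)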
Understanding Lambda (λ)
Lambda controls the strength of regularization.
| Lambda (λ) | Penalty | Resulting Model |
|---|---|---|
| Small λ | Less penalty | More complex model |
| Large λ | More penalty | Simpler model |
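This effect is easy to observe directly. A minimal sketch (synthetic data; in scikit-learn the role of lambda is played by the alpha parameter) sweeps the regularization strength and prints the total coefficient magnitude, which shrinks as alpha grows:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=0)

# Larger alpha (lambda) -> stronger penalty -> smaller coefficients
for alpha in [0.01, 1.0, 100.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha}: total |coef| = {np.abs(coefs).sum():.1f}")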
Real-World Example
House Price Prediction
Suppose a dataset contains:
- Area
- Bedrooms
- Floors
- Nearby shops
- Random unnecessary features
Regularization helps reduce the effect of unnecessary features and improves prediction reliability.
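A hedged sketch of this scenario with synthetic data: only 2 of 5 features actually drive the target (standing in for useful features like Area, with the rest as random unnecessary ones), and Lasso is expected to push the useless coefficients to or near zero:

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 2 informative features out of 5; the other 3 are unnecessary
X, y = make_regression(n_samples=200, n_features=5, n_informative=2,
                       noise=10, random_state=42)

model = Lasso(alpha=1.0).fit(X, y)
print(model.coef_)  # the 3 uninformative coefficients end up at or near zero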
Important Points
1. Regularization is mainly used to reduce overfitting.
2. L1 Regularization performs feature selection by reducing coefficients to zero.
3. L2 Regularization reduces coefficient values but does not remove features completely.
4. Elastic Net combines L1 and L2 Regularization.
5. Lambda controls the amount of regularization applied to the model.
Summary
Regularization is a machine learning technique used to reduce overfitting and improve model generalization by adding penalty terms to the loss function. L1, L2, and Elastic Net Regularization help simplify models, reduce complexity, and improve prediction performance on unseen data.