Gradient Descent in LR
Gradient Descent is an optimization algorithm used to find the best values of slope (m) and intercept (b) in Linear Regression (LR). It helps minimize prediction error by continuously updating model parameters step by step.
Instead of calculating the best-fit line directly using formulas, Gradient Descent gradually learns the optimal line through iterations.
Why Gradient Descent is Needed
Suppose a regression model predicts values poorly.
Example:
| Actual Marks | Predicted Marks |
|---|---|
| 50 | 30 |
| 60 | 35 |
| 70 | 40 |
The prediction error is high.
Gradient Descent helps:
- Reduce prediction error
- Improve model accuracy
- Find optimal values of m and b
Main Idea of Gradient Descent
Gradient Descent works like this:
1. Start with random values of m and b
2. Calculate prediction error
3. Update m and b
4. Repeat until error becomes very small
Linear Regression Equation
y = mx + b
Where:
- y → Predicted output
- x → Input feature
- m → Slope
- b → Intercept
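The equation maps directly onto a one-line function. The following is a minimal sketch; the function name `predict` is chosen here for illustration and does not come from the original text.

```python
import numpy as np

def predict(x, m, b):
    """Return the predicted output y = m*x + b for a scalar or array x."""
    return m * np.asarray(x, dtype=float) + b

# Example: a line with slope 2 and intercept 1
print(predict([1, 2, 3], m=2.0, b=1.0))  # [3. 5. 7.]
```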
Cost Function
Gradient Descent minimizes the Cost Function.
The most common cost function is:
Mean Squared Error (MSE)
MSE Formula
MSE = (1/n) * Σ(actual_y - predicted_y)^2
where n is the number of data points.
Goal:
Minimize MSE
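To make the formula concrete, the sketch below computes MSE for the marks table shown earlier in this article. The helper name `mse` is an assumption made for this example.

```python
import numpy as np

def mse(actual_y, predicted_y):
    """Mean Squared Error: average of the squared differences."""
    actual_y = np.asarray(actual_y, dtype=float)
    predicted_y = np.asarray(predicted_y, dtype=float)
    return np.mean((actual_y - predicted_y) ** 2)

# Marks table from the "Why Gradient Descent is Needed" section
actual = [50, 60, 70]
predicted = [30, 35, 40]
print(mse(actual, predicted))  # (20^2 + 25^2 + 30^2) / 3 ≈ 641.67
```

Squaring makes every error positive and penalizes large errors more heavily than small ones.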
Real-Life Analogy
Imagine you are standing on a mountain and want to reach the lowest point.
You:
- Take small steps downward
- Check direction continuously
- Eventually reach the bottom
Gradient Descent works similarly: it moves step by step toward minimum error.
Important Terms
1. Learning Rate
Learning Rate controls how big each step should be.
Small Learning Rate
- Slow learning
- More iterations
Large Learning Rate
- May skip minimum point
- Unstable learning
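The effect of the learning rate can be seen by running the same number of updates with different step sizes. This is a rough sketch on the small dataset used later in this article (X = 1, 2, 3 and Y = 2, 4, 6). The helper `final_mse` and the three learning rates are illustrative choices; which values count as "too large" depends on the data.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])

def final_mse(learning_rate, steps=50):
    """Run gradient descent from m = b = 0 and return the MSE after `steps` updates."""
    m, b = 0.0, 0.0
    n = len(X)
    for _ in range(steps):
        error = Y - (m * X + b)
        m -= learning_rate * (-2 / n) * np.sum(X * error)
        b -= learning_rate * (-2 / n) * np.sum(error)
    return np.mean((Y - (m * X + b)) ** 2)

for lr in (0.001, 0.05, 0.2):
    print(f"learning rate {lr}: MSE after 50 steps = {final_mse(lr):.4g}")
```

On this data, the smallest rate still leaves a noticeable error after 50 steps, the middle one gets close to zero, and the largest one overshoots so that the error grows instead of shrinking.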
2. Iterations
Iterations represent how many times the parameters are updated.
More iterations usually reduce the error further, until the updates become so small that the values stop changing (convergence).
Mathematical Example
Dataset
| X | Y |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
Step 1: Initial Values
Suppose:
m = 0
b = 0
Learning Rate = 0.01
Step 2: Prediction Formula
predicted_y = mx + b
Predictions
For X = 1:
predicted_y = (0 * 1) + 0 = 0
For X = 2:
predicted_y = (0 * 2) + 0 = 0
For X = 3:
predicted_y = (0 * 3) + 0 = 0
Step 3: Calculate Error
| Actual Y | Predicted Y | Error |
|---|---|---|
| 2 | 0 | 2 |
| 4 | 0 | 4 |
| 6 | 0 | 6 |
Large errors exist.
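Steps 2 and 3 can be reproduced in a few lines. This is a small sketch using the dataset and initial values above; the variable names are chosen for this example.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])
m, b = 0.0, 0.0            # initial values from Step 1

Y_pred = m * X + b          # all zeros at the start
errors = Y - Y_pred         # actual minus predicted
initial_mse = np.mean(errors ** 2)

print(Y_pred)               # [0. 0. 0.]
print(errors)               # [2. 4. 6.]
print(initial_mse)          # (4 + 16 + 36) / 3 ≈ 18.67
```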
Step 4: Update Parameters
Gradient Descent updates m and b to reduce error.
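How much m and b change is determined by the slope (gradient) of the cost function. The short derivation below assumes the MSE cost defined earlier and writes L for the learning rate; the index i and the symbols x_i, y_i are notation introduced here for the individual data points.

```latex
\begin{aligned}
\mathrm{MSE}(m, b) &= \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial\, \mathrm{MSE}}{\partial m} &= -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl(y_i - (m x_i + b)\bigr) \\
\frac{\partial\, \mathrm{MSE}}{\partial b} &= -\frac{2}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr) \\
m &\leftarrow m - L \, \frac{\partial\, \mathrm{MSE}}{\partial m}, \qquad
b \leftarrow b - L \, \frac{\partial\, \mathrm{MSE}}{\partial b}
\end{aligned}
```

Each update moves m and b a small step against the gradient, which is the direction in which MSE decreases fastest locally.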
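After one update, using the same derivative formulas as the Python example (dm and db) with n = 3:
dm = (-2/3) * (1*2 + 2*4 + 3*6) ≈ -18.67
db = (-2/3) * (2 + 4 + 6) = -8
m = 0 - 0.01 * (-18.67) ≈ 0.187
b = 0 - 0.01 * (-8) = 0.08
Predictions improve: every predicted value moves from 0 toward its actual Y.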
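A single update can be checked in code. The sketch below assumes the 2/n-scaled gradients used in this article's Python example; a different scaling of the gradient would give different numbers for the same learning rate.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])
m, b, L = 0.0, 0.0, 0.01    # initial values and learning rate from Step 1
n = len(X)

Y_pred = m * X + b
dm = (-2 / n) * np.sum(X * (Y - Y_pred))   # gradient with respect to m
db = (-2 / n) * np.sum(Y - Y_pred)         # gradient with respect to b

m = m - L * dm
b = b - L * db
print(round(m, 3), round(b, 3))            # 0.187 0.08
```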
Step 5: Repeat Process
Gradient Descent keeps updating values repeatedly until:
Error becomes minimum
Visualization of Learning
Iteration 1 → High Error
Iteration 10 → Lower Error
Iteration 100 → Much Lower Error (approaching the minimum)
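The trend can be observed by recording MSE at each iteration. The sketch below uses the dataset and settings from the mathematical example; the list name `history` is chosen for illustration, and the exact values depend on the learning rate and starting point.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])
m, b, L = 0.0, 0.0, 0.01
n = len(X)

history = []                 # MSE recorded before each update
for i in range(100):
    Y_pred = m * X + b
    history.append(np.mean((Y - Y_pred) ** 2))
    m -= L * (-2 / n) * np.sum(X * (Y - Y_pred))
    b -= L * (-2 / n) * np.sum(Y - Y_pred)

for it in (1, 10, 100):
    print(f"Iteration {it:>3}: MSE = {history[it - 1]:.4f}")
```

Plotting `history` against the iteration number gives the usual learning curve: a steep drop at first, then a slow flattening.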
Python Example — Gradient Descent
import numpy as np

# Dataset
X = np.array([1, 2, 3], dtype=float)
Y = np.array([2, 4, 6], dtype=float)

# Initial values
m = 0.0
b = 0.0

# Learning rate
L = 0.01

# Iterations
epochs = 1000
n = len(X)

# Gradient Descent
for i in range(epochs):
    # Predictions with the current m and b
    Y_pred = m * X + b

    # Derivatives of MSE with respect to m and b
    dm = (-2 / n) * np.sum(X * (Y - Y_pred))
    db = (-2 / n) * np.sum(Y - Y_pred)

    # Update values (step against the gradient)
    m = m - L * dm
    b = b - L * db

print("Slope:", m)
print("Intercept:", b)
Expected Output
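Slope ≈ 2
Intercept ≈ 0
After 1000 iterations the printed values are close to, but not exactly, 2 and 0 (roughly 1.97 and 0.07). More iterations bring them closer.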
Final Equation
y = 2x
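Because Linear Regression also has a direct formula-based solution, the result can be cross-checked. The sketch below uses NumPy's `np.polyfit` with degree 1 as the direct least-squares fit. This is a verification aid, not part of Gradient Descent itself.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])

# Least-squares fit of a degree-1 polynomial; returns [slope, intercept]
slope, intercept = np.polyfit(X, Y, 1)
print(round(slope, 6), round(intercept, 6))
```

The direct fit gives a slope of 2 and an intercept of 0 up to floating-point rounding, which is the line that Gradient Descent approaches iteratively.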
What Gradient Descent Learned
The algorithm learned:
When X increases, Y increases proportionally (Y is about twice X).
Types of Gradient Descent
| Type | Description |
|---|---|
| Batch Gradient Descent | Uses entire dataset |
| Stochastic Gradient Descent | Uses one sample at a time |
| Mini-Batch Gradient Descent | Uses small batches |
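The three types differ only in how many samples are used for each update. The sketch below expresses all three with one hypothetical function, `gradient_descent`, whose `batch_size` parameter selects the variant. The function name, its parameters, and the per-epoch shuffling are assumptions made for this example.

```python
import numpy as np

def gradient_descent(X, Y, batch_size, lr=0.01, epochs=1000, seed=0):
    """Fit y = m*x + b, updating on `batch_size` samples at a time."""
    rng = np.random.default_rng(seed)
    m, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)              # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = Y[idx] - (m * X[idx] + b)
            m -= lr * (-2 / len(idx)) * np.sum(X[idx] * error)
            b -= lr * (-2 / len(idx)) * np.sum(error)
    return float(m), float(b)

X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])

print("batch      :", gradient_descent(X, Y, batch_size=len(X)))
print("stochastic :", gradient_descent(X, Y, batch_size=1))
print("mini-batch :", gradient_descent(X, Y, batch_size=2))
```

With `batch_size=len(X)` every update uses the entire dataset, with `batch_size=1` each update uses one sample, and values in between give mini-batches. On this tiny, noise-free dataset all three end up near m = 2 and b = 0.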
Advantages of Gradient Descent
- Works for large datasets
- Efficient optimization
- Widely used in Deep Learning
- Helps minimize prediction error
Limitations
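- Requires proper learning rate
- Can be slow for complex problems
- May get stuck in local minima on non-convex problems (the MSE cost of Linear Regression is convex, so it has no separate local minima)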
Important Points
1. Gradient Descent minimizes the cost function.
2. Learning Rate controls step size.
3. Gradient Descent updates slope and intercept iteratively.
4. MSE is commonly used as the cost function.
5. Gradient Descent is widely used in Machine Learning and Deep Learning.
Summary
Gradient Descent is an optimization algorithm used in Linear Regression to minimize prediction error by continuously updating slope and intercept values. It helps models learn the best-fit line step by step through iterative optimization.