Gradient Boosting
Introduction to Gradient Boosting
What is Boosting?
Boosting is an ensemble learning technique in machine learning where multiple weak models are combined to create one strong predictive model.
Many weak learners together form a strong learner.
A weak learner is a model that performs only slightly better than random prediction.
Example of a weak learner:
A small decision tree, such as a decision stump (a tree with a single split)
Main Idea of Boosting
Boosting works sequentially.
Model 1 makes a prediction
Model 2 corrects the errors of Model 1
Model 3 corrects the errors still left after Models 1 and 2
...
Each new model focuses on the mistakes that the previous models still make.
Why Boosting Works
Instead of building one large, complex model, boosting improves the prediction gradually, step by step.
Each model learns only the errors that remain after the previous models.
This steadily reduces the prediction error on the training data.
What is Gradient Boosting?
Gradient Boosting is a boosting technique where each new model is trained on the negative gradient of the loss function, taken with respect to the current predictions.
For regression problems using squared error loss:
Negative Gradient = Residual Error
So, in practice:
Gradient Boosting learns the residuals.
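More precisely, the residual is the negative gradient of the squared error loss, and this takes one line of calculus to show. The derivation below writes y for the actual value and F for the current prediction. It assumes the common convention of a factor 1/2 in the loss, which only rescales the gradient.

```latex
\begin{aligned}
L(y, F) &= \tfrac{1}{2}\,(y - F)^2 \\
\frac{\partial L}{\partial F} &= -(y - F) \\
-\frac{\partial L}{\partial F} &= y - F = \text{Actual} - \text{Predicted} = \text{Residual}
\end{aligned}
```

Without the 1/2 factor the negative gradient is 2(y - F). That points in the same direction and differs only by a constant, which the learning rate absorbs.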
What is Residual?
Residual means prediction error.
Formula:
Residual = Actual Value - Predicted Value
The residual tells:
- how wrong the prediction is (its size)
- how much correction is needed, and in which direction (its sign)
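The formula and its interpretation can be sketched in a few lines of Python. The helpers `residual` and `describe` are hypothetical names chosen for illustration. The sketch shows that a positive residual means the prediction was too low and a negative one means it was too high.

```python
def residual(actual, predicted):
    """Residual = Actual Value - Predicted Value."""
    return actual - predicted


def describe(actual, predicted):
    """Hypothetical helper: turn a residual into a plain-language correction."""
    r = residual(actual, predicted)
    if r > 0:
        return f"too low by {r}: correct upward"
    if r < 0:
        return f"too high by {-r}: correct downward"
    return "exact: no correction needed"


print(describe(16, 13))  # too low by 3: correct upward
print(describe(10, 13))  # too high by 3: correct downward
print(describe(13, 13))  # exact: no correction needed
```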
Gradient Boosting Workflow
1. Start with an initial prediction
2. Calculate residuals
3. Train a small model on residuals
4. Add the correction, scaled by a learning rate, to the previous prediction
5. Calculate new residuals
6. Repeat steps 3 to 5 for a chosen number of rounds
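The six steps above fit in one short loop. This is a minimal sketch under stated assumptions, not a library implementation. It assumes a single numeric feature whose values are sorted and distinct, squared error loss, and decision stumps (one-split trees) as the small models. `fit_stump`, `squared_error` and `gradient_boost` are hypothetical names. The loop also records the total squared error after every round, to show the error shrinking over time.

```python
def fit_stump(xs, targets):
    """Fit a decision stump (one split, two leaves) by minimizing squared error.

    Assumes xs is sorted ascending with distinct values. Returns
    (threshold, left_value, right_value); each leaf value is the mean target
    of the points on that side of the threshold.
    """
    best = None
    for i in range(1, len(xs)):
        threshold = (xs[i - 1] + xs[i]) / 2
        left, right = targets[:i], targets[i:]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def squared_error(ys, preds):
    """Total squared error of the current predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


def gradient_boost(xs, ys, n_rounds=3, learning_rate=0.5):
    """Squared-error gradient boosting with stumps, following workflow steps 1-6."""
    initial = sum(ys) / len(ys)                          # step 1: start with the mean
    preds = [initial] * len(ys)
    stumps, history = [], [squared_error(ys, preds)]
    for _ in range(n_rounds):                            # step 6: repeat
        residuals = [y - p for y, p in zip(ys, preds)]   # steps 2 and 5: residuals
        threshold, left, right = fit_stump(xs, residuals)  # step 3: small model on residuals
        stumps.append((threshold, left, right))
        # step 4: add the correction, scaled by the learning rate
        preds = [p + learning_rate * (left if x <= threshold else right)
                 for p, x in zip(preds, xs)]
        history.append(squared_error(ys, preds))
    return initial, stumps, preds, history


initial, stumps, preds, history = gradient_boost([1, 2, 3, 4], [10, 12, 14, 16])
print(history)  # starts [20.0, 8.0, ...] and shrinks every round
```

Earlier stumps are never refit. Each round only appends a new one, which matches the additive nature of the method.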
Example:
Dataset:
| X | Y |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
| 4 | 16 |
Step 1: Initial Prediction
For squared error loss, the initial prediction is usually the mean of the target values, because the mean is the constant that minimizes the squared error.
Mean:
(10 + 12 + 14 + 16) / 4
= 52 / 4
= 13
Initial prediction:
| X | Actual | Prediction |
|---|---|---|
| 1 | 10 | 13 |
| 2 | 12 | 13 |
| 3 | 14 | 13 |
| 4 | 16 | 13 |
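Step 1 can be checked numerically. The sketch below computes the mean of the example targets. It then spot-checks that predicting 13 everywhere gives a lower total squared error than nearby constants; this is an illustration, not a proof. `total_squared_error` is a hypothetical helper.

```python
targets = [10, 12, 14, 16]

# Step 1: the initial prediction is the mean of the targets.
initial_prediction = sum(targets) / len(targets)
print(initial_prediction)  # 13.0


def total_squared_error(constant, values):
    """Squared error of predicting the same constant for every value."""
    return sum((v - constant) ** 2 for v in values)


# Nearby constants do worse than the mean.
for c in (12, 13, 14):
    print(c, total_squared_error(c, targets))
# 12 24
# 13 20
# 14 24
```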
Step 2: Calculate Residuals
Formula:
Residual = Actual - Prediction
Residuals:
| X | Actual | Prediction | Residual |
|---|---|---|---|
| 1 | 10 | 13 | -3 |
| 2 | 12 | 13 | -1 |
| 3 | 14 | 13 | 1 |
| 4 | 16 | 13 | 3 |
These residuals become the target for the next model.
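The residual table for Step 2 can be reproduced directly. The sketch recomputes the mean prediction and subtracts it from each actual value. Because the starting prediction is the mean, the residuals sum to zero (up to floating-point rounding in general). Here they sum to exactly zero.

```python
xs = [1, 2, 3, 4]
actual = [10, 12, 14, 16]
prediction = [sum(actual) / len(actual)] * len(actual)  # 13.0 for every row

# Residual = Actual - Prediction, row by row
residuals = [a - p for a, p in zip(actual, prediction)]

for x, a, p, r in zip(xs, actual, prediction, residuals):
    print(x, a, p, r)

print(residuals)       # [-3.0, -1.0, 1.0, 3.0]
print(sum(residuals))  # 0.0
```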
Step 3: Train New Model on Residuals
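Now a small model is trained with the residuals as its target.
Suppose the model is a decision stump that splits between X = 2 and X = 3. Each leaf predicts the mean residual of its points, (-3 + -1) / 2 = -2 and (1 + 3) / 2 = 2, so the model predicts: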
| X | Correction |
|---|---|
| 1 | -2 |
| 2 | -2 |
| 3 | 2 |
| 4 | 2 |
This means:
- small X values need a negative correction (the current prediction is too high)
- large X values need a positive correction (the current prediction is too low)
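The correction table is exactly what a least-squares decision stump outputs when it is fitted to these residuals. The sketch below tries every threshold between adjacent X values and keeps the split with the lowest squared error. It assumes the X values are sorted and distinct, and `fit_stump` is a hypothetical helper rather than a library function.

```python
xs = [1, 2, 3, 4]
residuals = [-3, -1, 1, 3]


def fit_stump(xs, targets):
    """Try every threshold between adjacent x values; keep the lowest squared error.

    Assumes xs is sorted with distinct values. Returns (threshold, left, right).
    """
    best = None
    for i in range(1, len(xs)):
        threshold = (xs[i - 1] + xs[i]) / 2
        left, right = targets[:i], targets[i:]
        left_mean = sum(left) / len(left)      # leaf value = mean residual on the left
        right_mean = sum(right) / len(right)   # leaf value = mean residual on the right
        error = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


threshold, left_value, right_value = fit_stump(xs, residuals)
corrections = [left_value if x <= threshold else right_value for x in xs]
print(threshold, corrections)  # 2.5 [-2.0, -2.0, 2.0, 2.0]
```

Splitting at 2.5 leaves a squared error of 4. Splitting at 1.5 or 3.5 leaves 8, so the stump chooses 2.5.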
Step 4: Update Predictions
Learning rate (a factor that scales down each correction):
0.5
Update formula:
New Prediction =
Old Prediction + Learning Rate × Correction
Example for X = 4:
13 + 0.5 × 2
= 14
Updated predictions:
| X | Updated Prediction |
|---|---|
| 1 | 12 |
| 2 | 12 |
| 3 | 14 |
| 4 | 14 |
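The update formula from Step 4 applies to every row at once. The sketch below uses a hypothetical `update` helper. For comparison, it also shows what a learning rate of 1.0 would produce with the same corrections.

```python
def update(old_predictions, corrections, learning_rate):
    """New Prediction = Old Prediction + Learning Rate × Correction, per data point."""
    return [old + learning_rate * c for old, c in zip(old_predictions, corrections)]


old_predictions = [13, 13, 13, 13]
corrections = [-2, -2, 2, 2]

print(update(old_predictions, corrections, 0.5))  # [12.0, 12.0, 14.0, 14.0]
print(update(old_predictions, corrections, 1.0))  # [11.0, 11.0, 15.0, 15.0]
```

A learning rate of 1.0 applies the full correction in one step. A smaller rate applies only part of it and leaves the rest for later models. That usually needs more rounds but tends to generalize better.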
Step 5: Calculate New Residuals
New residuals:
| X | Actual | Prediction | Residual |
|---|---|---|---|
| 1 | 10 | 12 | -2 |
| 2 | 12 | 12 | 0 |
| 3 | 14 | 14 | 0 |
| 4 | 16 | 14 | 2 |
The errors are now smaller: the largest residual dropped from 3 to 2 in size, and two points are now predicted exactly.
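One way to measure the improvement in Step 5 is the sum of squared residuals before and after the update. The sketch uses the example's numbers; `residuals` and `sum_squared` are hypothetical helpers.

```python
actual = [10, 12, 14, 16]
before = [13, 13, 13, 13]          # Step 1 predictions
after = [12.0, 12.0, 14.0, 14.0]   # Step 4 predictions


def residuals(actual, predicted):
    return [a - p for a, p in zip(actual, predicted)]


def sum_squared(values):
    return sum(v ** 2 for v in values)


print(residuals(actual, after))                # [-2.0, 0.0, 0.0, 2.0]
print(sum_squared(residuals(actual, before)))  # 20
print(sum_squared(residuals(actual, after)))   # 8.0
```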
Important Concept
Gradient Boosting does not replace old models.
Instead, each new model's output is added to the previous predictions.
Final prediction becomes:
Initial Prediction
+ Learning Rate × Correction 1
+ Learning Rate × Correction 2
+ Learning Rate × Correction 3
+ ...
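The additive final prediction can be sketched as a loop over the stored correction models. This assumes each model is a stump stored as a `(threshold, left_value, right_value)` tuple. `ensemble_predict` is a hypothetical name. The single stump used here is the one from the worked example.

```python
def ensemble_predict(x, initial, stumps, learning_rate):
    """Initial prediction plus learning_rate × each stump's correction.

    Each stump is an assumed (threshold, left_value, right_value) tuple.
    Old stumps are never replaced; new ones are simply appended to the list.
    """
    prediction = initial
    for threshold, left_value, right_value in stumps:
        prediction += learning_rate * (left_value if x <= threshold else right_value)
    return prediction


stumps = [(2.5, -2.0, 2.0)]  # the single correction model from the worked example
print([ensemble_predict(x, 13.0, stumps, 0.5) for x in [1, 2, 3, 4]])  # [12.0, 12.0, 14.0, 14.0]
```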
Why Decision Trees Are Commonly Used
Gradient Boosting can use many types of weak learners.
Most common weak learner:
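Small Regression Trees
Small trees are fast to fit and can capture nonlinear patterns and feature interactions. Their depth directly controls how weak each learner is.
When regression trees are used as the weak learners, the method is called Gradient Boosted Regression Trees (GBRT).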
Summary
Gradient Boosting works by:
- starting with a simple prediction
- calculating errors
- training new models on those errors
- gradually improving the prediction
Each new model learns only the remaining mistakes.
This sequential error correction makes Gradient Boosting a very powerful machine learning technique.