GBRT : Example
GBRT stands for Gradient Boosted Regression Trees.
It is used for regression problems, where the target value is numeric.
Examples:
House price prediction
Sales prediction
Temperature prediction
Demand forecasting
Main Idea of GBRT
GBRT does not build one large tree.
Instead, it builds many small trees one by one.
Each new tree tries to correct the mistakes made by the previous trees.
Tree 1 gives initial prediction
Tree 2 corrects Tree 1 mistakes
Tree 3 corrects remaining mistakes
Tree 4 corrects remaining mistakes
Final prediction is the sum of all tree outputs.
GBRT works like this:
Start with a simple prediction
Find the error
Train a small tree to predict the error
Add that tree to the model
Repeat again and again
Important Terms
Boosting = Combining many weak models sequentially
Weak Learner = A small decision tree
Residual = Actual value - Predicted value
Learning Rate = Controls how much each tree contributes
Gradient = Direction of error correction
GBRT Algorithm Steps
1. Start with an initial prediction.
2. Calculate residuals.
3. Train a small regression tree on residuals.
4. Multiply tree prediction by learning rate.
5. Add it to the previous prediction.
6. Calculate new residuals.
7. Repeat for many trees.
8. Final prediction is the sum of all updates.
Formula:
New Prediction =
Old Prediction + ( Learning Rate × Tree Prediction )
For squared error loss:
Residual = Actual Value - Predicted Value
Small Dataset
We want to predict price from size.
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
| 4 | 16 |
Step 1: Initial Prediction
GBRT starts with the average target value.
Initial Prediction = Mean of Price
Mean = (10 + 12 + 14 + 16) / 4
Mean = 52 / 4
Mean = 13
So:
F0(x) = 13
Every row initially gets prediction 13.
| Size | Actual Price | Initial Prediction |
|---|---|---|
| 1 | 10 | 13 |
| 2 | 12 | 13 |
| 3 | 14 | 13 |
| 4 | 16 | 13 |
Step 2: Calculate Residuals
Residual formula:
Residual = Actual Price - Predicted Price
| Size | Actual Price | Prediction | Residual |
|---|---|---|---|
| 1 | 10 | 13 | -3 |
| 2 | 12 | 13 | -1 |
| 3 | 14 | 13 | 1 |
| 4 | 16 | 13 | 3 |
Step 3: Train First Small Tree on Residuals
Now GBRT trains a small regression tree using:
Input = Size
Target = Residual
Dataset for Tree 1:
| Size | Residual |
|---|---|
| 1 | -3 |
| 2 | -1 |
| 3 | 1 |
| 4 | 3 |
A simple tree can split at:
Size < 2.5
Left node:
| Size | Residual |
|---|---|
| 1 | -3 |
| 2 | -1 |
Average residual:
(-3 + -1) / 2 = -2
Right node:
| Size | Residual |
|---|---|
| 3 | 1 |
| 4 | 3 |
Average residual:
(1 + 3) / 2 = 2
So Tree 1 becomes:
Size < 2.5?
/ \
-2 +2
Step 4: Use Learning Rate
Let learning rate be:
Learning Rate = 0.5
Formula:
New Prediction = Old Prediction + Learning Rate × Tree Prediction
Step 5: Update Predictions
| Size | Old Prediction | Tree 1 Output | New Prediction |
|---|---|---|---|
| 1 | 13 | -2 | 13 + 0.5×(-2) = 12 |
| 2 | 13 | -2 | 13 + 0.5×(-2) = 12 |
| 3 | 13 | 2 | 13 + 0.5×2 = 14 |
| 4 | 13 | 2 | 13 + 0.5×2 = 14 |
New predictions:
| Size | Actual | New Prediction |
|---|---|---|
| 1 | 10 | 12 |
| 2 | 12 | 12 |
| 3 | 14 | 14 |
| 4 | 16 | 14 |
Step 6: Calculate New Residuals
New Residual = Actual - New Prediction
| Size | Actual | New Prediction | New Residual |
|---|---|---|---|
| 1 | 10 | 12 | -2 |
| 2 | 12 | 12 | 0 |
| 3 | 14 | 14 | 0 |
| 4 | 16 | 14 | 2 |
The errors became smaller.
Step 7: Train Second Small Tree
Now Tree 2 is trained on new residuals.
Dataset for Tree 2:
| Size | Residual |
|---|---|
| 1 | -2 |
| 2 | 0 |
| 3 | 0 |
| 4 | 2 |
A simple tree again may split at:
Size < 2.5
Left node:
| Size | Residual |
|---|---|
| 1 | -2 |
| 2 | 0 |
Average residual:
(-2 + 0) / 2 = -1
Right node:
| Size | Residual |
|---|---|
| 3 | 0 |
| 4 | 2 |
Average residual:
(0 + 2) / 2 = 1
Tree 2:
Size < 2.5?
/ \
-1 +1
Step 8: Update Prediction Again
Learning rate:
0.5
Formula:
F2(x) = F1(x) + 0.5 × Tree2(x)
| Size | Previous Prediction | Tree 2 Output | Updated Prediction |
|---|---|---|---|
| 1 | 12 | -1 | 12 + 0.5×(-1) = 11.5 |
| 2 | 12 | -1 | 12 + 0.5×(-1) = 11.5 |
| 3 | 14 | 1 | 14 + 0.5×1 = 14.5 |
| 4 | 14 | 1 | 14 + 0.5×1 = 14.5 |
Step 9: Calculate Residuals Again
| Size | Actual | Prediction | Residual |
|---|---|---|---|
| 1 | 10 | 11.5 | -1.5 |
| 2 | 12 | 11.5 | 0.5 |
| 3 | 14 | 14.5 | -0.5 |
| 4 | 16 | 14.5 | 1.5 |
Errors are again reduced.
Step 10: Final Model After Two Trees
Final Prediction =
Initial Prediction
+ Learning Rate × Tree 1 Output
+ Learning Rate × Tree 2 Output
Final Prediction = 13 + 0.5×Tree1 + 0.5×Tree2
Final Predictions
| Size | Actual | Final Prediction |
|---|---|---|
| 1 | 10 | 11.5 |
| 2 | 12 | 11.5 |
| 3 | 14 | 14.5 |
| 4 | 16 | 14.5 |
Prediction for New Data
Suppose:
Size = 4
Initial prediction:
13
Tree 1 output:
+2
Tree 2 output:
+1
Final prediction:
13 + 0.5×2 + 0.5×1
= 13 + 1 + 0.5
= 14.5
So predicted price:
14.5
Why GBRT Works
Each tree focuses only on the remaining error.
First tree reduces large errors
Second tree reduces remaining errors
Third tree reduces even smaller errors
So the model improves step by step.
CART vs M5 vs GBRT
| Algorithm | Main Idea |
|---|---|
| CART | One tree, leaf has average value |
| M5 | One tree, leaf has linear model |
| GBRT | Many trees, each tree corrects previous error |
Summary
GBRT builds many small regression trees sequentially, where each new tree learns the residual errors of the previous model.