GBRT : Example

GBRT stands for Gradient Boosted Regression Trees.

It is used for regression problems, where the target value is numeric.

Examples:

House price prediction
Sales prediction
Temperature prediction
Demand forecasting

Main Idea of GBRT

GBRT does not build one large tree.

Instead, it builds many small trees one by one.

Each new tree tries to correct the mistakes made by the previous trees.

Tree 1 gives initial prediction
Tree 2 corrects Tree 1 mistakes
Tree 3 corrects remaining mistakes
Tree 4 corrects remaining mistakes

Final prediction is the sum of all tree outputs.

GBRT works like this:

Start with a simple prediction
Find the error
Train a small tree to predict the error
Add that tree to the model
Repeat again and again

Important Terms

Boosting = Combining many weak models sequentially

Weak Learner = A small decision tree

Residual = Actual value - Predicted value

Learning Rate = Controls how much each tree contributes

Gradient = Direction of error correction

GBRT Algorithm Steps

1. Start with an initial prediction.
2. Calculate residuals.
3. Train a small regression tree on residuals.
4. Multiply tree prediction by learning rate.
5. Add it to the previous prediction.
6. Calculate new residuals.
7. Repeat for many trees.
8. Final prediction is the sum of all updates.

Formula:

New Prediction =
Old Prediction + ( Learning Rate × Tree Prediction )

For squared error loss:

Residual = Actual Value - Predicted Value

Small Dataset

We want to predict price from size.

Size Price
1 10
2 12
3 14
4 16

Step 1: Initial Prediction

GBRT starts with the average target value.

Initial Prediction = Mean of Price
Mean = (10 + 12 + 14 + 16) / 4
Mean = 52 / 4
Mean = 13

So:

F0(x) = 13

Every row initially gets prediction 13.

Size Actual Price Initial Prediction
1 10 13
2 12 13
3 14 13
4 16 13

Step 2: Calculate Residuals

Residual formula:

Residual = Actual Price - Predicted Price
Size Actual Price Prediction Residual
1 10 13 -3
2 12 13 -1
3 14 13 1
4 16 13 3

Step 3: Train First Small Tree on Residuals

Now GBRT trains a small regression tree using:

Input = Size
Target = Residual

Dataset for Tree 1:

Size Residual
1 -3
2 -1
3 1
4 3

A simple tree can split at:

Size < 2.5

Left node:

Size Residual
1 -3
2 -1

Average residual:

(-3 + -1) / 2 = -2

Right node:

Size Residual
3 1
4 3

Average residual:

(1 + 3) / 2 = 2

So Tree 1 becomes:

             Size < 2.5?
/ \
-2 +2

Step 4: Use Learning Rate

Let learning rate be:

Learning Rate = 0.5

Formula:

New Prediction = Old Prediction + Learning Rate × Tree Prediction

Step 5: Update Predictions

Size Old Prediction Tree 1 Output New Prediction
1 13 -2 13 + 0.5×(-2) = 12
2 13 -2 13 + 0.5×(-2) = 12
3 13 2 13 + 0.5×2 = 14
4 13 2 13 + 0.5×2 = 14

New predictions:

Size Actual New Prediction
1 10 12
2 12 12
3 14 14
4 16 14

Step 6: Calculate New Residuals

New Residual = Actual - New Prediction
Size Actual New Prediction New Residual
1 10 12 -2
2 12 12 0
3 14 14 0
4 16 14 2

The errors became smaller.

Step 7: Train Second Small Tree

Now Tree 2 is trained on new residuals.

Dataset for Tree 2:

Size Residual
1 -2
2 0
3 0
4 2

A simple tree again may split at:

Size < 2.5

Left node:

Size Residual
1 -2
2 0

Average residual:

(-2 + 0) / 2 = -1

Right node:

Size Residual
3 0
4 2

Average residual:

(0 + 2) / 2 = 1

Tree 2:

             Size < 2.5?
/ \
-1 +1

Step 8: Update Prediction Again

Learning rate:

0.5

Formula:

F2(x) = F1(x) + 0.5 × Tree2(x)
Size Previous Prediction Tree 2 Output Updated Prediction
1 12 -1 12 + 0.5×(-1) = 11.5
2 12 -1 12 + 0.5×(-1) = 11.5
3 14 1 14 + 0.5×1 = 14.5
4 14 1 14 + 0.5×1 = 14.5

Step 9: Calculate Residuals Again

Size Actual Prediction Residual
1 10 11.5 -1.5
2 12 11.5 0.5
3 14 14.5 -0.5
4 16 14.5 1.5

Errors are again reduced.

Step 10: Final Model After Two Trees

Final Prediction =
Initial Prediction
+ Learning Rate × Tree 1 Output
+ Learning Rate × Tree 2 Output
Final Prediction = 13 + 0.5×Tree1 + 0.5×Tree2

Final Predictions

Size Actual Final Prediction
1 10 11.5
2 12 11.5
3 14 14.5
4 16 14.5

Prediction for New Data

Suppose:

Size = 4

Initial prediction:

13

Tree 1 output:

+2

Tree 2 output:

+1

Final prediction:

13 + 0.5×2 + 0.5×1
= 13 + 1 + 0.5
= 14.5

So predicted price:

14.5

Why GBRT Works

Each tree focuses only on the remaining error.

First tree reduces large errors
Second tree reduces remaining errors
Third tree reduces even smaller errors

So the model improves step by step.

CART vs M5 vs GBRT

Algorithm Main Idea
CART One tree, leaf has average value
M5 One tree, leaf has linear model
GBRT Many trees, each tree corrects previous error

Summary

GBRT builds many small regression trees sequentially, where each new tree learns the residual errors of the previous model.

Previous Topic Gradient Boosting Next Topic ML Projects