GBRT : Example - Machine Learning

GBRT stands for Gradient Boosted Regression Trees.

It is used for regression problems, where the target value is numeric.

Examples:

House price prediction
Sales prediction
Temperature prediction
Demand forecasting

Main Idea of GBRT

GBRT does not build one large tree.

Instead, it builds many small trees one by one.

Each new tree tries to correct the mistakes made by the previous trees.

Tree 1 gives initial prediction
Tree 2 corrects Tree 1 mistakes
Tree 3 corrects remaining mistakes
Tree 4 corrects remaining mistakes

Final prediction is the sum of all tree outputs.

GBRT works like this:

Start with a simple prediction
Find the error
Train a small tree to predict the error
Add that tree to the model
Repeat again and again

Important Terms

Boosting = Combining many weak models sequentially

Weak Learner = A small decision tree

Residual = Actual value - Predicted value

Learning Rate = Controls how much each tree contributes

Gradient = Direction of error correction

GBRT Algorithm Steps

1. Start with an initial prediction.
2. Calculate residuals.
3. Train a small regression tree on residuals.
4. Multiply tree prediction by learning rate.
5. Add it to the previous prediction.
6. Calculate new residuals.
7. Repeat for many trees.
8. Final prediction is the sum of all updates.

Formula:

New Prediction =
Old Prediction + ( Learning Rate × Tree Prediction )

For squared error loss:

Residual = Actual Value - Predicted Value

Small Dataset

We want to predict price from size.

Size	Price
1	10
2	12
3	14
4	16

Step 1: Initial Prediction

GBRT starts with the average target value.

Initial Prediction = Mean of Price

Mean = (10 + 12 + 14 + 16) / 4

Mean = 52 / 4

Mean = 13

So:

F0(x) = 13

Every row initially gets prediction 13.

Size	Actual Price	Initial Prediction
1	10	13
2	12	13
3	14	13
4	16	13

Step 2: Calculate Residuals

Residual formula:

Residual = Actual Price - Predicted Price

Size	Actual Price	Prediction	Residual
1	10	13	-3
2	12	13	-1
3	14	13	1
4	16	13	3

Step 3: Train First Small Tree on Residuals

Now GBRT trains a small regression tree using:

Input = Size
Target = Residual

Dataset for Tree 1:

Size	Residual
1	-3
2	-1
3	1
4	3

A simple tree can split at:

Size < 2.5

Left node:

Size	Residual
1	-3
2	-1

Average residual:

(-3 + -1) / 2 = -2

Right node:

Size	Residual
3	1
4	3

Average residual:

(1 + 3) / 2 = 2

So Tree 1 becomes:

             Size < 2.5?
              /       \
           -2          +2

Step 4: Use Learning Rate

Let learning rate be:

Learning Rate = 0.5

Formula:

New Prediction = Old Prediction + Learning Rate × Tree Prediction

Step 5: Update Predictions

Size	Old Prediction	Tree 1 Output	New Prediction
1	13	-2	13 + 0.5×(-2) = 12
2	13	-2	13 + 0.5×(-2) = 12
3	13	2	13 + 0.5×2 = 14
4	13	2	13 + 0.5×2 = 14

New predictions:

Size	Actual	New Prediction
1	10	12
2	12	12
3	14	14
4	16	14

Step 6: Calculate New Residuals

New Residual = Actual - New Prediction

Size	Actual	New Prediction	New Residual
1	10	12	-2
2	12	12	0
3	14	14	0
4	16	14	2

The errors became smaller.

Step 7: Train Second Small Tree

Now Tree 2 is trained on new residuals.

Dataset for Tree 2:

Size	Residual
1	-2
2	0
3	0
4	2

A simple tree again may split at:

Size < 2.5

Left node:

Size	Residual
1	-2
2	0

Average residual:

(-2 + 0) / 2 = -1

Right node:

Size	Residual
3	0
4	2

Average residual:

(0 + 2) / 2 = 1

Tree 2:

             Size < 2.5?
              /       \
           -1          +1

Step 8: Update Prediction Again

Learning rate:

0.5

Formula:

F2(x) = F1(x) + 0.5 × Tree2(x)

Size	Previous Prediction	Tree 2 Output	Updated Prediction
1	12	-1	12 + 0.5×(-1) = 11.5
2	12	-1	12 + 0.5×(-1) = 11.5
3	14	1	14 + 0.5×1 = 14.5
4	14	1	14 + 0.5×1 = 14.5

Step 9: Calculate Residuals Again

Size	Actual	Prediction	Residual
1	10	11.5	-1.5
2	12	11.5	0.5
3	14	14.5	-0.5
4	16	14.5	1.5

Errors are again reduced.

Step 10: Final Model After Two Trees

Final Prediction =
Initial Prediction
+ Learning Rate × Tree 1 Output
+ Learning Rate × Tree 2 Output

Final Prediction = 13 + 0.5×Tree1 + 0.5×Tree2

Final Predictions

Size	Actual	Final Prediction
1	10	11.5
2	12	11.5
3	14	14.5
4	16	14.5

Prediction for New Data

Suppose:

Size = 4

Initial prediction:

Tree 1 output:

+2

Tree 2 output:

+1

Final prediction:

13 + 0.5×2 + 0.5×1

= 13 + 1 + 0.5

= 14.5

So predicted price:

14.5

Why GBRT Works

Each tree focuses only on the remaining error.

First tree reduces large errors
Second tree reduces remaining errors
Third tree reduces even smaller errors

So the model improves step by step.

CART vs M5 vs GBRT

Algorithm	Main Idea
CART	One tree, leaf has average value
M5	One tree, leaf has linear model
GBRT	Many trees, each tree corrects previous error

Summary

GBRT builds many small regression trees sequentially, where each new tree learns the residual errors of the previous model.