Gradient Boosting

Introduction to Gradient Boosting

What is Boosting?

Boosting is an ensemble learning technique in machine learning where multiple weak models are combined to create one strong predictive model.

Many weak learners together form a strong learner.

A weak learner is a model that performs only slightly better than random prediction.

Example of a weak learner:

A small decision tree (for example, a tree with only one or two splits)

Main Idea of Boosting

Boosting works sequentially.

Model 1 makes a prediction
Model 2 corrects the errors of Model 1
Model 3 corrects the errors that remain after Models 1 and 2
...

Each new model focuses on mistakes made by previous models.

Why Boosting Works

Instead of building one large, complex model, boosting improves the prediction gradually, step by step.

Each model learns only the remaining errors.

This reduces prediction error over time.

What is Gradient Boosting?

Gradient Boosting is a boosting technique where each new model is fitted to the negative gradient of the loss function with respect to the current predictions.

For regression problems using squared error loss:

Negative Gradient = Residual Error (up to a constant factor)

So practically:

Gradient Boosting learns residuals.
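
The link between the gradient and the residual can be checked with a short derivation. This is a sketch that assumes the squared error loss is written with a factor of 1/2 (a common convention, not something stated in these notes), with y as the actual value and F(x) as the current prediction:

```latex
\begin{align*}
L\bigl(y, F(x)\bigr) &= \tfrac{1}{2}\,\bigl(y - F(x)\bigr)^2 \\
\frac{\partial L}{\partial F(x)} &= -\bigl(y - F(x)\bigr) \\
-\frac{\partial L}{\partial F(x)} &= y - F(x) = \text{Residual}
\end{align*}
```

So it is the negative gradient that equals the residual. Without the factor of 1/2 the negative gradient is twice the residual, which changes only the scale of the target, not its direction.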

What is Residual?

Residual means prediction error.

Formula:

Residual = Actual Value - Predicted Value

The residual tells us:

  • how wrong the prediction is

  • how much correction is needed

Gradient Boosting Workflow

1. Start with an initial prediction
2. Calculate residuals
3. Train a small model on residuals
4. Add correction to previous prediction
5. Calculate new residuals
6. Repeat the process

Example:

Dataset:

X Y
1 10
2 12
3 14
4 16

Step 1: Initial Prediction

For squared error loss, the initial prediction is usually the mean of the target values.

Mean:

(10 + 12 + 14 + 16) / 4
= 52 / 4
= 13

Initial prediction:

X    Actual    Prediction
1    10        13
2    12        13
3    14        13
4    16        13

Step 2: Calculate Residuals

Formula:

Residual = Actual - Prediction

Residuals:

X    Actual    Prediction    Residual
1    10        13            -3
2    12        13            -1
3    14        13             1
4    16        13             3

These residuals become the target for the next model.
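
Steps 1 and 2 can be reproduced in a few lines of plain Python. This is only a sketch of the arithmetic above; the variable names are chosen for illustration.

```python
# Toy dataset from the example above.
X = [1, 2, 3, 4]
y = [10, 12, 14, 16]

# Step 1: the initial prediction is the mean of the targets.
initial_prediction = sum(y) / len(y)
predictions = [initial_prediction] * len(y)

# Step 2: residual = actual - prediction, one value per row.
residuals = [actual - pred for actual, pred in zip(y, predictions)]

print(initial_prediction)  # 13.0
print(residuals)           # [-3.0, -1.0, 1.0, 3.0]
```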

Step 3: Train New Model on Residuals

Now a small model learns the residuals.

Suppose the model is a tree with a single split between X = 2 and X = 3 that predicts the mean residual on each side:

X Correction
1 -2
2 -2
3 2
4 2

This means:

  • small X values need negative correction

  • large X values need positive correction
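
One way such corrections could arise is from a one-split regression tree (a "stump") fitted to the residuals. The sketch below is a hand-written illustration, not a library implementation: `fit_stump` is a hypothetical helper that tries every midpoint between neighbouring X values and keeps the split with the lowest squared error. It assumes at least two distinct X values.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree by minimizing squared error.

    Returns (threshold, left_value, right_value). Hypothetical helper for
    this example; it assumes xs contains at least two distinct values.
    """
    def mean(values):
        return sum(values) / len(values)

    best = None
    points = sorted(set(xs))
    # Candidate thresholds sit halfway between consecutive distinct X values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        # Sum of squared errors if each side predicts its own mean.
        sse = (sum((t - left_value) ** 2 for t in left)
               + sum((t - right_value) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


X = [1, 2, 3, 4]
residuals = [-3, -1, 1, 3]

threshold, left_value, right_value = fit_stump(X, residuals)
corrections = [left_value if x <= threshold else right_value for x in X]

print(threshold)    # 2.5
print(corrections)  # [-2.0, -2.0, 2.0, 2.0]
```

Each side of the split predicts the mean of its residuals: the mean of -3 and -1 is -2, and the mean of 1 and 3 is 2.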

Step 4: Update Predictions

Learning rate (scales each correction so that every model contributes only a small step):

0.5

Update formula:

New Prediction = Old Prediction + Learning Rate × Correction

Example for X = 4:

13 + 0.5 × 2
= 14

Updated predictions:

X Updated Prediction
1 12
2 12
3 14
4 14
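
The update for all four rows can be checked with a short snippet. It is a minimal sketch that hard-codes the numbers from the tables above.

```python
learning_rate = 0.5
old_predictions = [13, 13, 13, 13]
corrections = [-2, -2, 2, 2]

# New Prediction = Old Prediction + Learning Rate * Correction, row by row.
new_predictions = [
    old + learning_rate * correction
    for old, correction in zip(old_predictions, corrections)
]

print(new_predictions)  # [12.0, 12.0, 14.0, 14.0]
```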

Step 5: Calculate New Residuals

New residuals:

X    Actual    Prediction    Residual
1    10        12            -2
2    12        12             0
3    14        14             0
4    16        14             2

The errors are now smaller: the largest absolute residual has dropped from 3 to 2.
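
The improvement can also be expressed as a single number. The sketch below uses mean squared error as the measure; `mean_squared_error` is a small helper defined here for illustration, not an import from any library.

```python
y = [10, 12, 14, 16]
before = [13, 13, 13, 13]  # predictions after Step 1
after = [12, 12, 14, 14]   # predictions after Step 4


def mean_squared_error(actual, predicted):
    """Average of the squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


print(mean_squared_error(y, before))  # 5.0
print(mean_squared_error(y, after))   # 2.0
```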

Important Concept

Gradient Boosting does not replace old models.

Instead:

New models are added to previous predictions.

Final prediction becomes:

Initial Prediction
+ Learning Rate × Correction 1
+ Learning Rate × Correction 2
+ Learning Rate × Correction 3
+ ...
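
The whole procedure can be sketched as a small additive model in plain Python. This is an illustrative sketch, not a production implementation: `fit_stump`, `fit_boosting`, and `predict` are hypothetical helpers, the weak learner is a one-split tree, and each correction is scaled by the learning rate as in the update formula from Step 4.

```python
def fit_stump(xs, targets):
    """Hypothetical helper: best single-split regression tree by squared error."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((t - left_value) ** 2 for t in left)
               + sum((t - right_value) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def fit_boosting(xs, ys, n_rounds, learning_rate):
    """Return the initial prediction and the list of fitted stumps."""
    initial = sum(ys) / len(ys)
    predictions = [initial] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Each round fits a new stump to the current residuals.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        # Old stumps are kept; the new correction is added on top.
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return initial, stumps


def predict(x, initial, stumps, learning_rate):
    """Initial prediction plus every scaled correction, in order."""
    total = initial
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total


X = [1, 2, 3, 4]
y = [10, 12, 14, 16]
learning_rate = 0.5

# One round reproduces the worked example.
initial, stumps = fit_boosting(X, y, n_rounds=1, learning_rate=learning_rate)
print([predict(x, initial, stumps, learning_rate) for x in X])
# [12.0, 12.0, 14.0, 14.0]

# More rounds keep adding corrections and move the predictions closer to y.
initial, stumps = fit_boosting(X, y, n_rounds=20, learning_rate=learning_rate)
print([round(predict(x, initial, stumps, learning_rate), 2) for x in X])
```

Storing the stumps rather than only the final training predictions is what allows the model to predict for new X values: every stored correction is replayed in order.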

Why Decision Trees Are Commonly Used

Gradient Boosting can use many types of weak learners.

Most common weak learner:

Small Regression Trees

When regression trees are used as the weak learners, the method is called Gradient Boosted Regression Trees (GBRT).

Summary

Gradient Boosting works by:

  • starting with a simple prediction

  • calculating errors

  • training new models on errors

  • gradually improving prediction

Each new model learns only the remaining mistakes.

This sequential error correction makes Gradient Boosting a very powerful machine learning technique.
