Gradient Boosting - Machine Learning

Introduction to Gradient Boosting

What is Boosting?

Boosting is an ensemble learning technique in machine learning where multiple weak models are combined to create one strong predictive model.

Many weak learners together form a strong learner.

A weak learner is a model that performs only slightly better than random prediction.

Example of a weak learner:

A small decision tree, such as a tree with only one or two splits

Main Idea of Boosting

Boosting works sequentially.

Model 1 makes a prediction
Model 2 corrects the errors of Model 1
Model 3 corrects the errors that remain after Models 1 and 2
...

Each new model focuses on mistakes made by previous models.

Why Boosting Works

Instead of building one large, complex model, boosting improves the prediction gradually, step by step.

Each model learns only the remaining errors.

With each added model, the error on the training data shrinks.

What is Gradient Boosting?

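Gradient Boosting is a boosting technique where each new model is trained to predict the negative gradient of the loss function with respect to the current predictions.

For regression problems using squared error loss, written as 1/2 × (Actual Value - Predicted Value)²:

Negative Gradient = Residual Error

So practically, with squared error loss:

Gradient Boosting learns residuals.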
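The link between gradient and residual can be written out in a few lines. This is a short derivation sketch; the symbols y (actual value), F(x) (current prediction) and L (loss) are notation introduced here for illustration, and the factor 1/2 is a common convention assumed so that the constant cancels.

```latex
\[
\begin{aligned}
L\bigl(y, F(x)\bigr) &= \tfrac{1}{2}\,\bigl(y - F(x)\bigr)^2 \\
\frac{\partial L}{\partial F(x)} &= -\bigl(y - F(x)\bigr) \\
-\frac{\partial L}{\partial F(x)} &= y - F(x) = \text{Residual}
\end{aligned}
\]
```

Without the factor 1/2 the negative gradient is 2 × Residual, which points in the same direction and only changes the scale.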

What is Residual?

Residual means prediction error.

Formula:

Residual = Actual Value - Predicted Value

The residual tells us:

how wrong the prediction is
how much correction is needed

Gradient Boosting Workflow

1. Start with an initial prediction
2. Calculate residuals
3. Train a small model on residuals
4. Add correction to previous prediction
5. Calculate new residuals
6. Repeat the process
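The six steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes squared error loss, a single numeric feature with at least two distinct values, and a one-split tree (a stump) as the small model. The names fit_stump and gradient_boost are hypothetical helpers made up for this sketch.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree by minimizing the squared error."""
    best = None
    points = sorted(set(xs))
    # Try a threshold halfway between each pair of neighbouring x values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    # The stump predicts the mean target on each side of the threshold.
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds, learning_rate):
    # Step 1: start with an initial prediction (the mean of the targets).
    initial = sum(ys) / len(ys)
    preds = [initial] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Steps 2 and 5: calculate residuals of the current predictions.
        residuals = [y - p for y, p in zip(ys, preds)]
        # Step 3: train a small model on the residuals.
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Step 4: add the scaled correction to the previous prediction.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        # Step 6: the loop repeats the process.

    def predict(x):
        return initial + learning_rate * sum(s(x) for s in stumps)

    return predict, preds


xs = [1, 2, 3, 4]
ys = [10, 12, 14, 16]
predict, train_preds = gradient_boost(xs, ys, n_rounds=1, learning_rate=0.5)
print(train_preds)  # [12.0, 12.0, 14.0, 14.0]
```

Raising n_rounds adds more stumps; each one is fitted to whatever residuals the earlier stumps left behind.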

Example:

Dataset:

X	Y
1	10
2	12
3	14
4	16

Step 1: Initial Prediction

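For squared error loss, the initial prediction is usually the mean of the target values, because the mean is the constant that gives the smallest squared error.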

Mean:

(10 + 12 + 14 + 16) / 4

= 52 / 4

= 13

Initial prediction:

X	Actual	Prediction
1	10	13
2	12	13
3	14	13
4	16	13
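A small check of this step, assuming squared error as the measure: the snippet computes the mean and compares its sum of squared errors with two neighbouring constants. The helper name sum_squared_error is made up for this illustration.

```python
ys = [10, 12, 14, 16]

# Initial prediction: the mean of the target values.
initial = sum(ys) / len(ys)


def sum_squared_error(constant):
    """Squared error when every row is predicted as the same constant."""
    return sum((y - constant) ** 2 for y in ys)


print(initial)  # 13.0
# Constants on either side of the mean give a larger error than the mean.
print(sum_squared_error(12), sum_squared_error(initial), sum_squared_error(14))
# 24 20.0 24
```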

Step 2: Calculate Residuals

Formula:

Residual = Actual - Prediction

Residuals:

X	Actual	Prediction	Residual
1	10	13	-3
2	12	13	-1
3	14	13	1
4	16	13	3

These residuals become the target for the next model.
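The residual column can be reproduced in a couple of lines. This is only a sketch of the arithmetic in the table above, using plain Python lists.

```python
actual = [10, 12, 14, 16]
prediction = [13, 13, 13, 13]

# Residual = Actual - Prediction, computed row by row.
residuals = [a - p for a, p in zip(actual, prediction)]

print(residuals)       # [-3, -1, 1, 3]
# Because the prediction is the mean, the residuals cancel out.
print(sum(residuals))  # 0
```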

Step 3: Train New Model on Residuals

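Now a small model learns the residuals.

Suppose the model is a tree with a single split between X = 2 and X = 3. It predicts the mean residual on each side: (-3 + -1) / 2 = -2 and (1 + 3) / 2 = 2.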

X	Correction
1	-2
2	-2
3	2
4	2

This means:

small X values need negative correction
large X values need positive correction
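One way to arrive at these corrections is a tree with a single split that predicts the mean residual on each side. The sketch below assumes the split point 2.5 (halfway between X = 2 and X = 3); that value is an assumption chosen for illustration, not something a library picked.

```python
xs = [1, 2, 3, 4]
residuals = [-3, -1, 1, 3]
split = 2.5  # assumed split point between X = 2 and X = 3

# Group the residuals by which side of the split their X value falls on.
left = [r for x, r in zip(xs, residuals) if x <= split]
right = [r for x, r in zip(xs, residuals) if x > split]

# Each leaf predicts the mean residual of the rows it contains.
left_value = sum(left) / len(left)
right_value = sum(right) / len(right)

corrections = [left_value if x <= split else right_value for x in xs]
print(corrections)  # [-2.0, -2.0, 2.0, 2.0]
```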

Step 4: Update Predictions

Learning rate:

0.5

Update formula:

New Prediction =
Old Prediction + Learning Rate × Correction

Example for X = 4:

13 + 0.5 × 2

= 14

Updated predictions:

X	Updated Prediction
1	12
2	12
3	14
4	14
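The update formula applied to every row at once, as a quick sketch with the same numbers as the tables above:

```python
learning_rate = 0.5
old_prediction = [13, 13, 13, 13]
correction = [-2, -2, 2, 2]

# New Prediction = Old Prediction + Learning Rate * Correction
updated = [p + learning_rate * c for p, c in zip(old_prediction, correction)]

print(updated)  # [12.0, 12.0, 14.0, 14.0]
```

The learning rate scales each correction down, so the model moves only part of the way toward the residuals in a single step.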

Step 5: Calculate New Residuals

New residuals:

X	Actual	Prediction	Residual
1	10	12	-2
2	12	12	0
3	14	14	0
4	16	14	2

Errors are now smaller: the sum of squared residuals has dropped from 20 to 8.
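To put a number on the improvement, the sum of squared residuals before and after the update can be compared. This sketch assumes squared error as the measure; sum_squared_error is a helper name made up here.

```python
actual = [10, 12, 14, 16]
before = [13, 13, 13, 13]  # initial predictions
after = [12, 12, 14, 14]   # predictions after one boosting step


def sum_squared_error(targets, predictions):
    """Sum of squared residuals over all rows."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


print(sum_squared_error(actual, before))  # 20
print(sum_squared_error(actual, after))   # 8
```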

Important Concept

Gradient Boosting does not replace old models.

Instead:

New models are added to previous predictions.

Final prediction becomes:

Initial Prediction
+ Learning Rate × Correction 1
+ Learning Rate × Correction 2
+ Learning Rate × Correction 3
+ ...
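The same sum can be written compactly. The notation is introduced here for illustration only: F_0 is the initial prediction, h_m(x) is the correction from model m, η is the learning rate, and M is the number of models.

```latex
\[
F_M(x) = F_0 + \eta \sum_{m=1}^{M} h_m(x)
\]
```

In the worked example with one model, this gives 13 + 0.5 × 2 = 14 for X = 4.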

Why Decision Trees Are Commonly Used

Gradient Boosting can use many types of weak learners.

Most common weak learner:

Small Regression Trees

When regression trees are used as the weak learners, the resulting model is called Gradient Boosted Regression Trees (GBRT).
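For reference, a library version of boosted regression trees can be tried on the same four-row dataset. This is a hedged sketch that assumes scikit-learn and NumPy are installed. The settings (one tree, depth 1, learning rate 0.5) are chosen here to mirror the worked example, so the printed values can be compared with the updated predictions in Step 4.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# The toy dataset from the example: one feature column X and target Y.
X = np.array([[1], [2], [3], [4]])
y = np.array([10, 12, 14, 16])

# One boosting round with a single-split tree and learning rate 0.5.
model = GradientBoostingRegressor(
    n_estimators=1,     # number of trees (boosting rounds)
    learning_rate=0.5,  # scale applied to each tree's correction
    max_depth=1,        # depth 1 means one split per tree
)
model.fit(X, y)

predictions = model.predict(X)
print(predictions)
```

In practice n_estimators is set much higher and the learning rate lower; the values here are only for comparison with the hand calculation.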

Summary

Gradient Boosting works by:

starting with a simple prediction
calculating errors
training new models on errors
gradually improving prediction

Each new model learns only the remaining mistakes.

This sequential error correction makes Gradient Boosting a very powerful machine learning technique.