Example: DTR

Problem Statement

Suppose we want to predict salary based on years of experience.

Dataset

Experience (X) Salary (Y)
1 20
2 25
3 30
4 80

Main Idea of Decision Tree Regression

Decision Tree Regression works by:

Splitting the dataset into smaller regions

The algorithm tries to find:

The split that minimizes prediction error

using:

Mean Squared Error (MSE)

Step 1: Find Possible Split Points

The tree first sorts input values:

1, 2, 3, 4

Then it calculates:

Midpoints between consecutive values

Possible Split Calculations

Between 1 and 2:

(1 + 2) / 2
3 / 2
1.5

Between 2 and 3:

(2 + 3) / 2
5 / 2
2.5

Between 3 and 4:

(3 + 4) / 2
7 / 2
3.5

Final Possible Splits

1.5, 2.5, 3.5

The algorithm will:

  • Try every split

  • Calculate MSE

  • Select the split with minimum error

Step 2: Calculate Initial Mean

Before splitting:

Formula:

Mean = Sum of salaries / Number of samples

Calculation:

(20 + 25 + 30 + 80) / 4
155 / 4
38.75

Initially:

Every prediction = 38.75

Step 3: Calculate Initial MSE

Formula:

MSE = mean((Actual - Predicted)²)

Error Table

Actual Salary Predicted Salary Error Error²
20 38.75 -18.75 351.56
25 38.75 -13.75 189.06
30 38.75 -8.75 76.56
80 38.75 41.25 1701.56

Sum of Squared Errors

351.56 + 189.06 + 76.56 + 1701.56
= 2318.74

Initial MSE

2318.74 / 4
579.68

This error is very high.

So the tree tries splitting.

Step 4: Try Split = 1.5

Split rule:

Experience < 1.5

Left Region

Experience Salary
1 20

Prediction:

20

MSE:

0

Right Region

Experience Salary
2 25
3 30
4 80

Mean:

(25 + 30 + 80) / 3
135 / 3
45

Right Region Error Table

Actual Predicted Error²
25 45 400
30 45 225
80 45 1225

Total Error

400 + 225 + 1225 = 1850

Total MSE

1850 / 4
462.5

Step 5: Try Split = 2.5

Split rule:

Experience < 2.5

Left Region

Salary
20
25

Mean:

(20 + 25) / 2
22.5

Left Error

Actual Predicted Error²
20 22.5 6.25
25 22.5 6.25

Total:

12.5

Right Region

Salary
30
80

Mean:

(30 + 80) / 2
55

Right Error

Actual Predicted Error²
30 55 625
80 55 625

Total:

1250

Total Error

12.5 + 1250
1262.5

Total MSE

1262.5 / 4
315.63

Step 6: Try Split = 3.5

Split rule:

Experience < 3.5

Left Region

Salary
20
25
30

Mean:

(20 + 25 + 30) / 3
75 / 3
25

Left Error

Actual Predicted Error²
20 25 25
25 25 0
30 25 25

Total:

50

Right Region

Salary
80

Prediction:

80

Error:

0

Total Error

50 + 0
50

Total MSE

50 / 4
12.5

Step 7: Compare All Splits

Split MSE
1.5 462.5
2.5 315.63
3.5 12.5

Best Split

The minimum MSE is:

12.5

So the best split becomes:

Experience < 3.5

Final Decision Tree

Experience < 3.5
/ \
Yes No
| |
Predict 25 Predict 80

Prediction Example

Suppose:

Experience = 2

Prediction:

25

Suppose:

Experience = 4

Prediction:

80

Summary

Decision Tree Regression automatically generates possible split points using midpoints between neighboring values. It evaluates every split using Mean Squared Error (MSE) and selects the split that minimizes prediction error. The final tree predicts the average value within each region.

Previous Topic Decision Tree Regression Next Topic ML Projects