M5 Algorithm: Complete Step-by-Step Example
Dataset
We want to predict house price using house size.
| House | Size | Price |
|---|---|---|
| A | 1 | 10 |
| B | 2 | 12 |
| C | 3 | 14 |
| D | 4 | 30 |
| E | 5 | 32 |
| F | 6 | 35 |
Input feature:
X = Size
Target variable:
Y = Price
Step 1: Start with all data
All records are placed in the root node.
Root Node = All 6 records
M5 checks whether splitting the data will reduce variation in target values.
Step 2: Find possible split points
The size values are:
1, 2, 3, 4, 5, 6
Possible split points are the middle values between consecutive sizes:
1.5, 2.5, 3.5, 4.5, 5.5
So M5 checks:
Size < 1.5
Size < 2.5
Size < 3.5
Size < 4.5
Size < 5.5
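The midpoint rule above can be sketched in a few lines of Python. The helper name `candidate_splits` is an assumption made for this illustration and is not part of any M5 library.

```python
# Illustrative sketch: candidate thresholds are the midpoints between
# consecutive distinct feature values, as described in Step 2.
sizes = [1, 2, 3, 4, 5, 6]

def candidate_splits(values):
    """Return midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

splits = candidate_splits(sizes)
print(splits)  # [1.5, 2.5, 3.5, 4.5, 5.5]
```

Sorting and removing duplicates first means the sketch also works when sizes repeat or arrive unordered.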
Step 3: Calculate parent node standard deviation
Parent prices:
10, 12, 14, 30, 32, 35
Mean:
Mean = (10 + 12 + 14 + 30 + 32 + 35) / 6
Mean = 133 / 6
Mean = 22.17
Variance:
Variance = Sum of (Value - Mean)^2 / n
| Price | Price - Mean | Square |
|---|---|---|
| 10 | -12.17 | 148.11 |
| 12 | -10.17 | 103.43 |
| 14 | -8.17 | 66.75 |
| 30 | 7.83 | 61.31 |
| 32 | 9.83 | 96.63 |
| 35 | 12.83 | 164.61 |
Sum of Squares = 640.84
Variance = 640.84 / 6
Variance = 106.81
Parent SD = √106.81
Parent SD = 10.34
So:
Parent Standard Deviation = 10.34
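As a cross-check, the same population standard deviation (dividing by n, as in the formula above) can be computed without intermediate rounding. This is only a sketch, and `population_sd` is a name chosen for this example. Because the hand calculation rounds at each step, the unrounded result may differ from it by about 0.01.

```python
import math

prices = [10, 12, 14, 30, 32, 35]

def population_sd(values):
    """Population standard deviation: sqrt(sum((v - mean)^2) / n)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

parent_sd = population_sd(prices)
print(f"Parent SD = {parent_sd:.4f}")
```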
Step 4: Formula used for split selection
M5 uses Standard Deviation Reduction, also called SDR.
SDR = Parent SD - Weighted Child SD
Where:
Weighted Child SD = (Left Samples / Total Samples) × Left SD + (Right Samples / Total Samples) × Right SD
M5 chooses the split with the highest SDR.
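The SDR formula can be written as a small function. This is a minimal sketch under the assumptions of this example (one split into two children, population standard deviation). The names `population_sd` and `sdr` are illustrative only.

```python
import math

def population_sd(values):
    """Population standard deviation (divide by n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def sdr(parent, left, right):
    """SDR = Parent SD - weighted average of the child SDs."""
    n = len(parent)
    weighted_child_sd = (len(left) / n) * population_sd(left) \
                      + (len(right) / n) * population_sd(right)
    return population_sd(parent) - weighted_child_sd

prices = [10, 12, 14, 30, 32, 35]
# Example: the first record on the left, the remaining five on the right.
example_sdr = sdr(prices, prices[:1], prices[1:])
print(f"SDR = {example_sdr:.2f}")
```

A larger SDR means the children are, on average, more homogeneous than the parent.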
Step 5: Check split Size < 1.5
Left node:
| Size | Price |
|---|---|
| 1 | 10 |
Right node:
| Size | Price |
|---|---|
| 2 | 12 |
| 3 | 14 |
| 4 | 30 |
| 5 | 32 |
| 6 | 35 |
Left SD:
Only one value, so Left SD = 0
Right mean:
Right Mean = (12 + 14 + 30 + 32 + 35) / 5
Right Mean = 123 / 5
Right Mean = 24.6
Right variance:
[(12-24.6)^2 + (14-24.6)^2 + (30-24.6)^2 + (32-24.6)^2 + (35-24.6)^2] / 5
= (158.76 + 112.36 + 29.16 + 54.76 + 108.16) / 5
= 463.20 / 5
= 92.64
Right SD:
Right SD = √92.64
Right SD = 9.62
Weighted Child SD:
= (1/6 × 0) + (5/6 × 9.62)
= 0 + 8.02
= 8.02
SDR:
SDR = 10.34 - 8.02
SDR = 2.32
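The "only one value, so SD = 0" rule holds because this example uses the population standard deviation. The sketch below uses Python's standard `statistics` module to show that `pstdev` returns 0 for a single value, while the sample version `stdev` is undefined for it.

```python
import statistics

left = [10]
right = [12, 14, 30, 32, 35]

# Population SD (divide by n) is defined for a single value and equals 0.
left_sd = statistics.pstdev(left)
right_sd = statistics.pstdev(right)

# Sample SD (divide by n - 1) needs at least two values.
try:
    statistics.stdev(left)
    sample_sd_defined = True
except statistics.StatisticsError:
    sample_sd_defined = False

print(left_sd, round(right_sd, 2), sample_sd_defined)
```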
Step 6: Check split Size < 2.5
Left node:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
Right node:
| Size | Price |
|---|---|
| 3 | 14 |
| 4 | 30 |
| 5 | 32 |
| 6 | 35 |
Left mean:
Left Mean = (10 + 12) / 2
Left Mean = 11
Left variance:
[(10-11)^2 + (12-11)^2] / 2
= (1 + 1) / 2
= 1
Left SD:
Left SD = √1
Left SD = 1
Right mean:
Right Mean = (14 + 30 + 32 + 35) / 4
Right Mean = 111 / 4
Right Mean = 27.75
Right variance:
[(14-27.75)^2 + (30-27.75)^2 + (32-27.75)^2 + (35-27.75)^2] / 4
= (189.06 + 5.06 + 18.06 + 52.56) / 4
= 264.74 / 4
= 66.19
Right SD:
Right SD = √66.19
Right SD = 8.14
Weighted Child SD:
= (2/6 × 1) + (4/6 × 8.14)
= 0.33 + 5.43
= 5.76
SDR:
SDR = 10.34 - 5.76
SDR = 4.58
Step 7: Check split Size < 3.5
Left node:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
Right node:
| Size | Price |
|---|---|
| 4 | 30 |
| 5 | 32 |
| 6 | 35 |
Left mean:
Left Mean = (10 + 12 + 14) / 3
Left Mean = 36 / 3
Left Mean = 12
Left variance:
[(10-12)^2 + (12-12)^2 + (14-12)^2] / 3
= (4 + 0 + 4) / 3
= 8 / 3
= 2.67
Left SD:
Left SD = √2.67
Left SD = 1.63
Right mean:
Right Mean = (30 + 32 + 35) / 3
Right Mean = 97 / 3
Right Mean = 32.33
Right variance:
[(30-32.33)^2 + (32-32.33)^2 + (35-32.33)^2] / 3
= (5.43 + 0.11 + 7.13) / 3
= 12.67 / 3
= 4.22
Right SD:
Right SD = √4.22
Right SD = 2.05
Weighted Child SD:
= (3/6 × 1.63) + (3/6 × 2.05)
= 0.815 + 1.025
= 1.84
SDR:
SDR = 10.34 - 1.84
SDR = 8.50
Step 8: Check split Size < 4.5
Left node:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
| 4 | 30 |
Right node:
| Size | Price |
|---|---|
| 5 | 32 |
| 6 | 35 |
Left mean:
Left Mean = (10 + 12 + 14 + 30) / 4
Left Mean = 66 / 4
Left Mean = 16.5
Left variance:
[(10-16.5)^2 + (12-16.5)^2 + (14-16.5)^2 + (30-16.5)^2] / 4
= (42.25 + 20.25 + 6.25 + 182.25) / 4
= 251 / 4
= 62.75
Left SD:
Left SD = √62.75
Left SD = 7.92
Right mean:
Right Mean = (32 + 35) / 2
Right Mean = 33.5
Right variance:
[(32-33.5)^2 + (35-33.5)^2] / 2
= (2.25 + 2.25) / 2
= 2.25
Right SD:
Right SD = √2.25
Right SD = 1.5
Weighted Child SD:
= (4/6 × 7.92) + (2/6 × 1.5)
= 5.28 + 0.50
= 5.78
SDR:
SDR = 10.34 - 5.78
SDR = 4.56
Step 9: Check split Size < 5.5
Left node:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
| 4 | 30 |
| 5 | 32 |
Right node:
| Size | Price |
|---|---|
| 6 | 35 |
Left mean:
Left Mean = (10 + 12 + 14 + 30 + 32) / 5
Left Mean = 98 / 5
Left Mean = 19.6
Left variance:
[(10-19.6)^2 + (12-19.6)^2 + (14-19.6)^2 + (30-19.6)^2 + (32-19.6)^2] / 5
= (92.16 + 57.76 + 31.36 + 108.16 + 153.76) / 5
= 443.20 / 5
= 88.64
Left SD:
Left SD = √88.64
Left SD = 9.41
Right SD:
Only one value, so Right SD = 0
Weighted Child SD:
= (5/6 × 9.41) + (1/6 × 0)
= 7.84 + 0
= 7.84
SDR:
SDR = 10.34 - 7.84
SDR = 2.50
Step 10: Compare all splits
| Split | SDR |
|---|---|
| Size < 1.5 | 2.32 |
| Size < 2.5 | 4.58 |
| Size < 3.5 | 8.50 |
| Size < 4.5 | 4.56 |
| Size < 5.5 | 2.50 |
Best split:
Size < 3.5
Reason:
It has the highest SDR value: 8.50
So M5 selects:
Root Split = Size < 3.5
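Steps 5 to 10 can be reproduced with one loop over the candidate thresholds. This is a sketch only, using `statistics.pstdev` from the standard library. The function name `sdr_for_threshold` is an assumption of this example. Unrounded SDR values may differ from the hand-computed table by about 0.01, but the ranking of the splits is the same.

```python
import statistics

sizes = [1, 2, 3, 4, 5, 6]
prices = [10, 12, 14, 30, 32, 35]

def sdr_for_threshold(threshold):
    """SDR of the split 'Size < threshold' on the example data."""
    left = [p for s, p in zip(sizes, prices) if s < threshold]
    right = [p for s, p in zip(sizes, prices) if s >= threshold]
    n = len(prices)
    weighted = (len(left) / n) * statistics.pstdev(left) \
             + (len(right) / n) * statistics.pstdev(right)
    return statistics.pstdev(prices) - weighted

thresholds = [1.5, 2.5, 3.5, 4.5, 5.5]
scores = {t: sdr_for_threshold(t) for t in thresholds}
best_threshold = max(scores, key=scores.get)

for t, score in scores.items():
    print(f"Size < {t}: SDR = {score:.2f}")
print("Best split: Size <", best_threshold)
```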
Step 11: Build first tree
            Size < 3.5?
            /         \
          Yes          No
       Left Node    Right Node
Left node contains:
Size 1, 2, 3
Prices 10, 12, 14
Right node contains:
Size 4, 5, 6
Prices 30, 32, 35
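Applying the chosen split to the records can be sketched as a simple partition. The `partition` helper and the (size, price) tuple layout are assumptions made for this illustration.

```python
# Each record is a (size, price) pair from the dataset table.
records = [(1, 10), (2, 12), (3, 14), (4, 30), (5, 32), (6, 35)]

def partition(rows, threshold):
    """Split rows into (left, right) using the test 'size < threshold'."""
    left = [row for row in rows if row[0] < threshold]
    right = [row for row in rows if row[0] >= threshold]
    return left, right

left_node, right_node = partition(records, 3.5)
print("Left: ", left_node)
print("Right:", right_node)
```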
Step 12: Fit linear model in left leaf
Left data:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
The price increases by 2 when size increases by 1.
So slope:
Slope = 2
Using equation:
Price = m × Size + c
Use point Size = 1, Price = 10:
10 = 2 × 1 + c
10 = 2 + c
c = 8
Left leaf model:
Price = 2 × Size + 8
Verification:
| Size | Calculation | Prediction |
|---|---|---|
| 1 | 2×1 + 8 | 10 |
| 2 | 2×2 + 8 | 12 |
| 3 | 2×3 + 8 | 14 |
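The left leaf model can also be obtained with an ordinary least-squares fit, which is the usual way to fit a line to more than two points. The sketch below uses the closed-form formulas for one feature. `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys): returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3], [10, 12, 14])
print(f"Price = {slope} × Size + {intercept}")
```

Because the three left-leaf points lie exactly on a line, the fit reproduces the slope of 2 and intercept of 8 found by hand.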
Step 13: Fit linear model in right leaf
Right data:
| Size | Price |
|---|---|
| 4 | 30 |
| 5 | 32 |
| 6 | 35 |
Approximate slope, taken from the first and last points:
Slope = (35 - 30) / (6 - 4)
Slope = 5 / 2
Slope = 2.5
Using equation:
Price = m × Size + c
Use point Size = 4, Price = 30:
30 = 2.5 × 4 + c
30 = 10 + c
c = 20
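Right leaf model (an approximation; a least-squares fit to these three points would give Price = 2.5 × Size + 19.83):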
Price = 2.5 × Size + 20
Verification:
| Size | Calculation | Prediction |
|---|---|---|
| 4 | 2.5×4 + 20 | 30 |
| 5 | 2.5×5 + 20 | 32.5 |
| 6 | 2.5×6 + 20 | 35 |
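The right-leaf points do not lie exactly on one line, so the endpoint-based model above is an approximation. The sketch below compares it with a least-squares fit on the same three points. The variable names and the `sse` (sum of squared errors) helper are assumptions of this example.

```python
xs = [4, 5, 6]
ys = [30, 32, 35]

def sse(slope, intercept):
    """Sum of squared errors of the line on the right-leaf data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Closed-form least-squares fit for a single feature.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
ls_slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
         / sum((x - mean_x) ** 2 for x in xs)
ls_intercept = mean_y - ls_slope * mean_x

endpoint_sse = sse(2.5, 20)            # model from the worked example
ls_sse = sse(ls_slope, ls_intercept)   # least-squares model

print(f"Least squares: Price = {ls_slope:.2f} × Size + {ls_intercept:.2f}")
print(f"SSE endpoint line = {endpoint_sse:.3f}, SSE least squares = {ls_sse:.3f}")
```

Both lines have the same slope here. Only the intercept shifts slightly, which spreads the error over all three points instead of placing it all on Size = 5.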
Step 14: Final M5 tree
                    Size < 3.5?
                   /           \
                 Yes            No
    Price = 2×Size + 8    Price = 2.5×Size + 20
Step 15: Prediction example
Suppose the new house size is:
Size = 5
Start at root:
Is 5 < 3.5?
Answer:
No
So go to right leaf.
Right leaf model:
Price = 2.5 × Size + 20
Substitute Size = 5:
Price = 2.5 × 5 + 20
Price = 12.5 + 20
Price = 32.5
Final prediction:
Predicted Price = 32.5
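The prediction walk-through corresponds to a function with one test at the root and one linear model per leaf. This is a minimal sketch of the tree built in this example, not a general M5 implementation. The name `predict` is illustrative.

```python
def predict(size):
    """Route the input through the root test, then apply the leaf model."""
    if size < 3.5:
        return 2 * size + 8       # left leaf model
    return 2.5 * size + 20        # right leaf model

print(predict(5))  # 32.5
```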
Summary
M5 first splits the data using Standard Deviation Reduction.
Then it fits linear regression models at the leaf nodes.
CART leaf prediction:
One constant value
M5 leaf prediction:
Linear equation
That is why M5 can give smoother and, when the target varies roughly linearly within a leaf, more accurate numeric predictions than a tree with constant leaves.
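The difference between constant leaves and linear leaves can be illustrated on the six training records. The sketch below assumes the same split (Size < 3.5) for both trees. The constant tree predicts the leaf mean and the model tree uses the leaf equations from this example. It measures training error only, so it says nothing about performance on unseen data.

```python
sizes = [1, 2, 3, 4, 5, 6]
prices = [10, 12, 14, 30, 32, 35]

def constant_tree(size):
    """Constant-leaf tree: predict the mean price of the leaf."""
    return 12.0 if size < 3.5 else 97 / 3

def model_tree(size):
    """Model tree from this example: linear equation in each leaf."""
    return 2 * size + 8 if size < 3.5 else 2.5 * size + 20

def sse(model):
    """Sum of squared errors of a model on the training records."""
    return sum((p - model(s)) ** 2 for s, p in zip(sizes, prices))

constant_sse = sse(constant_tree)
model_sse = sse(model_tree)
print(f"Constant leaves SSE = {constant_sse:.2f}")
print(f"Linear leaves SSE   = {model_sse:.2f}")
```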