M5 : Example

M5 Algorithm: Complete Step-by-Step Example

Dataset

We want to predict house price using house size.

House  Size  Price
A      1     10
B      2     12
C      3     14
D      4     30
E      5     32
F      6     35

Input feature:

X = Size

Target variable:

Y = Price

Step 1: Start with all data

All records are placed in the root node.

Root Node = All 6 records

M5 checks whether splitting the data will reduce variation in target values.

Step 2: Find possible split points

The size values are:

1, 2, 3, 4, 5, 6

Possible split points are the midpoints between consecutive sizes:

1.5, 2.5, 3.5, 4.5, 5.5

So M5 checks:

Size < 1.5
Size < 2.5
Size < 3.5
Size < 4.5
Size < 5.5
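The candidate thresholds can also be generated in code. This is a minimal Python sketch, assuming a single numeric feature; `candidate_thresholds` is an illustrative helper name, not part of any M5 library. Duplicate values are dropped so that no threshold falls between two equal sizes.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct values (illustrative helper)."""
    distinct = sorted(set(values))
    # Pair each value with the next larger one and take the midpoint.
    return [(low + high) / 2 for low, high in zip(distinct, distinct[1:])]


sizes = [1, 2, 3, 4, 5, 6]
print(candidate_thresholds(sizes))  # [1.5, 2.5, 3.5, 4.5, 5.5]
```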

Step 3: Calculate parent node standard deviation

Parent prices:

10, 12, 14, 30, 32, 35

Mean:

Mean = (10 + 12 + 14 + 30 + 32 + 35) / 6
Mean = 133 / 6
Mean = 22.17

Variance:

Variance = Sum of (Value - Mean)^2 / n

Price   Price - Mean   (Price - Mean)^2
10      -12.17         148.11
12      -10.17         103.43
14      -8.17          66.75
30      7.83           61.31
32      9.83           96.63
35      12.83          164.61

Sum of Squares = 640.84

Variance = 640.84 / 6
Variance = 106.81

Parent SD = √106.81
Parent SD = 10.34

So:

Parent Standard Deviation = 10.34
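The parent standard deviation can be checked with a short Python sketch. It divides by n (the population formula), as the hand calculation does; the helper name `population_sd` is illustrative, and the standard library's `statistics.pstdev` computes the same quantity. Because the code keeps full precision instead of rounding the mean to 22.17 first, it prints 10.3347, which rounds to 10.33 at two decimals.

```python
import math
import statistics


def population_sd(values):
    """Standard deviation with divisor n, matching the hand calculation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance)


prices = [10, 12, 14, 30, 32, 35]
print(round(population_sd(prices), 4))  # 10.3347
# Cross-check against the standard library's population standard deviation.
print(math.isclose(population_sd(prices), statistics.pstdev(prices)))  # True
```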

Step 4: Formula used for split selection

M5 uses Standard Deviation Reduction, also called SDR.

SDR = Parent SD - Weighted Child SD

Where:

Weighted Child SD =
(Left Samples / Total Samples × Left SD)
+
(Right Samples / Total Samples × Right SD)

M5 chooses the split with the highest SDR.
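The SDR formula translates directly into code. This is a minimal Python sketch, assuming the population standard deviation (divisor n) used throughout this example and a threshold that leaves at least one record on each side; `population_sd` and `sdr` are illustrative names, not a library API.

```python
import math


def population_sd(values):
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def sdr(x, y, threshold):
    """Standard Deviation Reduction for the split x < threshold."""
    left = [yi for xi, yi in zip(x, y) if xi < threshold]
    right = [yi for xi, yi in zip(x, y) if xi >= threshold]
    n = len(y)
    # Each child's SD is weighted by its share of the records.
    weighted_child_sd = (
        len(left) / n * population_sd(left)
        + len(right) / n * population_sd(right)
    )
    return population_sd(y) - weighted_child_sd


sizes = [1, 2, 3, 4, 5, 6]
prices = [10, 12, 14, 30, 32, 35]
print(round(sdr(sizes, prices, 1.5), 2))  # 2.31
```

For Size < 1.5 this gives about 2.31; the hand-computed 2.32 in the next step differs only because of rounding in the hand calculation.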

Step 5: Check split Size < 1.5

Left node:

Size Price
1 10

Right node:

Size Price
2 12
3 14
4 30
5 32
6 35

Left SD:

Only one value, so Left SD = 0

Right mean:

Right Mean = (12 + 14 + 30 + 32 + 35) / 5
Right Mean = 123 / 5
Right Mean = 24.6

Right variance:

[(12-24.6)^2 + (14-24.6)^2 + (30-24.6)^2 + (32-24.6)^2 + (35-24.6)^2] / 5

= (158.76 + 112.36 + 29.16 + 54.76 + 108.16) / 5
= 463.20 / 5
= 92.64

Right SD:

Right SD = √92.64
Right SD = 9.62

Weighted Child SD:

= (1/6 × 0) + (5/6 × 9.62)
= 0 + 8.02
= 8.02

SDR:

SDR = 10.34 - 8.02
SDR = 2.32

Step 6: Check split Size < 2.5

Left node:

Size Price
1 10
2 12

Right node:

Size Price
3 14
4 30
5 32
6 35

Left mean:

Left Mean = (10 + 12) / 2
Left Mean = 11

Left variance:

[(10-11)^2 + (12-11)^2] / 2
= (1 + 1) / 2
= 1

Left SD:

Left SD = √1
Left SD = 1

Right mean:

Right Mean = (14 + 30 + 32 + 35) / 4
Right Mean = 111 / 4
Right Mean = 27.75

Right variance:

[(14-27.75)^2 + (30-27.75)^2 + (32-27.75)^2 + (35-27.75)^2] / 4

= (189.06 + 5.06 + 18.06 + 52.56) / 4
= 264.74 / 4
= 66.19

Right SD:

Right SD = √66.19
Right SD = 8.14

Weighted Child SD:

= (2/6 × 1) + (4/6 × 8.14)
= 0.33 + 5.43
= 5.76

SDR:

SDR = 10.34 - 5.76
SDR = 4.58

Step 7: Check split Size < 3.5

Left node:

Size Price
1 10
2 12
3 14

Right node:

Size Price
4 30
5 32
6 35

Left mean:

Left Mean = (10 + 12 + 14) / 3
Left Mean = 36 / 3
Left Mean = 12

Left variance:

[(10-12)^2 + (12-12)^2 + (14-12)^2] / 3
= (4 + 0 + 4) / 3
= 8 / 3
= 2.67

Left SD:

Left SD = √2.67
Left SD = 1.63

Right mean:

Right Mean = (30 + 32 + 35) / 3
Right Mean = 97 / 3
Right Mean = 32.33

Right variance:

[(30-32.33)^2 + (32-32.33)^2 + (35-32.33)^2] / 3
= (5.43 + 0.11 + 7.13) / 3
= 12.67 / 3
= 4.22

Right SD:

Right SD = √4.22
Right SD = 2.05

Weighted Child SD:

= (3/6 × 1.63) + (3/6 × 2.05)
= 0.815 + 1.025
= 1.84

SDR:

SDR = 10.34 - 1.84
SDR = 8.50

Step 8: Check split Size < 4.5

Left node:

Size Price
1 10
2 12
3 14
4 30

Right node:

Size Price
5 32
6 35

Left mean:

Left Mean = (10 + 12 + 14 + 30) / 4
Left Mean = 66 / 4
Left Mean = 16.5

Left variance:

[(10-16.5)^2 + (12-16.5)^2 + (14-16.5)^2 + (30-16.5)^2] / 4

= (42.25 + 20.25 + 6.25 + 182.25) / 4
= 251 / 4
= 62.75

Left SD:

Left SD = √62.75
Left SD = 7.92

Right mean:

Right Mean = (32 + 35) / 2
Right Mean = 33.5

Right variance:

[(32-33.5)^2 + (35-33.5)^2] / 2
= (2.25 + 2.25) / 2
= 2.25

Right SD:

Right SD = √2.25
Right SD = 1.5

Weighted Child SD:

= (4/6 × 7.92) + (2/6 × 1.5)
= 5.28 + 0.50
= 5.78

SDR:

SDR = 10.34 - 5.78
SDR = 4.56

Step 9: Check split Size < 5.5

Left node:

Size Price
1 10
2 12
3 14
4 30
5 32

Right node:

Size Price
6 35

Left mean:

Left Mean = (10 + 12 + 14 + 30 + 32) / 5
Left Mean = 98 / 5
Left Mean = 19.6

Left variance:

[(10-19.6)^2 + (12-19.6)^2 + (14-19.6)^2 + (30-19.6)^2 + (32-19.6)^2] / 5

= (92.16 + 57.76 + 31.36 + 108.16 + 153.76) / 5
= 443.20 / 5
= 88.64

Left SD:

Left SD = √88.64
Left SD = 9.41

Right SD:

Only one value, so Right SD = 0

Weighted Child SD:

= (5/6 × 9.41) + (1/6 × 0)
= 7.84 + 0
= 7.84

SDR:

SDR = 10.34 - 7.84
SDR = 2.50

Step 10: Compare all splits

Split         SDR
Size < 1.5    2.32
Size < 2.5    4.58
Size < 3.5    8.50
Size < 4.5    4.56
Size < 5.5    2.50

Best split:

Size < 3.5

Reason:

It has the highest SDR value: 8.50

So M5 selects:

Root Split = Size < 3.5
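Steps 5 through 10 amount to an exhaustive search: score every candidate threshold and keep the one with the highest SDR. Below is a minimal Python sketch of that search; the helpers `population_sd`, `sdr` and `best_split` are illustrative names, repeated here so the block runs on its own. Because it does not round intermediate values, the printed SDRs can differ from the table above in the last digit, but the ranking and the chosen split are the same.

```python
import math


def population_sd(values):
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def sdr(x, y, threshold):
    left = [yi for xi, yi in zip(x, y) if xi < threshold]
    right = [yi for xi, yi in zip(x, y) if xi >= threshold]
    n = len(y)
    weighted = len(left) / n * population_sd(left) + len(right) / n * population_sd(right)
    return population_sd(y) - weighted


def best_split(x, y):
    """Try every midpoint threshold and return the one with the highest SDR."""
    distinct = sorted(set(x))
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    scores = {t: sdr(x, y, t) for t in thresholds}
    best = max(scores, key=scores.get)
    return best, scores


sizes = [1, 2, 3, 4, 5, 6]
prices = [10, 12, 14, 30, 32, 35]
best, scores = best_split(sizes, prices)
for t, score in scores.items():
    print(f"Size < {t}: SDR = {score:.2f}")
print(f"Best split: Size < {best}")  # Best split: Size < 3.5
```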

Step 11: Build first tree

         Size < 3.5?
         /          \
      Yes             No
       |              |
   Left Node      Right Node

Left node contains:

Size 1, 2, 3
Prices 10, 12, 14

Right node contains:

Size 4, 5, 6
Prices 30, 32, 35

Step 12: Fit linear model in left leaf

Left data:

Size Price
1 10
2 12
3 14

The price increases by 2 when size increases by 1.

So slope:

Slope = 2

Using equation:

Price = m × Size + c

Use point Size = 1, Price = 10:

10 = 2 × 1 + c
10 = 2 + c
c = 8

Left leaf model:

Price = 2 × Size + 8

Verification:

Size   Calculation   Prediction   Actual Price
1      2×1 + 8       10           10
2      2×2 + 8       12           12
3      2×3 + 8       14           14
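Reading the slope off the table works here because the three points lie exactly on a line. Linear regression models are normally fitted by ordinary least squares, which also handles points that do not line up. This minimal Python sketch of a one-feature least-squares fit (the helper `fit_line` is an illustrative name) recovers the same left-leaf model.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (one feature)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-style and variance-style sums around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1, 2, 3], [10, 12, 14])
print(slope, intercept)  # 2.0 8.0
```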

Step 13: Fit linear model in right leaf

Right data:

Size Price
4 30
5 32
6 35

Approximate slope (from the two end points, Size 4 and Size 6):

Slope = (35 - 30) / (6 - 4)
Slope = 5 / 2
Slope = 2.5

Using equation:

Price = m × Size + c

Use point Size = 4, Price = 30:

30 = 2.5 × 4 + c
30 = 10 + c
c = 20

Right leaf model:

Price = 2.5 × Size + 20

Verification:

Size   Calculation    Prediction   Actual Price
4      2.5×4 + 20     30           30
5      2.5×5 + 20     32.5         32
6      2.5×6 + 20     35           35
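The hand calculation takes the slope from the two end points and forces the line through (4, 30). An ordinary least-squares fit, the usual way a linear regression model is fitted, happens to give the same slope of 2.5 for these points but an intercept of about 19.83, so its predictions are about 29.83, 32.33 and 34.83. Its sum of squared errors on the three points is about 0.17, versus 0.25 for Price = 2.5 × Size + 20. A minimal Python sketch, with the illustrative helpers `fit_line` and `sse` defined so the block runs on its own:

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (one feature)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sse(x, y, slope, intercept):
    """Sum of squared errors of the line on the given points."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))


sizes = [4, 5, 6]
prices = [30, 32, 35]
slope, intercept = fit_line(sizes, prices)
print(slope, round(intercept, 2))                      # 2.5 19.83
print(round(sse(sizes, prices, slope, intercept), 2))  # 0.17 (least squares)
print(round(sse(sizes, prices, 2.5, 20), 2))           # 0.25 (hand-fitted line)
```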

Step 14: Final M5 tree

               Size < 3.5?
             /              \
        Yes                    No
         |                      |
Price = 2×Size + 8    Price = 2.5×Size + 20

Step 15: Prediction example

Suppose new house size is:

Size = 5

Start at root:

Is 5 < 3.5?

Answer:

No

So go to right leaf.

Right leaf model:

Price = 2.5 × Size + 20

Substitute Size = 5:

Price = 2.5 × 5 + 20
Price = 12.5 + 20
Price = 32.5

Final prediction:

Predicted Price = 32.5
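The final tree and the prediction walk-through can be expressed as a small data structure. This is a minimal Python sketch, assuming a single numeric feature; the classes `Leaf` and `Split` are illustrative, not part of any M5 implementation, and the leaf coefficients are the hand-fitted ones from Steps 12 and 13.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    slope: float
    intercept: float

    def predict(self, size: float) -> float:
        # M5-style leaf: evaluate a linear model instead of returning a constant.
        return self.slope * size + self.intercept


@dataclass
class Split:
    threshold: float
    left: "Union[Leaf, Split]"   # followed when size < threshold
    right: "Union[Leaf, Split]"  # followed otherwise

    def predict(self, size: float) -> float:
        branch = self.left if size < self.threshold else self.right
        return branch.predict(size)


tree = Split(threshold=3.5, left=Leaf(2.0, 8.0), right=Leaf(2.5, 20.0))
print(tree.predict(5))  # 32.5
print(tree.predict(2))  # 12.0
```

Because `Split` accepts another `Split` as a child, the same structure extends to deeper trees without changes to `predict`.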

Summary

M5 first splits the data using Standard Deviation Reduction.

Then it fits linear regression models at the leaf nodes.

CART leaf prediction:

One constant value

M5 leaf prediction:

Linear equation

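Because each leaf's prediction changes with Size instead of staying fixed, M5 gives smoother numeric predictions than CART, and often more accurate ones when the target follows a roughly linear trend within each region.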
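To make the contrast concrete, this minimal Python sketch keeps the split at Size < 3.5 and compares CART-style leaves (the mean price of each leaf's training records) with the M5 leaf models from Steps 12 and 13, using the sum of squared errors on the six training houses. The helper names are illustrative. This only measures fit on the training data; it is not evidence of accuracy on new houses.

```python
sizes = [1, 2, 3, 4, 5, 6]
prices = [10, 12, 14, 30, 32, 35]
THRESHOLD = 3.5


def mean(values):
    return sum(values) / len(values)


# CART-style leaves: one constant (the mean price) per leaf.
left_mean = mean([p for s, p in zip(sizes, prices) if s < THRESHOLD])
right_mean = mean([p for s, p in zip(sizes, prices) if s >= THRESHOLD])


def constant_leaf(size):
    return left_mean if size < THRESHOLD else right_mean


def linear_leaf(size):
    # M5 leaf models fitted by hand in Steps 12 and 13.
    return 2 * size + 8 if size < THRESHOLD else 2.5 * size + 20


def sse(model):
    """Sum of squared errors of a model on the training houses."""
    return sum((p - model(s)) ** 2 for s, p in zip(sizes, prices))


print(round(sse(constant_leaf), 2))  # 20.67
print(round(sse(linear_leaf), 2))    # 0.25
```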
