CART : Example

CART: Step-by-step Regression Example

CART means Classification and Regression Tree.

Here we do regression CART, because target value is numeric.

Goal: predict house price from house size.

Dataset

House Size Price
A 1 10
B 2 12
C 3 14
D 4 30
E 5 32
F 6 35

We have one input:

X = Size

Target:

y = Price

Step 1: Start with all data

Before splitting, CART predicts the average price.

Average:

(10 + 12 + 14 + 30 + 32 + 35) / 6 = 22.17

So root prediction is:

Predict 22.17

But this is not good because small houses and large houses have very different prices.

Step 2: CART tries possible splits

Since sizes are:

1, 2, 3, 4, 5, 6

Possible split points are between values:

1.5, 2.5, 3.5, 4.5, 5.5

So CART checks:

Size < 1.5
Size < 2.5
Size < 3.5
Size < 4.5
Size < 5.5

For each split, it calculates error.

For regression CART, error is usually sum of squared errors.

SSE=∑(yiȳ)2

Step 3: Test split Size < 1.5

Left group:

Size Price
1 10

Mean = 10

Right group:

Size Price
2 12
3 14
4 30
5 32
6 35

Mean:

(12 + 14 + 30 + 32 + 35) / 5 = 24.6

Left error:

(10 - 10)² = 0

Right error:

(12 - 24.6)² + (14 - 24.6)² + (30 - 24.6)² + (32 - 24.6)² + (35 - 24.6)²
= 462.8

Total error:

462.8

Step 4: Test split Size < 2.5

Left group:

Size Price
1 10
2 12

Mean:

11

Right group:

Size Price
3 14
4 30
5 32
6 35

Mean:

27.75

Left error:

(10 - 11)² + (12 - 11)² = 2

Right error:

(14 - 27.75)² + (30 - 27.75)² + (32 - 27.75)² + (35 - 27.75)²
= 265.25

Total error:

267.25

Step 5: Test split Size < 3.5

Left group:

Size Price
1 10
2 12
3 14

Mean:

12

Right group:

Size Price
4 30
5 32
6 35

Mean:

32.33

Left error:

(10 - 12)² + (12 - 12)² + (14 - 12)²
= 8

Right error:

(30 - 32.33)² + (32 - 32.33)² + (35 - 32.33)²
= 12.67

Total error:

20.67

This is much better.

Step 6: Test split Size < 4.5

Left group:

Size Price
1 10
2 12
3 14
4 30

Mean:

16.5

Right group:

Size Price
5 32
6 35

Mean:

33.5

Left error:

(10 - 16.5)² + (12 - 16.5)² + (14 - 16.5)² + (30 - 16.5)²
= 251

Right error:

(32 - 33.5)² + (35 - 33.5)²
= 4.5

Total error:

255.5

Step 7: Test split Size < 5.5

Left group:

Size Price
1 10
2 12
3 14
4 30
5 32

Mean:

19.6

Right group:

Size Price
6 35

Mean:

35

Left error:

458.8

Right error:

0

Total error:

458.8

Step 8: Compare all splits

Split Total Error
Size < 1.5 462.8
Size < 2.5 267.25
Size < 3.5 20.67
Size < 4.5 255.5
Size < 5.5 458.8

Best split is:

Size < 3.5

because it gives the smallest error.

Step 9: Build first tree

              Size < 3.5?
/ \
Yes No
Predict 12 Predict 32.33

This is already a complete CART tree if we stop here.

Step 10: Continue splitting?

CART can continue splitting each side.

Left side has:

Size Price
1 10
2 12
3 14

Current prediction = 12.

Possible left splits:

Size < 1.5
Size < 2.5

If we split fully, we can get:

Size 1 → 10
Size 2 → 12
Size 3 → 14

That gives zero error.

Right side has:

Size Price
4 30
5 32
6 35

Current prediction = 32.33.

Possible right splits:

Size < 4.5
Size < 5.5

If split fully:

Size 4 → 30
Size 5 → 32
Size 6 → 35

Again zero error.

Step 11: Fully grown tree

                    Size < 3.5?
/ \
Yes No
Size < 1.5? Size < 4.5?
/ \ / \
10 Size<2.5? 30 Size<5.5?
/ \ / \
12 14 32 35

This tree perfectly memorizes the training data.

But this may be overfitting.

Step 12: Pruned tree

A simpler tree is:

              Size < 3.5?
/ \
Yes No
Predict 12 Predict 32.33

This is less perfect on training data, but more general.

Step 13: Prediction example

Suppose new house size is:

Size = 5

Start at root:

Is 5 < 3.5?

No.

Go right.

Prediction:

32.33

So CART predicts:

Price = 32.33

CART Algorithm:

CART does this:

1. Start with the complete dataset.
2. Try all possible feature splits.
3. Compute error/impurity for every split.
4. Select the split with minimum error.
5. Divide the dataset into child nodes.
6. Repeat recursively for each child node.
7. Stop when stopping conditions are met.
8. Assign prediction value at leaf nodes.

Important Point

CART prediction is constant inside each region.

For our simple tree:

Size Range Prediction
Size < 3.5 12
Size ≥ 3.5 32.33

So CART does not predict a smooth line.

It predicts step-like values.

Important Formulas

Sum of Squared Error (SSE)

SSE=∑(yi−ȳ)2

Mean Squared Error (MSE)

MSE=1/n * ∑ (yi−ŷ)2

yi = Actual target value

ȳ = Mean (average) target value

ŷ = Predicted target value

n = Total number of samples

Σ = Summation of all values

Advantages of CART

  • Simple and interpretable

  • Handles nonlinear relationships

  • No feature scaling required

  • Works with numerical and categorical features

  • Easy visualization

Disadvantages of CART

  • Overfitting problem

  • High variance

  • Sensitive to small data changes

  • Piecewise constant predictions

  • Lower accuracy compared to ensemble methods

Applications of CART

  • House price prediction

  • Customer churn prediction

  • Medical diagnosis

  • Credit risk analysis

  • Sales forecasting

Previous Topic Regression Evaluation Metrics Next Topic M5 : Example