CART : Example - Machine Learning

CART: Step-by-step Regression Example

CART means Classification and Regression Tree.

Here we do regression CART, because target value is numeric.

Goal: predict house price from house size.

Dataset

House	Size	Price
A	1	10
B	2	12
C	3	14
D	4	30
E	5	32
F	6	35

We have one input:

X = Size

Target:

y = Price

Step 1: Start with all data

Before splitting, CART predicts the average price.

Average:

(10 + 12 + 14 + 30 + 32 + 35) / 6 = 22.17

So root prediction is:

Predict 22.17

But this is not good because small houses and large houses have very different prices.

Step 2: CART tries possible splits

Since sizes are:

1, 2, 3, 4, 5, 6

Possible split points are between values:

1.5, 2.5, 3.5, 4.5, 5.5

So CART checks:

Size < 1.5
Size < 2.5
Size < 3.5
Size < 4.5
Size < 5.5

For each split, it calculates error.

For regression CART, error is usually sum of squared errors.

SSE=∑(y_i−)²

Step 3: Test split Size < 1.5

Left group:

Size	Price
1	10

Mean = 10

Right group:

Size	Price
2	12
3	14
4	30
5	32
6	35

Mean:

(12 + 14 + 30 + 32 + 35) / 5 = 24.6

Left error:

(10 - 10)² = 0

Right error:

(12 - 24.6)² + (14 - 24.6)² + (30 - 24.6)² + (32 - 24.6)² + (35 - 24.6)²
= 462.8

Total error:

462.8

Step 4: Test split Size < 2.5

Left group:

Size	Price
1	10
2	12

Mean:

Right group:

Size	Price
3	14
4	30
5	32
6	35

Mean:

27.75

Left error:

(10 - 11)² + (12 - 11)² = 2

Right error:

(14 - 27.75)² + (30 - 27.75)² + (32 - 27.75)² + (35 - 27.75)²
= 265.25

Total error:

267.25

Step 5: Test split Size < 3.5

Left group:

Size	Price
1	10
2	12
3	14

Mean:

Right group:

Size	Price
4	30
5	32
6	35

Mean:

32.33

Left error:

(10 - 12)² + (12 - 12)² + (14 - 12)²
= 8

Right error:

(30 - 32.33)² + (32 - 32.33)² + (35 - 32.33)²
= 12.67

Total error:

20.67

This is much better.

Step 6: Test split Size < 4.5

Left group:

Size	Price
1	10
2	12
3	14
4	30

Mean:

16.5

Right group:

Size	Price
5	32
6	35

Mean:

33.5

Left error:

(10 - 16.5)² + (12 - 16.5)² + (14 - 16.5)² + (30 - 16.5)²
= 251

Right error:

(32 - 33.5)² + (35 - 33.5)²
= 4.5

Total error:

255.5

Step 7: Test split Size < 5.5

Left group:

Size	Price
1	10
2	12
3	14
4	30
5	32

Mean:

19.6

Right group:

Size	Price
6	35

Mean:

Left error:

458.8

Right error:

Total error:

458.8

Step 8: Compare all splits

Split	Total Error
Size < 1.5	462.8
Size < 2.5	267.25
Size < 3.5	20.67
Size < 4.5	255.5
Size < 5.5	458.8

Best split is:

Size < 3.5

because it gives the smallest error.

Step 9: Build first tree

              Size < 3.5?
              /         \
           Yes           No
       Predict 12     Predict 32.33

This is already a complete CART tree if we stop here.

Step 10: Continue splitting?

CART can continue splitting each side.

Left side has:

Size	Price
1	10
2	12
3	14

Current prediction = 12.

Possible left splits:

Size < 1.5
Size < 2.5

If we split fully, we can get:

Size 1 → 10
Size 2 → 12
Size 3 → 14

That gives zero error.

Right side has:

Size	Price
4	30
5	32
6	35

Current prediction = 32.33.

Possible right splits:

Size < 4.5
Size < 5.5

If split fully:

Size 4 → 30
Size 5 → 32
Size 6 → 35

Again zero error.

Step 11: Fully grown tree

                    Size < 3.5?
                    /         \
                 Yes           No
              Size < 1.5?    Size < 4.5?
              /      \        /       \
            10     Size<2.5? 30      Size<5.5?
                   /    \             /     \
                 12     14           32      35

This tree perfectly memorizes the training data.

But this may be overfitting.

Step 12: Pruned tree

A simpler tree is:

              Size < 3.5?
              /         \
           Yes           No
       Predict 12     Predict 32.33

This is less perfect on training data, but more general.

Step 13: Prediction example

Suppose new house size is:

Size = 5

Start at root:

Is 5 < 3.5?

No.

Go right.

Prediction:

32.33

So CART predicts:

Price = 32.33

CART Algorithm:

CART does this:

1. Start with the complete dataset.
2. Try all possible feature splits.
3. Compute error/impurity for every split.
4. Select the split with minimum error.
5. Divide the dataset into child nodes.
6. Repeat recursively for each child node.
7. Stop when stopping conditions are met.
8. Assign prediction value at leaf nodes.

Important Point

CART prediction is constant inside each region.

For our simple tree:

Size Range	Prediction
Size < 3.5	12
Size ≥ 3.5	32.33

So CART does not predict a smooth line.

It predicts step-like values.

Important Formulas

Sum of Squared Error (SSE)

Mean Squared Error (MSE)

yi = Actual target value

ȳ = Mean (average) target value

ŷ = Predicted target value

n = Total number of samples

Σ = Summation of all values

Advantages of CART

Simple and interpretable
Handles nonlinear relationships
No feature scaling required
Works with numerical and categorical features
Easy visualization

Disadvantages of CART

Overfitting problem
High variance
Sensitive to small data changes
Piecewise constant predictions
Lower accuracy compared to ensemble methods

Applications of CART

House price prediction
Customer churn prediction
Medical diagnosis
Credit risk analysis
Sales forecasting