CART : Example
CART: Step-by-step Regression Example
CART means Classification and Regression Tree.
Here we do regression CART, because target value is numeric.
Goal: predict house price from house size.
Dataset
| House | Size | Price |
|---|---|---|
| A | 1 | 10 |
| B | 2 | 12 |
| C | 3 | 14 |
| D | 4 | 30 |
| E | 5 | 32 |
| F | 6 | 35 |
We have one input:
X = Size
Target:
y = Price
Step 1: Start with all data
Before splitting, CART predicts the average price.
Average:
(10 + 12 + 14 + 30 + 32 + 35) / 6 = 22.17
So root prediction is:
Predict 22.17
But this is not good because small houses and large houses have very different prices.
Step 2: CART tries possible splits
Since sizes are:
1, 2, 3, 4, 5, 6
Possible split points are between values:
1.5, 2.5, 3.5, 4.5, 5.5
So CART checks:
Size < 1.5
Size < 2.5
Size < 3.5
Size < 4.5
Size < 5.5
For each split, it calculates error.
For regression CART, error is usually sum of squared errors.
SSE=∑(yi−ȳ)2
Step 3: Test split Size < 1.5
Left group:
| Size | Price |
|---|---|
| 1 | 10 |
Mean = 10
Right group:
| Size | Price |
|---|---|
| 2 | 12 |
| 3 | 14 |
| 4 | 30 |
| 5 | 32 |
| 6 | 35 |
Mean:
(12 + 14 + 30 + 32 + 35) / 5 = 24.6
Left error:
(10 - 10)² = 0
Right error:
(12 - 24.6)² + (14 - 24.6)² + (30 - 24.6)² + (32 - 24.6)² + (35 - 24.6)²
= 462.8
Total error:
462.8
Step 4: Test split Size < 2.5
Left group:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
Mean:
11
Right group:
| Size | Price |
|---|---|
| 3 | 14 |
| 4 | 30 |
| 5 | 32 |
| 6 | 35 |
Mean:
27.75
Left error:
(10 - 11)² + (12 - 11)² = 2
Right error:
(14 - 27.75)² + (30 - 27.75)² + (32 - 27.75)² + (35 - 27.75)²
= 265.25
Total error:
267.25
Step 5: Test split Size < 3.5
Left group:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
Mean:
12
Right group:
| Size | Price |
|---|---|
| 4 | 30 |
| 5 | 32 |
| 6 | 35 |
Mean:
32.33
Left error:
(10 - 12)² + (12 - 12)² + (14 - 12)²
= 8
Right error:
(30 - 32.33)² + (32 - 32.33)² + (35 - 32.33)²
= 12.67
Total error:
20.67
This is much better.
Step 6: Test split Size < 4.5
Left group:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
| 4 | 30 |
Mean:
16.5
Right group:
| Size | Price |
|---|---|
| 5 | 32 |
| 6 | 35 |
Mean:
33.5
Left error:
(10 - 16.5)² + (12 - 16.5)² + (14 - 16.5)² + (30 - 16.5)²
= 251
Right error:
(32 - 33.5)² + (35 - 33.5)²
= 4.5
Total error:
255.5
Step 7: Test split Size < 5.5
Left group:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
| 4 | 30 |
| 5 | 32 |
Mean:
19.6
Right group:
| Size | Price |
|---|---|
| 6 | 35 |
Mean:
35
Left error:
458.8
Right error:
0
Total error:
458.8
Step 8: Compare all splits
| Split | Total Error |
|---|---|
| Size < 1.5 | 462.8 |
| Size < 2.5 | 267.25 |
| Size < 3.5 | 20.67 |
| Size < 4.5 | 255.5 |
| Size < 5.5 | 458.8 |
Best split is:
Size < 3.5
because it gives the smallest error.
Step 9: Build first tree
Size < 3.5?
/ \
Yes No
Predict 12 Predict 32.33
This is already a complete CART tree if we stop here.
Step 10: Continue splitting?
CART can continue splitting each side.
Left side has:
| Size | Price |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 14 |
Current prediction = 12.
Possible left splits:
Size < 1.5
Size < 2.5
If we split fully, we can get:
Size 1 → 10
Size 2 → 12
Size 3 → 14
That gives zero error.
Right side has:
| Size | Price |
|---|---|
| 4 | 30 |
| 5 | 32 |
| 6 | 35 |
Current prediction = 32.33.
Possible right splits:
Size < 4.5
Size < 5.5
If split fully:
Size 4 → 30
Size 5 → 32
Size 6 → 35
Again zero error.
Step 11: Fully grown tree
Size < 3.5?
/ \
Yes No
Size < 1.5? Size < 4.5?
/ \ / \
10 Size<2.5? 30 Size<5.5?
/ \ / \
12 14 32 35
This tree perfectly memorizes the training data.
But this may be overfitting.
Step 12: Pruned tree
A simpler tree is:
Size < 3.5?
/ \
Yes No
Predict 12 Predict 32.33
This is less perfect on training data, but more general.
Step 13: Prediction example
Suppose new house size is:
Size = 5
Start at root:
Is 5 < 3.5?
No.
Go right.
Prediction:
32.33
So CART predicts:
Price = 32.33
CART Algorithm:
CART does this:
1. Start with the complete dataset.
2. Try all possible feature splits.
3. Compute error/impurity for every split.
4. Select the split with minimum error.
5. Divide the dataset into child nodes.
6. Repeat recursively for each child node.
7. Stop when stopping conditions are met.
8. Assign prediction value at leaf nodes.
Important Point
CART prediction is constant inside each region.
For our simple tree:
| Size Range | Prediction |
|---|---|
| Size < 3.5 | 12 |
| Size ≥ 3.5 | 32.33 |
So CART does not predict a smooth line.
It predicts step-like values.
Important Formulas
Sum of Squared Error (SSE)
SSE=∑(yi−ȳ)2
Mean Squared Error (MSE)
MSE=1/n * ∑ (yi−ŷ)2
yi = Actual target value
ȳ = Mean (average) target value
ŷ = Predicted target value
n = Total number of samples
Σ = Summation of all values
Advantages of CART
-
Simple and interpretable
-
Handles nonlinear relationships
-
No feature scaling required
-
Works with numerical and categorical features
-
Easy visualization
Disadvantages of CART
-
Overfitting problem
-
High variance
-
Sensitive to small data changes
-
Piecewise constant predictions
-
Lower accuracy compared to ensemble methods
Applications of CART
-
House price prediction
-
Customer churn prediction
-
Medical diagnosis
-
Credit risk analysis
-
Sales forecasting