RBF Kernel: Example
Problem: XOR Dataset
| Point | x1 | x2 | Class |
|---|---|---|---|
| A | 0 | 0 | -1 |
| B | 0 | 1 | +1 |
| C | 1 | 0 | +1 |
| D | 1 | 1 | -1 |
Visual representation:
(0,1) +1 (1,1) -1
(0,0) -1 (1,0) +1
XOR is a non-linear classification problem.
No single straight line can put B and C on one side and A and D on the other.
So we use an SVM with an RBF kernel.
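That no straight line works can be checked directly. As a short sketch, suppose a linear classifier f(x) = w1·x1 + w2·x2 + b classified all four points correctly (w1, w2 and b are symbols introduced only for this argument). The four points would then require:

```latex
\begin{aligned}
A:&\quad b < 0 \\
B:&\quad w_2 + b > 0 \\
C:&\quad w_1 + b > 0 \\
D:&\quad w_1 + w_2 + b < 0 \\
B + C:&\quad w_1 + w_2 + 2b > 0 \\
A + D:&\quad w_1 + w_2 + 2b < 0
\end{aligned}
```

The last two lines contradict each other, so no such w1, w2 and b exist.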
Step 1: RBF Kernel Formula
The RBF Kernel formula is:
K(xi, xj) = exp(-γ ||xi - xj||²)
Where:
xi, xj = two data points
γ = gamma, a positive parameter that sets how quickly similarity decays with distance
||xi - xj||² = squared distance between two points
For this example, take:
γ = 1
So:
K(xi, xj) = exp(-||xi - xj||²)
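The formula translates almost directly into code. The following is a minimal sketch using only the standard library; the function name `rbf_kernel` and its argument names are chosen here for illustration and are not part of any library.

```python
import math

def rbf_kernel(xi, xj, gamma=1.0):
    """Return exp(-gamma * ||xi - xj||^2) for two points given as sequences."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

# Two XOR points whose squared distance is 1
k_ab = rbf_kernel((0, 0), (0, 1))
print(round(k_ab, 4))  # 0.3679
```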
Step 2: Meaning of RBF Kernel
The RBF kernel measures similarity between two points: it equals 1 for identical points and decays toward 0 as they move apart.
If two points are close:
Distance is small
Kernel value is high
If two points are far:
Distance is large
Kernel value is low
Important values:
exp(0) = 1
exp(-1) = 0.3679
exp(-2) = 0.1353
Step 3: Calculate Squared Distances
Points:
A = (0,0)
B = (0,1)
C = (1,0)
D = (1,1)
Distance A to A
||A - A||² = 0
K(A,A) = exp(0) = 1
Distance A to B
A = (0,0), B = (0,1)
||A - B||² = (0 - 0)² + (0 - 1)²
= 0² + (-1)²
= 1
K(A,B) = exp(-1) = 0.3679
Distance A to C
A = (0,0), C = (1,0)
||A - C||² = (0 - 1)² + (0 - 0)²
= (-1)² + 0²
= 1
K(A,C) = exp(-1) = 0.3679
Distance A to D
A = (0,0), D = (1,1)
||A - D||² = (0 - 1)² + (0 - 1)²
= (-1)² + (-1)²
= 2
K(A,D) = exp(-2) = 0.1353
Step 4: Kernel Matrix
Using the same method for all pairs:
| Kernel | A | B | C | D |
|---|---|---|---|---|
| A | 1 | 0.3679 | 0.3679 | 0.1353 |
| B | 0.3679 | 1 | 0.1353 | 0.3679 |
| C | 0.3679 | 0.1353 | 1 | 0.3679 |
| D | 0.1353 | 0.3679 | 0.3679 | 1 |
This matrix tells how similar every point is to every other point.
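The whole matrix can be reproduced with a few lines of code. This is a sketch with γ = 1 as in the example; the names `points`, `rbf_kernel` and `kernel_matrix` are illustrative choices, not library identifiers.

```python
import math

points = {"A": (0, 0), "B": (0, 1), "C": (1, 0), "D": (1, 1)}
gamma = 1.0

def rbf_kernel(xi, xj, gamma=1.0):
    """Return exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

# kernel_matrix[p][q] holds K(p, q) for every pair of named points
kernel_matrix = {
    p: {q: rbf_kernel(points[p], points[q], gamma) for q in points}
    for p in points
}

# Print one row per point, rounded to four decimals like the table
for p in points:
    row = "  ".join(f"{kernel_matrix[p][q]:.4f}" for q in points)
    print(p, row)
```

The matrix is symmetric with ones on the diagonal, because K(x, x) = exp(0) = 1 and the squared distance does not depend on the order of the two points.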
Step 5: RBF SVM Decision Function
Kernel SVM prediction uses:
f(x) = Σ αi yi K(xi, x) + b
Where:
αi = alpha value learned by SVM
yi = class label of training point
K(xi, x) = RBF similarity between training point and new point
b = bias
For this symmetric XOR dataset, with a hard margin (C large enough that no αi is capped), the learned values become:
αA = αB = αC = αD = 2.503
b = 0
Every αi is positive, so all four points are support vectors.
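The value 2.503 can be recovered by hand. The sketch below assumes a hard margin, a common value α for all four points (by symmetry) and b = 0. Under those assumptions, the margin condition y × f(x) = 1 at point A fixes α:

```latex
\begin{aligned}
y_A f(A) &= \alpha\,\bigl[K(A,A) - K(B,A) - K(C,A) + K(D,A)\bigr] \\
         &= \alpha\,\bigl[1 - 2e^{-1} + e^{-2}\bigr]
          = \alpha\,(1 - e^{-1})^2 = 1 \\
\alpha   &= \frac{1}{(1 - e^{-1})^2} \approx \frac{1}{0.3996} \approx 2.503
\end{aligned}
```

The same equation results at B, C and D. The dual constraint Σ αi yi = 0 also holds, since the labels sum to zero.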
Step 6: Decision Function for This Dataset
Labels:
A = -1
B = +1
C = +1
D = -1
Decision function:
f(x) = 2.503[-K(A,x) + K(B,x) + K(C,x) - K(D,x)]
Prediction rule:
If f(x) > 0 → Class +1
If f(x) < 0 → Class -1
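The decision function and prediction rule can be sketched as follows. The constants come from the text (α = 2.503, b = 0, γ = 1). The names `decision_function` and `predict` are illustrative. The text does not say how to handle f(x) = 0, so assigning that case to class -1 is an arbitrary choice made here.

```python
import math

ALPHA = 2.503  # alpha shared by all four support vectors (from the text)
BIAS = 0.0     # b = 0 for this symmetric dataset

# (support vector, label) pairs for A, B, C, D
support_vectors = [((0, 0), -1), ((0, 1), +1), ((1, 0), +1), ((1, 1), -1)]

def rbf_kernel(xi, xj, gamma=1.0):
    """Return exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

def decision_function(x, gamma=1.0):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    total = sum(ALPHA * y * rbf_kernel(sv, x, gamma) for sv, y in support_vectors)
    return total + BIAS

def predict(x):
    """Class +1 if f(x) > 0, otherwise -1 (f(x) = 0 is sent to -1 by assumption)."""
    return 1 if decision_function(x) > 0 else -1

print(round(decision_function((0, 0)), 3))  # -1.0
print(predict((0, 0)), predict((0, 1)))     # -1 1
```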
Step 7: Check Training Point A(0,0)
We need:
K(A,A), K(B,A), K(C,A), K(D,A)
From kernel matrix:
K(A,A) = 1
K(B,A) = 0.3679
K(C,A) = 0.3679
K(D,A) = 0.1353
Substitute:
f(A) = 2.503[-1 + 0.3679 + 0.3679 - 0.1353]
f(A) = 2.503[-0.3995]
f(A) ≈ -1
Prediction:
Class -1
Correct.
Step 8: Check Training Point B(0,1)
From kernel matrix:
K(A,B) = 0.3679
K(B,B) = 1
K(C,B) = 0.1353
K(D,B) = 0.3679
Substitute:
f(B) = 2.503[-0.3679 + 1 + 0.1353 - 0.3679]
f(B) = 2.503[0.3995]
f(B) ≈ +1
Prediction:
Class +1
Correct.
Step 9: Check Training Point C(1,0)
From kernel matrix:
K(A,C) = 0.3679
K(B,C) = 0.1353
K(C,C) = 1
K(D,C) = 0.3679
Substitute:
f(C) = 2.503[-0.3679 + 0.1353 + 1 - 0.3679]
f(C) = 2.503[0.3995]
f(C) ≈ +1
Prediction:
Class +1
Correct.
Step 10: Check Training Point D(1,1)
From kernel matrix:
K(A,D) = 0.1353
K(B,D) = 0.3679
K(C,D) = 0.3679
K(D,D) = 1
Substitute:
f(D) = 2.503[-0.1353 + 0.3679 + 0.3679 - 1]
f(D) = 2.503[-0.3995]
f(D) ≈ -1
Prediction:
Class -1
Correct.
Step 11: Verify SVM Margin Condition
SVM condition:
y × f(x) ≥ 1
| Point | y | f(x) | y × f(x) |
|---|---|---|---|
| A | -1 | -1 | 1 |
| B | +1 | +1 | 1 |
| C | +1 | +1 | 1 |
| D | -1 | -1 | 1 |
All four points satisfy the condition with equality:
y × f(x) = 1
So every point lies exactly on the margin, and because each αi > 0:
Support Vectors = A, B, C, D
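The margin table can be checked numerically. This sketch reuses the values from the text (α = 2.503, b = 0, γ = 1); the names `training`, `decision_function` and `margins` are illustrative. Because α is rounded to three decimals, the products come out close to 1 rather than exactly 1.

```python
import math

ALPHA = 2.503  # rounded alpha from the text
training = {"A": ((0, 0), -1), "B": ((0, 1), +1), "C": ((1, 0), +1), "D": ((1, 1), -1)}

def rbf_kernel(xi, xj, gamma=1.0):
    """Return exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

def decision_function(x):
    """f(x) with every training point as a support vector and b = 0."""
    return sum(ALPHA * y * rbf_kernel(sv, x) for sv, y in training.values())

# y * f(x) for each training point; a value of 1 means the point is on the margin
margins = {name: y * decision_function(x) for name, (x, y) in training.items()}

for name, margin in margins.items():
    print(name, round(margin, 3))
```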
Step 12: Predict New Point P(0, 0.8)
New point:
P = (0, 0.8)
Calculate RBF similarity with all support vectors.
K(A, P)
A = (0,0), P = (0,0.8)
||A - P||² = (0 - 0)² + (0 - 0.8)²
= 0 + 0.64
= 0.64
K(A,P) = exp(-0.64) = 0.5273
K(B, P)
B = (0,1), P = (0,0.8)
||B - P||² = (0 - 0)² + (1 - 0.8)²
= 0 + 0.04
= 0.04
K(B,P) = exp(-0.04) = 0.9608
K(C, P)
C = (1,0), P = (0,0.8)
||C - P||² = (1 - 0)² + (0 - 0.8)²
= 1 + 0.64
= 1.64
K(C,P) = exp(-1.64) = 0.1940
K(D, P)
D = (1,1), P = (0,0.8)
||D - P||² = (1 - 0)² + (1 - 0.8)²
= 1 + 0.04
= 1.04
K(D,P) = exp(-1.04) = 0.3535
Calculate f(P)
f(P) = 2.503[-K(A,P) + K(B,P) + K(C,P) - K(D,P)]
f(P) = 2.503[-0.5273 + 0.9608 + 0.1940 - 0.3535]
f(P) = 2.503[0.2740]
f(P) = 0.686
Since:
f(P) > 0
Prediction:
Class +1
Step 13: Predict New Point Q(0.9, 0.9)
New point:
Q = (0.9, 0.9)
K(A, Q)
||A - Q||² = (0 - 0.9)² + (0 - 0.9)²
= 0.81 + 0.81
= 1.62
K(A,Q) = exp(-1.62) = 0.1979
K(B, Q)
||B - Q||² = (0 - 0.9)² + (1 - 0.9)²
= 0.81 + 0.01
= 0.82
K(B,Q) = exp(-0.82) = 0.4404
K(C, Q)
||C - Q||² = (1 - 0.9)² + (0 - 0.9)²
= 0.01 + 0.81
= 0.82
K(C,Q) = exp(-0.82) = 0.4404
K(D, Q)
||D - Q||² = (1 - 0.9)² + (1 - 0.9)²
= 0.01 + 0.01
= 0.02
K(D,Q) = exp(-0.02) = 0.9802
Calculate f(Q)
f(Q) = 2.503[-K(A,Q) + K(B,Q) + K(C,Q) - K(D,Q)]
f(Q) = 2.503[-0.1979 + 0.4404 + 0.4404 - 0.9802]
f(Q) = 2.503[-0.2973]
f(Q) = -0.744
Since:
f(Q) < 0
Prediction:
Class -1
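Both new-point predictions can be reproduced in a few lines. This is a sketch under the same assumptions as the hand calculation (α = 2.503, b = 0, γ = 1); `new_points` and `scores` are names chosen for illustration.

```python
import math

ALPHA = 2.503
support_vectors = [((0, 0), -1), ((0, 1), +1), ((1, 0), +1), ((1, 1), -1)]

def rbf_kernel(xi, xj, gamma=1.0):
    """Return exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

def decision_function(x):
    """f(x) = sum_i alpha * y_i * K(x_i, x), with b = 0."""
    return sum(ALPHA * y * rbf_kernel(sv, x) for sv, y in support_vectors)

new_points = {"P": (0, 0.8), "Q": (0.9, 0.9)}
scores = {name: decision_function(x) for name, x in new_points.items()}

for name, score in scores.items():
    label = 1 if score > 0 else -1
    print(f"{name}: f = {score:.3f}, class = {label:+d}")
```

P sits closest to B (class +1) and Q sits closest to D (class -1), so the nearest support vector dominates each sum.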
Step 14: Meaning of Gamma in RBF
Gamma controls how far the influence of each point reaches.
Small Gamma
Wide influence area
Smooth boundary
May underfit
Large Gamma
Small influence area
Very complex boundary
May overfit
So gamma controls the flexibility of the decision boundary.
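The effect of gamma is easy to see on a single pair of neighbouring points such as A and B, whose squared distance is 1. The gamma values below are picked only for illustration, and `neighbour_similarity` is a hypothetical helper name.

```python
import math

SQUARED_DISTANCE = 1.0  # e.g. between A(0,0) and B(0,1)

def neighbour_similarity(gamma):
    """Kernel value between two points at squared distance 1."""
    return math.exp(-gamma * SQUARED_DISTANCE)

# Larger gamma makes the same pair of points look less similar
similarities = {gamma: neighbour_similarity(gamma) for gamma in (0.1, 1.0, 10.0)}

for gamma, value in similarities.items():
    print(f"gamma = {gamma:>4}: K = {value:.5f}")
```

With a small gamma the neighbours still look almost identical, so each point influences a wide area. With a large gamma the kernel value is nearly zero even for adjacent points, so each point only affects its immediate surroundings.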
Python Implementation
from sklearn.svm import SVC
import numpy as np
# XOR dataset
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
y = np.array([-1, 1, 1, -1])
# RBF Kernel SVM
model = SVC(kernel="rbf", gamma=1, C=1000)
# Train model
model.fit(X, y)
# Predict training data
training_predictions = model.predict(X)
print("Training Predictions:")
for point, actual, pred in zip(X, y, training_predictions):
    print(f"Point: {point}, Actual: {actual}, Predicted: {pred}")
# Predict new points
new_points = np.array([
    [0, 0.8],
    [0.9, 0.9]
])
new_predictions = model.predict(new_points)
print("\nNew Point Predictions:")
for point, pred in zip(new_points, new_predictions):
    print(f"Point: {point}, Predicted Class: {pred}")
# Support vectors
print("\nSupport Vectors:")
print(model.support_vectors_)
print("\nSupport Vector Indices:")
print(model.support_)
print("\nNumber of Support Vectors per Class:")
print(model.n_support_)
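To compare the fitted model with the hand calculation, the learned coefficients can be inspected as well. This sketch refits the same model and reads `dual_coef_` (the products αi·yi, listed in the order of `model.support_`), `intercept_` (the bias b) and `decision_function`. The solver is iterative, so the numbers should be close to ±2.503, 0 and ±1 rather than exactly equal.

```python
import numpy as np
from sklearn.svm import SVC

# Same XOR data and hyperparameters as in the implementation above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, 1, -1])
model = SVC(kernel="rbf", gamma=1, C=1000).fit(X, y)

# alpha_i * y_i for each support vector, ordered as in model.support_
print("alpha_i * y_i:", np.round(model.dual_coef_, 3))

# Bias term b of the decision function
print("b:", np.round(model.intercept_, 3))

# f(x) evaluated on the four training points
print("f(x) on training points:", np.round(model.decision_function(X), 3))
```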
Final Result
Dataset:
XOR Dataset
Kernel Used:
RBF Kernel
Gamma:
γ = 1
Decision Function:
f(x) = Σ αi yi K(xi, x) + b
Support Vectors:
A(0,0), B(0,1), C(1,0), D(1,1)
Prediction:
P(0,0.8) → +1
Q(0.9,0.9) → -1
Important Points
- RBF Kernel is used for non-linear classification.
- It measures similarity between points.
- Nearby points have high similarity.
- Faraway points have low similarity.
- RBF can create curved and complex decision boundaries.
- Gamma controls the influence range of training points.
- Small gamma creates smoother boundaries.
- Large gamma creates more complex boundaries.
- RBF is one of the most commonly used SVM kernels.