RBF Kernel: Example

Problem: XOR Dataset

Point   x1   x2   Class
A       0    0    -1
B       0    1    +1
C       1    0    +1
D       1    1    -1

Visual representation:

(0,1) +1        (1,1) -1


(0,0) -1        (1,0) +1

This is a non-linear classification problem.

No single straight line can separate the +1 points from the -1 points.

So we use an SVM with an RBF kernel.

Step 1: RBF Kernel Formula

The RBF Kernel formula is:

K(xi, xj) = exp(-γ ||xi - xj||²)

Where:

xi, xj = two data points
γ = gamma parameter (γ > 0)
||xi - xj||² = squared Euclidean distance between the two points

For this example, take:

γ = 1

So:

K(xi, xj) = exp(-||xi - xj||²)
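The formula translates almost line for line into code. The following is a minimal Python sketch using only the standard library; the helper name `rbf_kernel` and the representation of points as tuples are illustrative choices, not a library API.

```python
import math


def rbf_kernel(xi, xj, gamma=1.0):
    """Illustrative helper: K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    # Squared Euclidean distance: sum of squared coordinate differences
    sq_dist = sum((u - v) ** 2 for u, v in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


# gamma = 1, as chosen for this example
print(round(rbf_kernel((0, 0), (0, 1)), 4))  # 0.3679
print(round(rbf_kernel((0, 0), (1, 1)), 4))  # 0.1353
```

Passing a different `gamma` changes only the scaling of the squared distance inside the exponential.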

Step 2: Meaning of RBF Kernel

The RBF kernel measures the similarity between two points.

If two points are close:

Distance is small
Kernel value is high

If two points are far:

Distance is large
Kernel value is low

Important values:

exp(0)  = 1
exp(-1) = 0.3679
exp(-2) = 0.1353
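A short check of these values, assuming γ = 1: the kernel equals 1 for identical points and falls off as the squared distance grows. Squared distances 0, 1 and 2 are the only ones that occur between the four XOR points.

```python
import math

gamma = 1.0  # same gamma as in this example

# Kernel value for each squared distance that occurs in the XOR dataset
values = {sq_dist: round(math.exp(-gamma * sq_dist), 4) for sq_dist in (0, 1, 2)}

for sq_dist, k in values.items():
    print(f"||xi - xj||^2 = {sq_dist}  ->  K = {k}")
```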

Step 3: Calculate Squared Distances

Points:

A = (0,0)
B = (0,1)
C = (1,0)
D = (1,1)

Distance A to A

||A - A||² = 0
K(A,A) = exp(0) = 1

Distance A to B

A = (0,0), B = (0,1)
||A - B||² = (0 - 0)² + (0 - 1)²
= 0² + (-1)²
= 1
K(A,B) = exp(-1) = 0.3679

Distance A to C

A = (0,0), C = (1,0)
||A - C||² = (0 - 1)² + (0 - 0)²
= (-1)² + 0²
= 1
K(A,C) = exp(-1) = 0.3679

Distance A to D

A = (0,0), D = (1,1)
||A - D||² = (0 - 1)² + (0 - 1)²
= (-1)² + (-1)²
= 2
K(A,D) = exp(-2) = 0.1353
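The same squared-distance calculation, repeated for A against every point, can be sketched in plain Python (the helper name `squared_distance` is illustrative):

```python
# XOR training points from the table above
points = {"A": (0, 0), "B": (0, 1), "C": (1, 0), "D": (1, 1)}


def squared_distance(p, q):
    """||p - q||^2: sum of squared coordinate differences."""
    return sum((u - v) ** 2 for u, v in zip(p, q))


for name, p in points.items():
    print(f"||A - {name}||^2 = {squared_distance(points['A'], p)}")
```

Changing `points['A']` to another point gives the remaining rows of the kernel matrix.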

Step 4: Kernel Matrix

Using the same method for all pairs:

Kernel   A        B        C        D
A        1        0.3679   0.3679   0.1353
B        0.3679   1        0.1353   0.3679
C        0.3679   0.1353   1        0.3679
D        0.1353   0.3679   0.3679   1

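This matrix shows how similar each point is to every other point. Adjacent corners of the square (one coordinate apart) have similarity 0.3679; opposite corners have 0.1353.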
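Building the whole matrix by looping over every pair of points reproduces the table. A minimal plain-Python sketch, assuming γ = 1 (the names are illustrative):

```python
import math

names = ["A", "B", "C", "D"]
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
gamma = 1.0


def rbf(p, q):
    """RBF kernel exp(-gamma * ||p - q||^2) using the module-level gamma."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(p, q)))


# K[i][j] = K(xi, xj): symmetric, with 1 on the diagonal
K = [[rbf(p, q) for q in X] for p in X]

for name, row in zip(names, K):
    print(name, "  ".join(f"{value:.4f}" for value in row))
```

In scikit-learn, `sklearn.metrics.pairwise.rbf_kernel(X, gamma=1)` computes the same matrix from a NumPy array in one call.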

Step 5: RBF SVM Decision Function

A kernel SVM makes predictions with the decision function:

f(x) = Σ αi yi K(xi, x) + b

Where:

αi = Lagrange multiplier (alpha) learned by the SVM for training point xi
yi = class label of training point xi
K(xi, x) = RBF similarity between training point xi and the new point x
b = bias

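Because the XOR dataset is symmetric, all four αi take the same value α, and b = 0. If all four points are support vectors, each lies exactly on the margin: yi f(xi) = 1. For point A (yA = -1), the kernel matrix gives:

-1 × α[-1 + 0.3679 + 0.3679 - 0.1353] = 0.3995α = 1

So the learned values become:

αA = αB = αC = αD = 1 / 0.3995 ≈ 2.503
b = 0

The resulting α is positive, and Σ αi yi = 0 holds (two labels are +1, two are -1), so all four points are indeed support vectors.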
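The value of α can be checked numerically. This sketch assumes that all αi are equal, b = 0, and the hard-margin condition applies (α stays below the SVM's C); it uses unrounded kernel values instead of the 4-decimal table.

```python
import math

# For point A (y = -1): f(A) = alpha * (-1 + 2e^-1 - e^-2) = -alpha * (1 - e^-1)^2
# The margin condition y_A * f(A) = 1 therefore gives alpha = 1 / (1 - e^-1)^2
alpha = 1.0 / (1.0 - math.exp(-1)) ** 2
print(f"alpha = {alpha:.3f}")  # alpha = 2.503

# Dual constraint: sum of alpha_i * y_i must be 0 (two +1 labels, two -1 labels)
y = [-1, 1, 1, -1]
print(sum(alpha * yi for yi in y))
```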

Step 6: Decision Function for This Dataset

Labels:

A = -1
B = +1
C = +1
D = -1

Decision function:

f(x) = 2.503[-K(A,x) + K(B,x) + K(C,x) - K(D,x)]

Prediction rule:

If f(x) > 0 → Class +1
If f(x) < 0 → Class -1
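The decision function and prediction rule can be sketched directly from these formulas. This minimal plain-Python version hard-codes the values from Step 5 (α = 2.503, b = 0, γ = 1); the helper names are illustrative, and the tie f(x) = 0, which the rule above leaves undefined, is simply assigned to class -1 here.

```python
import math

SUPPORT_VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # A, B, C, D
LABELS = [-1, 1, 1, -1]
ALPHA = 2.503  # shared alpha from Step 5
BIAS = 0.0
GAMMA = 1.0


def rbf(p, q):
    return math.exp(-GAMMA * sum((u - v) ** 2 for u, v in zip(p, q)))


def decision_function(x):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(ALPHA * yi * rbf(xi, x) for xi, yi in zip(SUPPORT_VECTORS, LABELS)) + BIAS


def predict(x):
    """Class +1 if f(x) > 0, otherwise -1 (a tie at f(x) = 0 falls to -1 in this sketch)."""
    return 1 if decision_function(x) > 0 else -1


print(predict((0, 0)), predict((0, 1)))  # -1 1
```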

Step 7: Check Training Point A(0,0)

We need:

K(A,A), K(B,A), K(C,A), K(D,A)

From kernel matrix:

K(A,A) = 1
K(B,A) = 0.3679
K(C,A) = 0.3679
K(D,A) = 0.1353

Substitute:

f(A) = 2.503[-1 + 0.3679 + 0.3679 - 0.1353]
f(A) = 2.503[-0.3995]
f(A) ≈ -1

Prediction:

Class -1

Correct.

Step 8: Check Training Point B(0,1)

From kernel matrix:

K(A,B) = 0.3679
K(B,B) = 1
K(C,B) = 0.1353
K(D,B) = 0.3679

Substitute:

f(B) = 2.503[-0.3679 + 1 + 0.1353 - 0.3679]
f(B) = 2.503[0.3995]
f(B) ≈ +1

Prediction:

Class +1

Correct.

Step 9: Check Training Point C(1,0)

From kernel matrix:

K(A,C) = 0.3679
K(B,C) = 0.1353
K(C,C) = 1
K(D,C) = 0.3679

Substitute:

f(C) = 2.503[-0.3679 + 0.1353 + 1 - 0.3679]
f(C) = 2.503[0.3995]
f(C) ≈ +1

Prediction:

Class +1

Correct.

Step 10: Check Training Point D(1,1)

From kernel matrix:

K(A,D) = 0.1353
K(B,D) = 0.3679
K(C,D) = 0.3679
K(D,D) = 1

Substitute:

f(D) = 2.503[-0.1353 + 0.3679 + 0.3679 - 1]
f(D) = 2.503[-0.3995]
f(D) ≈ -1

Prediction:

Class -1

Correct.
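Steps 7 to 10 repeat one calculation for each training point, so a loop reproduces them. This sketch uses the rounded α = 2.503 from Step 5 and b = 0; because it works with unrounded kernel values, the bracketed sums come out as ±0.3996 rather than ±0.3995.

```python
import math

points = {"A": (0, 0), "B": (0, 1), "C": (1, 0), "D": (1, 1)}
labels = {"A": -1, "B": 1, "C": 1, "D": -1}
alpha = 2.503  # shared alpha from Step 5; b = 0


def rbf(p, q, gamma=1.0):
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(p, q)))


f_values = {}
for name, x in points.items():
    # Bracketed sum -K(A,x) + K(B,x) + K(C,x) - K(D,x), i.e. sum of y_i * K(x_i, x)
    bracket = sum(labels[n] * rbf(p, x) for n, p in points.items())
    f_values[name] = alpha * bracket
    predicted = 1 if f_values[name] > 0 else -1
    print(f"{name}: bracket = {bracket:+.4f}, f = {f_values[name]:+.3f}, "
          f"predicted {predicted:+d}, actual {labels[name]:+d}")
```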

Step 11: Verify SVM Margin Condition

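SVM condition (hard margin):

y × f(x) ≥ 1

Point   y    f(x)   y × f(x)
A       -1   -1     1
B       +1   +1     1
C       +1   +1     1
D       -1   -1     1

(f(x) values are rounded.)

All four points satisfy the condition with equality:

y × f(x) = 1

So every point lies exactly on the margin, and every αi is positive. Therefore:

Support Vectors = A, B, C, D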
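The margin check can also be done numerically. This sketch uses the unrounded α = 1/(1 - e⁻¹)² (about 2.503) with b = 0, so each yi f(xi) should equal 1 up to floating-point error; the 1e-9 tolerance is an arbitrary choice.

```python
import math

names = ["A", "B", "C", "D"]
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]
alpha = 1.0 / (1.0 - math.exp(-1)) ** 2  # unrounded alpha, about 2.503


def rbf(p, q, gamma=1.0):
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(p, q)))


def f(x):
    return sum(alpha * yi * rbf(xi, x) for xi, yi in zip(X, y))  # b = 0


# Functional margin y_i * f(x_i) for every training point
margins = {n: yi * f(xi) for n, xi, yi in zip(names, X, y)}
on_margin = [n for n, m in margins.items() if abs(m - 1.0) < 1e-9]
print(margins)
print("Points on the margin:", on_margin)
```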

Step 12: Predict New Point P(0, 0.8)

New point:

P = (0, 0.8)

Calculate the RBF similarity between P and each support vector.

K(A, P)

A = (0,0), P = (0,0.8)
||A - P||² = (0 - 0)² + (0 - 0.8)²
= 0 + 0.64
= 0.64
K(A,P) = exp(-0.64) = 0.5273

K(B, P)

B = (0,1), P = (0,0.8)
||B - P||² = (0 - 0)² + (1 - 0.8)²
= 0 + 0.04
= 0.04
K(B,P) = exp(-0.04) = 0.9608

K(C, P)

C = (1,0), P = (0,0.8)
||C - P||² = (1 - 0)² + (0 - 0.8)²
= 1 + 0.64
= 1.64
K(C,P) = exp(-1.64) = 0.1940

K(D, P)

D = (1,1), P = (0,0.8)
||D - P||² = (1 - 0)² + (1 - 0.8)²
= 1 + 0.04
= 1.04
K(D,P) = exp(-1.04) = 0.3535

Calculate f(P)

f(P) = 2.503[-K(A,P) + K(B,P) + K(C,P) - K(D,P)]
f(P) = 2.503[-0.5273 + 0.9608 + 0.1940 - 0.3535]
f(P) = 2.503[0.2740]
f(P) ≈ 0.686

Since:

f(P) > 0

Prediction:

Class +1
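The calculation for P can be reproduced in a few lines. This plain-Python sketch assumes the values from Step 5 (α = 2.503, b = 0, γ = 1):

```python
import math

X = [(0, 0), (0, 1), (1, 0), (1, 1)]  # support vectors A, B, C, D
y = [-1, 1, 1, -1]
alpha, bias = 2.503, 0.0


def rbf(p, q, gamma=1.0):
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(p, q)))


P = (0, 0.8)
kernels = [rbf(xi, P) for xi in X]  # K(A,P), K(B,P), K(C,P), K(D,P)
f_P = sum(alpha * yi * k for yi, k in zip(y, kernels)) + bias

print("kernels:", [round(k, 4) for k in kernels])
print(f"f(P) = {f_P:.3f} -> class {1 if f_P > 0 else -1}")
```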

Step 13: Predict New Point Q(0.9, 0.9)

New point:

Q = (0.9, 0.9)

K(A, Q)

||A - Q||² = (0 - 0.9)² + (0 - 0.9)²
= 0.81 + 0.81
= 1.62
K(A,Q) = exp(-1.62) = 0.1979

K(B, Q)

||B - Q||² = (0 - 0.9)² + (1 - 0.9)²
= 0.81 + 0.01
= 0.82
K(B,Q) = exp(-0.82) = 0.4404

K(C, Q)

||C - Q||² = (1 - 0.9)² + (0 - 0.9)²
= 0.01 + 0.81
= 0.82
K(C,Q) = exp(-0.82) = 0.4404

K(D, Q)

||D - Q||² = (1 - 0.9)² + (1 - 0.9)²
= 0.01 + 0.01
= 0.02
K(D,Q) = exp(-0.02) = 0.9802

Calculate f(Q)

f(Q) = 2.503[-K(A,Q) + K(B,Q) + K(C,Q) - K(D,Q)]
f(Q) = 2.503[-0.1979 + 0.4404 + 0.4404 - 0.9802]
f(Q) = 2.503[-0.2973]
f(Q) ≈ -0.744

Since:

f(Q) < 0

Prediction:

Class -1
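The same sketch for Q (again assuming α = 2.503, b = 0, γ = 1) also shows why the result is negative: the term from D, the nearest support vector, outweighs the others.

```python
import math

support = {"A": ((0, 0), -1), "B": ((0, 1), 1), "C": ((1, 0), 1), "D": ((1, 1), -1)}
alpha, bias = 2.503, 0.0


def rbf(p, q, gamma=1.0):
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(p, q)))


Q = (0.9, 0.9)
# Signed contribution alpha * y_i * K(x_i, Q) of each support vector
contributions = {n: alpha * yi * rbf(xi, Q) for n, (xi, yi) in support.items()}
f_Q = sum(contributions.values()) + bias

for n, c in contributions.items():
    print(f"{n}: {c:+.4f}")
print(f"f(Q) = {f_Q:.3f} -> class {1 if f_Q > 0 else -1}")
```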

Step 14: Meaning of Gamma in RBF

Gamma controls how far the influence of each point reaches.

Small Gamma

Wide influence area
Smooth boundary
May underfit

Large Gamma

Small influence area
Very complex boundary
May overfit

So gamma controls the flexibility of the decision boundary.
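A small numeric illustration, assuming two points at squared distance 1 (such as A and B): as γ grows, their similarity collapses toward 0, so each training point influences a smaller neighborhood. The γ values below are arbitrary.

```python
import math

sq_dist = 1.0  # ||A - B||^2 for neighboring XOR points

# Same pair of points, different gamma values
similarities = {gamma: math.exp(-gamma * sq_dist) for gamma in (0.1, 1.0, 10.0)}

for gamma, k in similarities.items():
    print(f"gamma = {gamma:>4}: K(A, B) = {k:.4g}")
```

With γ = 10 the neighbors A and B are almost unrelated, which is why a large γ lets the boundary bend tightly around individual points.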

Python Implementation

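from sklearn.svm import SVC
import numpy as np

# XOR dataset (points A, B, C, D)
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])

y = np.array([-1, 1, 1, -1])

# RBF kernel SVM with gamma = 1; a large C makes the soft-margin SVM
# behave like the hard-margin SVM used in the hand calculation
# (every alpha, about 2.503, is far below C)
model = SVC(kernel="rbf", gamma=1, C=1000)

# Train model
model.fit(X, y)

# Predict training data
training_predictions = model.predict(X)

print("Training Predictions:")
for point, actual, pred in zip(X, y, training_predictions):
    print(f"Point: {point}, Actual: {actual}, Predicted: {pred}")

# Predict new points P and Q
new_points = np.array([
    [0, 0.8],
    [0.9, 0.9]
])

new_predictions = model.predict(new_points)

print("\nNew Point Predictions:")
for point, pred in zip(new_points, new_predictions):
    print(f"Point: {point}, Predicted Class: {pred}")

# Decision values f(x); these should be close to the hand-calculated
# f(P) of about 0.686 and f(Q) of about -0.744
print("\nDecision Function Values:")
print(model.decision_function(new_points))

# Support vectors
print("\nSupport Vectors:")
print(model.support_vectors_)

print("\nSupport Vector Indices:")
print(model.support_)

print("\nNumber of Support Vectors per Class:")
print(model.n_support_)

# alpha_i * y_i for each support vector, and the bias b
print("\nDual Coefficients (alpha_i * y_i):")
print(model.dual_coef_)

print("\nBias b:")
print(model.intercept_)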

Final Result

Dataset:
XOR Dataset

Kernel Used:
RBF Kernel

Gamma:
γ = 1

Decision Function:
f(x) = Σ αi yi K(xi, x) + b

Support Vectors:
A(0,0), B(0,1), C(1,0), D(1,1)

Predictions:
P(0,0.8) → +1
Q(0.9,0.9) → -1

Important Points

  • RBF Kernel is used for non-linear classification.

  • It measures similarity between points.

  • Nearby points have high similarity.

  • Distant points have low similarity.

  • RBF can create curved and complex decision boundaries.

  • Gamma controls the influence range of training points.

  • Small gamma creates smoother boundaries.

  • Large gamma creates more complex boundaries.

  • RBF is one of the most commonly used SVM kernels.
