Polynomial Kernel : Example
Problem: XOR Dataset
Consider the following dataset:
| Point | x1 | x2 | Class |
|---|---|---|---|
| A | 0 | 0 | -1 |
| B | 0 | 1 | +1 |
| C | 1 | 0 | +1 |
| D | 1 | 1 | -1 |
Visual representation:
(0,1) +1 (1,1) -1
(0,0) -1 (1,0) +1
This is a non-linear classification problem.
A straight line cannot separate the two classes because the same class labels are placed diagonally.
So Linear SVM fails here.
Step 1: Why Linear SVM Fails
Linear SVM tries to find a straight-line boundary:
w1x1 + w2x2 + b = 0
But for the XOR dataset:
-1 class points: (0,0), (1,1)
+1 class points: (0,1), (1,0)
The classes are diagonally opposite.
No single straight line can separate them correctly.
So we need a non-linear transformation.
Step 2: Polynomial Kernel Idea
A Polynomial Kernel helps SVM create curved decision boundaries.
Instead of using only:
x1, x2
we create additional polynomial features such as:
x1², x2², x1x2
For this XOR example, the most useful new feature is:
z3 = x1 × x2
So we transform the original 2D data into 3D space:
Original features:
(x1, x2)
Transformed features:
(z1, z2, z3) = (x1, x2, x1x2)
Step 3: Transform the Dataset
| Point | x1 | x2 | x1x2 | Transformed Point | Class |
|---|---|---|---|---|---|
| A | 0 | 0 | 0 | (0,0,0) | -1 |
| B | 0 | 1 | 0 | (0,1,0) | +1 |
| C | 1 | 0 | 0 | (1,0,0) | +1 |
| D | 1 | 1 | 1 | (1,1,1) | -1 |
Now the data is represented in a higher-dimensional feature space.
Step 4: Define a Linear Hyperplane in Transformed Space
In transformed space, SVM tries to find:
w1z1 + w2z2 + w3z3 + b = 0
Let us choose:
w1 = 2
w2 = 2
w3 = -4
b = -1
So the decision function becomes:
f(z) = 2z1 + 2z2 - 4z3 - 1
Since:
z1 = x1
z2 = x2
z3 = x1x2
we can write:
f(x) = 2x1 + 2x2 - 4x1x2 - 1
Step 5: Check Point A(0,0)
Actual class:
y = -1
Substitute in the decision function:
f(x) = 2(0) + 2(0) - 4(0×0) - 1
f(x) = 0 + 0 - 0 - 1
f(x) = -1
Since:
f(x) < 0
Prediction:
Class -1
Correct.
Step 6: Check Point B(0,1)
Actual class:
y = +1
f(x) = 2(0) + 2(1) - 4(0×1) - 1
f(x) = 0 + 2 - 0 - 1
f(x) = 1
Since:
f(x) > 0
Prediction:
Class +1
Correct.
Step 7: Check Point C(1,0)
Actual class:
y = +1
f(x) = 2(1) + 2(0) - 4(1×0) - 1
f(x) = 2 + 0 - 0 - 1
f(x) = 1
Since:
f(x) > 0
Prediction:
Class +1
Correct.
Step 8: Check Point D(1,1)
Actual class:
y = -1
f(x) = 2(1) + 2(1) - 4(1×1) - 1
f(x) = 2 + 2 - 4 - 1
f(x) = -1
Since:
f(x) < 0
Prediction:
Class -1
Correct.
Step 9: Verify SVM Margin Condition
SVM condition is:
y × f(x) ≥ 1
Now check all points:
| Point | Actual y | f(x) | y × f(x) |
|---|---|---|---|
| A | -1 | -1 | 1 |
| B | +1 | +1 | 1 |
| C | +1 | +1 | 1 |
| D | -1 | -1 | 1 |
All points satisfy:
y × f(x) = 1
So all four points lie exactly on the margin.
Therefore:
Support Vectors = A, B, C, D
Step 10: Margin Calculation
Weight vector:
w = (2, 2, -4)
Norm of weight vector:
||w|| = √(2² + 2² + (-4)²)
||w|| = √(4 + 4 + 16)
||w|| = √24
||w|| = 4.899
Margin width formula:
Margin Width = 2 / ||w||
Margin Width = 2 / 4.899
Margin Width = 0.408
So the margin width in transformed space is:
0.408 units
Step 11: Predict New Data Point
Prediction function:
f(x) = 2x1 + 2x2 - 4x1x2 - 1
Decision rule:
If f(x) > 0 → Class +1
If f(x) < 0 → Class -1
New Point P(0, 0.8)
f(x) = 2(0) + 2(0.8) - 4(0×0.8) - 1
f(x) = 0 + 1.6 - 0 - 1
f(x) = 0.6
Since:
f(x) > 0
Prediction:
Class +1
New Point Q(0.9, 0.9)
f(x) = 2(0.9) + 2(0.9) - 4(0.9×0.9) - 1
f(x) = 1.8 + 1.8 - 4(0.81) - 1
f(x) = 3.6 - 3.24 - 1
f(x) = -0.64
Since:
f(x) < 0
Prediction:
Class -1
Step 12: Polynomial Kernel Formula
The Polynomial Kernel formula is:
K(xi, xj) = (xiᵀxj + c)^d
Where:
xi, xj = input data points
c = constant term
d = degree of polynomial
If:
d = 2
then the kernel creates quadratic relationships such as:
x1², x2², x1x2
These additional features help SVM solve non-linear problems.
Final Result
Original Dataset:
XOR dataset
Problem:
Not linearly separable in 2D space
Kernel Used:
Polynomial Kernel
New Feature Used:
x1x2
Decision Function:
f(x) = 2x1 + 2x2 - 4x1x2 - 1
Prediction Rule:
f(x) > 0 → Class +1
f(x) < 0 → Class -1
Support Vectors:
A(0,0), B(0,1), C(1,0), D(1,1)
Margin Width:
0.408 units
Python Implementation
from sklearn.svm import SVC
import numpy as np
# XOR dataset
X = np.array([
[0, 0],
[0, 1],
[1, 0],
[1, 1]
])
# Class labels
y = np.array([-1, 1, 1, -1])
# Polynomial Kernel SVM
model = SVC(kernel="poly", degree=2, coef0=1, C=1)
# Train the model
model.fit(X, y)
# Predict training data
predictions = model.predict(X)
print("Training Predictions:")
for point, actual, pred in zip(X, y, predictions):
print(f"Point: {point}, Actual: {actual}, Predicted: {pred}")
# Predict new points
new_points = np.array([
[0, 0.8],
[0.9, 0.9],
[0.2, 0.1],
[0.8, 0.1]
])
new_predictions = model.predict(new_points)
print("\nNew Point Predictions:")
for point, pred in zip(new_points, new_predictions):
print(f"Point: {point}, Predicted Class: {pred}")
# Support vectors
print("\nSupport Vectors:")
print(model.support_vectors_)
print("\nSupport Vector Indices:")
print(model.support_)
print("\nNumber of Support Vectors per Class:")
print(model.n_support_)
Important Points
-
Polynomial Kernel helps SVM solve non-linear classification problems.
-
It creates additional polynomial features from the original features.
-
XOR data cannot be separated using a straight line in 2D space.
-
After transformation, the data becomes separable in higher-dimensional space.
-
SVM then finds a linear hyperplane in the transformed space.
-
In the original space, this appears as a non-linear decision boundary.
-
Polynomial degree controls the complexity of the decision boundary.
-
High-degree polynomial kernels may cause overfitting.
Keywords
Polynomial Kernel SVM, Non Linear SVM, Kernel Trick, XOR Classification, Support Vector Machine, Polynomial Features, SVM Decision Boundary, Higher Dimensional Space, Kernel Function, Machine Learning Classification