Polynomial Kernel : Example - Machine Learning

Problem: XOR Dataset

Consider the following dataset:

Point	x1	x2	Class
A	0	0	-1
B	0	1	+1
C	1	0	+1
D	1	1	-1

Visual representation:

(0,1)  +1        (1,1)  -1


(0,0)  -1        (1,0)  +1

This is a non-linear classification problem.

A straight line cannot separate the two classes because the same class labels are placed diagonally.

So Linear SVM fails here.

Step 1: Why Linear SVM Fails

Linear SVM tries to find a straight-line boundary:

w1x1 + w2x2 + b = 0

But for the XOR dataset:

-1 class points: (0,0), (1,1)
+1 class points: (0,1), (1,0)

The classes are diagonally opposite.

No single straight line can separate them correctly.

So we need a non-linear transformation.

Step 2: Polynomial Kernel Idea

A Polynomial Kernel helps SVM create curved decision boundaries.

Instead of using only:

x1, x2

we create additional polynomial features such as:

x1², x2², x1x2

For this XOR example, the most useful new feature is:

z3 = x1 × x2

So we transform the original 2D data into 3D space:

Original features:
(x1, x2)

Transformed features:
(z1, z2, z3) = (x1, x2, x1x2)

Step 3: Transform the Dataset

Point	x1	x2	x1x2	Transformed Point	Class
A	0	0	0	(0,0,0)	-1
B	0	1	0	(0,1,0)	+1
C	1	0	0	(1,0,0)	+1
D	1	1	1	(1,1,1)	-1

Now the data is represented in a higher-dimensional feature space.

Step 4: Define a Linear Hyperplane in Transformed Space

In transformed space, SVM tries to find:

w1z1 + w2z2 + w3z3 + b = 0

Let us choose:

w1 = 2
w2 = 2
w3 = -4
b = -1

So the decision function becomes:

f(z) = 2z1 + 2z2 - 4z3 - 1

Since:

z1 = x1
z2 = x2
z3 = x1x2

we can write:

f(x) = 2x1 + 2x2 - 4x1x2 - 1

Step 5: Check Point A(0,0)

Actual class:

y = -1

Substitute in the decision function:

f(x) = 2(0) + 2(0) - 4(0×0) - 1

f(x) = 0 + 0 - 0 - 1

f(x) = -1

Since:

f(x) < 0

Prediction:

Class -1

Correct.

Step 6: Check Point B(0,1)

Actual class:

y = +1

f(x) = 2(0) + 2(1) - 4(0×1) - 1

f(x) = 0 + 2 - 0 - 1

f(x) = 1

Since:

f(x) > 0

Prediction:

Class +1

Correct.

Step 7: Check Point C(1,0)

Actual class:

y = +1

f(x) = 2(1) + 2(0) - 4(1×0) - 1

f(x) = 2 + 0 - 0 - 1

f(x) = 1

Since:

f(x) > 0

Prediction:

Class +1

Correct.

Step 8: Check Point D(1,1)

Actual class:

y = -1

f(x) = 2(1) + 2(1) - 4(1×1) - 1

f(x) = 2 + 2 - 4 - 1

f(x) = -1

Since:

f(x) < 0

Prediction:

Class -1

Correct.

Step 9: Verify SVM Margin Condition

SVM condition is:

y × f(x) ≥ 1

Now check all points:

Point	Actual y	f(x)	y × f(x)
A	-1	-1	1
B	+1	+1	1
C	+1	+1	1
D	-1	-1	1

All points satisfy:

y × f(x) = 1

So all four points lie exactly on the margin.

Therefore:

Support Vectors = A, B, C, D

Step 10: Margin Calculation

Weight vector:

w = (2, 2, -4)

Norm of weight vector:

||w|| = √(2² + 2² + (-4)²)

||w|| = √(4 + 4 + 16)

||w|| = √24

||w|| = 4.899

Margin width formula:

Margin Width = 2 / ||w||

Margin Width = 2 / 4.899

Margin Width = 0.408

So the margin width in transformed space is:

0.408 units

Step 11: Predict New Data Point

Prediction function:

f(x) = 2x1 + 2x2 - 4x1x2 - 1

Decision rule:

If f(x) > 0 → Class +1
If f(x) < 0 → Class -1

New Point P(0, 0.8)

f(x) = 2(0) + 2(0.8) - 4(0×0.8) - 1

f(x) = 0 + 1.6 - 0 - 1

f(x) = 0.6

Since:

f(x) > 0

Prediction:

Class +1

New Point Q(0.9, 0.9)

f(x) = 2(0.9) + 2(0.9) - 4(0.9×0.9) - 1

f(x) = 1.8 + 1.8 - 4(0.81) - 1

f(x) = 3.6 - 3.24 - 1

f(x) = -0.64

Since:

f(x) < 0

Prediction:

Class -1

Step 12: Polynomial Kernel Formula

The Polynomial Kernel formula is:

K(xi, xj) = (xiᵀxj + c)^d

Where:

xi, xj = input data points
c      = constant term
d      = degree of polynomial

If:

d = 2

then the kernel creates quadratic relationships such as:

x1², x2², x1x2

These additional features help SVM solve non-linear problems.

Final Result

Original Dataset:
XOR dataset

Problem:
Not linearly separable in 2D space

Kernel Used:
Polynomial Kernel

New Feature Used:
x1x2

Decision Function:
f(x) = 2x1 + 2x2 - 4x1x2 - 1

Prediction Rule:
f(x) > 0 → Class +1
f(x) < 0 → Class -1

Support Vectors:
A(0,0), B(0,1), C(1,0), D(1,1)

Margin Width:
0.408 units

Python Implementation

from sklearn.svm import SVC
import numpy as np

# XOR dataset
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])

# Class labels
y = np.array([-1, 1, 1, -1])

# Polynomial Kernel SVM
model = SVC(kernel="poly", degree=2, coef0=1, C=1)

# Train the model
model.fit(X, y)

# Predict training data
predictions = model.predict(X)

print("Training Predictions:")
for point, actual, pred in zip(X, y, predictions):
    print(f"Point: {point}, Actual: {actual}, Predicted: {pred}")

# Predict new points
new_points = np.array([
    [0, 0.8],
    [0.9, 0.9],
    [0.2, 0.1],
    [0.8, 0.1]
])

new_predictions = model.predict(new_points)

print("\nNew Point Predictions:")
for point, pred in zip(new_points, new_predictions):
    print(f"Point: {point}, Predicted Class: {pred}")

# Support vectors
print("\nSupport Vectors:")
print(model.support_vectors_)

print("\nSupport Vector Indices:")
print(model.support_)

print("\nNumber of Support Vectors per Class:")
print(model.n_support_)

Important Points

Polynomial Kernel helps SVM solve non-linear classification problems.
It creates additional polynomial features from the original features.
XOR data cannot be separated using a straight line in 2D space.
After transformation, the data becomes separable in higher-dimensional space.
SVM then finds a linear hyperplane in the transformed space.
In the original space, this appears as a non-linear decision boundary.
Polynomial degree controls the complexity of the decision boundary.
High-degree polynomial kernels may cause overfitting.

Keywords

Polynomial Kernel SVM, Non Linear SVM, Kernel Trick, XOR Classification, Support Vector Machine, Polynomial Features, SVM Decision Boundary, Higher Dimensional Space, Kernel Function, Machine Learning Classification