Support Vector Machines (SVMs) are powerful tools in machine learning for classifying data and predicting values. They are popular in fields like bioinformatics and financial forecasting because they handle complex data problems well. This article aims to explain SVMs in a simple way, covering topics like the maximal margin classifier, support vectors, the kernel trick, and infinite-dimensional mapping.
What is an SVM?
At its core, an SVM performs classification by finding the hyperplane that best divides a dataset into classes. The unique aspect of SVMs lies in their ability to find the optimal hyperplane that maximizes the margin, the distance between the hyperplane and the nearest data points from each class. A larger margin helps the SVM make better predictions on new data because the decision boundary is much clearer. The SVM does this with the help of support vectors.
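As a quick illustration, here is a minimal sketch using scikit-learn's SVC on a tiny made-up 2-D dataset (the data values are assumptions for illustration); fitting a linear SVM exposes both the hyperplane parameters and the support vectors discussed below:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable 2-D dataset (values made up for illustration)
X = np.array([[2, 2], [3, 3], [3, 1], [-2, -2], [-3, -3], [-1, -3]])
y = np.array([1, 1, 1, -1, -1, -1])

# Linear SVM; a very large C approximates a hard margin (no slack allowed)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.coef_, clf.intercept_)  # hyperplane parameters (w and the offset)
print(clf.support_vectors_)       # the points that define the margin
```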
Support Vectors
Support vectors are the subset of training data points that are directly involved in constructing the optimal separating hyperplane. These points are crucial because they lie on the edge of the margin and influence the position and orientation of the hyperplane.
Mathematically, if you have a dataset with feature vectors x_i and class labels y_i ∈ {-1, +1}, the support vectors satisfy the condition:

y_i(w · x_i - b) = 1
Where does this equation come from?
The hyperplane in d-dimensional space is defined by the equation:
w · x - b = 0
where:
- w is the normal vector (perpendicular to the hyperplane).
- b is the bias term (offset from the origin).
For a given data point x_i with class label y_i:
- if w · x_i - b > 0, the data point belongs to the positive class (y_i = +1).
- if w · x_i - b < 0, the data point belongs to the negative class (y_i = -1).
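As a concrete check of this decision rule (with made-up numbers), the sign of w · x - b determines the predicted class:

```python
import numpy as np

w, b = np.array([1.0, 1.0]), 0.0   # hypothetical hyperplane parameters
x = np.array([2.0, 3.0])           # a hypothetical data point

score = w @ x - b                  # w.x - b
label = 1 if score > 0 else -1     # positive side -> +1, negative side -> -1
print(score, label)                # 5.0, 1
```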
We want the hyperplane to correctly classify these data points, so we need to ensure that:
- Points from the positive class are on one side of the hyperplane. For points on the positive side:

w · x_i - b ≥ 1

- Points from the negative class are on the other side. For points on the negative side:

w · x_i - b ≤ -1

To combine these constraints into a single form, we use the class label y_i, and the constraint can be written as:

y_i(w · x_i - b) ≥ 1

Here's why:
- if y_i = +1, the constraint becomes w · x_i - b ≥ 1, which ensures correct classification for positive class points.
- if y_i = -1, the constraint becomes w · x_i - b ≤ -1, which ensures correct classification for negative class points.
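To see the combined constraint in action (toy numbers, assumed purely for illustration), we can evaluate y_i(w · x_i - b) for a point from each class; a value of at least 1 means the point is correctly classified with the required margin:

```python
import numpy as np

w, b = np.array([1.0, 1.0]), 0.0          # hypothetical hyperplane
X = np.array([[2.0, 2.0], [-1.5, -1.0]])  # one point from each class
y = np.array([1, -1])

margins = y * (X @ w - b)  # y_i * (w.x_i - b) for each point
print(margins)             # [4.  2.5] -> both satisfy the constraint >= 1
```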
Margin Calculation
In SVMs, the margin is the distance between the hyperplane and the closest data points from each class (the support vectors).
To calculate the margin, we use the following formula:

margin = 2 / ||w||
Where does this formula come from?
The perpendicular distance d from a point x_i to the hyperplane w · x - b = 0 is:

d = |w · x_i - b| / ||w||

Now, for the support vectors, the distance from the hyperplane is exactly 1 / ||w||. This is because support vectors lie on the boundaries of the margin, where:

y_i(w · x_i - b) = 1, so |w · x_i - b| = 1

Therefore:

d = 1 / ||w||

Now we need the distance between the two margin boundaries, w · x - b = 1 and w · x - b = -1. Therefore the distance will be:

margin = 2 / ||w||
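As a numeric sanity check (with assumed values), both the point-to-hyperplane distance and the margin width follow directly from the formulas above:

```python
import numpy as np

w, b = np.array([3.0, 4.0]), 1.0  # hypothetical hyperplane, so ||w|| = 5
x = np.array([2.0, 1.0])          # a hypothetical data point

# Perpendicular distance from x to the hyperplane w.x - b = 0
d = abs(w @ x - b) / np.linalg.norm(w)  # |10 - 1| / 5 = 1.8

# Total margin width between the boundaries w.x - b = +1 and w.x - b = -1
margin = 2 / np.linalg.norm(w)          # 2 / 5 = 0.4
print(d, margin)
```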
Understanding Hard Margin SVM
The term "Hard Margin" comes from the fact that the algorithm requires all data points to be classified with a margin of at least 1. In other words, there are no allowances for misclassification. These strict requirements are why it's called a "hard" margin
Formulating the Hard Margin SVM
1. Objective function
The goal of the hard margin SVM is to maximize the margin between the two classes. As we previously discussed, the margin is:

margin = 2 / ||w||

To maximize the margin, we need to minimize its reciprocal, ||w|| / 2. In practice, we minimize the squared norm (1/2) ||w||² instead.
Why square the norm? Because the squared norm is smooth and differentiable everywhere, which makes it easier to compute gradients and perform optimization using gradient-based methods. Minimizing the squared norm is equivalent to minimizing the norm itself, because minimizing ||w||² will always lead to the same optimal w as minimizing ||w||.
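A tiny numeric demonstration of this equivalence (random candidate vectors, assumed for illustration): whichever candidate has the smallest norm also has the smallest squared norm, because squaring is monotonic on non-negative values:

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.normal(size=(5, 2))  # five hypothetical weight vectors

norms = np.linalg.norm(candidates, axis=1)
# argmin over ||w|| and argmin over ||w||^2 pick the same vector
assert np.argmin(norms) == np.argmin(norms ** 2)
```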
2. Constraints:
The constraints ensure that each point is correctly classified and lies at a margin of at least 1 from the hyperplane:

y_i(w · x_i - b) ≥ 1 for all i = 1, ..., n
3. Hard Margin SVM Optimization Problem:
Putting it all together, the hard-margin SVM optimization problem is:

minimize (1/2) ||w||²
subject to y_i(w · x_i - b) ≥ 1 for all i = 1, ..., n

Now we need to solve this constrained optimization problem to find the optimal w and b.
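The problem above is a quadratic program, so any QP solver can handle it. Here is a minimal sketch using the cvxpy library (the choice of solver and the toy data are assumptions, not part of the original formulation):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (values made up for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = cp.Variable(2)
b = cp.Variable()

# Objective: minimize (1/2) ||w||^2
objective = cp.Minimize(0.5 * cp.sum_squares(w))
# Constraints: y_i (w.x_i - b) >= 1 for every training point
constraints = [y[i] * (X[i] @ w - b) >= 1 for i in range(len(y))]

cp.Problem(objective, constraints).solve()
print(w.value, b.value)  # the optimal hyperplane parameters
```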
Problem with Hard Margin
While hard margin SVMs are effective for linearly separable data, they come with certain limitations: they fail in the presence of outliers and misclassified data.
Imagine a dataset in which two points are outliers lying inside the opposite class. In this scenario, the hard margin SVM fails to plot a decision boundary at all: it tries to classify all the points correctly, but no hyperplane can satisfy the margin constraint for these two points.

To tackle this, the Soft-Margin SVM is used.
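In scikit-learn, the softness of the margin is controlled by the C parameter of SVC: a very large C approximates a hard margin, while a smaller C tolerates outliers. A sketch with a made-up dataset containing one mislabeled outlier:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data with one outlier sitting inside the other class (assumed values)
X = np.array([[2, 2], [3, 3], [4, 2], [-2, -2], [-3, -3], [-2.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, 1])  # the last point is a mislabeled outlier

# A small C lets the SVM tolerate the outlier instead of failing to separate
soft = SVC(kernel="linear", C=1.0).fit(X, y)
print(soft.coef_, soft.intercept_)
```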