Support Vector Machines (SVMs) are powerful tools in machine learning for classifying data and predicting values. They are popular in fields like bioinformatics and financial forecasting because they handle complex data problems well. This article aims to explain SVMs in a simple way, covering topics like the maximal margin classifier, support vectors, the kernel trick, and infinite-dimensional mapping.
What is an SVM?
At its core, an SVM performs classification by finding the hyperplane that best divides a dataset into classes. The unique aspect of SVMs lies in their ability to find the optimal hyperplane that maximizes the margin, the distance between the hyperplane and the nearest data points from each class. A larger margin helps the SVM make better predictions on new data because the decision boundary is much clearer. The SVM does this with the help of support vectors.
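As a quick illustration, here is a minimal sketch using scikit-learn's SVC on a tiny made-up 2-D dataset (the data values are assumptions for illustration); fitting a linear SVM exposes both the hyperplane parameters and the support vectors discussed below:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable 2-D dataset (values made up for illustration)
X = np.array([[2, 2], [3, 3], [3, 1], [-2, -2], [-3, -3], [-1, -3]])
y = np.array([1, 1, 1, -1, -1, -1])

# Linear SVM; a very large C approximates a hard margin (no slack allowed)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.coef_, clf.intercept_)  # hyperplane parameters (w and the offset)
print(clf.support_vectors_)       # the points that define the margin
```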
Support Vectors
Support vectors are the subset of training data points that are directly involved in constructing the optimal separating hyperplane. These points are crucial because they lie on the edge of the margin and influence the position and orientation of the hyperplane.
Mathematically, if you have a dataset with feature vectors x_i and class labels y_i ∈ {-1, +1}, the support vectors satisfy the condition:

y_i(w · x_i - b) = 1
Where does this equation come from?
The hyperplane in d-dimensional space is defined by the equation:
w · x - b = 0
where:
- w is the normal vector (perpendicular to the hyperplane).
- b is the bias term (offset from the origin).
For a given data point x_i with class label y_i:
- if w · x_i - b > 0, the data point belongs to the positive class (y_i = +1).
- if w · x_i - b < 0, the data point belongs to the negative class (y_i = -1).
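As a concrete check of this decision rule (with made-up numbers), the sign of w · x - b determines the predicted class:

```python
import numpy as np

w, b = np.array([1.0, 1.0]), 0.0   # hypothetical hyperplane parameters
x = np.array([2.0, 3.0])           # a hypothetical data point

score = w @ x - b                  # w.x - b
label = 1 if score > 0 else -1     # positive side -> +1, negative side -> -1
print(score, label)                # 5.0, 1
```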
We want the hyperplane to correctly classify these data points, so we need to ensure that:
- Points from the positive class are on one side of the hyperplane. For points on the positive side:

w · x_i - b ≥ 1

- Points from the negative class are on the other side. For points on the negative side:

w · x_i - b ≤ -1

To combine these constraints into a single form, we use the class label y_i, and the constraint can be written as:

y_i(w · x_i - b) ≥ 1

Here's why:
- if y_i = +1, the constraint becomes w · x_i - b ≥ 1, which ensures correct classification for positive class points.
- if y_i = -1, the constraint becomes w · x_i - b ≤ -1, which ensures correct classification for negative class points.
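To see the combined constraint in action (toy numbers, assumed purely for illustration), we can evaluate y_i(w · x_i - b) for a point from each class; a value of at least 1 means the point is correctly classified with the required margin:

```python
import numpy as np

w, b = np.array([1.0, 1.0]), 0.0          # hypothetical hyperplane
X = np.array([[2.0, 2.0], [-1.5, -1.0]])  # one point from each class
y = np.array([1, -1])

margins = y * (X @ w - b)  # y_i * (w.x_i - b) for each point
print(margins)             # [4.  2.5] -> both satisfy the constraint >= 1
```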
Margin Calculation
In SVMs, the margin is the distance between the hyperplane and the closest data points from each class (the support vectors).
To calculate the margin, we use the following formula:

margin = 2 / ||w||
Where does this formula come from?
The perpendicular distance d from a point x_i to the hyperplane w · x - b = 0 is:

d = |w · x_i - b| / ||w||

Now, for the support vectors, the distance from the hyperplane is exactly 1 / ||w||. This is because support vectors lie on the boundaries of the margin, where:

y_i(w · x_i - b) = 1, so |w · x_i - b| = 1

Therefore:

d = 1 / ||w||

Now we need the distance between the two margin boundaries, w · x - b = 1 and w · x - b = -1. Therefore the distance will be:

margin = 2 / ||w||
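As a numeric sanity check (with assumed values), both the point-to-hyperplane distance and the margin width follow directly from the formulas above:

```python
import numpy as np

w, b = np.array([3.0, 4.0]), 1.0  # hypothetical hyperplane, so ||w|| = 5
x = np.array([2.0, 1.0])          # a hypothetical data point

# Perpendicular distance from x to the hyperplane w.x - b = 0
d = abs(w @ x - b) / np.linalg.norm(w)  # |10 - 1| / 5 = 1.8

# Total margin width between the boundaries w.x - b = +1 and w.x - b = -1
margin = 2 / np.linalg.norm(w)          # 2 / 5 = 0.4
print(d, margin)
```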
Understanding Hard Margin SVM
The term "Hard Margin" comes from the fact that the algorithm requires all data points to be classified with a margin of at least 1. In other words, there are no allowances for misclassification. These strict requirements are why it's called a "hard" margin
Formulating the Hard Margin SVM
1. Objective function
The goal of the hard margin SVM is to maximize the margin between the two classes. As we previously discussed, the margin is:

margin = 2 / ||w||

To maximize the margin, we need to minimize its reciprocal, ||w|| / 2. In practice, we minimize the squared norm (1/2) ||w||² instead.
Why square the norm? Because the squared norm is smooth and differentiable everywhere, which makes it easier to compute gradients and perform optimization using gradient-based methods. Minimizing the squared norm is equivalent to minimizing the norm itself, because minimizing ||w||² will always lead to the same optimal w as minimizing ||w||.
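A tiny numeric demonstration of this equivalence (random candidate vectors, assumed for illustration): whichever candidate has the smallest norm also has the smallest squared norm, because squaring is monotonic on non-negative values:

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.normal(size=(5, 2))  # five hypothetical weight vectors

norms = np.linalg.norm(candidates, axis=1)
# argmin over ||w|| and argmin over ||w||^2 pick the same vector
assert np.argmin(norms) == np.argmin(norms ** 2)
```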
2. Constraints:
The constraints ensure that each point is correctly classified and lies at a margin of at least 1 from the hyperplane:

y_i(w · x_i - b) ≥ 1 for all i = 1, ..., n
3. Hard Margin SVM Optimization Problem:
Putting it all together, the hard-margin SVM optimization problem is:

minimize (1/2) ||w||²
subject to y_i(w · x_i - b) ≥ 1 for all i = 1, ..., n

Now we need to solve this constrained optimization problem to find the optimal w and b.
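The problem above is a quadratic program, so any QP solver can handle it. Here is a minimal sketch using the cvxpy library (the choice of solver and the toy data are assumptions, not part of the original formulation):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (values made up for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = cp.Variable(2)
b = cp.Variable()

# Objective: minimize (1/2) ||w||^2
objective = cp.Minimize(0.5 * cp.sum_squares(w))
# Constraints: y_i (w.x_i - b) >= 1 for every training point
constraints = [y[i] * (X[i] @ w - b) >= 1 for i in range(len(y))]

cp.Problem(objective, constraints).solve()
print(w.value, b.value)  # the optimal hyperplane parameters
```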
Problem with Hard Margin
While hard margin SVMs are effective for linearly separable data, they come with certain limitations: they fail in the presence of outliers and misclassified data.
Imagine a dataset in which two points are outliers lying inside the opposite class. In this scenario, the hard margin SVM fails to plot a decision boundary at all: it tries to classify all the points correctly, but no hyperplane can satisfy the margin constraint for these two points.

To tackle this, the Soft-Margin SVM is used.
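In scikit-learn, the softness of the margin is controlled by the C parameter of SVC: a very large C approximates a hard margin, while a smaller C tolerates outliers. A sketch with a made-up dataset containing one mislabeled outlier:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data with one outlier sitting inside the other class (assumed values)
X = np.array([[2, 2], [3, 3], [4, 2], [-2, -2], [-3, -3], [-2.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, 1])  # the last point is a mislabeled outlier

# A small C lets the SVM tolerate the outlier instead of failing to separate
soft = SVC(kernel="linear", C=1.0).fit(X, y)
print(soft.coef_, soft.intercept_)
```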