By this time we already know how SVM works geometrically, and if you don’t, you can have a look here. Once we understand the geometrical interpretation of SVM, the working of SVM is easy to follow, so why bother with the mathematical formulation? Because if we have a good grasp of the internal mathematics of any ML model, not just SVM, we can change the model as per our requirements. So let’s understand the mathematics of SVM in some detail.
Suppose we have some points and they are linearly separable. We have a hyperplane π separating the two classes, and along with it we have two more hyperplanes π+ and π-, just as we discussed while understanding the geometrical interpretation. The distance d between π+ and π- is called the margin.
The equation of the hyperplane π : (W^T)X + b = 0
W^T => transpose of W
Here W is not necessarily a unit vector; it can be any vector.
Let π+ : (W^T)X + b = +1
π- : (W^T)X + b = -1
Here many of us may wonder why we took the values +1 and -1, but that is not something we should be concerned about: the values can be anything and not necessarily +1 or -1, because we can always rescale W and b so that the planes through the closest points take exactly these values. We will look into it in more detail a bit later.
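To see why fixing the values to +1 and -1 is harmless, note that scaling W and b by the same positive constant k describes exactly the same hyperplane and keeps every point on the same side. Here is a minimal sketch in Python/NumPy (the vector w, the bias b and the point x are made-up values, just for illustration):

```python
import numpy as np

# Hypothetical weight vector and bias for a 2-D hyperplane (W^T)X + b = 0
w = np.array([2.0, 1.0])
b = -3.0
x = np.array([4.0, 5.0])   # an arbitrary point

# Scaling (w, b) by any k > 0 leaves the hyperplane and the side of x unchanged,
# so we are free to rescale until the closest points give values +1 and -1.
for k in [1.0, 0.5, 10.0]:
    value = k * w @ x + k * b          # (k*w)^T x + (k*b)
    print(f"k={k}: value={value}, sign={np.sign(value)}")
```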
Now we need to calculate the value of the margin, i.e. the value of d in the figure above.
We know that the distance from a point P to a plane or line is given by:
‖(W^T)P + b‖ / ‖W‖
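As a quick illustration of this formula, here is a tiny sketch (the plane and the point p below are made-up values):

```python
import numpy as np

def distance_to_plane(w, b, p):
    # Distance from point p to the hyperplane (W^T)X + b = 0
    return abs(w @ p + b) / np.linalg.norm(w)

# Hypothetical example: w has norm 5, so the result is easy to check by hand
w = np.array([3.0, 4.0])
b = -5.0
p = np.array([2.0, 1.0])

print(distance_to_plane(w, b, p))   # |3*2 + 4*1 - 5| / 5 = 1.0
```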
So for a point X lying on the +ve hyperplane π+, where (W^T)X + b = +1, the distance to π will be:
‖(W^T)X + b‖ / ‖W‖ = 1 / ‖W‖
This is the distance from the +ve hyperplane to π. By the same argument, the distance from the -ve hyperplane to π is also 1 / ‖W‖, so we add the two.
So margin (d) = 2 / ‖W‖
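Putting the two pieces together, the margin depends only on ‖W‖, so it can be computed in one line (w below is the same made-up vector as in the sketch above):

```python
import numpy as np

w = np.array([3.0, 4.0])                  # hypothetical weight vector, ||w|| = 5

half_margin = 1 / np.linalg.norm(w)       # distance from pi+ (or pi-) to pi
margin = 2 / np.linalg.norm(w)            # total distance between pi+ and pi-
print(half_margin, margin)                # 0.2 0.4
```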
Now we need to maximize this margin and find w* and b* (the values of W and b for the maximized margin), provided that all the +ve points are on one side and all the -ve points are on the other side.
Mathematically we can write this statement as:
(w*, b*) = argmax over (W, b) of 2 / ‖W‖
Along with this equation we also keep in mind the constraint, i.e. all +ve points will be on one side and all -ve points on the other side.
We already assumed that:
π+ : (W^T)X + b = +1
π- : (W^T)X + b = -1
Some conclusions from the above two equations:
a) If the support vector is on π+ then for those points
i) y_i = +1
ii) (W^T)X + b = +1
So y_i[(W^T)X + b] = +1
b) If the support vector is on π- then for those points
i) y_i = -1
ii) (W^T)X + b = -1
So y_i[(W^T)X + b] = (-1)(-1) = +1
For the two support vectors, one on π+ and another on π-, the value of y_i[(W^T)X + b] is exactly +1.
Now if we take any other +ve point, say ‘a’. The point ‘a’ surely has y_i = +1 because it is a +ve point, and since it lies further above the +ve hyperplane, the value of (W^T)X + b will be greater than +1. So for the point ‘a’, the value of y_i[(W^T)X + b] will be greater than +1.
Again if we take any other -ve point, say ‘b’. The point ‘b’ surely has y_i = -1 because it is a -ve point, and since it lies further below the -ve hyperplane, the value of (W^T)X + b will be less than -1. So for the point ‘b’, the value of y_i[(W^T)X + b] will again be greater than +1.
So for all points other than the support vectors, y_i[(W^T)X + b] will be greater than +1, and for the support vectors it is exactly +1, provided the constraints we mentioned above are satisfied. The constrained optimization problem for SVM can thus be defined as:
(w*, b*) = argmax over (W, b) of 2 / ‖W‖, such that y_i[(W^T)X_i + b] ≥ 1 for all points i
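To make this optimization problem concrete, here is a minimal hard-margin sketch using the CVXPY convex-optimization library on a tiny made-up, linearly separable data set. It uses the standard equivalence of maximizing 2/‖W‖ and minimizing ‖W‖², which give the same (w*, b*):

```python
import numpy as np
import cvxpy as cp

# Tiny, linearly separable toy data (made-up values)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Maximizing 2/||w|| is equivalent to minimizing ||w||^2
objective = cp.Minimize(cp.sum_squares(w))
# y_i [(W^T)X_i + b] >= 1 for every point
constraints = [cp.multiply(y, X @ w + b) >= 1]

cp.Problem(objective, constraints).solve()

w_star, b_star = w.value, b.value
print("w* =", w_star, "b* =", b_star)
print("margin =", 2 / np.linalg.norm(w_star))
# Constraint values: exactly 1 on the support vectors, > 1 everywhere else
print("y_i[(W^T)X_i + b] =", y * (X @ w_star + b_star))
```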
Now the limitation of this formulation is that it works if and only if the data is perfectly linearly separable. If the data is only almost linearly separable then this formulation isn’t going to work. This formulation is called the Hard Margin SVM because we are very strict about the position of the data points.
To overcome this limitation we have another formulation, called the Soft Margin SVM, which works even if the data is not completely linearly separable.
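In practice, libraries such as scikit-learn implement the soft-margin formulation; the parameter C controls how strongly margin violations are penalized, so a smaller C gives a wider, softer margin. A rough sketch on made-up, overlapping data (where a hard margin would not exist):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two overlapping blobs, so the data is not linearly separable
X = np.vstack([rng.normal(loc=1.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=-1.0, scale=1.0, size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

# Smaller C => wider margin that tolerates more violations
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C}: margin = {2 / np.linalg.norm(w):.3f}, "
          f"support vectors = {len(clf.support_)}")
```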