Dipti Moryani
Machine Learning Using Support Vector Machines (SVM)


Machine learning has brought forward a wide range of techniques for classification, prediction, and pattern recognition. Among these, Support Vector Machines (SVM) stand out as one of the most powerful and versatile algorithms. SVMs are widely used because they work effectively with both simple and complex datasets, can handle non-linear patterns, and often deliver high accuracy compared to many other models.

At its core, SVM is a data classification method that separates data points into categories using hyperplanes. While the concept may sound technical, it is both intuitive and practical once broken down. This blog explains how SVM works, how to implement it in R, how to tune models for better accuracy, and how SVM compares with other methods like linear regression.

Understanding the Basics of SVM

The main idea behind SVM is to separate data points into different classes using a line, plane, or hyperplane.

In two dimensions, this separator is a straight line.

In three dimensions, it becomes a plane.

For data with higher dimensions, the separating surface is called a hyperplane.

What makes SVM distinctive is that it does not look for just any separating line; it seeks the optimal hyperplane, the one that maximizes the distance (called the margin) between the two classes. This makes the classifier robust and less sensitive to noise in the dataset.

A Simple Example

Imagine you have two kinds of data points: red circles and blue squares. On a two-dimensional plot, the red and blue items are positioned in a way that they can be clearly separated.

Any line dividing the two groups will classify the data correctly.

However, SVM asks: Which line will classify future data points most reliably?

If the line is drawn too close to one set of points, even a small shift in data may cause errors. Instead, SVM chooses the line that sits in the middle and is farthest from the nearest points of both groups. These closest points are called support vectors, and the distance between them and the line is the margin.

Maximizing this margin ensures the classifier is not just accurate for the current data but also robust for future, unseen data.
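To make these ideas concrete, here is a minimal classification sketch using the e1071 package introduced later in this post; the two-class toy data and variable names are invented purely for illustration.

# Hypothetical two-class toy data: "red" points clustered around 2, "blue" around 6
set.seed(42)
x1 <- c(rnorm(10, mean = 2), rnorm(10, mean = 6))
x2 <- c(rnorm(10, mean = 2), rnorm(10, mean = 6))
label <- factor(rep(c("red", "blue"), each = 10))
toy <- data.frame(x1, x2, label)

library(e1071)

# Linear SVM classifier: the separating line is chosen to maximize the margin
fit <- svm(label ~ x1 + x2, data = toy,
           kernel = "linear", type = "C-classification")

# The support vectors are the training points closest to the separating line
toy[fit$index, ]

# plot() shades the two decision regions and marks the support vectors with crosses
plot(fit, toy)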

The Mathematics of SVM

To simplify, consider a separating line expressed as:

y = ax + b

For classification, we create two parallel lines:

ax + b = 1 for one class

ax + b = −1 for the other class

The distance between these two lines represents the margin. Mathematically, this margin is:

m = 2 / ||a||

The objective of SVM is to maximize this margin, which translates to minimizing ||a||² / 2 subject to the classification constraints.
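As a quick numerical illustration (with a made-up coefficient), a larger ||a|| means a narrower margin:

# Toy illustration of the margin formula m = 2 / ||a||
a <- 0.5            # hypothetical slope of the separating line
m <- 2 / abs(a)     # margin width: 2 / 0.5 = 4
m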

While the math may look intimidating, libraries in R and Python handle this optimization efficiently. As data becomes multidimensional, solving this manually becomes impractical, which is where these tools become indispensable.

Implementing SVM in R

To make the concept concrete, let’s look at a simple implementation in R. For this demonstration, we generate sample data with two features, x and y.

# Sample data
x <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
y <- c(3,4,5,4,8,10,10,11,14,20,23,24,32,34,35,37,42,48,53,60)

# Create a data frame
train <- data.frame(x, y)

# Visualize
plot(train, pch = 16)

This plot shows a linear trend, suggesting that linear regression might also fit well. Let’s compare both models.

Linear Regression in R

# Linear regression
model <- lm(y ~ x, data = train)

# Plot regression line
abline(model)

The regression line fits the data fairly well, but is it the best model? To check, we move to SVM.

Applying SVM in R

To use SVM, we install and load the e1071 package:

# Install package if needed
install.packages("e1071")

# Load the library
library(e1071)

# Fit SVM model (with a numeric response, svm() defaults to epsilon-regression)
model_svm <- svm(y ~ x, data = train)

# Make predictions
pred <- predict(model_svm, train)

# Plot predictions on top of the existing scatter plot
points(train$x, pred, col = "blue", pch = 4)

Visually, the SVM predictions follow the actual data points more closely than the regression line. But to be certain, we measure accuracy using Root Mean Square Error (RMSE).

Comparing Model Performance

# RMSE for linear regression
error <- model$residuals
lm_error <- sqrt(mean(error^2))   # ~3.83

# RMSE for SVM
error_2 <- train$y - pred
svm_error <- sqrt(mean(error_2^2))   # ~2.70

Results:

Linear Regression RMSE: ~3.83

SVM RMSE: ~2.70

Clearly, SVM provides a lower error and better fit. But this is only the beginning—SVM can be tuned further for even greater accuracy.

Tuning SVM Models

SVM performance depends heavily on parameters such as cost and epsilon. The cost parameter controls the trade-off between a smooth decision boundary and classifying training points correctly, while epsilon influences how much deviation is tolerated.

In R, the tune function allows us to systematically test different parameter values using a grid search.

# Grid search over epsilon and cost
svm_tune <- tune(svm, y ~ x, data = train,
                 ranges = list(epsilon = seq(0, 1, 0.01), cost = 2^(2:9)))

print(svm_tune)

The output reveals the best combination of epsilon and cost, along with performance metrics.
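For example, the chosen values and the corresponding cross-validated error can be read straight from the tuning object (these fields are part of e1071's tune result):

# Best epsilon/cost combination found by the grid search
svm_tune$best.parameters

# Cross-validated error of that combination
svm_tune$best.performance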

Best Model and RMSE

# Extract best model
best_mod <- svm_tune$best.model

# Predictions
best_mod_pred <- predict(best_mod, train)

# RMSE of tuned model
error_best_mod <- train$y - best_mod_pred
best_mod_RMSE <- sqrt(mean(error_best_mod^2))   # ~1.29

Through tuning, the RMSE drops from about 2.70 to about 1.29, cutting the error roughly in half. This demonstrates the power of fine-tuning SVM models.

Visualization of Tuning Results

Grid search results can also be visualized:

plot(svm_tune)

The plot uses color gradients to show performance across parameter values. Darker regions indicate better performance, helping narrow the range for further tuning.
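One possible follow-up, sketched here with illustrative ranges that you would adjust to match your own tuning plot, is a second, finer grid search around the best values found above:

# Hypothetical finer grid around the previously found optimum
svm_tune_fine <- tune(svm, y ~ x, data = train,
                      ranges = list(epsilon = seq(0, 0.2, 0.01),
                                    cost = seq(2, 64, 8)))

print(svm_tune_fine)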

Linear vs. Non-Linear SVM

So far, we’ve worked with linear SVM because our example dataset had a linear trend. But real-world data is rarely linear. SVM addresses this challenge with kernels, mathematical functions that project data into higher dimensions where it becomes linearly separable.

Common kernels include:

Linear kernel – best for linearly separable data.

Polynomial kernel – useful for capturing polynomial relationships.

Radial Basis Function (RBF) – widely used for non-linear data.

Sigmoid kernel – behaves like a two-layer neural network in certain cases.

By choosing the right kernel, SVM can handle complex datasets such as image recognition, bioinformatics, and sentiment analysis.
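As a rough sketch of the idea, here is how a non-linear kernel is selected in e1071; the noisy sine-wave data below is made up for illustration:

# Hypothetical non-linear data: a noisy sine wave
set.seed(1)
x_nl <- seq(0, 10, length.out = 100)
y_nl <- sin(x_nl) + rnorm(100, sd = 0.2)
nl_data <- data.frame(x = x_nl, y = y_nl)

# The radial basis function (RBF) kernel captures the non-linear pattern
model_rbf <- svm(y ~ x, data = nl_data, kernel = "radial")
pred_rbf <- predict(model_rbf, nl_data)

# Actual values (dots) vs. SVM predictions (blue crosses)
plot(nl_data$x, nl_data$y, pch = 16)
points(nl_data$x, pred_rbf, col = "blue", pch = 4)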

Strengths and Limitations of SVM
Strengths

Works well with both linear and non-linear data.

Effective in high-dimensional spaces.

Robust against overfitting when tuned correctly.

Provides good accuracy even with smaller datasets.

Limitations

Computationally expensive with very large datasets.

Requires careful parameter tuning (cost, epsilon, kernel).

Harder to interpret compared to simpler models like logistic regression.

Real-World Applications of SVM

SVM is not just a theoretical concept—it is widely applied across industries:

Text and Sentiment Analysis – Classifying reviews, tweets, or documents as positive, negative, or neutral.

Image Recognition – Used for handwriting recognition, facial recognition, and object classification.

Healthcare – Detecting diseases based on genetic or diagnostic data.

Finance – Identifying fraudulent transactions or predicting stock trends.

Bioinformatics – Classifying proteins and genes into categories.

Summary

Support Vector Machines (SVM) are a powerful classification and regression tool that work well even with irregular, noisy, or non-linear data.

SVM separates data by finding the optimal hyperplane with the maximum margin.

In R, packages like e1071 make it easy to apply SVM on real datasets.

Compared to linear regression, SVM often delivers lower errors and better performance.

Tuning parameters using grid search significantly improves accuracy.

SVM’s versatility, with kernels, allows it to tackle both simple and highly complex problems.

While tuning can be computationally heavy and sometimes tricky, the robustness and accuracy of SVM make it one of the go-to algorithms for machine learning practitioners.

With the basics and implementation covered here, you can now experiment with SVM on different datasets, explore kernel functions, and tune parameters to uncover its full potential.

This article was originally published on Perceptive Analytics.
Based in Atlanta, our mission is simple: to enable businesses to unlock value in data. For over 20 years, we have partnered with more than 100 clients, from Fortune 500 companies to mid-sized firms, helping them solve complex data analytics challenges. As leading Snowflake Consultants in Atlanta, we turn raw data into strategic insights that drive better decisions.
