Regression is one of the most widely used techniques in statistics and machine learning. It helps us understand how variables relate to each other and, more importantly, allows us to make predictions. While simple methods like Simple Linear Regression (SLR) work well when relationships are strictly linear, real-world data often contains noise, outliers, and non-linear patterns. This is where Support Vector Regression (SVR) comes in.
SVR builds upon the powerful concept of Support Vector Machines (SVM) — a popular classification method — and extends it to predict continuous values. In this article, we will:
Review the basics of Simple Linear Regression (SLR) and its limitations.
Introduce Support Vector Regression (SVR) and its advantages.
Implement both SLR and SVR in R and compare results.
Learn how to tune SVR parameters for better accuracy.
Explore real-world applications of SVR in business and research.
By the end, you will have a solid grasp of how SVR works and why it can be a game-changer compared to traditional regression models.
1. Simple Linear Regression (SLR) – A Starting Point
Simple Linear Regression models the relationship between one independent variable (X) and one dependent variable (Y). The goal is to fit a straight line that best predicts Y from X. The general equation is:
Y = α + βX + ϵ
Where:
Y = dependent variable
X = independent variable
α = intercept (value of Y when X = 0)
β = slope (change in Y for a unit change in X)
ϵ = error term
The line is fitted using Ordinary Least Squares (OLS), which minimizes the sum of squared differences between the predicted values (Ŷ) and the actual observed values (Y).
Implementing SLR in R
Let’s work with a simple dataset SVM.csv that has two variables: X and Y.
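The SVM.csv file itself is not included with this article. If you want to follow along without it, a hypothetical stand-in with a noisy, roughly negative, slightly curved X–Y relationship can be simulated like this (the coefficients and noise level are arbitrary choices, not the original data):

# Hypothetical replacement for SVM.csv: noisy, curved, broadly negative trend
set.seed(42)
X <- seq(1, 20, by = 0.5)
Y <- 25 - 2 * X + 0.08 * X^2 + rnorm(length(X), sd = 1)
data <- data.frame(X = X, Y = Y)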
# Load dataset
data = read.csv("SVM.csv", header = TRUE)
head(data)

# Scatter plot of data
plot(data, main = "Scatter Plot of X vs Y")

# Fit linear regression model
model = lm(Y ~ X, data)

# Overlay regression line
abline(model, col = "blue")
The scatter plot might show a negative relationship between X and Y, and the blue line represents the best-fit line.
Limitations of SLR
At first glance, this seems effective. But when data contains non-linear patterns, SLR struggles. Some challenges include:
Inflexibility: It can only capture straight-line relationships.
Sensitivity to outliers: Extreme values can heavily skew the regression line.
Assumptions: SLR assumes normal distribution of errors and constant variance — often violated in practice.
To evaluate model performance, we calculate Root Mean Square Error (RMSE):
RMSE = √( (1/n) Σ (Yᵢ − Ŷᵢ)² )
Lower RMSE indicates better predictive accuracy.
# Predict values using linear model
predY = predict(model, data)

# Install and load hydroGOF for RMSE
install.packages("hydroGOF")
library(hydroGOF)

# Calculate RMSE
rmse(predY, data$Y)
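hydroGOF's rmse() is just a convenience; if you would rather avoid the extra dependency, the same value can be computed directly from the RMSE formula above:

# Manual RMSE, equivalent to hydroGOF::rmse(predY, data$Y)
sqrt(mean((predY - data$Y)^2))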
Suppose RMSE = 0.94. This is decent, but far from optimal if the relationship is non-linear.
2. Support Vector Regression (SVR) – A Smarter Alternative
Support Vector Regression (SVR) adapts the principles of SVM to regression tasks. Instead of finding a line that minimizes squared error, SVR tries to fit a function that keeps predictions within a certain margin of tolerance (epsilon) while being as flat as possible.
In other words:
Errors smaller than a threshold (ϵ) are ignored.
Errors larger than ϵ are penalized.
A cost parameter (C) controls how much we penalize these errors.
This makes SVR more robust, flexible, and capable of modeling non-linear relationships.
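As a rough sketch of the idea (an illustration only, not part of fitting an SVR model; the epsilon value of 0.1 is arbitrary), the epsilon-insensitive loss can be written in a few lines of R:

# Epsilon-insensitive loss: residuals inside the epsilon tube cost nothing
eps_insensitive_loss <- function(residual, epsilon = 0.1) {
  pmax(0, abs(residual) - epsilon)
}
eps_insensitive_loss(c(0.05, -0.30), epsilon = 0.1)  # returns 0.0 and 0.2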
Kernel Functions – The Secret Sauce
SVR uses kernel functions to transform data into higher dimensions where non-linear relationships become linear. Common kernels include:
Linear kernel – good for simple linear data.
Polynomial kernel – captures polynomial relationships.
Radial Basis Function (RBF) kernel – handles complex non-linearities.
Sigmoid kernel – similar to neural networks.
The RBF kernel is most commonly used as a default.
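In the e1071 package (introduced in the next section), the kernel is selected through the kernel argument of svm(). A brief sketch, assuming the same Y ~ X data used throughout this article:

# Kernel choice in e1071's svm(); "radial" (RBF) is the default
modelsvm_rbf  <- svm(Y ~ X, data, kernel = "radial")
modelsvm_poly <- svm(Y ~ X, data, kernel = "polynomial", degree = 3)
modelsvm_lin  <- svm(Y ~ X, data, kernel = "linear")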
3. Implementing SVR in R
We use the e1071 package to run SVR:
# Install and load SVR package
install.packages("e1071")
library(e1071)

# Fit SVR model
modelsvm = svm(Y ~ X, data)

# Predict using SVR
predYsvm = predict(modelsvm, data)

# Plot predictions
plot(data, main = "SVR vs Actual")
points(data$X, predYsvm, col = "red", pch = 16)
Here, red points represent SVR predictions. They typically lie much closer to actual values compared to SLR predictions.
# RMSE for SVR
rmse(predYsvm, data$Y)
Suppose RMSE = 0.43, which is much lower than 0.94 for SLR.
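It can help to see both fits on the same scatter plot. A quick sketch, assuming the model, modelsvm, and predYsvm objects created above:

# Overlay the SLR line and the SVR predictions on one plot
plot(data, main = "SLR vs SVR")
abline(model, col = "blue")                      # SLR best-fit line
points(data$X, predYsvm, col = "red", pch = 16)  # SVR predictions
legend("topright", legend = c("SLR", "SVR"), col = c("blue", "red"), lty = c(1, NA), pch = c(NA, 16))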
4. Tuning the SVR Model
The default SVR model is good, but we can tune parameters for even better results. The two main parameters are:
Epsilon (ϵ) – maximum tolerated error before penalization.
Cost (C) – penalty for errors beyond epsilon.
We can tune them using tune():
OptModelsvm = tune(svm, Y ~ X, data = data,
ranges = list(epsilon = seq(0, 1, 0.1),
cost = 1:100))
# Best model
BstModel = OptModelsvm$best.model

# Predictions
PredYBst = predict(BstModel, data)

# RMSE
rmse(PredYBst, data$Y)
The tuned model may reduce RMSE further, say to 0.27 — a dramatic improvement over both SLR (0.94) and basic SVR (0.43).
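To see which combination tune() actually selected, and how error varies across the grid, the tuning object returned by e1071 can be inspected directly; a brief sketch:

# Best epsilon/cost combination found by the grid search
OptModelsvm$best.parameters

# Cross-validated performance over the epsilon-cost grid
plot(OptModelsvm)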
5. Comparing Models
| Model     | RMSE (lower is better) | Fit to Data                     |
|-----------|------------------------|---------------------------------|
| SLR       | 0.94                   | Poor for non-linear data        |
| Basic SVR | 0.43                   | Captures non-linearity well     |
| Tuned SVR | 0.27                   | Best fit, optimized parameters  |
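If you ran all three models in the same session, the table can be reproduced programmatically (a small sketch, assuming predY, predYsvm, and PredYBst from the sections above):

# Collect the three RMSE values for a side-by-side comparison
results <- data.frame(
  Model = c("SLR", "Basic SVR", "Tuned SVR"),
  RMSE  = c(rmse(predY, data$Y), rmse(predYsvm, data$Y), rmse(PredYBst, data$Y))
)
results[order(results$RMSE), ]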
Clearly, SVR (especially tuned SVR) outperforms SLR in predictive accuracy.
6. Practical Applications of SVR
SVR isn’t just a theoretical improvement — it’s used in many industries:
Finance: Forecasting stock prices, credit risk modeling.
Marketing: Predicting customer churn, sales forecasts.
Healthcare: Disease progression modeling, drug effectiveness prediction.
Manufacturing: Predictive maintenance, quality control.
Energy & IoT: Power consumption prediction, anomaly detection in sensor data.
Whenever data relationships are non-linear and noisy, SVR is a strong choice.
7. Best Practices for Using SVR in R
Start simple: Begin with SLR to establish a baseline.
Use default RBF kernel: It works well in most cases.
Tune parameters: Adjust epsilon and cost for better performance.
Scale data: SVR is sensitive to feature scales. Normalize or standardize variables before training (see the sketch after this list).
Validate results: Always use RMSE, cross-validation, or hold-out sets to check accuracy.
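On the scaling point: e1071's svm() standardizes inputs by default (scale = TRUE), but you can also standardize explicitly, for example when preparing the same data for several models; a minimal sketch:

# Standardize X yourself and turn off svm()'s internal scaling
data_scaled <- data
data_scaled$X <- as.numeric(scale(data_scaled$X))
modelsvm_scaled <- svm(Y ~ X, data_scaled, scale = FALSE)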
8. Conclusion
We started with Simple Linear Regression (SLR) — a straightforward but limited model that struggles with non-linearity. We then moved to Support Vector Regression (SVR), which extends SVM principles to regression and handles complex data far more effectively.
Through hands-on implementation in R, we saw how:
- SLR gave an RMSE of 0.94.
- SVR reduced RMSE to 0.43.
- Tuned SVR achieved an even better RMSE of 0.27.
The key takeaways are:
- SVR is flexible, non-parametric, and robust against non-linearities.
- Proper kernel selection and tuning significantly improve results.
- SVR has wide-ranging applications across industries.
In modern data science, relying only on linear regression can lead to misleading insights. By adopting Support Vector Regression, analysts and businesses can unlock more accurate predictions, smarter decisions, and greater competitive advantage.