- Preparing the data
- Fitting the model and predicting test data
- Accuracy checking
- Source code listing
library(e1071)   # provides svm()
library(caret)   # provides createDataPartition() and the regression metrics
Preparing the data
We'll use the Boston housing price dataset as the target regression data in this tutorial. We'll prepare the data by splitting it into train and test parts.
boston = MASS::Boston
set.seed(123)
# hold out 10% of the rows for testing
indexes = createDataPartition(boston$medv, p = .9, list = F)
train = boston[indexes, ]
test = boston[-indexes, ]
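As a quick sanity check (an addition, not part of the original post), we can confirm the split sizes; with p = .9, about 90% of Boston's 506 rows land in the training set.
# dimensions of the split data
dim(train)   # about 456 rows, 14 columns
dim(test)    # about 50 rows, 14 columns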
Fitting the model and predicting test data
Train and test data are ready. Now we can define the svm model with default parameters and fit it on the training data. The kernel type can be changed to 'linear', 'polynomial', or 'sigmoid' for training and prediction; the default is the 'radial' kernel (an example of switching kernels follows the model summary below).
model_reg = svm(medv~., data=train)
print(model_reg)
Call:
svm(formula = medv ~ ., data = train)
Parameters:
   SVM-Type:  eps-regression
 SVM-Kernel:  radial
       cost:  1
      gamma:  0.07692308
    epsilon:  0.1

Number of Support Vectors:  306
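For instance, a different kernel can be selected with the kernel argument. The models below are illustrative sketches (their names are not part of the original tutorial); cost, gamma, and epsilon stay at the defaults shown above.
# hypothetical examples: fitting with alternative kernels
model_lin = svm(medv~., data=train, kernel = "linear")
model_poly = svm(medv~., data=train, kernel = "polynomial", degree = 3)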
Next, we'll predict on the test data and plot the results to compare them visually.
pred = predict(model_reg, test)
x = 1:length(test$medv)
# actual values as points, predictions as a line
plot(x, test$medv, pch = 18, col = "red")
lines(x, pred, lwd = 1, col = "blue")
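Optionally, a legend makes the plot easier to read; this line is an addition to the original plotting code.
# label the actual points and the prediction line
legend("topright", legend = c("actual medv", "predicted"),
       pch = c(18, NA), lty = c(NA, 1), col = c("red", "blue"))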
Accuracy checking
Finally, we'll check the prediction accuracy with the MSE, MAE, RMSE, and R-squared metrics.
# MSE() is not a caret helper (the original likely relied on another
# package such as MLmetrics), so compute it directly
mse = mean((test$medv - pred)^2)
mae = MAE(pred, test$medv)
rmse = RMSE(pred, test$medv)
# caret's R2() takes predictions first, then observations
r2 = R2(pred, test$medv, form = "traditional")
cat(" MAE:", mae, "\n", "MSE:", mse, "\n",
"RMSE:", rmse, "\n", "R-squared:", r2)
MAE: 1.877403
MSE: 6.028015
RMSE: 2.455202
R-squared: 0.914078
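For transparency, the same metrics can be computed by hand in base R. This sketch mirrors what the helper functions do; the 'traditional' R-squared is one minus the ratio of the residual to the total sum of squares.
# computing the metrics directly, without helper functions
err = test$medv - pred
mean(abs(err))                                        # MAE
mean(err^2)                                           # MSE
sqrt(mean(err^2))                                     # RMSE
1 - sum(err^2) / sum((test$medv - mean(test$medv))^2) # traditional R-squared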
In this tutorial, we've briefly learned how to use the 'e1071' package's svm() function for a regression problem. Thank you for reading; the full source code is listed below.
Source code listing
library(e1071)
library(caret)
# Regression example
boston = MASS::Boston
set.seed(123)
indexes = createDataPartition(boston$medv, p = .9, list = F)
train = boston[indexes, ]
test = boston[-indexes, ]
model_reg = svm(medv~., data=train)
print(model_reg)
pred = predict(model_reg, test)
x = 1:length(test$medv)
plot(x, test$medv, pch = 18, col = "red")
lines(x, pred, lwd = 1, col = "blue")
# accuracy check
mse = mean((test$medv - pred)^2)   # MSE computed directly
mae = MAE(pred, test$medv)
rmse = RMSE(pred, test$medv)
r2 = R2(pred, test$medv, form = "traditional")
cat(" MAE:", mae, "\n", "MSE:", mse, "\n",
"RMSE:", rmse, "\n", "R-squared:", r2)