Classification Example with Support Vector Machines in R

   Support Vector Machines (SVM) are a supervised learning method that can be used for both regression and classification problems, and they are widely used for classification. In R, the 'e1071' package provides the svm() function to fit SVM models.
   The caret package's train() function can also fit an SVM model. In this tutorial, we'll briefly learn how to apply the SVM algorithm with both the 'e1071' and 'caret' packages to classify the Iris dataset in R. The tutorial covers:
  1. Preparing data
  2. The 'e1071' method
  3. The caret's train method
  4. Source code listing
   We'll start by loading the required libraries for this tutorial.

library(e1071)
library(caret)


Preparing data

   We'll use the Iris dataset in this tutorial. First, we'll prepare it by splitting it into train and test parts, keeping 90 percent of the rows for training and the remaining 10 percent for testing.

data(iris)
set.seed(123)
 
indexes = createDataPartition(iris$Species, p = .9, list = FALSE)
train = iris[indexes, ]
test = iris[-indexes, ]
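
As a quick sanity check on the split, the sketch below repeats the partition and counts the rows per class. createDataPartition() samples within each level of the outcome, so the three species should stay balanced in both parts. The variable names follow the tutorial; the only additions are the base R calls nrow() and table().

```r
library(caret)

data(iris)
set.seed(123)

# Stratified 90/10 split, same call as in the tutorial
indexes <- createDataPartition(iris$Species, p = 0.9, list = FALSE)
train <- iris[indexes, ]
test <- iris[-indexes, ]

# Row counts of each part
print(nrow(train))
print(nrow(test))

# Class counts: each species should appear equally often within a part
print(table(train$Species))
print(table(test$Species))
```

If one class were much rarer than the others, this check would show whether it still appears in the test part at all.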


The 'e1071' method

The 'e1071' package provides the svm() function, and we'll use it to define the model. We'll pass the train data to the function and keep the default settings.

model_svm = svm(Species~., data=train)
print(model_svm)

Call:
svm(formula = Species ~ ., data = train)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 
      gamma:  0.25 

Number of Support Vectors:  49
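
The printed parameters are the svm() defaults, and they can also be set explicitly. With four predictors, the default gamma is 1 / 4 = 0.25 (one over the number of data columns), and the default cost is 1. The sketch below first writes out those defaults and then runs a small grid search with e1071's tune.svm(). The grid values and the names model_explicit and tuned are illustrative assumptions, not recommendations.

```r
library(e1071)
library(caret)

data(iris)
set.seed(123)
indexes <- createDataPartition(iris$Species, p = 0.9, list = FALSE)
train <- iris[indexes, ]

# Same model as above, with the defaults written out
model_explicit <- svm(Species ~ ., data = train,
                      kernel = "radial", cost = 1, gamma = 0.25)

# Small grid search over gamma and cost; tune.svm() uses
# 10-fold cross-validation unless told otherwise.
# The grid values below are arbitrary choices for illustration.
tuned <- tune.svm(Species ~ ., data = train,
                  gamma = c(0.05, 0.25, 1),
                  cost = c(0.5, 1, 10))

print(tuned$best.parameters)   # gamma/cost pair with the lowest CV error
print(tuned$best.performance)  # the corresponding cross-validated error
```

The best model found by the search is stored in tuned$best.model and can be passed to predict() like any other svm fit.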

Now, we can predict the classes of the test data with the fitted model.

pred = predict(model_svm, test)

Finally, we'll check the accuracy with a confusion matrix. The confusionMatrix() function takes the predicted classes first and the true classes second.

cm = confusionMatrix(pred, test$Species)
print(cm)
Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa          5          0         0
  versicolor      0          5         0
  virginica       0          0         5

Overall Statistics
                                    
               Accuracy : 1         
                 95% CI : (0.782, 1)
    No Information Rate : 0.3333    
    P-Value [Acc > NIR] : 6.969e-08 
                                    
                  Kappa : 1         
 Mcnemar's Test P-Value : NA        

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            1.0000           1.0000
Specificity                 1.0000            1.0000           1.0000
Pos Pred Value              1.0000            1.0000           1.0000
Neg Pred Value              1.0000            1.0000           1.0000
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3333           0.3333
Detection Prevalence        0.3333            0.3333           0.3333
Balanced Accuracy           1.0000            1.0000           1.0000
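
The accuracy figure can be reproduced by hand, which makes clear what the confusion matrix summarizes: the share of test rows whose predicted class equals the true class. Here is a minimal sketch that reuses the same split and model. The names acc_manual and acc_caret are introduced here for illustration only.

```r
library(e1071)
library(caret)

data(iris)
set.seed(123)
indexes <- createDataPartition(iris$Species, p = 0.9, list = FALSE)
train <- iris[indexes, ]
test <- iris[-indexes, ]

model_svm <- svm(Species ~ ., data = train)
pred <- predict(model_svm, test)

# Cross-tabulate predicted against true classes (rows = predicted)
print(table(Predicted = pred, Actual = test$Species))

# Accuracy by hand: proportion of matching labels
acc_manual <- mean(pred == test$Species)

# The same figure as stored in the caret result
cm <- confusionMatrix(pred, test$Species)
acc_caret <- unname(cm$overall["Accuracy"])

print(c(manual = acc_manual, caret = acc_caret))
```

With only 15 test rows, a single misclassified flower changes the accuracy by about 0.067, which is why the 95% confidence interval in the output is so wide.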


The caret's train method

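In this method, we'll use caret's train() function and set its method argument to 'svmRadial', which fits an SVM with a radial basis function kernel. This method requires the 'kernlab' package to be installed.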

model = train(Species~., data=train, method="svmRadial")
print(model)
Support Vector Machines with Radial Basis Function Kernel 

135 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
Resampling results across tuning parameters:

  C     Accuracy   Kappa    
  0.25  0.9381422  0.9060675
  0.50  0.9415891  0.9112876
  1.00  0.9479245  0.9208977

Tuning parameter 'sigma' was held constant at a value of 0.5648255
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.5648255 and C = 1.
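
By default, train() evaluates three values of C with 25 bootstrap resamples, as the output shows. The sketch below swaps in 10-fold cross-validation and a wider range of C values through trainControl() and tuneLength. These settings, the centering and scaling step, and the names ctrl and model_cv are illustrative assumptions rather than part of the original example.

```r
library(caret)

data(iris)
set.seed(123)
indexes <- createDataPartition(iris$Species, p = 0.9, list = FALSE)
train <- iris[indexes, ]

# 10-fold cross-validation instead of the default bootstrap
ctrl <- trainControl(method = "cv", number = 10)

model_cv <- train(Species ~ ., data = train,
                  method = "svmRadial",          # needs the kernlab package
                  trControl = ctrl,
                  preProcess = c("center", "scale"),
                  tuneLength = 5)                # try 5 values of C

print(model_cv$bestTune)  # selected sigma and C
print(model_cv$results)   # accuracy and kappa for each candidate C
```

Cross-validation trains on fewer resamples than the default bootstrap, so it usually runs faster, though the accuracy estimates can differ slightly between the two schemes.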

pred = predict(model, test)
cm = confusionMatrix(pred, test$Species)
print(cm)
Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa          5          0         0
  versicolor      0          5         0
  virginica       0          0         5

Overall Statistics
                                    
               Accuracy : 1         
                 95% CI : (0.782, 1)
    No Information Rate : 0.3333    
    P-Value [Acc > NIR] : 6.969e-08 
                                    
                  Kappa : 1         
 Mcnemar's Test P-Value : NA        

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            1.0000           1.0000
Specificity                 1.0000            1.0000           1.0000
Pos Pred Value              1.0000            1.0000           1.0000
Neg Pred Value              1.0000            1.0000           1.0000
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3333           0.3333
Detection Prevalence        0.3333            0.3333           0.3333
Balanced Accuracy           1.0000            1.0000           1.0000

  In this tutorial, we've briefly learned how to classify data in R with SVM models, using both the 'e1071' package's svm() function and the caret package's train() function. The full source code is listed below.


Source code listing

library(e1071)
library(caret)
 
# Classification example
data(iris)
set.seed(123)

indexes = createDataPartition(iris$Species, p = .9, list = FALSE)
train = iris[indexes, ]
test = iris[-indexes, ]
 
model_svm = svm(Species~., data=train)
print(model_svm)
 
pred = predict(model_svm, test)

# accuracy check 
cm = confusionMatrix(pred, test$Species)
print(cm)  
 
# caret train method 
model = train(Species~., data=train, method="svmRadial")
print(model) 
 
pred = predict(model, test)
cm = confusionMatrix(pred, test$Species)
print(cm)  

