In this post, we'll briefly learn how to classify the Iris dataset with the 'neuralnet' package in R. The tutorial covers:
- Preparing the data
- Defining the model
- Prediction and accuracy check
- Source code listing
First, we'll load the required packages.
library(neuralnet)
library(caret)
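If the packages are not installed yet, they can be installed once with install.packages() (shown commented out below).
# One-time installation if the packages are not yet available.
# install.packages(c("neuralnet", "caret"))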
Preparing the data
We'll load the Iris dataset and check its contents.
data("iris")
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Next, we'll split the dataset into training and test parts.
set.seed(123)
indexes=createDataPartition(iris$Species, p=.85, list = F)
train = iris[indexes, ]
test = iris[-indexes, ]
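As an optional check, we can confirm the size and class balance of the two parts (the exact counts depend on the seed and the partition).
# Optional check of the split: row counts and class distribution.
nrow(train)
nrow(test)
table(train$Species)
table(test$Species)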
We'll extract the feature (x) and label (y) parts from the test data to compare with the predicted results later. Column 5 is Species, which is the label column.
xtest = test[, -5]
ytest = test[, 5]
Defining the model
We'll define the model with the 'neuralnet' function and fit it on the training data with the parameters below.
nnet=neuralnet(Species~., train, hidden = c(4,3), linear.output = FALSE)
The hidden parameter in the neuralnet function specifies the number of neurons in each hidden layer; here we use two hidden layers with 4 and 3 neurons. You can change these values, but training takes longer with a larger number of neurons.
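For example, here is a sketch of an alternative configuration with a single hidden layer and a higher iteration cap; the values 8 and 1e6 are arbitrary choices for experimentation, not tuned settings.
# Illustrative alternative: one hidden layer with 8 neurons and a
# larger stepmax so training may run for more steps before stopping.
nnet_alt = neuralnet(Species ~ ., train, hidden = 8,
                     stepmax = 1e6, linear.output = FALSE)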
We can plot the fitted model to check what is inside it.
plot(nnet)
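Besides the plot, the fitted object itself can be inspected; for example, the result matrix holds the final error, the number of steps, and the learned weights.
# Numeric summary of the fit: error, steps, and estimated weights.
nnet$result.matrix
# The raw weight matrices per layer are also stored in the object.
nnet$weights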
Prediction and accuracy check
Next, we'll predict the test data with the model.
ypred = neuralnet::compute(nnet, xtest)
yhat = ypred$net.result
print(yhat)
[,1] [,2] [,3]
3 1.000000e+00 1.345375e-14 1.346945e-64
17 1.000000e+00 1.342452e-14 1.349464e-64
25 1.000000e+00 1.345269e-14 1.347036e-64
32 1.000000e+00 1.342934e-14 1.349048e-64
36 1.000000e+00 1.345266e-14 1.347038e-64
69 3.233607e-23 1.000000e+00 9.862038e-11
70 2.575823e-17 1.000000e+00 1.687434e-20
71 2.171171e-35 4.022875e-05 9.998378e-01
82 2.499568e-17 1.000000e+00 1.040542e-20
96 2.635722e-17 1.000000e+00 4.848410e-20
101 1.622106e-55 8.186289e-23 1.000000e+00
107 5.660145e-48 5.500966e-14 1.000000e+00
122 1.924033e-53 6.920406e-20 1.000000e+00
125 1.017510e-54 2.793636e-21 1.000000e+00
129 2.989258e-55 5.776680e-22 1.000000e+00
The prediction result shows a probability for each class. We need to extract the class with the highest probability as the predicted label.
yhat=data.frame("yhat"=ifelse(max.col(yhat[ ,1:3])==1, "setosa",
ifelse(max.col(yhat[ ,1:3])==2, "versicolor", "virginica")))
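An equivalent, more compact extraction indexes the factor levels with max.col(); this assumes the probability columns follow the order of the Species factor levels, which should match how the response was encoded from the formula.
# Alternative extraction (assumes the columns of ypred$net.result
# follow the order of levels(train$Species)).
yhat_alt = levels(train$Species)[max.col(ypred$net.result)]
head(yhat_alt)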
Finally, we'll check the prediction accuracy with the confusionMatrix() function. Both arguments need to be factors with the same levels, so we convert the predicted labels to a factor as well.
cm = confusionMatrix(as.factor(ytest), as.factor(yhat$yhat))
print(cm)
Confusion Matrix and Statistics
Reference
Prediction setosa versicolor virginica
setosa 5 0 0
versicolor 0 4 1
virginica 0 0 5
Overall Statistics
Accuracy : 0.9333
95% CI : (0.6805, 0.9983)
No Information Rate : 0.4
P-Value [Acc > NIR] : 2.523e-05
Kappa : 0.9
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: setosa Class: versicolor Class: virginica
Sensitivity 1.0000 1.0000 0.8333
Specificity 1.0000 0.9091 1.0000
Pos Pred Value 1.0000 0.8000 1.0000
Neg Pred Value 1.0000 1.0000 0.9000
Prevalence 0.3333 0.2667 0.4000
Detection Rate 0.3333 0.2667 0.3333
Detection Prevalence 0.3333 0.3333 0.3333
Balanced Accuracy 1.0000 0.9545 0.9167
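As a quick sanity check, the overall accuracy can also be computed directly by comparing the predicted and true labels; it should match the Accuracy value reported above.
# Manual accuracy: proportion of correctly predicted test samples.
mean(as.character(yhat$yhat) == as.character(ytest))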
In this tutorial, we've briefly learned how to classify data with 'neuralnet' in R. The full source code is listed below.
Source code listing
library(neuralnet)
library(caret)

# Load and inspect the data
data("iris")
str(iris)

# Split into training and test parts
set.seed(123)
indexes = createDataPartition(iris$Species, p = .85, list = F)
train = iris[indexes, ]
test = iris[-indexes, ]
xtest = test[, -5]
ytest = test[, 5]

# Define and fit the model
nnet = neuralnet(Species~., train, hidden = c(4,3), linear.output = FALSE)
plot(nnet)

# Predict the test data
ypred = neuralnet::compute(nnet, xtest)
yhat = ypred$net.result
print(yhat)

# Extract the class with the highest probability
yhat = data.frame("yhat" = ifelse(max.col(yhat[ ,1:3]) == 1, "setosa",
                  ifelse(max.col(yhat[ ,1:3]) == 2, "versicolor", "virginica")))

# Check the accuracy with a confusion matrix
cm = confusionMatrix(as.factor(ytest), as.factor(yhat$yhat))
print(cm)
Comments
Q: I'm getting this error in the confusion matrix: Error: `data` and `reference` should be factors with the same levels.
A: Convert the predicted labels to a factor as well: cm = confusionMatrix(as.factor(ytest), as.factor(yhat$yhat))
Q: How do you know which probability column represents which class?
A: It depends on the order of the output variable in the formula; here the columns follow the order of the Species factor levels.
Q: How would we run 10-fold cross-validation with this model?
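A minimal sketch of one way to do this, assuming caret's createFolds() for the fold indices and the same model settings as above; the fold loop and averaging are illustrative, not a prescribed recipe.
# Illustrative 10-fold cross-validation: fit on 9 folds, score on the
# held-out fold, then average the fold accuracies.
set.seed(123)
folds = createFolds(iris$Species, k = 10)
cv_acc = sapply(folds, function(idx) {
  tr = iris[-idx, ]
  te = iris[idx, ]
  # If a fold fails to converge, stepmax may need to be increased.
  fit = neuralnet(Species ~ ., tr, hidden = c(4,3), linear.output = FALSE)
  prob = neuralnet::compute(fit, te[, -5])$net.result
  pred = levels(tr$Species)[max.col(prob)]
  mean(pred == as.character(te$Species))
})
mean(cv_acc)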
Q: How can we predict the classes of new observations that haven't been seen yet, i.e., a new dataset?
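A minimal sketch, assuming the new observations arrive as a data frame with the same four feature columns used for training; the values below are made-up measurements for illustration only.
# Hypothetical new observations with the same feature columns.
new_obs = data.frame(Sepal.Length = c(5.0, 6.3),
                     Sepal.Width  = c(3.4, 2.8),
                     Petal.Length = c(1.5, 5.1),
                     Petal.Width  = c(0.2, 1.9))
new_prob = neuralnet::compute(nnet, new_obs)$net.result
# Map the highest-probability column back to a class label
# (assumes the column order follows levels(train$Species)).
levels(train$Species)[max.col(new_prob)]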