LogitBoost increases the penalty for classification errors linearly rather than exponentially, which improves model accuracy and makes the model less vulnerable to noise in the data.
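To get an intuition for that claim, the small sketch below (not part of the caTools workflow, and the exact loss forms are my assumption) plots the exponential loss used by AdaBoost next to a logistic loss of the kind LogitBoost minimizes. The shapes show how the logistic penalty grows roughly linearly for badly misclassified points, while the exponential penalty blows up.

# rough illustration with assumed loss forms, for intuition only:
# the exponential loss explodes for large negative margins, while the
# logistic loss grows roughly linearly there
margin = seq(-4, 4, by = 0.1)
exp_loss = exp(-margin)
logistic_loss = log(1 + exp(-margin))
plot(margin, exp_loss, type = "l", col = "red", xlab = "margin", ylab = "loss")
lines(margin, logistic_loss, col = "blue")
legend("topright", legend = c("exponential", "logistic"), col = c("red", "blue"), lty = 1)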
In this post, we'll learn how to classify data with the LogitBoost function in R. The LogitBoost function is provided by the 'caTools' package. The tutorial covers:
- Preparing the data
- Fitting the model and prediction
- Source code listing
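If the 'caTools' and 'caret' packages are not installed yet, they can be installed from CRAN first (a one-time step). Then we load them for the session.

# install the required packages (only needed once)
install.packages("caTools")
install.packages("caret")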
library(caTools)
library(caret)
Preparing the data
We'll use the 'iris' dataset as the target classification dataset in this tutorial. First, we'll load it and split it into train and test parts.
data("iris") set.seed(123) indexes = createDataPartition(iris$Species, p = .9, list = F) train = iris[indexes, ] test = iris[-indexes, ]
Next, we'll separate the x input and y label parts of the train and test data. In the iris dataset, column 5 holds the y label (Species).
xtrain = train[, -5]
ytrain = train[, 5]
xtest = test[, -5]
ytest = test[, 5]
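A quick look at the separated parts (again just a sanity check, not required for the tutorial) shows four numeric feature columns and a three-level factor label.

str(xtrain)      # four numeric measurement columns
levels(ytrain)   # "setosa", "versicolor", "virginica"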
Fitting the model and prediction
Next, we'll define the model and fit it on the train data. Here, nIter defines the number of boosting iterations.
logBoost = LogitBoost(xtrain, ytrain, nIter=50)
print(logBoost)
You can check the fitted model by using the print command.
Now, we can predict the test data with the trained model.
yhat = predict(logBoost, xtest)
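If class probabilities are needed instead of hard labels, the predict method in 'caTools' also accepts a type argument; the sketch below assumes that type = "raw" returns a matrix of per-class probability scores, as described in the package documentation.

# predicted class probabilities (one column per class)
probs = predict(logBoost, xtest, type = "raw")
head(probs)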
Next, we'll check the prediction accuracy with the confusionMatrix function.
cm = confusionMatrix(yhat, ytest)
print(cm)
Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa          5          0         0
  versicolor      0          5         0
  virginica       0          0         5

Overall Statistics

               Accuracy : 1
                 95% CI : (0.782, 1)
    No Information Rate : 0.3333
    P-Value [Acc > NIR] : 6.969e-08

                  Kappa : 1
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            1.0000           1.0000
Specificity                 1.0000            1.0000           1.0000
Pos Pred Value              1.0000            1.0000           1.0000
Neg Pred Value              1.0000            1.0000           1.0000
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3333           0.3333
Detection Prevalence        0.3333            0.3333           0.3333
Balanced Accuracy           1.0000            1.0000           1.0000
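As a quick cross-check of the reported accuracy, the overall accuracy can also be computed directly by comparing the predicted and actual labels. Note that LogitBoost's predict can return NA for tied votes, so only non-missing predictions are counted here.

# manual accuracy check: proportion of correctly predicted test labels
mean(yhat == ytest, na.rm = TRUE)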
In this tutorial, we've briefly learned how to classify data with the LogitBoost function in R. The full source code is listed below.
Source code listing
library(caTools)
library(caret)
data("iris") set.seed(123) indexes = createDataPartition(iris$Species, p = .9, list = F) train = iris[indexes, ] test = iris[-indexes, ]
xtrain = train[, -5]
ytrain = train[, 5]
xtest = test[, -5]
ytest = test[, 5]
logBoost = LogitBoost(xtrain, ytrain, nIter=50)
print(logBoost)
yhat = predict(logBoost, xtest)
cm = confusionMatrix(yhat, ytest)
print(cm)
References for reading:
1. http://www.cis.upenn.edu/~mkearns/teaching/COLT/schapire.pdf
2. http://stat.ethz.ch/~dettling/boosting.html