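In this post, we'll learn a simple way to use bagging (bootstrap aggregating) with tree models for a classification problem in R: first with caret's 'bag' function, then with the 'treebag' method of caret's 'train' function. If you want to know more about any of these functions, read its help page and other resources.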
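We need the caret package and the iris dataset in this tutorial, so we'll start by loading them. The ctreeBag functions used with 'bag' rely on the party package, and the 'treebag' method relies on the ipred package, so both should be installed as well.
library(caret)
data(iris)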
Preparing data
Next, we'll split the iris dataset into training and test sets.
set.seed(12)
# Keep 90% of the rows (sampled within each species) for training
indexes <- createDataPartition(iris$Species, p = .9, list = FALSE)
train <- iris[indexes, ]
test <- iris[-indexes, ]
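Because createDataPartition samples within each level of Species, the split keeps the class proportions of the full data. A quick check of the class counts, sketched below, makes this visible: with 50 flowers per species and p = .9, each class should contribute 45 rows to the training set and 5 to the test set, which matches the 135 training samples and 15 test rows reported in this post. The names train_counts and test_counts are our own.

```r
library(caret)
data(iris)

set.seed(12)
# Stratified 90/10 split, as in the post
indexes <- createDataPartition(iris$Species, p = .9, list = FALSE)
train <- iris[indexes, ]
test <- iris[-indexes, ]

# Count observations per species in each part
train_counts <- table(train$Species)
test_counts <- table(test$Species)
print(train_counts)
print(test_counts)
```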
Caret 'bag' method
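The 'bag' function requires a bagControl object that tells it how to fit each model, how to predict with it, and how to aggregate the predictions of all models. Here we use caret's built-in ctreeBag functions, which fit conditional inference trees from the party package.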
# ctreeBag supplies the fit, predict, and aggregate functions for conditional inference trees
bagCtrl <- bagControl(fit = ctreeBag$fit,
                      predict = ctreeBag$pred,
                      aggregate = ctreeBag$aggregate)
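To see what the fit, predict, and aggregate steps amount to, here is a minimal hand-rolled bagging loop. It is only a sketch of the idea, not caret's internal code: it swaps in rpart classification trees (rpart ships with standard R installations) for ctree, uses a plain random split instead of createDataPartition, and aggregates by majority vote. The names n_bags, votes, and bag_pred are our own.

```r
library(rpart)
data(iris)

set.seed(12)
# Simple random 90/10 split (not stratified, unlike createDataPartition)
idx <- sample(nrow(iris), size = 135)
train <- iris[idx, ]
test <- iris[-idx, ]

n_bags <- 10  # number of bootstrap models, analogous to B in bag()

# fit + predict: one tree per bootstrap sample, class predictions on the test set
votes <- sapply(seq_len(n_bags), function(b) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  tree <- rpart(Species ~ ., data = boot, method = "class")
  as.character(predict(tree, newdata = test, type = "class"))
})

# aggregate: majority vote across the n_bags columns for each test row
majority <- apply(votes, 1, function(v) names(which.max(table(v))))
bag_pred <- factor(majority, levels = levels(iris$Species))

print(table(predicted = bag_pred, actual = test$Species))
```

Each column of votes holds one tree's predictions, so the vote is taken row by row; ties go to the first class in alphabetical order.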
Next, we fit a model with the 'bag' function and print it.
fit <- bag(Species ~ ., data = train, bagControl = bagCtrl)
print(fit)

Call:
bag.formula(formula = Species ~ ., data = train, bagControl = bagCtrl)

B: 10
Training data: 4 variables and 135 samples
All variables were used in each model
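The "B: 10" line is the number of bootstrap models, which defaults to 10. A sketch of raising it to 25 through bag's B argument follows; the choice of 25 and the names fit25 and pred25 are ours, and the code repeats the earlier steps so it runs on its own (caret and party must be installed).

```r
library(caret)
data(iris)

set.seed(12)
indexes <- createDataPartition(iris$Species, p = .9, list = FALSE)
train <- iris[indexes, ]
test <- iris[-indexes, ]

bagCtrl <- bagControl(fit = ctreeBag$fit,
                      predict = ctreeBag$pred,
                      aggregate = ctreeBag$aggregate)

# B sets how many bootstrap models are fitted and aggregated
fit25 <- bag(Species ~ ., data = train, B = 25, bagControl = bagCtrl)
print(fit25)

pred25 <- predict(fit25, test)
```

More models usually make the aggregated prediction more stable, at the cost of proportionally longer fitting time.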
Finally, we'll predict the test data and print the result.
pred <- predict(fit, test)
df <- data.frame(predicted = pred, actual = test$Species)
print(df)

    predicted     actual
1      setosa     setosa
2      setosa     setosa
3      setosa     setosa
4      setosa     setosa
5      setosa     setosa
6  versicolor versicolor
7  versicolor versicolor
8  versicolor versicolor
9  versicolor versicolor
10 versicolor versicolor
11  virginica  virginica
12  virginica  virginica
13  virginica  virginica
14  virginica  virginica
15 versicolor  virginica
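To condense the printed table into one number, we can compare the predictions with the true labels. The sketch below repeats the earlier steps so it runs on its own (caret and party must be installed); the name accuracy is our own. In the run shown in this post, 14 of the 15 test rows match, an accuracy of about 0.93, though exact results may differ across package versions.

```r
library(caret)
data(iris)

set.seed(12)
indexes <- createDataPartition(iris$Species, p = .9, list = FALSE)
train <- iris[indexes, ]
test <- iris[-indexes, ]

bagCtrl <- bagControl(fit = ctreeBag$fit,
                      predict = ctreeBag$pred,
                      aggregate = ctreeBag$aggregate)
fit <- bag(Species ~ ., data = train, bagControl = bagCtrl)

# Give the predictions the same factor levels as the true labels
pred <- factor(predict(fit, test), levels = levels(test$Species))

# Share of test rows classified correctly
accuracy <- mean(pred == test$Species)
print(accuracy)

# Cross-tabulate predicted vs. actual classes
print(table(predicted = pred, actual = test$Species))
```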
Caret 'train' method
The caret 'train' function takes a training control object that defines the resampling scheme. Here we use 5-fold cross-validation.
trCtrl <- trainControl(method = "cv", number = 5)
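With method = "cv" and number = 5, each model is fitted on four fifths of the training rows and evaluated on the remaining fifth. caret's createFolds builds such stratified folds: since each species has 45 training rows, every fold holds out 9 rows per species (27 in total) and fits on the other 108, which matches the "Summary of sample sizes" line in the 'train' output. The sketch below builds those folds explicitly; we assume 'train' creates its folds in the same way, and passing them through trainControl's index argument is shown only as an option for reusing identical folds across models. The name folds is our own.

```r
library(caret)
data(iris)

set.seed(12)
indexes <- createDataPartition(iris$Species, p = .9, list = FALSE)
train <- iris[indexes, ]

# Five stratified folds; returnTrain = TRUE returns the rows used for fitting in each fold
folds <- createFolds(train$Species, k = 5, returnTrain = TRUE)
print(lengths(folds))

# The same folds can be given to trainControl explicitly through its index argument
trCtrl <- trainControl(method = "cv", number = 5, index = folds)
```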
Building a model with the 'train' function
cr.fit <- train(Species ~ ., data = train, method = "treebag",
                trControl = trCtrl, metric = "Accuracy")
print(cr.fit)

Bagged CART

135 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 108, 108, 108, 108, 108
Resampling results:

  Accuracy   Kappa
  0.9407407  0.9111111
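The Accuracy and Kappa values above are averages over the five held-out folds. A sketch of looking at the per-fold numbers through the resample element of the fitted object follows; it repeats the earlier steps so it runs on its own (caret and ipred must be installed), and the name fold_means is our own.

```r
library(caret)
data(iris)

set.seed(12)
indexes <- createDataPartition(iris$Species, p = .9, list = FALSE)
train <- iris[indexes, ]

trCtrl <- trainControl(method = "cv", number = 5)
cr.fit <- train(Species ~ ., data = train, method = "treebag",
                trControl = trCtrl, metric = "Accuracy")

# One row per held-out fold with that fold's Accuracy and Kappa
print(cr.fit$resample)

# Averaging the fold results reproduces the summary printed by print(cr.fit)
fold_means <- colMeans(cr.fit$resample[, c("Accuracy", "Kappa")])
print(fold_means)
```

A large spread between folds would suggest that the single averaged number hides a fair amount of variability.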
Next, we predict the test data and print the result.
cr.pred <- predict(cr.fit, test)
cr.df <- data.frame(predicted = cr.pred, actual = test$Species)
print(cr.df)

    predicted     actual
1      setosa     setosa
2      setosa     setosa
3      setosa     setosa
4      setosa     setosa
5      setosa     setosa
6  versicolor versicolor
7  versicolor versicolor
8  versicolor versicolor
9  versicolor versicolor
10 versicolor versicolor
11  virginica  virginica
12  virginica  virginica
13  virginica  virginica
14  virginica  virginica
15 versicolor  virginica
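caret's confusionMatrix function summarizes test-set predictions in one call, giving the cross-tabulation together with overall statistics such as accuracy and kappa. The sketch below applies it to the 'treebag' predictions; it repeats the earlier steps so it runs on its own (caret and ipred must be installed), and the name cm is our own.

```r
library(caret)
data(iris)

set.seed(12)
indexes <- createDataPartition(iris$Species, p = .9, list = FALSE)
train <- iris[indexes, ]
test <- iris[-indexes, ]

trCtrl <- trainControl(method = "cv", number = 5)
cr.fit <- train(Species ~ ., data = train, method = "treebag",
                trControl = trCtrl, metric = "Accuracy")
cr.pred <- predict(cr.fit, test)

# Confusion matrix plus overall statistics on the test set
cm <- confusionMatrix(data = cr.pred, reference = test$Species)
print(cm$table)
print(cm$overall[c("Accuracy", "Kappa")])
```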
In this post, we have learned how to use caret's 'bag' function and the 'treebag' method of caret's 'train' function for a classification problem in R. I hope you have found this post useful.