In this post, we'll learn how to build a Learning Vector Quantization (LVQ) model and classify data in R. The 'class' library provides the functions required for this classification. It offers several LVQ training variants: lvq1(), olvq1(), lvq2(), and lvq3(). In this tutorial we use olvq1(), the optimized version of LVQ1.
To split the data and check the results, we use the 'caret' library.
library(class)
library(caret)
Preparing data
First, we'll create a simple dataset for this tutorial.
set.seed(123)
n = 10000
a = sample(1:10, n, replace = T)
b = sample(10:20, n, replace = T)
f = ifelse(a > 5 & b > 10, "red",
ifelse(a < 3 | b < 4, "yellow", "green"))
df = data.frame(a = a, b = b, flag = as.factor(f))
head(df)
a b flag
1 3 13 green
2 8 13 red
3 5 19 green
4 9 13 red
5 10 11 red
6 1 13 yellow
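Before splitting, it can help to look at how the three classes are distributed. The sketch below rebuilds the same data frame and tabulates the flag column. Because b is drawn from 10:20, the b < 4 condition never fires, so "yellow" comes only from a < 3. The names counts and shares are chosen here for illustration and are not part of the original code.

```r
# Rebuild the tutorial's data so this snippet runs on its own
set.seed(123)
n <- 10000
a <- sample(1:10, n, replace = TRUE)
b <- sample(10:20, n, replace = TRUE)
f <- ifelse(a > 5 & b > 10, "red",
            ifelse(a < 3 | b < 4, "yellow", "green"))
df <- data.frame(a = a, b = b, flag = as.factor(f))

# Absolute and relative class frequencies
counts <- table(df$flag)
shares <- round(prop.table(counts), 3)
print(counts)
print(shares)
```

The exact counts depend on the R version, since the default sampling algorithm behind sample() changed in R 3.6.0.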
Next, we'll split data into the train and test parts.
index = createDataPartition(df$flag, p = .8, list = F)
trainData = df[index, ]
testData = df[-index, ]
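When given a factor, createDataPartition() samples within each class, so the train and test parts keep similar class proportions. As a rough base-R illustration of that idea (not caret's actual implementation; stratified_index is a hypothetical helper, and the toy labels are made up for the demo), one could write:

```r
# Hypothetical helper: draw a fraction p of row indices within each class
stratified_index <- function(y, p) {
  groups <- split(seq_along(y), y)
  picked <- lapply(groups, function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(123)
# Toy labels with unequal class shares (an assumption for this demo)
y <- factor(sample(c("green", "red", "yellow"), 1000, replace = TRUE,
                   prob = c(0.35, 0.45, 0.20)))
idx <- stratified_index(y, p = 0.8)

# Class shares should be close in the full data and in both parts
print(round(prop.table(table(y)), 3))
print(round(prop.table(table(y[idx])), 3))
print(round(prop.table(table(y[-idx])), 3))
```

A plain sample() over all rows would usually give similar shares on data this large, but sampling per class keeps rare classes from being under-represented by chance.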
Convert train and test data into a matrix type and flag column to factor type.
train = data.matrix(trainData[, c("a","b")])
test = data.matrix(testData[, c("a","b")])
train_label = factor(trainData[, "flag"])
test_label = testData$flag
lvqinit() initializes an LVQ codebook. Here we set the size argument to 100, which is the approximate number of codebook vectors.
codeBook = lvqinit(train, train_label, size = 100)
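To see what lvqinit() actually returns, the codebook can be inspected directly. The sketch below is self-contained and uses a smaller sample (n = 1000, an assumption made here only to keep it quick) and the whole data set rather than the training split. The returned object is a list with components x (the codebook vectors) and cl (their classes).

```r
library(class)

set.seed(123)
n <- 1000
a <- sample(1:10, n, replace = TRUE)
b <- sample(10:20, n, replace = TRUE)
f <- ifelse(a > 5 & b > 10, "red",
            ifelse(a < 3 | b < 4, "yellow", "green"))
x <- cbind(a = a, b = b)
y <- factor(f)

codeBook <- lvqinit(x, y, size = 100)

# x holds the codebook vectors, cl holds the class of each vector
str(codeBook)
print(dim(codeBook$x))
# Vectors are allocated to classes roughly in proportion to class frequency
print(table(codeBook$cl))
```

Because the per-class allocation is rounded, the total number of vectors may differ from size by one or so.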
olvq1() trains the codebook, moving its vectors so that they better represent the training set.
buildCodeBook = olvq1(train, train_label, codeBook)
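To give a feel for what training does, here is a simplified sketch of a single LVQ1-style update in plain R: the nearest codebook vector is pulled toward a training point of the same class and pushed away from a point of a different class. This is an illustration only, not the 'class' package's implementation, and it leaves out the per-vector adaptive learning rate used by the optimized variant. The names lvq1_step, protos, and alpha are hypothetical.

```r
# Hypothetical helper: one LVQ1-style update for a single training point
lvq1_step <- function(protos, proto_cl, x, x_cl, alpha = 0.1) {
  # Squared Euclidean distance from x to every prototype (rows of protos)
  d2 <- rowSums(sweep(protos, 2, x)^2)
  k <- which.min(d2)
  # Move toward x if the classes agree, away from x otherwise
  direction <- if (proto_cl[k] == x_cl) 1 else -1
  protos[k, ] <- protos[k, ] + direction * alpha * (x - protos[k, ])
  protos
}

protos <- rbind(c(2, 15), c(8, 15))
proto_cl <- c("yellow", "red")

# Same class: the nearest prototype (row 2) moves toward the point
after_same <- lvq1_step(protos, proto_cl, x = c(9, 16), x_cl = "red")
# Different class: the nearest prototype (row 2) moves away from the point
after_diff <- lvq1_step(protos, proto_cl, x = c(9, 16), x_cl = "green")

print(after_same)
print(after_diff)
```

Only the nearest prototype changes in each step; all other codebook vectors stay where they are.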
lvqtest() classifies test data with the trained codebook.
Predicting test data
Next, we'll predict test data and check the classification result.
predict = lvqtest(buildCodeBook, test)
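A quick sanity check that needs only base R is to compute the accuracy by hand and cross-tabulate predictions against the true labels. The sketch below is self-contained and makes two simplifying assumptions that differ from the post: a smaller sample (n = 1000) and a plain random 80/20 split instead of createDataPartition(). The names in_train, cb, pred, and accuracy are chosen here.

```r
library(class)

set.seed(123)
n <- 1000
a <- sample(1:10, n, replace = TRUE)
b <- sample(10:20, n, replace = TRUE)
f <- ifelse(a > 5 & b > 10, "red",
            ifelse(a < 3 | b < 4, "yellow", "green"))
x <- cbind(a = a, b = b)
y <- factor(f)

# Simple random 80/20 split (the post itself uses caret's createDataPartition)
in_train <- sample(n, 0.8 * n)
cb <- lvqinit(x[in_train, ], y[in_train], size = 100)
cb <- olvq1(x[in_train, ], y[in_train], cb)

pred <- lvqtest(cb, x[-in_train, ])

# Share of test rows whose predicted class matches the true class
accuracy <- mean(as.character(pred) == as.character(y[-in_train]))
print(accuracy)

# Base-R cross-tabulation of predictions against true labels
print(table(predicted = pred, actual = y[-in_train]))
```

Naming the result pred rather than predict also avoids masking R's generic predict() function in the session.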
To check the results, we use a confusion matrix.
confusionMatrix(predict, test_label)
Confusion Matrix and Statistics

          Reference
Prediction green red yellow
    green    703   0      0
    red        0 896      0
    yellow     0   0    399

Overall Statistics

               Accuracy : 1
                 95% CI : (0.9982, 1)
    No Information Rate : 0.4484
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 1
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: green Class: red Class: yellow
Sensitivity                1.0000     1.0000        1.0000
Specificity                1.0000     1.0000        1.0000
Pos Pred Value             1.0000     1.0000        1.0000
Neg Pred Value             1.0000     1.0000        1.0000
Prevalence                 0.3519     0.4484        0.1997
Detection Rate             0.3519     0.4484        0.1997
Detection Prevalence       0.3519     0.4484        0.1997
Balanced Accuracy          1.0000     1.0000        1.0000
In this post, we've briefly learned how to classify data with LVQ in R. I hope you have found it useful!
The source code is listed below.
library(class)
library(caret)
set.seed(123)
n = 10000
a = sample(1:10, n, replace = T)
b = sample(10:20, n, replace = T)
f = ifelse(a > 5 & b > 10, "red", ifelse(a < 3 | b < 4, "yellow", "green"))
df = data.frame(a = a, b = b, flag = as.factor(f))
head(df)
index = createDataPartition(df$flag, p = .8, list = F)
trainData = df[index, ]
testData = df[-index, ]
train = data.matrix(trainData[, c("a","b")])
test = data.matrix(testData[, c("a","b")])
train_label = factor(trainData[, "flag"])
test_label = testData$flag
codeBook = lvqinit(train, train_label, size = 100)
buildCodeBook = olvq1(train, train_label, codeBook)
predict = lvqtest(buildCodeBook, test)
confusionMatrix(predict, test_label)