DataTechNotes: Classification with Learning Vector Quantization in R

   Learning Vector Quantization (LVQ) is a classification algorithm for binary and multiclass problems. An LVQ model learns a set of codebook vectors from the training data. Each codebook vector carries a class label and represents a region of that class in the feature space. During training, a sample is compared with its nearest codebook vector: if their classes match, the vector moves closer to the sample; if they do not match, it moves farther away. With the resulting codebook, the model classifies new data by assigning each observation the class of its nearest codebook vector.
   In this post, we'll learn how to build an LVQ model and classify data in R. The 'class' library provides the functions required for this classification. It offers several LVQ training variants: lvq1(), olvq1(), lvq2(), and lvq3(). In this tutorial we use olvq1(), the optimized version of LVQ1.
   To split the data and check the results, we use the 'caret' library.
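To make the "move closer or move away" idea concrete, here is a minimal base-R sketch of a single LVQ1-style update. The function lvq1_step(), the learning rate alpha = 0.1, and the two codebook vectors are hypothetical names and values chosen for illustration; this is not the code used inside the 'class' package.

```r
# One LVQ1-style update step (illustrative sketch, not the 'class' internals)
lvq1_step <- function(codebook, codebook_cl, x, x_cl, alpha = 0.1) {
  # Squared Euclidean distance from the sample x to every codebook vector
  d <- rowSums(sweep(codebook, 2, x)^2)
  k <- which.min(d)  # index of the nearest codebook vector
  # Move toward the sample if the classes match, away from it otherwise
  direction <- if (codebook_cl[k] == x_cl) 1 else -1
  codebook[k, ] <- codebook[k, ] + direction * alpha * (x - codebook[k, ])
  codebook
}

codebook    <- rbind(c(2, 12), c(8, 18))  # two hypothetical codebook vectors
codebook_cl <- c("yellow", "red")         # their class labels

# A "red" sample near the second vector pulls that vector toward it
updated_match <- lvq1_step(codebook, codebook_cl, x = c(9, 17), x_cl = "red")

# A "green" sample at the same spot pushes the same vector away
updated_mismatch <- lvq1_step(codebook, codebook_cl, x = c(9, 17), x_cl = "green")

updated_match[2, ]
updated_mismatch[2, ]
```

Only the nearest codebook vector changes in each step; all other vectors stay where they are.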

library(class)
library(caret)

Preparing data

First, we'll create a simple dataset for this tutorial.

set.seed(123)
n = 10000
a = sample(1:10, n, replace = T)
b = sample(10:20, n, replace = T)
f = ifelse(a > 5 & b > 10, "red", 
           ifelse(a < 3 | b < 4, "yellow", "green"))

df = data.frame(a = a, b = b, flag = as.factor(f))

head(df)
   a  b   flag
1  3 13  green
2  8 13    red
3  5 19  green
4  9 13    red
5 10 11    red
6  1 13 yellow

Next, we'll split the data into training and test parts.

index = createDataPartition(df$flag, p = .8, list = F)

trainData = df[index, ]
testData = df[-index, ]
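When the outcome is a factor, createDataPartition() samples within each class so that the class proportions stay similar in both parts. The base-R sketch below illustrates that idea on hypothetical labels. It is not caret's implementation, and the names labels and train_idx are assumptions made for this example.

```r
set.seed(123)
# Hypothetical class labels with unequal class shares
labels <- factor(sample(c("green", "red", "yellow"), 1000, replace = TRUE,
                        prob = c(0.35, 0.45, 0.20)))

# Draw 80% of the row indices within each class (a stratified split)
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}))

round(prop.table(table(labels)), 3)             # class shares in the full data
round(prop.table(table(labels[train_idx])), 3)  # class shares in the training part
```

The two tables should be nearly identical, which is what makes the later accuracy figures comparable between the training and test parts.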

Next, we convert the train and test features into matrices and keep the flag column as a factor for the class labels.

train = data.matrix(trainData[, c("a","b")])
test = data.matrix(testData[, c("a","b")])
 
train_label = factor(trainData[, "flag"])
test_label = testData$flag

Building a codebook

lvqinit() initializes an LVQ codebook. Here we set the size argument, the number of codebook vectors, to 100.

codeBook = lvqinit(train, train_label, size = 100)
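To see what a codebook looks like, we can inspect the object that lvqinit() returns: a list with the components x (the codebook vectors) and cl (their classes). The sketch below uses a small stand-in dataset rather than the tutorial's data. The names x, cl, and cb and the size of 20 are chosen only for this example.

```r
library(class)

set.seed(1)
# Small stand-in data: two numeric features and a two-level class label
x  <- cbind(a = sample(1:10, 500, replace = TRUE),
            b = sample(10:20, 500, replace = TRUE))
cl <- factor(ifelse(x[, "a"] > 5, "red", "green"))

cb <- lvqinit(x, cl, size = 20)

dim(cb$x)     # one row per codebook vector, one column per feature
table(cb$cl)  # number of codebook vectors assigned to each class
head(cb$x)    # initial vectors are rows drawn from the training data
```

The vectors are allocated to the classes roughly in proportion to the class frequencies, so the row count can differ slightly from size because of rounding.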

olvq1() trains the codebook on the training set, moving the codebook vectors so that they better represent each class.

buildCodeBook = olvq1(train, train_label, codeBook)
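olvq1() also accepts the arguments niter (number of training iterations) and alpha (learning rate), which we left at their defaults above. As a rough sketch on a small stand-in dataset, the following compares training-set accuracy before and after training. The data, the names cb_init and cb_trained, and the values niter = 2000 and alpha = 0.3 are assumptions chosen for illustration, not tuned settings.

```r
library(class)

set.seed(1)
# Small stand-in data: two numeric features and a two-level class label
x  <- cbind(a = sample(1:10, 500, replace = TRUE),
            b = sample(10:20, 500, replace = TRUE))
cl <- factor(ifelse(x[, "a"] > 5, "red", "green"))

cb_init    <- lvqinit(x, cl, size = 20)
cb_trained <- olvq1(x, cl, cb_init, niter = 2000, alpha = 0.3)

# Share of training rows classified correctly before and after training
acc_init    <- mean(lvqtest(cb_init, x) == cl)
acc_trained <- mean(lvqtest(cb_trained, x) == cl)
c(initial = acc_init, trained = acc_trained)
```

Training changes only the positions of the codebook vectors; their number and their class labels stay the same.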

lvqtest() classifies test data with the trained codebook.

Predicting test data

Next, we'll predict test data and check the classification result.

predict = lvqtest(buildCodeBook, test)

To check the results, we use a confusion matrix.

confusionMatrix(data = predict, reference = test_label)
Confusion Matrix and Statistics

          Reference
Prediction green red yellow
    green    703   0      0
    red        0 896      0
    yellow     0   0    399

Overall Statistics
                                     
               Accuracy : 1          
                 95% CI : (0.9982, 1)
    No Information Rate : 0.4484     
    P-Value [Acc > NIR] : < 2.2e-16  
                                     
                  Kappa : 1          
 Mcnemar's Test P-Value : NA         

Statistics by Class:

                     Class: green Class: red Class: yellow
Sensitivity                1.0000     1.0000        1.0000
Specificity                1.0000     1.0000        1.0000
Pos Pred Value             1.0000     1.0000        1.0000
Neg Pred Value             1.0000     1.0000        1.0000
Prevalence                 0.3519     0.4484        0.1997
Detection Rate             0.3519     0.4484        0.1997
Detection Prevalence       0.3519     0.4484        0.1997
Balanced Accuracy          1.0000     1.0000        1.0000
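The headline numbers in this report can be cross-checked with base R. The sketch below uses short hypothetical label vectors in place of test_label and the lvqtest() output, with one deliberate misclassification so that the figures are not all equal to 1.

```r
# Hypothetical labels standing in for the true and predicted classes
truth     <- factor(c("green", "red", "red", "yellow", "green", "red"))
predicted <- factor(c("green", "red", "green", "yellow", "green", "red"),
                    levels = levels(truth))

# Rows are predictions and columns are reference labels, as in the caret output
cm <- table(Prediction = predicted, Reference = truth)

accuracy        <- sum(diag(cm)) / sum(cm)   # share of correct predictions
recall_by_class <- diag(cm) / colSums(cm)    # sensitivity for each class

cm
accuracy
recall_by_class
```

Here five of the six predictions are correct, so the accuracy is 5/6, and only the "red" class has a sensitivity below 1.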

In this post, we've briefly learned how to classify data with LVQ in R. I hope you have found it useful!
The source code is listed below.


library(class)
library(caret)

set.seed(123)
n = 10000
a = sample(1:10, n, replace = T)
b = sample(10:20, n, replace = T)

f = ifelse(a > 5 & b > 10, "red", ifelse(a < 3 | b < 4, "yellow", "green"))

df = data.frame(a = a, b = b, flag = as.factor(f))

head(df)

index = createDataPartition(df$flag, p = .8, list = F)

trainData = df[index, ]
testData = df[-index, ]

train = data.matrix(trainData[, c("a","b")])
test = data.matrix(testData[, c("a","b")])

train_label = factor(trainData[, "flag"])
test_label = testData$flag

codeBook = lvqinit(train, train_label, size = 100)
buildCodeBook = olvq1(train, train_label, codeBook)

predict = lvqtest(buildCodeBook, test)

confusionMatrix(data = predict, reference = test_label)
