RadiusNeighborsClassifier implements radius-based neighbor classification: instead of using a fixed number of nearest neighbors, the label of a sample is determined by the training samples that fall within a fixed radius around it.
Nearest-neighbor classification is an instance-based learning method: rather than building an explicit model, the algorithm compares the test data with the training instances stored in memory.
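To make the idea concrete, here is a minimal sketch of radius-based voting in plain NumPy. It is only an illustration of the concept, not the scikit-learn implementation; the function name radius_vote and the toy data are made up for this example.
import numpy as np

def radius_vote(xtrain, ytrain, query, radius=1.0):
    # Collect training labels within the given radius of the query point
    dist = np.linalg.norm(xtrain - query, axis=1)
    labels = ytrain[dist <= radius]
    if labels.size == 0:
        raise ValueError("No training samples within the radius")
    # Majority vote among the neighbors inside the radius
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

xtrain = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
ytrain = np.array([0, 0, 1, 1])
print(radius_vote(xtrain, ytrain, np.array([0.1, 0.1])))   # -> 0
print(radius_vote(xtrain, ytrain, np.array([2.0, 2.1])))   # -> 1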
In this tutorial, we'll briefly learn how to classify data by using
Scikit-learn's RadiusNeighborsClassifier class in Python. The tutorial
covers:
- Preparing the data
- Training the model
- Predicting and accuracy check
- Iris dataset classification example
- Source code listing
from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score
Preparing the data
First, we'll generate a random classification dataset with the make_classification() function. The dataset contains 2 classes, 5 features, and 5000 samples.
x, y = make_classification(n_samples=5000, n_features=5,
                           n_classes=2, n_clusters_per_class=1)
Then, we'll extract 15 percent of the dataset as test data, while keeping all of x and y as training data. Here, we only take test samples from the dataset for prediction purposes. Because a truly unseen test set may contain samples with no training neighbor within the radius, we fit the model on the full dataset.
_, xtest, _, ytest=train_test_split(x, y, test_size=0.15)
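If you prefer a disjoint train/test split instead, one option is the outlier_label parameter, which assigns a fallback label to test samples that have no training neighbor inside the radius. The snippet below is only a sketch of that alternative (it assumes scikit-learn >= 0.22, where the 'most_frequent' option is available).
# Alternative sketch: disjoint split, labeling out-of-radius test
# samples with the most frequent training class
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
rnc_split = RadiusNeighborsClassifier(outlier_label="most_frequent")
rnc_split.fit(xtrain, ytrain)
print("Test score: ", rnc_split.score(xtest, ytest))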
Training the model
Next, we'll define the classifier by using the RadiusNeighborsClassifier class with its default parameters.
rnc = RadiusNeighborsClassifier()
print(rnc)
RadiusNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, outlier_label=None,
p=2, radius=1.0, weights='uniform')
Then, we'll fit it with the x and y data. After training the classifier, we'll check the model accuracy score.
rnc.fit(x, y)
score = rnc.score(x, y)
print("Training score: ", score)
Training score:  0.9606
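The training-set accuracy above is optimistic because the model has already seen every sample. Since cross_val_score is already imported, a quick sanity check with 5-fold cross-validation might look like the sketch below; outlier_label is set so that validation samples with no in-radius neighbor do not raise an error, and the exact scores will differ from the training score.
# 5-fold cross-validation gives a less optimistic accuracy estimate
cv_scores = cross_val_score(
    RadiusNeighborsClassifier(outlier_label="most_frequent"), x, y, cv=5)
print("CV accuracy: %.4f (+/- %.4f)" % (cv_scores.mean(), cv_scores.std()))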
Predicting and accuracy check
Now, we can predict the test data by using the trained model. After the prediction, we'll check the accuracy level by using the confusion_matrix() function.
ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
[[383 18]
[ 26 323]]
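The confusion matrix can also be visualized. The following sketch assumes matplotlib is installed and scikit-learn >= 1.0, where ConfusionMatrixDisplay.from_predictions is available.
# Optional: plot the confusion matrix
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_predictions(ytest, ypred)
plt.show()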
We can also create a classification report by using the classification_report() function on the predicted data to check the other accuracy metrics.
cr = classification_report(ytest, ypred)
print(cr)
              precision    recall  f1-score   support

           0       0.94      0.96      0.95       401
           1       0.95      0.93      0.94       349

    accuracy                           0.94       750
   macro avg       0.94      0.94      0.94       750
weighted avg       0.94      0.94      0.94       750
The Area Under the Curve (AUC) for the predicted data can be seen below.
auc_y = roc_auc_score(ytest, ypred)
print("ROC AUC y: %.4f" % auc_y)
ROC AUC y: 0.9403
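The AUC above is computed on hard class predictions; ranking-based AUC is usually computed from class probabilities instead. The following is a short sketch assuming a recent scikit-learn version in which RadiusNeighborsClassifier exposes predict_proba.
# AUC from predicted class probabilities rather than hard labels
yprob = rnc.predict_proba(xtest)[:, 1]
auc_prob = roc_auc_score(ytest, yprob)
print("ROC AUC (probabilities): %.4f" % auc_prob)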
Iris dataset classification example
In this part of the tutorial, we'll apply the same method to classify the Iris dataset. First, we'll load the Iris dataset with the load_iris() function, extract the x and y parts, and then get the test data to predict.
print("Iris dataset classification")
iris = load_iris()
x, y = iris.data, iris.target
_, xtest, _, ytest=train_test_split(x, y, test_size=0.15)
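Before fitting, note that the radius is measured in the units of the input features, so feature scaling can matter for radius-based models. The snippet below is only a sketch (not part of the original tutorial flow) of combining StandardScaler with the classifier in a pipeline; after scaling, the radius value would likely need re-tuning.
# Sketch: scale features so the fixed radius is measured on
# comparable feature scales
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), RadiusNeighborsClassifier())
pipe.fit(x, y)
print("Pipeline score: ", pipe.score(x, y))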
Then, we'll fit the classifier, predict the test data, and check the accuracy.
rnc = RadiusNeighborsClassifier()
print(rnc)
rnc.fit(x, y)
score = rnc.score(x, y)
print("Score: ", score)
ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
cr = classification_report(ytest, ypred)
print(cr)
Iris dataset classification
RadiusNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, outlier_label=None,
p=2, radius=1.0, weights='uniform')
Score: 0.9733333333333334
[[11 0 0]
[ 0 5 0]
[ 0 1 6]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       0.83      1.00      0.91         5
           2       1.00      0.86      0.92         7

    accuracy                           0.96        23
   macro avg       0.94      0.95      0.94        23
weighted avg       0.96      0.96      0.96        23
In this tutorial, we've briefly learned how to classify data by using
Scikit-learn's RadiusNeighborsClassifier class in Python. The full source code is listed below.
Source code listing
from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score
x, y = make_classification(n_samples=5000, n_features=5,
                           n_classes=2, n_clusters_per_class=1)
_, xtest, _, ytest=train_test_split(x, y, test_size=0.15)
rnc = RadiusNeighborsClassifier()
print(rnc)
rnc.fit(x, y)
score = rnc.score(x, y)
print("Training score: ", score)
ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
cr = classification_report(ytest, ypred)
print(cr)
auc_y = roc_auc_score(ytest, ypred)
print("ROC AUC y: %.4f" % auc_y)
# Iris dataset example
print("Iris dataset classification")
iris = load_iris()
x, y = iris.data, iris.target
_, xtest, _, ytest = train_test_split(x, y, test_size=0.15)
rnc = RadiusNeighborsClassifier()
print(rnc)
rnc.fit(x, y)
score = rnc.score(x, y)
print("Score: ", score)
ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
cr = classification_report(ytest, ypred)
print(cr)