RadiusNeighborsClassifier implements radius-based neighbor classification: instead of using a fixed number of nearest neighbors, the label of a sample is determined by the training samples that fall within a fixed radius around it.
Nearest-neighbor classification is an instance-based learning method: rather than building an explicit model, the algorithm compares the test data with the training instances stored in memory.
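To make the idea concrete, here is a minimal sketch of radius-based voting in plain NumPy. It is only an illustration of the concept, not the scikit-learn implementation; the function name radius_vote and the toy data are made up for this example.
import numpy as np

def radius_vote(xtrain, ytrain, query, radius=1.0):
    # Collect training labels within the given radius of the query point
    dist = np.linalg.norm(xtrain - query, axis=1)
    labels = ytrain[dist <= radius]
    if labels.size == 0:
        raise ValueError("No training samples within the radius")
    # Majority vote among the neighbors inside the radius
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

xtrain = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
ytrain = np.array([0, 0, 1, 1])
print(radius_vote(xtrain, ytrain, np.array([0.1, 0.1])))   # -> 0
print(radius_vote(xtrain, ytrain, np.array([2.0, 2.1])))   # -> 1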
In this tutorial, we'll briefly learn how to classify data by using
Scikit-learn's RadiusNeighborsClassifier class in Python. The tutorial
covers:
- Preparing the data
- Training the model
- Predicting and accuracy check
- Iris dataset classification example
- Source code listing
from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score
Preparing the data
First, we'll generate a random classification dataset with the make_classification() function. The dataset contains 2 classes, 5 features, and 5000 samples.
x, y = make_classification(n_samples=5000, n_features=5,
                           n_classes=2, n_clusters_per_class=1)
Then, we'll extract 15 percent of the dataset as test data, while keeping all of x and y as training data. Here, we only take test samples from the dataset for prediction purposes. Because a truly unseen test set may contain samples with no training neighbor within the radius, we fit the model on the full dataset.
_, xtest, _, ytest=train_test_split(x, y, test_size=0.15)
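If you prefer a disjoint train/test split instead, one option is the outlier_label parameter, which assigns a fallback label to test samples that have no training neighbor inside the radius. The snippet below is only a sketch of that alternative (it assumes scikit-learn >= 0.22, where the 'most_frequent' option is available).
# Alternative sketch: disjoint split, labeling out-of-radius test
# samples with the most frequent training class
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
rnc_split = RadiusNeighborsClassifier(outlier_label="most_frequent")
rnc_split.fit(xtrain, ytrain)
print("Test score: ", rnc_split.score(xtest, ytest))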
Training the model
Next, we'll define the classifier by using the RadiusNeighborsClassifier class with its default parameters.
rnc = RadiusNeighborsClassifier()
print(rnc)
RadiusNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, outlier_label=None,
p=2, radius=1.0, weights='uniform')
Then, we'll fit it with the x and y data. After training the classifier, we'll check the model accuracy score.
rnc.fit(x, y)
score = rnc.score(x, y)
print("Training score: ", score)
Training score:  0.9606
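The training-set accuracy above is optimistic because the model has already seen every sample. Since cross_val_score is already imported, a quick sanity check with 5-fold cross-validation might look like the sketch below; outlier_label is set so that validation samples with no in-radius neighbor do not raise an error, and the exact scores will differ from the training score.
# 5-fold cross-validation gives a less optimistic accuracy estimate
cv_scores = cross_val_score(
    RadiusNeighborsClassifier(outlier_label="most_frequent"), x, y, cv=5)
print("CV accuracy: %.4f (+/- %.4f)" % (cv_scores.mean(), cv_scores.std()))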
Predicting and accuracy check
Now, we can predict the test data by using the trained model. After the prediction, we'll check the accuracy level by using the confusion_matrix() function.
ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
[[383 18]
[ 26 323]]
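The confusion matrix can also be visualized. The following sketch assumes matplotlib is installed and scikit-learn >= 1.0, where ConfusionMatrixDisplay.from_predictions is available.
# Optional: plot the confusion matrix
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_predictions(ytest, ypred)
plt.show()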
We can also create a classification report by using the classification_report() function on the predicted data to check the other accuracy metrics.
cr = classification_report(ytest, ypred)
print(cr)
              precision    recall  f1-score   support

           0       0.94      0.96      0.95       401
           1       0.95      0.93      0.94       349

    accuracy                           0.94       750
   macro avg       0.94      0.94      0.94       750
weighted avg       0.94      0.94      0.94       750
The Area Under the Curve (AUC) for the predicted data can be seen below.
auc_y = roc_auc_score(ytest, ypred)
print("ROC AUC y: %.4f" % auc_y)
ROC AUC y: 0.9403
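The AUC above is computed on hard class predictions; ranking-based AUC is usually computed from class probabilities instead. The following is a short sketch assuming a recent scikit-learn version in which RadiusNeighborsClassifier exposes predict_proba.
# AUC from predicted class probabilities rather than hard labels
yprob = rnc.predict_proba(xtest)[:, 1]
auc_prob = roc_auc_score(ytest, yprob)
print("ROC AUC (probabilities): %.4f" % auc_prob)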
Iris dataset classification example
In this part of the tutorial, we'll apply the same method to classify the Iris dataset. First, we'll load the Iris dataset with the load_iris() function, extract the x and y parts, and then get the test data to predict.
print("Iris dataset classification")
iris = load_iris()
x, y = iris.data, iris.target
_, xtest, _, ytest=train_test_split(x, y, test_size=0.15)
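Before fitting, note that the radius is measured in the units of the input features, so feature scaling can matter for radius-based models. The snippet below is only a sketch (not part of the original tutorial flow) of combining StandardScaler with the classifier in a pipeline; after scaling, the radius value would likely need re-tuning.
# Sketch: scale features so the fixed radius is measured on
# comparable feature scales
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), RadiusNeighborsClassifier())
pipe.fit(x, y)
print("Pipeline score: ", pipe.score(x, y))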
Then, we'll fit the classifier, predict the test data, and check the accuracy.
rnc = RadiusNeighborsClassifier()
print(rnc)
rnc.fit(x, y)
score = rnc.score(x, y)
print("Score: ", score)
ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
cr = classification_report(ytest, ypred)
print(cr)
Iris dataset classification
RadiusNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, outlier_label=None,
p=2, radius=1.0, weights='uniform')
Score: 0.9733333333333334
[[11 0 0]
[ 0 5 0]
[ 0 1 6]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       0.83      1.00      0.91         5
           2       1.00      0.86      0.92         7

    accuracy                           0.96        23
   macro avg       0.94      0.95      0.94        23
weighted avg       0.96      0.96      0.96        23
In this tutorial, we've briefly learned how to classify data by using
Scikit-learn's RadiusNeighborsClassifier class in Python. The full source code is listed below.
Source code listing
from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score
x, y = make_classification(n_samples=5000, n_features=5,
                           n_classes=2, n_clusters_per_class=1)
_, xtest, _, ytest=train_test_split(x, y, test_size=0.15)
rnc = RadiusNeighborsClassifier()
print(rnc)
rnc.fit(x, y)
score = rnc.score(x, y)
print("Training score: ", score)
ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
cr = classification_report(ytest, ypred)
print(cr)
auc_y = roc_auc_score(ytest, ypred)
print("ROC AUC y: %.4f" % auc_y)
# Iris dataset example
print("Iris dataset classification")
iris = load_iris()
x, y = iris.data, iris.target
_, xtest, _, ytest = train_test_split(x, y, test_size=0.15)
rnc = RadiusNeighborsClassifier()
print(rnc)
rnc.fit(x, y)
score = rnc.score(x, y)
print("Score: ", score)
ypred = rnc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)
cr = classification_report(ytest, ypred)
print(cr)