DataTechNotes: Multi-output Classification Example with MultiOutputClassifier in Python

Scikit-learn API provides a MulitOutputClassifier class that helps to classify multi-output data. In this tutorial, we'll learn how to classify multi-output (multi-label) data with this method in Python. Multi-output data contains more than one y label data for a given X input data. The tutorial covers:

Preparing the data
Defining the model
Predicting and accuracy check
Source code listing

We'll start by loading the required libraries for this tutorial.

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_auc_score
from sklearn.metrics import classification_report
from sklearn.datasets import make_multilabel_classification
from sklearn.svm import SVC
from sklearn.multioutput import MultiOutputClassifier

Preparing the data

We can generate a multi-output data with a make_multilabel_classification function. The target dataset contains 10 features (x), 2 classes (y), and 5000 samples. We'll define them in the parameters of the function.

x, y = make_multilabel_classification(n_samples=5000, n_features=10,

                                      n_classes=2, random_state=0)

The generated data looks as below. There are 10 features and 2 labels in this dataset.

for i in range(10): 
 print(x[i]," => ", y[i])

[ 5. 11.  8.  7.  7.  9.  0.  8.  5.  5.]  =>  [1 1]
[1. 2. 6. 1. 6. 8. 1. 9. 3. 8.]  =>  [0 1]
[8. 3. 7. 6. 4. 7. 0. 4. 7. 6.]  =>  [1 1]
[3. 4. 9. 4. 3. 7. 0. 2. 7. 8.]  =>  [1 1]
[ 8.  7. 10.  8.  7.  4.  1.  4. 10.  9.]  =>  [1 1]
[ 6.  5. 10.  5.  5.  3.  7.  6.  1.  9.]  =>  [0 0]
[ 7.  4. 13.  6.  5.  4.  1.  4.  5. 10.]  =>  [1 1]
[ 5.  2.  3. 14. 10.  4.  2.  0.  6. 12.]  =>  [1 0]
[10.  3.  1.  5.  7.  9.  3.  3.  4.  3.]  =>  [0 0]
[ 5.  4.  9.  5.  8. 10.  0.  8.  3.  9.]  =>  [0 1]

Next, we'll split the data into the train and test parts.

xtrain, xtest, ytrain, ytest=train_test_split(x, y, train_size=0.95, random_state=0)
print(len(xtest))

Defining the model

We'll define the model with the MultiOutputClassifier class of sklearn. As an estimator, we'll implement Support Vector Classifier, SVM with gamma='scale' parameter and then we'll include the estimator into the MultiOutputClassifier class.

svc = SVC(gamma="scale")
model = MultiOutputClassifier(estimator=svc)

We can check the parameters of the model by the print command.

print(model)

MultiOutputClassifier(estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                                    class_weight=None, coef0=0.0,
                                    decision_function_shape='ovr', degree=3,
                                    gamma='scale', kernel='rbf', max_iter=-1,
                                    probability=False, random_state=None,
                                    shrinking=True, tol=0.001, verbose=False),
                      n_jobs=None)

We'll fit the model with training data and check the training accuracy.

model.fit(xtrain, ytrain)
print(model.score(xtrain, ytrain))

0.8688421052631579

Predicting and accuracy check

We'll predict the test data.

yhat = model.predict(xtest)

We'll check the numbers of accuracy metrics for this prediction. Remember, we have two output labels in the ytest and the yhat data, thus we need to use them accordingly.
First, we'll check the area under the ROC with the roc_auc_score function.

auc_y1 = roc_auc_score(ytest[:,0],yhat[:,0])
auc_y2 = roc_auc_score(ytest[:,1],yhat[:,1])

print("ROC AUC y1: %.4f, y2: %.4f" % (auc_y1, auc_y2))

ROC AUC y1: 0.9206, y2: 0.9202

The second method is to check the confusion matrics.

cm_y1 = confusion_matrix(ytest[:,0],yhat[:,0])
cm_y2 = confusion_matrix(ytest[:,1],yhat[:,1])

print(cm_y1)

[[ 80   8]
 [ 11 151]]

print(cm_y2)

[[ 77   9]
 [  9 155]]

Finally, we'll check the classification report with the classification_report function.

cr_y1 = classification_report(ytest[:,0],yhat[:,0])
cr_y1 = classification_report(ytest[:,0],yhat[:,0])

print(cr_y1)

               precision    recall  f1-score   support

           0       0.88      0.91      0.89        88
           1       0.95      0.93      0.94       162

    accuracy                           0.92       250
   macro avg       0.91      0.92      0.92       250
weighted avg       0.92      0.92      0.92       250

print(cr_y2)

               precision    recall  f1-score   support

           0       0.88      0.91      0.89        88
           1       0.95      0.93      0.94       162

    accuracy                           0.92       250
   macro avg       0.91      0.92      0.92       250
weighted avg       0.92      0.92      0.92       250

In this tutorial, we've briefly learned how to classify multi-output data with MultiOutputClassifier in Python.

Source code listing

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_auc_score
from sklearn.metrics import classification_report
from sklearn.datasets import make_multilabel_classification
from sklearn.svm import SVC
from sklearn.multioutput import MultiOutputClassifier

x, y = make_multilabel_classification(n_samples=5000, n_features=10,
                                      n_classes=2, random_state=0)

for i in range(10): 
 print(x[i]," => ", y[i])

xtrain, xtest, ytrain, ytest=train_test_split(x, y, train_size=0.95, random_state=0)
print(len(xtest))

svc = SVC(gamma="scale")
model = MultiOutputClassifier(estimator=svc)
print(model)

model.fit(xtrain, ytrain)
print(model.score(xtrain, ytrain))

yhat = model.predict(xtest)
auc_y1 = roc_auc_score(ytest[:,0],yhat[:,0])
auc_y2 = roc_auc_score(ytest[:,1],yhat[:,1])

print("ROC AUC y1: %.4f, y2: %.4f" % (auc_y1, auc_y2))

cm_y1 = confusion_matrix(ytest[:,0],yhat[:,0])
cm_y2 = confusion_matrix(ytest[:,1],yhat[:,1])

print(cm_y1)
print(cm_y2)

cr_y1 = classification_report(ytest[:,0],yhat[:,0])
cr_y2 = classification_report(ytest[:,1],yhat[:,1])

print(cr_y1)
print(cr_y2)

Reference:

Scikit-learn API

4 comments:

Saurabh AgarwalJuly 13, 2020 at 5:11 AM
A simple and useful article
spacedustpiOctober 9, 2020 at 7:22 AM
Thanks for a to-the-point tutorial. I think you have a typo: "cr_y1 = classification_report(ytest[:,0],yhat[:,0])" is typed twice. I think you meant "cr_y1 = classification_report(ytest[:,0],yhat[:,0])" for the second one.
AnonymousSeptember 4, 2024 at 4:05 AM
hello, how about shap analysis under this multioutput classification?

Pages

Multi-output Classification Example with MultiOutputClassifier in Python

4 comments: