Support Vector Machines (SVM) is a widely used supervised learning method that can be applied to regression, classification, and anomaly detection problems. The SVM-based classifier is called SVC (Support Vector Classifier), and we can use it for classification problems. It uses the regularization parameter C to balance a wide margin around the separating hyperplane against the number of misclassified training samples, which is why it is also called C-SVC.
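To make the role of C concrete, here is a minimal sketch on an assumed two-class toy dataset (make_blobs, which this tutorial does not otherwise use). With a linear kernel the hyperplane's weight vector w is available as coef_, and the margin width is 2/||w||. A small C tolerates more margin violations and typically gives a wider margin with more support vectors. A large C penalizes violations heavily and narrows the margin.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumed toy data: two Gaussian blobs (not part of the original tutorial).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)

margins = {}
for C in [0.01, 1.0, 100.0]:
    # A linear kernel exposes the hyperplane's weight vector w via coef_.
    model = SVC(kernel="linear", C=C).fit(X, y)
    w = model.coef_[0]
    # Distance between the two margin boundaries w.x + b = +1 and w.x + b = -1.
    margins[C] = 2.0 / np.linalg.norm(w)
    print(f"C={C:>6}: margin width={margins[C]:.3f}, "
          f"support vectors={model.n_support_.sum()}")
```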
In this tutorial, we'll briefly learn how to classify data by using Scikit-learn's SVC class in Python. The tutorial covers:
- Preparing the data
- Training the model
- Predicting and accuracy check
- Iris dataset classification example
- Source code listing
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
Preparing the data
First, we'll generate a random classification dataset with the make_classification() function. The dataset contains 5000 samples with 10 features and 3 classes.
x, y = make_classification(n_samples=5000, n_features=10, n_classes=3, n_clusters_per_class=1)
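A quick sanity check of the generated data can be sketched as follows. The random_state argument is an addition not in the original call; it only makes the run repeatable. Note that make_classification assigns a random label to about 1% of samples by default (flip_y=0.01), so a perfect score should not be expected.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same call as above, plus random_state (an addition) for a repeatable run.
x, y = make_classification(n_samples=5000, n_features=10, n_classes=3,
                           n_clusters_per_class=1, random_state=1)

print(x.shape)          # (5000, 10): samples x features
print(np.bincount(y))   # number of samples per class label 0, 1, 2
```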
Then, we'll split the data into train and test parts. Here, we'll extract 15 percent of it as test data.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
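With 5000 samples and test_size=0.15, the split holds out 750 test samples and keeps 4250 for training. The sketch below also passes two optional arguments that the original call omits: stratify=y, which keeps class proportions the same in both parts, and random_state, which fixes the shuffle.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

x, y = make_classification(n_samples=5000, n_features=10, n_classes=3,
                           n_clusters_per_class=1, random_state=1)

# stratify=y and random_state are additions to the tutorial's call.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, stratify=y, random_state=1)

print(len(xtrain), len(xtest))   # 4250 750
```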
Training the model
Next, we'll define the classifier by using the SVC class. Here we keep the default parameters; they can be changed to suit the classification data.
svc = SVC()
print(svc)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)

Recent scikit-learn versions print only parameters that differ from the defaults, so the output there is simply SVC().
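One way to pick non-default parameters for a given dataset is an exhaustive search with cross-validation. The sketch below swaps in GridSearchCV, which the original tutorial does not use, to try combinations of C and gamma with the RBF kernel. The grid values, the smaller 1000-sample dataset (chosen to keep the search quick), and random_state are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Smaller dataset than the tutorial's (assumption) so the search runs quickly.
x, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                           n_clusters_per_class=1, random_state=1)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15, random_state=1)

# Illustrative candidate values; every combination is scored with 5-fold CV.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(xtrain, ytrain)

print("Best parameters:", search.best_params_)
print("Best CV score: %.2f" % search.best_score_)
# best_estimator_ is refit on the whole training part with the best parameters.
print("Test score: %.2f" % search.best_estimator_.score(xtest, ytest))
```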
Then, we'll fit the model on the training data and check its accuracy score on that same data.
svc.fit(xtrain, ytrain)
score = svc.score(xtrain, ytrain)
print("Score: ", score)

Score: 0.9312941176470588
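For classifiers, score() returns the mean accuracy, which is the same value accuracy_score() computes from the predict() output. Because the score above is measured on the training data, it tends to be optimistic. The sketch below, assuming an added random_state for repeatability, prints both the training and the held-out test accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

x, y = make_classification(n_samples=5000, n_features=10, n_classes=3,
                           n_clusters_per_class=1, random_state=1)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15, random_state=1)

svc = SVC().fit(xtrain, ytrain)

# score() is mean accuracy on the given data.
train_acc = svc.score(xtrain, ytrain)
print("Same as accuracy_score:", train_acc == accuracy_score(ytrain, svc.predict(xtrain)))

# Accuracy on held-out data is the fairer estimate for unseen samples.
test_acc = svc.score(xtest, ytest)
print("Train: %.3f  Test: %.3f" % (train_acc, test_acc))
```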
We can also evaluate the model with 10-fold cross-validation on the training data and check the average score.
cv_scores = cross_val_score(svc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

CV average score: 0.92
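For a classifier, cv=10 means stratified 10-fold splitting without shuffling: each fold is scored by a fresh copy of the model trained on the other nine folds. The sketch below reproduces that loop by hand and compares it with cross_val_score(). The 1000-sample dataset and random_state are assumptions to keep the run short and repeatable.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

x, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                           n_clusters_per_class=1, random_state=1)

svc = SVC()
skf = StratifiedKFold(n_splits=10)   # the splitter cv=10 uses for a classifier

# Manual version: train an unfitted copy on 9 folds, score it on the 10th.
manual_scores = []
for train_idx, val_idx in skf.split(x, y):
    model = clone(svc).fit(x[train_idx], y[train_idx])
    manual_scores.append(model.score(x[val_idx], y[val_idx]))
manual_scores = np.array(manual_scores)

cv_scores = cross_val_score(svc, x, y, cv=skf)
print("Manual mean: %.3f  cross_val_score mean: %.3f"
      % (manual_scores.mean(), cv_scores.mean()))
```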
Predicting and accuracy check
Now, we can predict the test data by using the trained model. After the prediction, we'll check the results with the confusion_matrix() function.
ypred = svc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)

[[247   7   1]
 [ 47 191   3]
 [  3   3 248]]
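In scikit-learn's confusion matrix, entry [i, j] counts samples whose true class is i and whose predicted class is j, so in the matrix above 47 samples of class 1 were predicted as class 0. The sketch below builds the same layout by hand on tiny hypothetical label arrays and compares it with confusion_matrix().

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, only to illustrate the row/column layout.
ytrue = np.array([0, 0, 1, 1, 1, 2, 2])
ypred = np.array([0, 1, 1, 1, 0, 2, 2])

# Rows index the true class, columns index the predicted class.
manual = np.zeros((3, 3), dtype=int)
for t, p in zip(ytrue, ypred):
    manual[t, p] += 1

print(manual)
print(confusion_matrix(ytrue, ypred))
```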
We can also create a classification report with the classification_report() function to check other metrics, such as precision, recall, and F1-score, for each class.
cr = classification_report(ytest, ypred)
print(cr)

              precision    recall  f1-score   support

           0       0.83      0.97      0.89       255
           1       0.95      0.79      0.86       241
           2       0.98      0.98      0.98       254

    accuracy                           0.91       750
   macro avg       0.92      0.91      0.91       750
weighted avg       0.92      0.91      0.91       750
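The per-class numbers in the report follow directly from the confusion matrix printed above: precision divides each diagonal entry by its column sum (all predictions of that class), recall divides it by its row sum (all true samples of that class), and F1 is their harmonic mean. The sketch below recomputes them from those counts.

```python
import numpy as np

# Confusion matrix from the run above (rows: true class, columns: predicted class).
cm = np.array([[247, 7, 1],
               [47, 191, 3],
               [3, 3, 248]])

tp = np.diag(cm)
precision = tp / cm.sum(axis=0)   # correct / all predicted as that class
recall = tp / cm.sum(axis=1)      # correct / all truly in that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

print("precision:", np.round(precision, 2))   # [0.83 0.95 0.98]
print("recall:   ", np.round(recall, 2))      # [0.97 0.79 0.98]
print("f1-score: ", np.round(f1, 2))          # [0.89 0.86 0.98]
print("accuracy: %.2f" % accuracy)            # 0.91
print("macro avg precision: %.2f" % precision.mean())   # 0.92
```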
Iris dataset classification example
We'll load the Iris dataset with the load_iris() function, extract the x and y parts, and split them into train and test parts.
print("Iris dataset classification with SVC")
iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
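Iris has only 150 samples, so test_size=0.15 leaves 23 test samples, and an unstratified split can be uneven: in the run shown below the test set held 9, 10, and 4 samples of classes 0, 1, and 2. A sketch with stratify=y and random_state (both additions to the tutorial's call) keeps the class counts within one of each other.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x, y = iris.data, iris.target

# stratify=y (an addition) preserves the equal class proportions of Iris.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, stratify=y, random_state=1)

print(len(xtest), np.bincount(ytest, minlength=3))
```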
Then, we'll apply the same steps as above.
svc = SVC()
print(svc)
svc.fit(xtrain, ytrain)
score = svc.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(svc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

ypred = svc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)
Iris dataset classification with SVC
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)
Score: 0.9921259842519685
CV average score: 0.98
[[9 0 0]
 [0 8 2]
 [0 1 3]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         9
           1       0.89      0.80      0.84        10
           2       0.60      0.75      0.67         4

    accuracy                           0.87        23
   macro avg       0.83      0.85      0.84        23
weighted avg       0.88      0.87      0.87        23
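With only 23 test samples, each misclassification moves the test accuracy by about 4 percentage points (1/23 is roughly 0.043), so the 0.87 above is a noisy estimate. A steadier check is cross-validation on all 150 samples. The sketch below also adds feature standardization through a pipeline, which the original tutorial does not do; RBF-kernel SVMs are sensitive to feature scale, though Iris features are on similar scales, so any gain may be small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
x, y = iris.data, iris.target

# StandardScaler is an addition; inside the pipeline it is fit on each
# fold's training part only, so no information leaks from the held-out fold.
model = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(model, x, y, cv=10)

print("CV scores:", scores.round(2))
print("Mean: %.2f  Std: %.2f" % (scores.mean(), scores.std()))
```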
In this tutorial, we've briefly learned how to classify data by using Scikit-learn's SVC class in Python. The full source code is listed below.
Source code listing
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

x, y = make_classification(n_samples=5000, n_features=10, n_classes=3, n_clusters_per_class=1)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)

svc = SVC()
print(svc)
svc.fit(xtrain, ytrain)
score = svc.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(svc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

ypred = svc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)

# Iris dataset classification
print("Iris dataset classification with SVC")
iris = load_iris()
x, y = iris.data, iris.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)

svc = SVC()
print(svc)
svc.fit(xtrain, ytrain)
score = svc.score(xtrain, ytrain)
print("Score: ", score)

cv_scores = cross_val_score(svc, xtrain, ytrain, cv=10)
print("CV average score: %.2f" % cv_scores.mean())

ypred = svc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)