In this tutorial, we'll briefly learn how to detect anomalies in a dataset by using the One-class SVM method in Python. The Scikit-learn API provides the OneClassSVM class for this algorithm, and we'll use it throughout. The tutorial covers:
- Preparing the data
- Defining the model and prediction
- Anomaly detection with scores
- Source code listing
If you want to learn about other anomaly detection methods, please check out my tutorial, A Brief Explanation of 8 Anomaly Detection Methods with Python.
We'll start by loading the required libraries for this tutorial.
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt
Preparing the data
We'll create a random sample dataset for this tutorial by using the make_blobs() function. We'll check the dataset by visualizing it in a scatter plot.
random.seed(13)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(8, 8))
plt.scatter(x[:,0], x[:,1])
plt.show()
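As a quick sanity check on the generated data, the sketch below inspects the shape and the mean of the array. It assumes that passing random_state=13 to make_blobs() is an acceptable stand-in for the global seed set above, so the exact sample values may differ from those used in the rest of the tutorial.

```python
from sklearn.datasets import make_blobs

# random_state=13 is an assumption made here for reproducibility;
# the tutorial seeds NumPy's global generator instead.
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(8, 8), random_state=13)

# 200 samples with 2 features each (make_blobs defaults to 2 features)
print(x.shape)

# center_box=(8, 8) pins the single cluster center at (8, 8),
# so the sample mean should land close to that point
print(x.mean(axis=0))
```

Because the lower and upper bounds of center_box are equal, the cluster center is fixed and only the spread around it is random.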
Defining the model and prediction
We'll define the model by using the OneClassSVM class of the Scikit-learn API. Here, we'll set the kernel type to RBF and define the 'gamma' and 'nu' arguments.
svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)
print(svm)
OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma=0.001, kernel='rbf',
max_iter=-1, nu=0.03, shrinking=True, tol=0.001, verbose=False)
We'll fit the model on the x dataset with the fit() method and get the predictions with the predict() method.
svm.fit(x)
pred = svm.predict(x)
The predict() method returns -1 for outliers and 1 for inliers, so next we'll extract the samples labeled -1 as the anomalies.
anom_index = where(pred==-1)
values = x[anom_index]
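The where() call above returns a tuple of index arrays, which NumPy accepts directly as an index. As a small sketch of an equivalent alternative, a boolean mask selects the same rows. The pred and x arrays below are made-up toy values, not output from the model.

```python
import numpy as np

# Toy stand-ins (assumed values) for the model's predictions and the data
pred = np.array([1, -1, 1, 1, -1])
x = np.array([[8.1, 7.9], [9.5, 6.2], [8.0, 8.2], [7.8, 8.1], [6.4, 9.6]])

# Index-based selection, as in the tutorial
anom_index = np.where(pred == -1)
values = x[anom_index]

# Boolean-mask selection picks the same rows
values_mask = x[pred == -1]

print(anom_index[0])   # positions of the flagged samples
print(values_mask)     # the flagged samples themselves
```

Either form works; the index version is handy when the positions of the anomalies are needed later.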
Finally, we'll visualize the results in a plot by highlighting the anomalies in red.
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()
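To get a feel for the 'nu' argument, the sketch below refits the model with a few values and reports the share of samples flagged as outliers and the share of support vectors. The Scikit-learn documentation describes nu as an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The nu values tried here and random_state=13 are choices made for illustration, and the flagged share can deviate somewhat from nu in practice.

```python
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs

# random_state=13 is an assumption made here for reproducibility
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(8, 8), random_state=13)

results = {}
for nu in (0.01, 0.03, 0.1):  # illustrative values, not recommendations
    svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=nu).fit(x)
    pred = svm.predict(x)
    outlier_share = (pred == -1).mean()     # fraction labeled -1
    sv_share = len(svm.support_) / len(x)   # fraction of support vectors
    results[nu] = (outlier_share, sv_share)
    print(f"nu={nu}: flagged={outlier_share:.3f}, support vectors={sv_share:.3f}")
```

Raising nu generally lets the model treat more training samples as outliers, which is why it is often set near the expected contamination rate.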
Anomaly detection with scores
We can also find anomalies by using the score of each sample. In this method, we'll define the model, fit it on the x data with the fit_predict() method, and then identify the outliers according to the score value of each element.
svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02)
print(svm)
Next, we'll fit the model on the x dataset, then extract the score of each sample with the score_samples() method.
pred = svm.fit_predict(x)
scores = svm.score_samples(x)
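For context on what these scores are, the Scikit-learn documentation states that decision_function() equals score_samples() minus the fitted offset_ attribute. A minimal sketch checking that relation, assuming random_state=13 for the data generation:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs

# random_state=13 is an assumption made here for reproducibility
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3,
                  center_box=(8, 8), random_state=13)

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02)
pred = svm.fit_predict(x)

scores = svm.score_samples(x)        # raw (unshifted) scores
decision = svm.decision_function(x)  # scores shifted by the offset

# Documented relation: decision_function = score_samples - offset_
same = np.allclose(decision, scores - svm.offset_)
print(same)
```

Since the raw scores are not centered on zero, a data-driven threshold such as a quantile is a natural way to cut them.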
Next, we'll obtain the threshold value from the scores by using the quantile() function. Here, we'll treat the samples with the lowest 3 percent of score values as the anomalies.
thresh = quantile(scores, 0.03)
print(thresh)
3.994389673293594
Next, we'll extract the anomalies by comparing each score with the threshold value and collecting the corresponding samples.
index = where(scores<=thresh)
values = x[index]
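To make the thresholding step concrete, here is a small sketch on made-up scores (the numbers 1 to 100, not model output). With NumPy's default linear interpolation, the 0.03 quantile falls between the third and fourth smallest values, so exactly three scores are selected.

```python
import numpy as np

# Toy scores (assumed values): 1.0, 2.0, ..., 100.0
scores = np.arange(1.0, 101.0)

# 3rd percentile of the scores, linearly interpolated
thresh = np.quantile(scores, 0.03)

# Keep the positions whose score is at or below the threshold
index = np.where(scores <= thresh)

print(thresh)
print(scores[index])
```

On real scores the number of selected samples is roughly 3 percent of the dataset, though ties or interpolation can shift the count by one.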
Finally, we can visualize the results in a plot by highlighting the anomalies in red.
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()
In this tutorial, we've learned how to detect anomalies with the One-class SVM method by using Scikit-learn's OneClassSVM class in Python. We've seen two ways of detecting outliers with OneClassSVM: using the predicted labels and thresholding the sample scores. The full source code is listed below.
Source code listing
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt

random.seed(13)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(8, 8))
plt.scatter(x[:,0], x[:,1])
plt.show()

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.03)
print(svm)
svm.fit(x)
pred = svm.predict(x)
anom_index = where(pred==-1)
values = x[anom_index]
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02)
print(svm)
pred = svm.fit_predict(x)
scores = svm.score_samples(x)
thresh = quantile(scores, 0.03)
print(thresh)
index = where(scores<=thresh)
values = x[index]
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()