Nu Support Vector Regression (NuSVR) is a regression algorithm based on support vector machines. NuSVR replaces the epsilon parameter of the SVR method with a nu parameter. The Scikit-learn documentation explains that nu is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors¹.
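To make the role of nu more concrete, the following minimal sketch fits NuSVR with a few nu values and prints the fraction of training samples that end up as support vectors. The dataset size, noise level, random_state and the chosen nu values are illustrative assumptions, not part of the original tutorial.

```python
from sklearn.datasets import make_regression
from sklearn.preprocessing import scale
from sklearn.svm import NuSVR

# Synthetic data; sizes, noise and random_state are illustrative choices.
x, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
x, y = scale(x), scale(y)

fractions = {}
for nu in (0.1, 0.5, 0.9):
    model = NuSVR(nu=nu).fit(x, y)
    # support_ holds the indices of the support vectors
    fractions[nu] = len(model.support_) / len(x)
    print(f"nu={nu}: fraction of support vectors = {fractions[nu]:.2f}")
```

A larger nu should yield a larger share of support vectors, which is consistent with nu acting as a lower bound on that fraction.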
In this tutorial, we'll briefly learn how to fit and predict regression data by using Scikit-learn's NuSVR class in Python. The tutorial covers:
- Preparing the data
- Training the model
- Predicting and accuracy check
- Boston housing dataset prediction
- Source code listing
from sklearn.svm import NuSVR
# load_boston was removed in scikit-learn 1.2; this import needs an earlier release
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt
Preparing the data
First, we'll generate random regression data with the make_regression() function. The dataset contains 10 features and 1000 samples.
x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

[[ 1.01646401 -0.41404149 -0.33426236 -2.31816799 -0.60889924  0.80205365
   0.50961324  2.21412708 -0.04765094 -1.29481218]
 [-0.01471556 -1.22287924 -0.4500027   0.8349292  -1.74252028 -0.71654997
   0.58212652  2.1221269  -1.71193889 -0.16591502]]
[-289.88769812 -373.7687416 ]
To improve model accuracy, we'll scale both the x and y data, then split them into train and test parts. Here, we'll extract 15 percent of the samples as test data.
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
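As a quick sanity check of what scaling and splitting do, the sketch below verifies that scale() leaves each feature with zero mean and unit variance, and that a 15 percent test split of 1000 samples yields 850 training and 150 test rows. The random_state values are assumptions added here only to make the run repeatable.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

# random_state is an illustrative addition for reproducibility
x, y = make_regression(n_samples=1000, n_features=10, random_state=0)
x, y = scale(x), scale(y)

# After scaling, every column has mean ~0 and standard deviation ~1
means_ok = np.allclose(x.mean(axis=0), 0.0)
stds_ok = np.allclose(x.std(axis=0), 1.0)
print("zero mean:", means_ok, "unit variance:", stds_ok)

xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0
)
print(xtrain.shape, xtest.shape)
```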
Training the model
Next, we'll define the regressor by using the NuSVR class. Here, we can use the default parameters of the model.
nsvr = NuSVR()
print(nsvr)

NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
      max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
Then, we'll fit the model on the train data and check its R-squared score on the training set.
nsvr.fit(xtrain, ytrain)
score = nsvr.score(xtrain, ytrain)
print("R-squared:", score)

R-squared: 0.99581178159984
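The value returned by score() for a regressor is the coefficient of determination, R-squared. As a rough illustration, the sketch below recomputes it by hand as 1 - SS_res / SS_tot and compares it with score(). The smaller sample count and the random_state are assumptions chosen to keep the example quick.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import scale
from sklearn.svm import NuSVR

# Smaller illustrative dataset; random_state added for reproducibility
x, y = make_regression(n_samples=200, n_features=10, random_state=0)
x, y = scale(x), scale(y)

nsvr = NuSVR().fit(x, y)
ypred = nsvr.predict(x)

ss_res = np.sum((y - ypred) ** 2)      # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot
r2_sklearn = nsvr.score(x, y)

print("manual R-squared: ", r2_manual)
print("score() R-squared:", r2_sklearn)
```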
We can also apply cross-validation to the model and check the mean score across the folds.
cv_score = cross_val_score(nsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

CV mean score: 0.9743050797057672
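One possible variation, not used in this tutorial, is to let the scaling happen inside each cross-validation fold rather than on the whole dataset beforehand. The sketch below assumes a pipeline of StandardScaler and NuSVR wrapped in TransformedTargetRegressor so that y is scaled as well; the random_state is an illustrative addition.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVR

# Unscaled data; scaling is learned separately on each training fold
x, y = make_regression(n_samples=1000, n_features=10, random_state=0)

model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), NuSVR()),  # scales x, then fits
    transformer=StandardScaler(),                        # scales y
)

# One R-squared value per fold
cv_scores = cross_val_score(model, x, y, cv=10)
print("Per-fold scores:", cv_scores)
print("CV mean score:  ", cv_scores.mean())
```

Because the scalers are refit on each training fold, the held-out fold does not influence the scaling statistics.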
Predicting and accuracy check
Now, we can predict the test data by using the trained model. We can check the accuracy of the predictions by using the MSE and RMSE metrics.
ypred = nsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
# RMSE is the square root of the MSE
print("RMSE: ", mse**(1/2.0))

MSE:  0.01787051983592968
RMSE:  0.1337 (rounded)
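Since RMSE is defined as the square root of the MSE, a small worked example with hand-checkable numbers may help. The arrays below are made-up values used only for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values: the errors are 0.5, -0.5, 0.0 and -1.0
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)       # (0.25 + 0.25 + 0 + 1) / 4
rmse = np.sqrt(mse)                            # square root, not half
mse_manual = np.mean((y_true - y_pred) ** 2)   # same value computed by hand

print("MSE: ", mse)
print("RMSE:", rmse)
```

RMSE is expressed in the same units as the target, which usually makes it easier to interpret than the MSE.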
Finally, we'll visualize the original and predicted data in a plot.
x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Test and predicted data")
plt.legend()
plt.show()
Boston housing dataset prediction
We'll apply the same method to the Boston housing price regression dataset. We'll load it by using the load_boston() function, then scale it and split it into train and test parts. Then, we'll define the model, check its accuracy, and predict the test data. Note that load_boston() was removed in scikit-learn 1.2, so this section requires an earlier release.
print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

nsvr = NuSVR()
nsvr.fit(xtrain, ytrain)

score = nsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(nsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = nsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
# RMSE is the square root of the MSE
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()

Boston housing dataset prediction.
R-squared: 0.8829677625633515
CV mean score:  0.5229267100173134
MSE:  0.101282412378955
RMSE:  0.3182 (rounded)
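On scikit-learn releases where load_boston() is unavailable, the same workflow can be tried on another bundled dataset. The sketch below uses load_diabetes() purely as a stand-in, which is an assumption of this example and not the dataset used in the tutorial, so its scores will not match the Boston results above.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
from sklearn.svm import NuSVR

# Stand-in dataset (not Boston); random_state added for reproducibility
x, y = load_diabetes(return_X_y=True)
x, y = scale(x), scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0
)

nsvr = NuSVR().fit(xtrain, ytrain)
train_r2 = nsvr.score(xtrain, ytrain)

ypred = nsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
rmse = mse ** 0.5  # square root of the MSE

print("R-squared:", train_r2)
print("MSE: ", mse)
print("RMSE: ", rmse)
```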
In this tutorial, we've briefly learned how to fit and predict regression data by using Scikit-learn's NuSVR class in Python. The full source code is listed below.
Source code listing
from sklearn.svm import NuSVR
# load_boston was removed in scikit-learn 1.2; this import needs an earlier release
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt

# Generate random regression data
x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

# Scale the data and split it into train and test parts
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

# Define and fit the model
nsvr = NuSVR()
print(nsvr)
nsvr.fit(xtrain, ytrain)

score = nsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(nsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

# Predict the test data and check the accuracy
ypred = nsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
# RMSE is the square root of the MSE
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Test and predicted y data")
plt.legend()
plt.show()

# Boston housing dataset
print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

nsvr = NuSVR()
nsvr.fit(xtrain, ytrain)

score = nsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(nsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = nsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()
References: