Linear SVR is a regression algorithm based on the support vector machine method. It uses a linear kernel and, because Scikit-learn implements it with the liblinear library, it scales well to large datasets. The loss function can be set to either the L1 (epsilon-insensitive) or the L2 (squared epsilon-insensitive) loss.
In this tutorial, we'll briefly learn how to fit and predict regression data by using Scikit-learn's LinearSVR class in Python. The tutorial covers:
- Preparing the data
- Training the model
- Predicting and accuracy check
- Boston housing dataset prediction
- Source code listing
We'll start by loading the required libraries.

from sklearn.svm import LinearSVR
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt

Preparing the data
First, we'll generate random regression data with the make_regression() function. The dataset contains 1000 samples with 10 features each.
x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

[[ 0.07940349 -0.62826076 1.35829589 -0.94757278 0.4330519 0.06052787
-0.59091938 0.14826325 -0.76850621 -0.84848105]
[-0.2728921 -0.63341441 -0.86528475 0.56128328 -0.34668921 1.30640379
-0.18253121 -0.05468702 0.41798946 0.30962429]]
[-131.66928697 -38.6226293 ]
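
Because make_regression() draws random data, the values above change on every run. As a minimal sketch (the names x_demo, y_demo, x_again and y_again are only for illustration), passing random_state makes the draw repeatable, and the noise parameter adds Gaussian noise to the target. With the default noise=0.0 the target is an exact linear function of the features, which a linear model can fit almost perfectly.

```python
from sklearn.datasets import make_regression

# random_state fixes the random draw, so both calls return identical arrays.
# noise=5.0 adds Gaussian noise to y (the default is 0.0, i.e. no noise).
x_demo, y_demo = make_regression(n_samples=1000, n_features=10,
                                 noise=5.0, random_state=42)
x_again, y_again = make_regression(n_samples=1000, n_features=10,
                                   noise=5.0, random_state=42)

print(x_demo.shape, y_demo.shape)
print((x_demo == x_again).all(), (y_demo == y_again).all())
```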
To help the model fit well, we'll scale both the x and y data, then split them into train and test parts. Here, we'll hold out 15 percent of the samples as test data.
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)
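
Scaling before the split means the scaling statistics are computed from all rows, including the ones that end up in the test part. One possible alternative, which is not what this tutorial does, is to split first and let a pipeline fit the scalers on the training rows only. The sketch below assumes StandardScaler for both x and y; the variable names (x_raw, pipe, model and so on) are hypothetical.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

x_raw, y_raw = make_regression(n_samples=1000, n_features=10, random_state=0)
xtr, xte, ytr, yte = train_test_split(x_raw, y_raw, test_size=0.15,
                                      random_state=0)

# The feature scaler lives inside the pipeline, so it is fitted on xtr only.
pipe = make_pipeline(StandardScaler(), LinearSVR(random_state=0))
# The target is standardized for fitting and mapped back for predictions.
model = TransformedTargetRegressor(regressor=pipe, transformer=StandardScaler())

model.fit(xtr, ytr)
r2_test = model.score(xte, yte)
print("Test R-squared:", r2_test)
```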
Training the model
Next, we'll define the regressor model by using the LinearSVR class. Here, we keep the default parameters of the class and print the model to see them.

lsvr = LinearSVR(verbose=0, dual=True)
print(lsvr)

LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
random_state=None, tol=0.0001, verbose=0)

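
The loss parameter in the printout above is where the L1 or L2 loss is chosen. A minimal sketch of both settings follows; the names lsvr_l1 and lsvr_l2 are only for illustration, and the other values simply repeat the defaults shown above.

```python
from sklearn.svm import LinearSVR

# L1 loss: epsilon-insensitive (the default).
lsvr_l1 = LinearSVR(loss="epsilon_insensitive", epsilon=0.0, C=1.0)

# L2 loss: squared epsilon-insensitive.
lsvr_l2 = LinearSVR(loss="squared_epsilon_insensitive", epsilon=0.0, C=1.0)

print(lsvr_l1.get_params()["loss"])
print(lsvr_l2.get_params()["loss"])
```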
Then, we'll fit the model on the train data and check its R-squared score on the same data.

lsvr.fit(xtrain, ytrain)
score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

R-squared: 1.0
We can also apply 10-fold cross-validation to the model and check the mean R-squared score across the folds.

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

CV mean score: 1.0
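
For a regressor, cross_val_score() uses the estimator's own score() method by default, which is R-squared. As a sketch on separately generated data (the names x_cv, y_cv, model, r2_folds and neg_mse_folds are hypothetical), the per-fold values can be inspected, and a different metric can be requested through the scoring parameter. Scikit-learn reports error metrics negated, so the sign is flipped before printing.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
from sklearn.svm import LinearSVR

x_cv, y_cv = make_regression(n_samples=300, n_features=10,
                             noise=10.0, random_state=1)
x_cv, y_cv = scale(x_cv), scale(y_cv)

model = LinearSVR(random_state=1)

# Default scoring for a regressor: R-squared, one value per fold.
r2_folds = cross_val_score(model, x_cv, y_cv, cv=10)

# Same folds, scored with negated mean squared error.
neg_mse_folds = cross_val_score(model, x_cv, y_cv, cv=10,
                                scoring="neg_mean_squared_error")

print("R-squared per fold:", r2_folds.round(3))
print("MSE per fold:", (-neg_mse_folds).round(4))
```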
Predicting and accuracy check
Now, we can predict the test data by using the trained model. We can
check the accuracy of predicted data by using MSE and RMSE metrics.

ypred = lsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

MSE: 0.01787051983592968
RMSE: 0.13368066

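
The relationship between the two metrics can be checked by hand: MSE is the mean of the squared errors, and RMSE is its square root. The tiny example below uses made-up numbers (y_true and y_hat are not from the tutorial data) and only the standard library.

```python
import math

y_true = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 2.0, 2.5, 5.0]

# Squared errors: 0.25, 0.0, 0.25, 1.0
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_hat)]

mse_demo = sum(squared_errors) / len(squared_errors)  # 1.5 / 4 = 0.375
rmse_demo = math.sqrt(mse_demo)                       # square root of the MSE

print("MSE: ", mse_demo)
print("RMSE: ", rmse_demo)
```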
Finally, we'll visualize the original and predicted data in a plot.

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Test and predicted data")
plt.legend()
plt.show()

Boston housing dataset prediction
We'll apply the same method to the Boston housing price regression dataset. We'll load it by using the load_boston() function, scale the data, and split it into train and test parts. Then, we'll define the model, check its score, and predict the test data. Note that load_boston() was removed in scikit-learn 1.2, so this part requires an older release.

print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

lsvr = LinearSVR(verbose=0)
lsvr.fit(xtrain, ytrain)
score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = lsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()

Boston housing dataset prediction.
R-squared: 0.6938345064487695
CV mean score: 0.2838069239279085
MSE: 0.2388146523953546
RMSE: 0.48868666

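
On scikit-learn releases where load_boston() is no longer available, the same steps can be tried on another bundled regression dataset. The sketch below swaps in load_diabetes() purely as a stand-in, so its scores are not comparable to the Boston numbers above; the fixed random_state values and the variable names are assumptions added for repeatability.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
from sklearn.svm import LinearSVR

# Stand-in dataset: 442 samples, 10 features (not the Boston data).
x_d, y_d = load_diabetes(return_X_y=True)
x_d, y_d = scale(x_d), scale(y_d)
xtr, xte, ytr, yte = train_test_split(x_d, y_d, test_size=0.15,
                                      random_state=0)

model = LinearSVR(random_state=0)
model.fit(xtr, ytr)
print("R-squared:", model.score(xtr, ytr))

pred = model.predict(xte)
mse_d = mean_squared_error(yte, pred)
print("MSE: ", mse_d)
print("RMSE: ", mse_d ** 0.5)
```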
In this tutorial, we've briefly learned how to fit and predict regression data by using
Scikit-learn API's LinearSVR class in Python. The full
source code is listed below.
Source code listing
from sklearn.svm import LinearSVR
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt

# Generated regression data
x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

lsvr = LinearSVR()
print(lsvr)

lsvr.fit(xtrain, ytrain)
score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = lsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, linewidth=1, label="original")
plt.plot(x_ax, ypred, linewidth=1.1, label="predicted")
plt.title("y-test and y-predicted data")
plt.legend()
plt.show()

# Boston housing data (load_boston requires scikit-learn older than 1.2)
print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

lsvr = LinearSVR()
lsvr.fit(xtrain, ytrain)
score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = lsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()