In this tutorial, we'll briefly learn how to fit and predict regression data by using scikit-learn's ARDRegression (Automatic Relevance Determination) class in Python. We'll apply the model to randomly generated regression data and to the Boston housing price dataset to check its performance. The tutorial covers:
- Preparing the data
- Training the model
- Predicting and accuracy check
- Boston housing dataset prediction
- Source code listing
from sklearn.linear_model import ARDRegression
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt
from sklearn import set_config
Preparing the data
First, we'll generate random regression data with the make_regression() function. The dataset contains 10 features and 5000 samples.
x, y = make_regression(n_samples=5000, n_features=10)
print(x[0:2])
print(y[0:2])
[[ 1.773 2.534 0.693 -1.11 1.492 0.631 -0.577 0.085 -1.308 1.024]
[ 1.953 -1.362 1.294 1.025 0.463 -0.485 -1.849 1.858 0.483 -0.52 ]]
[120.105 262.69 ]
To improve the model accuracy, we'll scale both the x and y data, then split them into train and test parts. Here, we'll hold out 10 percent of the samples as test data.
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10)
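To make the effect of the scaling and splitting step concrete, here is a minimal sketch that checks the column means and standard deviations after scale() and prints the resulting split sizes. The random_state values and the variable names ending in _demo are assumptions added for reproducibility; they are not part of the tutorial's code.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

# random_state is an assumption added here so the sketch is reproducible
x_demo, y_demo = make_regression(n_samples=5000, n_features=10, random_state=0)

# scale() standardizes each column to zero mean and unit variance
x_scaled = scale(x_demo)
y_scaled = scale(y_demo)

col_means = x_scaled.mean(axis=0)
col_stds = x_scaled.std(axis=0)
print("means close to 0:", np.allclose(col_means, 0.0, atol=1e-8))
print("stds close to 1:", np.allclose(col_stds, 1.0, atol=1e-8))

# hold out 10 percent of the samples as test data
x_tr, x_te, y_tr, y_te = train_test_split(
    x_scaled, y_scaled, test_size=0.10, random_state=0
)
print(x_tr.shape, x_te.shape)
```

With 5000 samples and test_size=0.10, the split leaves 4500 rows for training and 500 for testing.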
Training the model
Next, we'll define the regressor model by using the ARDRegression class. Here, we can use the default parameters of the class. The default values are printed below.
set_config(print_changed_only=False)
ardr = ARDRegression()
print(ardr)
ARDRegression(alpha_1=1e-06, alpha_2=1e-06, compute_score=False, copy_X=True,
fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
normalize=False, threshold_lambda=10000.0, tol=0.001,
verbose=False)
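The printed parameter list can differ between scikit-learn releases, so a possible alternative is to read the defaults programmatically with get_params(). The sketch below only looks up the prior and pruning parameters shown in the printout above; the choice of which names to print is ours.

```python
from sklearn.linear_model import ARDRegression

# get_params() returns the estimator's constructor parameters as a dict
defaults = ARDRegression().get_params()

# alpha_1/alpha_2 and lambda_1/lambda_2 are the Gamma prior hyperparameters;
# threshold_lambda is the precision above which a weight is pruned
for name in ("alpha_1", "alpha_2", "lambda_1", "lambda_2", "threshold_lambda"):
    print(name, "=", defaults[name])
```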
Then, we'll fit the model on the training data and check its R-squared score on the same data.
ardr.fit(xtrain, ytrain)
score = ardr.score(xtrain, ytrain)
print("R-squared:", score)
R-squared: 1.0
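A perfect score is expected here because make_regression() adds no noise by default (noise=0.0), so the target is an exact linear function of the features. To illustrate what ARD does on noisy data, here is a minimal sketch on hand-built data where only three of ten features carry signal. The data, the weights in true_w, and the noise level are all assumptions made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# hypothetical data: 10 features, only columns 0, 3 and 7 influence the target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 10))
true_w = np.zeros(10)
true_w[[0, 3, 7]] = [3.0, -2.0, 1.5]
y_demo = X_demo @ true_w + rng.normal(scale=0.5, size=500)

ard_demo = ARDRegression().fit(X_demo, y_demo)

# coef_ holds the estimated weights; lambda_ holds the estimated
# per-weight precisions (a large precision pushes a weight toward zero)
print("coef_:  ", np.round(ard_demo.coef_, 3))
print("lambda_:", np.round(ard_demo.lambda_, 1))
print("train R-squared:", ard_demo.score(X_demo, y_demo))
```

The weights of the irrelevant features should come out near zero with comparatively large lambda_ values, and the training R-squared should fall below 1.0 because of the added noise.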
Predicting and accuracy check
Now, we can predict the test data by using the trained model. We can check the accuracy of the predictions with the MSE and RMSE metrics.
ypred = ardr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))
MSE: 1.0459020366671401e-22
RMSE: 1.0227e-11
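RMSE is the square root of MSE, not half of it, so the exponent operator ** (or a square-root function) is needed rather than multiplication. The small worked example below uses made-up numbers to show the difference; the arrays y_true and y_hat are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# hypothetical targets and predictions, chosen so the arithmetic is easy to follow
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# squared errors are 0.25, 0.25, 0.0 and 1.0, so the mean is 0.375
mse = mean_squared_error(y_true, y_hat)
rmse = np.sqrt(mse)          # same as mse ** 0.5
half_mse = mse * (1 / 2.0)   # a different quantity, not the RMSE

print("MSE: ", mse)
print("RMSE:", rmse)
print("MSE / 2:", half_mse)
```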
Finally, we'll visualize the original and predicted data in a plot.
x_ax = range(len(ytest))
plt.plot(x_ax, ytest, linewidth=1, label="original")
plt.plot(x_ax, ypred, linewidth=1.1, label="predicted")
plt.title("y-test and y-predicted data")
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend(loc='best',fancybox=True, shadow=True)
plt.grid(True)
plt.show()
Running the above code produces a plot that shows the original and predicted test data.
Boston housing dataset prediction
We'll apply the same method we've learned above to the Boston housing price regression dataset. We'll load it by using the load_boston() function, then scale it and split it into train and test parts. Then, we'll define the model with its default parameters, check the training accuracy, and predict the test data. Note that load_boston() was removed in scikit-learn 1.2, so this section requires an older release.
print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)
ardr = ARDRegression()
ardr.fit(xtrain, ytrain)
score = ardr.score(xtrain, ytrain)
print("R-squared:", score)
ypred = ardr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))
x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend(loc='best',fancybox=True, shadow=True)
plt.grid(True)
plt.show()
Boston housing dataset prediction.
R-squared: 0.730951555514822
MSE: 0.1362112343271604
RMSE: 0.3691
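Since load_boston() is not available in scikit-learn 1.2 and later, the same workflow can be tried on another bundled dataset. The sketch below swaps in load_diabetes() as a stand-in; this is our substitution, not the dataset used in this tutorial, so the scores are not comparable with the Boston results above. The random_state value and the variable names are also assumptions.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ARDRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

# stand-in dataset (assumption): diabetes progression data bundled with scikit-learn
data = load_diabetes()
x_d = scale(data.data)
y_d = scale(data.target)

# random_state is added here only to make the sketch reproducible
x_tr, x_te, y_tr, y_te = train_test_split(x_d, y_d, test_size=0.15, random_state=0)

model = ARDRegression()
model.fit(x_tr, y_tr)
r2_train = model.score(x_tr, y_tr)

y_pred = model.predict(x_te)
mse_d = mean_squared_error(y_te, y_pred)
rmse_d = mse_d ** 0.5  # square root of MSE

print("R-squared:", r2_train)
print("MSE: ", mse_d)
print("RMSE: ", rmse_d)
```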
In this tutorial, we've briefly learned how to fit and predict regression data by using the Scikit-learn API's ARDRegression class in Python. The full source code is listed below.
Source code listing
from sklearn.linear_model import ARDRegression
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt
from sklearn import set_config
x, y = make_regression(n_samples=5000, n_features=10)
print(x[0:2])
print(y[0:2])
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.10)
set_config(print_changed_only=False)
ardr = ARDRegression()
print(ardr)
ardr.fit(xtrain, ytrain)
score = ardr.score(xtrain, ytrain)
print("R-squared:", score)
ypred = ardr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))
x_ax = range(len(ytest))
plt.plot(x_ax, ytest, linewidth=1, label="original")
plt.plot(x_ax, ypred, linewidth=1.1, label="predicted")
plt.title("y-test and y-predicted data")
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend(loc='best',fancybox=True, shadow=True)
plt.grid(True)
plt.show()
print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)
ardr = ARDRegression()
ardr.fit(xtrain, ytrain)
score = ardr.score(xtrain, ytrain)
print("R-squared:", score)
ypred = ardr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))
x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend(loc='best',fancybox=True, shadow=True)
plt.grid(True)
plt.show()
Hi! I think there's a mistake when you calculate the RMSE: you define RMSE as mse*(1/2.0), which means 0.5*mse, while actually rmse = mse**(0.5) (for the root). Nice example tho.
Correct, thanks!