- Preparing data
- Regression with Lasso
- Regression with LassoCV
- Source code listing
We'll start by loading the required libraries.
from sklearn.datasets import load_boston
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
Preparing data
We use the Boston house-price dataset as regression data in this tutorial. After loading the dataset, we first separate it into x (the features) and y (the labels), then split it into train and test parts. Here, we hold out 15 percent of the dataset as test data.
boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
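A side note on reproducing this step: `load_boston` was removed in scikit-learn 1.2, so the loading code above only runs on older versions. The sketch below shows the same 15 percent split on a synthetic stand-in dataset built with `make_regression`. The stand-in data, the `_demo` names, and the `random_state` values are assumptions for illustration and are not part of the original tutorial; the sizes merely mirror the shape of the Boston data (506 samples, 13 features).

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data (assumption: same shape, 506 x 13)
x_demo, y_demo = make_regression(
    n_samples=506, n_features=13, noise=10.0, random_state=0)

# Hold out 15 percent as test data; random_state makes the split repeatable
xtrain_demo, xtest_demo, ytrain_demo, ytest_demo = train_test_split(
    x_demo, y_demo, test_size=0.15, random_state=42)

print(xtrain_demo.shape, xtest_demo.shape)
```

Fixing `random_state` is optional, but without it every run produces a different split and therefore slightly different metrics.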
Regression with Lasso
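Lasso adds an L1 penalty on the weights to the squared error of a linear model. The loss can be described as:
Loss = (wx + b - y)^2 + a|w|
where w is the weight, b is the bias, y is the (original) label, and a is the alpha constant. If we set a to 0, the penalty term vanishes and the model becomes an ordinary linear regression. Thus, for Lasso, alpha should be greater than 0.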
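To make the role of alpha concrete, here is a minimal sketch that evaluates the penalized loss with NumPy and counts how many weights Lasso drives to exactly zero as alpha grows. It uses the scaling from the scikit-learn documentation, (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1. The helper `lasso_objective` and the synthetic data are hypothetical names introduced for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso


def lasso_objective(X, y, w, b, alpha):
    """Squared-error term plus L1 penalty, scaled as in scikit-learn's Lasso."""
    n = X.shape[0]
    residual = X @ w + b - y
    return residual @ residual / (2 * n) + alpha * np.abs(w).sum()


# Synthetic data (assumption): only 3 of the 10 features carry signal
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0)

for alpha in (0.01, 1.0, 10.0):
    m = Lasso(alpha=alpha).fit(X_demo, y_demo)
    n_zero = int(np.sum(m.coef_ == 0))
    loss = lasso_objective(X_demo, y_demo, m.coef_, m.intercept_, alpha)
    print("alpha={0:.2f}  zero weights={1}  loss={2:.2f}".format(alpha, n_zero, loss))
```

A larger alpha makes the penalty term more expensive, so the optimizer prefers to set weak weights to exactly zero; this is why Lasso is often used for feature selection.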
To define the model, we use the default parameters of the Lasso class (the default alpha is 1). Then we'll fit the model with the training data.
model = Lasso().fit(xtrain, ytrain)
print(model)
Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000, normalize=False, positive=False, precompute=False, random_state=None, selection='cyclic', tol=0.0001, warm_start=False)
Next, we'll check the R-squared score, predict the test data, compute the prediction error, and print all the metrics.
# R-squared on the training data
score = model.score(xtrain, ytrain)
# Error on the held-out test data
ypred = model.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("Alpha:{0:.2f}, R2:{1:.2f}, MSE:{2:.2f}, RMSE:{3:.2f}"
      .format(model.alpha, score, mse, np.sqrt(mse)))
Alpha:1.00, R2:0.68, MSE:27.10, RMSE:5.21
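The R-squared value printed above is computed on data the model has already seen, while MSE and RMSE come from the test split. As a hedged complement, the sketch below reports R-squared on held-out data as well, using `r2_score`. The synthetic dataset and all `_demo`-style names are assumptions for illustration, so the numbers it prints are not comparable to the Boston results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), split 85/15 as in the tutorial
x_demo, y_demo = make_regression(
    n_samples=300, n_features=8, noise=15.0, random_state=1)
xtr, xte, ytr, yte = train_test_split(
    x_demo, y_demo, test_size=0.15, random_state=1)

demo_model = Lasso(alpha=1.0).fit(xtr, ytr)
pred = demo_model.predict(xte)

train_r2 = demo_model.score(xtr, ytr)   # R-squared on seen data
test_r2 = r2_score(yte, pred)           # R-squared on unseen data
mse = mean_squared_error(yte, pred)
rmse = np.sqrt(mse)                     # same units as the target

print("Train R2:{0:.3f}, Test R2:{1:.3f}, MSE:{2:.2f}, RMSE:{3:.2f}"
      .format(train_r2, test_r2, mse, rmse))
```

Comparing the two R-squared values gives a quick signal of overfitting: a large gap between train and test suggests the model does not generalize well.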
We can change the alpha value to improve the model's accuracy. To find out which value works well, we'll use the LassoCV class.
Regression with LassoCV
LassoCV applies cross-validation to find the best alpha for the model. We'll provide a list of candidate alpha values and train the model.
alphas = [0.1, 0.3, 0.5, 0.8, 1]
lassocv = LassoCV(alphas=alphas, cv=5).fit(xtrain, ytrain)
print(lassocv)
LassoCV(alphas=[0.1, 0.3, 0.5, 0.8, 1], copy_X=True, cv=5, eps=0.001, fit_intercept=True, max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False, precompute='auto', random_state=None, selection='cyclic', tol=0.0001, verbose=False)
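To see how the best alpha is chosen, the fitted estimator can be inspected. The sketch below, run on synthetic stand-in data (an assumption, not the tutorial's dataset), averages the per-fold errors stored in `mse_path_` for each candidate in `alphas_` and picks the candidate with the lowest mean, which should match the `alpha_` attribute. The names `cv_model` and `best_by_hand` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic stand-in data (assumption)
x_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=10.0, random_state=2)

alphas = [0.1, 0.3, 0.5, 0.8, 1]
cv_model = LassoCV(alphas=alphas, cv=5).fit(x_demo, y_demo)

# mse_path_ has one row per alpha (ordered as in alphas_) and one column per fold
mean_mse = cv_model.mse_path_.mean(axis=1)
for a, m in zip(cv_model.alphas_, mean_mse):
    print("alpha={0:.2f}  mean CV MSE={1:.2f}".format(a, m))

# The candidate with the lowest mean cross-validated error
best_by_hand = cv_model.alphas_[np.argmin(mean_mse)]
print("Selected by hand:", best_by_hand, " LassoCV alpha_:", cv_model.alpha_)
```

After selecting alpha, LassoCV refits the model on all the data passed to `fit`, so the returned estimator can be used for prediction directly.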
Next, we'll check the R-squared score, predict the test data, compute the prediction error, and print all the metrics.
# R-squared on the training data
score = lassocv.score(xtrain, ytrain)
# Error on the held-out test data
ypred = lassocv.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("Alpha:{0:.2f}, R2:{1:.3f}, MSE:{2:.2f}, RMSE:{3:.2f}"
      .format(lassocv.alpha_, score, mse, np.sqrt(mse)))
Alpha:0.30, R2:0.721, MSE:20.24, RMSE:4.50
Finally, we can visualize the result in a plot.
x_ax = range(len(xtest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()
In this post, we've briefly learned how to use the Lasso and LassoCV classes for regression data analysis in Python. The full source code is listed below. Thank you for reading!
Source code listing
from sklearn.datasets import load_boston
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Load the data and hold out 15 percent for testing
boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)

# Lasso with the default alpha
model = Lasso().fit(xtrain, ytrain)
print(model)
score = model.score(xtrain, ytrain)
ypred = model.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("Alpha:{0:.2f}, R2:{1:.2f}, MSE:{2:.2f}, RMSE:{3:.2f}"
      .format(model.alpha, score, mse, np.sqrt(mse)))

x_ax = range(len(ypred))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()

# LassoCV with a list of candidate alphas
alphas = [0.1, 0.3, 0.5, 0.8, 1]
lassocv = LassoCV(alphas=alphas, cv=5).fit(xtrain, ytrain)
print(lassocv)
score = lassocv.score(xtrain, ytrain)
ypred = lassocv.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("Alpha:{0:.2f}, R2:{1:.3f}, MSE:{2:.2f}, RMSE:{3:.2f}"
      .format(lassocv.alpha_, score, mse, np.sqrt(mse)))

x_ax = range(len(xtest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()