In this post, we'll briefly learn how to use Lasso regularization in R. The 'glmnet' package provides the regularization functions we need for Lasso. The tutorial covers:
- Preparing the data
- Defining the model
- Predicting test data
- Source code listing
library(glmnet)
library(caret)
Preparing the data
We'll use the Boston housing dataset in this tutorial. First, we'll load it and split it into train and test parts.
set.seed(123)
boston = MASS::Boston
indexes = createDataPartition(boston$medv, p=.85, list=F)
train = boston[indexes, ]
test = boston[-indexes, ]
You can check the dataset with the 'str(boston)' command. Here, the 'medv' variable is the y (label) column, and the remaining columns are the x (feature) data.
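For a quick sanity check, we can also look at the dimensions and the response variable. This is a small optional sketch; 'medv' holds the median home value in $1000s.

str(boston)          # 'data.frame': 506 obs. of 14 variables
dim(boston)          # 506 rows, 14 columns
summary(boston$medv) # the response: median home value in $1000s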
Now, we'll separate the x and y parts of the train and test data. The x input must be a numeric matrix for glmnet, so we'll convert it.
xtrain = as.matrix(train)[,-14]
ytrain = train[,14]
xtest = as.matrix(test)[,-14]
ytest = test[,14]
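Since all columns in Boston are numeric, as.matrix() is sufficient here. If the data contained factor columns, model.matrix() would be the safer choice because it expands factors into dummy columns; the sketch below is a hypothetical alternative, not needed for this dataset.

# model.matrix() builds a numeric design matrix; '-1' drops the intercept column
xtrain_mm = model.matrix(medv ~ . - 1, data = train)
xtest_mm = model.matrix(medv ~ . - 1, data = test)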
Defining the model
Next, we'll find the lambda value that controls the amount of shrinkage, using glmnet's cross-validation function. We'll run cv.glmnet() with alpha=1, which selects the Lasso penalty.
lasso_cv = cv.glmnet(xtrain, ytrain, family="gaussian", alpha=1)
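Here, 'alpha' is the elastic-net mixing parameter: alpha=1 gives the Lasso penalty, alpha=0 gives ridge, and values in between give the elastic net. For comparison, a ridge cross-validation run would look like the sketch below; it is not used in the rest of this tutorial.

# Hypothetical ridge fit for comparison (alpha=0 selects the ridge penalty)
ridge_cv = cv.glmnet(xtrain, ytrain, family="gaussian", alpha=0)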
We can check the coefficients. By default, coef() on a cv.glmnet object reports them at the 'lambda.1se' value, and a dot means the coefficient has been shrunk exactly to zero, which is how Lasso performs variable selection.
coef(lasso_cv)
14 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept) 21.7893929324
crim        -0.0122937347
zn           0.0098930474
indus        .
chas         2.5311725502
nox         -5.7841042218
rm           3.9974018400
age          .
dis         -0.4873179429
rad          .
tax         -0.0003229633
ptratio     -0.8262576443
black        0.0068882072
lstat       -0.5161864470
The cross-validation results can also be plotted. The plot shows the mean cross-validated error across the lambda sequence, with vertical dotted lines marking 'lambda.min' and 'lambda.1se'.
plot(lasso_cv)
We'll extract the best lambda value from the cross-validation result.

best_lambda = lasso_cv$lambda.min
cat(best_lambda)
0.01014079
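The 'lambda.min' value minimizes the cross-validated error. A slightly more conservative alternative is 'lambda.1se', the largest lambda whose error is within one standard error of the minimum; the sketch below shows how to read it, though we stick with 'lambda.min' in this tutorial.

# lambda.1se applies stronger shrinkage and yields a sparser model
cat(lasso_cv$lambda.1se)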
Next, we'll fit the model again using the best lambda value.
lasso_mod = glmnet(xtrain, ytrain, family = "gaussian", alpha = 1, lambda = best_lambda)
We'll check the coefficients again.
coef(lasso_mod)
14 x 1 sparse Matrix of class "dgCMatrix"
                       s0
(Intercept)  37.021345010
crim         -0.086726277
zn            0.050935275
indus         0.011735446
chas          3.073769966
nox         -18.075419693
rm            3.676920080
age           0.004979978
dis          -1.446401516
rad           0.270290914
tax          -0.011341606
ptratio      -0.940283913
black         0.008557049
lstat        -0.521339466
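Note that refitting is optional: predictions can also be made directly from the cross-validation object by passing the lambda choice through the 's' argument. The sketch below is equivalent to predicting with the refit model above.

# Predict straight from the cv.glmnet object at the best lambda
yhat_cv = predict(lasso_cv, newx=xtest, s="lambda.min")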
Predicting test data
Finally, we'll predict the test data with the trained model and evaluate the accuracy with the MSE, MAE, RMSE, and R-squared metrics.
yhat = predict(lasso_mod, xtest)
mse = mean((ytest - yhat) ^ 2)
mae = MAE(ytest, yhat)
rmse = RMSE(ytest, yhat)
r2 = R2(ytest, yhat, form = "traditional")

cat(" MAE:", mae, "\n", "MSE:", mse, "\n",
    "RMSE:", rmse, "\n", "R-squared:", r2)
 MAE: 3.889155
 MSE: 24.04217
 RMSE: 4.903281
 R-squared: 0.5536334
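For reference, the 'caret' helpers used above correspond to the base-R formulas below; this is a sketch, where R2 with form="traditional" computes 1 - SSE/SST.

# Manual equivalents of the caret metric helpers
mae_manual = mean(abs(ytest - yhat))                                      # MAE
rmse_manual = sqrt(mean((ytest - yhat) ^ 2))                              # RMSE
r2_manual = 1 - sum((ytest - yhat) ^ 2) / sum((ytest - mean(ytest)) ^ 2) # R-squared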
We can visualize the results in a plot.
x = 1:length(ytest)
plot(x, ytest, ylim=c(min(yhat), max(ytest)), pch=20, col="red")
lines(x, yhat, lwd=1, col="blue")
legend("topleft", legend=c("medv", "pred-medv"),
       col=c("red", "blue"), lty=1, cex=0.8, lwd=1, bty='n')
In this tutorial, we've briefly learned how to use the glmnet Lasso method to fit a regression model and predict test data in R. The full source code is listed below.
Source code listing
library(glmnet)
library(caret)
set.seed(123)
boston = MASS::Boston
indexes = createDataPartition(boston$medv, p=.85, list=F)
train = boston[indexes, ]
test = boston[-indexes, ]
xtrain = as.matrix(train)[,-14]
ytrain = train[,14]
xtest = as.matrix(test)[,-14]
ytest = test[,14]
lasso_cv = cv.glmnet(xtrain, ytrain, family="gaussian", alpha=1)
coef(lasso_cv)
plot(lasso_cv)
best_lambda = lasso_cv$lambda.min
cat(best_lambda)
lasso_mod = glmnet(xtrain, ytrain, family = "gaussian",
alpha = 1, lambda = best_lambda)
coef(lasso_mod)
yhat = predict(lasso_mod, xtest)
mse = mean((ytest - yhat) ^ 2)
mae = MAE(ytest, yhat)
rmse = RMSE(ytest, yhat)
r2 = R2(ytest, yhat, form = "traditional")
cat(" MAE:", mae, "\n", "MSE:", mse, "\n",
"RMSE:", rmse, "\n", "R-squared:", r2)
x = 1:length(ytest)
plot(x, ytest, ylim=c(min(yhat), max(ytest)), pch=20, col="red")
lines(x, yhat, lwd=1, col="blue")
legend("topleft", legend=c("medv", "pred-medv"),
col=c("red", "blue"), lty=1,cex = 0.8, lwd=1, bty='n')