- Preparing data
- Model fitting and prediction
- Source code listing
library(mgcv)
library(corrplot)
Preparing data
In this tutorial, we'll use the Boston housing dataset as a regression dataset. To fit a GAM, we need to decide which predictors enter the model as smooth terms. Thus, first, we'll identify the features that are most strongly correlated with the target variable 'medv'. We can check the correlations among all variables as follows.
boston = MASS::Boston
cors = cor(boston)
corrplot(cors, method="number")
The correlation matrix shows that the rm, lstat, ptratio, and indus features are the most strongly correlated with the medv variable. They are good candidates for smooth terms in the gam.
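To read the same information as numbers rather than from the plot, we can pull the 'medv' column out of the correlation matrix and sort it by absolute value. This is only a minimal sketch; the name medv_cors is an illustrative choice and is not part of the original code.

```r
# Load the data the same way as above (requires the MASS package)
boston = MASS::Boston
cors = cor(boston)

# Correlation of every feature with medv, dropping medv itself
medv_cors = cors[, "medv"]
medv_cors = medv_cors[names(medv_cors) != "medv"]

# Sort by absolute correlation, strongest first
medv_cors = medv_cors[order(abs(medv_cors), decreasing = TRUE)]
print(round(medv_cors, 2))
```

Sorting by absolute value matters here because a strong negative correlation is just as informative as a strong positive one.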
Model fitting and prediction
Now, we can define the gam model and fit it with the Boston dataset. Here, we wrap the 'rm' and 'lstat' features in s() so that they enter the model as smooth terms. We also add several other variables of the dataset (ptratio, indus, crim, zn, and age) as ordinary linear terms.
bgam=gam(medv~s(rm)+s(lstat)+ptratio+indus+crim+zn+age, data=boston)
summary(bgam)
Family: gaussian
Link function: identity
Formula:
medv ~ s(rm) + s(lstat) + ptratio + indus + crim + zn + age
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.54656    2.08187  15.153  < 2e-16 ***
ptratio     -0.52355    0.10112  -5.178 3.29e-07 ***
indus        0.00383    0.04052   0.095   0.9247
crim        -0.13005    0.02498  -5.207 2.84e-07 ***
zn          -0.01682    0.01065  -1.579   0.1150
age          0.01848    0.01055   1.751   0.0806 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
           edf Ref.df     F p-value
s(rm)    6.514  7.680 24.30   2e-16 ***
s(lstat) 6.272  7.451 34.29   2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.807 Deviance explained = 81.4%
GCV = 16.948 Scale est. = 16.319 n = 506
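As a quick visual check, mgcv can draw the estimated smooth functions for 'rm' and 'lstat'. The sketch below refits the same model so that it runs on its own, then calls plot() on the fitted object. Using pages = 1 and shade = TRUE is only a convenience choice here, not something the summary above requires.

```r
library(mgcv)

boston = MASS::Boston
bgam = gam(medv ~ s(rm) + s(lstat) + ptratio + indus + crim + zn + age,
           data = boston)

# Draw both smooth terms on one page, with shaded confidence bands
plot(bgam, pages = 1, shade = TRUE)

# Effective degrees of freedom (edf) of each smooth term
print(summary(bgam)$s.table[, "edf"])
```

An edf close to 1 suggests a nearly linear effect, while larger values indicate a more flexible curve.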
Next, we'll predict 'medv' for the Boston data with the fitted model. We pass the same data used for fitting, with the 14th column (medv) dropped.
pred = predict(bgam, newdata = boston[,-14])
We can compare the predicted result with the original one by visualizing them in a plot.
x = 1:nrow(boston)
plot(x, boston$medv, col="blue", type = "l")
lines(x, pred, col="red", type = "l" )
legend("bottomleft", legend=c("y-fitted", "y-original"),
col=c("red", "blue"), lty=1, cex=0.7)
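Besides the plot, we can summarize the fit with a single number. The sketch below refits the same model so that it runs on its own, then computes the mean squared error and its square root. The names mse and rmse are illustrative, and because the predictions are made on the training data, these values are likely to be optimistic compared with the error on unseen data.

```r
library(mgcv)

boston = MASS::Boston
bgam = gam(medv ~ s(rm) + s(lstat) + ptratio + indus + crim + zn + age,
           data = boston)
pred = predict(bgam, newdata = boston[, -14])

# In-sample error metrics (computed on the training data)
mse = mean((boston$medv - pred)^2)
rmse = sqrt(mse)
cat("MSE:", mse, "RMSE:", rmse, "\n")
```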
In this tutorial, we've briefly learned how to use a gam model for a regression problem in R. The full source code is listed below.
Source code listing
library(mgcv)
library(corrplot)
boston = MASS::Boston
cors = cor(boston)
corrplot(cors, method="number")
bgam=gam(medv~s(rm)+s(lstat)+ptratio+indus+crim+zn+age, data=boston)
summary(bgam)
pred = predict(bgam, newdata = boston[,-14])
x = 1:nrow(boston)
plot(x, boston$medv, col="blue", type = "l")
lines(x, pred, col="red", type = "l" )
legend("bottomleft", legend=c("y-fitted", "y-original"),
col=c("red", "blue"), lty=1, cex=0.7)