Creating test data.
library(glmnet)

n <- 50
set.seed(123)
a <- sample(1:20, n, replace = TRUE)/10
b <- sample(1:10, n, replace = TRUE)/10
c <- sort(sample(1:10, n, replace = TRUE))
z <- (a*b)/2 + c + sample(-10:10, n, replace = TRUE)/10
df <- data.frame(a, b, c, z)
x <- as.matrix(df)[, -4]
y <- z
Cross-validation with cv.glmnet() to find a good lambda value. Setting alpha = 0 specifies ridge regression, and lambda controls the amount of shrinkage.
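For reference (a standard textbook formulation, not spelled out in the original post), ridge regression estimates the coefficients by minimizing the residual sum of squares plus an L2 penalty on the slopes:

\min_{\beta_0,\,\beta}\;\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^{p}\beta_j^2

glmnet minimizes an equivalent objective with the squared-error term rescaled by the number of observations, so its lambda values sit on a slightly different scale; larger lambda still means stronger shrinkage toward zero.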
ridge_cv <- cv.glmnet(x, y, family = "gaussian", alpha = 0)
summary(ridge_cv)
           Length Class  Mode     
lambda     99     -none- numeric  
cvm        99     -none- numeric  
cvsd       99     -none- numeric  
cvup       99     -none- numeric  
cvlo       99     -none- numeric  
nzero      99     -none- numeric  
name        1     -none- character
glmnet.fit 12     elnet  list     
lambda.min  1     -none- numeric  
lambda.1se  1     -none- numeric  
coef(ridge_cv)
4 x 1 sparse Matrix of class "dgCMatrix"
                     1
(Intercept) 0.73133456
a           0.04165009
b           0.09203077
c           0.91116936
best_lambda <- ridge_cv$lambda.min
cat(best_lambda)
0.3233251
plot(ridge_cv)
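The plot shows the cross-validated error across the lambda sequence, with lambda.min and lambda.1se marked. As an optional aside (my own addition, not part of the original post), the more conservative lambda.1se choice can be pulled from the same object; a minimal sketch:

# Sketch: the 1-SE rule picks a larger (more regularized) lambda
lambda_1se <- ridge_cv$lambda.1se
coef(ridge_cv, s = "lambda.1se")   # coefficients at the 1-SE lambda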
Fitting with the best lambda value.
ridge_mod <- glmnet(x, y, family = "gaussian", alpha = 0, lambda = best_lambda)
coef(ridge_mod)
4 x 1 sparse Matrix of class "dgCMatrix"
                    s0
(Intercept) 0.59480957
a           0.06545458
b           0.09290615
c           0.93056465
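As a side note (my own addition, not in the original post), fitting glmnet without a fixed lambda returns the full regularization path, which can be plotted to see how each coefficient shrinks as lambda grows:

# Sketch: coefficient paths over glmnet's default lambda sequence
path_mod <- glmnet(x, y, family = "gaussian", alpha = 0)
plot(path_mod, xvar = "lambda", label = TRUE)   # coefficients vs. log(lambda)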
Predicting the test data with the model and calculating the RMSE, R-squared, and MSE values.
pred <- predict(ridge_mod, x)

rmse <- sqrt(mean((pred - y)^2))
R2 <- 1 - (sum((y - pred)^2)/sum((y - mean(y))^2))
mse <- mean((y - pred)^2)

cat(" RMSE:", rmse, "\n", "R-squared:", R2, "\n", "MSE:", mse)
 RMSE: 0.5457135 
 R-squared: 0.9665464 
 MSE: 0.2978032
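For a rough baseline (my own addition, not part of the original post), the same error metric can be compared against an ordinary least-squares fit on the same data:

# Sketch: unpenalized OLS baseline for comparison
ols_mod  <- lm(z ~ a + b + c, data = df)
ols_pred <- predict(ols_mod, df)
cat("OLS RMSE:", sqrt(mean((ols_pred - y)^2)), "\n")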
Plotting the actual values (points) against the ridge predictions (red line).

plot(1:n, y, pch = 16)
lines(1:n, pred, type = "l", col = "red")
cbind(df, z_pred = as.vector(pred))
     a   b  c      z    z_pred
1  0.6 0.1  1  1.730  1.573938
2  1.6 0.5  1  1.400  1.676555
3  0.9 0.8  1  1.160  1.658608
4  1.8 0.2  1  0.680  1.661774
5  1.9 0.6  2  1.770  2.636046
6  0.1 0.3  2  1.815  2.490356
7  1.1 0.2  2  2.310  2.546520
8  1.8 0.8  2  2.120  2.648082
9  1.2 0.9  2  2.440  2.618100
10 1.0 0.4  3  2.600  3.489121
11 2.0 0.7  3  3.700  3.582447
12 1.0 0.1  3  2.750  3.461249
13 1.4 0.4  3  3.580  3.515302
14 1.2 0.3  4  3.880  4.423486
15 0.3 0.9  4  3.835  4.420320
16 1.8 0.5  4  4.550  4.481339
17 0.5 0.9  4  4.725  4.433411
18 0.1 0.9  4  3.445  4.407229
19 0.7 0.8  4  4.080  4.437211
20 2.0 0.5  5  5.000  5.424995
21 1.8 0.8  5  6.020  5.439776
22 1.4 0.7  5  4.790  5.404304
23 1.3 0.8  5  6.320  5.407049
24 2.0 0.1  5  5.600  5.387833
25 1.4 0.5  5  5.750  5.385722
26 1.5 0.3  6  6.425  6.304251
27 1.1 0.4  6  5.920  6.287360
28 1.2 0.7  6  6.520  6.321777
29 0.6 0.4  6  6.920  6.254633
30 0.3 0.2  7  7.230  7.146980
31 2.0 0.3  7  8.000  7.267543
32 1.9 0.7  7  7.265  7.298160
33 1.4 0.5  7  7.750  7.246852
34 1.6 0.8  7  7.140  7.287814
35 0.1 0.2  7  7.210  7.133889
36 1.0 0.5  8  8.250  8.151234
37 1.6 1.0  8  8.300  8.236960
38 0.5 0.9  8  8.325  8.155670
39 0.7 0.9  8  9.215  8.168761
40 0.5 0.2  9  9.850  9.021200
41 0.3 0.2  9  8.530  9.008109
42 0.9 0.7  9  8.915  9.093835
43 0.9 0.4  9 10.180  9.065963
44 0.8 0.7 10 10.580 10.017854
45 0.4 0.4 10 10.980  9.963800
46 0.3 0.2 10  9.930  9.938674
47 0.5 0.8 10 10.000 10.007508
48 1.0 0.1 10 10.350  9.975201
49 0.6 0.5 10  9.450  9.986182
50 1.8 0.6 10 10.740 10.074018
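Note that the RMSE, R-squared, and MSE above are computed on the same data the model was fitted to, so they are optimistic. A minimal sketch of a hold-out evaluation (my own addition, assuming a random 80/20 split, not part of the original post):

# Sketch: refit on a random 80% training split and score the held-out 20%
set.seed(123)
train_idx <- sample(1:n, size = round(0.8 * n))
cv_tr     <- cv.glmnet(x[train_idx, ], y[train_idx], family = "gaussian", alpha = 0)
fit_tr    <- glmnet(x[train_idx, ], y[train_idx], family = "gaussian",
                    alpha = 0, lambda = cv_tr$lambda.min)
test_pred <- predict(fit_tr, x[-train_idx, ])
cat("Hold-out RMSE:", sqrt(mean((test_pred - y[-train_idx])^2)), "\n")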