In this tutorial, we'll learn how to build a variational autoencoder (VAE) and generate images with it in R. A classical autoencoder simply learns how to encode the input and decode the output from a compressed latent representation of the given data. Because that latent representation is an arbitrary code rather than a learned distribution, we cannot improve its generative ability by updating distribution parameters during training.
A variational autoencoder, on the other hand, takes a statistical approach: the encoder learns the mean and the log variance of a latent distribution, and the latent vector is sampled from that distribution. Since the latent mean and variance are trainable parameters that get updated during learning, the model produces a much better generator. The tutorial covers,
- Preparing the data
- Defining the encoder
- Defining the VAE model
- Defining generator
- Training the model
- Generating images
- Source code listing
In my previous posts, we learned how to create classical autoencoders with simple dense and convolutional layers in R. You can check them at the links below.
How to Build Simple Autoencoder with Keras in R
Convolutional Autoencoder Example with Keras in R
Let's get started by loading the Keras packages for R.
library(keras)
Preparing the data
We'll use the MNIST handwritten digit dataset to train the VAE. After loading it, we'll scale the pixel values into the range of [0, 1]. VAE training requires only the input data, so we focus on the x part of the dataset.
c(c(xtrain, ytrain), c(xtest, ytest)) %<-% dataset_mnist()
print(dim(xtrain))
input_size = dim(xtrain)[2]*dim(xtrain)[3]
x_train = xtrain/255
x_test = xtest/255
latent_size = 10
[1] 60000 28 28
The input size comes from the image dimensions (28 × 28 = 784), and the latent vector size will be used in the model later. Next, we'll reshape each image into a flat vector of length input_size.
x_train <- array_reshape(x_train, c(nrow(x_train), input_size))
x_test <- array_reshape(x_test, c(nrow(x_test), input_size))
print(dim(x_train))
[1] 60000 784
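As a quick sanity check (this snippet is my own addition, not part of the original flow), we can plot one of the raw digits before building the model to see what the VAE will try to reconstruct.
# Optional check: display the first training digit as a 28x28 grayscale image
plot(as.raster(xtrain[1, , ] / 255))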
Defining the encoder
We'll start by defining the encoder. After the first dense layer, we'll extract the mean and log variance of the latent distribution. From those two parameters we can later build a z layer that the decoder uses to reconstruct the input image.
enc_input <- layer_input(shape = c(input_size))
layer_one <- layer_dense(enc_input, units = 256, activation = "relu")
z_mean <- layer_dense(layer_one, latent_size)
z_log_var <- layer_dense(layer_one, latent_size)
encoder <- keras_model(enc_input, z_mean)
summary(encoder)
Model: "model_1"
_________________________________________________________________________________
Layer (type)                 Output Shape        Param #
=================================================================================
input_2 (InputLayer)         [(None, 784)]       0
_________________________________________________________________________________
dense_3 (Dense)              (None, 256)         200960
_________________________________________________________________________________
dense_4 (Dense)              (None, 10)          2570
=================================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________________________
The latent space sampling function samples from the learned distribution by using the mean and log variance, and returns a sampled latent vector. The decoder uses this vector to generate images.
sampling <- function(arg){
   # split the concatenated tensor back into mean and log variance
   z_mean <- arg[, 1:(latent_size)]
   z_log_var <- arg[, (latent_size + 1):(2 * latent_size)]
   # reparameterization trick: z = mean + sigma * epsilon, with epsilon ~ N(0, 1)
   epsilon <- k_random_normal(shape = c(k_shape(z_mean)[[1]]), mean = 0)
   z_mean + k_exp(z_log_var/2)*epsilon
}
z <- layer_concatenate(list(z_mean, z_log_var)) %>%
layer_lambda(sampling)
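To make the sampling step more concrete, here is a tiny base-R sketch (my own illustration, independent of Keras) of the same reparameterization computation on plain numeric vectors.
# Reparameterization trick on plain vectors: z = mean + exp(log_var / 2) * epsilon
set.seed(1)
toy_mean <- rnorm(latent_size)           # pretend these came from the encoder
toy_log_var <- rnorm(latent_size, sd = 0.1)
epsilon <- rnorm(latent_size)            # fresh standard normal noise on every call
toy_z <- toy_mean + exp(toy_log_var / 2) * epsilon
print(toy_z)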
Defining the VAE model
Next, we'll define the decoder layers that take the z vector as input. The VAE model connects the encoder input to the final decoded output layer.
decoder_layer <- layer_dense(units = 256, activation = "relu")
decoder_mean <- layer_dense(units = input_size, activation = "sigmoid")
h_decoded <- decoder_layer(z)
x_decoded_mean <- decoder_mean(h_decoded)
vae <- keras_model(enc_input, x_decoded_mean)
summary(vae)
Model: "model_2"
_________________________________________________________________________________
Layer (type)                 Output Shape        Param #     Connected to
=================================================================================
input_2 (InputLayer)         [(None, 784)]       0
_________________________________________________________________________________
dense_3 (Dense)              (None, 256)         200960      input_2[0][0]
_________________________________________________________________________________
dense_4 (Dense)              (None, 10)          2570        dense_3[0][0]
_________________________________________________________________________________
dense_5 (Dense)              (None, 10)          2570        dense_3[0][0]
_________________________________________________________________________________
concatenate (Concatenate)    (None, 20)          0           dense_4[0][0]
                                                             dense_5[0][0]
_________________________________________________________________________________
lambda (Lambda)              (None, 10)          0           concatenate[0][0]
_________________________________________________________________________________
dense_6 (Dense)              (None, 256)         2816        lambda[0][0]
_________________________________________________________________________________
dense_7 (Dense)              (None, 784)         201488      dense_6[0][0]
=================================================================================
Total params: 410,404
Trainable params: 410,404
Non-trainable params: 0
_________________________________________________________________________________
The custom VAE loss combines the reconstruction loss (binary cross-entropy between the input and the decoded output) with the KL divergence that pushes the learned latent distribution toward a standard normal.
vae_loss <- function(input, x_decoded_mean){
   # reconstruction loss, scaled by the number of pixels
   xent_loss = (input_size/1.0)*loss_binary_crossentropy(input, x_decoded_mean)
   # KL divergence between N(z_mean, exp(z_log_var)) and N(0, 1)
   kl_loss = -0.5*k_mean(1 + z_log_var - k_square(z_mean) - k_exp(z_log_var), axis = -1)
   xent_loss + kl_loss
}
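As a plain-R sanity check (not part of the model code), the KL term above is zero when the learned distribution exactly matches a standard normal and grows as the mean and variance drift away from it.
# Plain-R version of the KL term used in vae_loss
kl_term <- function(m, log_var) -0.5 * mean(1 + log_var - m^2 - exp(log_var))
kl_term(rep(0, latent_size), rep(0, latent_size))   # 0: matches N(0, 1), no penalty
kl_term(rep(2, latent_size), rep(1, latent_size))   # positive: penalized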
Finally, we'll compile the model with the rmsprop optimizer and the custom loss function defined above.
vae %>% compile(optimizer = "rmsprop", loss = vae_loss)
Defining generator
The generator model produces images from latent vectors, such as the encoded data, by reusing the decoder layers.
dec_input <- layer_input(shape = latent_size)
h_decoded_2 <- decoder_layer(dec_input)
x_decoded_mean_2 <- decoder_mean(h_decoded_2)
generator <- keras_model(dec_input, x_decoded_mean_2)
summary(generator)
Model: "model_3"
_________________________________________________________________________________
Layer (type)                 Output Shape        Param #
=================================================================================
input_3 (InputLayer)         [(None, 10)]        0
_________________________________________________________________________________
dense_6 (Dense)              (None, 256)         2816
_________________________________________________________________________________
dense_7 (Dense)              (None, 784)         201488
=================================================================================
Total params: 204,304
Trainable params: 204,304
Non-trainable params: 0
_________________________________________________________________________________
Training the model
Finally, we'll train the VAE model on the training data.
vae %>% fit(
x_train, x_train,
shuffle = TRUE,
epochs = 20,
batch_size = 64,
validation_data = list(x_test, x_test)
)
If you get an error like the one below at this stage, you may need to disable eager execution with the command shown after the error message.
Error in py_call_impl(callable, dots$args, dots$keywords) :
RuntimeError: in user code:
C:\Users\abcd\AppData\Local\r-miniconda\envs\r-reticulate\lib\site-packages\tensorflow\python\keras\engine\training.py:571 train_function *
outputs = self.distribute_strategy.run(
.....
tensorflow::tf$compat$v1$disable_eager_execution()
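This workaround usually needs to run before any layers are built, so if you hit the error, a safe approach is to place it near the top of the script and re-run everything. A minimal sketch of the placement, assuming the tensorflow R package is installed:
# Run the workaround right after loading the packages, before defining any layers
library(keras)
tensorflow::tf$compat$v1$disable_eager_execution()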
Generating images
Now, we can encode and generate images by using the models above. Here, I use the first 10 images of the x_test data.
n = 10
test = x_test[1:n, ]
x_test_encoded <- predict(encoder, test)
decoded_imgs = generator %>% predict(x_test_encoded)
pred_images = array_reshape(decoded_imgs, dim = c(dim(decoded_imgs)[1], 28, 28))
orig_images = array_reshape(test, dim = c(dim(test)[1], 28, 28))
We'll check the generated images visually. The left column shows the original images and the right column shows the generated ones.
op = par(mfrow = c(n, 2), mar = c(1, 0, 0, 0))
for (i in 1:n)
{
   plot(as.raster(orig_images[i,,]))
   plot(as.raster(pred_images[i,,]))
}
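Since the KL term pushes the latent space toward a standard normal distribution, we can also decode random latent vectors into brand-new digits. This extra snippet is my own addition and simply reuses the generator defined above.
# Optional extra: generate new digits from random latent vectors
m = 10
random_z = matrix(rnorm(m * latent_size), nrow = m, ncol = latent_size)
random_imgs = generator %>% predict(random_z)
random_imgs = array_reshape(random_imgs, dim = c(m, 28, 28))
par(mfrow = c(2, 5), mar = c(0, 0, 0, 0))
for (i in 1:m) plot(as.raster(random_imgs[i, , ]))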
In this tutorial, we've briefly learned how to build a VAE model and generate images with it in R. The full source code is listed below.
Source code listing
library(keras)
c(c(xtrain, ytrain), c(xtest, ytest)) %<-% dataset_mnist()
print(dim(xtrain))
input_size = dim(xtrain)[2]*dim(xtrain)[3]
x_train = xtrain/255
x_test = xtest/255
latent_size = 10
x_train <- array_reshape(x_train, c(nrow(x_train), input_size))
x_test <- array_reshape(x_test, c(nrow(x_test), input_size))
print(dim(x_train))
enc_input <- layer_input(shape = c(input_size))
layer_one <- layer_dense(enc_input, units = 256, activation = "relu")
z_mean <- layer_dense(layer_one, latent_size)
z_log_var <- layer_dense(layer_one, latent_size)
encoder <- keras_model(enc_input, z_mean)
summary(encoder)
sampling <- function(arg){
   z_mean <- arg[, 1:(latent_size)]
   z_log_var <- arg[, (latent_size + 1):(2 * latent_size)]
   epsilon <- k_random_normal(shape = c(k_shape(z_mean)[[1]]), mean = 0)
   z_mean + k_exp(z_log_var/2)*epsilon
}
z <- layer_concatenate(list(z_mean, z_log_var)) %>% layer_lambda(sampling)
decoder_layer <- layer_dense(units = 256, activation = "relu")
decoder_mean <- layer_dense(units = input_size, activation = "sigmoid")
h_decoded <- decoder_layer(z)
x_decoded_mean <- decoder_mean(h_decoded)
vae <- keras_model(enc_input, x_decoded_mean)
summary(vae)
vae_loss <- function(input, x_decoded_mean){
   xent_loss = (input_size/1.0)*loss_binary_crossentropy(input, x_decoded_mean)
   kl_loss = -0.5*k_mean(1 + z_log_var - k_square(z_mean) - k_exp(z_log_var), axis = -1)
   xent_loss + kl_loss
}
vae %>% compile(optimizer = "rmsprop", loss = vae_loss)
summary(vae)
dec_input <- layer_input(shape = latent_size)
h_decoded_2 <- decoder_layer(dec_input)
x_decoded_mean_2 <- decoder_mean(h_decoded_2)
generator <- keras_model(dec_input, x_decoded_mean_2)
summary(generator)
vae %>% fit(
   x_train, x_train,
   shuffle = TRUE,
   epochs = 20,
   batch_size = 64,
   validation_data = list(x_test, x_test)
)
n = 10
test = x_test[1:n, ]
x_test_encoded <- predict(encoder, test)
decoded_imgs = generator %>% predict(x_test_encoded)
pred_images = array_reshape(decoded_imgs, dim = c(dim(decoded_imgs)[1], 28, 28))
orig_images = array_reshape(test, dim = c(dim(test)[1], 28, 28))
op = par(mfrow = c(n, 2), mar = c(1, 0, 0, 0))
for (i in 1:n)
{
   plot(as.raster(orig_images[i,,]))
   plot(as.raster(pred_images[i,,]))
}