Batch Normalization is a technique that normalizes the activations between the layers of a neural network to improve training speed and, through its regularizing effect, accuracy. It is intended to reduce the internal covariate shift in neural networks.
Internal covariate shift means that when the first layer changes its parameters based on back-propagation feedback, the second layer has to adjust to the new distribution of the first layer's outputs, the third layer to the second's, and so on. This constant readjustment destabilizes the learning process of all subsequent layers and slows down training, especially in networks with many layers. Batch Normalization is used to overcome this issue.
Batch Normalization works well when training on image data, and it is widely used in training Generative Adversarial Network (GAN) models.
In this tutorial, we'll learn how to apply Batch Normalization to deep learning networks with Keras in Python. The tutorial covers:
- Normalization
- Preparing the data
- Building the model
- Comparing the training results
- Source code listing
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Flatten, Dropout
from keras.layers import BatchNormalization
from keras.datasets import mnist
from keras.optimizers import RMSprop
import matplotlib.pyplot as plt
Normalization
Normalization is a method to rescale input data into a common range so that no single value dominates simply because of its magnitude. One common approach standardizes the data to zero mean and unit standard deviation; another, used in the example below, rescales each sample to unit norm so that all values fall between -1 and 1. The example below shows how to normalize the data with scikit-learn and what the values look like after normalization.
import sklearn.preprocessing as prep

data = [[10, 321, -22, 3210, 23, -321]]
norm = prep.normalize(data)
print(norm)
[[ 0.00308441 0.09900951 -0.0067857 0.99009512 0.00709414 -0.09900951]]
Here, the data values are scaled into the range between -1 and 1. Working with values on a common scale speeds up model training, and Batch Normalization applies the same idea inside the network: the activations of a layer are standardized over each mini-batch, which is why the technique is called Batch Normalization. A short sketch of that transformation follows.
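To make the idea concrete, here is a minimal NumPy sketch (not part of the original listing) of the transformation a Batch Normalization layer applies to one mini-batch of activations; the constant eps and the parameters gamma and beta are named here only for illustration, with gamma and beta being learned during training in a real layer.

import numpy as np

# A toy mini-batch of activations: 4 samples, 3 features
x = np.array([[1.0, 50.0, -3.0],
              [2.0, 60.0, -1.0],
              [3.0, 55.0, -2.0],
              [4.0, 65.0, -4.0]])

eps = 1e-3                                # small constant for numerical stability
mean = x.mean(axis=0)                     # per-feature mean over the batch
var = x.var(axis=0)                       # per-feature variance over the batch
x_hat = (x - mean) / np.sqrt(var + eps)   # standardized activations

# gamma (scale) and beta (shift) are learnable; initialized to 1 and 0 here
gamma, beta = np.ones(3), np.zeros(3)
y = gamma * x_hat + beta

print(x_hat.mean(axis=0))   # approximately 0 for every feature
print(x_hat.std(axis=0))    # approximately 1 for every feature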
Preparing the data
For this tutorial, we'll use the 'mnist' dataset. We'll start by loading the dataset and checking the shape of the training set.
(trainX, trainY), (testX, testY) = mnist.load_data()
print(trainX.shape)
(60000, 28, 28)
To keep the training process light, I'll use only part of the dataset.
trainX = trainX[1:8001,]
trainY = trainY[1:8001,]
testX = testX[1:201,]
testY = testY[1:201,]
Next, we'll reshape the training data and convert the output data into a categorical type.
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))
trainY = to_categorical(trainY)
testY = to_categorical(testY)
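As a quick sanity check (this step is not in the original listing), we can print the shapes: the slicing above leaves 8,000 training and 200 test images of size 28x28x1, and to_categorical turns each digit label into a one-hot vector of length 10.

print(trainX.shape, trainY.shape)   # (8000, 28, 28, 1) (8000, 10)
print(testX.shape, testY.shape)     # (200, 28, 28, 1) (200, 10)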
Building the model
In Keras, we can easily implement Batch Normalization by adding the BatchNormalization() layer into the model.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
model.add(BatchNormalization())
...
We'll write a function that trains the model either with or without Batch Normalization. When the bn parameter is set to True, the function adds BatchNormalization layers to the model. After training, it returns the training history of the model, so we can compare the histories of the two approaches.
def build_model(trainX, trainY, testX, testY, bn=False):
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
    model.add(Conv2D(64, (3, 3), activation="relu"))
    if bn:
        model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(0.2))
    if bn:
        model.add(BatchNormalization())
    model.add(Dense(10, activation="softmax"))
    model.compile(loss="categorical_crossentropy",
                  optimizer=RMSprop(),
                  metrics=["accuracy"])
    print(model.summary())

    history = model.fit(trainX, trainY, epochs=30, batch_size=16,
                        validation_data=(testX, testY), verbose=0)
    _, acc = model.evaluate(testX, testY, verbose=0)
    if bn:
        print("Accuracy with BN: ", acc)
    else:
        print("Accuracy without BN: ", acc)
    return history
Comparing the training results
Next, we'll train on the 'mnist' data with the above function. First, we call the function without Batch Normalization. It takes a few minutes to train the model on a CPU. After training finishes, we can check the model summary and its accuracy.
model_hist = build_model(trainX, trainY, testX, testY)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_9 (Conv2D)            (None, 26, 26, 32)        320
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 24, 24, 64)        18496
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 12, 12, 64)        0
_________________________________________________________________
dropout_9 (Dropout)          (None, 12, 12, 64)        0
_________________________________________________________________
flatten_5 (Flatten)          (None, 9216)              0
_________________________________________________________________
dense_9 (Dense)              (None, 128)               1179776
_________________________________________________________________
dropout_10 (Dropout)         (None, 128)               0
_________________________________________________________________
dense_10 (Dense)             (None, 10)                1290
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
None
Accuracy without BN:  0.91
Next, we'll call the function with Batch Normalization enabled.
model_hist_bn = build_model(trainX, trainY, testX, testY, bn=True)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_13 (Conv2D)           (None, 26, 26, 32)        320
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 24, 24, 64)        18496
_________________________________________________________________
batch_normalization_9 (Batch (None, 24, 24, 64)        256
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 12, 12, 64)        0
_________________________________________________________________
dropout_13 (Dropout)         (None, 12, 12, 64)        0
_________________________________________________________________
flatten_7 (Flatten)          (None, 9216)              0
_________________________________________________________________
dense_13 (Dense)             (None, 128)               1179776
_________________________________________________________________
dropout_14 (Dropout)         (None, 128)               0
_________________________________________________________________
batch_normalization_10 (Batc (None, 128)               512
_________________________________________________________________
dense_14 (Dense)             (None, 10)                1290
=================================================================
Total params: 1,200,650
Trainable params: 1,200,266
Non-trainable params: 384
_________________________________________________________________
None
Accuracy with BN:  0.975
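The extra parameters in this summary come from the two BatchNormalization layers: in Keras, each normalized feature carries four parameters, a learnable gamma and beta plus a non-trainable moving mean and moving variance. The worked arithmetic below accounts for the numbers in the summary above.

64 * 4 = 256           parameters in batch_normalization_9  (after the 64-filter Conv2D)
128 * 4 = 512          parameters in batch_normalization_10 (after the 128-unit Dense)
(64 + 128) * 2 = 384   non-trainable parameters (the moving means and variances)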
Finally, we'll visualize both training results in a plot.
f = plt.figure()

f.add_subplot(1, 2, 1)
plt.title("Train without Batch Normalization")
plt.plot(model_hist.history['acc'], label='train')
plt.plot(model_hist.history['val_acc'], label="test")
plt.legend()

f.add_subplot(1, 2, 2)
plt.title("Train with Batch Normalization")
plt.plot(model_hist_bn.history['acc'], label='train')
plt.plot(model_hist_bn.history['val_acc'], label="test")
plt.legend()

plt.show()
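Note that the history keys depend on the Keras version: older standalone Keras records 'acc' and 'val_acc' as used above, while newer Keras and tf.keras record 'accuracy' and 'val_accuracy'. A small helper like the hypothetical acc_key below, which simply picks whichever key exists, keeps the plotting code working in either case.

def acc_key(hist, validation=False):
    # Return the accuracy key present in this Keras version's history dict.
    key = 'val_acc' if validation else 'acc'
    if key not in hist.history:
        key = 'val_accuracy' if validation else 'accuracy'
    return key

plt.plot(model_hist.history[acc_key(model_hist)], label='train')
plt.plot(model_hist.history[acc_key(model_hist, validation=True)], label='test')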
In this tutorial, we've briefly learned about Batch Normalization and how to apply it to neural networks in Keras.
Source code listing
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Flatten, Dropout
from keras.layers import BatchNormalization
from keras.datasets import mnist
from keras.optimizers import RMSprop
import matplotlib.pyplot as plt

(trainX, trainY), (testX, testY) = mnist.load_data()
print(trainX.shape)

trainX = trainX[1:8001,]
trainY = trainY[1:8001,]
testX = testX[1:201,]
testY = testY[1:201,]

trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))
trainY = to_categorical(trainY)
testY = to_categorical(testY)

def build_model(trainX, trainY, testX, testY, bn=False):
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
    model.add(Conv2D(64, (3, 3), activation="relu"))
    if bn:
        model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation="relu"))
    model.add(Dropout(0.2))
    if bn:
        model.add(BatchNormalization())
    model.add(Dense(10, activation="softmax"))
    model.compile(loss="categorical_crossentropy",
                  optimizer=RMSprop(),
                  metrics=["accuracy"])
    print(model.summary())

    history = model.fit(trainX, trainY, epochs=30, batch_size=16,
                        validation_data=(testX, testY), verbose=0)
    _, acc = model.evaluate(testX, testY, verbose=0)
    if bn:
        print("Accuracy with BN: ", acc)
    else:
        print("Accuracy without BN: ", acc)
    return history

model_hist = build_model(trainX, trainY, testX, testY)
model_hist_bn = build_model(trainX, trainY, testX, testY, bn=True)

f = plt.figure()

f.add_subplot(1, 2, 1)
plt.title("Train without Batch Normalization")
plt.plot(model_hist.history['acc'], label='train')
plt.plot(model_hist.history['val_acc'], label="test")
plt.legend()

f.add_subplot(1, 2, 2)
plt.title("Train with Batch Normalization")
plt.plot(model_hist_bn.history['acc'], label='train')
plt.plot(model_hist_bn.history['val_acc'], label="test")
plt.legend()

plt.show()
Reference
Sergey Ioffe and Christian Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", 2015