tensorflow
Training deep neural networks is complicated by the fact that the distribution of each layer's inputs changes during training,
as the parameters of the previous layers change. This slows down training by requiring lower
learning rates and careful parameter initialization, and makes it notoriously hard to train models
with saturating nonlinearities. Batch Normalization addresses this problem, which is closely
related to the vanishing/exploding gradients problem. The technique consists of zero-centering
and normalizing the inputs of a layer over each mini-batch, then scaling and shifting the result
with two learnable parameters. This sequence of operations is inserted just before the activation
function of each layer of the deep neural network.
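The transform described above can be sketched in a few lines of NumPy (a minimal illustration of the math, not the TensorFlow implementation; `gamma` and `beta` stand for the learnable scale and shift parameters):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Zero-center and normalize each feature over the mini-batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Scale and shift with the learnable parameters gamma and beta
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))  # a mini-batch of 64 examples
out = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
```

With `gamma = 1` and `beta = 0`, each feature of the output has (approximately) zero mean and unit variance, regardless of the input distribution; during training the network learns the scale and shift that suit each layer.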
According to the original paper by Sergey Ioffe and Christian Szegedy of Google Inc. (arXiv:1502.03167
[cs.LG]), Batch Normalization achieves the same accuracy with 14 times fewer training steps, and
beats the original model by a significant margin. Using an ensemble of batch-normalized networks,
they improved upon the best published result on ImageNet classification, reaching 4.9% top-5 validation
error (and 4.8% test error), exceeding the accuracy of human raters.
Here I will show you how to implement Batch Normalization on the MNIST dataset using
TensorFlow.
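As a preview of the kind of model built in the notebook, here is a minimal sketch (my own illustration, not the notebook's exact code) of a dense MNIST classifier with a `BatchNormalization` layer inserted before each activation, using the Keras API:

```python
import tensorflow as tf

# Dense network for 28x28 MNIST images, with Batch Normalization
# placed before each activation, as described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, use_bias=False),  # bias is redundant before BN's shift
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(64, use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The model can then be trained with `model.fit` on the MNIST data from `tf.keras.datasets.mnist.load_data()`. Note that `use_bias=False` is an optional simplification: the `beta` shift of the following Batch Normalization layer makes a separate bias term unnecessary.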
view notebook