What are the benefits of batch normalization?
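Batch normalization makes the model less sensitive to hyperparameter choices. Higher learning rates become usable, which speeds up training. Weight initialization matters less, so it is easier to get right. A wider range of non-linear activation functions becomes practical, because normalized inputs are less likely to fall in an activation's saturated region. Deep neural networks become easier to train. It also introduces mild regularization, since each example is normalized with the statistics of its mini-batch, which adds a small amount of noise.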
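
To make the mechanism behind these benefits concrete, here is a minimal sketch of the training-time forward pass in NumPy. The function name `batch_norm_forward`, the input shapes, and the sample data are illustrative assumptions, not any particular library's API. It shows that whatever the scale of the incoming activations, each feature leaves the layer with roughly zero mean and unit variance, before the learnable scale `gamma` and shift `beta` are applied.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm for a (batch, features) array.

    Hypothetical helper for illustration only.
    """
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize each feature
    return gamma * x_hat + beta              # learnable scale and shift

# Assumed toy input: activations with a large mean and spread.
rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=(64, 4))

gamma = np.ones(4)   # identity scale
beta = np.zeros(4)   # zero shift
y = batch_norm_forward(x, gamma, beta)

print(x.mean(axis=0), x.std(axis=0))  # far from 0 and 1
print(y.mean(axis=0), y.std(axis=0))  # close to 0 and 1
```

The sketch covers only the training-time computation. At inference, implementations typically replace the batch statistics with running averages collected during training.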