# AI News, Batch Normalization in Neural Networks

- On Sunday, September 30, 2018

## Batch Normalization in Neural Networks

If the input layer benefits from normalization, why not do the same for the values in the hidden layers, which change throughout training, and get a 10-fold or greater improvement in training speed?

In other words, if an algorithm has learned a mapping from X to Y and the distribution of X changes, we may need to retrain it so that it fits the new distribution of X. This problem is known as covariate shift.

To increase the stability of a neural network, batch normalization normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation.

Batch normalization therefore adds two trainable parameters to each layer: the normalized output is multiplied by a “standard deviation” parameter (gamma), and a “mean” parameter (beta) is added to the result.
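The normalization step and the gamma/beta step described above can be sketched in NumPy as follows. This is a minimal training-time forward pass under stated assumptions: the function name `batch_norm_forward`, the `eps` stabilizer, and the example shapes are illustrative and not taken from the article, and the sketch omits the running statistics that frameworks keep for inference.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm over a (batch, features) array.

    eps is an assumed small constant that guards against division by zero.
    """
    batch_mean = x.mean(axis=0)   # per-feature mean over the batch
    batch_var = x.var(axis=0)     # per-feature (biased) variance over the batch
    x_hat = (x - batch_mean) / np.sqrt(batch_var + eps)  # normalize
    return gamma * x_hat + beta   # scale by gamma, then shift by beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # hypothetical activations
gamma = np.ones(4)   # assumed initial value: no rescaling
beta = np.zeros(4)   # assumed initial value: no shift
y = batch_norm_forward(x, gamma, beta)
```

With gamma set to one and beta set to zero, each feature of `y` has approximately zero mean and unit standard deviation. During training, gamma and beta are updated by the optimizer like any other weights.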

[Figure from the original batch-norm paper]

Batch normalization and pre-trained networks like VGG: VGG doesn’t have a batch norm layer in it because batch normalization had not yet been introduced when VGG was published.

If we insert a batch norm layer into a pre-trained network, it changes the activations that the pre-trained weights were trained on, because the layer subtracts the mean and divides by the standard deviation of those activations. We don’t want that to happen, because the pre-trained weights need to keep behaving as they did before the layer was added.
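One way to add the layer while leaving the pre-trained behavior intact, which the passage above does not spell out, is to initialize gamma and beta so that the new layer initially undoes its own normalization. The NumPy sketch below illustrates both the problem and that workaround. The names `batch_norm_forward`, `eps`, and `acts` are hypothetical, and a real framework would estimate the statistics over the training data rather than a single batch.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm over a (batch, features) array."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# Hypothetical activations produced by a pre-trained layer.
acts = rng.normal(loc=2.0, scale=4.0, size=(128, 3))

# Default initialization (gamma=1, beta=0) rescales the activations,
# so later pre-trained weights would see inputs they were not trained on.
changed = batch_norm_forward(acts, np.ones(3), np.zeros(3))

# Assumed workaround: set gamma to the activation standard deviation and
# beta to the activation mean, so the layer starts out as an identity map.
eps = 1e-5
gamma = np.sqrt(acts.var(axis=0) + eps)
beta = acts.mean(axis=0)
restored = batch_norm_forward(acts, gamma, beta, eps)

default_is_identity = np.allclose(changed, acts)    # False
matched_is_identity = np.allclose(restored, acts)   # True
```

Starting from this identity-like state, gamma and beta can then be fine-tuned without disturbing the rest of the pre-trained network at the outset.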

- On Wednesday, February 19, 2020

**Lecture 6 | Training Neural Networks I**

In Lecture 6 we discuss many practical issues for training modern neural networks. We discuss different activation functions, the importance of data ...

**Layers - Keras**

Here I talk about Layers, the basic building blocks of Keras. Layers are essentially little functions that are stateful - they generally have weights associated with ...

**Lecture 7 | Training Neural Networks II**

Lecture 7 continues our discussion of practical issues for training neural networks. We discuss different update rules commonly used to optimize neural networks ...

**Lecture 3 | Loss Functions and Optimization**

Lecture 3 continues our discussion of linear classifiers. We introduce the idea of a loss function to quantify our unhappiness with a model's predictions, and ...

**Lesson 5: Practical Deep Learning for Coders**

INTRO TO NLP AND RNNS We start by combining everything we've learned so far to see what that buys us; and we discover that we get a Kaggle-winning result ...

**TensorFlow Hub (TensorFlow Dev Summit 2018)**

Andrew Gasparovic and Jeremiah Harmsen discuss TF Hub, a new library built to foster the publication, discovery, and consumption of reusable parts of machine ...

**Lecture 16 | Adversarial Examples and Adversarial Training**

In Lecture 16, guest lecturer Ian Goodfellow discusses adversarial examples in deep learning. We discuss why deep networks and other machine learning ...

**Lecture 9 | CNN Architectures**

In Lecture 9 we discuss some common architectures for convolutional neural networks. We discuss architectures which performed well in the ImageNet ...

**Lesson 2: Practical Deep Learning for Coders**

CONVOLUTIONAL NEURAL NETWORKS For last week's assignment your goal was to get into the top 50% of the Kaggle Dogs v Cats competition. This week ...

**Keras Tutorial TensorFlow | Deep Learning with Keras | Building Models with Keras | Edureka**

TensorFlow Training - This Edureka Keras Tutorial TensorFlow video ...