AI News, Machine Learning Blog Software Development News
- On Monday, June 4, 2018
I’ve been using Keras heavily and contributing to the project periodically for quite some time, and I definitely recommend it to anyone who wants to work on Deep Learning.
The default behavior of the Batch Normalization (BN) layer in Keras has changed over time; nevertheless, it still causes problems for many users, and as a result there are several related open issues on GitHub.
The idea behind transfer learning is that, since the pre-trained model was fit on images, the bottom convolutions can recognize features like lines, edges and other useful patterns, meaning you can either use its weights as good initialization values or partially retrain the network with your data.
Batch Normalization addresses the vanishing gradient problem by standardizing the output of the previous layer, speeds up training by reducing the number of required iterations, and enables the training of deeper neural networks.
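To illustrate what "standardizing the output of the previous layer" means, here is a minimal sketch in plain Python. It is not the Keras implementation: the function name `batch_norm`, the parameters `gamma`, `beta` and `eps`, and the sample values are assumptions chosen for illustration, and it handles a single feature only.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Standardize one feature across a mini-batch, then scale and shift it."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

# Hypothetical activations of the previous layer for one mini-batch.
batch = [10.0, 12.0, 14.0, 16.0]
normalized = batch_norm(batch)

# After standardization the feature has roughly zero mean and unit variance.
out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
```

Whatever the scale of the incoming activations, the next layer sees inputs centered around zero with a variance close to one.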
Even though it is not recommended, the user is also able to statically change the learning_phase to a specific value, but this needs to happen before any model or tensor is added to the graph.
Before v2.1.3, when the BN layer was frozen (trainable = False), it kept updating its batch statistics, which caused epic headaches for its users.
If we partially update its weights and the next layers are also frozen, they will never get the chance to adjust to the updates of the mini-batch statistics, leading to higher error.
Unfortunately, by doing so you get no guarantees that the mean and variance of your new dataset inside the BN layers will be similar to the ones of the original dataset.
Let’s assume that we fine-tune the model from Convolution k+1 up until the top of the network (right side) and we keep the bottom (left side) frozen.
It will also cause the rest of the network (from CONV k+1 and later) to be trained with inputs that have different scales compared to what it will receive during inference.
During training your network can adapt to these changes; nevertheless, the moment you switch to prediction mode, Keras will use different standardization statistics, which will shift the distribution of the inputs of the next layers, leading to poor results.
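To make this discrepancy concrete, here is a small sketch in plain Python rather than Keras. The moving statistics and the activation values are made-up numbers, and `standardize` and `batch_stats` are hypothetical helpers. It compares what the layers after a frozen BN layer would receive in train mode (mini-batch statistics) and in test mode (stored moving statistics).

```python
import math

def standardize(values, mean, var, eps=1e-3):
    """Apply the BN standardization with the given statistics."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def batch_stats(values):
    """Mean and biased variance of one mini-batch."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

# Hypothetical moving statistics stored in the frozen BN layer (original dataset).
moving_mean, moving_var = 0.0, 1.0

# Hypothetical activations produced by the new dataset, on a different scale.
new_batch = [4.0, 5.0, 6.0, 7.0, 8.0]

# Train mode: the layer standardizes with the statistics of the mini-batch.
train_out = standardize(new_batch, *batch_stats(new_batch))
# Test mode: the layer standardizes with the stored moving statistics.
test_out = standardize(new_batch, moving_mean, moving_var)

train_mean = sum(train_out) / len(train_out)
test_mean = sum(test_out) / len(test_out)
```

In this toy setup the train-mode outputs are centered around zero while the test-mode outputs are centered near six, so layers trained on the former receive very different inputs at prediction time.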
One way to detect it is to statically set the learning phase of Keras to 1 (train mode) and then to 0 (test mode), and evaluate your model in each case.
As you can see in the examples in the next subsections, the best way to do this is to start with a clean session and change the learning_phase before any tensor is defined in the graph.
If the accuracy is close to 50% but the AUC is close to 1 (and you also observe differences between train/test mode on the same dataset), it could be that the probabilities are out of scale due to the BN statistics.
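The combination of 50% accuracy and an AUC near 1 may look contradictory, so here is a toy illustration in plain Python. The labels and probabilities are invented, and `accuracy` and `auc` are small helper functions written for this sketch, not library calls. The scores rank the two classes perfectly, but they are all shifted above the 0.5 decision threshold.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples classified correctly at the given threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

def auc(labels, probs):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Invented, out-of-scale probabilities: the ranking is perfect,
# but every score lies above the 0.5 threshold.
labels = [0, 0, 0, 1, 1, 1]
probs = [0.90, 0.91, 0.92, 0.97, 0.98, 0.99]

acc = accuracy(labels, probs)
area = auc(labels, probs)
```

Here every sample is predicted as positive, so accuracy on the balanced set is 50%, while the AUC is 1 because the ordering of the scores is still correct.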
By applying the above fix, when a BN layer is frozen it will no longer use the mini-batch statistics but instead use the ones learned during training.
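The intended behavior can be sketched with a toy single-feature layer in plain Python. This is neither the Keras class nor the actual patch: `ToyBatchNorm`, its attributes and the sample batch are hypothetical, and the only point is the rule that mini-batch statistics are used when the layer is both in training mode and trainable.

```python
import math

class ToyBatchNorm:
    """Toy single-feature BN layer (hypothetical, for illustration only)."""

    def __init__(self, momentum=0.99, eps=1e-3):
        self.moving_mean, self.moving_var = 0.0, 1.0
        self.momentum, self.eps = momentum, eps
        self.trainable = True

    def __call__(self, values, training):
        if training and self.trainable:
            # Unfrozen layer in train mode: use and update mini-batch statistics.
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Frozen layer or test mode: use the stored moving statistics.
            mean, var = self.moving_mean, self.moving_var
        return [(v - mean) / math.sqrt(var + self.eps) for v in values]

layer = ToyBatchNorm()
layer.trainable = False  # freeze the layer

batch = [4.0, 6.0, 8.0]
frozen_train_out = layer(batch, training=True)
frozen_test_out = layer(batch, training=False)
```

With the layer frozen, the outputs in train mode and test mode are identical and the stored statistics stay untouched, which removes the train/test discrepancy.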
Even though I wrote the above implementation recently, the idea behind it has been heavily tested on real-world problems using various workarounds that have the same effect.
For example, the discrepancy between training and testing modes can be avoided by splitting the network in two parts (frozen and unfrozen) and performing cached training (passing data through the frozen model once and then using them to train the unfrozen network).
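A minimal sketch of the cached-training idea in plain Python follows. Everything in it is a stand-in: `frozen_part` is a hypothetical fixed transform playing the role of the frozen bottom of the network, `train_top` is a one-weight model playing the role of the unfrozen top, and the data are toy values.

```python
calls = {"frozen": 0}

def frozen_part(x):
    """Stand-in for the frozen bottom layers (hypothetical fixed transform)."""
    calls["frozen"] += 1
    return 2.0 * x + 1.0

def train_top(features, targets, epochs=20, lr=0.01):
    """Train a one-weight 'top' model on the cached features with plain SGD."""
    w = 0.0
    for _ in range(epochs):
        for f, t in zip(features, targets):
            w -= lr * (w * f - t) * f  # gradient step on the squared error
    return w

inputs = [0.0, 1.0, 2.0, 3.0]
targets = [3.0, 9.0, 15.0, 21.0]  # toy targets: 3 * frozen_part(x)

# Pass the data through the frozen part exactly once and cache the result.
cached_features = [frozen_part(x) for x in inputs]

# Train only the unfrozen part on the cached features, for as many epochs as needed.
w = train_top(cached_features, targets)
```

Because the frozen part is evaluated only once, in a single mode, the unfrozen part is trained on exactly the inputs it will see at prediction time. The frozen part is also not recomputed on every epoch.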
Here are a few important points about the experiment: The code for the experiment is shown below: Let’s check the results on Keras v2.1.5: As we can see above, during training the model learns the data very well and achieves near-perfect accuracy on the training set.
After the training is completed we evaluate the model using 3 different learning_phase configurations: Dynamic, Static = 0 (test mode) and Static = 1 (training mode).
As we can see, the first two configurations provide identical results in terms of loss and accuracy, and their values match the reported accuracy of the model on the validation set in the last iteration.
We will do 10 epochs to train the top classification layer using RMSprop and then we will do another 5 to fine-tune everything after the 139th layer using SGD(lr=1e-4, momentum=0.9).