AI News, Understanding deep learning requires re-thinking generalization
- On Sunday, September 30, 2018
Understanding deep learning requires re-thinking generalization
Understanding deep learning requires re-thinking generalization, Zhang et al., ICLR'17

This paper has a wonderful combination of properties: the results are easy to understand, somewhat surprising, and then leave you pondering over what it all might mean for a long while afterwards!
Generalisation is the difference between just memorising portions of the training data and parroting them back, and actually developing some meaningful intuition about the dataset that can be used to make predictions.
The experiments use the CIFAR-10 (50,000 training images split across 10 classes, 10,000 validation images) and ILSVRC (ImageNet) 2012 (1,281,167 training images, 50,000 validation images, 1,000 classes) datasets, and variations of the Inception network architecture.
Here is a key observation from this first experiment: if you take the network trained on random labels and then see how well it performs on the test data, it of course doesn't do well at all, because it hasn't truly learned anything about the dataset.
In other words, by randomizing labels alone we can force the generalization error of a model to jump up considerably without changing the model, its size, hyperparameters, or the optimizer.
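To make the setup concrete, here is a minimal sketch of the label-randomization experiment, assuming PyTorch and torchvision are available. It uses a small convnet rather than the paper's Inception variants, so only the qualitative effect should be expected: training accuracy climbs towards 100% on the meaningless labels while test accuracy stays near chance.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()
train = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=transform)
test = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=transform)

# Replace every training label with a uniformly random class (the "random labels" setting).
train.targets = torch.randint(0, 10, (len(train.targets),)).tolist()

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 8 * 8, 512), nn.ReLU(), nn.Linear(512, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

@torch.no_grad()
def accuracy(ds):
    loader = torch.utils.data.DataLoader(ds, batch_size=512)
    correct = sum((model(x).argmax(1) == y).sum().item() for x, y in loader)
    return correct / len(ds)

# Many epochs (and realistically a GPU) are needed to drive training error to zero.
for epoch in range(50):
    for x, y in torch.utils.data.DataLoader(train, batch_size=128, shuffle=True):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

print("train acc:", accuracy(train))  # approaches 1.0 despite meaningless labels
print("test  acc:", accuracy(test))   # stays near 0.1 (chance level)
```

The only change from an ordinary training run is the single line that overwrites the labels; everything else about the learning problem is left untouched, which is exactly the point of the experiment.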
A hypothesis for why this happens is that images made of random pixels are well separated from each other, whereas under random labels, images that originally all belonged to the same class must now be learned as belonging to different classes because of the label swaps.
In a related experiment, the labels are corrupted with some probability p, ranging from no noise to complete noise: the networks still reach zero training error, while test error degrades gracefully as the noise level rises. This shows that neural networks are able to capture the remaining signal in the data, while at the same time fitting the noisy part by brute force.
So maybe we need a way to tease apart the true potential for generalisation that exists in the dataset, and how efficient a given model architecture is at capturing this latent potential.
We show that explicit forms of regularization, such as weight decay, dropout, and data augmentation, do not adequately explain the generalization error of neural networks: Explicit regularization may improve generalization performance, but is neither necessary nor by itself sufficient for controlling generalization error.
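As a rough illustration of how such an ablation can be wired up (a sketch assuming PyTorch and torchvision; the tiny model and hyperparameters are placeholders, not the paper's configuration), each explicit regularizer is just a switch on the data pipeline, the model, or the optimizer:

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

def make_setup(weight_decay=True, dropout=True, augmentation=True):
    # Data augmentation: random crops and horizontal flips, as in common CIFAR-10 pipelines.
    aug = [T.RandomCrop(32, padding=4), T.RandomHorizontalFlip()] if augmentation else []
    transform = T.Compose(aug + [T.ToTensor()])  # pass to the training dataset

    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Dropout(0.5 if dropout else 0.0),
        nn.Linear(64 * 16 * 16, 10),
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                          weight_decay=5e-4 if weight_decay else 0.0)
    return transform, model, opt

# Train each configuration with the same loop and compare test accuracy; the claim
# is that the gap between "all regularizers on" and "all off" is far too small to
# account for the generalization behaviour being explained.
for wd in (True, False):
    for do in (True, False):
        for aug in (True, False):
            transform, model, opt = make_setup(wd, do, aug)
            # ... train and evaluate as in the previous sketch ...
```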
The paper also appeals to the implicit regularization performed by SGD itself: for linear models, SGD always converges to the minimum-norm solution. Though this doesn't explain why certain architectures generalize better than others, it does suggest that more investigation is needed to understand exactly what properties are inherited by models trained using SGD.
There exists a two-layer neural network with ReLU activations and 2n + d weights that can represent any function on a sample of size n in d dimensions.
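The construction behind this result is simple enough to write down numerically. Below is a sketch (assuming NumPy; the function name fit_two_layer_relu is ours) that builds a width-n ReLU layer sharing a single random direction a, chooses biases that interleave the sorted projections, and solves a triangular linear system for the output weights, giving exactly 2n + d parameters that interpolate any n targets.

```python
import numpy as np

def fit_two_layer_relu(X, y, seed=0):
    """Construct f(x) = sum_j w_j * relu(a.x - b_j): a two-layer ReLU network
    with 2n + d weights that exactly interpolates the n given targets."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    a = rng.normal(size=d)                       # shared first-layer direction: d weights
    z = X @ a
    order = np.argsort(z)
    zs = z[order]
    assert np.all(np.diff(zs) > 0), "projections collide; re-draw a"
    b = np.empty(n)                              # n hidden-unit biases
    b[0] = zs[0] - 1.0
    b[1:] = (zs[:-1] + zs[1:]) / 2.0
    # The activation matrix max(z_i - b_j, 0) is lower-triangular with a positive
    # diagonal, so output weights solving A w = y always exist.
    A = np.maximum(zs[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])             # n output weights
    return lambda Xnew: np.maximum(Xnew @ a[:, None] - b[None, :], 0.0) @ w

# Demo: perfectly memorise 50 arbitrary labels of 10-dimensional points.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
y = rng.integers(0, 10, size=50).astype(float)
f = fit_two_layer_relu(X, y)
print(np.abs(f(X) - y).max())   # ~0 up to floating-point error
```

Running the demo prints an interpolation error at floating-point precision, which is the sense in which even a very small network can memorize arbitrary labels.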
This situation poses a conceptual challenge to statistical learning theory as traditional measures of model complexity struggle to explain the generalization ability of large artificial neural networks.
- On Tuesday, March 26, 2019
How to Make an Image Classifier - Intro to Deep Learning #6
We're going to make our own Image Classifier for cats & dogs in 40 lines of Python! First we'll go over the history of image classification, then we'll dive into the ...
Lecture 15 | Efficient Methods and Hardware for Deep Learning
In Lecture 15, guest lecturer Song Han discusses algorithms and specialized hardware that can be used to accelerate training and inference of deep learning ...
Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property
Authors: Garrett Goh (Pacific Northwest National Laboratory); Charles Siegel (Pacific Northwest National Laboratory); Abhinav Vishnu (Pacific Northwest ...
How to Make a Text Summarizer - Intro to Deep Learning #10
I'll show you how you can turn an article into a one-sentence summary in Python with the Keras machine learning library. We'll go over word embeddings, ...
Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
Authors: Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, Lizhen Qu. Abstract: We present a theoretically grounded approach to train deep neural networks ...
How to Learn from Little Data - Intro to Deep Learning #17
One-shot learning! In this last weekly video of the course, I'll explain how memory augmented neural networks can help achieve one-shot classification for a ...
How to Design a Convolutional Neural Network | Lecture 8
Designing a good model usually involves a lot of trial and error. It is still more of an art than a science. The tricks and design patterns that I present in this video are ...
Lecture 10 - Neural Networks
Neural Networks - A biologically inspired model. The efficient backpropagation learning algorithm. Hidden layers. Lecture 10 of 18 of Caltech's Machine ...
One-hot Encoding explained
In this video, we discuss what one-hot encoding is, how this encoding is used in machine learning and artificial neural networks, and what is meant by having ...
Cost-Effective Training of Deep CNNs with Active Model Adaptation
Authors: Sheng-Jun Huang (NUAA); Jia-Wei Zhao (NUAA); Zhao-Yang Liu (NUAA) Abstract: Deep convolutional neural networks have achieved great success ...