AI News, The future of Artificial Intelligence – as imagined in 1989

Though they only hit the scene two years ago, Generative Adversarial Networks (GANs) have become the darlings of this year’s NIPS conference.

So far I’ve seen talks demonstrating their utility in everything from generating realistic images to predicting and filling in missing video segments, and synthesizing rooms, maps, and objects of various sorts.

They are even being applied to the world of high energy particle physics, pushing the state of the art of inference within the language of quantum field theory.

The discriminative model takes as input data from both the generative model and real data and tries to correctly distinguish between them.

The result (if everything goes well) is a generative model which, given some random inputs, will output data that appears to be a plausible sample from your dataset (e.g. cat faces).
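This adversarial loop can be sketched end-to-end on a 1-D toy problem. The following is a minimal sketch assuming tiny linear models with hand-derived gradients, not any particular article's implementation; all hyperparameters are illustrative:

```python
import numpy as np

# Toy GAN: generator G(z) = a*z + b, discriminator D(x) = sigmoid(w*x + c).
# Real data comes from N(2, 1); the generator must learn to match it.
rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0    # generator parameters
w, c = 0.1, 0.0    # discriminator parameters
lr, batch = 0.05, 64
real_mu = 2.0

for _ in range(3000):
    x_real = rng.normal(real_mu, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    d_r, d_f = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w -= lr * np.mean(-(1 - d_r) * x_real + d_f * x_fake)
    c -= lr * np.mean(-(1 - d_r) + d_f)

    # Generator step: the gradient flows through D, pushing D(fake) toward 1
    d_f = sigmoid(w * x_fake + c)
    a -= lr * np.mean(-(1 - d_f) * w * z)
    b -= lr * np.mean(-(1 - d_f) * w)

samples = a * rng.normal(0.0, 1.0, 10000) + b  # hopefully plausible "real" data
```

With everything going well, the sample mean drifts from 0 toward the real data's mean of 2 as the two players push against each other.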

Generative Adversarial Networks (GANs) in 50 lines of code (PyTorch)

This powerful technique seems like it must require a metric ton of code just to get started, right?

There are really only 5 components to think about: 1.) R: In our case, we’ll start with the simplest possible R — a bell curve.

This function takes a mean and a standard deviation and returns a function which provides the right shape of sample data from a Gaussian with those parameters.

2.) I: The input into the generator is also random, but to make our job a little bit harder, let’s use a uniform distribution rather than a normal one.
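In plain NumPy (the article itself builds these pieces in PyTorch), the two samplers described above might look like the following; the function names are hypothetical stand-ins:

```python
import numpy as np

# R: real-data sampler -- takes a mean and a standard deviation and returns
# a function that samples the right shape of data from that Gaussian
def get_distribution_sampler(mu, sigma):
    return lambda n: np.random.normal(mu, sigma, size=n)

# I: generator-input sampler -- uniform rather than normal noise, so the
# generator has to do some real work to reshape it into a bell curve
def get_generator_input_sampler():
    return lambda n: np.random.uniform(0.0, 1.0, size=n)
```

Calling `get_distribution_sampler(4.0, 1.25)` returns a function that, given a count `n`, yields `n` draws from N(4.0, 1.25).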

An introduction to Generative Adversarial Networks (with code in TensorFlow)

There has been a large resurgence of interest in generative models recently (see this blog post by OpenAI for example).

The prominent deep learning researcher and director of AI research at Facebook, Yann LeCun, recently cited GANs as being one of the most important new developments in deep learning: “There are many interesting recent developments in deep learning…The most important one, in my opinion, is adversarial training (also called GAN for Generative Adversarial Networks).”

Before looking at GANs, let’s briefly review the difference between generative and discriminative models. Both types of models are useful, but generative models have one interesting advantage over discriminative models – they can be trained on unlabelled data.

This is very desirable when working on data modelling problems in the real world, as unlabelled data is of course abundant, but getting labelled data is often expensive at best and impractical at worst.

These two networks play a continuous game, where the generator is learning to produce more and more realistic samples, and the discriminator is learning to get better and better at distinguishing generated data from real data.

The analogy that is often used here is that the generator is like a forger trying to produce some counterfeit material, and the discriminator is like the police trying to detect the forged items.

This setup may also seem somewhat reminiscent of reinforcement learning, where the generator receives a reward signal from the discriminator letting it know whether the generated data is plausible or not.

The key difference with GANs however is that we can backpropagate gradient information from the discriminator back to the generator network, so the generator knows how to adapt its parameters in order to produce output data that can fool the discriminator.
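To make that backpropagation path concrete, here is a toy check, assuming a hypothetical linear generator and logistic discriminator, that the analytic generator gradient obtained by differentiating through D matches a finite-difference estimate:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Toy models (all numbers illustrative):
# generator G(z) = a*z + b, discriminator D(x) = sigmoid(w*x + c)
w, c = 1.5, -0.5
a, b = 0.8, 0.2
z = np.array([0.1, 0.4, 0.9])

def gen_loss(a, b, z):
    # non-saturating generator loss: -log D(G(z))
    return -np.log(sigmoid(w * (a * z + b) + c)).mean()

# Analytic gradient, backpropagated through D into G:
# d/da [-log D(G(z))] = -(1 - D(G(z))) * w * z
d = sigmoid(w * (a * z + b) + c)
grad_a = (-(1.0 - d) * w * z).mean()
grad_b = (-(1.0 - d) * w).mean()

# Finite-difference check of the same gradients
eps = 1e-5
num_a = (gen_loss(a + eps, b, z) - gen_loss(a - eps, b, z)) / (2 * eps)
num_b = (gen_loss(a, b + eps, z) - gen_loss(a, b - eps, z)) / (2 * eps)
```

The agreement between the analytic and numeric gradients is exactly what lets the generator know how to adapt its parameters to fool the discriminator.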

They are now producing excellent results in image generation tasks, generating images that are significantly sharper than those from other leading generative models trained with maximum likelihood objectives.

In this case we found that it was important to make the discriminator more powerful than the generator, as otherwise it did not have sufficient capacity to learn to distinguish accurately between generated and real samples.

It is not entirely clear how to generalise this to bigger problems however, and even in the simple case, it may be hard to guarantee that our generator distribution will always reach a point where early stopping makes sense.

The problem of the generator collapsing to a parameter setting where it outputs a very narrow distribution of points is “one of the main failure modes” of GANs according to a recent paper by Tim Salimans and collaborators at OpenAI.

In the paper, minibatch discrimination is defined to be any method where the discriminator is able to look at an entire batch of samples in order to decide whether they come from the generator or the real data.

The method can be loosely summarized as follows: each sample is given extra features measuring how similar it is to the other samples in its minibatch, so a collapsed batch of near-identical samples becomes easy for the discriminator to flag as fake. We implemented the proposed minibatch discrimination technique to see if it would help with the collapse of the generator output distribution in our toy example.
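The minibatch-discrimination computation can be sketched in NumPy as follows — simplified in that distances are computed on raw features rather than through the learned tensor projection used in the paper:

```python
import numpy as np

def minibatch_features(h):
    """h: (batch, features) activations. Returns, per sample, a closeness
    score to the rest of the minibatch; a collapsed (near-identical) batch
    yields large scores the discriminator can use to flag it as generated."""
    # pairwise L1 distances between all rows of the minibatch
    dists = np.abs(h[:, None, :] - h[None, :, :]).sum(axis=2)
    closeness = np.exp(-dists)
    # sum closeness over the other samples, excluding self (distance 0 -> 1)
    return closeness.sum(axis=1) - 1.0
```

A batch of identical samples maximizes these scores, which is how a discriminator equipped with them learns to punish generator collapse.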

With generative models that are based on maximum likelihood training, we can usually produce some metric based on likelihood (or some lower bound to the likelihood) of unseen test data, but that is not applicable here.

Some GAN papers have produced likelihood estimates based on kernel density estimates from generated samples, but this technique seems to break down in higher dimensional spaces.
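For illustration, here is a 1-D version of such a kernel density estimate with SciPy — precisely the low-dimensional regime where, per the caveat above, the technique still behaves; the data here is a synthetic stand-in, not output from any actual GAN:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
generated = rng.normal(0.0, 1.0, 2000)    # stand-in for samples from a model
held_out = rng.normal(0.0, 1.0, 200)      # stand-in for unseen real test data

kde = gaussian_kde(generated)             # fit a kernel density to the samples
avg_loglik = kde.logpdf(held_out).mean()  # likelihood-style evaluation metric
```

In high dimensions the kernel bandwidth needed to cover the space blows up, which is why this estimate stops being meaningful there.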

How do GANs intuitively work?

Our whole GAN architecture will work like this: “The generator will try to generate fake images that fool the discriminator into thinking that they’re real.

And the discriminator will try to distinguish between real and generated images as best it can whenever an image is fed to it.” They both get stronger together until the discriminator cannot distinguish between the real and the generated images anymore.

The discriminator becomes inaccurate because the generator produces face images so realistic that they seem to be actually real.

As the official GAN paper puts it, in the ideal optimal state the generator will know how to generate realistic face images and the discriminator will know what faces are composed of.

We train the generator and the discriminator in alternating turns, so that they grow stronger together and neither network becomes significantly stronger than the other.

Suppose that the generator generated one image and the discriminator thought this image has a 0.40 probability of being real. How should the generator tweak its generated image to increase that probability to, say, 0.41?

The answer is: in order to train the generator, the discriminator has to tell the generator how to tweak the generated image to make it more realistic.
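A toy numeric version of that 0.40 → 0.41 tweak, assuming a hypothetical four-pixel “image” and a linear-logistic discriminator: the gradient of D with respect to the image is exactly the signal telling the generator which direction makes the image look more real.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Made-up tiny discriminator over a 4-pixel "image": D(x) = sigmoid(w . x)
w = np.array([0.5, -0.3, 0.2, 0.1])
x = np.array([-0.9, 0.2, 0.1, 0.85])  # the generated "image"

p = sigmoid(w @ x)                    # D's belief the image is real (~0.40)

# dD/dx = D(1-D) * w points in the direction that raises D's output
grad_x = p * (1.0 - p) * w
x_tweaked = x + 0.5 * grad_x          # take a small step in that direction
p_new = sigmoid(w @ x_tweaked)        # belief rises slightly (~0.41)
```

The generator never sees the real data directly; it only ever follows this gradient handed back by the discriminator.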

An intuitive introduction to Generative Adversarial Networks (GANs)

The discriminator starts by receiving a 32x32x3 image tensor.

Each convolutional layer works by halving the feature map’s spatial dimensions while doubling the number of learned filters.
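Under that halve-the-resolution, double-the-filters rule, the feature-map shapes can be traced in a few lines; the starting filter count of 64 is illustrative, not taken from the article:

```python
# Trace the discriminator's feature-map shapes layer by layer:
# each conv layer halves the spatial size and doubles the filter count.
shape = (32, 32, 3)   # input image: height, width, channels
filters = 64          # illustrative width of the first conv layer
for _ in range(3):
    shape = (shape[0] // 2, shape[1] // 2, filters)
    filters *= 2

print(shape)  # (4, 4, 256)
```

After three such layers a 32x32x3 input has been squeezed into a 4x4x256 feature map, ready to be flattened into a single real/fake score.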

In the same way, every time the discriminator notices a difference between the real and fake images, it sends a signal to the generator.

By receiving it, the generator is able to adjust its parameters to get closer to the true data distribution.

The first is composed only of real images that come from the training set, and the second contains only fake images — the ones created by the generator.

We want the discriminator to output probabilities close to 1 for real images and near 0 for fake images.

First, the generator does not know how to create images that resemble the ones from the training set.

And second, the discriminator does not know how to categorize the images it receives as real or fake.

As training progresses, the generator starts to output images that look closer to the images from the training set.

That happens because the generator is learning the data distribution that underlies the training set images.

These models have the potential of unlocking unsupervised learning methods that would expand ML to new horizons.

Also, take a look at Semi-supervised learning with GANs for an application of GANs to semi-supervised learning, and at my deep learning blog.

An Alternative Update Rule for Generative Adversarial Networks

It is mentioned in the original GAN paper (Goodfellow et al, 2014) that the algorithm can be interpreted as minimising Jensen-Shannon divergence under some ideal conditions.

To whet your appetite for the explanation and lightweight maths to come, here is a pretty animation of a generative model minimising KL divergence from some Swiss-roll toy data (more explanation to follow):

The theory that connects GANs to JS divergence assumes that this optimisation is performed exactly, until the discrimination error reaches convergence, and that $D$ is sufficiently flexible so it can represent the Bayes-optimal classifier.

In the original algorithm the generator directly tries to minimise the classification accuracy of $D$: $$\theta_{t+1} \leftarrow \theta_t - \epsilon_t \frac{\partial}{\partial\theta} \mathbb{E}_{z\sim \mathcal{N}} \log \left(1 - D\left(G(z;\theta_t);\psi_{t+1}\right)\right) $$

The authors noted that the update rule in the M-step does not work very well in practice, because the gradients are very close to zero initially, when the discriminator has an easy job separating $P$ and $Q$.

Here, I'm going to talk about a third variant which combines the alternative M-step above with the original M-step to obtain another meaningful update rule of the following form: $$\theta_{t+1} \leftarrow \theta_t + \epsilon_t \frac{\partial}{\partial\theta} \mathbb{E}_{z\sim \mathcal{N}} \log \frac{D\left(G(z;\theta_t);\psi_{t+1}\right)}{1 - D\left(G(z;\theta_t);\psi_{t+1}\right)} $$

We know that for any generative model $Q$, the theoretical optimal discriminator between $P$ and $Q$ is given by this formula (assuming equal class priors): $$D^{*}(x) = \frac{P(x)}{P(x) + Q(x)} $$ If we assume that after the D-step our discriminator $D$ is close to the Bayes-optimal $D^{*}$, so that $\frac{1 - D(x)}{D(x)} \approx \frac{Q(x)}{P(x)}$, the following approximation holds: $$\operatorname{KL}[Q\|P] = \mathbb{E}_{x\sim Q} \log\frac{Q(x)}{P(x)} \approx \mathbb{E}_{x\sim Q} \log\frac{1 - D(x)}{D(x)} $$
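A quick numeric sanity check of this idea, using two known 1-D Gaussians as toy stand-ins for the data distribution $P$ and the model $Q$, so that both the Bayes-optimal discriminator and the true KL divergence are available in closed form:

```python
import numpy as np

def gauss_pdf(x, mu):
    # unit-variance Gaussian density
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

mu_p, mu_q = 0.0, 0.5               # P = N(0,1) "data", Q = N(0.5,1) "model"
rng = np.random.default_rng(0)
x = rng.normal(mu_q, 1.0, 200000)   # samples from Q

# Bayes-optimal discriminator (equal class priors): D*(x) = P(x)/(P(x)+Q(x))
p, q = gauss_pdf(x, mu_p), gauss_pdf(x, mu_q)
d_star = p / (p + q)

# Monte-Carlo estimate of KL[Q||P] from discriminator outputs alone,
# using (1 - D*(x)) / D*(x) = Q(x) / P(x)
kl_est = np.mean(np.log((1.0 - d_star) / d_star))

# Closed form for two unit-variance Gaussians: KL = (mu_q - mu_p)^2 / 2
kl_true = 0.5 * (mu_q - mu_p) ** 2
```

The estimate matches the closed form to within Monte-Carlo noise, showing that a near-optimal discriminator really does hand us the probability ratio needed for the KL-minimising update.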

There is a whole area of research on direct importance estimation or direct probability-ratio estimation which is looking at ways these probability ratios can be estimated from data samples.

The reason this connection between GANs and direct probability ratio estimation may not be immediately obvious is because these older papers cared primarily about the problem of domain adaptation, not so much about generative modelling.

However, we also know convnets are not-so-good at out-of-sample predictions, and they also tend to be pretty bad at representing the gradients of decision functions, as evidenced by adversarial examples (Goodfellow et al.).

If $Q$ and $P$ are concentrated on different manifolds, the classification problem between them becomes trivial, and there are infinitely many classifiers achieving optimal Bayes risk, and the Jensen-Shannon divergence is always exactly 1 bit irrespective of parameters of $G$.

The GAN idea still makes sense at a high level, partly because we limit the classifiers to functions that can be modelled by ConvNets, but it's probably important to keep this limitation of the theoretical analysis in mind.

Generative Adversarial Nets - Fresh Machine Learning #2

This episode of Fresh Machine Learning is all about a relatively new concept called a Generative Adversarial Network. A model continuously tries to fool another ...

Lecture 13 | Generative Models

In Lecture 13 we move beyond supervised learning, and discuss generative modeling as a form of unsupervised learning. We cover the autoregressive ...

DALI 2017 - Workshop - Theory of Generative Adversarial Networks - Introduction

DALI 2017 Workshop on Theory of Generative Adversarial Networks Organizers: Sebastian ..

Latent Space Human Face Synthesis | Two Minute Papers #191

The paper "Optimizing the Latent Space of Generative Networks" is available here: Khan Academy's video on the Nash ..

How to Generate Video - Intro to Deep Learning #15

Generative Adversarial Networks. It's time. We're going to use a Deep Convolutional GAN to generate images of the alien language from the movie Arrival that ...

Gradient Descent GAN Optimization is Locally Stable

Vaishnavh Nagarajan, J. Zico Kolter. NIPS 2017. Abstract: Despite the growing prominence of generative ..

The Future of Deep Learning Research

Back-propagation is fundamental to deep learning. Hinton (the inventor) recently said we should "throw it all away and start over". What should we do?

CppCon 2017: Peter Goldsborough “A Tour of Deep Learning With C++”

— Presentation Slides, PDFs, Source Code and other presenter materials are available at: — Deep .

Dr. Yann LeCun, "How Could Machines Learn as Efficiently as Animals and Humans?"

Brown Statistics, NESS Seminar and Charles K. Colver Lectureship Series Deep learning has caused revolutions in computer perception and natural language ...