Variational Inference: Bayesian Neural Networks

Probabilistic Programming at scale

Probabilistic Programming allows very flexible creation of custom probabilistic models and is mainly concerned with insight and learning from your data.

The approach is inherently Bayesian so we can specify priors to inform and constrain our models and get uncertainty estimation in form of a posterior distribution.

Using MCMC sampling algorithms we can draw samples from this posterior to very flexibly estimate these models.

Variational inference algorithms take a different route: instead of drawing samples from the posterior, they fit a distribution (e.g. a normal) to the posterior, turning a sampling problem into an optimization problem.
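To make the contrast concrete, here is a minimal toy sketch (not from the original post) of estimating the same PyMC3 model either by MCMC sampling or by fitting a variational approximation:

import numpy as np
import pymc3 as pm

data = np.random.randn(100)                   # toy observations

with pm.Model():
    mu = pm.Normal('mu', 0., 10.)             # prior belief about the unknown mean
    pm.Normal('obs', mu=mu, sd=1., observed=data)
    trace = pm.sample(1000)                   # MCMC: draw samples from the posterior
    approx = pm.fit(10000, method='advi')     # VI: fit a distribution to the posterior instead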

Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like ensemble learning (e.g. random forests or gradient boosted regression trees).

Deep Learning

Now in its third renaissance, deep learning has been making headlines repeatedly by dominating almost any object recognition benchmark, kicking ass at Atari games, and beating the world-champion Lee Sedol at Go.

From a statistical point of view, Neural Networks are extremely good non-linear function approximators and representation learners.

While mostly known for classification, they have been extended to unsupervised learning with AutoEncoders and in all sorts of other interesting ways (e.g. recurrent networks, or Mixture Density Networks (MDNs) to estimate multimodal distributions).

A large part of the innovation in deep learning is the ability to train these extremely complex models.

Improved learning algorithms play a central role here: training on sub-sets of the data -- stochastic gradient descent -- allows us to train these models on massive amounts of data.

A lot of innovation comes from changing the input layers, like for convolutional neural nets, or the output layers, like for MDNs.

Bridging Deep Learning and Probabilistic Programming

On one hand we have Probabilistic Programming, which allows us to build rather small and focused models in a very principled and well-understood way to gain insight into our data;

on the other hand we have deep learning which uses many heuristics to train huge and highly complex models that are amazing at prediction.

Recent innovations in variational inference allow probabilistic programming to scale model complexity as well as data size.

While this would allow Probabilistic Programming to be applied to a much wider set of interesting problems, I believe this bridging also holds great promise for innovations in Deep Learning.

Transfer learning with informed priors: If we wanted to train a network on a new object recognition data set, we could bootstrap the learning by placing informed priors centered around weights retrieved from other pre-trained networks, like GoogLeNet.
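A rough sketch of the idea (a toy logistic layer; the pre-trained weights and data here are stand-ins, not GoogLeNet):

import numpy as np
import pymc3 as pm

rng = np.random.RandomState(0)
X = rng.randn(100, 5)                        # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy binary labels
w_pretrained = rng.randn(5)                  # stand-in for weights from a pre-trained network

with pm.Model() as transfer_model:
    # Prior centered on the pre-trained weights; a small sd keeps the new
    # weights close to them unless the data strongly disagrees.
    w = pm.Normal('w', mu=w_pretrained, sd=0.1, shape=5)
    p = pm.math.sigmoid(pm.math.dot(X, w))
    pm.Bernoulli('obs', p=p, observed=y)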

Hierarchical Neural Networks: A very powerful approach in Probabilistic Programming is hierarchical modeling that allows pooling of things that were learned on sub-groups to the overall population (see my tutorial on Hierarchical Linear Regression in PyMC3).

Applied to Neural Networks, in hierarchical data sets, we could train individual neural nets to specialize on sub-groups while still being informed about representations of the overall population.

For example, imagine a network trained to classify car models from pictures of cars.

We could train a hierarchical neural network where a sub-neural network is trained to tell apart models from only a single manufacturer.

The intuition being that all cars from a certain manufacturer share certain similarities, so it would make sense to train individual networks that specialize on brands.

However, due to the individual networks being connected at a higher layer, they would still share information with the other specialized sub-networks about features that are useful to all brands.

The early layers that extract visual lines could be identical in all sub-networks, while the higher-order representations would be different.
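As a simplified sketch of the partial-pooling idea (a hierarchical logistic regression with per-manufacturer weights rather than a full network; all names and data are toy placeholders):

import numpy as np
import pymc3 as pm

n_manufacturers, n_features = 3, 5
rng = np.random.RandomState(0)
X = rng.randn(300, n_features)
manufacturer_idx = rng.randint(0, n_manufacturers, size=300)  # which brand each car belongs to
y = (X[:, 0] > 0).astype(int)                                 # toy labels

with pm.Model() as hierarchical_model:
    # Population-level prior shared by all manufacturers
    mu_w = pm.Normal('mu_w', 0., 1., shape=n_features)
    sigma_w = pm.HalfNormal('sigma_w', 1.)
    # Manufacturer-specific weights drawn from the shared prior,
    # so each sub-model is informed by what the others have learned.
    w = pm.Normal('w', mu=mu_w, sd=sigma_w, shape=(n_manufacturers, n_features))
    p = pm.math.sigmoid((X * w[manufacturer_idx]).sum(axis=1))
    pm.Bernoulli('obs', p=p, observed=y)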

Other hybrid architectures are conceivable as well: Bayesian non-parametrics, for example, could be used to flexibly adjust the size and shape of the hidden layers to optimally scale the network architecture to the problem at hand during training.

Variational Inference: Scaling model complexity

We could now just run an MCMC sampler like NUTS, which works pretty well in this case, but as I already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.

Instead, we will use the ADVI variational inference algorithm, which was recently added to PyMC3 and updated to use the operator variational inference (OPVI) framework.
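In PyMC3 this boils down to a single fit call; a sketch (the neural_network model itself is defined in the original notebook and not shown here):

with neural_network:                            # model definition not shown here
    inference = pm.ADVI()                       # mean-field variational approximation
    approx = pm.fit(n=30000, method=inference)  # maximize the ELBO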

As samples are more convenient to work with, we can very quickly draw samples from the variational approximation using the sample method (this is just sampling from Normal distributions, so not at all the same as MCMC):
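A sketch of what that looks like (reusing the approx object from the fit above):

trace = approx.sample(draws=5000)   # draws from the fitted Normals, not MCMC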

We can plot the ELBO over iterations to check that the optimization has converged:

plt.plot(-inference.hist)
plt.ylabel('ELBO')
plt.xlabel('iteration');

Now that we have trained our model, let's predict on the hold-out set using a posterior predictive check (PPC).
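A sketch of the PPC step (assuming, as in the original notebook, that the data live in shared variables ann_input/ann_output and that the observed variable is named 'out'):

ann_input.set_value(X_test)                  # swap in the hold-out data
ann_output.set_value(y_test)
with neural_network:
    ppc = pm.sample_ppc(trace, samples=500)  # posterior predictive samples
pred = ppc['out'].mean(axis=0) > 0.5         # classify by mean predicted probability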

Let's look at what the classifier has learned

For this, we evaluate the class probability predictions on a grid over the whole input space.
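A sketch of that grid evaluation (2-D inputs assumed; grid bounds and names are placeholders):

import numpy as np

grid = np.mgrid[-3:3:100j, -3:3:100j].astype('float32')
grid_2d = grid.reshape(2, -1).T
ann_input.set_value(grid_2d)
ann_output.set_value(np.zeros(grid_2d.shape[0], dtype='float32'))  # dummy labels
with neural_network:
    ppc_grid = pm.sample_ppc(trace, samples=500)
prob_grid = ppc_grid['out'].mean(axis=0)     # mean class probability at each grid point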

The mean of the posterior predictive for each class-label should be identical to the maximum likelihood predicted values.

You can imagine that associating predictions with uncertainty is a critical property for many applications like health care.
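One simple way to quantify that uncertainty is the spread of the posterior predictive; a sketch reusing the ppc_grid from above:

uncertainty_grid = ppc_grid['out'].std(axis=0)   # high values mark regions the model is unsure about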

To further maximize accuracy, we might want to train the model primarily on samples from that high-uncertainty region.

Mini-batch ADVI

Moreover, training on mini-batches of data (stochastic gradient descent) avoids local minima and can lead to faster convergence. Fortunately, ADVI can be run on mini-batches as well.
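A sketch of how that setup might look (X_train/y_train are placeholders for the training data):

minibatch_x = pm.Minibatch(X_train, batch_size=50)   # a fresh random mini-batch each step
minibatch_y = pm.Minibatch(y_train, batch_size=50)
# The model would then be built on minibatch_x / minibatch_y, passing
# total_size=len(y_train) to the observed variable so the likelihood
# is scaled correctly for the full data set.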

I also think bridging the gap between Probabilistic Programming and Deep Learning can open up many new avenues for innovation in this space, as discussed above.

Next steps

Theano, which is used by PyMC3 as its computational backend, was mainly developed for estimating neural networks, and there are great libraries like Lasagne that build on top of Theano to make construction of the most common neural network architectures easy.

You might argue that the above network isn't really deep, but note that we could easily extend it to have more layers, including convolutional ones to train on more challenging data sets, as demonstrated here.
