AI News: Bias Correction of Learned Generative Models using Likelihood ...

Microsoft Research Blog

Even state-of-the-art models have noticeable deficiencies in some of their generated samples: image models of faces produce artifacts in hair textures and makeup, text models often require repeated attempts to generate coherent completions of sentences or paragraphs, and so on.

To do this, we consider any non-negative weighting function \(w_\phi\) and combine it with our base model to induce an energy-based model with density \( p_{\theta,\phi}(x) \propto p_\theta(x)\, w_\phi(x) \). This model is an instantiation of a product-of-experts (PoE) model, as it boosts a base (normalized) model \(p_\theta\) multiplicatively using a weighting function \(w_\phi\).

If the weighting function corresponds to the ratio of the data density to the model density (that is, \( w_\phi(x) = p_{\mathrm{data}}(x) / p_\theta(x) \) for all \(x\)), then the energy-based model recovers the data distribution (that is, \( p_{\theta,\phi}(x) = p_{\mathrm{data}}(x) \)).
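
Substituting this choice of weighting function into the product-of-experts density makes the cancellation explicit:

\[ p_{\theta,\phi}(x) \;\propto\; p_\theta(x)\,\frac{p_{\mathrm{data}}(x)}{p_\theta(x)} \;=\; p_{\mathrm{data}}(x). \]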

Computing this density ratio directly is not feasible: the data density (the numerator) is unavailable, and the model density (the denominator) is often intractable in practice, as in variational autoencoders, generative adversarial networks, and many other generative models.
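
A common likelihood-free way to estimate such a ratio is to train a binary classifier to distinguish data samples from model samples; for a balanced, well-trained classifier, its odds recover \(p_{\mathrm{data}}(x)/p_\theta(x)\). The sketch below is a minimal PyTorch illustration of that idea; the names (Discriminator, train_ratio_estimator, importance_weights) are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn

# Hypothetical discriminator distinguishing data samples (label 1) from model samples (label 0).
class Discriminator(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, x):
        return self.net(x)  # one logit per sample


def train_ratio_estimator(data_x, model_x, epochs=200, lr=1e-3):
    """Fit a classifier whose odds approximate w(x) = p_data(x) / p_theta(x)."""
    disc = Discriminator(data_x.shape[1])
    opt = torch.optim.Adam(disc.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    x = torch.cat([data_x, model_x])
    y = torch.cat([torch.ones(len(data_x), 1), torch.zeros(len(model_x), 1)])
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(disc(x), y).backward()
        opt.step()
    return disc


def importance_weights(disc, x):
    """For the Bayes-optimal classifier on balanced classes,
    sigmoid(logit) = p_data / (p_data + p_theta), so exp(logit) = p_data / p_theta."""
    with torch.no_grad():
        return torch.exp(disc(x)).squeeze(-1)
```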

One application is model-based off-policy policy evaluation on MuJoCo environments: weighting the contributions of simulated trajectories under the dynamics model (learned from off-policy data) leads to better estimates of the value of the policy of interest, as sketched below.
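
As a rough sketch of how such weights could enter the estimate, the snippet below forms a self-normalized, importance-weighted average of simulated returns (hypothetical numbers and function name; the per-trajectory weights would come from a ratio estimator like the one sketched above):

```python
import torch

def weighted_policy_value(simulated_returns, weights):
    """Self-normalized importance-weighted estimate of the policy's value.

    simulated_returns: discounted returns of trajectories rolled out under the
                       learned dynamics model (one scalar per trajectory).
    weights:           estimated density ratios for those trajectories; rollouts
                       the model likely simulated poorly get small weight.
    """
    w = weights / weights.sum()
    return (w * simulated_returns).sum()


# Hypothetical numbers: the second rollout is heavily down-weighted,
# so it barely distorts the value estimate of the policy of interest.
returns = torch.tensor([12.3, 8.7, 15.1])
weights = torch.tensor([0.9, 0.2, 1.4])
print(weighted_policy_value(returns, weights))
```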

While the proposed technique can correct for the model bias, the datasets used for training could also be biased (as is the case when the training dataset is scraped from Internet sites, such as Reddit), and our follow-up work uses similar techniques to mitigate dataset bias for achieving fairness in generative modeling.

How We Can Learn from the Brain to Learn How the Brain Learns

And so the data scientist’s quest is as old as life on Earth itself: all life forms have been extracting knowledge, in one way or another, from the stream of data coming in from the environment to serve their goal of survival.

The quest of science is likewise to extract knowledge from the world, albeit in a more painstakingly deliberate process, which roughly splits into two elements: building models of the world and comparing those models to data.

In the first stage, which can be called the encoder stage, we extract some kind of representation or model from data, which we hope reflects some real (causal, probabilistic, etc.) structure behind the data.

They make predictions about incredibly complex processes, such as “What is this person I met a couple of minutes ago going to do next?” This implies that our brains build a model of every person we meet, integrate it into pre-existing models of what a person is, fine-tune it with several data modalities (how does this person look/speak/smell/move?), and then use this new approximate model of the person to make predictions in real time about his or her behavior, or to classify them quickly as friend or foe.

But as the current state of the fields of machine learning and artificial intelligence shows, doing these kinds of things with a computer takes longer, consumes far more resources, and needs more data.

In a gloriously unvicious circle, we can look at the subject of our study itself for guidance: we can learn from the brain how the brain learns, finding inspiration on how to improve and structure our algorithms that in turn help us analyze and model brain data and behavioral data better (I will give a more concrete example of this soon).

They can capture profound and non-trivial structures and probability distributions behind data and can be used to efficiently generate predictions (for a more technical introduction and examples of hierarchical models, read the intro from Penny and Henson here).

As part of Karl Friston’s famous and controversial Bayesian Brain Hypothesis, he and his colleagues propose that hierarchical models could be implemented in the human cortex, and we might actually observe in fMRI data how they get updated in real-time behavioral experiments with human beings.

Different layers (as we will see) of these probabilistic models of the world could be distributed across different brain areas and in the different layers of the prefrontal cortex, so our models of the world would likewise be physically spread out over the brain.

The Gaussian Filter model is not only proposed to be implemented in some way in the brain, but can also be “turned around” to analyze data from behavioral experiments, modeling how real brains learn in real life.

After a prediction is made, its layers are updated by minimizing a variational free energy (an upper bound on surprise, in the language of the Bayesian Brain Hypothesis) through propagating prediction errors, weighted by the inverse precision, upwards through the layers of the model.
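
For reference, the free energy \(\mathcal{F}\) of an approximate posterior \(q(z)\) over hidden causes \(z\), given data \(x\), can be written so that the bound is explicit; minimizing it pushes down an upper bound on the surprise \(-\log p(x)\):

\[ \mathcal{F} = \mathbb{E}_{q(z)}\!\left[\log q(z) - \log p(x, z)\right] = D_{\mathrm{KL}}\!\left(q(z)\,\middle\|\,p(z \mid x)\right) - \log p(x) \;\ge\; -\log p(x). \]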

If the prediction was bad, but the precision of the guess was really small to begin with (which means the model was highly uncertain about its prediction on this level), the model also doesn’t get adjusted too strongly, because it already assumed its prediction would be uncertain and probably off.
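
A minimal, Kalman-style sketch of such a precision-weighted update (an illustrative simplification in Python, not the full hierarchical model from the text; the function and variable names are hypothetical):

```python
def precision_weighted_update(mu_belief, pi_belief, mu_hat, pi_hat, observation):
    """One simplified precision-weighted update of a belief from a prediction error.

    mu_hat, pi_hat:       the model's prediction ("the guess") and its precision.
    mu_belief, pi_belief: the belief being updated and its precision.
    """
    prediction_error = observation - mu_hat
    # The belief's precision tightens by the precision of the prediction.
    pi_new = pi_belief + pi_hat
    # The prediction error is weighted by the prediction's precision and by the
    # inverse of the updated belief's precision: a confident guess that turns out
    # wrong drives a large revision, while an admittedly uncertain guess barely
    # moves the belief.
    mu_new = mu_belief + (pi_hat / pi_new) * prediction_error
    return mu_new, pi_new


# An uncertain guess (pi_hat = 0.1) leaves the belief almost unchanged;
# a confident guess (pi_hat = 10) that misses pulls it strongly toward the data.
for pi_hat in (0.1, 10.0):
    print(precision_weighted_update(0.0, 1.0, 0.0, pi_hat, observation=1.0))
```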

… et al. claim to have found evidence from neuroimaging studies that it could be doing just that, observing prediction errors being propagated into different brain regions/hierarchies, depending on the size of the prediction errors, and in turn linking them to different neurotransmitters, such as dopamine, involved in reward prediction.

Friston further proposes, based on ideas from Mumford, that this model can be linked to neuroanatomy: predictions might be passed by deep pyramidal cells, while prediction errors are encoded, for example, by superficial pyramidal cells.

But in a more general sense, hierarchical models are useful tools in many deep learning and data science applications because of their power in structuring inference networks (for example, for amortized inference of dynamical systems, in natural language processing, or for learning better approximate posteriors in variational autoencoders), and because they can help make those networks more interpretable.

As I also explained in my article on Recurrent Neural Networks, building more structured models of the brain is a crucial step in understanding how the brain organizes itself, which can help us make more sense of data from brain measurements like fMRI, EEG, etc.

Eric Horvitz


Technical Fellow and Director, Microsoft Research Labs at Microsoft; previously Technical Fellow and Director, Microsoft Research (2015–2017); Distinguished Scientist and Director, Microsoft Research (2013–2015); Distinguished Scientist and Deputy Managing Director, Microsoft Research (2010–2013); and Distinguished Scientist at Microsoft Research (2010).

Naïve Bayes Classifier - Fun and Easy Machine Learning

The theory behind the Naive Bayes Classifier with fun examples and practical uses of it. Watch this video to learn more about it and how to apply it. ...

Calibrating Generative Models: The Probabilistic Chomsky-Schützenberger Hierarchy

Thomas Icard, Stanford. Abstract: How might we assess the expressive capacity of different classes of probabilistic generative models? The subject of this talk is ...

argmax talks: Sepp Hochreiter

The argmax talks are a series of public scientific lectures by renowned researchers in the fields of machine learning, artificial intelligence, and robotics, taking place ...

Neural network parameter variance over time

Each pixel/block is an individual parameter in the network. The visualization shows a rolling variance of that parameter over the 20 most recent time steps.

Sampling of Attributed Networks From Hierarchical Generative Models

Author: Pablo Robles Granda, Department of Computer Science, Purdue University. Abstract: Network sampling is a widely used procedure in social network ...

Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 14 – Transformers and Self-Attention

Professor Christopher Manning, Stanford University; Ashish Vaswani & Anna Huang, Google ...

Probabilistic Machine Learning and AI

How can a machine learn from experience? Probabilistic modelling provides a mathematical framework for understanding what learning is, and has therefore ...

Week 1 CS294-158 Deep Unsupervised Learning (1/30/19)

UC Berkeley CS294-158 Deep Unsupervised Learning, Week 1. Instructors: Pieter Abbeel, Peter Chen, Jonathan Ho, Aravind Srinivas ...

Lecture 16 | Adversarial Examples and Adversarial Training

In Lecture 16, guest lecturer Ian Goodfellow discusses adversarial examples in deep learning. We discuss why deep networks and other machine learning ...