
Attacking Machine Learning with Adversarial Examples

Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake; they're like optical illusions for machines.

At OpenAI, we think adversarial examples are a good aspect of security to work on because they represent a concrete problem in AI safety that can be addressed in the short term, and because fixing them is difficult enough that it requires a serious research effort.

(Though we'll need to explore many aspects of machine learning security to achieve our goal of building safe, widely distributed AI.) To get an idea of what adversarial examples look like, consider this demonstration from Explaining and Harnessing Adversarial Examples: starting with an image of a panda, the attacker adds a small perturbation that has been calculated to make the image be recognized as a gibbon with high confidence.

For example, attackers could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a 'yield' or other sign, as discussed in Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples.

When we think about the study of AI safety, we usually think about some of the most difficult problems in that field — how can we ensure that sophisticated reinforcement learning agents that are significantly more intelligent than human beings behave in ways that their designers intended?

Defensive distillation creates a model whose surface is smoothed in the directions an adversary will typically try to exploit, making it difficult for them to discover adversarial input tweaks that lead to incorrect categorization.

(Distillation was originally introduced in Distilling the Knowledge in a Neural Network as a technique for model compression, where a small model is trained to imitate a large one, in order to obtain computational savings.) Yet even these specialized algorithms can easily be broken by giving more computational firepower to the attacker.
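
Mechanically, distillation itself is easy to sketch: the first model's output probabilities are softened with a temperature, and a second model is trained to match them. Below is a minimal, illustrative NumPy sketch; the toy data, the temperature of 5, and the use of a linear softmax model for both "teacher" and "student" are assumptions made for this example, not details from the post.

import numpy as np

rng = np.random.default_rng(0)

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy inputs and a fixed "teacher" that produces logits for 3 classes.
X = rng.normal(size=(256, 10))
W_teacher = rng.normal(size=(10, 3))
teacher_probs = softmax(X @ W_teacher, T=5.0)   # temperature-softened targets

# "Student": another linear softmax model trained by gradient descent to match
# the teacher's softened probabilities (a soft-target cross-entropy loss).
W_student = np.zeros((10, 3))
lr = 0.5
for _ in range(500):
    student_probs = softmax(X @ W_student, T=5.0)
    # Gradient of the soft-target cross-entropy w.r.t. W (up to a constant
    # 1/T factor that is absorbed into the learning rate).
    grad = X.T @ (student_probs - teacher_probs) / len(X)
    W_student -= lr * grad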

To build an adversarial example, the attacker looks at a picture of an airplane, tests which direction in picture space makes the probability of the "cat" class increase, and then gives a little push in that direction (in other words, perturbs the input).

If the model’s output is “99.9% airplane, 0.1% cat”, then a little tiny change to the input gives a little tiny change to the output, and the gradient tells us which changes will increase the probability of the “cat” class.
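
As a concrete illustration, here is a minimal NumPy sketch of that little push: a toy linear softmax classifier, an input x, and a step of size epsilon along the sign of the gradient of the target class's log-probability. The model, data, target class, and epsilon are all assumptions made for this example.

import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

d, num_classes = 100, 3
W = rng.normal(size=(d, num_classes))   # toy linear classifier
x = rng.normal(size=d)                  # the "airplane" input
target = 1                              # the class we want to push towards ("cat")

p = softmax(W.T @ x)
grad = W[:, target] - W @ p             # gradient of log p[target] w.r.t. x for a linear model

eps = 0.1
x_adv = x + eps * np.sign(grad)         # the little push in input space

print("target probability before:", softmax(W.T @ x)[target])
print("target probability after: ", softmax(W.T @ x_adv)[target])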

Let’s run a thought experiment to see how well we could defend our model against adversarial examples by running it in “most likely class” mode instead of “probability mode.” The attacker no longer knows where to go to find inputs that will be classified as cats, so we might have some defense.

The defense strategies that perform gradient masking typically result in a model that is very smooth in specific directions and neighborhoods of training points, which makes it harder for the adversary to find gradients indicating good candidate directions to perturb the input in a damaging way for the model.

Neither algorithm was explicitly designed to perform gradient masking, but gradient masking is apparently a defense that machine learning algorithms can invent relatively easily when they are trained to defend themselves and not given specific instructions about how to do so.
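
The masking effect itself is easy to see numerically. In the hedged sketch below (a toy two-class linear model probed with finite differences, all assumed for illustration), the soft probabilities still reveal which way to push the input, while the hard "most likely class" output gives a slope of zero almost everywhere.

import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

W = rng.normal(size=(20, 2))   # toy two-class linear model
x = rng.normal(size=20)
target, h = 1, 1e-3

for i in range(5):   # probe a few input coordinates with finite differences
    dx = np.zeros_like(x)
    dx[i] = h
    soft = (softmax(W.T @ (x + dx))[target] - softmax(W.T @ x)[target]) / h
    hard = (np.argmax(W.T @ (x + dx)) - np.argmax(W.T @ x)) / h
    print(f"coordinate {i}: probability slope {soft:+.4f}, hard-label slope {hard:+.1f}")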

Adversarial examples in deep learning

Gradient descent consists of the following steps: first pick an initial value for x, then compute the derivative f' of f with respect to x and evaluate it at our initial guess, and finally update x by taking a small step in the direction opposite to that derivative.

As the tangent is only a good approximation of the curve in a tiny neighborhood, the change applied to x is kept very small to ensure that we do not jump too far.
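
Those steps fit in a few lines. The sketch below uses an assumed toy function f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3); the starting point and step size are also just illustrative choices.

def f_prime(x):
    return 2 * (x - 3)      # derivative of f(x) = (x - 3)**2

x = 10.0                    # step 1: pick an initial value for x
lr = 0.1                    # keep the step small so we do not jump too far
for _ in range(100):
    x -= lr * f_prime(x)    # step 2: evaluate the derivative and move downhill

print(x)                    # converges towards the minimum at x = 3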

Our machine learning model will be the line y = ax + b and the model parameters will be a and b.

The loss is the squared difference between the real value y and the prediction of our model, ax + b: L = (y - (ax + b))^2.

As before, we can evaluate these derivatives at our current values of a and b for each data point (x, y), which gives us the slopes of the tangents to the loss function, and then use those slopes to update a and b in order to minimize L.
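
Written out, the derivatives are dL/da = -2x(y - (ax + b)) and dL/db = -2(y - (ax + b)). The sketch below runs this update loop on assumed toy data drawn around the line y = 2x + 1; the noise level, learning rate, and number of steps are illustrative choices.

import numpy as np

rng = np.random.default_rng(3)
xs = rng.uniform(-1, 1, size=100)
ys = 2 * xs + 1 + 0.05 * rng.normal(size=100)   # toy data around y = 2x + 1

a, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    residual = ys - (a * xs + b)
    grad_a = np.mean(-2 * xs * residual)   # average dL/da over the data points
    grad_b = np.mean(-2 * residual)        # average dL/db over the data points
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)   # should end up close to 2 and 1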

We could simply replace the x values with random ones and the loss would increase tremendously, but that is not very subtle: it would be immediately obvious to anyone plotting the data points.
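
A subtler alternative, sketched below under the same assumed toy setup, is to nudge each x only a tiny amount in the direction that increases the loss, using the derivative of the loss with respect to the input, dL/dx = -2a(y - (ax + b)). The fitted values of a and b and the step size epsilon are assumptions made for this example.

import numpy as np

rng = np.random.default_rng(3)
xs = rng.uniform(-1, 1, size=100)
ys = 2 * xs + 1 + 0.05 * rng.normal(size=100)   # same toy data as above
a, b = 2.0, 1.0                                 # pretend the model is already fitted

eps = 0.02
grad_x = -2 * a * (ys - (a * xs + b))           # dL/dx for each data point
xs_adv = xs + eps * np.sign(grad_x)             # a barely visible nudge per point

print(np.mean((ys - (a * xs + b)) ** 2))        # loss on the clean inputs
print(np.mean((ys - (a * xs_adv + b)) ** 2))    # slightly higher loss after the nudge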

Lecture 16 | Adversarial Examples and Adversarial Training

In Lecture 16, guest lecturer Ian Goodfellow discusses adversarial examples in deep learning. We discuss why deep networks and other machine learning ...

TensorFlow Tutorial #11 Adversarial Examples

How to fool a neural network into mis-classifying images by adding a little 'specialized' noise. Demonstrated on the Inception model.

'How neural networks learn' - Part II: Adversarial Examples

In this episode we dive into the world of adversarial examples: images specifically engineered to fool neural networks into making completely wrong decisions!

Adversarial attacks and defenses - NIPS 2017

Generative Adversarial Networks (LIVE)

We're going to build a GAN to generate some images using Tensorflow. This will help you grasp the architecture and intuition behind adversarial approaches to ...

Adversarial Examples for Generative Models

Adversarial Examples for Generative Models. Dawn Song. Presented at the 1st Deep Learning and Security Workshop, May 24, 2018, at the 2018 IEEE ...

Learn to Draw Samples: Application to Amortized MLE for Generative Adversarial Training, NIPS 2016

NIPS 2016 Workshop on Bayesian Deep Learning. Dilin Wang, Yihao Feng and Qiang Liu. We propose a simple algorithm to ...

A Tutorial on Attacking DNNs using Adversarial Examples.

Created a tutorial on fooling/attacking deep neural networks using Adversarial Examples. Adversarial examples are created by adding perturbations to the data ...

CVPR18: Tutorial: Part 3: Generative Adversarial Networks

Organizers: Jun-Yan Zhu, Taesung Park, Mihaela Rosca, Phillip Isola, Ian Goodfellow. Description: Generative adversarial networks (GANs) have been at the ...

Adversarial Learning for Neural Dialogue Generation: Paper Presentation for Class

Class presentation of the Training section of Li, Jiwei, Will Monroe, Tianlin Shi, Alan Ritter, and Dan Jurafsky. "Adversarial learning for neural dialogue ...