Deconvolution and Checkerboard Artifacts

When we look very closely at images generated by neural networks, we often see a strange checkerboard pattern of artifacts.

It’s more obvious in some cases than others, but a large fraction of recent models exhibit this behavior.

(For an excellent discussion of deconvolution, see [5, 6].) Unfortunately, deconvolution can easily have “uneven overlap,” putting more weight on some output positions than others.

In particular, deconvolution has uneven overlap when the kernel size (the output window size) is not divisible by the stride (the spacing between points in the low-resolution input).

For example, in one dimension, a stride-2, size-3 deconvolution has some outputs receiving twice as many input contributions as others; in two dimensions, the uneven patterns along each axis multiply together, producing a checkerboard.
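To make the overlap pattern concrete, here is a small sketch (not from the original article) that counts, for each one-dimensional output position, how many input windows contribute to it under a given stride and kernel size:

```python
import numpy as np

def deconv_contributions(n_in, kernel, stride):
    """Count how many kernel taps contribute to each output position
    of a 1-D deconvolution (transposed convolution), ignoring padding."""
    n_out = (n_in - 1) * stride + kernel
    counts = np.zeros(n_out, dtype=int)
    for i in range(n_in):  # each input pixel "paints" a window of size `kernel`
        counts[i * stride : i * stride + kernel] += 1
    return counts

# Stride 2, size 3: kernel size not divisible by stride -> uneven overlap.
print(deconv_contributions(5, kernel=3, stride=2))  # [1 1 2 1 2 1 2 1 2 1 1]

# Stride 2, size 4: divisible -> interior positions get equal overlap.
print(deconv_contributions(5, kernel=4, stride=2))  # [1 1 2 2 2 2 2 2 2 2 1 1]
```

The alternating 1s and 2s in the first case are exactly the one-dimensional analogue of the checkerboard pattern.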

Because the layer includes a bias (a learned value added to the output), it’s easy to output the average color.

(Consider, as an example, a model that uses stride-2, size-4 deconvolutions.) There are probably many factors at play here.

At best, deconvolution is fragile, because it very easily represents artifact-creating functions, even when the kernel size is carefully chosen.

For example, you might resize the image (using nearest-neighbor interpolation or bilinear interpolation) and then do a convolutional layer.

This seems like a natural approach, and roughly similar methods have worked well in image super-resolution.

Where deconvolution has a unique entry for each output window, resize-convolution is implicitly weight-tied in a way that discourages high-frequency artifacts.
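A minimal sketch of this resize-convolution idea in plain NumPy (the function name is illustrative, not from any particular library): nearest-neighbor upsample by the desired factor, then apply an ordinary zero-padded convolution, so every output position sees the same sliding kernel.

```python
import numpy as np

def resize_conv_1d(x, weights, factor=2):
    """Nearest-neighbor upsample by `factor`, then apply a 1-D convolution
    with zero padding. Every output position is produced by the same
    sliding kernel, so there is no per-window weight pattern to learn."""
    up = np.repeat(x, factor)          # nearest-neighbor resize
    k = len(weights)
    padded = np.pad(up, (k // 2, k // 2))
    return np.array([padded[i:i + k] @ weights for i in range(len(up))])

x = np.array([1.0, 2.0, 3.0])
y = resize_conv_1d(x, weights=np.array([0.25, 0.5, 0.25]))
print(y)  # [0.75 1.25 1.75 2.25 2.75 2.25]
```

The output ramps smoothly between the input values (apart from the zero-padded edges); there is no alternating contribution pattern for the artifacts to exploit.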

It may also point to subtler issues with naively using bilinear interpolation, which resists high-frequency image features too strongly.

Simply switching out the standard deconvolutional layers for nearest-neighbor resize followed by convolution causes artifacts of different frequencies to disappear.

Even with randomly initialized weights, before any training, we can already see the artifacts. This suggests that the artifacts are due to this method of generating images, rather than to adversarial training.

(It also suggests that we might be able to learn a lot about good generator design without the slow feedback cycle of training models.) Another reason to believe these artifacts aren’t GAN specific is that we see them in other kinds of models, and have found that they also go away when we switch to resize-convolution upsampling.

For example, consider real-time artistic style transfer [10] where a neural net is trained to directly generate style-transferred images.

We’ve found these to be vulnerable to checkerboard artifacts (especially when the cost doesn’t explicitly resist them).

(We’ve chosen to present this technique separately because we felt it merited more detailed discussion, and because it cuts across multiple papers.) Whenever we compute the gradients of a convolutional layer, we perform a deconvolution (transposed convolution) on the backward pass, so the gradients themselves can contain checkerboard artifacts.
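To see why the backward pass has the same issue, note that the gradient of a strided convolution with respect to its input is itself a transposed convolution. A rough sketch (one-dimensional, valid padding, assumptions mine) counting how many output positions each input pixel feeds into, which is how often it receives gradient:

```python
import numpy as np

def grad_contributions(n_in, kernel, stride):
    """For a 1-D strided convolution (valid padding), count how many
    output positions each input pixel feeds into. On the backward pass,
    each connection routes gradient back to that pixel, so uneven counts
    mean a checkerboard pattern in the input gradient."""
    n_out = (n_in - kernel) // stride + 1
    counts = np.zeros(n_in, dtype=int)
    for j in range(n_out):
        counts[j * stride : j * stride + kernel] += 1
    return counts

# Stride-2, size-3 convolution: alternating gradient magnitudes.
print(grad_contributions(11, kernel=3, stride=2))  # [1 1 2 1 2 1 2 1 2 1 1]
```

This is the same uneven-overlap arithmetic as in the forward deconvolution case, just applied to gradients instead of pixels.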

(Max pooling was previously linked to high-frequency artifacts in [12].) More recent work in feature visualization has observed similar gradient artifacts.

If gradient artifacts can affect an image being optimized based on a neural network’s gradients in feature visualization, we might expect them to also affect the weights of networks trained by gradient descent.

It seems possible that having some pixels affect the network output much more than others may exaggerate adversarial counter-examples.

The standard approach of producing images with deconvolution — despite its successes! — has some conceptually simple issues that lead to artifacts in produced images.

It suggests that there is low-hanging fruit to be found in carefully thinking through neural network architectures, even ones where we seem to have clean working solutions.

In the meantime, we’ve provided an easy-to-use solution that improves the quality of many approaches to generating images with neural networks.

We look forward to seeing what people do with it, and whether it helps in domains like audio, where high frequency artifacts would be particularly problematic.

