
Deconvolution and Checkerboard Artifacts

When we look very closely at images generated by neural networks, we often see a strange checkerboard pattern of artifacts.

It’s more obvious in some cases than others, but a large fraction of recent models exhibit this behavior.

(For an excellent discussion of deconvolution, see [5, 6].) Unfortunately, deconvolution can easily have “uneven overlap,” putting more of its metaphorical paint in some places than others.

In particular, deconvolution has uneven overlap when the kernel size (the output window size) is not divisible by the stride (the spacing between points on the top).

For example, in one dimension, a stride-2, size-3 deconvolution has some outputs with twice as many inputs as others.
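The divisibility rule above can be checked concretely by counting how many input points "paint" each output position of a 1-D deconvolution. This is a minimal NumPy sketch; the helper name `deconv_overlap_counts` is ours, not from any library:

```python
import numpy as np

def deconv_overlap_counts(n_in, kernel_size, stride):
    """Count how many input points contribute to each output position
    of a 1-D deconvolution (transposed convolution), no padding."""
    n_out = (n_in - 1) * stride + kernel_size
    counts = np.zeros(n_out, dtype=int)
    for i in range(n_in):
        # Each input point paints a window of `kernel_size` outputs.
        counts[i * stride : i * stride + kernel_size] += 1
    return counts

# Stride 2, size 3: kernel size not divisible by stride -> uneven overlap.
print(deconv_overlap_counts(5, kernel_size=3, stride=2))
# Stride 2, size 4: divisible -> even overlap away from the borders.
print(deconv_overlap_counts(5, kernel_size=4, stride=2))
```

In the stride-2, size-3 case the interior counts alternate 1, 2, 1, 2 — the checkerboard — while the stride-2, size-4 case is uniform away from the borders.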

With a bias (a learned value added to the output), it’s easy for such a layer to output the average color.

(We use a model with stride-2, size-4 deconvolutions as an example.) There are probably a lot of factors at play here.

At best, deconvolution is fragile because it so easily represents artifact-creating functions, even when the kernel size is carefully chosen.

For example, you might resize the image (using nearest-neighbor interpolation or bilinear interpolation) and then do a convolutional layer.
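The resize-then-convolve idea can be sketched in 1-D NumPy; the helper `resize_conv_1d` is hypothetical, standing in for a real framework's upsampling and convolution layers:

```python
import numpy as np

def resize_conv_1d(x, kernel, factor=2):
    """Nearest-neighbor upsample by `factor`, then a 'same'-padded
    convolution: the resize-convolution alternative to deconvolution."""
    up = np.repeat(x, factor)              # nearest-neighbor resize
    pad = len(kernel) // 2
    padded = np.pad(up, pad, mode="edge")  # edge-pad for 'same'-size output
    # np.convolve flips the kernel; harmless for symmetric kernels.
    return np.convolve(padded, kernel, mode="valid")[: len(up)]

# Upsampling a ramp with a small smoothing kernel gives a smooth
# output, with no alternating checkerboard pattern.
print(resize_conv_1d(np.array([0.0, 1.0]), np.array([0.25, 0.5, 0.25])))
```

Because the resize step spreads each input value evenly before the convolution sees it, every output position draws on the same number of inputs by construction.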

This seems like a natural approach, and roughly similar methods have worked well in image super-resolution.

Where deconvolution has a unique entry for each output window, resize-convolution implicitly ties weights in a way that discourages high-frequency artifacts.

It might also point at trickier issues with naively using bilinear interpolation, where it resists high-frequency image features too strongly.

Simply switching out the standard deconvolutional layers for nearest-neighbor resize followed by convolution causes artifacts of different frequencies to disappear.

Even with a randomly initialized generator, we can already see the artifacts. This suggests that the artifacts are due to this method of generating images, rather than to adversarial training.

(It also suggests that we might be able to learn a lot about good generator design without the slow feedback cycle of training models.) Another reason to believe these artifacts aren’t GAN specific is that we see them in other kinds of models, and have found that they also go away when we switch to resize-convolution upsampling.

For example, consider real-time artistic style transfer [10] where a neural net is trained to directly generate style-transferred images.

We’ve found these to be vulnerable to checkerboard artifacts (especially when the cost doesn’t explicitly resist them).

(We’ve chosen to present this technique separately because we felt it merited more detailed discussion, and because it cut across multiple papers.) Whenever we compute the gradients of a convolutional layer, we do a deconvolution (transposed convolution) on the backward pass, which can introduce checkerboard patterns in the gradient.
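Since the backward pass of a strided convolution is itself a transposed convolution, the same uneven overlap can appear in gradients. A small NumPy sketch (the helper `conv1d_grad_wrt_input` is hypothetical) computes the gradient of the summed output of a 1-D strided convolution with respect to its input:

```python
import numpy as np

def conv1d_grad_wrt_input(n_in, kernel, stride):
    """Gradient of sum(conv1d(x, kernel, stride)) with respect to x.
    The backward pass scatters each output's kernel back onto the
    input, i.e. it is a transposed convolution."""
    k = len(kernel)
    n_out = (n_in - k) // stride + 1
    grad = np.zeros(n_in)
    for o in range(n_out):
        grad[o * stride : o * stride + k] += kernel
    return grad

# Stride-2, size-3 convolution over an 11-pixel input: some input
# pixels receive gradient from two output windows, others from one.
print(conv1d_grad_wrt_input(11, np.ones(3), stride=2))
```

With a stride-2, size-3 kernel the gradient magnitudes alternate pixel to pixel, mirroring the uneven overlap seen in the forward direction.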

(Max pooling was previously linked to high-frequency artifacts in [12].) The same issue arises in more recent work on feature visualization.

If gradient artifacts can affect an image being optimized from a neural network’s gradients in feature visualization, we might expect them to cause trouble in other gradient-based settings as well.

It seems possible that having some pixels affect the network output much more than others may exaggerate adversarial counterexamples.

The standard approach of producing images with deconvolution — despite its successes! — has some conceptually simple issues that lead to artifacts in produced images.

It suggests that there is low-hanging fruit to be found in carefully thinking through neural network architectures, even ones where we seem to have clean working solutions.

In the meantime, we’ve provided an easy-to-use solution that improves the quality of many approaches to generating images with neural networks.

We look forward to seeing what people do with it, and whether it helps in domains like audio, where high frequency artifacts would be particularly problematic.


Image restoration with Convolutional Neural Networks

On this artificial data, the convolutional networks significantly outperform existing blind deconvolution methods, including those optimized for text, in terms of image quality and OCR accuracy.

Authors: Pavel Svoboda, Michal Hradiš, Lukas Maršík, Pavel Zemčík Abstract: In this work we explore the previously proposed approach of direct blind deconvolution and denoising with convolutional neural networks in a situation where the blur kernels are partially constrained.

We focus on blurred images from a real-life traffic surveillance system, on which we, for the first time, demonstrate that neural networks trained on artificial data provide superior reconstruction quality on real images compared to traditional blind deconvolution methods.

The training data is easy to obtain by blurring sharp photos from a target system with a very rough approximation of the expected blur kernels, thereby allowing custom CNNs to be trained for a specific application (image content and blur range).

Authors: Pavel Svoboda, Michal Hradiš, David Bařina, Pavel Zemčík Abstract: This paper shows that it is possible to train large and deep convolutional neural networks (CNN) for JPEG compression artifacts reduction, and that such networks can provide significantly better reconstruction quality compared to previously used smaller networks as well as to any other state-of-the-art methods.

We were able to train networks with 8 layers in a single step and in relatively short time by combining residual learning, skip architecture, and symmetric weight initialization.

We provide further insights into convolutional networks for JPEG artifact reduction by evaluating three different objectives, generalization with respect to training dataset size, and generalization with respect to JPEG quality level.

Lecture 11 | Detection and Segmentation

In Lecture 11 we move beyond image classification, and show how convolutional networks can be applied to other core computer vision tasks. We show how ...

High-Accuracy Neural-Network Models for Speech Enhancement

In this talk we will discuss our recent work on AI techniques that improve the quality of audio signals for both machine understanding and sensory perception.

Real-Time Facial Segmentation and Performance Capture from RGB Input (ECCV 2016)

ECCV 2016 Paper Video: We introduce the concept of unconstrained real-time 3D facial performance capture through explicit semantic segmentation in the ...

Spatial filtering approach for removal of blocking artifact in images


Learning Blind Motion Deblurring (ICCV 2017)

Patrick Wieschollek, Michael Hirsch, Bernhard Schölkopf, Hendrik P.A. Lensch In IEEE International Conference on Computer Vision (ICCV), 2017 Code: ...

Lesson 2: Deep Learning 2018


Optics Express : Optimization of sampling pattern and the design of Fourier ptychographic...

Optimization of sampling pattern and the design of Fourier ptychographic illuminator. Kaikai Guo et al (2015), Optics Express ...

Towards Machines that Perceive and Communicate

Kevin Murphy (Google Research) Abstract: In this talk, I summarize some recent work in my group related to visual scene understanding and "grounded" ...

Depth Estimation

This video is part of the Udacity course "Computational Photography".

Progressive Image Denoising Through Hybrid Graph Laplacian Regularization: A Unified Framework

Recovering images from corrupted observations is necessary for many real-world applications. In this paper, we propose a unified framework to perform ...