Satellite Image Segmentation: a Workflow with U-Net

Image segmentation is a topic of machine learning where one needs not only to categorize what's seen in an image, but also to do so at a per-pixel level.

I would have ended up in approximately the top 3.5% of the participants (10th position on the private leaderboard) if the competition were still open, but it is hard to compare since the competition is already finished and a few solutions had already been shared.

The thing here is that although the competition winners shared some code to reproduce their winning submission exactly (released after I started working on my pipeline), that code does not include much of what is required to come up with it in the first place if one wants to apply the pipeline to another problem or dataset.

Moreover, building neural networks is an iterative process where one needs to start somewhere and slowly increase a metric / evaluation score by modifying the neural network architecture and hyperparameters (configuration).

While red, green and blue represent 3 bands (RGB), the 20 bands contain a lot more information for the neural network to ponder upon, which eases learning and improves the quality of the predictions: the network is aware of visual features that humans do not see naturally.

Moreover, the DSTL’s data is highly imbalanced: crops cover nearly half of the surface, while most of the classes to predict, such as buildings, structures, roads, tracks, and a few others, each cover less than 5% of the surface.

First, the Jaccard index is a good metric in situations of training-data imbalance (e.g., not enough cars in the training data), since it penalizes both an excess of false positives and an excess of false negatives.
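For reference, here is a minimal NumPy sketch of the Jaccard index (intersection over union) on a pair of binary masks; the `smooth` constant is my own addition to avoid a division by zero when a class is absent from both masks, and this is not the competition's exact evaluation code:

```python
import numpy as np

def jaccard_index(y_true, y_pred, smooth=1e-7):
    """Jaccard index (intersection over union) between two binary masks."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    # `smooth` (an assumption of this sketch) keeps the ratio defined when union == 0.
    return (intersection + smooth) / (union + smooth)
```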

U-Net is like a convolutional autoencoder, but it also has skip-like connections to the feature maps located before the bottleneck (compressed embedding) layer, in such a way that in the decoder part some information comes from earlier layers, bypassing the compressive bottleneck.

That way, the neural network still learns to generalize in the compressed latent representation (located at the bottom of the “U” shape in the figure), but also recovers its latent generalizations into a spatial representation with the proper per-pixel semantic alignment in the right part of the “U” of the U-Net.
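To make the idea concrete, here is a deliberately tiny U-Net sketch in Keras (functional API); it is not the architecture used in my pipeline, and the 20-band input shape is only meant to match the DSTL imagery described above:

```python
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate
from keras.models import Model

def tiny_unet(input_shape=(128, 128, 20), n_classes=1):
    # Encoder: two downsampling stages.
    inputs = Input(input_shape)
    c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
    p1 = MaxPooling2D(2)(c1)
    c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
    p2 = MaxPooling2D(2)(c2)

    # Bottleneck: the compressed latent representation at the bottom of the "U".
    b = Conv2D(64, 3, activation='relu', padding='same')(p2)

    # Decoder: upsample and concatenate the encoder feature maps (skip connections).
    u2 = concatenate([UpSampling2D(2)(b), c2])
    c3 = Conv2D(32, 3, activation='relu', padding='same')(u2)
    u1 = concatenate([UpSampling2D(2)(c3), c1])
    c4 = Conv2D(16, 3, activation='relu', padding='same')(u1)

    # Per-pixel sigmoid produces one binary segmentation mask per class.
    outputs = Conv2D(n_classes, 1, activation='sigmoid')(c4)
    return Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer='adam', loss='binary_crossentropy')
```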

The 3rd place winners used the 4 possible 90-degree rotations, as well as a mirrored version of each of those rotations, which can increase the training data 8-fold: this data transformation belongs to the D4 dihedral group.
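A minimal NumPy sketch of this D4 augmentation could look as follows (my own illustration, not the winners' code):

```python
import numpy as np

def d4_transforms(image):
    """Return the 8 images of the D4 dihedral group:
    the 4 right-angle rotations and their mirrored versions."""
    rotations = [np.rot90(image, k, axes=(0, 1)) for k in range(4)]
    mirrored = [np.fliplr(rotation) for rotation in rotations]
    return rotations + mirrored
```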

I started working on the project before the winners’ official code was publicly available, so I made my own development and production pipeline going in that direction, which proved useful not only for solving the problem, but also for coding a neural network that can eventually be transferred to other datasets.

For example, the 1st place winner used a very complicated set of custom neural networks to win the competition by a large margin (1st on the private leaderboard AND on the public leaderboard).

The architecture I managed to develop was first derived from public open-source code pre-released before the end of the competition: https://www.kaggle.com/ceperaang/lb-0-42-ultimate-full-solution-run-on-your-hw. It seems that a lot of participants developed their architectures on top of precisely this shared code, which was both very helpful and acted as a bar raiser for participants of the competition to keep up on the leaderboard.

Hyperopt is used here to automatically figure out the neural network architecture that best fits the data of the given problem, so this humongous neural network has been grown automatically.

In other words, to use Hyperopt, I first needed to define a hyperparameter space, such as the range over which the learning rate can vary, the number of layers, the number of neurons in height, width and depth in the layers, how the layers are stacked, and so on.

Then running Hyperopt takes time, but it proceeds to do what could be compared to using genetic algorithms to perform breeding and natural selection, except that there is no breeding here: just a readjustment based on past trials to pick new trials in a way that balances exploration of new architectures against optimization of the architecture near local maxima of performance.
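As a rough illustration of how such a meta-optimization loop is wired with Hyperopt, here is a toy sketch with a much smaller hyperparameter space than the real one; `train_and_evaluate` is a hypothetical helper standing in for building, training and scoring a model:

```python
import numpy as np
from hyperopt import fmin, tpe, hp, Trials

# A toy hyperparameter space (much smaller than the one actually used):
# learning rate, depth, and base number of filters of the U-Net.
space = {
    'learning_rate': hp.loguniform('learning_rate', np.log(1e-5), np.log(1e-2)),
    'n_down_blocks': hp.choice('n_down_blocks', [2, 3, 4]),
    'base_filters': hp.choice('base_filters', [16, 32, 64]),
}

def objective(params):
    # Build and train a model with `params`, then return a loss to minimize,
    # e.g. 1.0 minus the validation Jaccard index. Stubbed out here.
    return train_and_evaluate(params)  # hypothetical helper

trials = Trials()
# TPE readjusts the sampling of new trials based on the past ones,
# balancing exploration against exploitation of promising regions.
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```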

A team workflow can be interesting where beginners learn to manipulate the data and launch the optimisation, then slowly start modifying the hyperparameter space and eventually add new parameters based on new research.

Running the improved 3rd place winners’ code, it is possible to get a score worthy of the 2nd position, because they recoded their neural networks from scratch after the competition to make the code public and more usable.

I used the 3rd place winners’ post-processing code, which rounds the prediction masks in a smoother way and corrects a few bugs, such as removing building predictions when water is also predicted for a given pixel, and so on, but the neural network behind that is still quite custom.
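In the same spirit as that post-processing (my own simplified illustration, not the winners' code), removing building predictions where water is also predicted boils down to a mask operation like this:

```python
import numpy as np

def remove_conflicting_buildings(building_mask, water_mask):
    """Drop building predictions on pixels that are also predicted as water."""
    return np.logical_and(building_mask.astype(bool),
                          np.logical_not(water_mask.astype(bool))).astype(np.uint8)
```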

On my side, I used one neural architecture optimized with Hyperopt on all classes at once, and then took this already-evolved architecture and trained it on single classes, one at a time, so there is an imbalance in the bias/variance of the neural networks, each being used on a single task rather than on all tasks at once as during meta-optimization.

Also, I realized afterwards that I forgot a crucial thing in my hyperoptimization sweeps: a hyperparameter for the initialization of the convolutional weights, such as enabling the use of He’s technique (named “he_uniform” in Keras), which seems to perform best on a variety of U-Net CNNs.
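Exposing that would have been simple; for example, a hypothetical extension of the search space with the chosen initializer passed straight to Keras’ `Conv2D` layers:

```python
from keras.layers import Conv2D
from hyperopt import hp

# Hypothetical extra dimension of the search space (not part of my original sweeps):
# which weight initialization scheme to use for the convolutional layers.
init_space = hp.choice('kernel_init', ['glorot_uniform', 'he_uniform', 'he_normal'])

def conv_block(x, filters, kernel_init):
    # The sampled initializer name is forwarded to Keras' Conv2D layer.
    return Conv2D(filters, 3, padding='same', activation='relu',
                  kernel_initializer=kernel_init)(x)
```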

The One Hundred Layers Tiramisu might not help for fitting on the DSTL dataset according to the 3rd place winner, but at least it has a lot of potential capacity when applied to larger and richer datasets, due to its use of the recently introduced densely connected convolutional blocks.
