What a Deep Neural Network thinks about your #selfie

Convolutional Neural Networks are great: they recognize things, places and people in your personal photos, signs, people and lights in self-driving cars, crops, forests and traffic in aerial imagery, various anomalies in medical images and all kinds of other useful things.

In this fun experiment we’re going to put one to a less serious use: we’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones.

ConvNets had been around since the late 1980s, but for a long time it was thought that they would not scale to real-world images. That turned out to be true only until about 2012, when we finally had enough compute (in the form of GPUs specifically, thanks NVIDIA) and enough data (thanks ImageNet) to actually scale these models, as was first demonstrated when Alex Krizhevsky, Ilya Sutskever and Geoff Hinton won the 2012 ImageNet challenge (think: the World Cup of Computer Vision), crushing their competition (16.4% error vs. 26.2% for the second-best entry).

I happened to witness this critical juncture in time first hand because the ImageNet challenge was organized over the last few years by Fei-Fei Li’s lab (my lab), so I remember when my labmate gasped in disbelief as she noticed the (very strong) ConvNet submission come up in the submission logs.

The gif below illustrates the full computational process of a small ConvNet: On the left we feed in the raw image pixels, which we represent as a 3-dimensional grid of numbers.

Suppose we had 10 filters: in this way we would transform the original (256,256,3) image into a (256,256,10) “image”, where we’ve thrown away the original image information and kept only the 10 responses of our filters at every position in the image.

It’s as if the three color channels (red, green, blue) were now replaced with 10 filter response channels (I’m showing these along the first column immediately on the right of the image in the gif above).

The next columns will correspond to yet another set of filters being applied to the previous column’s responses, gradually detecting more and more complex visual patterns until the last set of filters is computing the probability of entire visual classes (e.g. dog or toad) in the image.
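
To make the shapes concrete, here is a minimal sketch of that first filtering step, written in PyTorch purely as an illustration (an assumption on my part; the actual experiment below uses Caffe): a bank of 10 learnable filters turning a (256,256,3) image into a (256,256,10) stack of filter responses.

```python
# A minimal sketch (PyTorch is my assumption; the post trains with Caffe)
# of one filtering step: 10 learnable filters applied to an RGB image.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 256, 256)  # one RGB image, channels first

# 10 filters, each looking at a 3x3 window of all 3 input channels;
# padding=1 keeps the spatial size at 256x256.
conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, padding=1)

responses = conv(image)
print(responses.shape)  # torch.Size([1, 10, 256, 256]): 10 response channels
```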

Suppose the ConvNet looks at an image of a toad and guesses something else. Then we can tell it that it’s in fact a toad, and there is a mathematical process (backpropagation) for changing all filters in the ConvNet a tiny amount so as to make it slightly more likely to say toad the next time it sees that same image.

It includes a fun live demo of a ConvNet running in real time on your computer’s camera, as explained nicely by Jason in this video.

In summary, the whole training process resembles showing a child many images of things, and him/her having to gradually figure out what to look for in the images to tell those things apart.

Or if you prefer your explanations technical: a ConvNet is just expressing a function from image pixels to class probabilities, with the filters as parameters, and we run stochastic gradient descent to optimize a classification loss function.
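
As a hedged sketch of that one-sentence summary (again PyTorch by assumption, with a tiny toy model standing in for the real 140-million-parameter net), here is what one step of “tell it it’s a toad, then nudge all filters a tiny amount” looks like:

```python
# One SGD step on a classification loss: the "change all filters a tiny
# amount" process described above. The tiny model is a toy stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 10, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(10, 2),                  # two classes, e.g. dog vs. toad
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()        # classification loss over class scores

image = torch.randn(1, 3, 256, 256)
label = torch.tensor([1])              # "it's in fact a toad"

optimizer.zero_grad()
loss = loss_fn(model(image), label)
loss.backward()                        # how should each filter change?
optimizer.step()                       # ...change them all a tiny amount
```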

Okay, so we collected 2 million selfies, decided which ones are probably good or bad based on the number of likes they received (controlling for the number of followers), fed all of it to Caffe and trained a ConvNet.
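
The post doesn’t show the labeling code, so the following is only one plausible reading of “good or bad based on the number of likes (controlling for the number of followers)”: bucket selfies by follower count and compare likes only within a bucket. All field names and numbers here are made up.

```python
# Hypothetical labeling sketch: likes decide good/bad, but only relative
# to other selfies from accounts with a similar number of followers.
from collections import defaultdict
from statistics import median

selfies = [
    {"url": "a.jpg", "likes": 120,  "followers": 900},
    {"url": "b.jpg", "likes": 15,   "followers": 950},
    {"url": "c.jpg", "likes": 4000, "followers": 250_000},
    {"url": "d.jpg", "likes": 900,  "followers": 240_000},
]

buckets = defaultdict(list)
for s in selfies:
    buckets[len(str(s["followers"]))].append(s)  # coarse order-of-magnitude bucket

labeled = []
for group in buckets.values():
    cut = median(s["likes"] for s in group)      # within-bucket like threshold
    for s in group:
        labeled.append((s["url"], 1 if s["likes"] >= cut else 0))

print(labeled)  # (url, 1=good, 0=bad) pairs, ready to feed to the trainer
```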

I am showing the images in a much smaller and less identifiable format because my intention is for us to learn about the broad patterns that decrease the selfie’s quality, not to shine light on people who happened to take a bad selfie.

As a last fun experiment, I tried to run the ConvNet on a few famous celebrity selfies, and sorted the results into the continuum visualization, where the best selfies are on the top and the ConvNet score decreases to the right and then towards the bottom.

Amusingly, note that the general rule of thumb we observed before (no group photos) is broken by the famous group selfie of Ellen DeGeneres and others from the Oscars, yet the ConvNet thinks this is actually a very good selfie, placing it on the 2nd row! :)

Another one of our rules of thumb (no males) is confidently defied by Chris Pratt’s body (also 2nd row), and honorable mentions go to Justin Bieber’s raised eyebrows and the Stephen Colbert / Jimmy Fallon duo (3rd row).
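
For completeness, here is a minimal sketch of the ranking step behind these grids, again assuming a PyTorch-style classifier rather than the post’s actual Caffe pipeline: score every image with the trained net and sort best-first.

```python
# Hypothetical ranking helper: P(good selfie) per image, best first.
import torch

@torch.no_grad()
def selfie_scores(model, images):
    """images: (N, 3, H, W) tensor of preprocessed selfies."""
    probs = torch.softmax(model(images), dim=1)
    return probs[:, 1]  # column 1 assumed to be the "good selfie" class

# ranked = sorted(zip(filenames, selfie_scores(model, images).tolist()),
#                 key=lambda pair: -pair[1])  # best selfies on top
```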

Here is the visualization:

You can see that selfies cluster in some fun ways: we have group selfies on the top left, a cluster of selfies with sunglasses/glasses on the middle left, closeups on the bottom left, a lot of mirror full-body shots on the top right, etc.
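
This excerpt doesn’t say how the grid was laid out, but a standard recipe (and a reasonable guess here) is to take each selfie’s last-layer ConvNet features and project them to 2-D with t-SNE, so that visually similar selfies land near each other; a scikit-learn sketch (an assumption on my part):

```python
# t-SNE layout sketch: random features stand in for real ConvNet features.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 4096))  # stand-in for fc7-style ConvNet features

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
print(coords.shape)  # (500, 2): one (x, y) position per selfie
# Pasting each thumbnail at its (x, y) yields the clustered grid described above.
```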

Amusingly, in the image on the bottom right the ConvNet decided to get rid of the “self” part of the selfie, entirely missing the point :) You can find many more fun examples of these “rude” crops.

Before any of the more advanced users ask: yes, I did try to insert a Spatial Transformer layer right after the image and before the ConvNet.

Before anyone asks, I also tried to port a smaller version of this ConvNet to run on iOS so you could enjoy real-time feedback while taking your selfies, but this turned out to be quite involved for a quick side project.

Of course, we’ve only barely scratched the surface - ConvNets are used as a basic building block in many Neural Networks, not just to classify images/videos but also to segment, detect, and describe them, both in the cloud and in robots.

Of course, you’ll learn much more by doing than by reading, so I’d recommend that you play with 101 Kaggle Challenges or develop your own side projects, in which case I warmly recommend that you not only do but also write about it, and post it somewhere for all of us to read, for example on /r/machinelearning, which has accumulated a nice community.

The science of selfies: Neural network analyses 2 million pictures and finds females who show their hair, cut off their forehead and abandon their friends do best

Women were consistently ranked higher than men - to the extent there was not a single male in the top 100. The position and pose of the face are quite consistent among the top images. The face always occupies about 1/3 of the image, is slightly tilted, and is positioned in the center and at the top.

'We don't see any cut-off foreheads. 'Instead, most selfies seem to be a slightly broader shot with the head fully in the picture, and shoulders visible. 'It also looks like many of them have a fancy hairstyle with slightly longer hair combed upwards.

'However, we still do see the prominence of faded facial features.' It also turned out that showing skin wasn't the way to a good selfie: 'A good portion of the variability between what makes a good or bad selfie can be explained by the style of the image, as opposed to the raw attractiveness of the person. 'Also, with some relief, it seems that the best selfies do not seem to be the ones that show the most skin.'