AI News

Or, rather, a particular type of neural network called a convolutional neural network has proved very effective. In this post, I want to build off of the series of posts I wrote about neural networks a few months ago, plus some ideas from my post on digital images, to explain the difference between a convolutional neural network and a classical (is that the right term?) neural network.

If we’re making a neural network to analyze images, then the input to the neural network will be a vector like we saw in the post on digital images: each dimension will represent how light one of the pixels is (or one of its RGB values if it’s a color image, but for simplicity let’s stick to grey-scale).
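As a small illustration of that input format, here is a sketch that turns a grey-scale image into a single vector. The 3x3 image and its values are made up for the example, and flattening row by row is only one possible convention.

```python
import numpy as np

# A hypothetical 3x3 grey-scale image; each value is the brightness of one pixel.
image = np.array([
    [0.0, 0.5, 1.0],
    [0.2, 0.4, 0.6],
    [1.0, 0.0, 0.3],
])

# Flatten row by row so that each pixel becomes one dimension of the input vector.
input_vector = image.reshape(-1)
print(input_vector.shape)
```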

The standard way for a neuron to compute its output is to take a weighted sum of its input values, then apply a function with a steep drop-off that sends all values below some threshold to values near 0, and all values above that threshold to values near 1.
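Here is a minimal sketch of such a neuron. The logistic sigmoid is my assumption for the "steep drop-off" function (the text does not name one), and `neuron_output`, `threshold`, and `steepness` are illustrative names rather than anything from a particular library.

```python
import math

def neuron_output(inputs, weights, threshold, steepness=10.0):
    """Weighted sum of the inputs followed by a steep sigmoid around `threshold`."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The logistic function sends sums well below the threshold to about 0
    # and sums well above it to about 1; `steepness` sets how sharp the step is.
    return 1.0 / (1.0 + math.exp(-steepness * (weighted_sum - threshold)))

pixels = [0.9, 0.1, 0.8]
weights = [1.0, 0.0, 1.0]

# The weighted sum is 1.7 in both cases; only the threshold differs.
high = neuron_output(pixels, weights, threshold=1.0)  # sum is above the threshold
low = neuron_output(pixels, weights, threshold=2.5)   # sum is below the threshold
print(high, low)
```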

(Gory details: The dot product of two unit vectors is the cosine of the angle between them, and cosine is close to one for small angles, then goes to zero as the angle increases, up to a right angle.) So, for the neurons that get their input directly from each incoming data point, we can interpret this as follows: The fixed vector of weights defines an image.
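To make the dot-product remark concrete, the sketch below normalizes a weight vector and two inputs to unit length and compares their dot products. The vectors are made-up numbers chosen so that one input points almost the same way as the weights and the other is at a right angle to them.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

weights = normalize([1.0, 2.0, 3.0])
similar = normalize([1.1, 2.0, 2.9])       # nearly the same direction as the weights
orthogonal = normalize([3.0, 0.0, -1.0])   # right angle: 1*3 + 2*0 + 3*(-1) = 0

# For unit vectors the dot product is the cosine of the angle between them.
cos_similar = dot(weights, similar)
cos_orthogonal = dot(weights, orthogonal)
print(cos_similar, cos_orthogonal)
```

An input that resembles the weight "image" therefore produces a large weighted sum, and an unrelated one produces a sum near zero.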

When the output is different from the desired value, the weights are changed in a way so that the output will be closer to correct next time. Because of the way that dot products work, this means that the first level of neurons will try to adjust their weights to match the data points that come in during the training phase.
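The text does not say which update rule is used, so the following is only a sketch under the assumption of plain gradient descent on the squared error of a single sigmoid neuron. `train_step` and `learning_rate` are hypothetical names for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(weights, inputs, target, learning_rate=0.5):
    """One gradient-descent step on the squared error of a single sigmoid neuron."""
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    # Derivative of 0.5 * (output - target)**2 with respect to the weighted sum.
    delta = (output - target) * output * (1.0 - output)
    # Each weight moves in proportion to its input, so for a target of 1
    # the weight vector drifts toward the input pattern.
    return [w - learning_rate * delta * x for w, x in zip(weights, inputs)]

inputs = [1.0, 0.0, 1.0]
weights = [0.0, 0.0, 0.0]
errors = []
for _ in range(50):
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    errors.append(abs(1.0 - output))
    weights = train_step(weights, inputs, target=1.0)

print(weights, errors[0], errors[-1])
```

After a few dozen steps the weights are large exactly where the input was bright and unchanged where it was dark, which is the "matching the data points" behaviour described above.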

In order for one of our neurons that’s watching a particular rectangle to end up with weights that define the image of an eye, you need to have lots of input images with an eye in that particular rectangle. In order for the trained neural network to recognize an eye in a future image, there must have been a training image with an eye in the exact same spot.

This is slightly more complex for a convolutional neuron than for a standard neuron, but once you get it right, the first layer of neurons in a convolutional neural network will begin adapting themselves to small images that commonly appear anywhere in the input images that you use to train the network.
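A rough sketch of what a convolutional neuron computes: the same small weight patch is compared against every position in the image, so a feature is detected wherever it appears. The function name `convolve_valid`, the 2x2 kernel, and the 6x6 image are assumptions for illustration; strictly speaking the loop computes a cross-correlation, which is what most deep learning libraries call convolution.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide `kernel` over `image` and return the dot product at every position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    response = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            # The same weights are reused at every location (weight sharing).
            response[row, col] = np.sum(patch * kernel)
    return response

# A hypothetical 2x2 "feature" the neuron has learned.
kernel = np.array([[1.0, 1.0],
                   [1.0, 1.0]])

image = np.zeros((6, 6))
image[3:5, 2:4] = 1.0  # the feature appears at row 3, column 2

response = convolve_valid(image, kernel)
best = np.unravel_index(np.argmax(response), response.shape)
print(best)
```

Moving the bright patch elsewhere in `image` moves the peak in `response` with it, without retraining the kernel.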

Object detection with neural networks — a simple tutorial using keras

Images are easy to generate and handle, and they are exactly the right type of data for machine learning: easy to understand for human beings, but difficult for computers.

To make this tutorial easy to follow along, we’ll apply two simplifications: 1) We don’t use real photographs, but images with abstract geometric shapes.

To construct the “images”, I created a bunch of 8x8 numpy arrays, set the background to 0, and a random rectangle within the array to 1.
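A sketch of how such training images could be generated. The rectangle size limits (`min_size`, `max_size`), the seed, and the convention that rows are y and columns are x are my assumptions; the original text only specifies 8x8 arrays with a background of 0 and a rectangle of 1.

```python
import numpy as np

def make_rectangle_image(rng, img_size=8, min_size=1, max_size=4):
    """Return an image with one random rectangle and its (x, y, w, h) bounding box."""
    image = np.zeros((img_size, img_size))
    # Upper bound of rng.integers is exclusive, hence the + 1.
    w, h = rng.integers(min_size, max_size + 1, size=2)
    x = rng.integers(0, img_size - w + 1)
    y = rng.integers(0, img_size - h + 1)
    image[y:y + h, x:x + w] = 1.0  # rows are y, columns are x
    return image, (int(x), int(y), int(w), int(h))

rng = np.random.default_rng(0)
image, bbox = make_rectangle_image(rng)
print(bbox)
print(image)
```

The bounding box tuple doubles as the regression target for the network.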

Here are a few examples (white is 0, black is 1): The neural network is a very simple feedforward network with one hidden layer (no convolutions, nothing fancy).

You can see that I also plotted the IOU values above each bounding box: This index is called Intersection Over Union and measures the overlap between the predicted and the real bounding box.
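For reference, here is one way to compute that index, assuming boxes are given as `(x, y, w, h)` with `(x, y)` the top-left corner. The function name `iou` is just a label for this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping region (negative if there is none).
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union

print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(iou((0, 0, 2, 2), (1, 1, 2, 2)))  # overlap of one cell out of seven
print(iou((0, 0, 2, 2), (4, 4, 2, 2)))  # no overlap
```

The value ranges from 0 (no overlap) to 1 (perfect match).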

Basically, we use the same approach as above: Bootstrap the images with 8x8 numpy arrays and train a feedforward neural network to predict two bounding boxes (i.e.

Let’s say that the expected bounding box of the left rectangle is at position 1 in the target vector (x1, y1, w1, h1), and the expected bounding box of the right rectangle is at position 2 in the vector (x2, y2, w2, h2).

In order to do this, we process the target vectors after every epoch: For each training image, we calculate the mean squared error between the prediction and the target A) for the current order of bounding boxes in the target vector (i.e. x1, y1, w1, h1, x2, y2, w2, h2) and B) if the bounding boxes in the target vector are flipped (i.e.

If we train our network with flipping enabled, we get the following results (again on held-out test images): Overall, the network achieves a mean IOU of 0.5 on the training data (I haven’t calculated the one for the test dataset, but it should be pretty similar).

Secondly, you don’t necessarily have to use the mean squared error to decide whether the target should be flipped or not; you can just as well use the IOU or even the distance between the bounding boxes.
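The flipping step can be sketched as follows. This assumes the rule is "keep whichever order of the two target boxes is closer to the prediction", and it takes the comparison function as a parameter so that the mean squared error can be swapped for another measure. `maybe_flip` and `error` are names invented for this example.

```python
def mse(a, b):
    """Mean squared error between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def maybe_flip(prediction, target, error=mse):
    """Return the target order (as-is or flipped) that is closer to the prediction."""
    # Target layout: x1, y1, w1, h1, x2, y2, w2, h2.
    flipped = target[4:] + target[:4]
    if error(prediction, flipped) < error(prediction, target):
        return flipped
    return target

target = [0, 0, 2, 2, 5, 5, 3, 3]
# The network predicted the two boxes in the opposite order.
prediction = [5.2, 4.9, 2.8, 3.1, 0.1, 0.2, 2.0, 1.9]

new_target = maybe_flip(prediction, target)
print(new_target)
```

Running this over all training targets after each epoch lets the network settle on its own ordering of the boxes instead of being penalized for a correct prediction in the "wrong" slot.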

We’ll use the exact same network as above and just add one value per bounding box to the target vector: 0 if the object is a rectangle, and 1 if it’s a triangle (i.e.

Putting it all together: Shapes, Colors, and Convolutional Neural Networks

Alright, everything works, so let’s have some fun now: We’ll apply the method to some more “realistic” scenes, that is: different colors, more shapes, and multiple objects at once.

I also made some modifications to the network itself, but let’s first have a look at the results: As you can see, the bounding boxes aren’t perfect, but most of the time they are kind of in the right place.

In comparison to the simple experiments above, I made three modifications: 1) I used a convolutional neural network (CNN) instead of a feedforward network.

Specifically, I used one vector per object to classify shape (rectangle, triangle or circle) and one vector to classify color (red, green or blue).

All in all, the target vector for an image consists of 10 values for each object (4 for the bounding box, 3 for the shape classification, and 3 for the color classification).
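As a sketch of that layout, the snippet below builds the 10 values for each object and concatenates them per image. One-hot encoding for the shape and color vectors, the order of the fields, and the helper names are assumptions on my part; the text only gives the counts (4 + 3 + 3).

```python
SHAPES = ["rectangle", "triangle", "circle"]
COLORS = ["red", "green", "blue"]

def one_hot(value, choices):
    """Vector with a 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if choice == value else 0.0 for choice in choices]

def object_target(bbox, shape, color):
    """10 values per object: 4 bounding box + 3 shape + 3 color."""
    return list(bbox) + one_hot(shape, SHAPES) + one_hot(color, COLORS)

def image_target(objects):
    """Concatenate the per-object vectors into one target vector for the image."""
    target = []
    for bbox, shape, color in objects:
        target += object_target(bbox, shape, color)
    return target

target = image_target([
    ((2, 3, 8, 6), "triangle", "red"),
    ((10, 1, 4, 4), "circle", "blue"),
])
print(len(target))
print(target)
```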

Then, it takes the minimum of those values, assigns the corresponding predicted and expected bounding boxes to each other, takes the next smallest value out of the boxes that were not assigned yet, and so on.
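A sketch of that greedy matching, assuming "those values" are a mismatch score (for example the mean squared error or a distance) for every pair of predicted and expected boxes. The `cost` table and the name `greedy_assign` are made up for the example.

```python
def greedy_assign(cost):
    """Greedily pair predicted and expected boxes, smallest mismatch first.

    cost[i][j] is the mismatch between predicted box i and expected box j.
    Returns a list of (predicted_index, expected_index) pairs.
    """
    pairs = sorted(
        (cost[i][j], i, j)
        for i in range(len(cost))
        for j in range(len(cost[i]))
    )
    used_pred, used_exp, assignment = set(), set(), []
    for _, i, j in pairs:
        # Skip pairs where either box has already been assigned.
        if i not in used_pred and j not in used_exp:
            assignment.append((i, j))
            used_pred.add(i)
            used_exp.add(j)
    return assignment

cost = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.7],
    [0.6, 0.3, 0.4],
]
assignment = greedy_assign(cost)
print(assignment)
```

Note that a greedy pass like this is simple but does not always give the assignment with the lowest total mismatch.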

Real-world objects

Recognizing shapes is a cool and easy example, but obviously it’s not what you want to do in the real world (there aren’t that many abstract 2D shapes in nature, unfortunately).

In the real world, however, you have diverse scenarios: A small side road may have no cars on it, but as soon as you drive on the highway, you have to recognize hundreds of cars at the same time.

Even though this seems like a minor issue, it’s actually pretty hard to solve — how should the algorithm decide what’s an object and what’s background, if it doesn’t know how many objects there are?

What is backpropagation really doing? | Chapter 3, deep learning

What's actually happening to a neural network as it learns?

On Characterizing the Capacity of Neural Networks using Algebraic Topology

The learnability of different neural architectures can be characterized directly by computable measures of data complexity. In this talk, we reframe the problem of ...

Backpropagation calculus | Appendix to deep learning chapter 3

This one is a bit more symbol heavy, and that's actually the point. The goal here is to represent in somewhat more formal terms the intuition for how ...

4.1.1 Neural Networks - Motivation - Nonlinear Hypotheses

Week 4 (Neural Networks: Representation) - Motivation - Non-linear Hypotheses Machine Learning Coursera by ..

Lec08 Multilayer Perceptron and Deep Neural Networks (Part 1)

Review of a simple perceptron model, learning rules and mechanism.

12b: Deep Neural Nets

NOTE: These videos were recorded in Fall 2015 to update the Neural Nets portion of the class. MIT 6.034 Artificial Intelligence, Fall 2010 View the complete ...

Selection of Region of Interest in Images using roipoly in Matlab

Selection of Region of Interest in Images using roipoly in Matlab.


Weight evolution of all neurons in an artificial neural network as they learn how to recognize images of digits. The neural net comprises a hidden layer of 117 ...

1. Non linear Hypotheses

Video from Coursera - Stanford University - Course: Machine Learning

Deep Learning with Tensorflow - Activation Functions

Deep Learning with TensorFlow Introduction The majority of data ..