AI News, BOOK REVIEW: Imaculate does tech

Imaculate does tech

Facebook uses it to auto-tag your photos; Google uses it not only for image tagging and street sign translation but also to create weird dreams.

We know CNNs are state of the art for computer vision because they have produced winners in the well-known ImageNet challenge, among other challenges.

Modelled after our very own neuronal structure, neural networks are inter-connected processing elements (neurons) which process information by responding to external inputs.

You can imagine a single neuron as a black box that takes in numerical input and produces output(s) by applying a linear function followed by a non-linear activation function.
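The "linear followed by non-linear" description can be sketched in a few lines of Python. This is only an illustration: the sigmoid is an assumed choice of activation, and the function name `neuron` and the input values are hypothetical.

```python
import math


def neuron(inputs, weights, bias):
    """Apply a linear step (weighted sum plus bias), then a non-linear activation."""
    # Linear step: dot product of weights and inputs, plus the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Non-linear step: sigmoid activation (an assumption; other activations work too).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs and weights chosen so the weighted sum is exactly zero.
output = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
print(output)  # 0.5, because sigmoid(0) = 0.5
```

Swapping the last line of `neuron` for another activation changes the non-linearity without touching the linear step.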

As shown above, all neurons in each layer are connected to the neurons in adjacent layers, making them Fully Connected (FC) layers.

A convolution neuron is basically a filter that sweeps across the width and height of the input, computing the dot product between itself and input elements in its receptive field producing a 2D activation map.

The pooling layer reduces the spatial size of its input by selecting a single value to represent each relatively small region of the input.

A series of CONV-RELU-POOL layers is normally stacked together to form a pyramid-like structure in which the height and width decrease while the depth increases, as the number of filters grows in higher layers.
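To make the shrinking-and-deepening pattern concrete, here is a rough shape trace. Everything numeric in it is an assumption for illustration: a 32x32x3 input, filter counts of 16, 32 and 64, CONV layers padded so that they keep height and width, and 2x2 pooling with stride 2. The helper `trace_shapes` is hypothetical.

```python
def trace_shapes(height, width, depth, filter_counts):
    """Return the (height, width, depth) of the volume after each CONV-RELU-POOL block."""
    shapes = [(height, width, depth)]
    for n_filters in filter_counts:
        # CONV (assumed padded to preserve height/width): depth becomes the filter count.
        depth = n_filters
        # RELU is element-wise, so the shape is unchanged.
        # POOL (assumed 2x2 window, stride 2): height and width are halved.
        height, width = height // 2, width // 2
        shapes.append((height, width, depth))
    return shapes


shapes = trace_shapes(32, 32, 3, [16, 32, 64])
for shape in shapes:
    print(shape)
# (32, 32, 3)
# (16, 16, 16)
# (8, 8, 32)
# (4, 4, 64)
```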

On the other hand, CONV filters have a local receptive field, so patterns that arise out of proximity, or the lack thereof, are easily detected.

This is important because images are unstructured data, meaning each pixel doesn't exactly represent a defined feature: a nose pixel in one selfie is most likely in a different location in another selfie.

In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

Convolutional neural network

History: 3.1 Receptive fields, 3.2 Neocognitron, 3.3 Shift-invariant neural network, 3.4 Neural abstraction pyramid, 3.5 GPU implementations

Building blocks: 5.1 Convolutional layer, 5.2 Pooling layer, 5.3 ReLU layer, 5.4 Fully connected layer, 5.5 Loss layer

Choosing hyperparameters: 6.1 Number of filters, 6.2 Filter shape, 6.3 Max pooling shape

XRDS

Artificial Neural Networks (ANNs) are used every day for tackling a broad spectrum of prediction and classification problems, and for scaling up applications which would otherwise require intractable amounts of data.

In this highly instructional and detailed paper, the authors propose a neural architecture called LeNet-5, used for recognizing hand-written digits and words, that established a new state-of-the-art (see footnote 2) classification accuracy of 99.2% on the MNIST dataset[5].

Hubel and Wiesel, in their paper[6], proposed an explanation for the way in which mammals visually perceive the world around them using a layered architecture of neurons in the brain, and this in turn inspired engineers to attempt to develop similar pattern recognition mechanisms in computer vision.

The most popular application for CNNs in recent times has been image analysis, but many researchers have also found other interesting and exciting ways to use them: from winning Go matches against human players ([7], a related video [8]) to an innovative application in discovering new drugs by training over large quantities of molecular structure data of organic compounds[9].

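It can be mathematically described as follows.

For a discrete domain of one variable:

$$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]$$

For a discrete domain of two variables:

$$(f * g)[n_1, n_2] = \sum_{m_1=-\infty}^{\infty} \sum_{m_2=-\infty}^{\infty} f[m_1, m_2]\, g[n_1 - m_1, n_2 - m_2]$$

Footnote 2: A point to note here is that the improvement is, in fact, modest.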

Though conventionally called as such, the operation performed on image inputs with CNNs is not strictly convolution, but rather a slightly modified variant called cross-correlation[10], which is equivalent to convolution with one of the inputs time-reversed:

$$(f \star g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n + m]$$

CNN Concepts

CNNs have an associated terminology and a set of concepts that is unique to them, and that sets them apart from other types of neural network architectures.

However, with coloured images, particularly RGB (Red, Green, Blue)-based images, the presence of separate colour channels (3 in the case of RGB images) introduces an additional ‘depth’ field to the data, making the input 3-dimensional.

Features

Just as its literal meaning implies, a feature is a distinct and useful observation or pattern obtained from the input data that aids in performing the desired image analysis.

As an example, when performing Face Detection, the fact that every human face has a pair of eyes will be treated by the system as a feature to be detected and learned by the distinct layers.

The real values of the kernel matrix change with each learning iteration over the training set, indicating that the network is learning to identify which regions are of significance for extracting features from the data.

Kernel Operations Detailed

The exact procedure for convolving a Kernel (say, of size 16 x 16) with the input volume (a 256 x 256 x 3 sized RGB image in our case) involves taking patches from the input image of size equal to that of the kernel (16 x 16), and convolving (or calculating the dot product) between the values in the patch and those in the kernel matrix.

The patch selection is then slid (towards the right, or downwards when the boundary of the matrix is reached) by a certain amount called the ‘stride’ value, and the process is repeated until the entire input image has been processed.
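The patch-and-stride procedure can be sketched as follows. This is a minimal single-channel version under stated assumptions: the image and kernel are small hypothetical matrices rather than the 256 x 256 x 3 image and 16 x 16 kernel mentioned above, there is no padding, and the kernel is not flipped (so strictly this is cross-correlation). For a multi-channel input, the dot product would also sum over the depth.

```python
def cross_correlate2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (lists of rows) and take a dot product per patch."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    activation_map = []
    for i in range(out_h):          # move downwards once a row of patches is done
        row = []
        for j in range(out_w):      # move towards the right by `stride` each step
            top, left = i * stride, j * stride
            # Dot product between the kernel and the patch currently under it.
            total = sum(
                kernel[a][b] * image[top + a][left + b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        activation_map.append(row)
    return activation_map


# Hypothetical 4x4 single-channel image and 2x2 kernel.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
    [0, 0, 0, 0],
]
kernel = [
    [1, 0],
    [0, 1],
]
result = cross_correlate2d(image, kernel, stride=2)
print(result)  # [[6, 3], [7, 9]]
```

With a stride of 2 the patches do not overlap, so the 4x4 input yields a 2x2 activation map; a stride of 1 would yield a 3x3 map.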

Thus, instead of connecting each neuron to all possible pixels, we specify a 2 dimensional region called the ‘receptive field[14]’ (say of size 5×5 units) extending to the entire depth of the input (5x5x3 for a 3 colour channel input), within which the encompassed pixels are fully connected to the neural network’s input layer.
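A quick count shows why the local receptive field matters. The sketch below compares the number of weights a single neuron needs under the two schemes, using the 5x5x3 receptive field from the text and, as an assumption, the 256 x 256 x 3 image size used as an example elsewhere in this article. The helper name is hypothetical.

```python
def weights_per_neuron(height, width, depth):
    """Number of input connections (weights) for one neuron covering a region."""
    return height * width * depth


# Local receptive field of 5x5 extending through all 3 colour channels.
local = weights_per_neuron(5, 5, 3)
# One neuron connected to every pixel of an assumed 256x256x3 image.
full = weights_per_neuron(256, 256, 3)

print(local)  # 75
print(full)   # 196608
```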

It performs the convolution operation over the input volume as specified in the previous section, and consists of a 3-dimensional arrangement of neurons (a stack of 2-dimensional layers of neurons, one for each channel depth).

The caveat with parameter sharing is that it doesn’t work well with images that encompass a spatially centered structure (such as face images), and in applications where we want the distinct features of the image to be detected in spatially different locations of the layer.

We must keep in mind though that the network operates in the same way that a feed-forward network would: the weights in the Conv layers are trained and updated in each learning iteration using a Back-propagation algorithm extended to be applicable to 3-dimensional arrangements of neurons.

Instead, a smoothed version called the Softplus function is sometimes used:

$$f(x) = \ln(1 + e^{x})$$

The derivative of the softplus function is the sigmoid function, $f'(x) = \frac{1}{1 + e^{-x}}$, as mentioned in a prior blog post.
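The relationship between softplus and the sigmoid can be checked numerically. The sketch below assumes the function being smoothed is the ReLU, max(0, x), and uses a central finite difference as a stand-in for the analytic derivative; all helper names are hypothetical.

```python
import math


def relu(x):
    """Rectified linear unit: max(0, x) (assumed to be the function being smoothed)."""
    return max(0.0, x)


def softplus(x):
    """Smooth approximation of ReLU: ln(1 + e^x)."""
    return math.log1p(math.exp(x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x = 0.7  # arbitrary test point
slope = numerical_derivative(softplus, x)
print(abs(slope - sigmoid(x)) < 1e-6)  # True
```

Far from zero the two functions nearly coincide (for example, `softplus(10.0)` differs from `relu(10.0)` by less than 1e-4), while near zero softplus stays smooth where ReLU has a kink.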

Much like the convolution operation performed above, the pooling layer takes a sliding window (a region of fixed size) that is moved in strides across the input, transforming the values in each region into a single representative value.

For example, if the input is a volume of size 4x4x3, and the sliding window is of size 2×2, then for each color channel, the values will be down-sampled to their representative maximum value if we perform the max pooling operation.
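The 4x4x3 example can be sketched as below. The pixel values are hypothetical, and a stride of 2 (non-overlapping windows) is an assumption, since the text only gives the window size. Each colour channel is pooled independently.

```python
def max_pool2d(channel, window=2, stride=2):
    """Down-sample one 2D channel by keeping the maximum of each window."""
    pooled = []
    for top in range(0, len(channel) - window + 1, stride):
        row = []
        for left in range(0, len(channel[0]) - window + 1, stride):
            # Representative value for this region: its maximum.
            row.append(max(
                channel[top + a][left + b]
                for a in range(window)
                for b in range(window)
            ))
        pooled.append(row)
    return pooled


# One hypothetical 4x4 channel; the volume stacks three of them (4x4x3).
channel = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
volume = [channel, channel, channel]

pooled_volume = [max_pool2d(c) for c in volume]
print(pooled_volume[0])  # [[6, 8], [3, 4]]
```

The depth is unchanged (still 3 channels), while each 4x4 channel shrinks to 2x2.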

Figure 6: The Max-Pooling operation can be observed in sub-figures (i), (ii) and (iii) that max-pools the 3 colour channels for an example input volume for the pooling layer.

A quick and dirty empirical formula[15] for calculating the spatial dimensions of the Convolutional Layer as a function of the input volume size and the hyperparameters we discussed before can be written as follows. For each (ith) dimension of the input volume, pick:

$$W_{Out}(i) = \frac{W_{In}(i) - R + 2P}{S} + 1$$

where $W_{In}(i)$ is the (ith) input dimension, R is the receptive field value, P is the padding value, and S is the value of the stride.

To better understand how it works, let’s consider the following example: let the dimensions of the input volume be 288x288x3 and the stride value be 2 (along both the horizontal and vertical directions).
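The example can be worked through in code using the commonly cited output-size rule, (input - R + 2P) / S + 1. The input size of 288 and stride of 2 come from the text; the receptive field of 4 and padding of 0 are assumptions picked here so that the division comes out even. The function name is hypothetical.

```python
def conv_output_size(w_in, r, p, s):
    """Spatial output size along one dimension: (w_in - r + 2p) / s + 1."""
    span = w_in - r + 2 * p
    if span % s != 0:
        # The filter positions would not tile the input evenly with this stride.
        raise ValueError("hyperparameters do not fit the input evenly")
    return span // s + 1


# Input dimension 288 and stride 2 (from the text); R = 4 and P = 0 are assumed.
w_out = conv_output_size(288, r=4, p=0, s=2)
print(w_out)  # 143
```

With an odd receptive field such as 5 and no padding, `288 - 5 = 283` is not divisible by the stride of 2, so the function raises an error, which is a hint that the padding or receptive field should be adjusted.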

“Neural networks and physical systems with emergent collective computational abilities.” Proceedings of the National Academy of Sciences 79.8 (1982): 2554-2558. [http://www.pnas.org/content/79/8/2554.abstract]

[2] Rumelhart, D.

[http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234]

[8] Clark, Christopher, and Amos Storkey.

[http://colah.github.io/posts/2014-07-Understanding-Convolutions/]

[13] Tim Dettmers, “Understanding Convolution In Deep Learning”. [http://timdettmers.com/2015/03/26/convolution-deep-learning/]

[14] TensorFlow Documentation: Convolution [https://www.tensorflow.org/versions/r0.7/api_docs/python/nn.html#convolution]

[15] Andrej Karpathy, “CS231n: Convolutional Neural Networks for Visual Recognition” [http://cs231n.github.io/convolutional-networks/]

[16] Krizhevsky, Alex, and Geoffrey Hinton.

Machine Learning at Condé Nast, Part 1: A Neural Network Primer

The parameters (weights and biases) for each neuron in the convolutional layer are shared with all the other neurons in that slice (that is, across the width and height dimensions).

Convolutional Neural Networks - Ep. 8 (Deep Learning SIMPLIFIED)

Out of all the current Deep Learning applications, machine vision remains one of the most popular. Since Convolutional Neural Nets (CNN) are one of the best available tools for machine vision,...

Recurrent Neural Networks - Ep. 9 (Deep Learning SIMPLIFIED)

Our previous discussions of deep net applications were limited to static patterns, but how can a net decipher and label patterns that change with time? For example, could a net be used to scan...

The Convolution Layer (CNN Visualization)

Demonstrating the convolutional layer of a convolutional neural network. The 3x3 window that passes over our input image is a "feature filter" for the smiley face's left eye (pretend that this...

Visualizing weights & intermediate layer outputs of CNN in Keras

This video explains how we can visualize the configuration of the model as well as the configuration of each layer. It also shows the way to visualize the filters and the parameters. At the...

Lecture 6 | Training Neural Networks I

In Lecture 6 we discuss many practical issues for training modern neural networks. We discuss different activation functions, the importance of data preprocessing and weight initialization,...

Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edureka

This Edureka "Neural Network Tutorial" video will help you to understand the basics of Neural Networks and how to use them for deep learning. It explains Single..

Lecture 7 | Training Neural Networks II

Lecture 7 continues our discussion of practical issues for training neural networks. We discuss different update rules commonly used to optimize neural networks during training, as well as...

How Convolutional Neural Networks work

A gentle guided tour of Convolutional Neural Networks. Come lift the curtain and see how the magic is done. For slides and text, check out the accompanying blog post:

Deep Learning Lecture 10: Convolutional Neural Networks

Slides available at: Course taught in 2015 at the University of Oxford by Nando de Freitas with great help from Brendan Shillingford

Lecture 5 | Convolutional Neural Networks

In Lecture 5 we move from fully-connected neural networks to convolutional neural networks. We discuss some of the key historical milestones in the development of convolutional networks, including...