Accelerate Machine Learning with the cuDNN Deep Neural Network Library

Machine Learning (ML) has its origins in the field of Artificial Intelligence, which started out decades ago with the lofty goal of creating a computer that could do any work a human can do. While attaining that goal still appears to be in the distant future, many useful tools have been developed and successfully applied to a wide variety of problems. In fact, ML has now become a pervasive technology, underlying many modern applications. Today the world’s largest financial companies, internet firms, and foremost research institutions are using ML in applications including internet search, fraud detection, gaming, face detection, image tagging, brain mapping, check processing, and computer server health monitoring, to name a few. The US Postal Service uses machine learning techniques for handwriting recognition, and leading applied-research government agencies such as IARPA and DARPA are funding work to develop the next generation of ML systems.

Past neural networks were typically both shallow (only one or two layers beyond the input layer) and fully connected, meaning each neuron receives input from every neuron in the layer below it. Today, the best-performing neural networks are deep, often having on the order of 10 layers (and the trend is toward even more). Provided each layer applies a non-linear activation function, a network with more than one layer can learn to recognize highly complex, non-linear features in its input.
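
To make the fully connected pattern concrete, here is a minimal sketch in Python with NumPy (the function name and layer sizes are illustrative, not taken from any particular library): every output neuron computes a weighted sum over all of its inputs, followed by a non-linearity.

    import numpy as np

    def fully_connected(x, W, b):
        """One fully connected layer: every output neuron sees every input.

        x: input activations, shape (n_in,)
        W: weight matrix, shape (n_out, n_in) -- one row per output neuron
        b: bias vector, shape (n_out,)
        """
        # Each output is a weighted sum of *all* inputs, passed through a
        # non-linearity (here tanh); without the non-linearity, stacked
        # layers would collapse into a single linear map.
        return np.tanh(W @ x + b)

    rng = np.random.default_rng(0)
    x = rng.standard_normal(784)                # e.g. a flattened 28x28 image
    W = rng.standard_normal((100, 784)) * 0.01  # hidden layer of 100 neurons
    h = fully_connected(x, W, np.zeros(100))
    print(h.shape)                              # (100,)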

An alternative to a fully connected layer is a convolutional layer. A neuron in a convolutional layer is connected only to neurons in a small region of the layer below it. This region is typically a 5×5 grid of neurons (7×7 and 11×11 are also common), and its size is called the filter size. A convolutional layer can thus be thought of as performing a convolution on its input. This type of connection pattern mimics the pattern seen in perceptual areas of the brain, such as retinal ganglion cells or cells in the primary visual cortex.
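
To illustrate the connection pattern, here is a naive sketch in Python with NumPy (a loop implementation for clarity, not how an optimized library such as cuDNN computes it). Note that, like most deep learning libraries, it actually computes cross-correlation, which is conventionally called convolution in this field.

    import numpy as np

    def conv2d(image, kernel):
        """Naive 'valid' 2D convolution: each output value depends only on
        a small, kernel-sized window of the input."""
        kh, kw = kernel.shape
        ih, iw = image.shape
        out = np.zeros((ih - kh + 1, iw - kw + 1))
        for y in range(out.shape[0]):
            for x in range(out.shape[1]):
                # The output at (y, x) is connected only to a local patch.
                out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
        return out

    rng = np.random.default_rng(0)
    image = rng.standard_normal((32, 32))
    kernel = rng.standard_normal((5, 5))   # filter size 5x5
    print(conv2d(image, kernel).shape)     # (28, 28)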

However, DNNs and CNNs require large amounts of computation, especially during the training phase. Neural networks are trained by presenting the input to the network and letting the resulting activations of the neurons flow up through the net to the output layer, where the result is compared to the correct answer. An error is calculated for each unit in the output layer, and this error is “back-propagated” down through the network to adjust each connection weight by a small amount. Thus there is a “forward pass” of the input to generate an output, and a “backward pass” to propagate error information through the network during training. When the network is deployed, only the forward pass is used.
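
The two passes are easiest to see in code. Below is a toy sketch in Python with NumPy (a two-layer network learning XOR; the layer sizes, learning rate, and iteration count are made up for the example):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy task: learn XOR with one hidden layer.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.standard_normal((2, 4)); b1 = np.zeros(4)
    W2 = rng.standard_normal((4, 1)); b2 = np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(5000):
        # Forward pass: activations flow up through the net.
        h = sigmoid(X @ W1 + b1)
        y = sigmoid(h @ W2 + b2)

        # Error at the output layer.
        err = y - T

        # Backward pass: propagate the error down through the network.
        dy = err * y * (1 - y)            # gradient at the output units
        dh = (dy @ W2.T) * h * (1 - h)    # error pushed to the hidden layer

        # Adjust each connection weight by a small amount.
        lr = 0.5
        W2 -= lr * (h.T @ dy); b2 -= lr * dy.sum(axis=0)
        W1 -= lr * (X.T @ dh); b1 -= lr * dh.sum(axis=0)

    print(np.round(y.ravel(), 2))         # should approach [0, 1, 1, 0]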

Furthermore, DNNs require a large amount of training data to achieve high accuracy, meaning hundreds of thousands to millions of input samples must be run through both a forward and a backward pass. Because neural networks are built from large numbers of identical neurons, they are highly parallel by nature. This parallelism maps naturally to GPUs, which provide a significant speed-up over CPU-only training, as shown in Figure 2. In our own benchmarking using cuDNN with Caffe, a leading neural network package, we obtain more than a 10X speed-up when training the reference ImageNet DNN model on an NVIDIA Tesla K40 GPU, compared to an Intel Ivy Bridge CPU.

This connection between GPUs and DNNs is revealed clearly in the results of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), as shown in Figure 3. Prior to 2012, no teams were using GPU-accelerated DNNs, and winning error rates typically improved by 10% per year or less (yellow line in Figure 3). In 2012, a team led by Geoff Hinton and Alex Krizhevsky from the University of Toronto was the first to use a GPU-accelerated DNN, and they won the competition by a large margin. Since then, the proportion of teams using GPU-accelerated DNNs has grown significantly (green bars in Figure 3), and these DNNs continue to demonstrate winning performance. Data from this year is still being compiled, but at the time of writing it appears that at least 90% of the teams in ILSVRC14 used GPUs.

Convolutional Neural Network (CNN)

A Convolutional Neural Network is a class of artificial neural network that uses convolutional layers to filter inputs for useful information.

The convolution operation involves combining input data (feature map) with a convolution kernel (filter) to form a transformed feature map.

The filters in the convolutional layers (conv layers) are themselves learned parameters, adjusted during training to extract the information most useful for a specific task.

For example, a CNN confronted with a general object recognition task might learn filters that capture the shape of an object, while the same network trained on a bird recognition task might instead learn filters that extract color.
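
As an illustration, the following sketch in Python (using NumPy and SciPy) applies a hand-made Sobel kernel, standing in for a learned filter, that extracts shape information by responding strongly to vertical edges:

    import numpy as np
    from scipy.signal import correlate2d

    # A hand-made filter standing in for a learned one: a Sobel kernel
    # that responds to vertical edges, i.e. shape information.
    sobel_x = np.array([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]])

    image = np.zeros((8, 8))
    image[:, 4:] = 1.0                        # a vertical edge down the middle

    response = correlate2d(image, sobel_x, mode='valid')
    print(response.max())                     # strong response at the edge: 4.0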

Applications of Convolutional Neural Networks include a variety of image processing systems (image recognition, image classification, video labeling, text analysis) and speech processing systems (speech recognition, natural language processing, text classification), along with state-of-the-art AI systems such as robots, virtual assistants, and self-driving cars.

A convolutional network differs from a regular neural network in that the neurons in its layers are arranged in three dimensions (width, height, and depth).

Regularization techniques such as batch normalization, where activations are normalized across the entire batch, or dropout, where randomly chosen neurons are ignored during training, can also be used.
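
Both techniques are short to express. Here is a minimal sketch in Python with NumPy (illustrative only: real batch normalization also learns a per-feature scale and shift, and dropout is disabled at inference time):

    import numpy as np

    rng = np.random.default_rng(0)

    def batch_norm(acts, eps=1e-5):
        """Normalize each feature to zero mean, unit variance across the batch."""
        return (acts - acts.mean(axis=0)) / np.sqrt(acts.var(axis=0) + eps)

    def dropout(acts, p=0.5):
        """Randomly zero each neuron with probability p during training,
        scaling the survivors so the expected activation is unchanged."""
        mask = rng.random(acts.shape) >= p
        return acts * mask / (1.0 - p)

    batch = rng.standard_normal((32, 100))      # 32 samples, 100 activations each
    print(batch_norm(batch).mean(axis=0)[:3])   # ~0 for every feature
    print((dropout(batch) == 0).mean())         # ~0.5 of activations dropped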

More recent CNNs use inception modules, which employ 1×1 convolutional kernels to further reduce memory consumption while allowing more efficient computation (and thus faster training).

A 1×1 convolution reduces the number of feature maps while leaving their spatial size unchanged: for example, 192 feature maps of size 28×28 can be reduced to 64 feature maps of size 28×28 by applying 64 1×1 convolutions.
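
Because a 1×1 convolution touches only the channel dimension, it amounts to a matrix multiply applied at every spatial position. Here is a minimal sketch in Python with NumPy of the 192-to-64 reduction described above (random data, illustrative shapes):

    import numpy as np

    rng = np.random.default_rng(0)

    # 192 feature maps of size 28x28, stored as (channels, height, width).
    feature_maps = rng.standard_normal((192, 28, 28))

    # 64 1x1 filters: each output map is a learned linear combination
    # of the 192 input maps at every spatial position.
    filters_1x1 = rng.standard_normal((64, 192))

    c, h, w = feature_maps.shape
    reduced = (filters_1x1 @ feature_maps.reshape(c, h * w)).reshape(64, h, w)
    print(reduced.shape)   # (64, 28, 28): same spatial size, 3x fewer maps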

For example, a face in an image patch that is not centered but slightly translated can still be detected by the convolutional filters, because the pooling operation funnels the information into the right place.
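
A small sketch in Python with NumPy makes the effect visible (a 2×2 max pool; the single-pixel "feature" is illustrative): a feature and a slightly translated copy of it land in the same pooled cell, so the next layer sees the same output.

    import numpy as np

    def max_pool(fmap, size=2):
        """Non-overlapping max pooling: keep the strongest response
        in each size x size window."""
        h, w = fmap.shape
        return (fmap[:h - h % size, :w - w % size]
                .reshape(h // size, size, w // size, size).max(axis=(1, 3)))

    fmap = np.zeros((4, 4))
    fmap[1, 1] = 1.0          # a detected feature...
    shifted = np.zeros((4, 4))
    shifted[0, 0] = 1.0       # ...the same feature, slightly translated

    # Both land in the same pooled cell, so later layers see the same thing.
    print(max_pool(fmap))     # [[1. 0.] [0. 0.]]
    print(max_pool(shifted))  # [[1. 0.] [0. 0.]]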

NVIDIA Developer Blog

The hottest area in machine learning today is Deep Learning, which uses Deep Neural Networks (DNNs) to teach computers to detect recognizable concepts in data.

Researchers and industry practitioners are using DNNs in image and video classification, computer vision, speech recognition, natural language processing, and audio recognition, among other applications.

The success of DNNs has been greatly accelerated by using GPUs, which have become the platform of choice for training these large, complex DNNs, reducing training time from months to only a few days.

Because of the increasing importance of DNNs in both industry and academia and the key role of GPUs, last year NVIDIA introduced cuDNN, a library of primitives for deep neural networks.

DIGITS puts the power of deep learning into an intuitive browser-based interface, so that data scientists and researchers can quickly design the best DNN for their data using real-time network behavior visualization.

Deep Learning is an approach to training and employing multi-layered artificial neural networks to assist in or complete a task without human intervention.

DNNs for image classification typically use a combination of convolutional neural network (CNN) layers and fully connected layers made up of artificial neurons tiled so that they respond to overlapping regions of the visual field.
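
Putting the pieces together, here is a schematic sketch in Python with NumPy of such an architecture (all layer sizes and the random weights are made up for illustration; a real model would be trained, and would run on optimized GPU kernels rather than Python loops):

    import numpy as np

    rng = np.random.default_rng(0)

    def conv_layer(x, filters):
        """x: (in_c, H, W); filters: (out_c, in_c, kh, kw)."""
        out_c, in_c, kh, kw = filters.shape
        _, H, W = x.shape
        out = np.zeros((out_c, H - kh + 1, W - kw + 1))
        for o in range(out_c):
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    out[o, i, j] = np.sum(x[:, i:i+kh, j:j+kw] * filters[o])
        return np.maximum(out, 0)                 # ReLU non-linearity

    x = rng.standard_normal((3, 32, 32))          # RGB input: depth x height x width
    x = conv_layer(x, rng.standard_normal((8, 3, 5, 5)) * 0.1)  # -> (8, 28, 28)
    x = x.reshape(8, 14, 2, 14, 2).max(axis=(2, 4))             # 2x2 max pool -> (8, 14, 14)
    x = x.reshape(-1)                                           # flatten for the FC layer
    logits = (rng.standard_normal((10, x.size)) * 0.01) @ x     # fully connected -> 10 scores
    print(logits.shape)                                         # (10,)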

The general approach of processing data through multiple layers, performing feature abstraction at each layer, is analogous to how the brain processes information.

The main console lists existing databases and previously trained network models available on the machine, as well as the training activities in progress.

You can track adjustments you have made to network configuration and maximize accuracy by varying parameters such as bias, neural activation functions, pooling windows, and layers.

When you select a model, DIGITS shows the status of the training exercise and its accuracy, and provides the option to load and classify images while the network is training or after training completes.

Because DIGITS runs a web server, it is easy for a team of users to share datasets and network configurations, and to test and share results.

Once everything is installed, launch the DIGITS server from its install directory. Then, if DIGITS is installed on your local machine, load the DIGITS web interface by entering the URL http://localhost:5000 in your web browser.

You have three options for defining a network: selecting a preconfigured (“standard”) network, a previous network, or a custom network, as shown in Figure 4 (middle).

DIGITS also provides a handy network-configuration checking tool that helps you visualize your network layout and quickly tells you if you have the wrong inputs into certain layers or forgot to include a pooling function.

My ship category comprises a variety of marine vehicles, including cruise, cargo, weather, passenger, and container ships.

It’s also easy to modify my networks, either by downloading a network file from a previous training run and pasting the modified version into the custom network box, or by loading one of my previous networks and customizing it.

I find it hard to mentally visualize a network’s response, but this feature helps by concisely showing all of the layer and activation information.

Figure 7 shows an example of my test network correctly classifying an old photo of a military ship with 100% confidence, and Figure 8 shows the results of classifying a picture of me.

NVIDIA Deep Learning Course: Class #3 - Getting started with Caffe

Caffe is a Deep Learning framework developed by the Berkeley Vision and Learning Center (BVLC).

Keynote (TensorFlow Dev Summit 2017)

Jeff Dean, Rajat Monga, and Megan Kacholia delivered the keynote address at the inaugural TensorFlow Dev Summit.

What Is cuDNN?

cuDNN provides GPU-accelerated functionality for common operations in deep neural networks.