AI News, Accelerated Computing and Deep Learning

Accelerated Computing and Deep Learning

Looking back at the past couple of waves of computing, each was underpinned by a revolutionary computing model, a new architecture that expanded both the capabilities and reach of computing.

In 1995, the PC-Internet era was sparked by the convergence of low-cost microprocessors (CPUs), a standard operating system (Windows 95), and a new portal to a world of information (Yahoo!).

The PC-Internet era brought the power of computing to about a billion people and realized Microsoft’s vision to put “a computer on every desk and in every home.” A decade later, the iPhone put “an Internet communications” device in our pockets.

A new computing model has since emerged: deep learning. Here, deep neural networks are trained to recognize patterns from massive amounts of data, and the approach has proven “unreasonably” effective at solving some of the most complex problems in computer science.

In 2012, recognizing that the larger the network — the bigger the brain — the more it can learn, Stanford’s Andrew Ng and NVIDIA Research teamed up to develop a method for training networks using large-scale GPU-computing systems.

In the area of speech recognition, Microsoft Research used GPU deep learning to achieve a historic milestone by reaching “human parity” in conversational speech.

Now, NVIDIA’s GPU runs deep learning algorithms, simulating human intelligence, and acts as the brain of computers, robots and self-driving cars that can perceive and understand the world.

This may explain why NVIDIA GPUs are used broadly for deep learning, and NVIDIA is increasingly known as “the AI computing company.” As a new computing model, GPU deep learning is changing how software is developed and how it runs.

Whereas the old computing model is “instruction processing” intensive, this new computing model requires massive “data processing.” To advance every aspect of AI, we’re building an end-to-end AI computing platform — one architecture that spans training, inference and the billions of intelligent devices that are coming our way.

Pascal can train networks 65 times larger, or train them 65 times faster, than the Kepler GPU that Alex Krizhevsky used in his paper.(1) A single computer of eight Pascal GPUs connected by NVIDIA NVLink, the highest-throughput interconnect ever created, can train a network faster than 250 traditional servers.

Volta Tensor Core GPU Achieves New AI Performance Milestones

Artificial intelligence powered by deep learning now solves challenges once thought impossible, such as computers understanding and conversing in natural speech and autonomous driving.

Inspired by deep learning’s effectiveness at solving a great many challenges, researchers have built exponentially more complex algorithms, creating a voracious appetite for faster computing.

For instance, Google created its TPU (tensor processing unit) accelerators, which have delivered good performance on the limited set of neural networks that can run on TPUs.

Tapping our years of experience and close collaboration with AI researchers all over the world, we created a new architecture optimized for the many models of deep learning – the NVIDIA Tensor Core GPU.

Combined with the high-speed NVLink interconnect and deep optimizations within all current frameworks, we achieve state-of-the-art performance. NVIDIA CUDA GPU programmability ensures performance across the large diversity of modern networks and provides a platform to bring up emerging frameworks and tomorrow’s deep network inventions.

This new hardware accelerates computation of matrix multiplies and convolutions, which account for most of the computational operations when training a neural network.
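As a concrete illustration (sizes here are arbitrary), a fully connected layer’s forward pass reduces to a single matrix multiply — the operation this hardware accelerates:

```python
import numpy as np

# A fully connected layer's forward pass is one matrix multiply:
# a batch of inputs (batch_size x in_features) times a weight matrix
# (in_features x out_features). Sizes are illustrative.
batch_size, in_features, out_features = 32, 512, 256
x = np.random.rand(batch_size, in_features).astype(np.float32)
w = np.random.rand(in_features, out_features).astype(np.float32)

y = x @ w  # the kind of dense math Tensor Cores accelerate in hardware

print(y.shape)  # (32, 256)
```

Convolutions are likewise commonly lowered to matrix multiplies internally, which is why speeding up this one primitive pays off across so many network types.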

Our CUDA platform enables every deep learning framework to harness the full power of our Tensor Core GPUs to accelerate the rapidly-expanding universe of neural network types such as CNN, RNN, GAN, RL, and the thousands of variants emerging each year.

To eliminate these transposes, we instead represent every tensor in the ResNet-50 model graph in NHWC format directly, a feature supported by the MXNet framework.
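The layout difference can be shown with a quick sketch (shapes are illustrative; NumPy stands in for the framework’s tensors):

```python
import numpy as np

# A batch of 8 RGB images, 224x224 pixels.
# NCHW orders the axes as (batch, channels, height, width);
# NHWC orders them as (batch, height, width, channels).
nchw = np.zeros((8, 3, 224, 224), dtype=np.float32)

# Converting NCHW -> NHWC is a transpose of the axis order --
# exactly the data movement that storing every tensor in NHWC
# throughout the model graph avoids.
nhwc = nchw.transpose(0, 2, 3, 1)

print(nhwc.shape)  # (8, 224, 224, 3)
```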

We achieved 1,075 images/second on a single Tensor Core V100 with our contributions to MXNet using a standard 90-epoch training schedule while hitting the same Top-1 classification accuracy (over 75%) as single-precision training.

For our single-GPU and single-node runs, we used the de facto standard of 90 epochs to train ResNet-50 to over 75% accuracy.

GPUs provide AI researchers with programmability and support all DL frameworks, enabling them to explore new algorithmic approaches and take advantage of existing ones.

Jeremy Howard and researchers at fast.ai incorporated key algorithmic innovations and tuning techniques to train ResNet-50 on ImageNet in just three hours on a single AWS P3 instance, powered by eight V100 Tensor Core GPUs.

Krizhevsky took six days to train his brilliant neural network, called AlexNet, which outperformed all other image recognition approaches at the time, kicking off the deep learning revolution.

We demonstrated a 10x performance gain on Fairseq in less than a year with our recently announced DGX-2 plus numerous software stack improvements (see Figure 8).

From building the most advanced deep learning accelerators to complex systems (HBM, CoWoS, SXM, NVSwitch, DGX), from advanced numerics libraries and a deep software stack (cuDNN, NCCL, NGC) to accelerating all DL frameworks, NVIDIA’s commitment to AI offers unparalleled flexibility for AI developers.

We will continue to optimize through the entire stack and continue to deliver exponential performance gains to equip the AI community with the tools for driving deep learning innovation forward.

The ideal AI computing platform needs to provide excellent performance, scale to support giant and growing model sizes, and include programmability to address the ever-growing diversity of model architectures.

We’ll soon be combining 16 Tesla V100s into a single server node to create the world’s fastest computing server, offering 2 petaflops of performance.
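The 2-petaflops figure follows directly from per-GPU peak throughput (using NVIDIA’s published V100 spec of 125 teraflops of mixed-precision Tensor Core performance):

```python
# Each Tesla V100 delivers a peak of 125 teraflops of mixed-precision
# Tensor Core throughput, per NVIDIA's published specifications.
v100_tensor_tflops = 125
gpus_per_node = 16

# 16 x 125 TFLOPS = 2,000 TFLOPS = 2 petaflops per server node.
node_petaflops = v100_tensor_tflops * gpus_per_node / 1000
print(node_petaflops)  # 2.0
```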

In The Era Of Artificial Intelligence, GPUs Are The New CPUs

Traditionally, computing power is associated with the number of CPUs and the cores per processing unit.

During the 90s, when WinTel started to invade the enterprise data center, application performance and database throughput were directly proportional to the number of CPUs and available RAM.

While these factors are critical to achieving the desired performance of enterprise applications, a new processor started to gain attention: the graphics processing unit, or GPU.

The answer lies in the rise of deep learning, an advanced machine learning technique that is heavily used in AI and Cognitive Computing.

Deep learning powers many scenarios including autonomous cars, cancer diagnosis, computer vision, speech recognition and many other intelligent use cases.

In the case of images, the first step is to convert them to grayscale and then assign a number to each pixel depending on how light or dark it is.

As we can imagine, a simple selfie shot from a mobile phone translates to a few million pixels, which in turn translates to a large matrix of numbers.

During the training phase of deep learning, these matrices of numbers are fed as input into the neural network along with the correct classification.

For example, by training the neural network with thousands of cat images, we get a model that can easily recognize a cat in a photo.
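The pipeline described above — grayscale conversion, one number per pixel, then image/label pairs fed to the network — can be sketched as follows (a toy image; sizes and values are illustrative):

```python
import numpy as np

# A toy 4x4 "image" of RGB values in [0, 255]; a real selfie has
# millions of pixels, but the idea is the same.
rgb = np.random.randint(0, 256, size=(4, 4, 3))

# Convert to grayscale using the standard luminance weights, giving one
# number per pixel that reflects how light or dark it is.
gray = rgb @ np.array([0.299, 0.587, 0.114])

# During training, this matrix of numbers is paired with its correct
# classification (say, 1 for "cat") and fed into the neural network.
example = (gray, 1)
print(gray.shape)  # (4, 4)
```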

Altera designed a type of chip called a field programmable gate array (FPGA), a chip that can be programmed later for niche use cases.

FPGAs are becoming popular with cloud providers like Microsoft, which are building massive data centers that host a diverse set of customer workloads.

These virtual machines are powered by the latest Nvidia Tesla processors, which deliver the required performance for training deep learning models.

Google, one of the pioneers in AI and deep learning, has announced the Tensor Processing Unit, or TPU — a chip designed to perform complex math operations with massive parallelism.

Accelerating AI with GPUs: A New Computing Model

Building intelligent machines that can perceive the world as we do, understand our language, and learn from examples has been the life’s work of computer scientists for over five decades.

Yet, it took the combination of Yann LeCun’s work in convolutional neural nets, Geoff Hinton’s back-propagation and Stochastic Gradient Descent approach to training, and Andrew Ng’s large-scale use of GPUs to accelerate Deep Neural Networks (DNNs) to ignite the big bang of modern AI — deep learning.

At the time, NVIDIA was busy advancing GPU-accelerated computing, a new computing model that uses massively parallel graphics processors to accelerate applications that are also parallel in nature. Scientists and researchers jumped on GPUs to run molecular-scale simulations to determine the effectiveness of a life-saving drug, to visualize our organs in 3D (reconstructed from light doses of a CT scan), or to run galactic-scale simulations to discover the laws that govern our universe.

Alex Krizhevsky of the University of Toronto won the 2012 ImageNet computer image recognition competition.(1) Krizhevsky beat — by a huge margin — handcrafted software written by computer vision experts.

Shortly thereafter, Microsoft and the University of Science and Technology of China announced a DNN that achieved IQ test scores at the college post-graduate level.(4) Then Baidu announced that a deep learning system called Deep Speech 2 had learned both English and Mandarin with a single algorithm.(5) And all top results of the 2015 ImageNet competition were based on deep learning, running on GPU-accelerated deep neural networks, with many beating human-level accuracy.

As Nature recently noted, early progress in deep learning was “made possible by the advent of fast graphics processing units (GPUs) that were convenient to program and allowed researchers to train networks 10 or 20 times faster.”(6) A combination of factors is essential to create a new computing platform — performance, programming productivity, and open accessibility.

By collaborating with AI developers, we continued to improve our GPU designs, system architecture, compilers, and algorithms, and sped up training deep neural networks by 50x in just three years — a much faster pace than Moore’s Law.
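As a rough back-of-the-envelope check on “faster than Moore’s Law,” the implied annual growth factors can be compared directly:

```python
# A 50x speedup over three years implies an annual growth factor
# of 50^(1/3).
training_speedup_per_year = 50 ** (1 / 3)

# Moore's Law (transistor counts doubling roughly every two years)
# implies an annual factor of 2^(1/2).
moores_law_per_year = 2 ** (1 / 2)

print(round(training_speedup_per_year, 2))  # ~3.68x per year
print(round(moores_law_per_year, 2))        # ~1.41x per year
```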

The programmability and richness of NVIDIA’s CUDA platform allow researchers to innovate quickly — building new configurations of CNNs, DNNs, deep inception networks, RNNs, LSTMs, and reinforcement learning networks.

This AI technology is how they respond to your spoken word, translate speech or text to another language, recognize and automatically tag images, and recommend newsfeeds, entertainment, and products that are tailored to what each of us likes and cares about.

Whether to augment humans with a superhuman co-pilot, or revolutionize personal mobility services, or reduce the need for sprawling parking lots within cities, self-driving cars have the potential to do amazing social good.

A recent study by KPMG predicts that computerized driver assistance technologies will help reduce car accidents by 80% in 20 years — that’s nearly 1 million lives a year saved.

So we need a new computer platform to run it — an architecture that can efficiently execute programmer-coded commands as well as the massively parallel training of deep neural networks.

Why Deep Learning Is Suddenly Changing Your Life

Over the past four years, readers have doubtlessly noticed quantum leaps in the quality of a wide range of everyday technologies.

To gather up dog pictures, the app must identify anything from a Chihuahua to a German shepherd and not be tripped up if the pup is upside down or partially obscured, at the right of the frame or the left, in fog or snow, sun or shade.

Medical startups claim they’ll soon be able to use computers to read X-rays, MRIs, and CT scans more rapidly and accurately than radiologists, to diagnose cancer earlier and less invasively, and to accelerate the search for life-saving pharmaceuticals.

They’ve all been made possible by a family of artificial intelligence (AI) techniques popularly known as deep learning, though most scientists still prefer to call them by their original academic designation: deep neural networks.

Programmers have, rather, fed the computer a learning algorithm, exposed it to terabytes of data—hundreds of thousands of images or years’ worth of speech samples—to train it, and have then allowed the computer to figure out for itself how to recognize the desired objects, words, or sentences.
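The workflow described here — a generic learning algorithm, labeled examples, and weights the computer adjusts for itself — can be sketched in miniature (a toy model on toy data, not any production system):

```python
import numpy as np

# Instead of hand-coding a rule, we define a tiny model and let
# gradient descent fit its weights to labeled examples.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))              # 200 examples, 2 features
y = (x[:, 0] + x[:, 1] > 0).astype(float)  # the "correct answers"

w = np.zeros(2)
b = 0.0
for _ in range(500):                         # the training loop
    p = 1 / (1 + np.exp(-(x @ w + b)))       # model's current guesses
    grad_w = x.T @ (p - y) / len(y)          # how to adjust each weight
    grad_b = (p - y).mean()
    w -= 0.5 * grad_w                        # nudge the weights toward
    b -= 0.5 * grad_b                        #   fewer mistakes

accuracy = ((p > 0.5) == y).mean()
print(accuracy)  # the learned rule classifies most examples correctly
```

The programmer never wrote the decision rule; the loop discovered it from data — the sense in which “software writes software.”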

“You essentially have software writing software,” says Jen-Hsun Huang, CEO of graphics processing leader Nvidia, which began placing a massive bet on deep learning about five years ago.

What’s changed is that today computer scientists have finally harnessed both the vast computational power and the enormous storehouses of data—images, video, audio, and text files strewn across the Internet—that, it turns out, are essential to making neural nets work well.

“We’re now living in an age,” Chen observes, “where it’s going to be mandatory for people building sophisticated software applications.” People will soon demand, he says, “‘Where’s your natural-language processing version?’ ‘How do I talk to your app?’”

The increased computational power that is making all this possible derives not only from Moore’s law but also from the realization in the late 2000s that graphics processing units (GPUs) made by Nvidia—the powerful chips that were first designed to give gamers rich, 3D visual experiences—were 20 to 50 times more efficient than traditional central processing units (CPUs) for deep-learning computations.

Its chief financial officer told investors that “the vast majority of the growth comes from deep learning by far.” The term “deep learning” came up 81 times during the 83-minute earnings call.

“I think five years from now there will be a number of S&P 500 CEOs that will wish they’d started thinking earlier about their AI strategy.” Even the Internet metaphor doesn’t do justice to what AI with deep learning will mean, in Ng’s view.

An Introduction to GPU Programming with CUDA

If you can parallelize your code by harnessing the power of the GPU, I bow to you. GPU code is usually abstracted away by the popular deep learning ...

Best Laptop for Machine Learning

What kind of laptop should you get if you want to do machine learning? There are a lot of options out there, and in this video I'll describe the components of an ...

The Deep Learning Revolution

Deep learning is the fastest-growing field in artificial intelligence, helping computers make sense of infinite ..

Deep Learning on GPU Clusters

In this video from the Nvidia Theater at SC14, Bryan Catanzaro from Baidu presents: Deep Learning on GPU Clusters. "Deep neural networks have recently ...

Why Is Deep Learning Hot Right Now?

Deep learning is the fastest-growing field in artificial intelligence (AI), helping computers make ..

Deep Learning with Intel

The Movidius Neural Compute Stick is a miniature deep learning hardware development platform that you can use to prototype, tune, and validate your AI at the ...

TensorFlow Speed on GPU vs CPU

How fast is TensorFlow on a GPU compared to a CPU? Tested on a NVIDIA GTX 1070 with a MSI GT62VR 6RE Dominator Pro laptop.

How computers learn to recognize objects instantly | Joseph Redmon

Ten years ago, researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible. Today, computer vision ...

YOLO Object Detection (TensorFlow tutorial)

You Only Look Once - this object detection algorithm is currently the state of the art, outperforming R-CNN and its variants. I'll go into some different object ...

Research at NVIDIA: AI Reconstructs Photos with Realistic Results

Researchers from NVIDIA, led by Guilin Liu, introduced a state-of-the-art deep learning method that can edit images or reconstruct a corrupted image, one that ...