AI News, My Deep Dive into Deep Learning — What I Learned as a Physician

My Deep Dive into Deep Learning — What I Learned as a Physician

(Image: neurons depicted by Santiago Ramón y Cajal.)

“You’re doing what?” my colleague exclaimed.

We hear the words machine learning, artificial intelligence, and deep learning echoed often, but I wanted to know the mechanics behind these processes and figure out if there were real applications to healthcare.

If we take a generalized approach, one could say neural networks and deep learning are modeled to mimic the neurons and neuronal connections in a brain.

For example, an Artificial Neural Network, at its most basic level, takes numbers as inputs, feeds them through nodes contained in hidden layers, and provides a predicted output.

Through a training algorithm, the weights (the importance) of each node in the hidden layers are adjusted, and the model again tries to predict an outcome with the adjusted weights.
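To make the predict-then-adjust loop concrete, here is a minimal sketch in plain Python. It is not taken from the article: the network size (two inputs, two hidden nodes, one output), the sigmoid activation, the squared-error measure, the learning rate, and all starting weights are assumptions chosen purely for illustration.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Feed inputs through one hidden layer; return (hidden activations, prediction)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    pred = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, pred

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient-descent update; returns the squared error measured before the update."""
    hidden, pred = forward(x, w_hidden, w_out)
    error = 0.5 * (pred - target) ** 2
    # How much the error changes with the output node's weighted input.
    delta_out = (pred - target) * pred * (1 - pred)
    # Pass that signal back to each hidden node (using the weights before they change).
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                    for j in range(len(hidden))]
    # Adjust the weights a small step in the direction that lowers the error.
    for j in range(len(w_out)):
        w_out[j] -= lr * delta_out * hidden[j]
    for j in range(len(w_hidden)):
        for i in range(len(x)):
            w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
    return error

# Hypothetical toy data: two input numbers and the outcome we want predicted.
x = [0.5, 0.9]
target = 1.0
w_hidden = [[0.1, -0.2], [0.4, 0.3]]  # arbitrary starting weights
w_out = [0.2, -0.1]

errors = [train_step(x, target, w_hidden, w_out) for _ in range(50)]
print("error before training:", errors[0])
print("error after 50 updates:", errors[-1])
```

Each call to `train_step` is one round of "predict, compare, adjust"; the printed error shrinks as the weights settle.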

A Convolutional Neural Network, on the other hand, is tailored towards using images as inputs, which through a series of steps are flattened into numerical inputs that can be fed into the same type of system.

The second, and I think more practical route, is to leverage the fact that you are a domain expert in medicine, and then focus on understanding the concepts of deep learning to then be in a position to come up with and apply solutions to your field.

Try not to get bogged down or quit if the code and mathematics are difficult, as you can always team up with people who are experts in this area, who have spent their time learning this the same way you have spent your time learning medicine.

For example, researchers at Stanford University demonstrated that a deep learning algorithm could diagnose a skin lesion as cancerous just as accurately as a board certified dermatologist.

Adit Deshpande

In this post, we’ll go into summarizing a lot of the new and important developments in the field of computer vision and convolutional neural networks.

The one that started it all (Though some may say that Yann LeCun’s paper in 1998 was the real pioneering publication).

For those that aren’t familiar, this competition can be thought of as the annual Olympics of computer vision, where teams from across the world compete to see who has the best computer vision model for tasks such as classification, localization, detection, and more.

2012 marked the first year where a CNN was used to achieve a top 5 test error rate of 15.4% (Top 5 error is the rate at which, given an image, the model does not output the correct label with its top 5 predictions).
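The top 5 error definition can be sketched in a few lines of Python. The scores and labels below are made up for illustration (seven classes, four images) and have nothing to do with the actual ILSVRC data; `top_k_error` is a hypothetical helper name.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of images whose correct label is NOT among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)

# Made-up model outputs: one row of class scores per image.
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.07, 0.03],
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],
    [0.02, 0.05, 0.08, 0.10, 0.20, 0.25, 0.30],
    [0.10, 0.50, 0.05, 0.05, 0.10, 0.10, 0.10],
]
labels = [2, 6, 3, 1]  # the correct class index for each image

print(top_k_error(scores, labels, k=5))  # 0.25: only the second image misses
print(top_k_error(scores, labels, k=1))  # 0.5: the second and third images miss
```

Top 1 error is the stricter measure: the third image counts as a miss at k=1 because its correct class ranks fourth, but it is a hit at k=5.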

In the paper, the group discussed the architecture of the network (which was called AlexNet).

The neural network developed by Krizhevsky, Sutskever, and Hinton in 2012 was the coming out party for CNNs in the computer vision community.

With AlexNet stealing the show in 2012, there was a large increase in the number of CNN models submitted to ILSVRC 2013.

In this paper titled “Visualizing and Understanding Convolutional Neural Networks”, Zeiler and Fergus begin by discussing the idea that this renewed interest in CNNs is due to the accessibility of large training sets and increased computational power with the usage of GPUs.

The basic idea behind how this works is that at every layer of the trained CNN, you attach a “deconvnet” which has a path back to the image pixels.

The reasoning behind this whole process is that we want to examine what type of structures excite a given feature map.

ZF Net was not only the winner of the competition in 2013, but also provided great intuition as to the workings of CNNs and illustrated more ways to improve performance.

Simplicity and depth.

As the spatial size of the input volumes at each layer decreases (a result of the conv and pool layers), the depth of the volumes increases due to the increased number of filters as you go down the network.
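The shrinking-width, growing-depth pattern is easy to trace with a little arithmetic. The sketch below assumes a VGG-style configuration that is not spelled out in this excerpt: a 224x224 input, 3x3 convolutions with stride 1 and padding 1, 2x2 max pooling with stride 2, and filter counts of 64, 128, 256, 512, 512 per block. Only one conv per block is traced, since same-padded convs do not change the spatial size.

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial size after a convolution (assumed 3x3, stride 1, padding 1)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Spatial size after max pooling (assumed 2x2 window, stride 2)."""
    return (size - window) // stride + 1

def trace_shapes(input_size, filters_per_block):
    """Return the (height, width, depth) of the volume after each conv+pool block."""
    shapes = []
    size = input_size
    for filters in filters_per_block:
        size = conv_out(size)   # padding keeps the spatial size unchanged
        size = pool_out(size)   # pooling halves it
        shapes.append((size, size, filters))
    return shapes

for shape in trace_shapes(224, [64, 128, 256, 512, 512]):
    print(shape)
# (112, 112, 64)
# (56, 56, 128)
# (28, 28, 256)
# (14, 14, 512)
# (7, 7, 512)
```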

VGG Net is one of the most influential papers in my mind because it reinforced the notion that convolutional neural networks have to have a deep network of layers in order for this hierarchical representation of visual data to work.

You know that idea of simplicity in network architecture that we just talked about?

The authors of the paper also emphasized that this new model places notable consideration on memory and power usage (Important note that I sometimes forget too: Stacking all of these layers and adding huge numbers of filters has a computational and memory cost, as well as an increased chance of overfitting).

When we first take a look at the structure of GoogLeNet, we notice immediately that not everything is happening sequentially, as seen in previous architectures.

The bottom green box is our input and the top one is the output of the model (Turning this picture right 90 degrees would let you visualize the model in relation to the last picture which shows the full network).

You may be asking yourself “How does this architecture help?”.

The network in network conv is able to extract information about the very fine grain details in the volume, while the 5x5 filter is able to cover a large receptive field of the input, and thus able to extract its information as well.

GoogLeNet was one of the first models that introduced the idea that CNN layers didn’t always have to be stacked up sequentially.

Imagine a deep CNN architecture.

Aside from the new record in terms of number of layers, ResNet won ILSVRC 2015 with an incredible error rate of 3.6% (depending on their skill and expertise, humans generally hover around a 5-10% error rate).

The idea behind a residual block is that you have your input x go through a conv-relu-conv series.

Basically, the mini module shown below is computing a “delta” or a slight change to the original input x to get a slightly altered representation (When we think of traditional CNNs, we go from x to F(x) which is a completely new representation that doesn’t keep any information about the original x).
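A rough sketch of the "delta" idea follows. To stay short it swaps the two convolutions for plain matrix multiplications on a small vector, which is a simplification and not the actual ResNet layer; the weights and input are made up. The key line is the final addition, F(x) + x.

```python
def relu(v):
    """Replace negative values with zero."""
    return [max(0.0, a) for a in v]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in m]

def residual_block(x, w1, w2):
    """Compute F(x) + x, where F stands in for the conv-relu-conv series."""
    fx = matvec(w2, relu(matvec(w1, x)))      # the learned "delta"
    return [f + xi for f, xi in zip(fx, x)]   # added back onto the original input

x = [1.0, -2.0, 0.5]

# With all-zero weights the delta is zero and the block passes x through unchanged.
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, zeros))

# With small (made-up) weights the output is a slightly altered version of x.
small = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
print(residual_block(x, small, small))
```

A plain layer that returned only `fx` would lose the original `x`; adding it back is what lets the block start from the input and learn only the change.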

3.6% error rate.

Some may argue that the advent of R-CNNs has been more impactful than any of the previous papers on new network architectures.

The purpose of R-CNNs is to solve the problem of object detection.

The authors note that any class agnostic region proposal method should fit.

Improvements were made to the original model because of 3 main problems.

In this model, the image is first fed through a ConvNet, features of the region proposals are obtained from the last feature map of the ConvNet (check section 2.1 of the paper for more details), and lastly we have our fully connected layers as well as our regression and classification heads.

Faster R-CNN works to combat the somewhat complex training pipeline that both R-CNN and Fast R-CNN exhibited.

Being able to determine that a specific object is in an image is one thing, but being able to determine that object’s exact location is a huge jump in knowledge for the computer.

According to Yann LeCun, these networks could be the next big development.

The analogy used in the paper is that the generative model is like “a team of counterfeiters, trying to produce and use fake currency” while the discriminative model is like “the police, trying to detect the counterfeit currency”.

Sounds simple enough, but why do we care about these networks?

What happens when you combine CNNs with RNNs (no, you don’t get R-CNNs, sorry)? But you do get one really amazing application.

The goal of this part of the model is to be able to align the visual and textual data (the image and its sentence description).

Now let’s think about representing the images.

The alignment model has the main purpose of creating a dataset where you have a set of image regions (found by the RCNN) and corresponding text (thanks to the BRNN).

The interesting idea for me was that of using these seemingly different RNN and CNN models to create a very useful application that in a way combines the fields of Computer Vision and Natural Language Processing.

Last, but not least, let’s get into one of the more recent papers in the field.

The 2 things that this module hopes to correct are pose normalization (scenarios where the object is tilted or scaled) and spatial attention (bringing attention to the correct object in a crowded image).

The entity in traditional CNN models that dealt with spatial invariance was the maxpooling layer.

The intuitive reasoning behind this layer was that once we know that a specific feature is in the original input volume (wherever there are high activation values), its exact location is not as important as its relative location to other features.
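A small sketch shows why max pooling gives this tolerance to exact location. The 4x4 activation grids and the 2x2 window with stride 2 are assumptions for illustration; the two grids differ only in where the strong activations sit inside each window.

```python
def max_pool_2x2(grid):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]) - 1, 2)]
        for r in range(0, len(grid) - 1, 2)
    ]

# Two made-up feature maps: the same two features, shifted by one pixel.
original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
    [0, 0, 0, 0],
]
shifted = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 0],
]

print(max_pool_2x2(original))  # [[9, 0], [0, 5]]
print(max_pool_2x2(shifted))   # [[9, 0], [0, 5]]
```

Both grids pool to the same output: the exact pixel is discarded, while the relative arrangement (strong feature top-left, weaker one bottom-right) is kept.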

This paper caught my eye for the main reason that improvements in CNNs don’t necessarily have to come from drastic changes in network architecture.

Deep Learning

About this Course

Machine learning is one of the fastest-growing and most exciting fields out there, and deep learning represents its true bleeding edge.

In this course, you’ll develop a clear understanding of the motivation for deep learning, and design intelligent systems that learn from complex and/or large-scale datasets.

You will learn to solve new classes of problems that were once thought prohibitively challenging, and come to better appreciate the complex nature of human intelligence as you solve these same problems effortlessly using deep learning methods.

Deep Learning

He told Page, who had read an early draft, that he wanted to start a company to develop his ideas about how to build a truly intelligent computer: one that could understand language and then make inferences and decisions on its own.

The basic idea—that software can simulate the neocortex’s large array of neurons in an artificial “neural network”—is decades old, and it has led to as many disappointments as breakthroughs.

Last June, a Google deep-learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image recognition effort at identifying objects such as cats.

In October, Microsoft chief research officer Rick Rashid wowed attendees at a lecture in China with a demonstration of speech software that transcribed his spoken words into English text with an error rate of 7 percent, translated them into Chinese-language text, and then simulated his own voice uttering them in Mandarin.

Hinton, who will split his time between the university and Google, says he plans to “take ideas out of this field and apply them to real problems” such as image recognition, search, and natural-language understanding.

Extending deep learning into applications beyond speech and image recognition will require more conceptual and software breakthroughs, not to mention many more advances in processing power.

One has been to feed computers with information and rules about the world, which required programmers to laboriously write software that is familiar with the attributes of, say, an edge or a sound.

Neural networks, developed in the 1950s not long after the dawn of AI research, looked promising because they attempted to simulate the way the brain worked, though in greatly simplified form.

These weights determine how each simulated neuron responds—with a mathematical output between 0 and 1—to a digitized feature such as an edge or a shade of blue in an image, or a particular energy level at one frequency in a phoneme, the individual unit of sound in spoken syllables.

Programmers would train a neural network to detect an object or phoneme by blitzing the network with digitized versions of images containing those objects or sound waves containing those phonemes.

The eventual goal of this training was to get the network to consistently recognize the patterns in speech or sets of images that we humans know as, say, the phoneme “d” or the image of a dog.

This is much the same way a child learns what a dog is by noticing the details of head shape, behavior, and the like in furry, barking animals that other people call dogs.

Once that layer accurately recognizes those features, they’re fed to the next layer, which trains itself to recognize more complex features, like a corner or a combination of speech sounds.

Because the multiple layers of neurons allow for more precise training on the many variants of a sound, the system can recognize scraps of sound more reliably, especially in noisy environments such as subway platforms.

Hawkins, author of On Intelligence, a 2004 book on how the brain works and how it might provide a guide to building intelligent machines, says deep learning fails to account for the concept of time.

Brains process streams of sensory data, he says, and human learning depends on our ability to recall sequences of patterns: when you watch a video of a cat doing something funny, it’s the motion that matters, not a series of still images like those Google used in its experiment.

In high school, he wrote software that enabled a computer to create original music in various classical styles, which he demonstrated in a 1965 appearance on the TV show I’ve Got a Secret.

Since then, his inventions have included several firsts—a print-to-speech reading machine, software that could scan and digitize printed text in any font, music synthesizers that could re-create the sound of orchestral instruments, and a speech recognition system with a large vocabulary.

This isn’t his immediate goal at Google, but it matches that of Google cofounder Sergey Brin, who said in the company’s early days that he wanted to build the equivalent of the sentient computer HAL in 2001: A Space Odyssey—except one that wouldn’t kill people.

“My mandate is to give computers enough understanding of natural language to do useful things—do a better job of search, do a better job of answering questions,” he says.

queries as quirky as “a long, tiresome speech delivered by a frothy pie topping.” (Watson’s correct answer: “What is a meringue harangue?”) Kurzweil isn’t focused solely on deep learning, though he says his approach to speech recognition is based on similar theories about how the brain works.

“That’s not a project I think I’ll ever finish.” Though Kurzweil’s vision is still years from reality, deep learning is likely to spur other applications beyond speech and image recognition in the nearer term.

Microsoft’s Peter Lee says there’s promising early research on potential uses of deep learning in machine vision: technologies that use imaging for applications such as industrial inspection and robot guidance.

Methods for Interpreting and Understanding Deep Neural Networks (arXiv:1706.07979, Computer Science > Learning)

Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller (submitted on 24 Jun 2017). This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions.

How Does Deep Learning Work? | Two Minute Papers #24

Artificial neural networks provide us incredibly powerful tools in machine learning that are useful for a variety of tasks ranging from image classification to voice translation. So what is...

How to Predict Stock Prices Easily - Intro to Deep Learning #7

We're going to predict the closing price of the S&P 500 using a special type of recurrent neural network called an LSTM network. I'll explain why we use recurrent nets for time series data,...

How to Make a Text Summarizer - Intro to Deep Learning #10

I'll show you how you can turn an article into a one-sentence summary in Python with the Keras machine learning library. We'll go over word embeddings, encoder-decoder architecture, and the...

Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorflow Tutorial | Edureka

This Edureka Recurrent Neural Networks tutorial video will help you understand why we need Recurrent Neural Networks (RNN) and what exactly they are. It also..

Recurrent Neural Networks - Ep. 9 (Deep Learning SIMPLIFIED)

Our previous discussions of deep net applications were limited to static patterns, but how can a net decipher and label patterns that change with time? For example, could a net be used to scan...

Andrew Rowan - Bayesian Deep Learning with Edward (and a trick using Dropout)

Filmed at PyData London 2017 Description Bayesian neural networks have seen a resurgence of interest as a way of generating model uncertainty estimates. I use Edward, a new probabilistic programmi...

Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edureka

This Edureka "Neural Network Tutorial" video will help you to understand the basics of Neural Networks and how to use them for deep learning. It explains Single..

Learning to learn and compositionality with deep recurrent neural networks

Author: Nando de Freitas, Department of Computer Science, University of Oxford Abstract: Deep neural network representations play an important role in computer vision, speech, computational...

Neural Network Model - Deep Learning with Neural Networks and TensorFlow

Welcome to part three of Deep Learning with Neural Networks and TensorFlow, and part 45 of the Machine Learning tutorial series. In this tutorial, we're going to be heading (falling) down the...