AI News, Have Fun with Machine Learning: A Guide for Beginners

This is a hands-on guide to machine learning for programmers with no background in AI.

Using a neural network doesn’t require a PhD, and you don’t need to be the person who makes

stuff like we would any other open source technology, instead of treating it like a research

In this guide our goal will be to write a program that uses machine learning to predict, with a high

of dolphins or seahorses using only the images themselves, and without having

going to approach this from the point of view of a practitioner vs. from

I’m sure that some of what I’ve written below is misleading, naive, or just

Here’s what we’re going to explore:

This guide won’t teach you how neural networks are designed, cover much theory, or

Q: 'I know you said we won’t talk about the theory of neural networks, but I’m feeling

There are literally hundreds of introductions to this, from short posts to full online

a good starting point:

Installing the software we'll use (Caffe and DIGITS) can be frustrating, depending on your platform and

First, we’re going to be using the Caffe deep learning framework from

everyone is talking about these days…”

There are a lot of great choices available, and you should look at all the options.

However, I’m using Caffe for a number of reasons:

But the number one reason I’m using Caffe is that you don’t need to write any code to work with

You can do everything declaratively (Caffe uses structured text files to define the network

to fancy GPUs?”

It’s true, deep neural networks require a lot of computing power and energy to train... if

has already invested hundreds of hours of compute time training, and then to fine tune

When you’re done installing Caffe, you should have, or be able to do, all of the following:

On my machine, with Caffe fully built, I’ve got the following basic layout in my CAFFE_ROOT dir:

At this point, we have everything we need to train, test, and program with neural networks.

In the next section we’ll add a user-friendly, web-based front end to Caffe

you’re experimenting and trying to learn, I highly recommend beginning with DIGITS.

environment variable before starting the server:

NOTE: on Mac I had issues with the server scripts assuming my Python binary was called

Once the server is started, you can do everything else via your web browser at http://localhost:5000, which is what I'll do below.

If you need shell access, use the following command:

Training a neural network involves a few steps:

We’re going to do this 3 different ways, in order to show the difference between

The easiest way to begin is to divide your images into a categorized directory layout:

Here each directory is a category we want to classify, and each image within that
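A categorized layout like this is easy to sanity-check before handing it to DIGITS. Here's a minimal sketch that counts the images in each category directory. The function name, the list of extensions, and the throwaway demo tree (empty files with made-up names) are my own assumptions for illustration, not part of the guide's code.

```python
import os
import tempfile

# Assumed set of image extensions; adjust to match your own data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_category(root):
    """Return {category_name: image_count} for each subdirectory of root."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        category_dir = os.path.join(root, entry)
        if not os.path.isdir(category_dir):
            continue  # ignore stray files at the top level
        counts[entry] = sum(
            1
            for name in os.listdir(category_dir)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    return counts


# Demo on a temporary tree with hypothetical file names (empty files).
with tempfile.TemporaryDirectory() as root:
    for category, n in (("dolphin", 3), ("seahorse", 2)):
        os.makedirs(os.path.join(root, category))
        for i in range(n):
            path = os.path.join(root, category, f"image_{i:04d}.jpg")
            open(path, "w").close()
    demo_counts = count_images_per_category(root)

print(demo_counts)
```

Pointing `count_images_per_category` at a real dataset directory should give one entry per category, which makes lopsided or empty categories easy to spot.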

92 Training images (49 dolphin, 43 seahorse) in 2 categories, with 30 Validation

our experimentation and learning purposes, because it won’t take forever to train and

categorizing 1000+ image categories across 1.2 million images.

Caffe uses structured text files to define network architectures.

the most part we’re not going to work with these, but it’s good to be aware of their existence,

We’ll train our network for 30 epochs. In each epoch it learns from our training images, adjusting the network’s weights depending on how well it’s doing, then tests itself using our validation images, and this process repeats 30 times. Each time it completes a cycle we’ll get info about Accuracy (0% to 100%, where
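To make the idea of an epoch concrete, here's a toy sketch. It is not Caffe and not the guide's network: just a one-weight classifier, with made-up data points, that is updated on a training set and then scored on a validation set once per epoch. All names and values here are assumptions for illustration.

```python
def predict(w, b, x):
    """Classify x as 1 if the weighted input is positive, else 0."""
    return 1 if w * x + b > 0 else 0


def train(train_set, val_set, epochs=30, lr=0.1):
    """Run a fixed number of epochs and return validation accuracy per epoch."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        # Learn: adjust the weights using the training examples only.
        for x, y in train_set:
            error = y - predict(w, b, x)
            w += lr * error * x
            b += lr * error
        # Test: measure accuracy on validation examples (no weight updates).
        correct = sum(predict(w, b, x) == y for x, y in val_set)
        history.append(correct / len(val_set))
    return history


# Made-up 1-D data: negative values are class 0, positive values are class 1.
toy_train = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
toy_val = [(-1.5, 0), (1.5, 1)]

accuracy_per_epoch = train(toy_train, toy_val)
print(len(accuracy_per_epoch), accuracy_per_epoch[-1])
```

The per-epoch list plays the same role as the accuracy curve DIGITS draws while training: one number for every completed cycle.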

Designing a neural network from scratch, collecting data sufficient to train it

(e.g., millions of images), and accessing GPUs for weeks to complete the training

tuning takes advantage of the layout of deep neural networks, and uses pretrained

Imagine using a neural network to be like looking at something far away with a pair

As you adjust the focus, you start to see colours, lines, shapes, and eventually you

In a multi-layered network, the initial layers extract features (e.g., edges), with later layers using these features to detect shapes (e.g., a wheel, an eye), which are then fed into final classification layers that detect items based on accumulated characteristics

to go from pixels to circles to eyes to two eyes placed in a particular orientation, and

new set of image classes instead of the ones on which it was initially trained.

network already knows how to “see” features in images, we’d like to retrain it

medical imagery, identifying plankton species from microscopic images collected at sea,

need the binary .caffemodel file, which is what contains the trained weights, and it’s available

0.001 from 0.01, since we don’t need to make such large jumps (i.e., we’re fine tuning). We’ll

In the pretrained model’s definition (i.e., prototext), we need to rename all references

its original training data (i.e., we want to throw away the current final

Here are the changes we need to make:

I’ve included the fully modified file I’m using in src/alexnet-customized.prototxt.

This time our accuracy starts at ~60% and climbs right away to 87.5%, then to 96% and

We rename all references to the three fully connected classification layers, loss1/classifier,

order to rename the 3 classifier layers, as well as to change from 1000 to 2 categories:

I’ve put the complete file in src/googlenet-customized.prototxt.

about different numbers of epochs, batch sizes, solver types (Adam, AdaDelta, AdaGrad, etc), learning

Clicking Download Model downloads a tar.gz archive containing the following files:

There’s a nice description in the

trained, the current state of that network's weights is stored in a .caffemodel.

our network as a product, we often need to alter it in a few ways:

DIGITS has already done the work for us, separating out the different versions of our prototxt files. The files we’ll care about when using this network are:

We can use these files in a number of ways to classify new images.

we can use build/examples/cpp_classification/classification.bin to classify one image:

This will spit out a bunch of debug text, followed by the predictions for each of our two categories:

You can read the complete C++ source for

For a classification version that uses the Python interface, DIGITS includes a nice example.

Let's write a program that uses our fine-tuned GoogLeNet model to classify the untrained images we

DIGITS, namely:

We obviously need to provide the full path, and I'll assume that my files are in a dir called model/:

The caffe.Net() constructor takes

It looks like this, in case you encounter it:

We're interested in loading images of various sizes into our network for testing.

We'll use it to create a transformation appropriate for our images/network:

We can also use the mean.binaryproto file DIGITS gave us to set our transformer's mean:

If we had a lot of labels, we might also choose to read in our labels file, which we can use later by looking up the label for a probability using its position (e.g., 0=dolphin, 1=seahorse):

Now we're ready to classify an image.
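The positional label lookup described above can be sketched in a few lines of plain Python. This is only an illustration: it assumes the labels file holds one label per line, in the same order as the network's outputs, and the probability values are made up. The helper names are my own, not part of Caffe or DIGITS.

```python
def parse_labels(text):
    """Split the contents of a labels file into a list, one label per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def top_prediction(probabilities, labels):
    """Return (label, probability) for the highest-scoring position."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one label per probability")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]


# Hypothetical labels file contents and network output.
labels = parse_labels("dolphin\nseahorse\n")
example_probabilities = [0.12, 0.88]

print(top_prediction(example_probabilities, labels))
```

Because the lookup relies purely on position, the labels must stay in the same order that was used when the dataset was built.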

read our image file, then use our transformer to reshape it and set it as our network's data layer:

Q: 'How could I use images (i.e., frames) from a camera or video stream instead of files?'

Great question, here's a skeleton to get you started:

Back to our problem, we next need to run the image data through our network and read out the probabilities from our network's final 'softmax' layer, which will be in order by label category:

Running the full version of this (see src/ using our fine-tuned

the following output:

I'm still trying to learn all the best practices for working with models in code.

better documented code examples, APIs, premade modules, etc to show you here.

were more simple modules in high-level languages that I could point you at that “did the right

At the beginning we said that our goal was to write a program that used a neural network to correctly

are images of dolphins and seahorses that were never used in the training or validation data:

Let's look at how each of our three attempts did with this challenge:

It’s amazing how well our model works, and what’s possible by fine tuning a pretrained network. Obviously

tools and workflows of neural networks, it’s turned out to be an ideal case, especially since it didn’t

Above all I hope that this experience helps to remove the overwhelming fear of getting started. Deciding whether or not it’s worth investing time in learning the theories of machine learning and neural networks is easier when you’ve been able to see it work in a small way.

setup and a working approach, you can try doing other sorts of classifications.

Train a Convolutional Neural Network with Nvidia DIGITS and Caffe

In the previous part of this series, I introduced Nvidia DIGITS as a user-friendly interface to build Deep Learning models.

By the end of this tutorial, you will have learned what it takes to train a model with DIGITS and use it for inference.

The first directory contains 2,500 images of each category while the test directory has about 1,500 images.

Within a few minutes, DIGITS will parse the directories to create three databases — train, val, and test.

By the time it hits the 30th epoch, the model reaches an accuracy of 85 percent.

Clone this repo which contains Python code for classification along with the shell script to run the Caffe Docker container for inference.

Set the environment variable to the Caffe model file, and then run the script to perform inference.

If you send a cat image, you will see the output below:

Let’s take a closer look at the script. It essentially maps the current directory, with the model and weights, into the Caffe container.

It then invokes the classification code with appropriate parameters such as the model name, weights, labels, and the image to be classified.

This tutorial covered the workflow of training a model through Nvidia DIGITS running on a Linux machine backed by a GPU, then using the same model on a Mac or Windows machine for inference.


Caffe is a deep learning framework made with expression, speed, and modularity in mind. It

between CPU and GPU by setting a single flag to train on a GPU machine then deploy to commodity clusters or mobile devices.

1 ms/image for inference and 4 ms/image for learning and more recent library versions and hardware are faster still. We

Community: Caffe already powers academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. Join

Please cite Caffe in your publications if it helps your research: If you do publish a paper where Caffe helped your research, we encourage you to cite the framework for tracking by Google Scholar.

The BAIR Caffe developers would like to thank NVIDIA for GPU donation, A9 and Amazon Web Services for a research grant in support of Caffe development and reproducible research in deep learning, and BAIR PI Trevor Darrell for guidance.

Using DIGITS to train a Semantic Segmentation neural network

NOTE: refer to this Parallel-For-All blog post for a detailed review of Fully Convolutional Networks for semantic segmentation.

inference, a grid mask of the input image is generated where the value of each element in the grid denotes the class of the object that the corresponding pixel in the input image represents.
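As an illustration of how such a grid mask can be derived, the sketch below takes per-class score maps and picks, for every pixel, the index of the highest-scoring class. It assumes the network output is laid out as `scores[class][row][column]`; that layout, the function name, and the score values are assumptions made for this example only.

```python
def scores_to_mask(scores):
    """Turn scores[class][y][x] into mask[y][x] holding the winning class index."""
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [
            max(range(num_classes), key=lambda c: scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Made-up scores for 3 classes over a 2x2 image.
example_scores = [
    [[0.9, 0.1], [0.2, 0.3]],  # class 0
    [[0.05, 0.8], [0.1, 0.3]],  # class 1
    [[0.05, 0.1], [0.7, 0.4]],  # class 2
]

example_mask = scores_to_mask(example_scores)
print(example_mask)
```

Each entry of the resulting mask is a class index, which is what a palette later turns into a color for display.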

the resulting image mask, the person and the horse are correctly depicted using the color codes from the legend.

Use the script to create a train/val split of the labelled images: Let us take a look at one image/label pair:

color palette in PASCAL VOC is chosen such that adjacent values map to very different colors in order to make classes more easily distinguishable during visual inspection. In PASCAL VOC, index 0 maps to black pixels (background) and index 255 maps to white pixels (don't care). Other
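A palette with this property is commonly generated by spreading the bits of each label index across the high bits of the red, green, and blue channels. The sketch below assumes that bit-interleaving scheme, as used in the widely circulated VOC colormap helper; it is not taken from the DIGITS code, and the function name is hypothetical.

```python
def voc_colormap(num_colors=256):
    """Build a palette as a list of (r, g, b) tuples, one per label index."""
    palette = []
    for index in range(num_colors):
        r = g = b = 0
        value = index
        # Consume the index three bits at a time, filling each channel
        # from its most significant bit downwards.
        for shift in range(7, -1, -1):
            r |= (value & 1) << shift
            g |= ((value >> 1) & 1) << shift
            b |= ((value >> 2) & 1) << shift
            value >>= 3
        palette.append((r, g, b))
    return palette


palette = voc_colormap()
for index in (0, 1, 2, 3, 255):
    print(index, palette[index])
```

Neighbouring indices differ in their lowest bits, which land in the highest bits of the channels, so adjacent labels get strongly contrasting colors. Under this assumed scheme index 0 is black, and index 255 comes out as (224, 224, 192), a very light tone rather than pure white.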

In the dataset creation form, click Separate validation images then specify the paths to the image and label folders for each of the training and validation sets.

more information on this model, refer to this paper: NOTE: in order to train this network you will need NV-Caffe 0.15 or later, or a corresponding version of the main BVLC fork of Caffe.

To train FCN-Alexnet in DIGITS some minor customizations need to be made to the original FCN-Alexnet prototxt from In order to train FCN-Alexnet a suitable pre-trained model must be used. It

NOTE: skip this section if you are not interested in the details of model convolutionalization and are already in possession of a pre-trained FCN-Alexnet.

Standard CNNs like Alexnet perform non-spatial prediction, e.g.

fc6 has 256x6x6x4096 weights and 4096 bias terms (one per output), for a total number of 37752832 learnable parameters:
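The arithmetic can be checked with a short sketch. It assumes the usual convention of one bias term per output unit (4096 for fc6), and it also counts the parameters of a convolutional layer with 256 input feature maps, 4096 output feature maps, and 6x6 kernels for comparison. The helper names are illustrative only.

```python
def fully_connected_params(num_inputs, num_outputs):
    """Weights plus one bias per output for a fully connected layer."""
    return num_inputs * num_outputs + num_outputs


def conv_params(in_maps, out_maps, kernel_size):
    """Weights plus one bias per output map for a square-kernel convolution."""
    return in_maps * out_maps * kernel_size * kernel_size + out_maps


# fc6 sees a 256x6x6 input volume and produces 4096 outputs.
fc6_params = fully_connected_params(256 * 6 * 6, 4096)

# Convolutional counterpart: 256 input maps, 4096 output maps, 6x6 kernels.
conv_fc6_params = conv_params(256, 4096, 6)

print(fc6_params, conv_fc6_params)
```

Both counts agree under these assumptions, which is why the fully connected weights can be reshaped into convolution kernels without retraining.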

could elect to follow a sliding-window approach where the input image is scanned from left to right, top to bottom, and have each window go through fc6. This

your convolutional layer has Mi=256 input feature maps, Mo=4096 output feature maps, square kernels of size K=6 and stride S=1. If

You may use the script to convolutionalize an ILSVRC2012-trained Alexnet model:

If you inspect the prototxt carefully you might notice that the learning rate multiplier for layer upscore is set to 0. The reason behind this is that this layer is merely there to perform bilinear interpolation of the score_fr layer so as to produce an upscaled version of its input. Here we are using a bilinear weight filler to initialize the weights of this deconvolution layer such that the layer does bilinear interpolation of its input, with no further need for parameter learning, hence the null learning rate multiplier.
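For intuition, a bilinear interpolation kernel of the kind such a filler produces can be computed directly. The sketch below follows the commonly used formula for a square kernel of a given size. It is a standalone illustration with a hypothetical function name, not the filler's actual source, and it covers a single channel only.

```python
import math


def bilinear_kernel(kernel_size):
    """Return a kernel_size x kernel_size grid of bilinear interpolation weights."""
    factor = math.ceil(kernel_size / 2.0)
    center = (2 * factor - 1 - factor % 2) / (2.0 * factor)
    # The 2-D kernel is the outer product of this 1-D triangular profile.
    profile = [1 - abs(i / factor - center) for i in range(kernel_size)]
    return [[wy * wx for wx in profile] for wy in profile]


example_kernel = bilinear_kernel(4)
for row in example_kernel:
    print(row)
```

Because these weights are fully determined by the kernel size, there is nothing for training to improve, which matches the reasoning for a learning rate multiplier of 0.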

the images in the label dataset have a palette (as is the case for PASCAL VOC) then this visualization method will show the network output using the palette from the label dataset.

NVIDIA Deep Learning Course: Class #3 - Getting started with Caffe

Register for the full course and find the Q&A log at Caffe is a Deep Learning framework developed by the ..

How to run experiments using Caffe on Ubuntu

The code with which you can run experiments is the following: python python/ --print_results examples/images/cat.jpg foo.

Caffe - Ep. 20 (Deep Learning SIMPLIFIED)

Caffe is a Deep Learning library that is well suited for machine vision and forecasting applications. With Caffe you can build a net with sophisticated ...

How to Make an Image Classifier - Intro to Deep Learning #6

We're going to make our own Image Classifier for cats & dogs in 40 lines of Python! First we'll go over the history of image classification, then we'll dive into the ...

Intro and preprocessing - Using Convolutional Neural Network to Identify Dogs vs Cats p. 1

In this tutorial, we're going to be running through taking raw images that have been labeled for us already, and then feeding them through a convolutional neural ...

How to Train Your Models in the Cloud

Let's discuss whether you should train your models locally or in the cloud. I'll go through several dedicated GPU options, then compare three cloud options; AWS ...

2018 NVidia DIGITS Ubuntu16 CUDA9 GTX1080 with Caffe, TensorFlow solve issues

How to solve problem of installing DIGITS deep learning 2018 NVidia DIGITS Ubuntu16 CUDA9 GTX1080 with Caffe, TensorFlow solve issues top Application, ...

Deep Learning and Image classification using Nvidia Digits

Deep learning is part of machine learning and can be used for tasks like image classification, object detection, speech recognition and a few other things. In ...

What's an MNIST

An introduction to the MNIST dataset.

Image Classification using Caffe Library