AI News, Have Fun with Machine Learning: A Guide for Beginners


This is a hands-on guide to machine learning for programmers with no background in AI.

Using a neural network doesn’t require a PhD, and you don’t need to be the person who makes the next breakthrough in AI. I believe more of us should play with this stuff like we would any other open source technology, instead of treating it like a research topic.

In this guide our goal will be to write a program that uses machine learning to predict, with a high degree of confidence, whether a given image contains a dolphin or a seahorse, using only the images themselves, and without having seen them before.

We’re going to approach this from the point of view of a practitioner rather than a researcher. I’m sure that some of what I’ve written below is misleading, naive, or just plain wrong, and I welcome corrections.

Here’s what we’re going to explore. Note that this guide won’t teach you how neural networks are designed, cover much theory, or use fancy mathematics; the focus is on the practical workflow.

Q: 'I know you said we won’t talk about the theory of neural networks, but I’m feeling like I’d at least like an overview before we get going. Where should I start?'

A: There are many good introductions online, and any of them makes a good starting point. Installing the software we'll use (Caffe and DIGITS) can be frustrating, depending on your platform and OS version.

“Which framework should I use, out of all the ones everyone is talking about these days?” There are a lot of great choices available, and you should look at all the options.

However, I’m using Caffe for a number of reasons. The number one reason is that you don’t need to write any code to work with it.

You can do everything declaratively (Caffe uses structured text files to define the network architecture) and with command-line tools.

“Don’t I need access to fancy GPUs?” It’s true, deep neural networks require a lot of computing power and energy to train... if you’re training from scratch. Instead, we can start from a network someone else has already invested hundreds of hours of compute time training, and then fine tune it on our own data.

When you’re done installing Caffe, you should have, or be able to do, all of the following. On my machine, with Caffe fully built, I’ve got a standard layout in my CAFFE_ROOT dir. At this point, we have everything we need to train, test, and program with neural networks.

In the next section we’ll add a user-friendly, web-based front end to Caffe called DIGITS. If you’re experimenting and trying to learn, I highly recommend beginning with DIGITS.

You may need to set an environment variable before starting the server. NOTE: on Mac I had issues with the server scripts assuming my Python binary was called python2, where I only have python.

Once the server is started, you can do everything else via your web browser at http://localhost:5000, which is what I'll do below.

If you need shell access, use the following command. Training a neural network involves a few steps: assembling and preparing a dataset of categorized images, defining the network’s architecture, and training and validating that network with the prepared dataset. We’re going to do this 3 different ways, in order to show the difference between training from scratch and fine tuning a pretrained network.

The easiest way to begin is to divide your images into a categorized directory layout. Here each directory is a category we want to classify, and each image within that directory is an example for training and validation.

We have 92 Training images (49 dolphin, 43 seahorse) in 2 categories, with 30 Validation images. This is a tiny dataset, which is perfect for our experimentation and learning purposes, because it won’t take forever to train and validate a network that uses it. Compare that to ImageNet-scale training, where networks learn to categorize 1000+ image categories across 1.2 million images.
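A minimal sketch of building that categorized layout (placeholder files stand in for real images; the directory names are the category labels DIGITS will infer):

```python
import os
import tempfile

root = tempfile.mkdtemp()
layout = {"dolphin": 3, "seahorse": 2}  # tiny counts, just for illustration

for category, n in layout.items():
    os.makedirs(os.path.join(root, "train", category))
    for i in range(n):
        # placeholder files standing in for real .jpg images
        open(os.path.join(root, "train", category, f"{i}.jpg"), "w").close()

# The category labels come straight from the directory names.
categories = sorted(os.listdir(os.path.join(root, "train")))
print(categories)  # ['dolphin', 'seahorse']
```

With a real dataset you would copy your actual image files into these directories instead of creating empty placeholders.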

Caffe uses structured text files to define network architectures.

For the most part we’re not going to work with these files directly, but it’s good to be aware of their existence, since we’ll have to modify them in later steps.

We’ll train our network for 30 epochs, which means that it will learn (with our training

images) then test itself (using our validation images), and adjust the network’s

weights depending on how well it’s doing, and repeat this process 30 times. Each

time it completes a cycle, we’ll get info about Accuracy (0% to 100%, where higher is better) and Loss (where lower is better).

Designing a neural network from scratch, collecting data sufficient to train it (e.g., millions of images), and accessing GPUs for weeks to complete the training is beyond the reach of most of us. Fine tuning, however, takes advantage of the layout of deep neural networks, and uses pretrained networks to do the hard work of initial feature extraction.

Imagine using a neural network is like looking at something far away through a pair of binoculars. As you adjust the focus, you start to see colours, lines, shapes, and eventually you are able to pick out the form of the object.

In a multi-layered network, the initial layers extract features (e.g., edges), with later layers using these features to detect shapes (e.g., a wheel, an eye), which are then fed into final classification layers that detect items based on accumulated characteristics. The network is able to go from pixels to circles to eyes to two eyes placed in a particular orientation, and so on, all the way up to a full classification.

Fine tuning lets us start from an existing network and retrain it on a new set of image classes instead of the ones on which it was initially trained. Because the network already knows how to “see” features in images, we’d like to retrain it to “see” our particular image types.

This approach has been used for all sorts of tasks: working with medical imagery, identifying plankton species from microscopic images collected at sea, and much more.

We need the binary .caffemodel file, which is what contains the trained weights, and it’s available for download. We’ll lower the learning rate to 0.001 from 0.01, since we don’t need to make such large jumps (i.e., we’re fine tuning).
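In solver terms, that change might look like the following excerpt (base_lr, lr_policy, and gamma are standard Caffe solver fields; the values around the learning rate are illustrative):

```
# excerpt from a DIGITS-generated solver.prototxt (illustrative)
base_lr: 0.001     # lowered from 0.01 because we're fine tuning
lr_policy: "step"
gamma: 0.1
```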

In the pretrained model’s definition (i.e., prototxt), we need to rename all references to the final fully connected layer, so that the network doesn’t reuse the weights learned from its original training data (i.e., we want to throw away the current final classification layer).

Here are the changes we need to make. I’ve included the fully modified file I’m using in src/alexnet-customized.prototxt.

This time our accuracy starts at ~60% and climbs right away to 87.5%, then to 96% and higher.

We rename all references to the three fully connected classification layers (loss1/classifier, loss2/classifier, and loss3/classifier) and change them from 1000 to 2 categories. I’ve put the complete file in src/googlenet-customized.prototxt.

It’s worth experimenting with different numbers of epochs, batch sizes, solver types (Adam, AdaDelta, AdaGrad, etc.), learning rates, and so on.

Clicking Download Model downloads a tar.gz archive containing the following files. There’s a nice description of these in the DIGITS documentation. As a network is trained, the current state of that network’s weights is stored in a .caffemodel file.

Before we can use our network as a product, we often need to alter it in a few ways. DIGITS has already done the work for us, separating out the different versions of our prototxt files. The files we’ll care about when using this network are listed below, and we can use them in a number of ways to classify new images.

For example, we can use build/examples/cpp_classification/classification.bin to classify one image. This will spit out a bunch of debug text, followed by the predictions for each of our two categories. You can read the complete C++ source in the accompanying Caffe example.

For a classification version that uses the Python interface, DIGITS includes a nice example.

Let's write a program that uses our fine-tuned GoogLeNet model to classify untrained images, using the files we got from DIGITS. We obviously need to provide the full path, and I'll assume that my files are in a dir called model/. The caffe.Net() constructor takes our deploy file, our trained .caffemodel, and a mode flag (e.g., caffe.TEST).

We're interested in loading images of various sizes into our network for testing.

We'll use it to create a transformation appropriate for our images/network. We can also use the mean.binaryproto file DIGITS gave us to set our transformer's mean. If we had a lot of labels, we might also choose to read in our labels file, which we can use later by looking up the label for a probability using its position (e.g., 0=dolphin, 1=seahorse). Now we're ready to classify an image.
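For intuition, here is a numpy sketch of the kind of preprocessing the transformer performs. The exact settings depend on your network, so treat this as an assumption-laden illustration rather than the real caffe.io.Transformer:

```python
import numpy as np

def preprocess(image_hwc, mean_chw):
    """Mimic typical Transformer steps: HxWxC float image in [0,1] ->
    channel-first, BGR-ordered, 0-255 scaled, mean-subtracted array."""
    chw = image_hwc.transpose(2, 0, 1)   # HWC -> CHW (set_transpose)
    bgr = chw[::-1, :, :]                # RGB -> BGR (set_channel_swap)
    scaled = bgr * 255.0                 # [0,1] -> [0,255] (set_raw_scale)
    return scaled - mean_chw             # subtract dataset mean (set_mean)

labels = ["dolphin", "seahorse"]         # position i in the output = labels[i]

img = np.random.default_rng(1).random((224, 224, 3))
mean = np.zeros((3, 224, 224))           # the real mean comes from mean.binaryproto
data = preprocess(img, mean)
print(data.shape)  # (3, 224, 224)
```

The real transformer does the same reshaping and mean subtraction, just driven by the files DIGITS gave us.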

We read our image file, then use our transformer to reshape it and set it as our network's data layer. Q: 'How could I use images (i.e., frames) from a camera or video stream instead of files?'

Great question! Here's a skeleton to get you started. Back to our problem: we next need to run the image data through our network and read out the probabilities from our network's final softmax layer, which will be in order by label category. Running the full version of this (see src/) using our fine-tuned model produces output like the following. I'm still trying to learn all the best practices for working with models in code.
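The camera-stream skeleton mentioned in the Q&A above might look like this. The network forward pass is stubbed out with fake scores, and cv2.VideoCapture (the standard OpenCV entry point) is only attempted if OpenCV is installed:

```python
import numpy as np

def softmax(x):
    """Turn raw final-layer outputs into probabilities."""
    e = np.exp(x - x.max())
    return e / e.sum()

def classify_frame(frame, labels):
    """Placeholder per-frame pipeline: preprocess the HxWx3 ndarray,
    run the net (stubbed here), and read out the most probable label."""
    scores = np.array([2.0, 0.5])       # stub: a real net forward pass goes here
    probs = softmax(scores)             # probabilities, in order by label category
    return labels[int(np.argmax(probs))], probs

if __name__ == "__main__":
    try:
        import cv2                       # OpenCV, if available
        cap = cv2.VideoCapture(0)        # frames from the default camera
        ok, frame = cap.read()           # each frame is an HxWx3 ndarray
        if ok:
            print(classify_frame(frame, ["dolphin", "seahorse"]))
        cap.release()
    except Exception:
        pass  # no camera or no OpenCV; the helpers still work on arrays
```

Swapping the stub for a real forward pass is the only change needed to classify live frames.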

I wish I had better documented code examples, APIs, premade modules, and the like to show you here. I wish there were more simple modules in high-level languages that I could point you at that “did the right thing.”

At the beginning we said that our goal was to write a program that used a neural network to correctly classify images it had never seen: images of dolphins and seahorses that were never used in the training or validation data.

Let's look at how each of our three attempts did with this challenge. It’s amazing how well our model works, and what’s possible by fine tuning a pretrained network. Obviously our dolphin-vs-seahorse example is contrived, but for learning the tools and workflows of neural networks, it’s turned out to be an ideal case, especially since it didn’t require expensive hardware or huge amounts of data.

Above all I hope that this experience helps to remove the overwhelming fear of getting started. Deciding

whether or not it’s worth investing time in learning the theories of machine learning and neural

networks is easier when you’ve been able to see it work in a small way.

Now that you’ve got a setup and a working approach, you can try doing other sorts of classifications.


Caffe is a deep learning framework made with expression, speed, and modularity in mind. You can switch between CPU and GPU by setting a single flag to train on a GPU machine, then deploy to commodity clusters or mobile devices.

Caffe can process over 60M images per day with a single NVIDIA K40 GPU: that’s 1 ms/image for inference and 4 ms/image for learning, and more recent library versions and hardware are faster still. We believe that Caffe is among the fastest convnet implementations available.

Community: Caffe already powers academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. Join the community to follow development and get help.

Please cite Caffe in your publications if it helps your research. If you do publish a paper where Caffe helped your research, we encourage you to cite the framework for tracking by Google Scholar.

The BAIR Caffe developers would like to thank NVIDIA for GPU donation, A9 and Amazon Web Services for a research grant in support of Caffe development and reproducible research in deep learning, and BAIR PI Trevor Darrell for guidance.

Using DIGITS to train a Semantic Segmentation neural network

NOTE: refer to this Parallel-For-All blog post for a detailed review of Fully Convolutional Networks for semantic segmentation.

During inference, a grid mask of the input image is generated, where the value of each element in the grid denotes the class of the object that the corresponding pixel in the input image represents.

In the resulting image mask, the person and the horse are correctly depicted using the color codes from the legend.

Use the script to create a train/val split of the labelled images. Let us take a look at one image/label pair:

The color palette in PASCAL VOC is chosen such that adjacent values map to very different colors, in order to make classes more easily distinguishable during visual inspection. In PASCAL VOC, index 0 maps to black pixels (background) and index 255 maps to white pixels (don't care); other indices map to the remaining object classes.

In the dataset creation form, click Separate validation images then specify the paths to the image and label folders for each of the training and validation sets.

For more information on this model, refer to this paper. NOTE: in order to train this network you will need NV-Caffe 0.15 or later, or a corresponding version of the main BVLC fork of Caffe.

To train FCN-Alexnet in DIGITS, some minor customizations need to be made to the original FCN-Alexnet prototxt. In order to train FCN-Alexnet, a suitable pre-trained model must be used.

NOTE: skip this section if you are not interested in the details of model convolutionalization and are already in possession of a pre-trained FCN-Alexnet. Standard CNNs like Alexnet perform non-spatial prediction, e.g. assigning a single class label to a whole input image.

fc6 has 256x6x6x4096 weights and 4096 bias terms (one per output), for a total of 37,752,832 learnable parameters:

One could elect to follow a sliding-window approach, where the input image is scanned from left to right and top to bottom, and have each window go through fc6. This is exactly the computation that an equivalent convolutional layer performs, far more efficiently.

Such a convolutional layer has Mi=256 input feature maps, Mo=4096 output feature maps, square kernels of size K=6, and stride S=1; it has exactly the same learnable parameters as fc6, yet can be applied to inputs of arbitrary spatial size.
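The parameter-count equivalence is quick to verify with plain arithmetic (pure bookkeeping, no Caffe required):

```python
# A 6x6 convolution with 256 input maps and 4096 output maps has the
# same parameter count as the fully-connected fc6 layer it replaces.
Mi, Mo, K = 256, 4096, 6
conv_weights = Mo * Mi * K * K   # one KxK kernel per (input, output) pair
conv_bias = Mo                   # one bias per output feature map

fc_weights = (256 * 6 * 6) * 4096  # 9216 inputs fully connected to 4096 outputs
fc_bias = 4096

assert conv_weights == fc_weights == 37_748_736
assert conv_bias == fc_bias == 4096
print(conv_weights + conv_bias)  # 37752832 learnable parameters
```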

You may use the script to convolutionalize an ILSVRC2012-trained Alexnet model. If you inspect the prototxt carefully, you might notice that the learning rate multiplier for the upscore layer is set to 0. The reason is that this layer is merely there to perform bilinear interpolation of the score_fr layer, so as to produce an upscaled version of its input. Here we are using a bilinear weight filler to initialize the weights of this deconvolution layer such that the layer performs bilinear interpolation of its input, with no further need for parameter learning, hence the null learning rate multiplier.
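The bilinear filler's weights can be reproduced in a few lines of numpy. This mirrors the initialization commonly used for FCN upsampling layers; treat it as a sketch:

```python
import numpy as np

def bilinear_kernel(size):
    """Weights that make a (de)convolution perform bilinear interpolation
    of its input, so no parameter learning is needed for upsampling."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

k = bilinear_kernel(4)   # kernel size for 2x upsampling
print(k[0])              # [0.0625 0.1875 0.1875 0.0625]
```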

If the images in the label dataset have a palette (as is the case for PASCAL VOC), then this visualization method will show the network output using the palette from the label dataset.

2. Classification using Traditional Machine Learning vs. Deep Learning

Deep learning is the new big trend in machine learning.

To do this, we will build a Cat/Dog image classifier using a deep learning algorithm called convolutional neural network (CNN) and a Kaggle dataset.

The first part covers some core concepts behind deep learning, while the second part is structured in a hands-on tutorial format.

In the first part of the hands-on tutorial (section 4), we will build a Cat/Dog image classifier using a convolutional neural network from scratch.

In the second part of the tutorial (section 5), we will cover an advanced technique for training convolutional neural networks called transfer learning.

By the end of this post, you will understand how convolutional neural networks work, and you will get familiar with the steps and the code for building these networks.

Our goal is to build a machine learning algorithm capable of detecting the correct animal (cat or dog) in new unseen images.

Classification using a machine learning algorithm has 2 phases: training and prediction. The training phase for an image classification problem has 2 main steps: feature extraction and model training. In the prediction phase, we apply the same feature extraction process to the new images and pass the features to the trained machine learning algorithm to predict the label.

The promise of deep learning is more accurate machine learning algorithms compared to traditional machine learning with less or no feature engineering.

In addition to algorithmic innovations, the increase in computing capabilities using GPUs and the collection of larger datasets are all factors that helped in the recent surge of deep learning.

A basic model for how the neurons work goes as follows: each synapse has a strength that is learnable and controls the strength of influence of one neuron on another.

If the final sum is above a certain threshold, the neuron fires, sending a spike along its axon.[1] Artificial neurons are inspired by biological neurons, and try to formulate the model explained above in a computational form.

An artificial neuron has a finite number of inputs with weights associated with them, and an activation function (also called a transfer function).
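As a minimal illustration (weights chosen by hand for the example, not learned), an artificial neuron is just a weighted sum of its inputs plus a bias, passed through an activation function:

```python
import numpy as np

def neuron(inputs, weights, bias, activation=lambda s: 1.0 if s > 0 else 0.0):
    """One artificial neuron: weighted sum of inputs plus a bias,
    passed through an activation (transfer) function."""
    return activation(np.dot(inputs, weights) + bias)

# With these hand-picked weights the neuron computes a logical AND:
w, b = np.array([1.0, 1.0]), -1.5
print([neuron(np.array(x), w, b) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [0.0, 0.0, 0.0, 1.0]
```

Training replaces the hand-picked weights with learned ones.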

We need 2 elements to train an artificial neural network: training data and a loss function. Once we have these 2 elements, we train the ANN using an algorithm called backpropagation together with gradient descent (or one of its derivatives).

CNNs have special layers called convolutional layers and pooling layers that allow the network to encode certain image properties.

This layer consists of a set of learnable filters that we slide over the image spatially, computing dot products between the entries of the filter and the input image.

For example, if we want to apply a filter of size 5x5 to a colored image of size 32x32, then the filter should have depth 3 (5x5x3) to cover all 3 color channels (Red, Green, Blue) of the image.
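A sketch of that sliding dot product in numpy (random values, "valid" positions only, stride 1):

```python
import numpy as np

# Slide one 5x5x3 filter over a 32x32 RGB image (no padding, stride 1).
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
filt = rng.random((5, 5, 3))    # depth 3 to cover all 3 color channels

out = np.empty((28, 28))        # 32 - 5 + 1 = 28 valid positions per axis
for i in range(28):
    for j in range(28):
        # dot product between the filter and the image patch it covers
        out[i, j] = np.sum(image[i:i+5, j:j+5, :] * filt)
print(out.shape)  # (28, 28)
```

Real frameworks compute this far more efficiently, but the arithmetic is the same.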

The goal of the pooling layer is to progressively reduce the spatial size of the representation to reduce the amount of parameters and computation in the network, and hence to also control overfitting.

A pooling layer of size 2x2 with a stride of 2 shrinks the input image to 1/4 of its original size.
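A minimal numpy max-pooling sketch shows the 1/4 reduction (each spatial dimension is halved):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the largest value in each
    non-overlapping 2x2 block, halving each spatial dimension."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))
# [[ 5  7]
#  [13 15]]
```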

[2] The simplest architecture of a convolutional neural network starts with an input layer (images), followed by a sequence of convolutional layers and pooling layers, and ends with fully-connected layers.

The convolutional, pooling, and ReLU layers act as learnable feature extractors, while the fully connected layers act as a machine learning classifier.

Furthermore, the early layers of the network encode generic patterns of the images, while later layers encode finer, more detailed patterns.

After setting up an AWS instance, we connect to it and clone the github repository that contains the necessary Python code and Caffe configuration files for the tutorial.

There are 4 steps in training a CNN using Caffe: data preparation, model definition, solver definition, and model training. After the training phase, we will use the .caffemodel trained model to make predictions on new unseen data.

transform_img takes a colored image as input, performs histogram equalization of the 3 color channels, and resizes the image.
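A numpy approximation of the equalization step. transform_img itself uses OpenCV; this sketch only mirrors the idea for a single channel (up to rounding differences):

```python
import numpy as np

def equalize_channel(channel):
    """Histogram-equalize one uint8 channel, spreading the intensities
    over the full 0-255 range via the cumulative histogram."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[channel]

# A low-contrast test image confined to the [100, 120] intensity range.
img = np.random.default_rng(0).integers(100, 121, size=(8, 8), dtype=np.uint8)
eq = equalize_channel(img)
print(eq.min(), eq.max())  # equalization stretches the range to 0 .. 255
```

transform_img applies this per channel and then resizes the result to the network's input dimensions.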

We need to make the modifications below to the original bvlc_reference_caffenet prototxt file: We can print the model architecture by executing the command below.

The optimization process will run for a maximum of 40000 iterations and will take a snapshot of the trained model every 5000 iterations.
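In Caffe solver terms, those two settings correspond to the following excerpt (max_iter and snapshot are standard solver fields; the snapshot_prefix path is an assumption):

```
# illustrative excerpt from the solver definition
max_iter: 40000     # stop after at most 40000 iterations
snapshot: 5000      # snapshot the trained model every 5000 iterations
snapshot_prefix: "caffe_models/caffe_model_1/caffe_model_1"
```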

After defining the model and the solver, we can start training the model by executing the command below. The training logs will be stored under deeplearning-cats-dogs-tutorial/caffe_models/caffe_model_1/model_1_train.log.

The code above stores the mean image under mean_array, defines a model called net by reading the deploy file and the trained model, and defines the transformations that we need to apply to the test images.

The code above reads an image, applies image processing steps similar to those used in the training phase, calculates each class's probability, and prints the class with the largest probability (0 for cats, and 1 for dogs).

Instead of training the network from scratch, transfer learning utilizes a trained model on a different dataset, and adapts it to the problem that we're trying to solve.

Using DIGITS to train a medical image segmentation network

We will train an image segmentation neural network and learn how to implement a custom metric by using a Caffe Python layer. We will also see how a finer segmentation model helps address the issue of coarse segmentation contours.

“Evaluation Framework for Algorithms Segmenting Short Axis Cardiac MRI.” The MIDAS Journal – Cardiac MR Left Ventricle Segmentation Challenge. The dataset consists of 16-bit MRI images in DICOM format and expert-drawn contours in text format (coordinates of contour polylines).

DIGITS plug-ins are thin layers that users can implement to load data when they are not in a format that DIGITS natively supports.

The Sunnybrook plug-in reads 16-bit images from DICOM files (those images are the input to the network we will train) and creates black-and-white images out of the .txt files that delineate the contours of the left ventricle (those black-and-white images will be our labels).

You will need to download two archives. Once you have downloaded the data, expand the archives into a location that DIGITS can access (if you are using .deb packages, make sure the data can be read by user www-data on your system).

FCN models were first published in: Fully Convolutional Networks for Semantic Segmentation, Jonathan Long, Evan Shelhamer, Trevor Darrell, CVPR 2015, arXiv:1411.4038. We will use FCN-Alexnet, which was already introduced to DIGITS users in the semantic segmentation example.

Take some time to review the code below and save it into a file. Use the Clone button on the model that you just trained to create a new model with the same settings. In the custom network definition, add this definition of the Dice layer. In the Python layers menu, check client-side file, then upload the file you just saved.

Transfer learning lets us re-use knowledge that the model has acquired when training on another dataset and apply that knowledge to a different task.

We have already seen in the semantic segmentation example how to use a pre-trained Alexnet model that was trained on the 1.2M-image ImageNet database (ILSVRC2012). That model expects 3-channel RGB inputs, however, so in order to overcome this issue, let us just re-create the dataset as a set of RGB images; we can use the Sunnybrook plug-in for this. Now that we have a dataset of RGB images, we can easily re-use the pre-trained FCN-Alexnet. When you are ready, click Create. You will see that after a small number of epochs, the Dice coefficient starts increasing, finally exceeding 0.6:
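The Dice coefficient itself is simple to compute. Here is a numpy sketch of the metric the custom Python layer reports:

```python
import numpy as np

def dice(pred, target):
    """Dice coefficient between two binary masks:
    2*|A intersect B| / (|A| + |B|), from 0 (disjoint) to 1 (identical)."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return 2.0 * np.logical_and(pred, target).sum() / denom

a = np.zeros((8, 8), dtype=int); a[2:6, 2:6] = 1   # a 4x4 square
b = np.zeros((8, 8), dtype=int); b[3:7, 3:7] = 1   # the same square, shifted
print(dice(a, a), dice(a, b))  # 1.0 0.5625
```

The real layer computes the same ratio between the network's predicted mask and the ground-truth label mask at each validation pass.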

We saw in the previous section that FCN-Alexnet produces very coarse outputs: the segmented area is very edgy, while left ventricles tend to look like circles. In order to reduce overfitting, we can artificially augment the training dataset by applying random perturbations (color and contrast changes, etc.).

How to run experiments using Caffe on Ubuntu

The command with which you can run experiments is the following: python python/ --print_results examples/images/cat.jpg foo

Deep Learning and Image classification using Nvidia Digits

Deep learning is part of machine learning and can be used for tasks like image classification, object detection, speech recognition, and a few other things. In this video, let's walk through...

Build a TensorFlow Image Classifier in 5 Min

In this episode we're going to train our own image classifier.

How to Train Your Models in the Cloud

Let's discuss whether you should train your models locally or in the cloud. I'll go through several dedicated GPU options, then compare three cloud options: AWS, Google Cloud, and FloydHub.


How to train a Caffe model to recognize hand-written digits using the MNIST dataset! Feel free to contact us or to comment for more information.

OpenCV 3.3 deep learning DNN module: thousand-class image classification with GoogLeNet and Caffe (How to Use GoogLeNet Tutorial)

OpenCV 3.3 deep learning DNN module implementing thousand-class object image classification with GoogLeNet and Caffe; see 5:50 for the effect of classifying thousands of object pictures with OpenCV 3.3 and GoogLeNet...

NVIDIA Deep Learning Course: Class #3 - Getting started with Caffe

Register for the full course and find the Q&A log online. Caffe is a Deep Learning framework developed by the Berkeley Vision and Learning Center.

Caffe Custom ImageNet Training and Execution

[Caffe framework] A video showing how to train on and classify personal images. Celebrity images were used for the classification.

2018 NVidia DIGITS on Ubuntu 16 with CUDA 9 and a GTX 1080: solving Caffe and TensorFlow issues

How to solve problems installing DIGITS for deep learning in 2018: NVidia DIGITS on Ubuntu 16 with CUDA 9 and a GTX 1080, with Caffe and TensorFlow.

TFLearn - Deep Learning with Neural Networks and TensorFlow p. 14

Welcome to part fourteen of the Deep Learning with Neural Networks and TensorFlow tutorials. Today we're going to be covering TFLearn, which is a high-level abstraction layer for TensorFlow.