AI News, BOOK REVIEW: Mastering Machine Learning with R - Second Edition

Mastering Machine Learning with R - Second Edition

With a problem-solution approach, you will understand how to implement different deep neural architectures to carry out complex tasks at work.

You will not only learn about the different mobile and embedded platforms supported by TensorFlow but also how to set up cloud platforms for deep learning applications.

By using crisp, no-nonsense recipes, you will become an expert in implementing deep learning techniques in growing real-world applications and research areas such as reinforcement learning, GANs, autoencoders and more.

Over the course of 7 days, you will be introduced to seven algorithms, along with exercises that will help you learn different aspects of machine learning.

On completion of the book, you will understand which machine learning algorithm to pick for clustering, classification, or regression and which is best suited for your problem.

In this book, you will explore, in depth, topics such as data mining, classification, clustering, regression, predictive modeling, anomaly detection, boosted trees with XGBOOST, and more.

By the end of this book, you will be able to perform machine learning with R in the cloud using AWS in various scenarios with different datasets.

Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster.

Machine learning

Machine learning is a field of computer science that uses statistical techniques to give computer systems the ability to 'learn' (e.g., progressively improve performance on a specific task) with data, without being explicitly programmed.[2]

These analytical models allow researchers, data scientists, engineers, and analysts to 'produce reliable, repeatable decisions and results' and uncover 'hidden insights' through learning from historical relationships and trends in the data.[8]

Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: 'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.'[9]

Developmental learning, elaborated for robot learning, generates its own sequences (also called curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers and using guidance mechanisms such as active learning, maturation, motor synergies, and imitation.

Work on symbolic/knowledge-based learning did continue within AI, leading to inductive logic programming, but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval.[13]:708–710

Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases).

Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge.

Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.

Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances (for example, in classification, one wants to assign a label to instances, and models are trained to correctly predict the pre-assigned labels of a set of examples).

The difference between machine learning and optimization arises from the goal of generalization: while optimization algorithms can minimize the loss on a training set, machine learning is concerned with minimizing the loss on unseen samples.[15]

The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences) and the learner has to build a general model about this space that enables it to produce sufficiently accurate predictions in new cases.

An artificial neural network (ANN) learning algorithm, usually called 'neural network' (NN), is a learning algorithm that is vaguely inspired by biological neural networks.

They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.

Falling hardware prices and the development of GPUs for personal use in the last few years have contributed to the development of the concept of deep learning, which consists of multiple hidden layers in an artificial neural network.

Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples.

Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.

Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar.

Different clustering techniques make different assumptions on the structure of the data, often defined by some similarity metric and evaluated for example by internal compactness (similarity between members of the same cluster) and separation between different clusters.

A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG).

Representation learning algorithms often attempt to preserve the information in their input but transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions, allowing reconstruction of the inputs coming from the unknown data generating distribution, while not being necessarily faithful for configurations that are implausible under that distribution.

Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features.

A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection, and uses methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem.
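The selection, crossover, and mutation loop can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the bit-string genotype, tournament selection, one-point crossover, the toy OneMax fitness (count of 1 bits), and all parameter values are assumptions chosen for brevity.

```python
import random


def evolve(fitness, genome_length=20, population_size=30,
           generations=50, mutation_rate=0.02, seed=0):
    """Evolve bit-string genotypes toward higher fitness; return the best one seen."""
    rng = random.Random(seed)

    def tournament(population):
        # Selection: keep the fitter of two randomly chosen individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        offspring = []
        while len(offspring) < population_size:
            mother, father = tournament(population), tournament(population)
            cut = rng.randint(1, genome_length - 1)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            offspring.append(child)
        population = offspring
        best = max(population + [best], key=fitness)
    return best


# Toy problem (OneMax): fitness is simply the number of 1 bits.
best = evolve(fitness=sum)
print(sum(best), "ones out of", len(best))
```

Passing a different fitness function changes the problem being searched; the loop itself stays the same.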

In 2006, the online movie company Netflix held the first 'Netflix Prize' competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%.

Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ('everything is a recommendation') and they changed their recommendation engine accordingly.[37]

Machine learning programs often fail to deliver expected results, and the reasons are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.[44]

Classification machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data into a training set and a test set (conventionally 2/3 for training and 1/3 for testing) and evaluates the performance of the trained model on the test set.

In comparison, the k-fold cross-validation method randomly splits the data into k subsets, where k-1 subsets are used to train the model while the remaining kth subset is used to test the predictive ability of the trained model.
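Both validation schemes come down to how sample indices are partitioned. The sketch below shows one possible way to produce the index splits using only the standard library; the function names and the round-robin fold assignment are illustrative assumptions, and model training and scoring are left out.

```python
import random


def holdout_split(n_samples, test_fraction=1 / 3, seed=0):
    """Return (train_indices, test_indices) for a single random holdout split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_splits(n_samples=10, k=5):
    print(len(train), "training samples,", len(test), "test samples")
```

Every sample lands in exactly one test fold, so averaging the k test scores uses all of the data for evaluation.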

For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[61][62]

There is huge potential for machine learning in health care to provide professionals a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously and these 'greed' biases are addressed.[64]

How To Improve Deep Learning Performance

How can you get better performance from your deep learning model?

The ideas won’t just help you with deep learning, but really any machine learning algorithm.

For example, a new framing of your problem or more data is often going to give you more payoff than tuning the parameters of your best performing algorithm.

I have included lots of links to tutorials from the blog, questions from related sites as well as questions on the classic Neural Net FAQ.

For example, with photograph image data, you can get big gains by randomly shifting and rotating existing images.
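A random shift can be sketched without any image library by treating an image as a list of pixel rows. The helper names, the zero fill for exposed pixels, and the shift range are assumptions for illustration; rotation is omitted to keep the example short.

```python
import random


def shift_image(image, dx, dy, fill=0):
    """Return a copy of a 2D image shifted by dx columns and dy rows."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            # Pixels that move outside the frame are dropped.
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def augment(image, n_copies, max_shift=1, seed=0):
    """Create n_copies randomly shifted variants of one image."""
    rng = random.Random(seed)
    return [
        shift_image(image,
                    rng.randint(-max_shift, max_shift),
                    rng.randint(-max_shift, max_shift))
        for _ in range(n_copies)
    ]


image = [[1, 2],
         [3, 4]]
print(shift_image(image, dx=1, dy=0))
```

Each augmented copy keeps the label of the original image, which is what makes the extra examples free.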

A traditional rule of thumb when working with neural networks is: Rescale your data to the bounds of your activation functions.

For example, if you have a sigmoid on the output layer to predict binary values, normalize your y values to be binary.
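As a rough sketch of that rule of thumb, min-max scaling maps a feature onto whatever range the activation function covers, for example 0 to 1 for a sigmoid or -1 to 1 for tanh. The function name and the handling of constant inputs are assumptions made for this example.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale values so the smallest maps to low and the largest to high."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # A constant feature carries no information; place it mid-range.
        return [(low + high) / 2 for _ in values]
    span = largest - smallest
    return [low + (v - smallest) * (high - low) / span for v in values]


heights = [2, 4, 6]
print(min_max_scale(heights))             # sigmoid range
print(min_max_scale(heights, -1.0, 1.0))  # tanh range
```

In practice the minimum and maximum should be taken from the training data only and then reused for validation and test data.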

I would suggest that you create a few different versions of your training dataset, then evaluate the performance of your model on each.

In addition, there are other methods for keeping numbers small in your network such as normalizing activation and weights, but we’ll look at these techniques later.

But they will also learn a problem much faster if you can better expose the structure of the problem to the network for learning.

There are lots of feature selection methods and feature importance methods that can give you ideas of features to keep and features to boot.

Even if you just list off 3-to-5 alternate framings and discount them, at least you are building your confidence in the chosen approach.

It is a good idea to think through the problem and its possible framings before you pick up the tool, because you’re less invested in solutions.

All the theory and math describe different approaches to learn a decision process from data (if we constrain ourselves to predictive modeling).

In this section, we’ll touch on just a few ideas around algorithm selection before next diving into the specifics of getting the most from your chosen deep learning method.

No single algorithm can perform better than any other, when performance is averaged across all possible problems.

Now, we are not trying to solve all possible problems, but the new hotness in algorithm land may not be the best choice on your specific dataset.

Maybe you can drop the deep learning model and use something a lot simpler, a lot faster to train, even something that is easy to understand.

This often means we cannot use gold standard methods to estimate the performance of the model such as k-fold cross validation.

You will get better performance if you know why performance is no longer improving.

A quick way to get insight into the learning behavior of your model is to evaluate it on the training and a validation dataset each epoch, and plot the results.

Remember, changing the weight initialization method is closely tied with the activation function and even the optimization function.

Learning rate is coupled with the number of training epochs, batch size and optimization method.

Obviously, you want to choose the right transfer function for the form of your output, but consider exploring different representations.

For example, switch your sigmoid for binary classification to linear for a regression problem, then post-process your outputs.

More layers offer more opportunity for hierarchical re-composition of abstract features learned from the data.

Small batch sizes with large epoch size and a large number of training epochs are common in modern deep learning implementations.

Consider a near infinite number of epochs and set up checkpointing to capture the best performing model seen so far; see more on this further down.

Dropout randomly skips neurons during training, forcing others in the layer to pick up the slack.
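The idea can be illustrated on a plain list of activations. This sketch assumes the common "inverted dropout" variant, where the surviving activations are scaled up during training so that nothing needs to change at prediction time; the function name and signature are illustrative.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # no units are skipped at prediction time
    keep = 1.0 - rate
    # Dividing by `keep` preserves the expected sum of the layer's output.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
layer_output = [1.0] * 10
print(dropout(layer_output, rate=0.5, rng=rng))
print(dropout(layer_output, rate=0.5, rng=rng, training=False))
```

A different random subset of units is dropped on every call, so no single neuron can be relied on by the rest of the layer.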

Also consider other more traditional neural network regularization techniques. Experiment with the different aspects that can be penalized and with the different types of penalties that can be applied (L1, L2, both).
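For reference, the L1 and L2 penalties are just sums over the values being penalized, added to the training loss. The sketch below applies them to a flat list of weights; the function name and the coefficient values are assumptions for illustration.

```python
def penalty(weights, l1=0.0, l2=0.0):
    """Return l1 * sum(|w|) + l2 * sum(w^2) for a flat list of weights."""
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return l1 * l1_term + l2 * l2_term


weights = [1.0, -2.0, 0.0]
print(penalty(weights, l1=0.5))          # L1 only
print(penalty(weights, l2=0.5))          # L2 only
print(penalty(weights, l1=0.5, l2=0.5))  # both
```

The same function can be applied to activations instead of weights, which is one way to vary what is being penalized.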

To get the most out of a given method, you really need to dive into the meaning of each parameter, then grid search different values for your problem.

I have found that newer/popular methods can converge a lot faster and give a quick idea of the capability of a given network topology. You can also explore other optimization algorithms such as the more traditional (Levenberg-Marquardt) and the less so (genetic algorithms).

Nevertheless, you often have some leeway (MSE and MAE for regression, etc.) and you might get a small bump by swapping out the loss function on your problem.
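To make the MSE versus MAE choice concrete, the two losses can be written out directly. The sample values below are made up to show how a single large error affects each measure.

```python
def mse(y_true, y_pred):
    """Mean squared error: large errors are penalized quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [1.0, 1.0, 1.0, 5.0]  # one prediction is far off
print("MSE:", mse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
```

Because MSE squares each error, the single outlier dominates it far more than it dominates MAE, which is one reason swapping losses can change what the model learns.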

This can save a lot of time, and may even allow you to use more elaborate resampling methods to evaluate the performance of your model.

Early stopping is a type of regularization to curb overfitting of the training data and requires that you monitor the performance of the model on the training dataset and a held-out validation dataset each epoch.

You can also set up checkpoints to save the model if this condition is met (measuring loss of accuracy), and allow the model to keep learning.

Checkpointing allows you to do early stopping without the stopping, giving you a few models to choose from at the end of a run.
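The relationship between early stopping and checkpointing can be sketched over a list of per-epoch validation losses. Everything here is a simplified assumption: the losses are supplied up front rather than produced by training, a "checkpoint" is just the recorded epoch number, and `patience` (the number of non-improving epochs tolerated) is a hypothetical parameter name.

```python
def monitor(validation_losses, patience=None):
    """Track the best epoch; stop early if `patience` is set and exhausted.

    Returns (best_epoch, checkpoint_epochs, last_epoch_run).
    """
    best_loss = float("inf")
    best_epoch = None
    last_epoch = None
    checkpoints = []
    stale_epochs = 0
    for epoch, loss in enumerate(validation_losses):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            checkpoints.append(epoch)  # stand-in for saving the model weights
            stale_epochs = 0
        else:
            stale_epochs += 1
            if patience is not None and stale_epochs >= patience:
                break  # early stopping
    return best_epoch, checkpoints, last_epoch


losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.5]
print(monitor(losses, patience=2))  # stops once two epochs fail to improve
print(monitor(losses))              # checkpointing only, runs every epoch
```

With these made-up losses, stopping early misses the later improvement in the final epoch, while checkpointing alone keeps training and still records each best model along the way.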

We’ll take a look at three general areas of ensembles you may want to consider. Don’t select a model, combine them.

If you have multiple different deep learning models, each of which performs well on the problem, combine their predictions by taking the mean.

Their predictions will be highly correlated, but it might give you a small bump on those patterns that are harder to predict.

Often you can get better results over that of a mean of the predictions using simple linear methods like regularized regression that learns how to weight the predictions from different models.

Baseline results using the mean of the predictions from the submodels, but lift performance with learned weightings of the models.
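Combining predictions is simple arithmetic once each submodel has produced its outputs. The sketch below shows the plain mean and a weighted combination; the weights are supplied by hand here as an assumption, whereas in a stacked ensemble they would be learned, for example by a regularized regression fitted on held-out predictions.

```python
def mean_ensemble(predictions_per_model):
    """Average the predictions of several models, sample by sample."""
    n_models = len(predictions_per_model)
    return [sum(column) / n_models for column in zip(*predictions_per_model)]


def weighted_ensemble(predictions_per_model, weights):
    """Combine predictions using one weight per model."""
    if len(weights) != len(predictions_per_model):
        raise ValueError("need exactly one weight per model")
    return [
        sum(w * p for w, p in zip(weights, column))
        for column in zip(*predictions_per_model)
    ]


model_a = [0.25, 0.75]  # predictions for two samples
model_b = [0.75, 0.25]
print(mean_ensemble([model_a, model_b]))
print(weighted_ensemble([model_a, model_b], weights=[0.75, 0.25]))
```

The mean is the special case where every model gets the same weight, which is why it makes a natural baseline.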

API-driven services bring intelligence to any application

Developed by AWS and Microsoft, Gluon provides a clear, concise API for defining machine learning models using a collection of pre-built, optimized neural network components.

More seasoned data scientists and researchers will value the ability to build prototypes quickly and utilize dynamic neural network graphs for entirely new model architectures, all without sacrificing training speed.

Machine Learning

Data scientists in both industry and academia have been using GPUs for machine learning to make groundbreaking improvements across a variety of applications including image classification, video analytics, speech recognition and natural language processing.

Deep learning, the use of multi-layer neural networks to create systems that can perform feature detection from massive amounts of unlabeled training data, is an area that has been seeing significant investment and research.

Although machine learning has been around for decades, two relatively recent trends have sparked widespread use of machine learning: the availability of massive amounts of training data, and powerful and efficient parallel computing provided by GPU computing.

Getting Started with Neural Network Toolbox

Use graphical tools to apply neural networks to data fitting, pattern recognition, clustering, and time series problems. Top 7 Ways to Get Started with Deep ...

Semi-supervised Learning explained

In this video, we explain the concept of semi-supervised learning. We also discuss how we can apply semi-supervised learning with a technique called ...

Turn your laptop into a Deep Learning BEAST

Setup tutorial of an External Video adapter for Deep Learning. I highly recommend using an Nvidia graphic card, since AMD lacks the CUDA API that most Deep ...

11. Introduction to Machine Learning

MIT 6.0002 Introduction to Computational Thinking and Data Science, Fall 2016 View the complete course: Instructor: Eric Grimson ..

More Data Mining with Weka (5.1: Simple neural networks)

More Data Mining with Weka: online course from the University of Waikato Class 5 - Lesson 1: Simple neural networks Slides (PDF): ..

12a: Neural Nets

NOTE: These videos were recorded in Fall 2015 to update the Neural Nets portion of the class. MIT 6.034 Artificial Intelligence, Fall 2010 View the complete ...

TensorFlow.js - Introducing deep learning with client-side neural networks

Let's introduce the concept of client-side artificial neural networks, which will lead us to deploying and running models, along with our full deep learning ...

Introducing ML.NET : Build 2018

ML.NET is aimed at providing a first class experience for Machine Learning in .NET. Using ML.NET, .NET developers can develop and infuse custom AI into ...

CS50 2016 - Week 7 - Machine Learning

00:00:00 - Introduction 00:01:47 - Introducing Machine Learning 00:11:21 - Image Classification 00:17:13 - Flatland 00:19:35 - Lineland, Flatland, Spaceland, ...

Deep Learning Made Easy With These Tips

These deep learning tricks will enhance your knowledge and allow you to implement them in the practical scenario, which may lead you to professional success.