AI News: Relationships with other machine learning techniques

Relationships with other machine learning techniques

Deep learning is a new and exciting subfield of machine learning which attempts to sidestep the whole feature design process, instead learning complex predictors directly from the data.

Most deep learning approaches are based on neural nets, where complex high-level representations are built through a cascade of units computing simple nonlinear functions.
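To make the "cascade of units computing simple nonlinear functions" concrete, here is a minimal NumPy sketch of a forward pass through a few layers; the layer widths and the tanh nonlinearity are arbitrary choices for illustration, not taken from any particular model.

```python
import numpy as np

def forward(x, weights, biases):
    """Pass an input through a cascade of simple nonlinear units.

    Each layer computes an affine transformation followed by an elementwise
    nonlinearity (tanh here); stacking layers builds progressively
    higher-level representations of the input.
    """
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(W @ h + b)   # one layer of simple nonlinear units
    return h

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]   # arbitrary layer widths, purely for illustration
weights = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = rng.normal(size=4)               # a toy input vector
print(forward(x, weights, biases))   # the network's top-level representation
```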

(Geoffrey Hinton is a pioneer in the field, and invented or influenced a large fraction of the work discussed here.) But it’s one thing to learn the basics, and another to be able to get them to work well.

You can also check out one of several review papers, which give readable overviews of recent progress in the field.

If you’re interested in using neural nets, it’s likely that you want to automatically predict something.

Supervised learning is a machine learning framework where you have a particular task you’d like the computer to solve, and a training set where the correct predictions are labeled.

For instance, you might want to automatically classify email messages as spam or not spam. In supervised learning, you would have a dataset of, say, 100,000 emails labeled as “spam” or “not spam”, which you use to train your classifier so that it can classify new emails it has never seen before.

Before diving into neural nets, you'll first want to be familiar with “shallow” machine learning algorithms, such as linear regression, logistic regression, and support vector machines (SVMs).
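As a concrete illustration of this supervised setup, here is a minimal sketch using scikit-learn (just one convenient choice of library; the synthetic dataset stands in for labeled examples such as spam/not-spam) that fits logistic regression and a linear SVM, then checks accuracy on held-out data.

```python
# A minimal sketch of "shallow" supervised learning with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for labeled examples (e.g. spam vs. not spam).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for model in (LogisticRegression(max_iter=1000), LinearSVC()):
    model.fit(X_train, y_train)           # learn from the labeled training set
    acc = model.score(X_test, y_test)     # accuracy on examples it has never seen
    print(type(model).__name__, round(acc, 3))
```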

You’ll need to understand how to balance the tradeoff between underfitting and overfitting: you want your model to be expressive enough to model relevant aspects of the data, but not so complex that it “overfits” by modeling all the idiosyncrasies.
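One way to see this tradeoff is to fit polynomials of increasing degree to a small noisy dataset and compare training error with error on held-out data; the degrees, noise level, and sample sizes below are arbitrary illustrative values.

```python
# Sketch of the underfitting/overfitting tradeoff: fit polynomials of
# increasing degree to noisy data and compare train vs. held-out error.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 30))
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.shape)            # noisy training data
x_test = np.sort(rng.uniform(-1, 1, 100))
y_test = np.sin(3 * x_test) + rng.normal(scale=0.2, size=x_test.shape)

for degree in (1, 4, 15):   # too simple, about right, too flexible
    coefs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coefs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The low-degree fit underfits (high error on both sets), while the high-degree fit drives training error down but does worse on the held-out data.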

Or maybe you think the data are better explained in terms of clusters, where data points within a cluster are more similar than data points in different clusters; discovering this kind of structure without any labels is the job of unsupervised learning.
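For a concrete picture of clustering, here is a minimal k-means sketch with scikit-learn (again, just one convenient choice) on synthetic blob data.

```python
# Minimal clustering sketch: k-means groups unlabeled points so that points
# within a cluster are closer to their own cluster center than to the others.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # unlabeled data with 3 natural groups
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)   # learned cluster centers
print(kmeans.labels_[:10])       # cluster assignments for the first few points
```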

Unlabeled data is often far more plentiful than labeled data: if you’re working on object recognition, for example, labeling the objects in images is a laborious task, whereas unlabeled data includes the billions of images available on the Internet.

The idea is that you start by training an unsupervised neural net on the unlabeled data (I’ll cover examples shortly), and then convert it to a supervised network with a similar architecture.

The evidence for generative pre-training is still mixed, and many of the most successful applications of deep neural nets have avoided it entirely, especially in the big data setting.

In the unsupervised phase, one can encourage the learned representations to be sparse (i.e., each unit activates only rarely), or feed the network corrupted versions of its inputs and make it reconstruct the clean ones (this is known as a denoising autoencoder).
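Here is a rough sketch of a denoising autoencoder using tf.keras (the same library the tutorials below use); the layer sizes, noise level, and training settings are arbitrary illustrative choices.

```python
# Sketch of a denoising autoencoder: corrupt the inputs with noise and train
# the network to reconstruct the clean versions.
import numpy as np
import tensorflow as tf

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_noisy = np.clip(x_train + np.random.normal(scale=0.3, size=x_train.shape), 0.0, 1.0)

inputs = tf.keras.Input(shape=(784,))
hidden = tf.keras.layers.Dense(128, activation="relu")(inputs)     # learned representation
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(hidden) # reconstruction
autoencoder = tf.keras.Model(inputs, outputs)

autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_noisy, x_train, epochs=5, batch_size=128)   # corrupted in, clean out
```

After training, the hidden layer can be reused as the starting point for a supervised network with a similar architecture, as described above.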

The basic workhorse for neural net training is stochastic gradient descent (SGD), where one visits a single training example at a time (or a “minibatch” of training examples), and takes a small step to reduce the loss on those examples.
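A bare-bones version of this loop, written in NumPy for linear regression, might look like the following; the learning rate, batch size, and number of epochs are arbitrary illustrative values.

```python
# Minibatch stochastic gradient descent for least-squares linear regression:
# visit a small batch of examples at a time and take a small step that
# reduces the loss on that batch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)
lr, batch_size = 0.1, 32
for epoch in range(20):
    perm = rng.permutation(len(X))              # visit examples in random order
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)   # gradient of the batch loss
        w -= lr * grad                                # small step downhill

print(w)   # should end up close to true_w
```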

There is a broad class of optimization problems known as convex optimization, where SGD and other local search algorithms are guaranteed to find the global optimum.

While neural net training isn’t convex, the problem of curvature also shows up for convex problems, and many of the techniques for dealing with it are borrowed from convex optimization.

As general background, it’s useful to read the relevant sections of Boyd and Vandenberghe’s book, Convex Optimization.

While Newton’s method is very good at dealing with curvature, it is impractical for large-scale neural net training.

(Matrix inversion is only practical up to tens of thousands of parameters, whereas neural nets typically have millions.) Still, it serves as an idealized second-order training method which one can try to approximate.
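A toy example makes the curvature point concrete: on a badly scaled quadratic (made up purely for illustration), a single Newton step, which solves against the Hessian, lands exactly at the minimum, while plain gradient descent must use a small step size and crawls along the low-curvature direction.

```python
# Why curvature matters: minimize f(w) = 0.5 * w^T H w for an
# ill-conditioned Hessian H. Newton's method rescales by H^{-1} and reaches
# the minimum in one step; gradient descent makes slow progress.
import numpy as np

H = np.diag([1.0, 100.0])          # very different curvature in the two directions
w_newton = np.array([1.0, 1.0])
w_gd = np.array([1.0, 1.0])

# One Newton step: w <- w - H^{-1} (H w), which is exactly 0.
w_newton = w_newton - np.linalg.solve(H, H @ w_newton)

# Gradient descent needs a small step size to stay stable in the steep
# direction, so it barely moves in the shallow one.
lr = 0.005
for _ in range(100):
    w_gd = w_gd - lr * (H @ w_gd)   # gradient of f is H w

print("Newton step:      ", w_newton)   # [0. 0.]
print("Gradient descent: ", w_gd)       # shallow (first) coordinate still far from 0
```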

Practical algorithms exist for approximating such second-order updates at neural net scale.

Compared with most neural net models, training restricted Boltzmann machines (RBMs) introduces another complication: computing the objective function requires computing the partition function, and computing the gradient requires performing inference.

(This is true for learning Markov random fields (MRFs) more generally.) Contrastive divergence and persistent contrastive divergence are widely used approximations to the gradient which often work quite well in practice.
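Here is a minimal NumPy sketch of a single CD-1 update for a binary RBM; the layer sizes and learning rate are arbitrary, and a real implementation would average the update over minibatches and run for many iterations.

```python
# One step of contrastive divergence (CD-1) for a binary RBM. The exact
# log-likelihood gradient involves an intractable expectation under the model;
# CD-1 approximates it with a single step of Gibbs sampling.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 4, 0.05
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0):
    # Positive phase: hidden activations given the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(n_hidden) < p_h0).astype(float)
    # Negative phase: one Gibbs step to get a "reconstruction".
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(n_visible) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)
    # Approximate gradient: data statistics minus reconstruction statistics.
    return (np.outer(v0, p_h0) - np.outer(v1, p_h1), v0 - v1, p_h0 - p_h1)

v0 = (rng.random(n_visible) < 0.5).astype(float)   # a toy binary training example
dW, db_v, db_h = cd1_update(v0)
W += lr * dW
b_v += lr * db_v
b_h += lr * db_h
```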

One can estimate the model likelihood using annealed importance sampling, but this is delicate, and failures in estimation tend to overstate the model's performance.

As early as 1998, convolutional nets were successfully applied to recognizing handwritten digits, and the MNIST handwritten digit dataset has long been a major benchmark for neural net research.

More recently, convolutional nets made a big splash by significantly pushing forward the state of the art in classifying between thousands of object categories.
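For a sense of what such a model looks like in code, here is a small convolutional network for MNIST written with tf.keras; the architecture and training settings are arbitrary illustrative choices, not anything close to the state of the art.

```python
# A small convolutional network for MNIST digit classification with tf.keras.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0   # add a channel dimension
x_test = x_test[..., None].astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128,
          validation_data=(x_test, y_test))
```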

There is actually a surprising relationship between neural nets and kernels: Bayesian neural nets converge to Gaussian processes (a kernelized regression model) in the limit of infinitely many hidden units.

Unfortunately, neuroscience and cognitive science seem not to have the same commitment to open access that machine learning does, so this section might only be useful if you have access to a university library.

While not all researchers agree on any particular way of carving up the study of the mind and brain, such distinctions are useful to keep in mind when trying to understand exactly what someone is claiming.

Connectionism is a branch of cognitive science, especially influential during the 1980s, which attempted to model high-level cognitive processes in terms of networks of neuron-like units.

Training/Testing on our Data - Deep Learning with Neural Networks and TensorFlow part 7

Welcome to part seven of the Deep Learning with Neural Networks and TensorFlow tutorials. We've been working on attempting to apply our recently-learned ...

Neural Network Model - Deep Learning with Neural Networks and TensorFlow

Welcome to part three of Deep Learning with Neural Networks and TensorFlow, and part 45 of the Machine Learning tutorial series. In this tutorial, we're going to ...

Lecture 6 | Training Neural Networks I

In Lecture 6 we discuss many practical issues for training modern neural networks. We discuss different activation functions, the importance of data ...

Batch Size in a Neural Network explained

In this video, we explain the concept of the batch size used during training of an artificial neural network and also show how to specify the batch size in code with ...

Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edureka

TensorFlow Training: This Edureka “Neural Network Tutorial” video will ...

Neural Network Training Data for self-driving - Python plays GTA p.9

Welcome to part 9 of the Python Plays: Grand Theft Auto series, where our first goal is to create a self-driving car. In this tutorial, we're going to cover how we can ...

Intro - Training a neural network to play a game with TensorFlow and Open AI

This tutorial mini series is focused on training a neural network to play the OpenAI environment called CartPole. The idea of CartPole is that there is a pole ...

Processing our own Data - Deep Learning with Neural Networks and TensorFlow part 5

Welcome to part five of the Deep Learning with Neural Networks and TensorFlow tutorials. Now that we've covered a simple example of an artificial neural ...

Neural networks [2.10] : Training neural networks - model selection

Beginner Intro to Neural Networks 4: First Neural Network in Python

Welcome to the fourth video in a series introducing neural networks! In this video we write our first neural network as a function. It takes random parameters (w1, ...