AI News, The Neural Network Zoo
- On Sunday, June 3, 2018
The Neural Network Zoo
With new neural network architectures popping up every now and then, it’s hard to keep track of them all.
I should add that this overview is in no way clarifying how each of the different node types work internally (but that’s a topic for another day).
That’s not the end of it though: in many places you’ll find RNN used as a placeholder for any recurrent architecture, including LSTMs, GRUs and even the bidirectional variants.
Many abbreviations also vary in the number of “N”s at the end, because you could call it a convolutional neural network but also simply a convolutional network (resulting in CNN or CN).
Feed forward neural networks (FF or FFNN) and perceptrons (P) are very straightforward: they feed information from the front to the back (input and output, respectively).
Neurons within a single layer are never connected to each other, and in general two adjacent layers are fully connected (every neuron from one layer to every neuron in the other layer).
The simplest somewhat practical network has two input cells and one output cell, which can be used to model logic gates.
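To make the logic gate idea concrete, here is a minimal sketch of a two-input, one-output perceptron. The weights, biases and the step activation are hand-picked assumptions for illustration, not values taken from the article; a trained network would arrive at its own.

```python
def step(x):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if x >= 0 else 0


def perceptron(inputs, weights, bias):
    """Two input cells feeding a single output cell."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return step(total)


# Hand-picked (assumed) weights and bias for each gate.
AND_GATE = ((1.0, 1.0), -1.5)
OR_GATE = ((1.0, 1.0), -0.5)


def truth_table(gate):
    """Outputs for the inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    weights, bias = gate
    return [perceptron((a, b), weights, bias) for a in (0, 1) for b in (0, 1)]
```

Calling `truth_table(AND_GATE)` walks through all four input combinations; only the bias differs between the two gates, which is what moves the decision boundary.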
Each neuron has an activation threshold which scales to this temperature, which if surpassed by summing the input causes the neuron to take the form of one of two states (usually -1 or 1, sometimes 0 or 1).
If updated one by one, a fair random sequence is created to organise which cells update in what order (fair random being all options (n) occurring exactly once every n items).
If humans see half a table we can imagine the other half; likewise, this network will converge to a table if presented with half noise and half a table.
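The recall behaviour described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: states are -1 or 1, the threshold is fixed at 0 (no temperature scaling), weights come from a plain Hebbian rule, and the names `train`, `recall`, `stored` and `noisy` are made up for the example.

```python
import random


def train(patterns):
    """Hebbian weights: sum of outer products, with no self-connections."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, sweeps=5, seed=0):
    """Update cells one by one until a full sweep changes nothing."""
    rng = random.Random(seed)
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        order = list(range(n))
        rng.shuffle(order)  # "fair random": every cell exactly once per sweep
        changed = False
        for i in order:
            total = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if total >= 0 else -1  # assumed threshold of 0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


stored = [1, -1, 1, -1, 1, 1, -1, -1]
# The second half of the input is partly corrupted: cells 6 and 7 are flipped.
noisy = [1, -1, 1, -1, 1, 1, 1, 1]
```

Running `recall(train([stored]), noisy)` pulls the corrupted cells back to the stored pattern, which is the "half a table" effect in miniature.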
It starts with random weights and learns through back-propagation, or more recently through contrastive divergence (a Markov chain is used to determine the gradients between two informational gains).
The training and running process of a BM is fairly similar to a HN: one sets the input neurons to certain clamped values after which the network is set free (it doesn’t get a sock).
They don’t trigger-happily connect every neuron to every other neuron but only connect every different group of neurons to every other group, so no input neurons are directly connected to other input neurons and no hidden to hidden connections are made either.
RBMs can be trained like FFNNs with a twist: instead of passing data forward and then back-propagating, you forward pass the data and then backward pass the data (back to the first layer).
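A rough sketch of that forward-then-backward pass, written as a single contrastive divergence step (CD-1). Several things here are assumptions made for brevity: biases are left out, the hidden layer is sampled once while the reconstruction uses probabilities, and the function names and learning rate are illustrative only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(visible, weights):
    """Visible -> hidden activation probabilities."""
    n_hidden = len(weights[0])
    return [sigmoid(sum(visible[i] * weights[i][j] for i in range(len(visible))))
            for j in range(n_hidden)]


def backward(hidden, weights):
    """Hidden -> visible, reusing the same weights in the other direction."""
    return [sigmoid(sum(hidden[j] * weights[i][j] for j in range(len(hidden))))
            for i in range(len(weights))]


def cd1_step(visible, weights, rate=0.1, rng=random):
    """One contrastive divergence update; modifies weights in place."""
    h0 = forward(visible, weights)
    h0_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
    v1 = backward(h0_sample, weights)  # data passed back to the first layer
    h1 = forward(v1, weights)
    for i in range(len(weights)):
        for j in range(len(weights[0])):
            # Difference between data-driven and reconstruction-driven statistics.
            weights[i][j] += rate * (visible[i] * h0[j] - v1[i] * h1[j])
    return v1
```

The returned `v1` is the network's reconstruction of the input; repeated calls on training data nudge the weights so that reconstruction and data agree more closely.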
If one were to train a SAE the same way as an AE, you would in almost all cases end up with a pretty useless identity network (as in what comes in is what comes out, without any transformation or decomposition).
This sparsity driver can take the form of a threshold filter, where only a certain error is passed back and trained, while the other error is treated as “irrelevant”.
In a way this resembles spiking neural networks, where not all neurons fire all the time (and points are scored for biological plausibility).
This is a useful approach because neural networks are large graphs (in a way), so it helps if you can rule out influence from some nodes to other nodes as you dive into deeper layers.
This encourages the network not to learn details but broader features, as learning smaller features often turns out to be “wrong”.
This technique is also known as greedy training, where greedy means making locally optimal choices to get to a decent but possibly not optimal answer.
Rather, you create a scanning input layer of say 20 x 20 which you feed the first 20 x 20 pixels of the image (usually starting in the upper left corner).
Once you have passed that input (and possibly used it for training) you feed it the next 20 x 20 pixels: you move the scanner one pixel to the right.
Note that one wouldn’t move the input 20 pixels (or whatever the scanner width is) over: you’re not dissecting the image into blocks of 20 x 20, but rather crawling over it.
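The crawling scanner can be sketched as a generator over window positions. The function name `scan` and the nested-list image representation are assumptions for the example; the 20 x 20 size and the one-pixel step come from the text.

```python
def scan(image, size=20, stride=1):
    """Yield (row, col, patch) for every window position, upper left first."""
    height, width = len(image), len(image[0])
    for row in range(0, height - size + 1, stride):
        for col in range(0, width - size + 1, stride):
            # Cut a size x size patch out of the image at this position.
            patch = [line[col:col + size] for line in image[row:row + size]]
            yield row, col, patch
```

With `stride=1` a 22 x 22 image gives 3 x 3 overlapping windows, whereas `stride=20` would dissect it into blocks and fit only one, which is exactly the distinction made above.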
This input data is then fed through convolutional layers instead of normal layers, where not all nodes are connected to all nodes.
These convolutional layers also tend to shrink as they become deeper, mostly by easily divisible factors of the input (so 20 would probably go to a layer of 10 followed by a layer of 5).
Powers of two are very commonly used here, as they can be divided cleanly and completely by definition: 32, 16, 8, 4, 2, 1.
Pooling is a way to filter out details: a commonly found pooling technique is max pooling, where we take say 2 x 2 pixels and pass on the pixel with the greatest amount of red.
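As a small illustration, max pooling over a single colour channel might look like the sketch below. It assumes the channel is a nested list of numbers (say, the red values) and uses non-overlapping blocks; `max_pool` is a name chosen for the example.

```python
def max_pool(channel, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    pooled = []
    for row in range(0, len(channel) - size + 1, size):
        pooled_row = []
        for col in range(0, len(channel[0]) - size + 1, size):
            block = [channel[r][c]
                     for r in range(row, row + size)
                     for c in range(col, col + size)]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled
```

A 4 x 4 channel shrinks to 2 x 2 this way: three out of every four values are thrown away, which is the detail filtering the text refers to.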
To apply CNNs for audio, you basically feed the input audio waves and inch over the length of the clip, segment by segment.
They may be referenced as deep deconvolutional neural networks, but you could argue that when you stick FFNNs to the back and the front of DNNs that you have yet another architecture which deserves a new name.
The pooling layers commonly found in CNNs are often replaced with similar inverse operations, mainly interpolation and extrapolation with biased assumptions (if a pooling layer uses max pooling, you can invent exclusively lower new data when reversing it).
in the encoding as probabilities, so that it can learn to produce a picture with a cat and a dog together, having only ever seen one of the two in separate pictures.
Demos have shown that these networks can also learn to model complex transformations on images, such as changing the source of light or the rotation of a 3D object.
This creates a form of competition where the discriminator is getting better at distinguishing real data from generated data and the generator is learning to become less predictable to the discriminator.
This works well in part because even quite complex noise-like patterns are eventually predictable but generated content similar in features to the input data is harder to learn to distinguish.
GANs can be quite difficult to train, as you don’t just have to train two networks (either of which can pose its own problems) but their dynamics need to be balanced as well.
One big problem with RNNs is the vanishing (or exploding) gradient problem where, depending on the activation functions used, information rapidly gets lost over time, just like very deep FFNNs lose information in depth.
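A back-of-the-envelope sketch shows why the gradient vanishes or explodes. It assumes a heavily simplified recurrent unit: one scalar weight, a sigmoid activation, and the same pre-activation at every step, so the gradient through time is just one factor raised to the number of steps. The function names are illustrative.

```python
import math


def sigmoid_derivative(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # never larger than 0.25


def gradient_through_time(weight, steps, pre_activation=0.0):
    """Product of the per-step factors weight * sigmoid'(pre_activation)."""
    factor = weight * sigmoid_derivative(pre_activation)
    return factor ** steps
```

With a weight of 1 the per-step factor is at most 0.25, so ten steps already shrink the gradient below one in a million; with a weight of 8 the factor is 2 and the same ten steps blow it up by about a thousand.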
Intuitively this wouldn’t be much of a problem because these are just weights and not neuron states, but the weights through time are actually where the information from the past is stored.
A picture or a string of text can be fed one pixel or character at a time, so the time dependent weights are used for what came before in the sequence, not actually from what happened x seconds before.
Long / short term memory (LSTM) networks try to combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined memory cell.
The forget gate seems like an odd inclusion at first but sometimes it’s good to forget: if it’s learning a book and a new chapter begins, it may be necessary for the network to forget some characters from the previous chapter.
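To show where the forget gate sits, here is a sketch of one time step of a single-unit LSTM cell using the commonly used gate equations. Scalar inputs and states, the `params` dictionary layout and the gate names are assumptions for the example, not something the article specifies.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step; params maps a gate name to (w_input, w_hidden, bias)."""
    def gate(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x + w_h * h_prev + b)

    i = gate("input", sigmoid)        # how much new information to let in
    f = gate("forget", sigmoid)       # how much of the old memory to keep
    o = gate("output", sigmoid)       # how much of the memory to expose
    g = gate("candidate", math.tanh)  # the new information itself
    c = f * c_prev + i * g            # explicit memory cell
    h = o * math.tanh(c)
    return h, c
```

When the forget gate saturates near 1 and the input gate near 0, the memory cell is carried over unchanged; when the forget gate drops towards 0, the old content is wiped, as with the previous chapter's characters.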
It’s an attempt to combine the efficiency and permanency of regular digital storage and the efficiency and expressive power of neural networks.
The “Turing” in Neural Turing Machines comes from them being Turing complete: the ability to read and write and change state based on what they read means they can represent anything a Universal Turing Machine can represent.
Bidirectional recurrent neural networks, bidirectional long / short term memory networks and bidirectional gated recurrent units (BiRNN, BiLSTM and BiGRU respectively) are not shown on the chart because they look exactly the same as their unidirectional counterparts.
This trains the network to fill in gaps instead of advancing information, so instead of expanding an image on the edge, it could fill a hole in the middle of an image.
Deep residual networks (DRN) are very deep FFNNs with extra connections passing input from one layer to a later layer (often 2 to 5 layers) as well as the next layer.
Instead of trying to find a solution for mapping some input to some output across say 5 layers, the network is forced to learn to map some input to some output + some input.
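The "some output + some input" idea fits in a few lines. In this sketch a layer is assumed to be any function from a list of numbers to a list of the same length; `residual_block` is a name invented for the example.

```python
def residual_block(x, layers):
    """Apply the stacked layers, then add the untouched input back in."""
    out = x
    for layer in layers:
        out = layer(out)
    # Skip connection: the block's result is layers(x) + x.
    return [o + xi for o, xi in zip(out, x)]
```

One consequence is that if the stacked layers output nothing but zeros, the block simply passes its input through, so extra depth can do no worse than an identity mapping.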
It has been shown that these networks are very effective at learning patterns up to 150 layers deep, much more than the regular 2 to 5 layers one could expect to train.
Instead of feeding input and back-propagating the error, we feed the input, forward it and update the neurons for a while, and observe the output over time.
The input and the output layers have a slightly unconventional role as the input layer is used to prime the network and the output layer acts as an observer of the activation patterns that unfold over time.
Instead, they start with random weights and train the weights in a single step according to the least-squares fit (lowest error across all functions).
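A minimal sketch of that single-step training, under several assumptions: one scalar input, a random tanh hidden layer that is never trained, and output weights solved from the normal equations with a tiny ridge term for numerical stability. All names (`fit_elm`, `hidden_features`, `solve`) and the parameter ranges are illustrative.

```python
import math
import random


def hidden_features(x, hidden):
    """Fixed random hidden layer: tanh(w * x + b) per unit, plus a constant."""
    return [math.tanh(w * x + b) for w, b in hidden] + [1.0]


def solve(a, y):
    """Solve a @ beta = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def fit_elm(xs, ys, n_hidden=5, seed=0, ridge=1e-8):
    """Random hidden weights stay fixed; only the output weights are solved."""
    rng = random.Random(seed)
    hidden = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(n_hidden)]
    feats = [hidden_features(x, hidden) for x in xs]
    k = n_hidden + 1
    # Normal equations (H^T H + ridge * I) beta = H^T y, solved in one step.
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(k)]
    beta = solve(gram, rhs)
    return lambda x: sum(b * h for b, h in zip(beta, hidden_features(x, hidden)))
```

`fit_elm(xs, ys)` returns a prediction function; there is no iteration and no back-propagation, which is the whole point of the approach.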
The real difference is that LSMs are a type of spiking neural network: sigmoid activations are replaced with threshold functions and each neuron is also an accumulating memory cell.
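An accumulating threshold neuron of this kind can be sketched as a tiny class. The reset to zero after firing and the default threshold of 1.0 are assumptions added for the example; the class name is made up.

```python
class AccumulatingNeuron:
    """Adds incoming values to its own stored potential instead of replacing it."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.potential = 0.0

    def update(self, incoming):
        """Accumulate the input; emit a spike (1) once the threshold is reached."""
        self.potential += incoming
        if self.potential >= self.threshold:
            self.potential = 0.0  # assumed: the stored energy is released on firing
            return 1
        return 0
```

Feeding such a neuron a steady input of 0.4 makes it fire on the third update and stay silent in between, so the output is a spike pattern over time rather than a smooth activation.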
This entails plotting points in a 3D plot, allowing it to distinguish between Snoopy, Garfield AND Simon’s cat, or even higher dimensions distinguishing even more cartoon characters.
- On Thursday, June 27, 2019
Lecture 6 | Training Neural Networks I
In Lecture 6 we discuss many practical issues for training modern neural networks. We discuss different activation functions, the importance of data ...
Neural Program Learning from Input-Output Examples
Most deep learning research focuses on learning a single task at a time - on a fixed problem, given an input, predict the corresponding output. How should we ...
The Nervous System, Part 1: Crash Course A&P #8
SUBBABLE MESSAGE••• TO: Kerry FROM: Cale I love you with all my ha-art. Deadset. *** You can directly support Crash Course at ...
12a: Neural Nets
NOTE: These videos were recorded in Fall 2015 to update the Neural Nets portion of the class. MIT 6.034 Artificial Intelligence, Fall 2010 View the complete ...
INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS ANN IN HINDI
Find the notes of ARTIFICIAL NEURAL NETWORKS in this link ...
Deep Recurrent Neural Networks for Sequence Learning in Spark
Max/MSP Neural Network Tutorial 1: Our First Neuron!
In this mammoth tutorial, we go through some basics about neural networks and how to build them in Max, and end up with our very own neuron that can learn a ...
Neural Network Explained -Artificial Intelligence - Hindi
Neural network in ai (Artificial intelligence) Neural network is highly interconnected network of a large number of processing elements called neuron architecture ...
Neural Networks Demystified [Part 2: Forward Propagation]
Neural Networks Demystified @stephencwelch Supporting Code: In this short series, we will ..
Professor Forcing Recurrent Neural Networks (NIPS 2016 Spotlight)
Spotlight video for NIPS 2016 Paper: Professor Forcing: A new algorithm for training recurrent networks