
Attention and Augmented Recurrent Neural Networks

Recurrent neural networks are one of the staples of deep learning, allowing neural networks to work with sequences of data like text, audio and video.

They can be used to boil a sequence down into a high-level understanding, to annotate sequences, and even to generate new sequences from scratch!

The basic RNN design struggles with longer sequences, but a special variant—“long short-term memory” networks [1]—can even work with these.

Such models have been found to be very powerful, achieving remarkable results in many tasks including translation, voice recognition, and image captioning.

Four directions stand out as particularly exciting: Neural Turing Machines, attentional interfaces, adaptive computation time, and neural programmers. Individually, these techniques are all potent extensions of RNNs, but the really striking thing is that they can be combined, and seem to just be points in a broader space.

Our guess is that these “augmented RNNs” will have an important role to play in extending deep learning’s capabilities over the coming years.

Instead of specifying a single location, the RNN outputs an “attention distribution” that describes how we spread out the amount we care about different memory positions.

We do this by having the new value of a position in memory be a convex combination of the old memory content and the write value, with the position between the two decided by the attention weight.
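To make this concrete, here is a minimal NumPy sketch (not code from the article) of an attention-weighted read and an attention-weighted write over a small memory matrix; the variable names and values are purely illustrative.

import numpy as np

# Memory has one row per position; attention is a distribution over
# positions that sums to one (the numbers here are illustrative).
memory = np.array([[0.4, 0.2], [0.2, 0.2], [0.1, 0.9], [0.9, 0.9]])
attention = np.array([0.1, 0.6, 0.2, 0.1])

# Read: an attention-weighted sum of every memory position.
read_vector = attention @ memory

# Write: each position becomes a convex combination of its old content and
# the write vector, with the blend decided by that position's attention weight.
write_vector = np.array([0.5, 0.5])
memory = (1 - attention[:, None]) * memory + attention[:, None] * write_vector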

[Interactive figure: reading from and writing to memory with an attention distribution. A read is an attention-weighted sum over the memory positions, and a write blends each position's old content with the write vector according to its attention weight.]

Content-based attention allows NTMs to search through their memory and focus on places that match what they’re looking for, while location-based attention allows relative movement in memory, enabling the NTM to loop.

[Interactive figure: content-based addressing compares a query vector against each memory position, while location-based addressing shifts the resulting attention distribution.]
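As a rough sketch of these two addressing modes (simplified relative to the full NTM head, which also interpolates with the previous attention and sharpens the result), content-based addressing can be written as a softmax over similarities to a query, and location-based addressing as a circular shift of the attention distribution; the function names and the set of allowed shifts are assumptions for illustration.

import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def content_addressing(memory, query, sharpness=1.0):
    # Focus on positions whose content matches the query
    # (cosine similarity, sharpened and normalized into a distribution).
    sims = memory @ query / (np.linalg.norm(memory, axis=1) * np.linalg.norm(query) + 1e-8)
    return softmax(sharpness * sims)

def location_shift(attention, shift_weights, offsets=(-1, 0, 1)):
    # Shift the attention distribution by a circular convolution,
    # which lets the NTM move relative to where it focused last time.
    n = len(attention)
    shifted = np.zeros(n)
    for i in range(n):
        for s, w in zip(offsets, shift_weights):
            shifted[i] += attention[(i - s) % n] * w
    return shifted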

This capability to read and write allows NTMs to perform many simple algorithms, previously beyond neural networks.

Neural networks can achieve this same behavior using attention, focusing on part of a subset of the information they’re given.

Attention avoids this by allowing the RNN processing the input to pass along information about each word it sees, and then for the RNN generating the output to focus on words as they become relevant.
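A hedged sketch of that focusing step, in NumPy: the generating RNN scores each of the input RNN's per-word outputs against its current state (dot-product scoring is just one common choice), normalizes the scores into an attention distribution, and reads back a weighted summary.

import numpy as np

def attend(decoder_state, encoder_states):
    # Score every encoder output against the current decoder state,
    # turn the scores into a distribution, and return the weighted
    # summary (the "context") along with the attention weights.
    scores = encoder_states @ decoder_state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ encoder_states
    return context, weights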

It can be used in voice recognition [12], allowing one RNN to process the audio and then have another RNN skim over it, focusing on relevant parts as it generates a transcript.

Other uses of this kind of attention include parsing text [13], where it allows the model to glance at words as it generates the parse tree, and for conversational modeling [14], where it lets the model focus on previous parts of the conversation as it generates its response.

More broadly, attentional interfaces can be used whenever one wants to interface with a neural network that has a repeating structure in its output.

We achieve this with the same trick we used before: instead of deciding to run for a discrete number of steps, we have an attention distribution over the number of steps to run.

The weight for each step is determined by a “halting neuron.” It’s a sigmoid neuron that looks at the RNN state and gives a halting weight, which we can think of as the probability that we should stop at that step.
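In outline, this looks like the following simplified sketch, where step_fn and halt_fn stand in for the RNN cell and the halting neuron (both are assumptions, not code from the paper): the loop keeps pondering until the halting weights add up to one, and the remainder goes to the final step.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_steps(state, step_fn, halt_fn, eps=0.01, max_steps=10):
    # Run a variable number of "pondering" steps. Each step's halting
    # neuron contributes a weight; once the weights (nearly) reach one,
    # we stop and give the final step whatever weight is left, so the
    # weights form a distribution over how many steps to run.
    weights, states, total = [], [], 0.0
    for _ in range(max_steps):
        state = step_fn(state)
        p = float(sigmoid(halt_fn(state)))
        states.append(state)
        if total + p >= 1.0 - eps:
            weights.append(1.0 - total)
            break
        weights.append(p)
        total += p
    weights = np.array(weights)
    weights[-1] += 1.0 - weights.sum()   # absorb any leftover weight
    # The overall output is the weighted average of the per-step states.
    return weights @ np.stack(states)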

However, there are a number of details here that make it a bit complicated, so let’s start by imagining a slightly simpler model, which is given an arithmetic expression and generates a program to evaluate it.

So an operation might be something like “add the output of the operation 2 steps ago and the output of the operation 1 step ago.” It’s more like a Unix pipe than a program with variables being assigned to and read from.

For example, we might be pretty sure we want to perform addition at the first time step, then have a hard time deciding whether we should multiply or divide at the second step, and so on...

Instead of running a single operation at each step, we do the usual attention trick of running all of them and then averaging the outputs together, weighted by the probability we ran that operation.
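A tiny sketch of that soft selection (the operations and probabilities here are made up for illustration): every candidate operation runs, and the step's output is the probability-weighted average of their results.

import numpy as np

def soft_step(op_probs, operations, args):
    # Run every operation and average the outputs, weighted by the
    # probability that the model chose each one.
    outputs = np.array([op(*args) for op in operations])
    return op_probs @ outputs

operations = [lambda a, b: a + b, lambda a, b: a * b, lambda a, b: a - b]
op_probs = np.array([0.7, 0.2, 0.1])   # e.g. "probably add, maybe multiply"
result = soft_step(op_probs, operations, (3.0, 4.0))   # 0.7*7 + 0.2*12 + 0.1*(-1)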

Another lovely approach is the Neural Programmer-Interpreter [18] which can accomplish a number of very interesting tasks, but requires supervision in the form of correct programs.

We think that this general space, of bridging the gap between more traditional programming and neural networks, is extremely important.

In general, it seems like a lot of interesting forms of intelligence are an interaction between the creative heuristic intuition of humans and some more crisp and careful media, like language or equations.

One approach is what one might call “heuristic search.” For example, AlphaGo [19] has a model of how Go works and explores how the game could play out guided by neural network intuition.

The “augmented RNNs” we’ve talked about in this article are another approach, where we connect RNNs to engineered media, in order to extend their general capabilities.

The wonderful thing about attention is that it gives us an easier way out of this problem by partially taking all actions to varying extents.

However, it’s still challenging because you may want to do things like have your attention depend on the content of the memory, and doing that naively forces you to look at each memory.
