# AI News, BOOK REVIEW: Train A One Layer Feed Forward Neural Network in TensorFlow With ReLU Activation

## Train A One Layer Feed Forward Neural Network in TensorFlow With ReLU Activation

To train our model, we need to tell the model what the correct answer is and we're going to do that by feeding in the correct answers.

import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data

However, you'll notice here that instead of having a 784-dimensional vector, we have a 10-dimensional vector.

y_ which represents the correct values that we are trying to get our neural network to learn is a 10-dimensional vector as each vector corresponds to the true probability for each of the different classes, namely 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Once we have defined our predictions and then the true labels, we're going to use cross entropy to compare them and to produce a numerical value of how close our answer is to the correct answer.

Because we're feeding in the raw output of the ReLU, we're going to call it tf.nn.softmax_cross_entropy_with_logits.

So what this is saying is that using the TensorFlow GradientDescentOptimizer with a learning rate of 0.001, minimize the variable that we've defined as our cross entropy.

Our cross entropy here is defined as the cross entropy between logits y and the labels y_, again with the outputs of our model and the true values.

We're going to train it for 50 steps which we'll handle just using a standard Python for loop.

What this says is from the training set, pull a new batch of 100 samples from there.

Just to show that this runs, we're going to produce this and it should return without any errors.

So just to make sure that it's doing something, we'll tell it to print out what step it's on.

import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_) train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)

To train our model, we need to tell the model what the correct answer is and we're going to do that by feeding in the correct answers.

import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data

However, you'll notice here that instead of having a 784-dimensional vector, we have a 10-dimensional vector.

y_ which represents the correct values that we are trying to get our neural network to learn is a 10-dimensional vector as each vector corresponds to the true probability for each of the different classes, namely 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

Once we have defined our predictions and then the true labels, we're going to use cross entropy to compare them and to produce a numerical value of how close our answer is to the correct answer.

Because we're feeding in the raw output of the ReLU, we're going to call it tf.nn.softmax_cross_entropy_with_logits.

So what this is saying is that using the TensorFlow GradientDescentOptimizer with a learning rate of 0.001, minimize the variable that we've defined as our cross entropy.

Our cross entropy here is defined as the cross entropy between logits y and the labels y_, again with the outputs of our model and the true values.

We're going to train it for 50 steps which we'll handle just using a standard Python for loop.

What this says is from the training set, pull a new batch of 100 samples from there.

Just to show that this runs, we're going to produce this and it should return without any errors.

So just to make sure that it's doing something, we'll tell it to print out what step it's on.

## Neural Machine Translation (seq2seq) Tutorial

Authors: Thang Luong, Eugene Brevdo, Rui Zhao (Google Research Blogpost, Github) This version of the tutorial requires TensorFlow Nightly. For

using the stable TensorFlow versions, please consider other branches such as tf-1.4.

great success in a variety of tasks such as machine translation, speech recognition,

of seq2seq models and shows how to build a competitive seq2seq model

We achieve this goal by: We believe that it is important to provide benchmarks that people can easily replicate.

As a result, we have provided full experimental results and pretrained

on models on the following publicly available datasets: We first build up some basic knowledge about seq2seq models for NMT, explaining how

and tricks to build the best possible NMT models (both in speed and translation

RNNs, beam search, as well as scaling up to multiple GPUs using GNMT attention.

Back in the old days, traditional phrase-based translation systems performed their

task by breaking up source sentences into multiple chunks and then translated

An encoder converts a source sentence into a 'meaning' vector which is passed

Specifically, an NMT system first reads the source sentence using an encoder to

differ in terms of: (a) directionality – unidirectional or bidirectional;

In this tutorial, we consider as examples a deep multi-layer RNN which is unidirectional

simply consumes the input source words without making any prediction;

on the other hand, processes the target sentence while predicting the next

running: Let's first dive into the heart of building an NMT model with concrete code snippets

At the bottom layer, the encoder and decoder RNNs receive as input the following:

are in time-major format and contain word indices: Here for efficiency, we train with multiple sentences (batch_size) at once.

The embedding weights, one set per language, are usually

choose to initialize embedding weights with pretrained word representations such

Once retrieved, the word embeddings are then fed as input into the main network, which

consists of two multi-layer RNNs – an encoder for the source language and a

models do a better job when fitting large training datasets).

RNN uses zero vectors as its starting states and is built as follows: Note that sentences have different lengths to avoid wasting computation, we tell dynamic_rnn

describe how to build multi-layer LSTMs, add dropout, and use attention in a

Given the logits above, we are now ready to compute our training loss: Here, target_weights is a zero-one matrix of the same size as decoder_outputs.

SGD with a learning of 1.0, the latter approach effectively uses a much smaller

pass is just a matter of a few lines of code: One of the important steps in training RNNs is gradient clipping.

a decreasing learning rate schedule, which yields better performance.

nmt/scripts/download_iwslt15.sh /tmp/nmt_data Run the following command to start the training: The above command trains a 2-layer LSTM seq2seq model with 128-dim hidden units and

We can start Tensorboard to view the summary of the model during training: Training the reverse direction from English and Vietnamese can be done simply by changing:\

--src=en --tgt=vi While you're training your NMT models (and once you have trained models), you can

obtain translations given previously unseen source sentences.

Greedy decoding – example of how a trained NMT model produces a translation

for a source sentence 'Je suis étudiant' using greedy search.

the correct target words as an input, inference uses words predicted by the

know the target sequence lengths in advance, we use maximum_iterations to limit

Having trained a model, we can now create an inference file and translate some sentences:

Remember that in the vanilla seq2seq model, we pass the last source state from the

It consists of the following stages: Here, the function score is used to compared the target hidden state $$h_t$$ with

each of the source hidden states $$\overline{h}_s$$, and the result is normalized to produced

$$a_t$$ is used to derive the softmax logit and loss.

and on whether the previous state $$h_{t-1}$$ is used instead of $$h_t$$ in the scoring function as originally suggested in (Bahdanau et al., 2015).

of attention, i.e., direct connections between target and source, needs to be

use the current target hidden state as a 'query' to decide on which parts of

mechanism, we happen to use the set of source hidden states (or their transformed

versions, e.g., $$W_1h_t$$ in Bahdanau's scoring style) as 'keys'.

Thanks to the attention wrapper, extending our vanilla seq2seq code with attention

attention_model.py First, we need to define an attention mechanism, e.g., from (Luong et al., 2015):

to create a new directory for the attention model, so we don't reuse the previously

Run the following command to start the training: After training, we can use the same inference command with the new out_dir for inference:

separate graphs: Building separate graphs has several benefits: The primary source of complexity becomes how to share Variables across the three graphs

Before: Three models in a single graph and sharing a single Session After: Three models in three graphs, with three Sessions sharing the same Variables Notice how the latter approach is 'ready' to be converted to a distributed version.

feed data at each session.run call (and thereby performing our own batching,

training and eval pipelines: The first approach is easier for users who aren't familiar with TensorFlow or need

to do exotic input modification (i.e., their own minibatch queueing) that can

Some examples: All datasets can be treated similarly via input processing.

To convert each sentence into vectors of word strings, for example, we use the dataset

map transformation: We can then switch each sentence vector into a tuple containing both the vector and

object table, this map converts the first tuple elements from a vector of strings

containing the tuples of the zipped lines can be created via: Batching of variable-length sentences is straightforward.

Values emitted from this dataset will be nested tuples whose tensors have a leftmost

The structure will be: Finally, bucketing that batches similarly-sized source sentences together is also

Reading data from a Dataset requires three lines of code: create the iterator, get

Bidirectionality on the encoder side generally gives better performance (with some

of how to build an encoder with a single bidirectional layer: The variables encoder_outputs and encoder_state can be used in the same way as

While greedy decoding can give us quite reasonable translation quality, a beam search

explore the search space of all possible translations by keeping around a small

a minimal beam width of, say size 10, is generally sufficient.

You may notice the speed improvement of the attention based NMT model is very small

(i.e., 1 bidirectional layers for the encoder), embedding dim is

measure the translation quality in terms of BLEU scores (Papineni et al., 2002).

step-time means the time taken to run one mini-batch (of size 128).

(i.e., 2 bidirectional layers for the encoder), embedding dim is

These results show that our code builds strong baseline systems for NMT.\ (Note

that WMT systems generally utilize a huge amount monolingual data which we currently do not.) Training Speed: (2.1s step-time, 3.4K wps) on Nvidia K40m

see the speed-ups with GNMT attention, we benchmark on K40m only: These results show that without GNMT attention, the gains from using multiple gpus are minimal.\ With

The above results show our models are very competitive among models of similar architectures.\ [Note

that OpenNMT uses smaller models and the current best result (as of this writing) is 28.4 obtained by the Transformer network (Vaswani et al., 2017) which has a significantly different architecture.] We have provided a

There's a wide variety of tools for building seq2seq models, so we pick one per language:\ Stanford

https://github.com/OpenNMT/OpenNMT-py [PyTorch] We would like to thank Denny Britz, Anna Goldie, Derek Murray, and Cinjon Resnick for their work bringing new features to TensorFlow and the seq2seq library.

Build and Train Your First TensorFlow Graph

Video from my talk at NVIDIA's GTC DC 2016. Their hosting: and click "View Recording" “Hello, TensorFlow!” on O'Reilly:

Train an Image Classifier with TensorFlow for Poets - Machine Learning Recipes #6

Monet or Picasso? In this episode, we'll train our own image classifier, using TensorFlow for Poets. Along the way, I'll introduce Deep Learning, and add context and background on why the...

How To Train an Object Detection Classifier Using TensorFlow 1.5 (GPU) on Windows 10

[These instructions work for TensorFlow 1.6 too!] This tutorial shows you how to train your own object detector for multiple objects using Google's TensorFlow Object Detection API on Windows....

Build a TensorFlow Image Classifier in 5 Min

In this episode we're going to train our own image classifier to detect Darth Vader images. The code for this repository is here:

Training Custom Object Detector - TensorFlow Object Detection API Tutorial p.5

Welcome to part 5 of the TensorFlow Object Detection API tutorial series. In this part of the tutorial, we will train our object detection model to detect our custom object. To do this, we...

The Best Way to Prepare a Dataset Easily

In this video, I go over the 3 steps you need to prepare a dataset to be fed into a machine learning model. (selecting the data, processing it, and transforming it). The example I use is preparing...

Training/Testing on our Data - Deep Learning with Neural Networks and TensorFlow part 7

Welcome to part seven of the Deep Learning with Neural Networks and TensorFlow tutorials. We've been working on attempting to apply our recently-learned basic deep neural network on a dataset...

Intro - TensorFlow Object Detection API Tutorial p.1

Hello and welcome to a miniseries and introduction to the TensorFlow Object Detection API. This API can be used to detect, with bounding boxes, objects in images and/or video using either some...

TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Python | Edureka

TensorFlow Training - ) This Edureka TensorFlow Tutorial video (Blog: will help you in understanding various.

How to Deploy a Tensorflow Model to Production

Once we've trained a model, we need a way of deploying it to a server so we can use it as a web or mobile app! We're going to use the Tensorflow Serving library to help us run a model on a...