AI News, Paul O'Grady - An introduction to PyTorch Autograd

Paul O'Grady - An introduction to PyTorch Autograd

"An introduction to PyTorch & Autograd" [EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 2] [Rimini, Italy]

PyTorch is an optimized tensor library for Deep Learning, and is a recent newcomer to the growing list of GPU programming frameworks available in Python.

In this talk I will present a gentle introduction to the PyTorch library and an overview of its main features using some simple examples, paying particular attention to the mechanics of the Autograd package.

Keywords: GPU Processing, Algorithmic Differentiation, Deep Learning, Linear Algebra.

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/... Please see our speaker release agreement for details: https://ep2017.europython.eu/en/speak...


Exploring the Deep Learning Framework PyTorch

Anyone who is interested in deep learning has likely gotten their hands dirty at some point playing around with TensorFlow, Google's open source deep learning framework.

TensorFlow has a lot of benefits: wide-scale adoption, deployment on mobile, and support for distributed computing. But it also has a somewhat challenging learning curve and is difficult to debug.

PyTorch supports tensor computation and dynamic computation graphs, which let you change how the network behaves on the fly, unlike the static graphs used in frameworks such as TensorFlow.

PyTorch is an open source Python deep learning framework, developed primarily by Facebook's artificial intelligence research group and publicly introduced in January 2017.

While PyTorch is still quite new, users are rapidly adopting this modular deep learning framework, in large part because of that support for dynamic computation graphs.

PyTorch is gaining traction in the community as the go-to deep learning framework for quickly iterating on models under tight deadlines, thanks to its intuitive imperative programming model.

That means graphs are created on the fly (which you might want if the input data has non-uniform length or dimensions), and those dynamic graphs make debugging as easy as debugging ordinary Python.
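As a minimal sketch of what "created on the fly" means (the function and sizes here are illustrative, not from the talk), note that the loop below is plain Python, so the graph autograd records can differ on every call:

import torch

def forward(x):
    h = x
    # Data-dependent control flow: the number of recorded operations
    # depends on the values in x, which a static graph cannot express directly.
    while h.norm() < 10:
        h = h * 2
    return h.sum()

x = torch.randn(3, requires_grad=True)
loss = forward(x)
loss.backward()   # walks whatever graph this particular call built
print(x.grad)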

Tensors are the main building blocks of deep learning frameworks (besides variables, computational graphs, and such); mathematically they describe multilinear relationships between objects, but in practice they are containers of numerical values.

These data containers come in various sizes: 1-d tensors (vectors), 2-d tensors (matrices), 3-d tensors (cubes), and so on.
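A quick sketch of those ranks in PyTorch (values are illustrative):

import torch

vector = torch.tensor([1.0, 2.0, 3.0])          # 1-d tensor (vector)
matrix = torch.zeros(2, 3)                      # 2-d tensor (matrix)
cube   = torch.randn(2, 3, 4)                   # 3-d tensor (cube)
print(vector.dim(), matrix.dim(), cube.dim())   # prints: 1 2 3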

Neural nets rely on computational graphs: when the network is being trained, the gradients of the loss function are computed with respect to the weights and biases, and the weights are then updated using gradient descent.

Because PyTorch is a define-by-run framework (the graph is defined during the forward pass, versus a define-then-run framework like TensorFlow), your backprop is defined by how your code is run, and every single iteration can be different.

So, to reiterate how PyTorch does reverse automatic differentiation: the graph is created on the fly as the code runs, and during backprop that graph is dynamically walked backward, re-calculating the gradients when you need them.
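A minimal sketch of that forward-record, backward-walk cycle (the expressions are illustrative):

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# Forward pass: autograd records each operation as it executes.
loss = (w * x).sum() ** 2

# Backward pass: walk the recorded graph in reverse to get d(loss)/dw.
loss.backward()
print(w.grad)   # equals 2 * (w * x).sum() * x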

Because operations in PyTorch act on real data immediately, versus in TensorFlow where they build references or containers into which data will be inserted later, it's possible to debug your code within the graph itself and treat it like regular Python code.

Meaning, thanks to the imperative programming architecture, you can set breakpoints, print out values during computation, and read a stack trace that tells you exactly what part of your code it's referring to.
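For example (a sketch; the function is made up), ordinary Python debugging tools work in the middle of a computation:

import torch

def forward(x):
    h = x.relu()
    # You can inspect intermediate values directly, or uncomment the next
    # line to drop into the interactive debugger at this exact point.
    # import pdb; pdb.set_trace()
    print("intermediate value:", h)
    return h.sum()

forward(torch.randn(4))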

If you were to create a sentiment analysis model using recurrent neural nets, where you have varying input sizes and need native control flow, it's easy to accomplish with dynamic computational graphs.
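A sketch of what that looks like (the sizes and lengths are illustrative): each sentence can have a different length, with no padding, because the graph adapts per input:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
sentences = [torch.randn(1, n, 8) for n in (3, 7, 5)]   # lengths 3, 7, 5

for s in sentences:
    output, hidden = rnn(s)   # the recorded graph adapts to each length
    print(output.shape)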

With TensorFlow, if you were creating an RNN, you would have to set a maximum sentence length and pad shorter inputs with zeros to allow for varying input lengths, unless you use TensorFlow Fold, which allows dynamic batching.

This is because the static graph only allows one input size and shape: remember, the graph is defined before it's run, so you have to be explicit about your inputs before you create a TF session to run your graph.

Native Python control flow wouldn't survive exporting the graph, so you instead need TensorFlow's own control-flow methods, such as tf.while_loop, to keep that logic (and to build efficient ETL operations) inside TensorFlow.

Note that when using TensorFlow you can recover some of the flexibility that dynamic graphs give you in PyTorch by combining TF's control-flow operations with TensorFlow Fold, but it is not as smooth a user experience as in PyTorch and takes time to ramp up.

Remember, the static graph architecture can be thought of as: define the graph, run the graph once to initialize the weights, then run your data through the graph multiple times within a TF session for training.

All of this matters because training a neural network is expensive: you don't want to spend hours or days (and real money) training only to hit an error message at the end.

Also note that GPU operations are asynchronous: calls return before the kernels finish, so when timing or moving data around you'll want to use torch.cuda.synchronize() to wait for all kernels in all streams to complete; otherwise a transfer can look very slow simply because it is implicitly waiting for the model to finish.
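A minimal timing sketch illustrating this (assumes a CUDA-capable machine; the sizes are illustrative):

import time
import torch

if torch.cuda.is_available():
    x = torch.randn(4096, 4096, device="cuda")
    torch.cuda.synchronize()              # drain any pending work first
    start = time.time()
    y = x @ x                             # launches asynchronously
    torch.cuda.synchronize()              # wait for the matmul to finish
    print(f"matmul took {time.time() - start:.4f}s")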

You also have nice features like data pre-loaders in both PyTorch and TensorFlow that load batches with multiple workers in parallel, so feeding data doesn't leave your expensive GPUs idle.
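In PyTorch this is the DataLoader; here is a sketch with dummy data (the dataset contents and worker count are illustrative):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset: 1000 samples of 10 features with binary targets.
dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

# Four worker processes load and batch data in parallel with training.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

for inputs, targets in loader:
    pass   # the forward/backward pass would go here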

Note that in PyTorch you'll need to explicitly save the epoch and the optimizer state along with the weights to get back to training, but this doesn't get you the whole way in terms of saving your model for deployment.
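A sketch of such a checkpoint (the toy model, dictionary keys, and file name are all illustrative, not a fixed convention):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epoch = 5   # wherever your training loop currently is

# Save everything needed to resume training.
torch.save({
    "epoch": epoch,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}, "checkpoint.pt")

# Later: rebuild the objects, then restore their state.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state"])
optimizer.load_state_dict(checkpoint["optimizer_state"])
start_epoch = checkpoint["epoch"] + 1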

With TensorFlow you can save the entire computational graph as a protocol buffer that captures both operations and weights: everything you need for inference, without deploying the actual source code that exported the protobuf.

ONNX is an open source format (with accompanying tooling) that focuses on taking research code into production and on making it easier to work across frameworks, such as developing your model in PyTorch and deploying it in Caffe2.
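A sketch of the PyTorch side of that workflow (the model and file name are illustrative):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Export traces the graph, so a dummy input of the right shape is required.
dummy_input = torch.randn(1, 10)
torch.onnx.export(model, dummy_input, "model.onnx")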

Because RNNs have variable-length inputs and PyTorch uses dynamic graphs, RNNs tend to run faster in PyTorch than in TensorFlow (various blog posts and benchmarks are out there), and you don't have to hack together a solution with TensorFlow Fold to use dynamic graph structures.

Also, again due to the choice of dynamic graphs over static graphs, PyTorch wins on easy debugging and quick model iteration, thanks to its imperative programming structure.

TensorFlow was built for distributed computing (and static graphs allow for better optimization), while in PyTorch the graph has to be rebuilt and differentiated on every iteration, which can slow it down in certain cases.

In the near future, with the release of PyTorch 1.0, they are shipping torch.jit, a just-in-time compiler that converts your PyTorch models into production-ready code and can also export your model to a C++ runtime.

Often you are choosing PyTorch for flexibility and sometimes losing out on performance, though that depends on your hardware and the model you are running. Now that different memory layouts are coming to PyTorch, the future is looking good for PyTorch performance in production.

Investigating Tensors with PyTorch

In deep learning, it is common to see a lot of discussion around tensors as the cornerstone data structure.

PyTorch is a Python package built by Facebook that provides two high-level features: 1) tensor computation (like NumPy) with strong GPU acceleration, and 2) deep neural networks built on a tape-based automatic differentiation system.

(Goodfellow et al.) - "In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor."

Tensor notation is much like matrix notation with a capital letter representing a tensor and lowercase letters with subscript integers representing scalar values within the tensor.
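In code, that notation maps directly onto indexing (a small sketch; the values are illustrative):

import torch

# T is the tensor; t_{i,j,k} denotes the scalar at indices i, j, k.
T = torch.arange(24).reshape(2, 3, 4)
t = T[0, 1, 2]   # the scalar value at i=0, j=1, k=2
print(t)         # tensor(6)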

It is a term, and an associated set of techniques, known in machine learning because the training and operation of deep learning models can be described in terms of tensors.

PyTorch is a Python-based scientific computing package targeted at two audiences: a replacement for NumPy that can use the power of GPUs, and a deep learning research platform that provides maximum flexibility and speed.

Let's quickly summarize the unique features of PyTorch. Its reverse-mode automatic differentiation is not unique to PyTorch, but PyTorch's is one of the fastest implementations of it to date, so you get the best of speed and flexibility for your research. Hence, PyTorch is quite fast, whether you run small or large neural networks. This blog makes the comparison between PyTorch and TensorFlow very well.

Installation depends on your platform, Python version, and CUDA support: there are separate install commands for Linux and Mac environments with Python 3.5, each with and without CUDA 9.0 support (note that macOS binaries don't support CUDA; install from source if CUDA is needed). Hopefully you are done with the installation of PyTorch by now!

Now let's construct some tensors: a 5x3 matrix, uninitialized; a matrix filled with zeros and of data type long; and a tensor created directly from data. If you understood tensors correctly, tell me in the comments section what kind of tensor x is! You can also create a tensor based on an existing tensor, in which case properties such as dtype (data type) are reused unless new values are provided by the user. Finally, get its size; note that torch.Size is in fact a tuple, so it supports all tuple operations.
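A sketch of those constructions using the standard PyTorch tensor-creation API:

import torch

x = torch.empty(5, 3)                     # 5x3 matrix, uninitialized
x = torch.zeros(5, 3, dtype=torch.long)   # filled with zeros, data type long
x = torch.tensor([5.5, 3.0])              # created directly from data

# Based on an existing tensor: dtype is reused unless overridden.
x = torch.randn_like(x, dtype=torch.float)

print(x.size())   # torch.Size([2]), a tuple, so tuple operations work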

The element-wise addition of two tensors with the same dimensions results in a new tensor with the same dimensions where each scalar value is the element-wise addition of the scalars in the parent tensors.

The element-wise subtraction of one tensor from another tensor with the same dimensions results in a new tensor with the same dimensions where each scalar value is the element-wise subtraction of the scalars in the parent tensors.
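A short sketch of both operations (the values are illustrative):

import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[10.0, 20.0], [30.0, 40.0]])

print(a + b)   # element-wise addition: same shape as the parents
print(a - b)   # element-wise subtraction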

Two tensors are "broadcastable" if the following rules hold: each tensor has at least one dimension, and when iterating over the dimension sizes, starting at the trailing dimension, the sizes must either be equal, one of them must be 1, or one of them must not exist. Now that you have a fair idea of broadcasting, let's see with PyTorch whether two tensors are "broadcastable".

If two tensors x, y are "broadcastable", the resulting tensor size is calculated as follows: if the numbers of dimensions of x and y are not equal, prepend 1s to the dimensions of the tensor with fewer dimensions to make them equal length; then, for each dimension, the resulting size is the maximum of the sizes of x and y along that dimension.
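A sketch checking both cases (the shapes are illustrative):

import torch

# Broadcastable: trailing sizes 3 vs 3 match, 2 vs 1 has a 1, and the
# missing leading dimension of y is treated as size 1.
x = torch.empty(5, 2, 3)
y = torch.empty(1, 3)
print((x + y).size())    # torch.Size([5, 2, 3]): max along each dimension

# Not broadcastable: trailing sizes 3 and 2 are unequal and neither is 1.
a = torch.empty(4, 3)
b = torch.empty(2)
try:
    a + b
except RuntimeError as err:
    print("not broadcastable:", err)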

The tensor product is the most common form of tensor multiplication that you may encounter, but many other types of tensor multiplications exist, such as the tensor dot product and the tensor contraction.
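A sketch of two of these products (torch.outer is called torch.ger in older PyTorch releases; the values are illustrative):

import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([3.0, 4.0, 5.0])

# Tensor (outer) product: the rank of the result is the sum of the input ranks.
outer = torch.outer(a, b)    # shape (2, 3)

# Tensor dot product: a contraction over the matching dimension.
c = torch.tensor([6.0, 7.0, 8.0])
dot = torch.dot(b, c)        # 3*6 + 4*7 + 5*8 = 86
print(outer.shape, dot)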

Then you define the sizes of all the layers and the batch size, and you create some dummy input data x and some dummy target data y.
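A sketch of that setup (all sizes are illustrative):

import torch

batch_size, d_in, hidden, d_out = 64, 1000, 100, 10

# Dummy input data x and dummy target data y.
x = torch.randn(batch_size, d_in)
y = torch.randn(batch_size, d_out)

model = torch.nn.Sequential(
    torch.nn.Linear(d_in, hidden),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden, d_out),
)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()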


How is PyTorch different from TensorFlow? What are the advantages of using one vs. the other? When should I use one or the other?

Let’s start from NumPy (you’ll see why a bit later).

You can build a machine learning algorithm even with NumPy alone, but creating a deep neural network that way quickly becomes much harder.

NumPy is therefore usually used for machine learning together with a dedicated machine learning package.

TensorFlow

TensorFlow, on the other hand, was written mainly in C++ and CUDA (NVIDIA's language for programming GPUs), and was not specifically created for Python.

If you know NumPy (my underlying assumption about you, dear reader), it is easier to switch to PyTorch than to TensorFlow, which is why PyTorch is gaining popularity so fast.

I don't know if PyTorch will catch up to TF's community (that depends on the users and on adoption).

Opinion

So, my verdict would be that TensorFlow has kind of stood the test of time (if you can use that expression for a framework that is not that old itself) and is still more widely adopted.

Practical reasons to learn both

If you want to work in industry, it is probable that a company will have some customized framework of its own.
