AI News, Machine Learning Blog Software Development News
- On Monday, June 4, 2018
- By Read More
Machine Learning Blog Software Development News
Deep Learning (the favourite buzzword of late 2010s along with blockchain/bitcoin and Data Science/Machine Learning) has enabled us to do some really cool stuff the last few years.
Other than the advances in algorithms (which admittedly are based on ideas already known since 1990s aka “Data Mining era”), the main reasons of its success can be attributed to the availability of large free datasets, the introduction of open-source libraries and the use of GPUs.
It comes with lots of interesting features such as auto-differentiation (which saves you from estimating/coding the gradients of the cost functions) and GPU support (which allows you to get easily a 200x speed improvement using decent hardware).
Yes it is feasible and from time to time you have to do it (especially if you write custom layers/loss-functions) but do you really want to write code that describes the complex networks as a series of vector operations (yes, I know there are higher-level methods in TF but they are not as cool as Keras)?
Keras allows you to describe your networks using high level concepts and write code that is backend agnostic, meaning that you can run the networks across different deep learning libraries.
It used to be harder to achieve but thankfully Keras has recently included a utility method called mutli_gpu_model which makes the parallel training/predictions easier (currently only available with TF backend).
This method can be used for achieving parallel training and predictions, nevertheless keep in mind that for training it does not scale linearly with the amount of GPUs due to the required synchronization.
When you do multi-GPU training pay attention to the batch size as it has multiple effects on speed/memory, convergence of your model and if you are not careful you might corrupt your model weights!
Especially during training, the inputs of each layer are kept in memory as they are required on the back-propagation step, so increasing your batch size too much can lead to out-of-memory errors.
As Keskar et al put it, “It has been observed in practice that when using a larger batch (than 512) there is a degradation in the quality of the model, as measured by its ability to generalize.”.
Two simple ways to achieve this is either by rejecting batches that don’t match the predefined size or repeat the records within the batch until you reach the predefined size.
A good way to detect whether you are facing GPU data starvation is to monitor the GPU utilization, nevertheless be warned that this is not the only reason for observing that (the synchronization that happens during training across the multiple GPUs is also to blame for low utilization).
There are 2 ways around this: either call the save() on the reference of the original model (the weights will be updated automatically) or you need to serialize the model by chopping-down the parallelized version and cleaning up all the unnecessary connections.
Last but not least, keep in mind that calling the list_devices() method is somehow expensive, so if you are just interested on the number of available GPUs call the method once and store their number on a local variable.
- On Monday, September 16, 2019
The Best Way to Prepare a Dataset Easily
In this video, I go over the 3 steps you need to prepare a dataset to be fed into a machine learning model. (selecting the data, processing it, and transforming it).
Distributed TensorFlow (TensorFlow Dev Summit 2017)
TensorFlow gives you the flexibility to scale up to hundreds of GPUs, train models with a huge number of parameters, and customize every last detail of the ...
Training/Testing on our Data - Deep Learning with Neural Networks and TensorFlow part 7
Welcome to part seven of the Deep Learning with Neural Networks and TensorFlow tutorials. We've been working on attempting to apply our recently-learned ...
Distributed TensorFlow (TensorFlow Dev Summit 2018)
Igor Saprykin offers a way to train models on one machine and multiple GPUs and introduces an API that is foundational for supporting other configurations in ...
The Evolution of Gradient Descent
Which optimizer should we use to train our neural network? Tensorflow gives us lots of options, and there are way too many acronyms. We'll go over how the ...
Training Custom Object Detector - TensorFlow Object Detection API Tutorial p.5
Welcome to part 5 of the TensorFlow Object Detection API tutorial series. In this part of the tutorial, we will train our object detection model to detect our custom ...
Quick Demo: Image classification web app using Keras and Flask
30 seconds quick demo of an image classification web app using Keras and Flask. Here's the source code:
Processing our own Data - Deep Learning with Neural Networks and TensorFlow part 5
Welcome to part five of the Deep Learning with Neural Networks and TensorFlow tutorials. Now that we've covered a simple example of an artificial neural ...
Lecture 11 | Detection and Segmentation
In Lecture 11 we move beyond image classification, and show how convolutional networks can be applied to other core computer vision tasks. We show how ...
Lecture 8 | Deep Learning Software
In Lecture 8 we discuss the use of different software packages for deep learning, focusing on TensorFlow and PyTorch. We also discuss some differences ...