AI News, Machine learning mega-benchmark: GPU providers (part 2)
- On Sunday, September 30, 2018
- By Read More
Machine learning mega-benchmark: GPU providers (part 2)
We had recently published a large-scale machine learning benchmark using word2vec, comparing several popular hardware providers and ML frameworks in pragmatic aspects such as their cost, ease of use, stability, scalability and performance.
I discuss and compare all platforms in much detail below, but each platform came with its own relative pros and cons and the GPUaaS market is a very exciting and lively space.
This HW provider list should be a good assortment of platforms with virtual instances (AWS, GCE), bare metal infrastructure (Softlayer), dedicated servers (Hetzner) and comparatively newer players specialized in providing GPUaaS (LeaderGPU, Paperspace).
The model is trained for 4 epochs on 90% of the (shuffled) data while the remaining held-out 10% is used for model evaluation.
We publish the setup and code in full, not only so anyone can reproduce these results, but also so that you can plug in your own HW platform or another algorithm of choice, to do your own benchmark.
LeaderGPU, Amazon and Paperspace offer freely available Deep Learning Machine Images which come pre-installed with Nvidia drivers along with the Python development environment, and the Nvidia-Docker – essentially the whole nine yards required to start the experiments right off the bat.
While this makes life significantly easier, especially for beginners who just wish to experiment with Machine Learning models, I set up everything from scratch (except for LeaderGPU) the old school way, in order to evaluate the ease of customizing an instance to individual needs.
In this process, I faced a few issues common with all the platforms, such as NVIDIA driver incompatibility with the installed gcc version, or the GPU usage reaching 100% after installing the driver without any evidence of a running process.
The benchmark was carried out using the Keras framework whose multi-GPU implementation was surprisingly inefficient, at times performing worse than a single GPU run on the same machine.
There also appears to be another general trend – cheaper GPUs give better performance/price ratio than the pricier GPUs, indicating that the decrease in training time does not offset the increase in total cost.
Since its one of the most accepted and actively developed deep learning frameworks, users would expect a speedup on switching to multi-GPU model without any additional handling.
The speedup is rather unpredictable – there is clearly a speedup on the “dual GTX 1080” server while the multi-GPU training took even longer to finish compared to the single-GPU training on the “dual P100” server.
As evident from Table 1, there were no notable differences which reaffirms that the underlying hardware/platform has no impact on the quality of training, and that the benchmark was correctly set up.
GPU prices change frequently, but at the moment, AWS provides K80 GPUs (p2 instances) starting at $0.9/hr which are billed in one second increments whereas the more powerful and performant Tesla V100 GPUs (p3 instances) commence at $3.06/hr.
Some of the platforms provide significant discounts (50%-90%) on their spare compute capacity (AWS spot instances and GCE’s preemptive instances) although they can terminate (and restart) unexpectedly.
That is fine for applications that can handle such terminations but many tasks, for instance, time-bound projects wouldn’t fare well in this case, especially if you consider the wasted labor hours.
Running tasks on preemptive/spot instances requires additional code to handle the termination and re-start of instances gracefully (checkpointing/storing data to a persistent disk etc.).
Also, the spot price fluctuations (in case of AWS) can cause the costs to depend heavily on the supply-demand of capacity at the time of the benchmark run.
servers (like the ones provided by LeaderGPU) and bare metal servers (such as Hetzner) are suitable for users considering heavy-duty long term employment of these resources (doh).
- On Monday, September 23, 2019
Overview of Cloud GPU providers for deep learning, and setting up on Paperspace
- Paperspace referral link for $10 starting credit. In this video, I compare Paperspace to both AWS and Azure for GPU computing in the ..
How to Train Your Models in the Cloud
Let's discuss whether you should train your models locally or in the cloud. I'll go through several dedicated GPU options, then compare three cloud options; AWS ...
LiquidSky vs SIXA vs Paperspace
A brief comparison video showing the pros and the cons of each cloud gaming service (Latency, Storage, CPU, GPU, RAM, Price). 3DMark Fire Strike ...
Google Colaboratory now lets you use GPUs for Deep Learning
Google colab: Snippet to check: Please .
Bitcoin Mining Amazon Cloud
Bitcoin Mining Amazon Cloud Infos on Genesis Mining - Use Code HWvl6U Bitcoin is actually a global currency which uses an ..
What is Microsoft azure and google cloud
Subscribe,like and share us !! what is Microsoft,azure,google,cloud,Microsoft azure,google cloudmicrosoftazure, ?? Learn Google Cloud Platform with CBT ...