Big Data Analytics at the MPCDF: GPU Crystallography with Python

In close collaboration with scientists from the Max Planck Society (MPG), the Max Planck Computing and Data Facility (MPCDF) is engaged in the development and optimization of algorithms and applications for high-performance computing, as well as in the design and implementation of solutions for data-intensive projects. Python is now used at the MPCDF in the emerging area of “atom probe crystallography”: a Fourier spectral analysis in 3D reciprocal space can be simulated in order to reveal both the composition and the crystallographic structure, at the atomic scale, of atom probe tomography (APT) experimental data sets containing billions of atoms.
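A minimal sketch of what such an analysis can look like in Python is shown below. It assumes atom positions are already available as an array; CuPy, a GPU-backed, NumPy-compatible array library, stands in here for whatever stack is actually used in the talk.

```python
# Minimal sketch (not the actual MPCDF code): Fourier spectral analysis
# of atom positions in 3D reciprocal space. Replacing `cp` with NumPy
# runs the same computation on the CPU.
import cupy as cp

def spectral_intensity(positions, k_vectors):
    """Return |sum_j exp(i k . r_j)|**2 for each reciprocal vector k."""
    phases = k_vectors @ positions.T              # shape (n_k, n_atoms)
    amplitudes = cp.exp(1j * phases).sum(axis=1)
    return cp.abs(amplitudes) ** 2

# Hypothetical toy data: a small cubic lattice instead of a real APT set.
grid = cp.arange(8, dtype=cp.float64)
gx, gy, gz = cp.meshgrid(grid, grid, grid)
positions = cp.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)
k_vectors = 2 * cp.pi * cp.random.uniform(-1, 1, size=(64, 3))
intensity = spectral_intensity(positions, k_vectors)
```

For real billion-atom data sets one would bin the atoms onto a grid and use a 3D FFT rather than this brute-force sum, but the array-oriented style carries over unchanged.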

Why are GPUs necessary for training Deep Learning models?

Most of you have probably heard about the exciting things happening in deep learning.

I have seen people training a simple deep learning model for days on their laptops (typically without GPUs), which creates the impression that deep learning requires a big system to run.

When I was first introduced to deep learning, I thought that it necessarily needed a large data center to run on, and that “deep learning experts” sat in their control rooms operating those systems.

This is because in every book that I read and every talk that I heard, the author or speaker always said that deep learning requires a lot of computational power to run.

It turns out that I don’t have to take over Google to be a deep learning expert 😀 The idea that you do is a common misconception that every beginner faces when diving into deep learning.

Although it is true that deep learning needs considerable hardware to run efficiently, you don’t need infinite resources to do your task.

We define an artificial neural network in our favorite programming language, which is then converted into a set of commands that run on the computer.

If you had to guess which component of a neural network demands the most intense hardware resources, what would your answer be?

When you train a deep learning model, two main operations are performed: the forward pass and the backward pass. In the forward pass, the input is passed through the neural network and, after processing, an output is generated.

In the backward pass, we update the weights of the neural network on the basis of the error obtained in the forward pass.

So in a neural network, we can consider the first array as the input to the network and the second array as its weights; multiplying these two matrices together is the core operation in both passes.
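To make this concrete, here is a minimal sketch of one training step for a single-layer network in NumPy. It is illustrative only; the layer sizes, loss, and learning rate are arbitrary choices, not taken from any particular framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and weights: a batch of 32 inputs with 100 features each,
# mapped to 10 outputs by a single weight matrix.
x = rng.standard_normal((32, 100))      # first array: the input
w = rng.standard_normal((100, 10))      # second array: the weights
target = rng.standard_normal((32, 10))

# Forward pass: multiplying input by weights produces the output.
output = x @ w

# Backward pass: gradient of the mean-squared-error loss with
# respect to the weights, then a small step against the gradient.
error = output - target
grad_w = x.T @ error / len(x)
w -= 0.01 * grad_w
```

Every layer of a real network repeats this pattern, so training time is dominated by large matrix multiplications.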

VGG16 (a convolutional neural network with 16 weight layers that is frequently used in deep learning applications) has roughly 138 million parameters.
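As a quick sanity check (assuming the TensorFlow package is installed; this snippet is not part of the original article), you can count the parameters yourself:

```python
# Sketch: verify VGG16's parameter count with Keras.
from tensorflow.keras.applications import VGG16

model = VGG16(weights=None)         # build the architecture only
print(f"{model.count_params():,}")  # prints 138,357,544
```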

Multiplying matrices of that size is a massively parallel workload, and the thousands of simple cores on a GPU handle it far better than the handful of complex cores on a CPU. This, in a nutshell, is why we use a GPU (graphics processing unit) instead of a CPU (central processing unit) for training a neural network.
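As an illustration (not from the original article), you can time the same large matrix multiplication on the CPU with NumPy and on the GPU with CuPy, assuming CuPy and a CUDA-capable GPU are available. Exact timings are machine-dependent.

```python
import time
import numpy as np
import cupy as cp  # assumed installed alongside a CUDA-capable GPU

a = np.random.rand(4096, 4096).astype(np.float32)
b = np.random.rand(4096, 4096).astype(np.float32)

t0 = time.perf_counter()
a @ b
print(f"CPU: {time.perf_counter() - t0:.3f} s")

a_gpu, b_gpu = cp.asarray(a), cp.asarray(b)
t0 = time.perf_counter()
a_gpu @ b_gpu
cp.cuda.Stream.null.synchronize()   # wait for the GPU kernel to finish
print(f"GPU: {time.perf_counter() - t0:.3f} s")
```

Note that the first GPU call includes one-time initialization overhead, so a fair benchmark would run a warm-up multiplication before timing.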

Before the boom of deep learning, Google had an extremely powerful system to do its processing, which it had specially built for training huge nets.

GPUs were created for better and faster graphics processing, but they were later found to fit scientific computing well; using them this way became known as general-purpose GPU (GPGPU) computing.

In 2006, NVIDIA released CUDA, a platform and programming model that lets you write programs for graphics processors in a high-level language.
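As a taste of what that looks like from Python, here is a hedged sketch of a trivial CUDA kernel written with Numba, one of several libraries that expose CUDA to Python (chosen here purely for illustration):

```python
# Sketch: an element-wise addition kernel compiled for the GPU with
# Numba's CUDA support (assumes numba and a CUDA toolkit are installed).
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)            # global thread index
    if i < x.shape[0]:          # guard against out-of-range threads
        out[i] = x[i] + y[i]

n = 1_000_000
x = np.ones(n, dtype=np.float32)
y = 2 * np.ones(n, dtype=np.float32)
out = np.empty_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](x, y, out)  # arrays copied to GPU
```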

Scenario 1: If your tasks are going to be small or can fit into complex sequential processing, you don’t need a big system to work on.

Scenario 3: If you regularly work on complex problems, or are a company that leverages deep learning, you would probably be better off building a deep learning system or using a cloud service like AWS or FloydHub.

As mentioned above, there is a lot of research and active work going into finding ways to accelerate computing.

In this article, we covered the motivation for using GPUs in deep learning applications and saw how to choose one for your task.

If you have any specific questions regarding the topic, feel free to comment below or ask them on the discussion portal.

Giuseppe Di Bernardo - "Big Data Analytics at the MPCDF: GPU Crystallography with Python" [EuroPython 2017 - Talk - 2017-07-12 - Anfiteatro 1] [Rimini, Italy]
