Google, Stanford use machine learning on 37.8m data points for drug discovery

Researchers from Google and Stanford University have used machine learning methods – deep learning and multitask networks – to discover effective drug treatments for a variety of diseases.
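As a rough illustration of what a multitask network is, the sketch below passes one compound's features through a single shared hidden layer and then through a separate output "head" for each task, so that every task learns from the same internal representation. It is a minimal sketch of the general idea, not the researchers' architecture: the layer sizes, the random weights, the example fingerprint and all function names are hypothetical.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return n_out rows of n_in small random weights (no training is done here)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, x):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def multitask_forward(shared, heads, x):
    """Shared hidden layer first, then one sigmoid output per task."""
    hidden = relu(dense(shared, x))
    return [sigmoid(score) for score in dense(heads, hidden)]


rng = random.Random(0)
n_features, n_hidden, n_tasks = 8, 4, 3  # hypothetical sizes

shared = make_layer(n_features, n_hidden, rng)  # weights used by every task
heads = make_layer(n_hidden, n_tasks, rng)      # one row of weights per task

fingerprint = [1, 0, 0, 1, 1, 0, 1, 0]  # made-up binary features for one compound
scores = multitask_forward(shared, heads, fingerprint)
print(scores)  # one activity score per task, each between 0 and 1
```

Because the hidden layer is shared, experimental data from one task can shape the features that the other tasks rely on, which is the intuition behind training on many datasets at once.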

“This process often takes years of research, requiring the creation and testing of millions of drug-like compounds in an effort to find just a few viable drug treatment candidates.” The researchers added that high-throughput screening (the rapid, automated screening of diverse compounds) is expensive and is usually done in sophisticated labs, which means it may not be the most practical solution.

Applying machine learning to virtual screening (a computational counterpart to high-throughput screening) is another way to go about drug discovery, but two challenges remain, the researchers said: low hit rates produce imbalanced datasets, and a “paucity” of experimental data leads to overfitting (a model fitting noise in the training data rather than the underlying pattern).
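The effect of a low hit rate can be seen with some simple arithmetic. The sketch below uses made-up numbers (100,000 compounds, of which 100 are active), not figures from the study, and the function names are hypothetical. It shows that a model which labels every compound inactive scores very high accuracy while finding no active compounds at all.

```python
# Hypothetical numbers chosen for illustration; not taken from the study.

def accuracy_of_always_inactive(n_compounds, n_actives):
    """Accuracy of a classifier that labels every compound inactive."""
    return (n_compounds - n_actives) / n_compounds


def recall_of_always_inactive(n_actives):
    """Fraction of true actives found by that same classifier."""
    return 0.0 if n_actives else float("nan")


n_compounds = 100_000
n_actives = 100  # a 0.1% hit rate

accuracy = accuracy_of_always_inactive(n_compounds, n_actives)
recall = recall_of_always_inactive(n_actives)
print(f"accuracy={accuracy:.3f}, recall={recall:.1f}")  # accuracy=0.999, recall=0.0
```

This is why plain accuracy says little on imbalanced screening data, and why a model can look good on paper while being useless for finding hits.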

The datasets were made up of 128 experiments from the PubChem BioAssay database (PCBA), 17 datasets designed to avoid common pitfalls in virtual screening (MUV), and 102 datasets for evaluating methods that predict interactions between proteins and small molecules (DUD-E).

The study also drew on Tox21, whose goal is to crowdsource data analysis from independent researchers to find out how well compounds' interference in biochemical pathways can be predicted using only chemical structure data.

“… our work provides a strong argument that increased data sharing could result in benefits for all.” The researchers also wrote that it’s “disappointing… that all published applications of deep learning to virtual screening (that we are aware of) use distinct datasets that are not directly comparable”, meaning that standards for datasets and performance metrics still need to be established.

