AI News, Google, Stanford use machine learning on 37.8m data points for drug discovery
- On Thursday, October 4, 2018
Researchers from Google and Stanford University have used machine learning methods – deep learning and multitask networks – to discover effective drug treatments for a variety of diseases.
“This process often takes years of research, requiring the creation and testing of millions of drug-like compounds in an effort to find just a few viable drug treatment candidates.” The researchers added that high-throughput screening (rapid automated screening of diverse compounds) is expensive and is usually done in sophisticated labs, which means it may not be the most practical solution.
Applying machine learning to virtual screening (a computational counterpart to high-throughput screening) is another way to go about drug discovery, but two challenges remain, the researchers said: low hit rates, which produce imbalanced datasets, and a “paucity” of experimental data, which leads to overfitting (the model fits noise in the training data rather than a pattern that generalises).
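To make the imbalance problem concrete, here is a minimal sketch in plain Python. The 1% hit rate and the inverse-frequency weighting scheme are illustrative assumptions, not figures or methods taken from the researchers' work. It shows why raw accuracy is misleading when hits are rare, and one common way to reweight the classes.

```python
def class_weights(labels):
    """Inverse-frequency weights so both classes contribute equally to a loss.

    `labels` is a list of 0 (inactive) and 1 (active/hit) values.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}


# Hypothetical screen: 10 hits among 1,000 compounds (assumed 1% hit rate).
labels = [1] * 10 + [0] * 990

# A useless model that always predicts "inactive" still looks accurate.
always_inactive_accuracy = labels.count(0) / len(labels)

# Reweighting makes each missed hit count far more than a missed inactive.
weights = class_weights(labels)

print(f"accuracy of always predicting inactive: {always_inactive_accuracy:.2%}")
print(f"class weights: {weights}")
```

With weights like these, a training loss penalises each misclassified hit roughly a hundred times more than a misclassified inactive compound, so a model can no longer score well by ignoring the rare class.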
The datasets were made up of 128 experiments in the PubChem BioAssay database (PCBA), 17 datasets to avoid common pitfalls in virtual screening (MUV), and 102 datasets to evaluate methods to predict interactions between proteins and small molecules (DUD-E).
The goal of Tox21 is to crowdsource data analysis conducted by independent researchers to discover how they can predict compounds' interference in biochemical pathways using only chemical structure data.
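The multitask idea can be sketched as a network with one shared hidden layer and a separate output head per task, where each dataset would count as one task. This is only an illustrative forward pass in plain Python: the class name `MultitaskNet`, the layer sizes, the random initialisation, and the absence of any training loop are all assumptions, not the researchers' architecture.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultitaskNet:
    """Shared hidden layer feeding one binary output head per task."""

    def __init__(self, n_features, n_hidden, n_tasks, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_features, n_hidden, rng)
        self.heads = [init_layer(n_hidden, 1, rng) for _ in range(n_tasks)]

    def predict(self, x):
        # Every task reuses the same hidden representation of the compound.
        hidden = relu(dense(x, *self.shared))
        # Each head turns that representation into its own activity probability.
        return [sigmoid(dense(hidden, *head)[0]) for head in self.heads]


# Hypothetical compound described by 8 binary structural features, 3 tasks.
net = MultitaskNet(n_features=8, n_hidden=4, n_tasks=3)
probabilities = net.predict([1, 0, 1, 1, 0, 0, 1, 0])
print(probabilities)
```

Because the hidden layer is shared, data from one task shapes the representation used by all the others, which is the usual motivation for multitask learning when individual datasets are small.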
“...our work provides a strong argument that increased data sharing could result in benefits for all.” The researchers also wrote that it’s “disappointing… that all published applications of deep learning to virtual screening (that we are aware of) use distinct datasets that are not directly comparable”, meaning standards for datasets and performance metrics need to be established.
- On Thursday, February 21, 2019
Predicting Stock Prices - Learn Python for Data Science #4
In this video, we build an Apple Stock Prediction script in 40 lines of Python using the scikit-learn library and plot the graph using the matplotlib library.
Types of Data: Nominal, Ordinal, Interval/Ratio - Statistics Help
The kind of graph and analysis we can do with specific data is related to the type of data it is. In this video we explain the different levels of data, with examples.
Factor Analysis - Case I (outliers, missing values, assumptions testing and analysis)
Basic intro to factor analysis. Important concepts discussed include sample size, missing values, dealing with multivariate outliers, univariate and multivariate ...
Modeling Default Risk for Business Loans | Data Dialogs 2016
Square Capital is Square's lending arm, providing capital to merchants in a fast, fair, and intelligent manner. The Capital Data Science team focuses on ...
Overview & application to field data
Mike Warner, Imperial College - London Directors: Liacir Lucena, Universidade Federal do Rio Grande do Norte (Brazil) Mike Warner, Imperial College London ...
Part 1 - Using Excel for Open-ended Question Data Analysis
Completing data analysis on open-ended questions using Excel. For analyzing multiple responses to an open-ended question see Part 2: ...
28: Data Scientist – Artificial Intelligence and Deep Learning – Kyle Ambert
Kyle Ambert earned a Bachelor of Arts in Psychology and a PhD in Bioinformatics and is currently a senior ...
Intro and Getting Stock Price Data - Python Programming for Finance p.1
Welcome to a Python for Finance tutorial series. In this series, we're going to run through the basics of importing financial (stock) data into Python using the ...
Scalable Infrastructure Integrating NGS & Other Data to Power Discovery & Analysis
Moderator: - Brady Davis, DNAnexus Panelists: - Sorena Nadaf, City of Hope - David Fenstermacher, MedImmune - Greg Tranah, Sutter Health - Laura ...
Outliers Detection in Time Series w Cassandra & Spark (Jean Armel Luce, Orange) | C* Summit 2016
An outlier in time series data is often a signal that must be addressed. The domains where outlier detection can yield noteworthy information are various: ...