AI News, What is it like to be a machine learning engineer in 2018?

What is it like to be a machine learning engineer in 2018?

There are so many tools, platforms, and resources available that MLEs can focus their time on solving problems critical to their field or company instead of worrying about building platforms and hand-rolling numerical algorithms.

Google Cloud offers easy ways to build and deploy TensorFlow models, including its new TPU support in beta; AWS has an ever-evolving suite of deep learning AMIs; and Nvidia has a great deep learning SDK.

Such collaboration empowers researchers and engineers to quickly prototype models and share results across platforms and languages.

ArXiv (and bioRxiv) continues to grow in adoption, Coursera courses span linear algebra, machine learning, and deep learning, and countless blog posts are published every day with amazing visualizations and interpretations of modern research.

This is certainly not true of everyone, but I feel the average MLE solution is trending towards “how do I apply CNNs or LSTMs to this problem?” Keeping on top of research is getting exponentially harder as the field grows.

There are now deployed apps for recognizing objects, synthesizing speech and translating signs in foreign languages - all of which perform at human level accuracy.

There are great tools, databases, queues, and ETL frameworks to help with this, but fundamentally data wrangling still involves manually writing per-problem schemas, partitions, and so on.

Discussing the future of ML is not inherently bad: it helps spark ethical discussions that are more relevant to today’s research, and it helps the ML community reach out to a broader audience.

Toyota Research Institute accelerates safe automated driving with deep learning at a global scale on AWS

Guardian mode requires the driver to keep hands on the wheel and eyes on the road at all times, while it constantly monitors the driving environment, inside and out, intervening only when it perceives a potential crash.

Developing and deploying autonomous vehicles requires the ability to collect, store, and manage massive amounts of data, high performance computing capacity, and advanced deep learning techniques, along with the capability to do real-time processing in the vehicle.

To gather data used in their deep learning models, TRI has a fleet of test cars equipped with various types of data acquisition sensors such as cameras, radar, and LIDAR (a technique used in control and navigation to generate object representations in 3D space).

These simulations generate photo-real data streams that test how their machine learning models react to demanding cases such as rainstorms, snowstorms, and sharp glare at different times of the day and night, with different road surfaces and surroundings.

As new test data becomes available, TRI rapidly explores research ideas and trains their models quickly so they can deploy updated versions on their test cars and rerun tests.

“This significantly accelerates our research and development velocity, as we can quickly incorporate new data and retrain models, explore ideas, increase model accuracy, and introduce new features faster,” says Adrien Gaidon, PhD, Machine Learning Lead, Toyota Research Institute.

Amazon EC2 P3 instances help accelerate model training times to only a few hours or minutes, enabling data scientists and machine learning engineers to iterate faster, train more models, and build a competitive edge into their applications.

Using deep learning on Amazon EC2 P3 instances, Amazon S3, Amazon SQS, and AWS networking services, TRI built a scalable solution to enable their development teams to make rapid progress and deliver on their grand vision of applying AI to help Toyota produce cars that are safer, and get closer to realizing a future without traffic injuries or fatalities.

List of datasets for machine learning research

Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less-intuitively, the availability of high-quality training datasets.[1]

High-quality labeled training datasets for supervised and semi-supervised machine learning algorithms are usually difficult and expensive to produce because of the large amount of time needed to label the data.

As datasets come in myriad formats and can sometimes be difficult to use, there has been considerable work put into curating and standardizing the format of datasets to make them easier to use for machine learning research.

JMIR Publications

To improve health outcomes and trim health care costs, we often need to perform predictions/classifications using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions.

Machine learning studies computer algorithms, such as support vector machine, random forest, neural network, and decision tree, that learn from data [1].

Trials showed machine learning was used to help do the following: (1) lower the 30-day mortality rate (odds ratio [OR]=0.53) in emergency department (ED) patients with community-acquired pneumonia [2];

(2) increase on-target hemoglobin values by 8.5%-17% and reduce cardiovascular events by 15%, hospitalization days by 15%, blood transfusion events by 40%-60%, expensive darbepoetin consumption by 25%, and hemoglobin fluctuation by 13% in end-stage renal disease patients on dialysis [3-6];

(3) reduce ventilator use by 5.2 days and health care costs by US $1500 per patient at a hospital respiratory care center [7];

Compared to statistical methods like logistic regression, machine learning imposes less strict assumptions on the distribution of the data, can increase prediction/classification accuracy (in certain cases doubling it) [10-12], and has won most data science competitions [13].

Facing a US shortage of data scientists estimated at 140,000 or more by 2018 [16] and hiring competition from companies with deep pockets, health care systems have a hard time recruiting data scientists [17,18].

As detailed below, developing a machine learning model often requires data scientists to spend extensive time on model selection, which becomes infeasible with limited budgets.

This is often done with the help of database programmers and/or master-level statisticians, who can also help with data preprocessing and are easier to find than data scientists with deep machine learning knowledge.

Each learning algorithm includes two categories of parameters: hyper-parameters that a machine learning tool user manually sets prior to model training, and normal parameters automatically tuned in training the model (see Table 1).
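
To make this distinction concrete, here is a minimal sketch using scikit-learn's random forest (one of the algorithms named above); the toolkit and the specific hyper-parameters shown are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of hyper-parameters vs normal parameters (assumes scikit-learn;
# the paper does not prescribe a particular toolkit).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hyper-parameters: set manually by the user *before* training.
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)

# Normal parameters (here, the split variables and thresholds inside each tree)
# are tuned automatically while training the model.
clf.fit(X, y)
print(clf.estimators_[0].tree_.node_count)  # part of the learned structure
```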

If model accuracy is unsatisfactory, the user substitutes the algorithm and/or hyper-parameter values and then retrains the model, while using some technique to avoid overfitting on the validation set [20-24].

If feature selection is considered, in each iteration the user also needs to choose a feature selection technique from many applicable ones and set its hyper-parameter values, making this process even more complex.

But when a large number of algorithms are examined, efforts like Auto-WEKA [25,29-31], hyperopt-sklearn [28], and MLbase [32,33] cannot effectively handle large datasets in a reasonable amount of time.

On a modern computer, it takes 2 days to train, just once, the champion ensemble model that won the Practice Fusion Diabetes Classification Challenge [34] on 9948 patients with 133 input or independent variables (aka, features).

Even when disregarding ensembles of more than five base models, aborting long-running tests, and greatly limiting the hyper-parameter value search space (eg, allowing no more than 256 decision trees in a random forest), all impacting search result quality, more than 30 minutes are needed to test an average combination on 12,000 rows (ie, data instances) with 784 attributes [35].

To ensure search result quality, automation efforts often test more than 1000 combinations on the whole dataset [35], leading to months of search time.

Irrespective of whether search is done manually or automatically, a slow speed in search frequently causes a search to be terminated early, producing suboptimal model accuracy [35].

Numerous clinical attributes are documented over time and need aggregation prior to machine learning (eg, the weights recorded at each patient visit are combined to check whether a patient’s weight kept rising in the previous year).
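
As a hedged illustration of such temporal aggregation, the sketch below derives the "weight kept rising" feature with pandas; the table layout and column names are assumptions, not the paper's actual schema.

```python
# Hypothetical temporal aggregation; "patient_id", "visit_date", and "weight_kg"
# are invented column names, not the study's data model.
import pandas as pd

visits = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit_date": pd.to_datetime(
        ["2015-01-10", "2015-06-02", "2015-12-20", "2015-03-15", "2015-11-30"]),
    "weight_kg": [80.0, 83.5, 86.0, 70.0, 69.0],
})

def weight_kept_rising(group):
    # True if weight was non-decreasing across the patient's visits in the year.
    w = group.sort_values("visit_date")["weight_kg"]
    return bool((w.diff().dropna() >= 0).all())

features = visits.groupby("patient_id").apply(weight_kept_rising).rename("weight_rising")
print(features)
```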

If model accuracy is unsatisfactory, the analyst substitutes the pairs for some attributes and reconstructs the model, while using some technique to avoid overfitting on the validation set [20-24].

A model that is built and accurate in one health care system often performs poorly and needs to be rebuilt for another system [37], with differing patients, practice patterns, and collected attributes impacting model selection [38,39].

To realize value from data, we need new approaches to enable health care researchers to directly use clinical big data and make machine learning feasible with limited budgets and data scientist resources.

To fill the gap, we will (1) finish developing the open source software, Automated Machine Learning (Auto-ML), to efficiently automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance, (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care management allocation and pilot one model with care managers, and (3) perform simulations to estimate the impact of adopting Auto-ML on US patient outcomes.

This expands the human resource pool for clinical machine learning and aligns with the industry trend of citizen data scientists, where an organization arms its talent with tools to do deep analytics [40].

To improve patient identification and outcomes for care management, Aim 2 involves applying Auto-ML to two new modeling problems by doing the following: (1) use a health care system’s incomplete medical (ie, clinical and/or administrative) data to find future high-cost, diabetic patients and (2) use vast attributes in modern electronic medical records to find future hospital users in asthmatic patients.

Widely used for chronic diseases like asthma and diabetes, care management applies early interventions to high-risk patients to avoid high costs and health status decline [41-43].

Every year, asthma causes 1.8 million ED visits, 439,000 hospitalizations, US $56 billion in health care costs [47], and 3630 deaths [44].

Predictive modeling is widely used for care management [56] as the best method for finding high-risk patients [57], but current approaches have two gaps, as discussed below.

First, several dozen risk factors for hospital use in asthma are known, including age, gender, race/ethnicity, asthma medication use, prior health care use, comorbidities (eg, ischemic heart disease, rhinitis, sinusitis, reflux, anxiety-depression, diabetes, cataracts, chronic bronchitis, and chronic obstructive pulmonary disease), allergies, lung function, number of asthma medication prescribers as a measure of continuity of care, health insurance type, lab test results (eg, total serum immunoglobulin E level and eosinophil count), body mass index, smoking status, secondhand smoke exposure, the ratio of controller to total asthma medications, frequency of nonasthma visits, number of procedures, number of diagnoses, number of prescription drug claims, and asthma questionnaire results (eg, frequency of asthma symptom occurrence, interference with normal activity, nighttime awakening, reliever use for symptom control, forced expiratory volume in 1 second [FEV1], peak expiratory flow rate, FEV1/forced vital capacity ratio, asthma control test score, number of exacerbations last year, controller use, asthma-related acute care, asthma trigger reduction, and asthma medication) [55,63,65,67-73].

With the new software that will be built as part of our project, for the first time, health care researchers with limited machine learning knowledge will quickly be able to build high-quality machine learning models with minimal help from data scientists.

Existing models for predicting hospital use in asthmatic patients were built mainly using a small set of patients (eg, <1000) or attributes (eg, <10), creating a hurdle in finding many predictive attributes and their interactions.

Our software will (1) automatically choose hyper-parameter values, feature selection techniques, and algorithms for a particular machine learning problem faster than existing methods;

(3) continuously show, as a function of time given for model selection, estimated patient outcomes of model use and forecasted model accuracy—for the first time, one can obtain feedback continuously throughout automatic model selection;

In summary, this study is significant in that it makes machine learning feasible with limited budgets and data scientist resources to help realize value from clinical big data and improve patient outcomes.

Our first aim is to finish developing Auto-ML to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance.

Our review paper [26] showed that few automatic selection methods [25,28-31,82] have been fully implemented and can manage an arbitrary number of combinations of hyper-parameter values and many learning algorithms.

To address these inefficiencies, we drafted a method based on Bayesian optimization for response surface to rapidly identify, for a specific modeling problem, a good combination of hyper-parameter values, a feature selection technique, and a learning algorithm when a large number of algorithms and techniques are examined [35,83].

For each remaining algorithm, we construct a separate regression model, use a Bayesian optimization for response surface approach to choose several new hyper-parameter value combinations, and test these combinations.
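
The sketch below conveys the response-surface idea for a single learning algorithm: fit a regression model on the hyper-parameter combinations tested so far and their error rates, then propose new combinations that the surrogate predicts to be good. It is a greedy surrogate-model search that omits the Bayesian acquisition step, and the surrogate, candidate sampling, and error function are simplifying assumptions rather than the paper's actual method.

```python
# Much-simplified response-surface search for ONE learning algorithm.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def sample_hyperparams(n):
    # Hypothetical 2-D hyper-parameter space: (n_estimators, max_depth).
    return np.column_stack([rng.integers(10, 300, n), rng.integers(2, 20, n)])

def true_error(hp):
    # Stand-in for training a model and measuring its validation error.
    return max(0.02, 0.3 - 0.001 * hp[0] / (1 + abs(hp[1] - 8)) + rng.normal(0, 0.01))

# Start from a few random evaluations, then iterate.
tried = sample_hyperparams(5)
errors = np.array([true_error(hp) for hp in tried])

for _ in range(10):
    surrogate = RandomForestRegressor(n_estimators=50, random_state=0)
    surrogate.fit(tried, errors)              # regression model = response surface
    candidates = sample_hyperparams(200)
    predicted = surrogate.predict(candidates)
    best = candidates[np.argmin(predicted)]   # pick the most promising combination
    tried = np.vstack([tried, best])
    errors = np.append(errors, true_error(best))

print("best error so far:", errors.min())
```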

To help avoid overfitting, we will use two validation samples of equal size with as little overlap as possible, and reduce the frequency of revealing information about the second validation sample [23].

For a combination of hyper-parameter values and a learning algorithm, we use the combination and the training sample to construct a model and assess the model’s error rate twice, once on each validation sample.

If the error rate on the first validation sample is higher than a specific threshold (eg, in the top 50% of the error rates on the first validation sample of all combinations tested so far at this stage), we use it as the combination’s error rate estimate.
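
A hedged sketch of this two-validation-sample rule appears below; the threshold definition (the median of first-sample error rates seen so far) and the fallback to the second sample are simplifications of the paper's technique.

```python
# Simplified sketch of the two-validation-sample error estimate described above.
def estimate_error(err_val1, err_val2, val1_errors_so_far):
    """Return the error estimate for one combination.

    err_val1, err_val2: this combination's error rates on validation samples 1 and 2.
    val1_errors_so_far: validation-sample-1 error rates of all combinations tested
    so far at this stage (used to form the threshold).
    """
    threshold = sorted(val1_errors_so_far)[len(val1_errors_so_far) // 2]  # ~ "top 50%"
    if err_val1 >= threshold:
        # Poor combination: report its first-sample error, which limits how often
        # information about the second validation sample is revealed.
        return err_val1
    # Promising combination: use the second sample to reduce overfitting to sample 1.
    return err_val2
```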

To tackle this issue, we previously proposed that before doing tests, we apply a feature selection technique to the dataset, or a large sample of it, and rapidly drop features not likely to have high predictive power [24].
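
The sketch below illustrates such a pre-filtering pass on a sample of the data; the scoring function (mutual information) and the number of retained features are assumptions, since the paper leaves the choice of feature selection technique open.

```python
# Illustrative feature pre-filtering on a large sample of the dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=12000, n_features=784, n_informative=30,
                           random_state=0)

# Score features on a random sample rather than the full dataset.
idx = np.random.default_rng(0).choice(len(X), size=4000, replace=False)
selector = SelectKBest(mutual_info_classif, k=100).fit(X[idx], y[idx])

X_reduced = selector.transform(X)   # keep only the 100 most promising features
print(X_reduced.shape)
```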

If the number for a feature evaluator or feature search method is smaller than a specific threshold (eg, 3), we will conduct more tests for the feature evaluator or feature search method to make up the difference.

At the end of each stage except for the last one, we will identify a prechosen number n1 (eg, 3) of combinations of algorithms, techniques, and hyper-parameter values that achieve the lowest error rates among all combinations examined so far.

Krueger et al [87] used a similar approach to perform fast cross-validation to select a good hyper-parameter value combination for a given learning algorithm and modeling problem.
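
Conceptually, the staged search described above resembles the sketch below, which keeps only the n1 lowest-error combinations alive at the end of each stage; the evaluation and refinement details are placeholders, not the paper's algorithm.

```python
# Simplified sketch of a staged search with per-stage pruning.
def staged_search(combinations, evaluate, n_stages=3, n1=3):
    """combinations: candidate (algorithm, technique, hyper-parameter) settings.
    evaluate(combo, stage): returns an error rate, eg, measured on progressively
    larger data samples as the stages advance."""
    survivors = list(combinations)
    for stage in range(n_stages):
        scored = sorted(survivors, key=lambda c: evaluate(c, stage))
        if stage < n_stages - 1:
            # Keep the n1 lowest-error combinations for the next stage.
            survivors = scored[:n1]
        else:
            survivors = scored
    return survivors[0]
```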

To tackle this, the automated temporal aggregation function of Auto-ML demands that the dataset, except for the dependent variable, comply with the Observational Medical Outcomes Partnership (OMOP) common data model [88] and its linked standardized terminologies [89].

For Aim 1 (c), we aim to continuously show, as a function of time given for model selection, forecasted model accuracy and projected patient outcomes of model use.

During automatic selection, to be more useful and user friendly, Auto-ML will show projected patient outcomes of model use and forecasted model accuracy as a function of time given for model selection (see Figure 3).

Via announcements in our institution’s email lists and personal contact, we will recruit 25 health care researchers from UWM, which houses approximately 2500 faculty members, most doing health care research.

These health care researchers would rate their familiarity with medical data as being at the MD level, but would rate their machine learning knowledge as below the level taught in a typical machine learning course for computer science undergraduates.

The clinical and administrative dataset is deidentified and publicly available from the Practice Fusion Diabetes Classification Challenge [15,34], containing 3-year (2009-2012) records as well as the labels of 9948 adult patients from all US states in the following year.

Each of the six problems from Modeling Problems 2-7 uses a distinct, deidentified, and publicly available dataset from the University of California, Irvine machine learning repository [95] to perform a task: (1) Arcene: classify mass spectrometric data into cancer versus normal patterns;

When 60% of health care researchers can actually achieve model accuracy of at least 95% of the gold standard, a sample size of 25 health care researchers produces a one-sided 95% lower confidence limit of 42%.
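
For readers who want to reproduce that figure, the snippet below computes an exact (Clopper-Pearson) one-sided 95% lower confidence limit for 15 of 25 successes; whether the authors used this exact method is an assumption.

```python
# Quick check of the stated lower confidence limit (Clopper-Pearson bound).
from scipy.stats import beta

n, successes = 25, 15                                   # 60% of 25 researchers
lower = beta.ppf(0.05, successes, n - successes + 1)    # one-sided 95% lower limit
print(round(lower, 2))                                  # roughly 0.42
```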

As detailed in our design paper [83], we will obtain quantitative outcome measures covering model accuracy, time on task, self-efficacy for constructing machine learning models with clinical big data, satisfaction, trustworthiness, adequacy, and quality of documentation.

Overview: Aim 2 involves applying Auto-ML and novel methodology to two new modeling problems crucial for care management allocation, to which our institutions are seeking solutions, and piloting one model with care managers.

The patient population consists of IH pediatric patients (0-17 years of age) with asthma in 2005-2016, identified by Schatz et al’s method [63,100,101] as having the following: (1) at least one diagnosis code of asthma according to the International Classification of Diseases, Ninth Revision (ICD-9) (ie, 493.xx), or the International Classification of Diseases, Tenth Revision (ICD-10) (ie, J45/J46.*);

or (2) two or more “asthma-related medication dispensings (excluding oral steroids) in a one-year period, including β-agonists (excluding oral terbutaline), inhaled steroids, other inhaled anti-inflammatory drugs, and oral leukotriene modifiers.”

By running Oracle database Structured Query Language (SQL) queries, our contracted IH data analyst will extract from the IH EDW a deidentified, clinical and administrative dataset, encrypt it, and securely transfer it to a HIPAA-compliant computer cluster for secondary analysis.

The dependent variable is whether an asthmatic patient incurred hospital use—inpatient stay or ED visit—with a primary diagnosis of asthma (ie, ICD-9 493.xx or ICD-10 J45/J46.*) in the following year [14,63,64].
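
A hypothetical sketch of constructing this dependent variable is shown below; the table and column names are invented for illustration, and in the study the extraction is actually done through SQL queries against the EDW.

```python
# Hypothetical label construction for asthma-related hospital use.
import pandas as pd

encounters = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "encounter_type": ["ED", "office", "office"],
    "primary_dx": ["493.02", "J30.1", "J45.40"],
})

is_asthma_dx = (encounters["primary_dx"].str.startswith("493")     # ICD-9 493.xx
                | encounters["primary_dx"].str.startswith("J45")   # ICD-10 J45.*
                | encounters["primary_dx"].str.startswith("J46"))  # ICD-10 J46.*
is_hospital_use = encounters["encounter_type"].isin(["inpatient", "ED"])

label = (encounters.assign(qualifying=is_asthma_dx & is_hospital_use)
         .groupby("patient_id")["qualifying"].any())
print(label)   # patient 1: True (ED visit with 493.xx); patient 2: False
```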

The patient population includes UWM adult patients (18 years of age or older) with diabetes in 2012-2016, identified by the method in Neuvirth et al [104] as having one or more hemoglobin A1c test results of 6.5% or higher.

A UWM data analyst will run SQL Server database SQL queries to extract from the UWM EDW a deidentified, clinical and administrative dataset, encrypt it, and securely transfer it to a HIPAA-compliant computer cluster for secondary analysis.

Several candidate constraints exist: (1) the patient had two or more visits to UWM in the past year, (2) the patient has a UWM primary care physician and lives within 5 miles of a UWM hospital, and (3) the patient saw a primary care physician or endocrinologist at UWM in the past year and lives within 60 miles (ie, around 1 hour of driving distance) of a UWM hospital.

Using UWM data and grouper models like the Clinical Classifications Software system to group diagnosis codes and reduce features [60], we will build two models: one for estimating an inpatient stay’s allowed cost and another for estimating an ED visit’s allowed cost based on patient demographics and diagnosis data.

By aggregating the estimated costs of individual non-UWM inpatient stays and ED visits, we will assess each UWM patient’s portion of cost spent at non-UWM hospitals and use the portions to evaluate every candidate constraint.

If a health care system does not have enough data to make the two models reasonably accurate, it can use the average costs of an inpatient stay and ED visit to assess each patient’s portion of cost spent at external hospitals.

Assuming the prediction results have a correlation coefficient of .6 for both classes and performing a two-sided Z test at a significance level of .05, a sample size of 561 data instances per class provides 90% power to find a discrepancy of .05 between the two models’

Using an F test at a significance level of .05 and under the assumption of the existence of 20 features from clinical data in addition to 300 or fewer features used in the second model, a sample size of 443 patients provides 90% power to identify an increase of 5% in R2 from 20%.

For each patient, we will first show the care manager the historical, deidentified patient attributes, then show the prediction result and automatically generated explanations, and finally survey him/her using both open-ended and semistructured questions.

For Modeling Problem 9, in our ongoing UWM operational project, we have used around 30 attributes and approximately 6000 patients to build a basic cost prediction model, which achieved an R2 close to that of the commercial claims-based model.

Since the health care researcher will use many more attributes and patients, which should increase model accuracy, we expect the cost prediction model built by him/her to achieve a higher R2 than the claims-based model.

Trials showed that machine learning helped drop the 30-day mortality rate in ED patients with community-acquired pneumonia (risk ratio≈OR=0.53, as the mortality rate is much less than 1) [2] and cut hospitalization days by 15% in end-stage renal disease patients on dialysis [3].

Using Aim 1(d)’s test results on whether health care researchers can use Auto-ML to achieve model accuracy of at least 95% of the gold standard, we will conservatively estimate p1’s minimum and maximum values (eg, by fitting a normal distribution and using its 2.5 and 97.5 percentile points).

In the success (or non-success) case, for each ED patient with community-acquired pneumonia, we will simulate whether the patient dies based on the 30-day mortality rate reported in the paper [2] when using (or not using) machine learning.

In the most conservative case assuming a proportion of discordant pairs of 10%, a sample size of 1152 patients provides 90% power to notice an OR of 0.53 [2] using a two-sided McNemar test at a significance level of .05.
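
As a rough, hedged cross-check of that power statement, the Monte Carlo sketch below simulates discordant pairs under the stated assumptions and applies an exact McNemar (binomial) test; this simulation setup is our assumption, not the paper's calculation.

```python
# Simulation-based sanity check of the McNemar power statement.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)
n, p_discordant, odds_ratio = 1152, 0.10, 0.53
p10 = odds_ratio / (1 + odds_ratio)   # share of discordant pairs of one type under H1

rejections, n_sim = 0, 2000
for _ in range(n_sim):
    n_d = rng.binomial(n, p_discordant)   # number of discordant pairs
    b = rng.binomial(n_d, p10)            # discordant pairs favoring one direction
    # Exact McNemar test = binomial test of b successes in n_d trials against 0.5.
    if binomtest(b, n_d, 0.5).pvalue < 0.05:
        rejections += 1

print("estimated power:", rejections / n_sim)   # should come out near 0.90
```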

To acquire the whole range of possible outcomes, we will do sensitivity analysis by changing the levels of the probabilities p1 and p2, 30-day mortality rate, and rate reduction gained by machine learning.

The paper shows that compared to the modern Auto-WEKA automatic selection method [25], on six medical and 21 nonmedical benchmark datasets, our draft method reduced search time by 28-fold, classification error rate by 11%, and standard deviation of error rate due to randomization by 36%, on average.

By providing support for common data models (eg, OMOP [88]) and their linked standardized terminologies adopted by a large number of systems, Auto-ML can be used to construct models if attributes required to solve a problem are accessible in a structured dataset or in one of those common data models.

To help users decide whether any data quality issues need to be handled before modeling, Auto-ML will show the numbers of attribute values outside reasonable ranges and numbers of missing values of nonrepeatedly recorded attributes.
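
A minimal illustration of such a data-quality summary is sketched below; the column names and "reasonable ranges" are made up, and Auto-ML's actual checks may differ.

```python
# Illustrative data-quality summary: counts of missing and out-of-range values.
import pandas as pd

df = pd.DataFrame({
    "age":       [34, 180, None, 61],    # 180 falls outside a reasonable range
    "hba1c_pct": [6.8, 7.2, 5.4, None],
})
reasonable_ranges = {"age": (0, 120), "hba1c_pct": (3.0, 20.0)}

missing_counts = df.isna().sum()
out_of_range = {
    col: int(((df[col] < lo) | (df[col] > hi)).sum())
    for col, (lo, hi) in reasonable_ranges.items()
}
print(missing_counts)
print(out_of_range)
```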

By making machine learning feasible with limited budgets and data scientist resources, our new software will help realize value from clinical big data and improve patient outcomes.

What Are Machine Learning Models Hiding?

In a recent series of papers, we uncovered multiple privacy and integrity problems in today’s ML pipelines, especially (1) online services such as Amazon ML and Google Prediction API that create ML models on demand for non-expert users, and (2) federated learning, aka collaborative learning, that lets multiple users create a joint ML model while keeping their data private (imagine millions of smartphones jointly training a predictive keyboard on users’ typed messages).

Our Oakland 2017 paper, which has just received the PET Award for Outstanding Research in Privacy Enhancing Technologies, concretely shows how to perform membership inference, i.e., determine if a certain data record was used to train an ML model.  Membership inference has a long history in privacy research, especially in genetic privacy and generally whenever statistics about individuals are released.  It also has beneficial applications, such as detecting inappropriate uses of personal data.

Given query access to an ML model, an adversary can query it and tell from its output if a certain record was used during training.  For example, if a classifier based on a patient study is used for predictive health care, membership inference can leak whether or not a certain patient participated in the study.
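
To convey the intuition, the sketch below implements only a simple confidence-threshold baseline: overfitted models tend to be more confident on records they were trained on. The paper's actual attack is more sophisticated (it trains additional "shadow" models to learn the inference rule), so this is an illustrative simplification.

```python
# Confidence-threshold baseline for membership inference (simplified illustration).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

target = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def confidence(model, X):
    return model.predict_proba(X).max(axis=1)   # top-class probability per record

threshold = 0.9   # assumed attack threshold; a real attacker would calibrate this
print("flagged as members (training records):", (confidence(target, X_train) > threshold).mean())
print("flagged as members (held-out records):", (confidence(target, X_out) > threshold).mean())
```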

They are typically evaluated solely by their test accuracy, i.e., how well they classify the data that they did not train on.  Yet they can achieve high test accuracy without using all of their capacity.  In addition to asking if a model has learned its task well, we should ask: what else has the model learned?

Consider a binary gender classifier trained in this way.  By submitting special inputs to this classifier and observing whether they are classified as male or female, the adversary can reconstruct the actual images on which the classifier was trained (in the accompanying figure, the top row is the ground truth).

Train an Image Classifier with TensorFlow for Poets - Machine Learning Recipes #6

Monet or Picasso? In this episode, we'll train our own image classifier, using TensorFlow for Poets. Along the way, I'll introduce Deep Learning, and add context ...

Intro to Amazon Machine Learning

The Amazon Web Services Machine Learning platform is finely tuned to enable even the most inexperienced data scientist to build and deploy predictive models ...

How Machine Learning is Impacting Oil and Gas (Cloud Next '18)

New advances in machine learning are helping organizations optimize operations, maximize output, and automate manual processes. However, the scalability ...

'How neural networks learn' - Part II: Adversarial Examples

In this episode we dive into the world of adversarial examples: images specifically engineered to fool neural networks into making completely wrong decisions!

Progressive uses H2O Predictive Analytics for UBI

Progressive is one of the largest auto insurers in the United States with over 13 million policies in force. Progressive is a pioneer in data analytics with more than ...

Vectors - The Math of Intelligence #3

We're going to explore why the concept of vectors is so important in machine learning. We'll talk about how they are used to represent both data and models.

Customer Successes with Machine Learning (Google Cloud Next '17)

TensorFlow is rapidly democratizing machine intelligence. Combined with the Google Cloud Machine Learning platform, TensorFlow now allows any developer ...

Dimensionality Reduction - The Math of Intelligence #5

Most of the datasets you'll find will have more than 3 dimensions. How are you supposed to understand and visualize n-dimensional data? Enter dimensionality ...

Predictive Maintenance & Monitoring using Machine Learning: Demo & Case study (Cloud Next '18)

Learn how to build an advanced predictive maintenance solution. Learn what predictive monitoring is and which new scenarios you can unlock for competitive advantage.

MIT 6.S094: Deep Learning

This is lecture 1 of course 6.S094: Deep Learning for Self-Driving Cars (2018 version). This class is free and open to everyone. It is an introduction to the practice ...