AI News, What are Playbooks and Why Did We Create Them?

What are Playbooks and Why Did We Create Them?

Today, DataScience announced the launch of Playbooks, a comprehensive collection of instructional content designed to give DataScience Cloud clients the information they need to build powerful and predictive data models from the ground up.

With Playbooks, data science teams have a leg up in solving their most challenging data science problems with code libraries, vetted technical and academic articles, notebooks, and live educational seminars.

Our in-house data scientists have created this robust knowledge repository and service offering to take our clients from the first step of identifying the algorithms that are applicable to their business use cases, to building and deploying models, to leveraging model outputs across various systems and tools.

Trailblazers are moving away from pre-canned, vertical-specific applications and starting to build internal data science teams in order to bring advanced model building in house.

Though the challenges for these two groups differ, the solution in each case is similar: they both need the ability to build better models. The DataScience.com Platform provides the environment in which to build and deploy models, but Playbooks are the secret sauce that gives our clients the right tools for a successful model build.

Our curated knowledge base allows you to quickly dig deeper on a particular model or methodology — without spending hours sifting through irrelevant material.

Field Guide to Data Science


Azure AI guide for predictive maintenance solutions

Predictive maintenance (PdM) is a popular application of predictive analytics that can help businesses in several industries achieve high asset utilization and savings in operational costs.

This guide brings together the business and analytical guidelines and best practices to successfully develop and deploy PdM solutions using the Microsoft Azure AI platform technology.

The main content of the guide covers the data science process, including the steps of data preparation, feature engineering, model creation, and model operationalization.

To complement these key concepts, this guide lists a set of solution templates to help accelerate PdM application development.

The first half of this guide describes typical business problems, the benefits of implementing PdM to address these problems, and lists some common use cases.

The second half explains the data science behind PdM, and provides a list of PdM solutions built using the principles outlined in this guide.

The assets involved could range from aircraft engines, turbines, elevators, and industrial chillers that cost millions, down to everyday appliances like photocopiers, coffee machines, and water coolers.

Businesses face high operational risk due to unexpected failures and have limited insight into the root cause of problems in complex systems.

The guide walks through the key business questions in this space, the typical goal statements of PdM, and the analyses those goal statements motivate. It is important to emphasize that not all use cases or business problems can be effectively solved by PdM.

Three important qualifying criteria need to be considered during problem selection. A subsequent section focuses on a collection of PdM use cases from several industries such as Aerospace, Utilities, and Transportation.

Each use case starts with a business problem, then discusses the relevant data surrounding that problem and the benefits of a PdM solution.

Predictive models learn patterns from historical data, and predict future outcomes with a certain probability based on these observed patterns.

The feature characteristics (type, density, distribution, and so on) of new data should match that of the training and test data sets.

Consider the wheel failure use case discussed above - the training data should contain features related to the wheel operations.

If the problem were to predict the failure of the traction system, the training data would have to encompass all the different components of the traction system.

The general recommendation is to design prediction systems around specific components rather than larger subsystems, since the latter will have more dispersed data.

Two questions are commonly asked with regard to failure history data, the first being: "How many failure events are required to train a model?"

The quality of the data is critical - each predictor attribute value must be accurate in conjunction with the value of the target variable.

However, when building prediction models, the algorithm needs to learn about a component's normal operational pattern, as well as its failure patterns.

Examples of static data include the equipment make, model, manufacture date, start date of service, location of the system, and other technical specifications.

Examples of relevant data for the sample PdM use cases are tabulated in the guide. Given these data sources, the two main data types observed in the PdM domain are temporal (time-stamped) data and static data. Predictor and target variables should be preprocessed/transformed into numerical, categorical, and other data types depending on the algorithm being used.

As a prerequisite to feature engineering, prepare the data from various streams to compose a schema from which it is easy to build features.

Each row in the table represents a training instance, and the columns represent predictor features (also called independent attributes or variables).

Other data preprocessing steps include handling missing values and normalization of attribute values.

With the above preprocessed data sources in place, the final transformation before feature engineering is to join the above tables based on the asset identifier and timestamp.
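As a concrete illustration, here is a minimal pandas sketch of that join, assuming hypothetical telemetry, maintenance, and machine-metadata files keyed by a machineID asset identifier and a datetime timestamp (file and column names are illustrative, not from the guide).

import pandas as pd

# Time-stamped data sources (sensor readings, maintenance events) plus static machine metadata.
telemetry = pd.read_csv("telemetry.csv", parse_dates=["datetime"])
maintenance = pd.read_csv("maintenance.csv", parse_dates=["datetime"])
machines = pd.read_csv("machines.csv")

# Join the time-stamped sources on asset identifier and timestamp,
# then attach the static attributes on the asset identifier alone.
features = (
    telemetry
    .merge(maintenance, on=["machineID", "datetime"], how="left")
    .merge(machines, on="machineID", how="left")
)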

In contrast, PdM involves predicting failures over a future time period, based on features that represent machine behavior over a historical time period.

This section discusses lag features that can be constructed from data sources with timestamps, and feature creation from static data sources.

But for certain problems, picking a large lag window W (say 12 months) can provide the whole history of an asset up to the time of the record.

Examples of rolling aggregate features over a time window are counts, averages, cumulative sums (CUMSUM), and min/max values.

Another useful technique in PdM is to capture trend changes, spikes, and level changes using algorithms that detect anomalies in data.

k can be a small number to capture short-term effects, or a large number to capture long-term degradation patterns.
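A minimal sketch of such rolling aggregate and lag features with pandas, assuming the hypothetical features frame from the previous sketch and an illustrative volt sensor column:

W = 24  # lag window size in hours (illustrative)

features = features.sort_values(["machineID", "datetime"])
grouped = features.groupby("machineID")["volt"]

# Rolling aggregates over the last W records per asset: average, max, cumulative sum.
features["volt_rolling_mean"] = grouped.transform(lambda s: s.rolling(W, min_periods=1).mean())
features["volt_rolling_max"] = grouped.transform(lambda s: s.rolling(W, min_periods=1).max())
features["volt_cumsum"] = grouped.cumsum()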

Some examples for the circuit breaker use case are voltage, current, power capacity, transformer type, and power source.

Binary classification is used to predict the probability that a piece of equipment fails within a future time period - called the future horizon period X.

Multi-class classification techniques can be used in PdM solutions for two scenarios: predicting in which future time window an asset will fail, and predicting the root cause of an impending failure. For the first scenario, the question is: "What is the probability that an asset will fail in the next nZ units of time, where n is the number of periods?"

Figure: labeling for multi-class classification for failure time prediction.

For root cause prediction, the question is: "What is the probability that the asset will fail in the next X units of time due to root cause/problem Pi?"

To answer this question, label the X records prior to the failure of an asset as "about to fail due to root cause Pi."

Figure: labeling for multi-class classification for root cause prediction.

The model assigns a failure probability due to each Pi, as well as the probability of no failure.
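The binary and failure-time-window labeling schemes above can be sketched as follows, assuming a hypothetical frame with one row per asset per time unit, a datetime column, and a failure_time column holding each asset's next failure time (all names illustrative):

X = 24  # future horizon in hours (illustrative)

# Hours until the asset's next failure, for each record.
time_to_failure = (features["failure_time"] - features["datetime"]).dt.total_seconds() / 3600

# Binary label: 1 if the asset fails within the next X hours, else 0.
features["label_binary"] = (time_to_failure <= X).astype(int)

# Multi-class label for failure-time prediction: which X-hour window the failure falls in
# (1 = within X, 2 = within 2X, ..., 0 = no failure within n_periods windows).
n_periods = 3
features["label_window"] = (time_to_failure // X + 1).where(
    time_to_failure <= n_periods * X, 0).astype(int)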

Cross-validation helps limit problems like overfitting and gives insight into how the model will generalize to an independent data set.

The training and testing routine for PdM needs to take into account the time-varying aspects of the data in order to generalize better to unseen future data.

Hence, if the dataset is split randomly into training and validation sets, some of the training examples may be later in time than some of the validation examples.

The recommended approach is to split the examples into training and validation sets in a time-dependent manner, where all validation examples are later in time than all training examples.

Hyperparameter values chosen with a time-dependent train/validation split result in better future model performance than values chosen with randomly split cross-validation.

The final model can be generated by training a learning algorithm over the entire training data using the best hyperparameter values.

A good estimate of future performance is the metric computed over the validation set for the chosen hyperparameter values, or an average performance metric computed from cross-validation.

When time-series are stationary and easy to predict, both random and time-dependent approaches generate similar estimations of future performance.

But when time-series are non-stationary, and/or hard to predict, the time-dependent approach will generate more realistic estimates of future performance.

For example, for binary classification, create features based on past events, and create labels based on future events within the next "X" units of time (the future horizon).

For a time-dependent split, pick a training cutoff time Tc at which to train a model, with hyperparameters tuned using historical data up to Tc.

To prevent leakage of future labels that are beyond Tc into the training data, choose the latest time to label training examples to be X units before Tc.

In the example shown in Figure 7, each square represents a record in the data set where features and labels are computed as described above.

Figure: time-dependent split for binary classification.

The green squares represent records belonging to the time units that can be used for training.

The black squares represent the records of the final labeled data set that should not be used in the training data set, given the above constraint.
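A minimal sketch of such a time-dependent split, assuming the hypothetical labeled frame from the earlier sketches; records within X units of the cutoff Tc are dropped from training so that no label computed from post-Tc events leaks into training:

import pandas as pd

Tc = pd.Timestamp("2015-09-30")   # training cutoff time (illustrative)
horizon = pd.Timedelta(hours=24)  # the same future horizon X used to compute labels

# Latest usable training label is X units before Tc; all validation examples are later than Tc.
train = features[features["datetime"] <= Tc - horizon]
validation = features[features["datetime"] > Tc]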

Other domains where failures and anomalies are rare occurrences face a similar problem, for example, fraud detection and network intrusion detection.

With class imbalance in the data, the performance of most standard learning algorithms is compromised, since they aim to minimize the overall error rate.

For a data set with 99% negative and 1% positive examples, a model can be shown to have 99% accuracy by labeling all instances as negative.

Random oversampling involves selecting a random sample from the minority class, replicating these examples, and adding them to the training data set.

To mitigate the problem of unequal loss, assign a high cost to mis-classification of the minority class, and try to minimize the overall cost.
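Both mitigations can be sketched with scikit-learn and pandas, again assuming the hypothetical training frame and feature columns from the earlier sketches; the oversampling factor and class weights below are illustrative, not recommendations from the guide:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

feature_cols = ["volt_rolling_mean", "volt_rolling_max", "volt_cumsum"]

# Option 1: random oversampling - replicate minority-class rows in the training set only.
minority = train[train["label_binary"] == 1]
oversampled = pd.concat(
    [train, minority.sample(n=4 * len(minority), replace=True, random_state=0)])

# Option 2: cost-sensitive learning - a higher misclassification cost for the minority class.
clf = RandomForestClassifier(class_weight={0: 1, 1: 20}, random_state=0)
clf.fit(oversampled[feature_cols], oversampled["label_binary"])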

For instance, a decision to ground an aircraft based on an incorrect prediction of engine failure can disrupt schedules and travel plans.

Typical performance metrics used to evaluate PdM models are discussed below, starting with those for binary classification. The benefit of the data science exercise is realized only when the trained model is made operational.
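For binary classification, metrics such as precision, recall, and F1, computed from the confusion matrix, are more informative than raw accuracy under class imbalance. A minimal scikit-learn sketch, reusing the hypothetical model and validation set from the earlier sketches:

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = validation["label_binary"]
y_pred = clf.predict(validation[feature_cols])

print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))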

The new data must exactly conform to the model signature of the trained model in two ways: the feature schema must match, and the feature characteristics (type, density, distribution) must match those of the training and test data. The above process is stated in many ways in academic and industry literature.

Scenarios involving anomaly detection and failure detection typically implement online scoring (also called real time scoring).
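A minimal online-scoring sketch, assuming the trained model has been persisted with joblib under an illustrative file name and that each incoming record conforms to the training feature schema:

import joblib
import pandas as pd

model = joblib.load("pdm_model.joblib")  # illustrative path to the persisted model

def score(record: dict) -> float:
    # A single incoming record must conform to the training feature schema.
    frame = pd.DataFrame([record])
    return float(model.predict_proba(frame)[0, 1])  # probability of failure within the horizon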

There are a couple of alternatives, both suboptimal. The final section of this guide provides a list of PdM solution templates, tutorials, and experiments implemented in Azure.

The Azure AI learning path for predictive maintenance provides the training material for a deeper understanding of the concepts and math behind the algorithms and techniques used in PdM problems.

In addition, free MOOCs (massive open online courses) on AI are offered online by academic institutions like Stanford and MIT, and other educational companies.

How to Build a Data Analytics Strategy by a LinkedIn Product Manager

Data Analytics in Product Management event in San Francisco about how to build a data analytics strategy.

Predictive Workforce Playbook: Building a Business Case for Analytics

Greta Roberts, co-founder and CEO of Talent Analytics, discusses challenges that many organizations face when working to implement an analytics program ...

Game theory lessons - Historical example: Tobacco companies

Support us on Indiegogo and get early access to the 365 Data Science Program!

The Digital:Lab - Digital Transformation of the Volkswagen Group

The Volkswagen Digital:Lab, based in Berlin, is part of the group's approach to take on the challenges posed by the digital transformation of the company. Data ...

The Lean Product Playbook with Dan Olsen in Silicon Valley

Product Management event in Silicon Valley about The Lean Product Playbook.

Agency KPIs, How to Keep up with Technology & What Should Be in Agency Contracts? #AskSwenk ep32

This Week's Questions: (0:25) What KPIs does a digital marketing agency need to measure? (2:10) You are on pretty much every channel. I've listened to you ...

MSBI Tutorials for Beginners | Business Intelligence Tutorial | Learn MSBI | MSBI Training | Edureka

This Edureka MSBI Tutorial video will help you learn the basics of MSBI. This powerful suite is composed of tools that help provide the best solutions for ...

Gaming Analytics with Firebase

In this talk Abe Haskins dives into the advantages of using Firebase Analytics to report user behaviour in your mobile games. He'll cover how Firebase Analytics ...

Yael Garten, LinkedIn | Women in Data Science 2017

Yael Garten, Director of Data Science at LinkedIn, sits down with host Lisa Martin at Women in Data Science 2017, at Stanford University in Palo Alto, California.

#170: Culture Change and Digital Transformation with Alex Osterwalder and Dave Gray

Culture change is key for any digital transformation initiative. The shift ...