AI News, Need for DYNAMICAL Machine Learning: Bayesian exact recursive estimation

Need for DYNAMICAL Machine Learning: Bayesian exact recursive estimation

We developed a solution called Kernel Projection Kalman Filter for business applications that require static or dynamical, time-invariant or time-varying, linear or non-linear Machine Learning, i.e., pretty much all applications. Kernel Projection Kalman Filter is therefore a 'universal' solution.

Static ML is defined as follows: given a set of inputs and outputs, find a static map between the two during supervised “Training” and use this static map for business purposes during “Operation” (which is called “Testing” during pre-operation evaluation).
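As a concrete toy illustration of this train-then-operate pattern, here is a minimal numpy sketch assuming a simple linear map; the data and coefficients are made up for illustration:

```python
import numpy as np

# Illustrative sketch of the static "Train then Test" paradigm.
# Training: fit a static linear map y = X @ w from input-output pairs.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))            # input features
w_true = np.array([2.0, -1.0, 0.5])            # hypothetical "true" map
y_train = X_train @ w_true + 0.1 * rng.normal(size=100)

w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)  # learn the map once

# Operation ("Testing"): reuse the frozen map on new, unseen inputs.
X_new = rng.normal(size=(5, 3))
y_pred = X_new @ w_hat
print(y_pred)
```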

In real life, static is hardly the case ...  Before we proceed further, it will be useful to review my blog, “Prediction – the other dismal science?”, where we discussed “detection” and “prediction”.

Also, we know that ML learns a “map” that relates the input and output of a System – if the System does not change (remains static), static maps can be used for Detection and Prediction during the operational phase.

Also, with the Static ML solution, when “abnormal” is indicated, prediction of the machine’s condition (via the ML “map” output) will not be possible, since the System has changed (as the detection of Abnormal indicated). In summary, Static ML is adequate for one-off detection (and subsequent offline intervention) IF your business has a high tolerance for false positives.

The Kalman Predictor in the “In-Stream” or operational phase provides the following: at a simple level, if we move all the decision-making we did with Predictor output (and thresholding) to State trajectories, IoT solution performance will be better due to the less volatile nature of States.
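A hypothetical sketch of why this helps, using exponential smoothing as a crude stand-in for a State trajectory (the threshold and smoothing factor are illustrative, not from the article):

```python
import numpy as np

# Alarming on a smoothed "state" trajectory instead of the raw, noisy
# predictor output reduces threshold-crossing chatter (false positives).
rng = np.random.default_rng(1)
raw_output = 0.5 + 0.4 * rng.normal(size=200)   # volatile predictor output
state = np.zeros_like(raw_output)               # crude stand-in for a state estimate
alpha = 0.1                                     # assumed smoothing factor
for n in range(1, len(raw_output)):
    state[n] = (1 - alpha) * state[n - 1] + alpha * raw_output[n]

THRESH = 0.9                                    # illustrative decision threshold
print("raw alarms:  ", int(np.sum(raw_output > THRESH)))
print("state alarms:", int(np.sum(state > THRESH)))
```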

With Static ML, the options would have been (1) to train Static ML for multiple types of work pieces separately and switch the map when the work piece is switched – a tedious and error-prone solution – or (2) to train the Static ML map for all potential work piece types – which will “smear out” the map and make it less accurate overall; in probabilistic terms, this is caused by non-homogeneity of the data beyond heteroscedasticity.

If you subscribe to the view that true learning is “generalization from past experience AND the results of new action” and that ML business solutions therefore ought to be like flu shots (adjust the mix and apply on a regular basis), then every ML application is a case of Dynamical Machine Learning.

NEXT Machine Learning Paradigm: “DYNAMICAL” ML

In DYNAMICAL machine learning (DML) applied to industrial IoT, the data model and the algorithms used (Generalized Dynamical Machine Learning) naturally generate what is called the “State-space” model of the machine.

It may not *look* like the machine but it captures the dynamics in all its detail (there can be challenges in relating “states” to actual machine components though).

By “comparing” output predictions or classes and input features, one can “figure out” the relationship or the “map” between the two – this is what a Machine Learning algorithm does.

What we are seeing is that the map is “dynamical” in the sense that it may be moving from the down-sloping line to the middle to the up-sloping line and slowly doing this dance!

4. Health and Life Sciences: predictive models that monitor vital signs and alert the right set of doctors and nurses to take action.

In the in-stream portion, when “supervisor” or “desired” data are available (in the stock price prediction case, *actual* prices are known at the end of the day), this information can be used to “learn” via Exact Recursive updates (shown by purple arrows).
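A minimal self-contained sketch of this in-stream learning loop, using a recursive least-squares (Kalman-style) update; the toy data and notation are mine, not the Rocket Kalman internals:

```python
import numpy as np

# Predict in-stream, then fold in the "desired" value via an exact
# recursive update once it becomes available (e.g., at the end of the day).
rng = np.random.default_rng(2)
w, P = np.zeros(2), np.eye(2) * 10.0      # weight estimate and its covariance
R = 0.1                                   # assumed observation-noise variance

for n in range(50):
    x = rng.normal(size=2)                                   # today's features
    y_hat = x @ w                                            # real-time prediction
    y_true = x @ np.array([1.0, -2.0]) + 0.1 * rng.normal()  # arrives later
    # Exact recursive update once the supervisor data arrive:
    k = P @ x / (x @ P @ x + R)           # gain
    w = w + k * (y_true - y_hat)          # correct the estimate
    P = P - np.outer(k, x) @ P            # shrink uncertainty
print(w)                                  # converges toward [1, -2]
```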

A full description of the Rocket Kalman structure is beyond our scope here, but the time-varying state-space data model, non-linear projection onto a higher dimension (M >

DML as an Improvement to “Train then Test” ML

Once we recognize that no map is ever truly static and that a static map is only an approximation of a truly dynamical map, DML can be exploited for traditional ML applications, where input-output pairs are first used to reverse engineer the map, which is then reused with new and unseen data.

Here, the arrival sequence of inputs is important since faults will, in all likelihood, increase in amplitude and frequency of occurrence as time passes before ending in a machine failure (abrupt failures are not predictable, however sophisticated our ML is!).

Then, when you want to predict failure in a different machine, inputs from this test machine (the “Test set”) can be compared to the machine used for training for *temporal location similarity*.

In the approach described here using DML, we use the “video frame” that is the *best match*, thus opening up the possibility of significantly better results!
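One way to make *temporal location similarity* concrete is a sliding-window best match; the Euclidean metric below is my assumption, since the article does not fix one:

```python
import numpy as np

# Slide the test window along the training sequence and pick the
# best-matching temporal location ("video frame").
def best_match(train_seq, test_window):
    L = len(test_window)
    dists = [np.linalg.norm(train_seq[i:i + L] - test_window)
             for i in range(len(train_seq) - L + 1)]
    return int(np.argmin(dists))               # index into the training timeline

train_seq = np.sin(np.linspace(0, 20, 500))    # stand-in degradation signal
test_window = train_seq[310:330] + 0.01 * np.random.default_rng(3).normal(size=20)
print(best_match(train_seq, test_window))      # ~310
```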

Static vs. DYNAMICAL Machine Learning – What is the Difference?

In an earlier blog, “Need for DYNAMICAL Machine Learning: Bayesian exact recursive estimation”, I introduced the need for Dynamical ML as we now enter the “Walk” stage of the “Crawl-Walk-Run” evolution of machine learning.

First, I defined Static ML as follows: Given a set of inputs and outputs, find a static map between the two during supervised “Training” and use this static map for business purposes during “Operation”.

If you subscribe to the view that true learning is “generalization from past experience AND the results of new action” and that ML business solutions therefore ought to be like flu shots (adjust the mix based on response effect and apply on a regular basis), then every ML application is a case of Dynamical Machine Learning.

The main confusion seems to be due to the fact that learning rules in most Static ML algorithms are step-by-step or iterative – is this not a case of “dynamic” learning?

Given N pairs of {x, y}, where, as usual, ‘x’ is the vector of input features and ‘y’ is the desired output in the Training Set, our problem is to find ‘f’, where ‘f’ is a linear or nonlinear and time-invariant or time-varying function.

The complete solution, from the Bayesian estimation perspective, for the unknown function, f, is the Conditional Expectation (mean) of y given the regressor, x.
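In symbols, this standard Bayesian regression result reads: f̂(x) = E[y | x] = ∫ y p(y | x) dy, i.e., the learned map at each x is the mean of the conditional (posterior predictive) distribution of y.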

The map, “f”, that we find can be in two forms: (1) parametric, where “f” can be explicitly written out (such as a multiple linear regression equation), or (2) non-parametric, where “f” is not explicitly defined (such as a multi-layer perceptron, deep neural network, etc.).

Once “learned”, this model or map is used with new data for useful tasks such as classification and regression during “operation”.

Multiple Linear Regression Model: y = a0 + a1 x1 + a2 x2 + … + aM xM

y[n] = a0 + a1 x1[n] + a2 x2[n] + … + aM xM[n]

State-space Model:
s[n] = A s[n-1] + D q[n-1]
y[n] = H[n] s[n] + r[n]

Detailed discussion of such models is available in the book, “SYSTEMS Analytics”, but we will note here that output, y, is a function of ‘s’ (so-called “States”) and the first equation shows that these States evolve according to a Markov process.
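For concreteness, here is a minimal Kalman-filter sketch for this data model, assuming Gaussian q and r and (for simplicity) a constant H; this is the textbook exact recursive estimator, not the full Rocket Kalman structure:

```python
import numpy as np

# One exact recursive step for the state-space data model above:
#   s[n] = A s[n-1] + D q[n-1],   y[n] = H s[n] + r[n]
def kalman_step(s, P, y, A, D, H, Q, R):
    # Predict: propagate the state and its covariance through the Markov model.
    s_pred = A @ s
    P_pred = A @ P @ A.T + D @ Q @ D.T
    # Update: correct the prediction with the new measurement y[n].
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    s_new = s_pred + K @ (y - H @ s_pred)
    P_new = (np.eye(len(s)) - K @ H) @ P_pred
    return s_new, P_new

# Toy usage with assumed matrices (position-velocity model):
A = np.array([[1.0, 1.0], [0.0, 1.0]]); D = np.eye(2); H = np.array([[1.0, 0.0]])
s, P = np.zeros(2), np.eye(2)
s, P = kalman_step(s, P, np.array([0.7]), A, D, H, Q=0.01 * np.eye(2), R=0.1 * np.eye(1))
print(s)
```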

The SYSTEMS Analytics book explains more details under the Formal Learning Methods section (page 51) – Jacobian and Hessian matrices come into play in advanced Gradient Search methods.

Once the conditional posterior distribution of ‘y’ in the State-space data model is estimated, we still need its “point” estimate for practical applications (“what is the class label?”).

Such a scalar-valued result is obtained from Statistical Decision Theory by minimizing a loss function that incorporates the posterior distribution (as an example, for a quadratic loss function, the optimum is the expected value, which results in the Minimum Mean Squared Error estimate).
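Spelled out: with quadratic loss L(y, ŷ) = (y − ŷ)², minimizing E[(y − ŷ)² | x] over ŷ gives ŷ* = E[y | x], the posterior mean; for 0-1 loss in classification, the optimum is instead the posterior mode, i.e., the most probable class label.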

In this configuration for pattern classification, attributes/features, u[n], that may have temporal dependence are processed (once) in real time as each u[n] arrives.

The Real-time Recursive algorithm (Kalman with modifications for the time-varying dynamical case) processes the current data (and “memorized” past inputs and outputs) and provides a “class label” prediction sequentially.

Introduction to Anomaly Detection

Simple Statistical Methods

The simplest approach to identifying irregularities in data is to flag the data points that deviate from common statistical properties of a distribution, including mean, median, mode, and quantiles.
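A minimal sketch of this idea, flagging points by the 3-sigma and IQR rules (both thresholds are conventional defaults, not prescriptions from the article):

```python
import numpy as np

# Flag points that sit far from the bulk of the distribution.
def flag_anomalies(x, k_sigma=3.0):
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()               # distance from the mean, in sigmas
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    outside_iqr = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
    return (np.abs(z) > k_sigma) | outside_iqr

data = np.concatenate([np.random.default_rng(4).normal(size=200), [8.0, -7.5]])
print(np.where(flag_anomalies(data))[0])       # indices of flagged points
```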

(You can find a very intuitive explanation of it here.)

Challenges

The low pass filter allows you to identify anomalies in simple use cases, but there are certain situations where this technique won't work.
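For concreteness, here is a sketch of the moving-average (low-pass) scheme referred to above; the window length and threshold are illustrative choices:

```python
import numpy as np

# Compare each point to a rolling mean (low-pass baseline) and flag large deviations.
def rolling_anomalies(x, window=20, k=3.0):
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    baseline = np.convolve(x, kernel, mode="same")   # moving-average low-pass filter
    resid = x - baseline
    return np.abs(resid) > k * resid.std()

t = np.linspace(0, 10, 500)
signal = np.sin(t) + 0.05 * np.random.default_rng(5).normal(size=t.size)
signal[250] += 2.0                                   # injected spike
print(np.where(rolling_anomalies(signal))[0])        # should include index 250
```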

The nearest set of data points is evaluated using a score, which could be the Euclidean distance or a similar measure dependent on the type of the data (categorical or numerical).
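A sketch of such a nearest-neighbour score, using the mean Euclidean distance to the k closest training points (k and any decision cutoff would be application choices):

```python
import numpy as np

# Anomaly score: average Euclidean distance to the k nearest training points.
def knn_score(train, query, k=5):
    d = np.linalg.norm(train - query, axis=1)   # distance to every training point
    return np.sort(d)[:k].mean()                # mean of the k nearest

rng = np.random.default_rng(6)
train = rng.normal(size=(300, 2))
print(knn_score(train, np.array([0.1, -0.2])))  # small score: looks normal
print(knn_score(train, np.array([6.0, 6.0])))   # large score: likely anomalous
```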

The algorithm learns a soft boundary around the normal data instances using the training set, and then, for each testing instance, it determines whether the point falls inside or outside the learned region, flagging the points outside as abnormalities.
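A sketch of this soft-boundary approach with scikit-learn's OneClassSVM (the parameter values are illustrative):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Fit a soft boundary around (mostly) normal data, then label test points.
rng = np.random.default_rng(7)
X_train = rng.normal(size=(200, 2))                 # assumed-normal instances
model = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(X_train)

X_test = np.array([[0.0, 0.1], [5.0, 5.0]])
print(model.predict(X_test))                        # +1 = normal, -1 = anomaly
```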

Mod-08 Lec-32 Linear Stochastic Dynamics - Kalman Filter

Dynamic Data Assimilation: an introduction by Prof. S. Lakshmivarahan, School of Computer Science, University of Oklahoma. For more details on NPTEL visit ...

Forecasting Stock Returns with TensorFlow, Cloud ML Engine, and Thomson Reuters

Learn how to build a stock price forecasting model using Thomson Reuters tick data in BigQuery, and how to do distributed training of the model using ...

YOLO Object Detection (TensorFlow tutorial)

You Only Look Once – this object detection algorithm is currently the state of the art, outperforming R-CNN and its variants. I'll go into some different object ...

Thermal Signature Detection using K-Means Segmentation # 2

Thermal Signature Detection using K-Means Segmentation. SVM may be applied later to candidates for classification. Note: This video is for educational and ...