AI News, QA – Why Machine Learning Systems are Non-testable

QA – Why Machine Learning Systems are Non-testable

This post presents views on why machine learning systems or models are termed non-testable from a quality control/quality assurance perspective.

Before I proceed, let me humbly state that the data science/machine learning community has long held that ML models are testable, since they are first trained and then tested using techniques such as cross-validation to evaluate and optimize model performance. However, “testing” the model here refers to the development (model-building) phase, when data scientists check model performance by comparing the model outputs (predicted values) with the actual values. This is not the same as testing the model on an arbitrary input for which the expected output value is not known beforehand.
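For instance, a typical cross-validation check during model building might look like the following sketch (scikit-learn is assumed here purely for illustration; the dataset and model are placeholders):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# "Testing" during model building: predictions are scored against actual
# labels that are already known for each held-out fold.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())

Note that every label used to score the model here was known in advance; that is precisely the situation that does not hold for arbitrary production inputs.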

Given that machine learning systems are non-testable, performing QA or quality control checks on them is not easy, and this is a matter of concern given the trust that end users need to place in such systems.

Project stakeholders need to understand the non-testability aspects of machine learning systems in order to put appropriate quality controls in place and serve trustworthy machine learning models to end users in production.

In testing of software applications, a frequently invoked assumption is that there are testers or external mechanisms, such as automated software tests (unit tests/integration tests), which can accurately determine whether or not the output produced by the program is correct. Such a mechanism is known as a test oracle.



It is easier to perform QC checks/testing on conventional software applications because the outputs for different classes of inputs can be verified against expected values that are known before testing starts.
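For example, a conventional unit test carries its own oracle: the expected value is written down before the test is ever run. A minimal sketch (the function and expected values are invented for illustration):

def fahrenheit_to_celsius(f):
    return (f - 32) * 5.0 / 9.0

# The test oracle: expected outputs are known before the test runs,
# so correctness can be checked mechanically.
def test_fahrenheit_to_celsius():
    assert fahrenheit_to_celsius(32) == 0.0
    assert fahrenheit_to_celsius(212) == 100.0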


A software program can be termed non-testable in scenarios where such a test oracle does not exist or is too difficult to apply. In cases where the testers or test mechanisms can state whether the program output is correct without knowing the exact correct answer beforehand, the mechanism is termed a partial oracle.

Given that ML models are non-testable due to the absence of a test oracle, let's look at some of the ways (pseudo-oracles) that could be used to perform quality control checks on machine learning models.
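One classic pseudo-oracle is an independently written implementation of the same specification: the two programs are run on the same inputs and their outputs are compared with each other, rather than against an expected value that nobody knows. A minimal sketch (scikit-learn and NumPy are assumed here purely for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Generate inputs for which no "expected output" is known in advance.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Implementation under test.
model = LinearRegression().fit(X, y)

# Pseudo-oracle: an independent least-squares solve of the same problem.
X_aug = np.hstack([X, np.ones((100, 1))])  # append an intercept column
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

# The check compares the two implementations with each other.
assert np.allclose(model.coef_, coef[:-1], atol=1e-6)
assert np.allclose(model.intercept_, coef[-1], atol=1e-6)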

How to Train TensorFlow Models Using GPUs

In recent years, there has been significant progress in the field of machine learning.

Much of this progress can be attributed to the increasing use of graphics processing units (GPUs) to accelerate the training of machine learning models.

In particular, the extra computational power has led to the popularization of deep learning: the use of complex, multi-level neural networks to create models capable of feature detection from large amounts of unlabeled training data.

GPUs are great for deep learning because the types of calculations they were designed to process are the same as those encountered in deep learning.

Images, videos, and other graphics are represented as matrices, so when you perform an operation such as a zoom-in effect or a camera rotation, all you are doing is applying some mathematical transformation to a matrix.
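As a toy illustration of that point, a camera rotation is just a matrix multiplication applied to pixel coordinates (NumPy is assumed here; the points are arbitrary):

import numpy as np

# Rotate 2D pixel coordinates by 90 degrees: one matrix multiplication.
theta = np.pi / 2
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

points = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # pixel coordinates
rotated = points @ rotation.T
print(rotated.round(6))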

In practice, this means that GPUs, compared to central processing units (CPUs), are more specialized at performing matrix operations and several other types of advanced mathematical transformations.

If you would like a particular operation to run on a device of your choice instead of using the defaults, you can use with tf.device(...) to create a device context.
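For example, here is a minimal sketch (TensorFlow 2.x eager execution is assumed; '/CPU:0' and '/GPU:0' are TensorFlow's standard device names):

import tensorflow as tf

# Pin an operation to a specific device by opening a device context.
with tf.device('/CPU:0'):  # use '/GPU:0' instead if a GPU is present
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    c = tf.matmul(a, b)  # runs on the chosen device

print(c)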

For benchmarking purposes, we will use a convolutional neural network (CNN) for recognizing images, which is provided as part of the TensorFlow tutorials.
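The tutorial network itself is not reproduced here; as a rough stand-in, the following sketch times the same matrix multiplication on the CPU and, when one is available, on a GPU (the matrix size and repetition count are arbitrary illustration choices):

import time
import tensorflow as tf

def time_matmul(device, n=2000, reps=10):
    # Run the same workload on the requested device and time it.
    with tf.device(device):
        a = tf.random.uniform((n, n))
        start = time.time()
        for _ in range(reps):
            b = tf.linalg.matmul(a, a)
        _ = b.numpy()  # force the (asynchronous) GPU work to finish
        return time.time() - start

print('CPU:', time_matmul('/CPU:0'))
if tf.config.list_physical_devices('GPU'):
    print('GPU:', time_matmul('/GPU:0'))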

Machine Learning: If It’s Testable, It’s Teachable

Ever wondered how you went to YouTube to watch just a five-minute video but ended up there for three hours?

We have to create a complex algorithm that tells the bots to crawl the image pixel by pixel, find patterns in the picture that match pixel patterns usually found in known dog images, and make an educated guess.

The teacher bot itself cannot distinguish between a dog and a number 5, but it can test whether student bots are right in identifying them.

Based on the test data, the builder bot keeps building different student bots by adjusting permutations and combinations of the student bots' algorithm mechanics.

The teacher bot keeps testing, and based on the grades of the student bots, the builder bot keeps the best-performing bots and ruthlessly discards the rest.
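As a toy illustration of this test-build-test loop (every number and the linear "student bot" representation are invented for the sketch):

import random

random.seed(0)
# The teacher bot's question bank: points labeled by a hidden rule the
# student bots must discover (here, whether x1 + x2 exceeds 1.0).
data = [((x1, x2), int(x1 + x2 > 1.0))
        for x1, x2 in [(random.random(), random.random()) for _ in range(200)]]

def grade(bot):
    # Teacher bot: the fraction of questions a student bot answers correctly.
    w1, w2, b = bot
    return sum(int(w1 * x1 + w2 * x2 + b > 0) == y
               for (x1, x2), y in data) / len(data)

def mutate(bot):
    # Builder bot: copy a good student bot with small random tweaks.
    return tuple(v + random.gauss(0, 0.1) for v in bot)

# Start with random student bots, then test, build, test...
bots = [tuple(random.gauss(0, 1) for _ in range(3)) for _ in range(50)]
for _ in range(30):
    bots.sort(key=grade, reverse=True)  # teacher bot tests and ranks
    best = bots[:10]                    # keep the best, discard the rest
    bots = best + [mutate(random.choice(best)) for _ in range(40)]

print('best grade:', grade(max(bots, key=grade)))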

If we give the bot a video of a dog, or a 5 upside down, or the letter 'S' instead of a 5, will the bot still be able to figure that out?

To solve this, humans have to create longer automated test cases with more questions for the student bots to pass, including both wrong and right scenarios so that the bots are prepared for the wrong cases, too.

As there is not a single bot or some ten or twenty questions but millions of bots and zillions of questions, how does the "test, build, test" loop work at that scale?

The task given to the student bots here is to maximize a user's watch time while keeping them engaged; the student bot that keeps the user engaged for the longest watch time scores the highest.

The teacher bots assess all the student bots, and the student bots keep giving recommendations to the users so that the users remain engaged.

How Machines Learn

How do all the algorithms around us learn to do their jobs?

9. Verification and Validation

MIT 16.842 Fundamentals of Systems Engineering, Fall 2015. Instructor: Olivier de Weck.

Finale Doshi-Velez: "A Roadmap for the Rigorous Science of Interpretability" | Talks at Google

With a growing interest in interpretability, there is an increasing need to characterize what exactly we mean by it and how to sensibly compare the interpretability ...

NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Hazy - Making Data-driven...

Big Learning Workshop: Algorithms, Systems, and Tools for Learning at Scale at NIPS 2011 Invited Talk: Hazy: Making Data-driven Statistical Applications ...

Input Space Partitioning

CppCon 2017: Mike Ritchie “Microcontrollers in Micro-increments: A Test-driven C++ Workflow for Embedded Systems”

The Integrated Information Theory of Consciousness

Brains, Minds and Machines Seminar Series The Integrated Information Theory of Consciousness Speaker: Dr. Christof Koch, Chief Scientific Officer, Allen ...

Software Engineers in Test at Google - Covering your (Code)Bases

Richard Miller: More Innovation Through Education [Entire Talk]

Richard Miller, president of Olin College, describes disruptive ideas about education and learning that universities should adopt to graduate more creative, ...