AI News: Blog on Machine Learning, Statistics & Software Development

Hi, my name is Vasilis Vryniotis. I'm a Data Scientist, a Software Engineer, the author of the Datumbox Machine Learning Framework, and a proud geek.

I specialize in building complex Machine Learning Frameworks & Pipelines, developing custom Statistical Models & Algorithms and performing Data Modeling.

In my spare time I develop the Datumbox Machine Learning Framework, contribute to other open-source projects, and from time to time publish articles on this blog about Machine Learning, Statistics and Computer Science.

Empower your people with verifiable digital records.

“Using the blockchain and strong cryptography, it is now possible to create a certification infrastructure that puts us in control of the full record of our achievements and accomplishments.”

Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to 'learn' (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.[1]

These analytical models allow researchers, data scientists, engineers, and analysts to 'produce reliable, repeatable decisions and results' and uncover 'hidden insights' through learning from historical relationships and trends in the data.[12]

Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: 'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.'[13] In spam filtering, for instance, the task T is classifying emails as spam or not, the experience E is a corpus of emails already labelled by users, and the performance measure P is the fraction of new emails classified correctly.

Developmental learning, elaborated for robot learning, generates its own sequences (also called curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers and using guidance mechanisms such as active learning, maturation, motor synergies, and imitation.

Work on symbolic/knowledge-based learning did continue within AI, leading to inductive logic programming, but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval.[17]:708–710

Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases).

Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge.

Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.

Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances (for example, in classification, one wants to assign a label to instances, and models are trained to correctly predict the pre-assigned labels of a set of examples).
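
To make this concrete, here is a minimal sketch (not from the article; the labels and predicted probabilities are invented) that computes two common classification losses with NumPy:

```python
import numpy as np

# Invented example: pre-assigned labels and a model's predicted probabilities
y_true = np.array([1, 0, 1, 1, 0])
p_pred = np.array([0.9, 0.2, 0.6, 0.4, 0.1])  # predicted P(label = 1)

# Zero-one loss: fraction of hard predictions that disagree with the labels
y_hat = (p_pred >= 0.5).astype(int)
zero_one = np.mean(y_hat != y_true)

# Log loss (cross-entropy): penalizes confident wrong probabilities heavily
eps = 1e-12  # guard against log(0)
log_loss = -np.mean(y_true * np.log(p_pred + eps)
                    + (1 - y_true) * np.log(1 - p_pred + eps))

print(f"zero-one loss: {zero_one:.2f}, log loss: {log_loss:.3f}")
```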

The difference between the two fields arises from the goal of generalization: while optimization algorithms can minimize the loss on a training set, machine learning is concerned with minimizing the loss on unseen samples.[19]

The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences) and the learner has to build a general model about this space that enables it to produce sufficiently accurate predictions in new cases.
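
The gap between optimization and generalization can be seen in a small experiment. In the sketch below (everything is an assumption made for illustration: the sine-plus-noise data, the sample sizes, the polynomial degrees), the high-degree fit drives the training loss far lower than the low-degree one, yet does worse on fresh samples from the same distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# An invented 1-D regression task: noisy samples from an "unknown" distribution
def sample(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.2, n)

x_train, y_train = sample(15)
x_test, y_test = sample(200)  # stands in for unseen cases

for degree in (2, 12):
    coeffs = np.polyfit(x_train, y_train, degree)  # minimizes training loss
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```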

An artificial neural network (ANN), usually just called a 'neural network' (NN), is a learning algorithm vaguely inspired by biological neural networks.

They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.

Falling hardware prices and the development of GPUs for personal use in the last few years have contributed to the rise of deep learning, which consists of multiple hidden layers in an artificial neural network.
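
As a hedged illustration, the sketch below trains a small multilayer network with scikit-learn's MLPClassifier on an invented toy dataset; the two stacked hidden layers are a miniature version of the "multiple hidden layers" idea behind deep learning:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Invented toy dataset with a non-linear decision boundary
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers of 16 units each; depth is what "deep" refers to
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
print(f"test accuracy: {net.score(X_test, y_test):.2f}")
```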

Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples.

Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
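
A minimal sketch of that workflow, assuming scikit-learn's SVC and a synthetic two-category dataset (all of it invented for the example):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic training examples, each marked as belonging to one of two categories
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf")   # build a model from the labelled examples
clf.fit(X_train, y_train)

# Predict which category new examples fall into
print(clf.predict(X_test[:5]))
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```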

Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar.

Different clustering techniques make different assumptions on the structure of the data, often defined by some similarity metric and evaluated for example by internal compactness (similarity between members of the same cluster) and separation between different clusters.
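
For example, k-means (one technique among many, sketched below on synthetic data) assumes roughly spherical clusters under a Euclidean similarity metric; the silhouette score is one way to evaluate internal compactness together with separation:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic unlabelled observations drawn from three groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Silhouette combines within-cluster similarity and between-cluster separation
print(f"silhouette score: {silhouette_score(X, labels):.2f}")
```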

A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG).
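
A tiny hand-coded example (the Rain/Sprinkler/WetGrass network and all of its probabilities are invented for illustration) shows how a DAG plus conditional probability tables defines a joint distribution:

```python
# Hypothetical two-parent network: Rain -> WetGrass <- Sprinkler
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
# P(WetGrass = True | Rain, Sprinkler)
P_wet = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.8, (False, False): 0.05}

def joint(rain, sprinkler, wet):
    """Chain rule on the DAG: P(R) * P(S) * P(W | R, S)."""
    p_w = P_wet[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[sprinkler] * (p_w if wet else 1 - p_w)

# P(Rain = True | WetGrass = True) by enumerating the joint distribution
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(f"P(Rain | WetGrass) = {num / den:.2f}")
```

In practice one would reach for a dedicated library, but the chain-rule factorization over the DAG is the core idea.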

Representation learning algorithms often attempt to preserve the information in their input while transforming it into a form that makes it useful, often as a pre-processing step before performing classification or prediction. Such representations allow reconstruction of inputs coming from the unknown data-generating distribution, while not necessarily being faithful to configurations that are implausible under that distribution.

Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features.
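
PCA is only a linear, single-level special case of representation learning, but it illustrates the pre-processing role described above. The pipeline below is a sketch under that assumption, learning a 16-dimensional representation of scikit-learn's 64-pixel digit images before classifying them:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA learns a compact representation of the 64 input pixels;
# the classifier then works in that transformed space.
model = make_pipeline(PCA(n_components=16), LogisticRegression(max_iter=5000))
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```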

A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem.
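
A toy sketch of those ingredients, evolving an invented target string with selection, crossover, and mutation (the population size, rates, and fitness function are all arbitrary choices for the example):

```python
import random

random.seed(0)
TARGET = "MACHINE LEARNING"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(genotype):
    # Number of characters that already match the target
    return sum(a == b for a, b in zip(genotype, TARGET))

def mutate(genotype, rate=0.05):
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in genotype)

def crossover(a, b):
    cut = random.randrange(len(a))
    return a[:cut] + b[cut:]

population = ["".join(random.choice(ALPHABET) for _ in TARGET)
              for _ in range(100)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    if population[0] == TARGET:
        break
    parents = population[:20]  # selection: keep the fittest genotypes
    population = [mutate(crossover(random.choice(parents),
                                   random.choice(parents)))
                  for _ in range(100)]

best = max(population, key=fitness)
print(f"generation {generation}: {best}")
```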

In 2006, the online movie company Netflix held the first 'Netflix Prize' competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%.

Classification machine learning models can be validated by accuracy-estimation techniques like the holdout method, which splits the data into training and test sets (conventionally a 2/3 training and 1/3 test designation) and evaluates the performance of the trained model on the test set.

In comparison, the k-fold cross-validation method randomly partitions the data into k subsets; in each round, k - 1 of the subsets are used to train the model while the remaining subset is used to test its predictive ability, so that every subset serves as the test set exactly once.
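
Both techniques are available off the shelf; the sketch below, assuming scikit-learn and its bundled iris data, runs the conventional 2/3-1/3 holdout split and a 5-fold cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Holdout: conventional 2/3 training, 1/3 test designation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0)
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.2f}")

# k-fold cross-validation: each of the k subsets takes a turn as the test set
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(f"5-fold mean accuracy: {scores.mean():.2f}")
```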

For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[50][51]

There is huge potential for machine learning in health care to provide professionals with powerful tools to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these 'greed' biases, are addressed.[53]

About Us

As the only blockchain-based records provider in the world with an in-market product for multi-chain issuing and self-sovereign digital identity, we are revolutionizing the way businesses in all sectors issue and verify claims and the way individuals understand and use their digital identities.

With MIT launch, Learning Machine raises seed to replace paper with blockchain credentials

Transcripts, diplomas, resumes — simple documents with enormously important economic consequences for their holders.

The right classes or GPA on a transcript can radically change the career prospects of a young graduate, and yet, the infrastructure that manages these critical documents still centers on mailing official paper to processing centers.

“These kids started to Snapchat their grades and send them in [and they] didn’t understand why it wasn’t that easy.” Jagers founded SlideRoom in 2007, and built it into a growing business across university admissions departments.

“And we were thinking, ‘How cool would it be to use the blockchain as a decentralized verification network?’” After 10 years of building SlideRoom, Jagers finally accepted an offer to sell last summer and work on Learning Machine full-time.

Jagers believes that the focus on open standards and interoperability will give Learning Machine an edge in a market filled with blockchain credentials startups.

The company has developed a set of schemas that institutions can use to standardize credentials, and Learning Machine is developing more schemas as clients require more flexibility.

For instance, completing the final class of a degree program wouldn't just add that class's grade to your transcript, but would also confer the degree itself.

Right now though, “It’s about showing success with this first round of pilots.” If blockchain is going to work in the enterprise, the combination of deep expertise, traditional business models, and open standards appears to be a potentially winning formula.

Essential Tools for Machine Learning - MATLAB Video

See what's new in the latest release of MATLAB and Simulink. Download a trial. Machine learning is quickly ...

Learn Machine Learning in 3 Months (with curriculum)

How is a total beginner supposed to get started learning machine learning? I'm going to describe a 3-month curriculum to help you go from beginner to ...

Hello World - Machine Learning Recipes #1

Six lines of Python is all it takes to write your first machine learning program! In this episode, we'll briefly introduce what machine learning is and why it's ...

Predicting Stock Prices - Learn Python for Data Science #4

In this video, we build an Apple Stock Prediction script in 40 lines of Python using the scikit-learn library and plot the graph using the matplotlib library.

Machine Learning? Teaching Computers!!! What is it?

Hello friends, this is a very interesting video; many of you have probably heard about Machine Learning before, or ...

DensePose - 3D Machine Vision

Can machine vision map humans from videos to 3D Models? Yes! DensePose is a new architecture by the team at Facebook AI research that does just that.

Anshul Kundaje: Machine learning to decode the genome

The future of personalized medicine is inevitably connected to the future of artificial intelligence, says Anshul Kundaje, assistant professor of genetics and of ...

Lecture 05 - Training Versus Testing

Training versus Testing - The difference between training and testing in mathematical terms. What makes a learning model able to generalize? Lecture 5 of 18 of ...

1. Introduction to Statistics

NOTE: This video was recorded in Fall 2017. The rest of the lectures were recorded in Fall 2016, but video of Lecture 1 was not available. MIT 18.650 Statistics ...

Build a TensorFlow Image Classifier in 5 Min

In this episode we're going to train our own image classifier to detect Darth Vader images. The code for this repository is here: ...