# AI News, Machine Learning-Algorithms and Applications

## Machine Learning-Algorithms and Applications

Over the past couple of weeks, I&rsquo;ve talked about machine learning (ML) and how it can help you get more out of your analytics and data integration efforts.

Because ML is not a specific technology, but rather a deep and complex set of mathematical algorithms, it&rsquo;s important to understand which types of algorithms will help you get the insights you seek from your data&mdash;and which will give you insights you didn&rsquo;t even know you wanted.

Of course, it won&rsquo;t be 100% accurate (humans value and run companies, so there&rsquo;s irrationality, and therefore unpredictability) but with enough time and data, the model will get really good at predicting the Dow-Jones Average.

Given enough data and time, the algorithm will show you patterns in the data that you would never discover using supervised methods.However, because the outputs aren&rsquo;t known in advance, it&#39;s often difficult to attest to the validity of the model.

The best of both worlds are hybrid algorithms that combine elements of both supervised and unsupervised learning methods to couple the relative certainty of supervised learning with the power and novel insight generation of unsupervised learning.

The bottom-line benefit is that reinforcement learning can help you discover the optimal outcomes you seek, and it can reveal outcomes that you didn&rsquo;t seek, but that you can leverage to optimize your operations.

There are many other techniques&mdash;such as anomaly detection to help detect fraud and bolster risk management efforts&mdash;that can help you improve your analytics and increase your bottom line.

## Unsupervised learning

Unsupervised machine learning is the machine learning task of inferring a function that describes the structure of 'unlabeled' data (i.e.

central application of unsupervised learning is in the field of density estimation in statistics,[1] though unsupervised learning encompasses many other problems (and solutions) involving summarizing and explaining various key features of data.

In particular, the method of moments is shown to be effective in learning the parameters of latent variable models.[5] Latent variable models are statistical models where in addition to the observed variables, a set of latent variables also exists which is not observed.

A highly practical example of latent variable models in machine learning is the topic modeling which is a statistical model for generating the words (observed variables) in the document based on the topic (latent variable) of the document.

It is shown that method of moments (tensor decomposition techniques) consistently recover the parameters of a large class of latent variable models under some assumptions.[5] The Expectation–maximization algorithm (EM) is also one of the most practical methods for learning latent variable models.

According to Giora Engel, co-founder of LightCyber, in a Dark Reading article, 'The great promise machine learning holds for the security industry is its ability to detect advanced and unknown attacks—particularly those leading to data breaches.'[6] The basic premise is that a motivated attacker will find their way into a network (generally by compromising a user's computer or network account through phishing, social engineering or malware).

## Essentials of Machine Learning Algorithms (with Python and R Codes)

Note: This article was originally published on Aug 10, 2015 and updated on Sept 9th, 2017

Google&#8217;s self-driving cars and robots get a lot of press, but the company&#8217;s real future is in machine learning, the technology that enables computers to get smarter and more personal.

The idea behind creating this guide is to simplify the journey of aspiring data scientists and machine learning enthusiasts across the world.

How it works: This algorithm consist of a target / outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables).

Using these set of variables, we generate a function that map inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data.

This machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions.

These algorithms can be applied to almost any data problem: It is used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s).

In this equation: These coefficients a and b are derived based on minimizing the sum of squared difference of distance between data points and regression line.

And, Multiple Linear Regression(as the name suggests) is characterized by multiple (more than 1) independent variables. While finding best fit line, you can fit a polynomial or curvilinear regression.

It is a classification not a regression algorithm. It is used to estimate discrete values ( Binary values like 0/1, yes/no, true/false ) based on given set of independent variable(s).

It chooses parameters that maximize the likelihood of observing the sample values rather than that minimize the sum of squared errors (like in ordinary regression).

source: statsexchange In the image above, you can see that population is classified into four different groups based on multiple attributes to identify &#8216;if they will play or not&#8217;.

In this algorithm, we plot each data item as a point in n-dimensional space (where n is number of features you have) with the value of each feature being the value of a particular coordinate.

For example, if we only had two features like Height and Hair length of an individual, we&#8217;d first plot these two variables in two dimensional space where each point has two co-ordinates (these co-ordinates are known as Support Vectors)

In the example shown above, the line which splits the data into two differently classified groups is the black line, since the two closest points are the farthest apart from the line.

It is a classification technique based on Bayes’ theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

Step 1: Convert the data set to frequency table Step 2: Create Likelihood table by finding the probabilities like Overcast probability = 0.29 and probability of playing is 0.64.

Yes) * P(Yes) / P (Sunny) Here we have P (Sunny |Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P( Yes)= 9/14 = 0.64 Now, P (Yes |

However, it is more widely used in classification problems in the industry. K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k neighbors.

Its procedure follows a simple and easy  way to classify a given data set through a certain number of  clusters (assume k clusters).

We know that as the number of cluster increases, this value keeps on decreasing but if you plot the result you may see that the sum of squared distance decreases sharply up to some value of k, and then much more slowly after that.

grown as follows: For more details on this algorithm, comparing with decision tree and tuning model parameters, I would suggest you to read these articles: Python R Code

For example: E-commerce companies are capturing more details about customer like their demographics, web crawling history, what they like or dislike, purchase history, feedback and many others to give them personalized attention more than your nearest grocery shopkeeper.

How&#8217;d you identify highly significant variable(s) out 1000 or 2000? In such cases, dimensionality reduction algorithm helps us along with various other algorithms like Decision Tree, Random Forest, PCA, Factor Analysis, Identify based on correlation matrix, missing value ratio and others.

GBM is a boosting algorithm used when we deal with plenty of data to make a prediction with high prediction power. Boosting is actually an ensemble of learning algorithms which combines the prediction of several base estimators in order to improve robustness over a single estimator.

The XGBoost has an immensely high predictive power which makes it the best choice for accuracy in events as it possesses both linear model and the tree learning algorithm, making the algorithm almost 10x faster than existing gradient booster techniques.

It is designed to be distributed and efficient with the following advantages: The framework is a fast and high-performance gradient boosting one based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Since the LightGBM is based on decision tree algorithms, it splits the tree leaf wise with the best fit whereas other boosting algorithms split the tree depth wise or level wise rather than leaf-wise.

So when growing on the same leaf in Light GBM, the leaf-wise algorithm can reduce more loss than the level-wise algorithm and hence results in much better accuracy which can rarely be achieved by any of the existing boosting algorithms.

Catboost can automatically deal with categorical variables without showing the type conversion error, which helps you to focus on tuning your model better rather than sorting out trivial errors.

My sole intention behind writing this article and providing the codes in R and Python is to get you started right away. If you are keen to master machine learning, start right away.

## Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to 'learn' (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.[1] The name machine learning was coined in 1959 by Arthur Samuel.[2] Evolved from the study of pattern recognition and computational learning theory in artificial intelligence,[3] machine learning explores the study and construction of algorithms that can learn from and make predictions on data[4] – such algorithms overcome following strictly static program instructions by making data-driven predictions or decisions,[5]:2 through building a model from sample inputs.

Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: 'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.'[13] This definition of the tasks in which machine learning is concerned offers a fundamentally operational definition rather than defining the field in cognitive terms.

Machine learning tasks are typically classified into two broad categories, depending on whether there is a learning 'signal' or 'feedback' available to a learning system: Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system:[5]:3 Among other categories of machine learning problems, learning to learn learns its own inductive bias based on previous experience.

Developmental learning, elaborated for robot learning, generates its own sequences (also called curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers and using guidance mechanisms such as active learning, maturation, motor synergies, and imitation.

Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation.[17]:488 By 1980, expert systems had come to dominate AI, and statistics was out of favor.[18] Work on symbolic/knowledge-based learning did continue within AI, leading to inductive logic programming, but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval.[17]:708–710;

Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases).

Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge.

Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.[20] He also suggested the term data science as a placeholder to call the overall field.[20] Leo Breiman distinguished two statistical modelling paradigms: data model and algorithmic model,[21] wherein 'algorithmic model' means more or less the machine learning algorithms like Random forest.

Multilinear subspace learning algorithms aim to learn low-dimensional representations directly from tensor representations for multidimensional data, without reshaping them into (high-dimensional) vectors.[26] Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features.

In machine learning, genetic algorithms found some uses in the 1980s and 1990s.[30][31] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[32] Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves 'rules' to store, manipulate or apply, knowledge.

They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions.[34] Applications for machine learning include: In 2006, the online movie company Netflix held the first 'Netflix Prize' competition to find a program to better predict user preferences and improve the accuracy on its existing Cinematch movie recommendation algorithm by at least 10%.

A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for \$1&#160;million.[40] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ('everything is a recommendation') and they changed their recommendation engine accordingly.[41] In 2010 The Wall Street Journal wrote about the firm Rebellion Research and their use of Machine Learning to predict the financial crisis.

[42] In 2012, co-founder of Sun Microsystems Vinod Khosla predicted that 80% of medical doctors jobs would be lost in the next two decades to automated machine learning medical diagnostic software.[43] In 2014, it has been reported that a machine learning algorithm has been applied in Art History to study fine art paintings, and that it may have revealed previously unrecognized influences between artists.[44] Although machine learning has been very transformative in some fields, effective machine learning is difficult because finding patterns is hard and often not enough training data are available;

as a result, machine-learning programs often fail to deliver.[45][46] Classification machine learning models can be validated by accuracy estimation techniques like the Holdout method, which splits the data into a training and test sets (conventionally 2/3 training set and 1/3 test set designation) and evaluates the performance of the training model on the test set.

Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[49] For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[50][51] Responsible collection of data and documentation of algorithmic rules used by a system thus is a critical part of machine learning.

There is huge potential for machine learning in health care to provide professionals a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these 'greed' biases are addressed.[53] Software suites containing a variety of machine learning algorithms include the following&#160;:

## Supervised vs. Unsupervised Machine Learning

Data scientists use many different kinds of machine learning algorithms to discover patterns in big data that lead to actionable insights.

On the other hand, unsupervised machine learning is more closely aligned with what some call true artificial intelligence — the idea that a computer can learn to identify complex processes and patterns without a human to provide guidance along the way.

While a supervised classification algorithm learns to ascribe inputted labels to images of animals, its unsupervised counterpart will look at inherent similarities between the images and separate them into groups accordingly, assigning its own new label to each group.

In a practical example, this type of algorithm is useful for customer segmentation because it will return groups based on parameters that a human may not consider due to pre-existing biases about the company’s demographic.

## How to choose algorithms for Microsoft Azure Machine Learning

The answer to the question &quot;What machine learning algorithm should I use?&quot;

It depends on how the math of the algorithm was translated into instructions for the computer you are using.

Even the most experienced data scientists can&#39;t tell which algorithm will perform best before trying them.

The Microsoft Azure Machine Learning Algorithm Cheat Sheet helps you choose the right machine learning algorithm for your predictive analytics solutions from the Microsoft Azure Machine Learning library of algorithms. This

This cheat sheet has a very specific audience in mind: a beginning data scientist with undergraduate-level machine learning, trying to choose an algorithm to start with in Azure Machine Learning Studio.

That means that it makes some generalizations and oversimplifications, but it points you in a safe direction.

As Azure Machine Learning grows to encompass a more complete set of available methods, we&#39;ll add them.

These recommendations are compiled feedback and tips from many data scientists and machine learning experts.

We didn&#39;t agree on everything, but I&#39;ve tried to harmonize our opinions into a rough consensus.

data scientists I talked with said that the only sure way to find

Supervised learning algorithms make predictions based on a set of examples.

For instance, historical stock prices can be used to hazard guesses

company&#39;s financial data, the type of industry, the presence of disruptive

it uses that pattern to make predictions for unlabeled testing data—tomorrow&#39;s

Supervised learning is a popular and useful type of machine learning.

In unsupervised learning, data points have no labels associated with them.

grouping it into clusters or finding different ways of looking at complex

In reinforcement learning, the algorithm gets to choose an action in response

signal a short time later, indicating how good the decision was. Based

where the set of sensor readings at one point in time is a data

The number of minutes or hours necessary to train a model varies a great deal

time is limited it can drive the choice of algorithm, especially when

regression algorithms assume that data trends follow a straight line.

These assumptions aren&#39;t bad for some problems, but on others they bring

Non-linear class boundary - relying on a linear classification algorithm

Data with a nonlinear trend - using a linear regression method would generate

much larger errors than necessary Despite their dangers, linear algorithms are very popular as a first line

Parameters are the knobs a data scientist gets to turn when setting up an

as error tolerance or number of iterations, or options between variants

to make sure you&#39;ve spanned the parameter space, the time required to

train a model increases exponentially with the number of parameters.

For certain types of data, the number of features can be very large compared

algorithms, making training time unfeasibly long.

Some learning algorithms make particular assumptions about the structure of

- shows excellent accuracy, fast training times, and the use of linearity ○

- shows good accuracy and moderate training times As mentioned previously, linear regression fits

curve instead of a straight line makes it a natural fit for dividing

logistic regression to two-class data with just one feature - the class

boundary is the point at which the logistic curve is just as close to both classes Decision forests (regression, two-class, and multiclass), decision

all based on decision trees, a foundational machine learning concept.

decision tree subdivides a feature space into regions of roughly uniform

values Because a feature space can be subdivided into arbitrarily small regions,

it&#39;s easy to imagine dividing it finely enough to have one data point

a large set of trees are constructed with special mathematical care

memory at the expense of a slightly longer training time.

Boosted decision trees avoid overfitting by limiting how many times they can

a variation of decision trees for the special case where you want to know

that input features are passed forward (never backward) through a sequence

a long time to train, particularly for large data sets with lots of features.

typical support vector machine class boundary maximizes the margin separating

Any new data points that fall far outside that boundary

PCA-based anomaly detection - the vast majority of the data falls into

data set is grouped into five clusters using K-means There is also an ensemble one-v-all multiclass classifier, which

Machine Learning in R - Supervised vs. Unsupervised

Learn the basics of Machine Learning with R. Start our Machine Learning Course for free: ...

Machine Learning Algorithms | Machine Learning Tutorial | Data Science Training | Edureka

Data Science Training - ) This Machine Learning Algorithms Tutorial shall teach you what machine learning is, and the ..

Comparing machine learning models in scikit-learn

We've learned how to train different machine learning models and make predictions, but how do we actually choose which model is "best"? We'll cover the ...

Machine Learning and Predictive Analytics - Supervised & Unsupervised Algorithms - #MachineLearning

Machine Learning and Predictive Analytics. #MachineLearning Learn More: (Fundamentals Of Machine Learning for Predictive Data ..

Machine Learning in R - Classification, Regression and Clustering Problems

Learn the basics of Machine Learning with R. Start our Machine Learning Course for free: ...

Data Analysis: Clustering and Classification (Lec. 1, part 1)

Supervised and unsupervised learning algorithms.

Unsupervised Algorithms in Machine Learning

In this video you will learn what are Unsupervised machine learning algorithms For Training & Study packs on Analytics/Data Science/Big Data, Contact us at ...

EM algorithm: how it works

Full lecture: Mixture models are a probabilistically-sound way to do soft clustering. We assume our data is sampled from K different sources ..

8 Machine Learning Algorithms for beginners | Data Science

In this video you will learn about 8 most important,simple and useful algorithms that every data scientist or aspiring data scientist must know. You will learn in ...

Machine Learning |Ten Poplar ML Algorithms | Supervised & Unsupervised Learning

In this video you will learn about ten most used machine learning algorithms. The algorithms include both supervised & unsupervised algorithms & also include ...