BOOK REVIEW: List of Must-Read Free Data Science Books

List of Must-Read Free Data Science Books

Data science is an interdisciplinary field that draws on methods and techniques from statistics, machine learning, Bayesian inference, and related areas.

Author: Reza Nasiri Mahalati. This compilation by Professor Sanjay emphasizes applied linear algebra and linear dynamical systems, with applications to circuits, signal processing, communications, and control systems.

Authors: Stephen Boyd and Lieven Vandenberghe. This book provides a comprehensive introduction to convex optimization and shows in detail how such problems can be solved numerically with great efficiency.

Author: Sean Luke. This is an open set of lecture notes on metaheuristic algorithms, intended for undergraduate students, practitioners, programmers, and other non-experts.

Author: Hal Daumé III. CIML is a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.).

10 Free Must-Read Machine Learning E-Books For Data Scientists & AI Engineers

So you love reading but can’t afford to splurge too much money on books?

We begin the list with the basics of statistics, move on to machine learning foundations, and finish with advanced machine learning.

One of the stand-out features of this book is that it also covers the basics of Bayesian statistics, an important branch for any aspiring data scientist.

Authors: Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. One of the most popular entries in this list, it's an introduction to data science through machine learning. The book gives newcomers to the field clear guidance on how to implement statistical and machine learning methods.

Authors: Shai Shalev-Shwartz and Shai Ben-David. This book gives a structured introduction to machine learning. It looks at the fundamental theories of machine learning and the mathematical derivations that transform these concepts into practical algorithms.

Following that, it covers a list of ML algorithms, including (but not limited to) stochastic gradient descent, neural networks, and structured output learning.

It takes a fun and visually entertaining look at social filtering and item-based filtering methods and how to use machine learning to implement them.

Authors: Anand Rajaraman and Jeffrey David Ullman. As the era of Big Data rages on, mining data to gain actionable insights is a highly sought-after skill. This book focuses on algorithms that have been used to solve key problems in data mining and that can be applied to even the most gigantic of datasets.

It starts off by covering the history of neural networks before deep diving into the mathematics and explanation behind different types of NNs.

Machine learning

Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to 'learn' (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.[1] The name machine learning was coined in 1959 by Arthur Samuel.[2] Evolved from the study of pattern recognition and computational learning theory in artificial intelligence,[3] machine learning explores the study and construction of algorithms that can learn from and make predictions on data[4] – such algorithms overcome following strictly static program instructions by making data-driven predictions or decisions,[5]:2 through building a model from sample inputs.

Machine learning is sometimes conflated with data mining,[8] where the latter subfield focuses more on exploratory data analysis and is known as unsupervised learning.[5]:vii[9] Machine learning can also be unsupervised[10] and be used to learn and establish baseline behavioral profiles for various entities[11] and then used to find meaningful anomalies.

Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: 'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.'[13] This definition of the tasks in which machine learning is concerned offers a fundamentally operational definition rather than defining the field in cognitive terms.

Machine learning tasks are typically classified into two broad categories, depending on whether a learning 'signal' or 'feedback' is available to the learning system: supervised learning and unsupervised learning. Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system, such as classification, regression, or clustering.[5]:3 Among other categories of machine learning problems, learning to learn learns its own inductive bias based on previous experience.
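A minimal Python sketch of the supervised/unsupervised distinction is below; it assumes scikit-learn is available, and the dataset and model choices are illustrative, not drawn from the article.

```python
# Minimal sketch of supervised vs. unsupervised learning (illustrative assumptions).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: a learning "signal" (the labels y) is available to the learner.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
print("supervised training accuracy:", clf.score(X, y))

# Unsupervised: no labels are given; the algorithm groups the data on its own.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = km.fit_predict(X)
print("cluster sizes:", [int((clusters == k).sum()) for k in range(3)])
```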

Developmental learning, elaborated for robot learning, generates its own sequences (also called curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers and using guidance mechanisms such as active learning, maturation, motor synergies, and imitation.

Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation.[17]:488 By 1980, expert systems had come to dominate AI, and statistics was out of favor.[18] Work on symbolic/knowledge-based learning did continue within AI, leading to inductive logic programming, but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval.[17]:708–710

Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases).

Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge.

According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.[20] He also suggested the term data science as a placeholder to call the overall field.[20] Leo Breiman distinguished two statistical modelling paradigms: data model and algorithmic model,[21] wherein 'algorithmic model' means more or less the machine learning algorithms like random forest.

Representation learning algorithms often attempt to preserve the information in their input but transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions, allowing reconstruction of the inputs coming from the unknown data generating distribution, while not being necessarily faithful for configurations that are implausible under that distribution.
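A brief sketch of this idea, assuming scikit-learn (PCA and logistic regression are illustrative stand-ins for a learned representation, not something specified in the text): the representation step transforms the inputs before a classifier is trained on the new features, and the transformation also supports approximate reconstruction of the inputs.

```python
# Representation learning as a pre-processing step (illustrative sketch).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

# Transform the 64-dimensional inputs into a 16-dimensional representation,
# then classify in the transformed space.
model = make_pipeline(PCA(n_components=16), LogisticRegression(max_iter=2000))
model.fit(X, y)
print("training accuracy on the transformed representation:", model.score(X, y))

# The representation can also be inverted to approximately reconstruct the inputs.
pca = model.named_steps["pca"]
X_reconstructed = pca.inverse_transform(pca.transform(X))
```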

Multilinear subspace learning algorithms aim to learn low-dimensional representations directly from tensor representations for multidimensional data, without reshaping them into (high-dimensional) vectors.[27] Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features.

In machine learning, genetic algorithms found some uses in the 1980s and 1990s.[31][32] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[33] Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves 'rules' to store, manipulate or apply, knowledge.

A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million.[44] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ('everything is a recommendation') and they changed their recommendation engine accordingly.[45] In 2010 The Wall Street Journal wrote about the firm Rebellion Research and their use of Machine Learning to predict the financial crisis.[46]

In 2012, Sun Microsystems co-founder Vinod Khosla predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.[47] In 2014, it was reported that a machine learning algorithm had been applied in art history to study fine art paintings, and that it may have revealed previously unrecognized influences between artists.[48] Although machine learning has been transformative in some fields, effective machine learning is difficult because finding patterns is hard and often not enough training data are available;

as a result, machine-learning programs often fail to deliver.[49][50] Classification machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data into a training set and a test set (conventionally 2/3 training and 1/3 test) and evaluates the performance of the trained model on the test set.
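A hedged sketch of this holdout procedure, assuming scikit-learn (the dataset and classifier are illustrative choices, not specified in the text): the data are split roughly 2/3 for training and 1/3 for testing, and the model fitted on the training portion is scored on the held-out portion.

```python
# Holdout validation sketch: 2/3 training, 1/3 test (conventional split).
# Dataset and classifier are illustrative; any estimator could be used.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Accuracy on the held-out test set estimates generalization performance.
print("holdout test accuracy:", clf.score(X_test, y_test))
```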

Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[53] For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[54][55] Responsible collection of data and documentation of algorithmic rules used by a system thus is a critical part of machine learning.

There is huge potential for machine learning in health care to provide professionals with tools to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these 'greed' biases, are addressed.[57] Software suites containing a variety of machine learning algorithms are widely available.

Ensemble learning

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.[1][2][3][4] Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model, so ensembles may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation.

This flexibility can, in theory, enable them to over-fit the training data more than a single model would, but in practice, some ensemble techniques (especially bagging) tend to reduce problems related to over-fitting of the training data[citation needed].

Empirically, ensembles tend to yield better results when there is a significant diversity among the models.[5][6] Many ensemble methods, therefore, seek to promote diversity among the models they combine.[7][8] Although perhaps non-intuitive, more random algorithms (like random decision trees) can be used to produce a stronger ensemble than very deliberate algorithms (like entropy-reducing decision trees).[9] Using a variety of strong learning algorithms, however, has been shown to be more effective than using techniques that attempt to dumb down the models in order to promote diversity.[10] While the number of component classifiers of an ensemble has a great impact on the accuracy of prediction, there is a limited number of studies addressing this problem.

This formula can be restated using Bayes' theorem, which says that the posterior is proportional to the likelihood times the prior.

Bootstrap aggregating, often abbreviated as bagging, involves having each model in the ensemble vote with equal weight.

As an example, the random forest algorithm combines random decision trees with bagging to achieve very high classification accuracy.[14] Boosting involves incrementally building an ensemble by training each new model instance to emphasize the training instances that previous models misclassified.
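A minimal sketch of the two ideas above, assuming scikit-learn (the specific estimators and dataset are illustrative): bagging aggregates equally weighted votes of models trained on bootstrap samples, as random forests do with randomized trees, while boosting builds the ensemble incrementally, re-weighting examples that earlier models got wrong.

```python
# Bagging vs. boosting sketch (estimators and data are illustrative assumptions).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: each tree is trained on a bootstrap sample and votes with equal weight.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Random forest: bagging plus extra randomness in the split search.
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Boosting: models are added one at a time, emphasizing previously misclassified points.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("random forest", forest), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, "mean CV accuracy:", scores.mean())
```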

Bayesian parameter averaging (BPA) is an ensemble technique that seeks to approximate the Bayes Optimal Classifier by sampling hypotheses from the hypothesis space, and combining them using Bayes' law.[15] Unlike the Bayes optimal classifier, Bayesian model averaging (BMA) can be practically implemented.

It has been shown that under certain circumstances, when hypotheses are drawn in this manner and averaged according to Bayes' law, this technique has an expected error that is bounded to be at most twice the expected error of the Bayes optimal classifier.[16] Despite the theoretical correctness of this technique, early work showed experimental results suggesting that the method promoted over-fitting and performed worse compared to simpler ensemble techniques such as bagging;[17] however, these conclusions appear to be based on a misunderstanding of the purpose of Bayesian model averaging vs. model combination.

Recent rigorous proofs demonstrate the accuracy of BMA in variable selection and estimation in high-dimensional settings,[19] and provide empirical evidence highlighting the role of sparsity-enforcing priors within the BMA in alleviating overfitting.[20] Bayesian model combination (BMC) is an algorithmic correction to Bayesian model averaging (BMA).

When tested with only one problem, a bucket of models can produce no better results than the best model in the set, but when evaluated across many problems, it will typically produce much better results, on average, than any model in the set.

It involves training only the fast (but imprecise) algorithms in the bucket, and then using the performance of these algorithms to help determine which slow (but accurate) algorithm is most likely to do best.[23] Stacking (sometimes called stacked generalization) involves training a learning algorithm to combine the predictions of several other learning algorithms.

Stacking typically yields performance better than any single one of the trained models.[24] It has been successfully used on both supervised learning tasks (regression,[25] classification and distance learning [26]) and unsupervised learning (density estimation).[27] It has also been used to estimate bagging's error rate.[4][28] It has been reported to out-perform Bayesian model-averaging.[29] The two top-performers in the Netflix competition utilized blending, which may be considered to be a form of stacking.[30]
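A brief sketch of stacking under the same assumptions (scikit-learn; the base learners and combiner are illustrative choices, not from the text): predictions from several base models become the inputs of a final combiner model.

```python
# Stacking sketch: a meta-learner combines the predictions of several base models.
# All model choices here are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

stack = StackingClassifier(
    estimators=[
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the combiner model
    cv=5,  # out-of-fold predictions train the combiner, limiting leakage
)

print("stacking mean CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```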

1. Introduction to Statistics

NOTE: This video was recorded in Fall 2017. The rest of the lectures were recorded in Fall 2016, but video of Lecture 1 was not available. MIT 18.650 Statistics ...

R Tutorial 13: Variable Selection and Best Subsets Selection (regsubsets)

This video is going to show how to perform variable selection and best subsets selection using regsubsets() in R. Measures include R-squared, Adjusted ...

The Bayesian Trap

Bayes' theorem explained with examples and implications for life.
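For reference, the theorem discussed in the video can be stated as below (a standard formulation, not transcribed from the video), where $H$ is a hypothesis and $E$ the observed evidence.

```latex
% Bayes' theorem: the posterior is the likelihood times the prior, normalized by the evidence.
\[
  P(H \mid E) \;=\; \frac{P(E \mid H)\, P(H)}{P(E)}
\]
```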

Factor Analysis - an introduction

This video provides an introduction to factor analysis, and explains why this technique is often used in the social sciences.

How to Build a Text Mining, Machine Learning Document Classification System in R!

We show how to build a machine learning document classification system from scratch in less than 30 minutes using R. We use a text mining approach to ...

Maximum Likelihood estimation - an introduction part 1

This video introduces the concept of Maximum Likelihood estimation, by means of an example using the Bernoulli distribution.
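As a short worked version of the Bernoulli example mentioned in the description (standard textbook material, not transcribed from the video): for observations $x_1, \dots, x_n \in \{0, 1\}$ with success probability $p$, maximizing the log-likelihood yields the sample mean as the estimate.

```latex
% Maximum likelihood for the Bernoulli distribution (standard derivation).
\[
  \ell(p) = \sum_{i=1}^{n} \left[ x_i \log p + (1 - x_i) \log (1 - p) \right],
  \qquad
  \frac{d\ell}{dp} = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1 - p} = 0
  \;\Longrightarrow\;
  \hat{p}_{\mathrm{MLE}} = \frac{1}{n} \sum_{i=1}^{n} x_i .
\]
```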

Econometric model building - general to specific

Check out the course website for materials and information regarding updates on each of the courses.

Statistical Text Analysis for Social Science

What can text analysis tell us about society? Corpora of news, books, and social media encode human beliefs and culture. But it is impossible for a researcher to ...

Big Data: Statistical Inference and Machine Learning - free online course at FutureLearn.com

'Big Data: Statistical Inference and Machine Learning' is a free online course by QUT on FutureLearn.com. Many people have ..