# AI News, Ten Machine Learning Algorithms You Should Know to Become a Data Scientist ## Ten Machine Learning Algorithms You Should Know to Become a Data Scientist

That said, no one can deny the fact that as practicing Data Scientists, we will have to know basics of some common machine learning algorithms, which would help us engage with a new-domain problem we come across.

Covariance Matrix of data points is analyzed here to understand what dimensions(mostly)/ data points (sometimes) are more important (ie have high variance amongst themselves, but low covariance with others).

As is obvious, use this algorithm to fit simple curves / regression https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.polyfit.html https://lagunita.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/linear_regression.pdf Least Squares can get confused with outliers, spurious fields and noise in data.

As is obvious from the name, you can use this algorithm to create K clusters in dataset http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html https://www.youtube.com/watch?v=hDmNF9JG3lo https://www.datascience.com/blog/k-means-clustering Logistic Regression is constrained Linear Regression with a nonlinearity (sigmoid function is used mostly or you can use tanh too) application after weights are applied, hence restricting the outputs close to +/- classes (which is 1 and 0 in case of sigmoid).

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html https://www.youtube.com/watch?v=-la3q9d7AKQ SVMs are linear models like Linear/ Logistic Regression, the difference is that they have different margin-based loss function (The derivation of Support Vectors is one of the most beautiful mathematical results I have seen along with eigenvalue calculation).

FFNNs can be used to train a classifier or extract features as autoencoders http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html https://github.com/keras-team/keras/blob/master/examples/reuters_mlp_relu_vs_selu.py   http://www.deeplearningbook.org/contents/mlp.html http://www.deeplearningbook.org/contents/autoencoders.html http://www.deeplearningbook.org/contents/representation.html Almost any state of the art Vision based Machine Learning result in the world today has been achieved using Convolutional Neural Networks.

https://developer.nvidia.com/digits https://github.com/kuangliu/torchcv https://github.com/chainer/chainercv https://keras.io/applications/ http://cs231n.github.io/ https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/ RNNs model sequences by applying the same set of weights recursively on the aggregator state at a time t and input at a time t (Given a sequence has inputs at times 0..t..T, and have a hidden state at each time t which is output from t-1 step of RNN).

RNN (If here is a densely connected unit and a nonlinearity, nowadays f is generally LSTMs or GRUs ). LSTM unit which is used instead of a plain dense layer in a pure RNN.

Use RNNs for any sequence modelling task specially text classification, machine translation, language modelling https://github.com/tensorflow/models (Many cool NLP research papers from Google are here) https://github.com/wabyking/TextClassificationBenchmark http://opennmt.net/    http://cs224d.stanford.edu/ http://www.wildml.com/category/neural-networks/recurrent-neural-networks/ http://colah.github.io/posts/2015-08-Understanding-LSTMs/ CRFs are probably the most frequently used models from the family of Probabilitic Graphical Models (PGMs).

Before Neural Machine Translation systems came in CRFs were the state of the art and in many sequence tagging tasks with small datasets, they will still learn better than RNNs which require a larger amount of data to generalize.

The two common decision trees algorithms used nowadays are Random Forests (which build different classifiers on a random subset of attributes and combine them for output) and Boosting Trees (which train a cascade of trees one on top of others, correcting the mistakes of ones below them).

Decision Trees can be used to classify datapoints (and even regression) http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html http://xgboost.readthedocs.io/en/latest/ https://catboost.yandex/ http://xgboost.readthedocs.io/en/latest/model.html https://arxiv.org/abs/1511.05741 https://arxiv.org/abs/1407.7502 http://education.parrotprediction.teachable.com/p/practical-xgboost-in-python If you are still wondering how can any of the above methods solve tasks like defeating Go world champion like DeepMind did, they cannot.

To learn strategy to solve a multi-step problem like winning a game of chess or playing Atari console, we need to let an agent-free in the world and learn from the rewards/penalties it faces.

Linear Regression - Machine Learning Fun and Easy

Linear Regression - Machine Learning Fun and Easy

Logistic Regression in R | Machine Learning Algorithms | Data Science Training | Edureka

Data Science Training - ) This Logistic Regression Tutorial shall give you a clear understanding as to how a Logistic ..

Linear Regression Algorithm | Linear Regression in R | Data Science Training | Edureka

Data Science Training - ) This Edureka Linear Regression tutorial will help you understand all the basics of linear ..

Regression Features and Labels - Practical Machine Learning Tutorial with Python p.3

We'll be using the numpy module to convert data to numpy arrays, which is what Scikit-learn wants. We will talk more on preprocessing and cross_validation ...

3.4: Linear Regression with Gradient Descent - Intelligence and Learning

In this video I continue my Machine Learning series and attempt to explain Linear Regression with Gradient Descent. My Video explaining the Mathematics of ...

Data Science & Machine Learning - Linear Regression Model - DIY- 10(a) -of-50

Data Science & Machine Learning - Linear Regression Model - DIY- 10(a) -of-50 Do it yourself Tutorial by Bharati DW Consultancy cell: +1-562-646-6746 (Cell ...

Difference between Classification and Regression - Georgia Tech - Machine Learning