AI News, 41 Essential Machine Learning Interview Questions (with answers)
41 Essential Machine Learning Interview Questions (with answers)
We’ve traditionally seen machine learning interview questions pop up in several categories.
The third has to do with your general interest in machine learning: you’ll be asked about what’s going on in the industry and how you keep up with the latest machine learning trends.
Finally, there are company or industry-specific questions that test your ability to take your general machine learning knowledge and turn it into actionable points to drive the bottom line forward.
We’ve divided this guide to machine learning interview questions into the categories we mentioned above so that you can more easily get to the information you need when it comes to machine learning interview questions.
This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set.
The bias-variance decomposition essentially decomposes the learning error from any algorithm by adding the bias, the variance and a bit of irreducible error due to noise in the underlying dataset.
For example, in order to do classification (a supervised learning task), you’ll need to first label the data you’ll use to train the model to classify data into your labeled groups.
K-means clustering requires only a set of unlabeled points and a threshold: the algorithm will take unlabeled points and gradually learn how to cluster them into groups by computing the mean of the distance between different points.
It’s often used as a proxy for the trade-off between the sensitivity of the model (true positives) vs the fall-out or the probability it will trigger a false alarm (false positives).
More reading: Precision and recall (Wikipedia) Recall is also known as the true positive rate: the amount of positives your model claims compared to the actual number of positives there are throughout the data.
Precision is also known as the positive predictive value, and it is a measure of the amount of accurate positives your model claims compared to the number of positives it actually claims.
It can be easier to think of recall and precision in the context of a case where you’ve predicted that there were 10 apples and 5 oranges in a case of 10 apples.
Mathematically, it’s expressed as the true positive rate of a condition sample divided by the sum of the false positive rate of the population and the true positive rate of a condition.
Say you had a 60% chance of actually having the flu after a flu test, but out of people who had the flu, the test will be false 50% of the time, and the overall population only has a 5% chance of having the flu.
(Quora) Despite its practical applications, especially in text mining, Naive Bayes is considered “Naive” because it makes an assumption that is virtually impossible to see in real-life data: the conditional probability is calculated as the pure product of the individual probabilities of components.
clever way to think about this is to think of Type I error as telling a man he is pregnant, while Type II error means you tell a pregnant woman she isn’t carrying a baby.
More reading: Deep learning (Wikipedia) Deep learning is a subset of machine learning that is concerned with neural networks: how to use backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data.
More reading: Using k-fold cross-validation for time-series model selection (CrossValidated) Instead of using standard k-folds cross-validation, you have to pay attention to the fact that a time series is not randomly distributed data —
More reading: Pruning (decision trees) Pruning is what happens in decision trees when branches that have weak predictive power are removed in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model.
For example, if you wanted to detect fraud in a massive dataset with a sample of millions, a more accurate model would most likely predict no fraud at all if only a vast minority of cases were fraud.
More reading: Regression vs Classification (Math StackExchange) Classification produces discrete values and dataset to strict categories, while regression gives you continuous results that allow you to better distinguish differences between individual points.
You would use classification over regression if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories (ex: If you wanted to know whether a name was male or female rather than just how correlated they were with male and female names.) Q21- Name an example where ensemble techniques might be useful.
They typically reduce overfitting in models and make the model more robust (unlikely to be influenced by small changes in the training data). You could list some examples of ensemble methods, from bagging to boosting to a “bucket of models” method and demonstrate how they could increase predictive power.
(Quora) This is a simple restatement of a fundamental problem in machine learning: the possibility of overfitting training data and carrying the noise of that data through to the test set, thereby providing inaccurate generalizations.
There are three main methods to avoid overfitting: 1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data.
More reading: How to Evaluate Machine Learning Algorithms (Machine Learning Mastery) You would first split the dataset into training and test sets, or perhaps use cross-validation techniques to further segment the dataset into composite sets of training and test sets within the data.
More reading: Kernel method (Wikipedia) The Kernel trick involves kernel functions that can enable in higher-dimension spaces without explicitly calculating the coordinates of points within that dimension: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space.
This allows them the very useful attribute of calculating the coordinates of higher dimensions while being computationally cheaper than the explicit calculation of said coordinates. Many algorithms can be expressed in terms of inner products.
More reading: Writing pseudocode for parallel programming (Stack Overflow) This kind of question demonstrates your ability to think in parallelism and how you could handle concurrency in programming implementations dealing with big data.
For example, if you were interviewing for music-streaming startup Spotify, you could remark that your skills at developing a better recommendation model would increase user retention, which would then increase revenue in the long run.
The startup metrics Slideshare linked above will help you understand exactly what performance indicators are important for startups and tech companies as they think about revenue and growth.
Your interviewer is trying to gauge if you’d be a valuable member of their team and whether you grasp the nuances of why certain things are set the way they are in the company’s data process based on company- or industry-specific conditions.
This overview of deep learning in Nature by the scions of deep learning themselves (from Hinton to Bengio to LeCun) can be a good reference paper and an overview of what’s happening in deep learning —
More reading: Mastering the game of Go with deep neural networks and tree search (Nature) AlphaGo beating Lee Sidol, the best human player at Go, in a best-of-five series was a truly seminal event in the history of machine learning and deep learning.
The Nature paper above describes how this was accomplished with “Monte-Carlo tree search with deep neural networks that have been trained by supervised learning, from human expert games, and by reinforcement learning from games of self-play.” Cover image credit: https://www.flickr.com/photos/iwannt/8596885627
How to choose algorithms for Microsoft Azure Machine Learning
The answer to the question "What machine learning algorithm should I use?"
It depends on how the math of the algorithm was translated into instructions for the computer you are using.
Even the most experienced data scientists can't tell which algorithm will perform best before trying them.
The Microsoft Azure Machine Learning Algorithm Cheat Sheet helps you choose the right machine learning algorithm for your predictive analytics solutions from the Microsoft Azure Machine Learning library of algorithms. This
This cheat sheet has a very specific audience in mind: a beginning data scientist with undergraduate-level machine learning, trying to choose an algorithm to start with in Azure Machine Learning Studio.
That means that it makes some generalizations and oversimplifications, but it points you in a safe direction.
As Azure Machine Learning grows to encompass a more complete set of available methods, we'll add them.
These recommendations are compiled feedback and tips from many data scientists and machine learning experts.
We didn't agree on everything, but I've tried to harmonize our opinions into a rough consensus.
data scientists I talked with said that the only sure way to find
Supervised learning algorithms make predictions based on a set of examples.
For instance, historical stock prices can be used to hazard guesses
company's financial data, the type of industry, the presence of disruptive
it uses that pattern to make predictions for unlabeled testing data—tomorrow's
Supervised learning is a popular and useful type of machine learning.
In unsupervised learning, data points have no labels associated with them.
grouping it into clusters or finding different ways of looking at complex
In reinforcement learning, the algorithm gets to choose an action in response
signal a short time later, indicating how good the decision was. Based
where the set of sensor readings at one point in time is a data
The number of minutes or hours necessary to train a model varies a great deal
time is limited it can drive the choice of algorithm, especially when
regression algorithms assume that data trends follow a straight line.
These assumptions aren't bad for some problems, but on others they bring
Non-linear class boundary - relying on a linear classification algorithm
Data with a nonlinear trend - using a linear regression method would generate
much larger errors than necessary Despite their dangers, linear algorithms are very popular as a first line
Parameters are the knobs a data scientist gets to turn when setting up an
as error tolerance or number of iterations, or options between variants
to make sure you've spanned the parameter space, the time required to
train a model increases exponentially with the number of parameters.
For certain types of data, the number of features can be very large compared
algorithms, making training time unfeasibly long.
Some learning algorithms make particular assumptions about the structure of
- shows excellent accuracy, fast training times, and the use of linearity ○
- shows good accuracy and moderate training times As mentioned previously, linear regression fits
curve instead of a straight line makes it a natural fit for dividing
logistic regression to two-class data with just one feature - the class
boundary is the point at which the logistic curve is just as close to both classes Decision forests (regression, two-class, and multiclass), decision
all based on decision trees, a foundational machine learning concept.
decision tree subdivides a feature space into regions of roughly uniform
values Because a feature space can be subdivided into arbitrarily small regions,
it's easy to imagine dividing it finely enough to have one data point
a large set of trees are constructed with special mathematical care
memory at the expense of a slightly longer training time.
Boosted decision trees avoid overfitting by limiting how many times they can
a variation of decision trees for the special case where you want to know
that input features are passed forward (never backward) through a sequence
a long time to train, particularly for large data sets with lots of features.
typical support vector machine class boundary maximizes the margin separating
Any new data points that fall far outside that boundary
PCA-based anomaly detection - the vast majority of the data falls into
data set is grouped into five clusters using K-means There is also an ensemble one-v-all multiclass classifier, which
progressively improve performance on a specific task) with data, without being explicitly programmed. The name Machine learning was coined in 1959 by Arthur Samuel. Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data – such algorithms overcome following strictly static program instructions by making data-driven predictions or decisions,:2 through building a model from sample inputs.
Machine learning is sometimes conflated with data mining, where the latter subfield focuses more on exploratory data analysis and is known as unsupervised learning.:vii Machine learning can also be unsupervised and be used to learn and establish baseline behavioral profiles for various entities and then used to find meaningful anomalies.
These analytical models allow researchers, data scientists, engineers, and analysts to 'produce reliable, repeatable decisions and results' and uncover 'hidden insights' through learning from historical relationships and trends in the data. Effective machine learning is difficult because finding patterns is hard and often not enough training data are available;
Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: 'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.' This definition of the tasks in which machine learning is concerned offers a fundamentally operational definition rather than defining the field in cognitive terms.
Machine learning tasks are typically classified into two broad categories, depending on whether there is a learning 'signal' or 'feedback' available to a learning system: Another categorization of machine learning tasks arises when one considers the desired output of a machine-learned system::3 Among other categories of machine learning problems, learning to learn learns its own inductive bias based on previous experience.
Developmental learning, elaborated for robot learning, generates its own sequences (also called curriculum) of learning situations to cumulatively acquire repertoires of novel skills through autonomous self-exploration and social interaction with human teachers and using guidance mechanisms such as active learning, maturation, motor synergies, and imitation.
Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation.:488 By 1980, expert systems had come to dominate AI, and statistics was out of favor. Work on symbolic/knowledge-based learning did continue within AI, leading to inductive logic programming, but the more statistical line of research was now outside the field of AI proper, in pattern recognition and information retrieval.:708–710;
Machine learning and data mining often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases).
Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge.
Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics. He also suggested the term data science as a placeholder to call the overall field. Leo Breiman distinguished two statistical modelling paradigms: data model and algorithmic model, wherein 'algorithmic model' means more or less the machine learning algorithms like Random forest.
Multilinear subspace learning algorithms aim to learn low-dimensional representations directly from tensor representations for multidimensional data, without reshaping them into (high-dimensional) vectors. Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features.
In machine learning, genetic algorithms found some uses in the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms. Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves `rules’ to store, manipulate or apply, knowledge.
They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions. Applications for machine learning include: In 2006, the online movie company Netflix held the first 'Netflix Prize' competition to find a program to better predict user preferences and improve the accuracy on its existing Cinematch movie recommendation algorithm by at least 10%.
A joint team made up of researchers from ATT Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million. Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ('everything is a recommendation') and they changed their recommendation engine accordingly. In 2010 The Wall Street Journal wrote about the firm Rebellion Research and their use of Machine Learning to predict the financial crisis.
 In 2012, co-founder of Sun Microsystems Vinod Khosla predicted that 80% of medical doctors jobs would be lost in the next two decades to automated machine learning medical diagnostic software. In 2014, it has been reported that a machine learning algorithm has been applied in Art History to study fine art paintings, and that it may have revealed previously unrecognized influences between artists. Classification machine learning models can be validated by accuracy estimation techniques like the Holdout method, which splits the data in a training and test set (conventionally 2/3 training set and 1/3 test set designation) and evaluates the performance of the training model on the test set.
Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices. For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants. Responsible collection of data and documentation of algorithmic rules used by a system thus is a critical part of machine learning.
Essentials of Machine Learning Algorithms (with Python and R Codes)
Note: This article was originally published on Aug 10, 2015 and updated on Sept 9th, 2017
Google’s self-driving cars and robots get a lot of press, but the company’s real future is in machine learning, the technology that enables computers to get smarter and more personal.
The idea behind creating this guide is to simplify the journey of aspiring data scientists and machine learning enthusiasts across the world.
How it works: This algorithm consist of a target / outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables).
Using these set of variables, we generate a function that map inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data.
This machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions.
These algorithms can be applied to almost any data problem: It is used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s).
In this equation: These coefficients a and b are derived based on minimizing the sum of squared difference of distance between data points and regression line.
And, Multiple Linear Regression(as the name suggests) is characterized by multiple (more than 1) independent variables. While finding best fit line, you can fit a polynomial or curvilinear regression.
It is a classification not a regression algorithm. It is used to estimate discrete values ( Binary values like 0/1, yes/no, true/false ) based on given set of independent variable(s).
It chooses parameters that maximize the likelihood of observing the sample values rather than that minimize the sum of squared errors (like in ordinary regression).
source: statsexchange In the image above, you can see that population is classified into four different groups based on multiple attributes to identify ‘if they will play or not’.
In this algorithm, we plot each data item as a point in n-dimensional space (where n is number of features you have) with the value of each feature being the value of a particular coordinate.
For example, if we only had two features like Height and Hair length of an individual, we’d first plot these two variables in two dimensional space where each point has two co-ordinates (these co-ordinates are known as Support Vectors)
In the example shown above, the line which splits the data into two differently classified groups is the black line, since the two closest points are the farthest apart from the line.
It is a classification technique based on Bayes’ theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Step 1: Convert the data set to frequency table Step 2: Create Likelihood table by finding the probabilities like Overcast probability = 0.29 and probability of playing is 0.64.
Yes) * P(Yes) / P (Sunny) Here we have P (Sunny |Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P( Yes)= 9/14 = 0.64 Now, P (Yes |
However, it is more widely used in classification problems in the industry. K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k neighbors.
Its procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters).
We know that as the number of cluster increases, this value keeps on decreasing but if you plot the result you may see that the sum of squared distance decreases sharply up to some value of k, and then much more slowly after that.
grown as follows: For more details on this algorithm, comparing with decision tree and tuning model parameters, I would suggest you to read these articles: Python R Code
For example: E-commerce companies are capturing more details about customer like their demographics, web crawling history, what they like or dislike, purchase history, feedback and many others to give them personalized attention more than your nearest grocery shopkeeper.
How’d you identify highly significant variable(s) out 1000 or 2000? In such cases, dimensionality reduction algorithm helps us along with various other algorithms like Decision Tree, Random Forest, PCA, Factor Analysis, Identify based on correlation matrix, missing value ratio and others.
GBM is a boosting algorithm used when we deal with plenty of data to make a prediction with high prediction power. Boosting is actually an ensemble of learning algorithms which combines the prediction of several base estimators in order to improve robustness over a single estimator.
The XGBoost has an immensely high predictive power which makes it the best choice for accuracy in events as it possesses both linear model and the tree learning algorithm, making the algorithm almost 10x faster than existing gradient booster techniques.
It is designed to be distributed and efficient with the following advantages: The framework is a fast and high-performance gradient boosting one based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Since the LightGBM is based on decision tree algorithms, it splits the tree leaf wise with the best fit whereas other boosting algorithms split the tree depth wise or level wise rather than leaf-wise.
So when growing on the same leaf in Light GBM, the leaf-wise algorithm can reduce more loss than the level-wise algorithm and hence results in much better accuracy which can rarely be achieved by any of the existing boosting algorithms.
Catboost can automatically deal with categorical variables without showing the type conversion error, which helps you to focus on tuning your model better rather than sorting out trivial errors.
My sole intention behind writing this article and providing the codes in R and Python is to get you started right away. If you are keen to master machine learning, start right away.
How To Build a Machine Learning Classifier in Python with Scikit-learn
Machine learning is a research field in computer science, artificial intelligence, and statistics.
Banks use machine learning to detect fraudulent activity in credit card transactions, and healthcare companies are beginning to use machine learning to monitor, assess, and diagnose patients.
Make sure you’re in the directory where your environment is located, and run the following command: With our programming environment activated, check to see if the Sckikit-learn module is already installed: If sklearn is installed, this command will complete with no error.
If it is not installed, you will see the following error message: The error message indicates that sklearn is not installed, so download the library using pip: Once the installation completes, launch Jupyter Notebook: In Jupyter, create a new Python Notebook called ML Tutorial.
The dataset has 569 instances, or data, on 569 tumors and includes information on 30 attributes, or features, such as the radius of the tumor, texture, smoothness, and area.
The important dictionary keys to consider are the classification label names (target_names), the actual labels (target), the attribute/feature names (feature_names), and the attributes (data).
Given the label we are trying to predict (malignant versus benign tumor), possible useful attributes include the size, radius, and texture of the tumor.
To get a better understanding of our dataset, let's take a look at our data by printing our class labels, the first data instance's label, our feature names, and the feature values for the first data instance: You'll see the following results if you run the code:
As the image shows, our class names are malignant and benign, which are then mapped to binary values of 0 and 1, where 0 represents malignant tumors and 1 represents benign tumors.
Then initialize the model with the GaussianNB() function, then train the model by fitting it to the data using gnb.fit(): After we train the model, we can then use the trained model to make predictions on our test set, which we do using the predict() function.
As you see in the Jupyter Notebook output, the predict() function returned an array of 0s and 1s which represent our predicted values for the tumor class (malignant vs.
Using the array of true class labels, we can evaluate the accuracy of our model's predicted values by comparing the two arrays (test_labels vs.
- On Thursday, February 21, 2019
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Training | Edureka
Data Science Training - ) This Machine Learning Algorithms Tutorial shall teach you what machine learning is, and the various ways in which you can use..
The Best Way to Prepare a Dataset Easily
In this video, I go over the 3 steps you need to prepare a dataset to be fed into a machine learning model. (selecting the data, processing it, and transforming it). The example I use is preparing...
The Best Way to Visualize a Dataset Easily
In this video, we'll visualize a dataset of body metrics collected by giving people a fitness tracking device. We'll go over the steps necessary to preprocess the data, then use a technique...
Natalie Hockham: Machine learning with imbalanced data sets
Classification algorithms tend to perform poorly when data is skewed towards one class, as is often the case when tackling real-world problems such as fraud detection or medical diagnosis....
Machine Learning on Geospatial Datasets for Segmentation, Prediction and Modeling
by Stuart Lynn Machine learning is powerful technique that allows us to create predictive data driven models that can learn off complex multivariable data. The geospatial world is full of...
k-Nearest Neighbor kNN with IRIS dataset
This vlog introduces k - nearest machine learning algorithm. On R its demonstrated by the IRIS dataset. We learn data exploration, sampling, modeling, scoring, evaluating.
kNN Machine Learning Algorithm - Excel
kNN, k Nearest Neighbors Machine Learning Algorithm tutorial. Follow this link for an entire Intro course on Machine Learning using R, did I mention it's FREE:
Machine learning Data analysis CaseStudy Analysis of Student Performance Dataset 1
Train an Image Classifier with TensorFlow for Poets - Machine Learning Recipes #6
Monet or Picasso? In this episode, we'll train our own image classifier, using TensorFlow for Poets. Along the way, I'll introduce Deep Learning, and add context and background on why the...
Predicting the Winning Team with Machine Learning
Can we predict the outcome of a football game given a dataset of past games? That's the question that we'll answer in this episode by using the scikit-learn machine learning library as our...