AI News, BOOK REVIEW: What are the best interview questions to evaluate a machine learning researcher?
- On Thursday, October 4, 2018
What are the best interview questions to evaluate a machine learning researcher?
Here are some sample questions for your reference.
But if you want to quantify practical knowledge, you can use online assessment software like Interview Mocha to see whether a candidate has hands-on experience in Machine Learning.
Interview Mocha provides intelligent reporting which can give you an idea of a candidate’s performance in each skill set.
You can then invite only the few candidates who performed excellently in these tests, so that you spend your productive time with only the right candidates.
41 Essential Machine Learning Interview Questions (with answers)
We’ve traditionally seen machine learning interview questions pop up in several categories.
The third has to do with your general interest in machine learning: you’ll be asked about what’s going on in the industry and how you keep up with the latest machine learning trends.
Finally, there are company or industry-specific questions that test your ability to take your general machine learning knowledge and turn it into actionable points to drive the bottom line forward.
We’ve divided this guide to machine learning interview questions into the categories we mentioned above so that you can more easily get to the information you need.
This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set.
The bias-variance decomposition essentially decomposes the learning error from any algorithm by adding the bias, the variance and a bit of irreducible error due to noise in the underlying dataset.
For example, in order to do classification (a supervised learning task), you’ll need to first label the data you’ll use to train the model to classify data into your labeled groups.
K-means clustering requires only a set of unlabeled points, a number of clusters k, and a stopping threshold: the algorithm will take unlabeled points and gradually learn how to cluster them into groups by repeatedly assigning each point to its nearest cluster center and recomputing each center as the mean of the points assigned to it.
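To make the k-means loop concrete, here is a minimal sketch in plain Python. The function name `kmeans`, the sample points and the choice of starting centers are illustrative assumptions, not something from the original article; it assumes Python 3.8+ for `math.dist`.

```python
import math

def kmeans(points, initial_centroids, max_iterations=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(c) for c in initial_centroids]

    def nearest(point):
        # Index of the centroid closest to this point (Euclidean distance).
        return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

    for _ in range(max_iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for old, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # nothing moved, so assignments are stable
            break
        centroids = updated

    labels = [nearest(p) for p in points]
    return centroids, labels

# Hypothetical data: two well-separated groups, no labels supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
```

Note that the result depends on the starting centers; practical implementations usually try several initializations.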
It’s often used as a proxy for the trade-off between the sensitivity of the model (true positives) vs the fall-out or the probability it will trigger a false alarm (false positives).
More reading: Precision and recall (Wikipedia) Recall is also known as the true positive rate: the number of positives your model correctly identifies compared to the actual number of positives there are throughout the data.
Precision is also known as the positive predictive value, and it is a measure of the number of accurate positives your model claims compared to the total number of positives it claims.
It can be easier to think of recall and precision in the context of a case where you’ve predicted that there were 10 apples and 5 oranges in a case of 10 apples. You’d have perfect recall (there are actually 10 apples, and you predicted all 10) but 66.7% precision, because out of the 15 items you predicted, only 10 (the apples) are correct.
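The apple example can be checked with a few lines of Python. This is only a sketch: the helper name `precision_recall` is made up for illustration, and it treats the 5 predicted oranges as false positives.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# 10 real apples, all of them found (TP=10, FN=0);
# the 5 predicted oranges do not exist (FP=5).
precision, recall = precision_recall(10, 5, 0)
print(round(precision, 3), recall)
```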
Mathematically, it’s expressed as the true positive rate of a condition weighted by the prior probability of the condition, divided by the sum of that same term and the false positive rate weighted by the probability of not having the condition.
Say you had a 60% chance of actually having the flu after a flu test, but out of people who had the flu, the test will be false 50% of the time, and the overall population only has a 5% chance of having the flu.
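The flu sentence can be read more than one way, so the sketch below states its interpretation explicitly: it assumes 60% is the chance of a positive test given flu, 50% is the chance of a positive test without flu, and 5% is the prior. Those readings, and the function name `posterior`, are assumptions for illustration only.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Bayes' theorem: P(condition | positive test)."""
    true_pos = true_positive_rate * prior            # positive test and has the condition
    false_pos = false_positive_rate * (1 - prior)    # positive test but no condition
    return true_pos / (true_pos + false_pos)

# Assumed reading of the numbers in the text (see the note above).
p_flu_given_positive = posterior(prior=0.05, true_positive_rate=0.60, false_positive_rate=0.50)
print(round(p_flu_given_positive, 4))
```

Under that reading, a positive test only raises the chance of flu from 5% to roughly 6%, because the prior is so low.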
(Quora) Despite its practical applications, especially in text mining, Naive Bayes is considered “Naive” because it makes an assumption that is virtually impossible to see in real-life data: the conditional probability is calculated as the pure product of the individual probabilities of components.
A clever way to think about this is to think of Type I error as telling a man he is pregnant, while Type II error means you tell a pregnant woman she isn’t carrying a baby.
More reading: Deep learning (Wikipedia) Deep learning is a subset of machine learning that is concerned with neural networks: how to use backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data.
More reading: Using k-fold cross-validation for time-series model selection (CrossValidated) Instead of using standard k-folds cross-validation, you have to pay attention to the fact that a time series is not randomly distributed data.
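One common way to respect the ordering of a time series is an expanding-window (forward-chaining) split, where each fold trains only on observations that come before the ones it tests on. The scheme and the helper name `forward_chaining_splits` are assumptions here, offered as one possible sketch rather than the method the article goes on to describe.

```python
def forward_chaining_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) with training data always preceding test data."""
    fold_size = n_samples // (n_folds + 1)  # leftover samples at the end are ignored
    for fold in range(1, n_folds + 1):
        train_end = fold * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))

# Six time-ordered observations, five folds: train on [0], test on [1]; then [0, 1] -> [2]; ...
splits = list(forward_chaining_splits(n_samples=6, n_folds=5))
for train, test in splits:
    print(train, test)
```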
More reading: Pruning (decision trees) Pruning is what happens in decision trees when branches that have weak predictive power are removed in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model.
For example, if you wanted to detect fraud in a massive dataset with a sample of millions, a more accurate model would most likely predict no fraud at all if only a tiny minority of cases were fraud.
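A quick sketch of why accuracy misleads here. The class counts below are hypothetical, and the "model" simply never predicts fraud.

```python
def accuracy_and_recall(labels, predictions):
    """Accuracy over all cases, and recall over the positive (fraud) cases."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    actual_positives = sum(labels)
    return correct / len(labels), true_positives / actual_positives

labels = [1] * 10 + [0] * 9990     # hypothetical: 10 fraud cases in 10,000 transactions
predictions = [0] * len(labels)    # always predict "no fraud"
accuracy, recall = accuracy_and_recall(labels, predictions)
print(accuracy, recall)
```

The accuracy looks excellent while the model catches no fraud at all, which is why recall or precision is the more useful yardstick on imbalanced data.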
More reading: Regression vs Classification (Math StackExchange) Classification produces discrete values and maps the dataset to strict categories, while regression gives you continuous results that allow you to better distinguish differences between individual points.
You would use classification over regression if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories (for example, if you wanted to know whether a name was male or female rather than just how correlated it was with male and female names).
Q21- Name an example where ensemble techniques might be useful.
They typically reduce overfitting in models and make the model more robust (unlikely to be influenced by small changes in the training data). You could list some examples of ensemble methods, from bagging to boosting to a “bucket of models” method and demonstrate how they could increase predictive power.
(Quora) This is a simple restatement of a fundamental problem in machine learning: the possibility of overfitting training data and carrying the noise of that data through to the test set, thereby providing inaccurate generalizations.
There are three main methods to avoid overfitting: 1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data.
More reading: How to Evaluate Machine Learning Algorithms (Machine Learning Mastery) You would first split the dataset into training and test sets, or perhaps use cross-validation techniques to further segment the dataset into composite sets of training and test sets within the data.
More reading: Kernel method (Wikipedia) The Kernel trick involves kernel functions that can enable operating in higher-dimension spaces without explicitly calculating the coordinates of points within that dimension: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space.
This gives them the very useful attribute of working with inner products in higher dimensions while being computationally cheaper than the explicit calculation of the coordinates. Many algorithms can be expressed in terms of inner products.
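As a small illustration, the degree-2 polynomial kernel k(x, y) = (x · y)^2 on 2-D points gives the same number as an inner product in a 3-D feature space, without ever building the 3-D vectors. The specific kernel, feature map and sample points are chosen for illustration only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def feature_map(x):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def polynomial_kernel(x, y):
    """Same inner product, computed without leaving the original 2-D space."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(feature_map(x), feature_map(y))   # maps to 3-D first, then takes the inner product
implicit = polynomial_kernel(x, y)               # kernel trick: one 2-D dot product, squared
print(explicit, implicit)
```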
More reading: Writing pseudocode for parallel programming (Stack Overflow) This kind of question demonstrates your ability to think in parallelism and how you could handle concurrency in programming implementations dealing with big data.
For example, if you were interviewing for music-streaming startup Spotify, you could remark that your skills at developing a better recommendation model would increase user retention, which would then increase revenue in the long run.
The startup metrics Slideshare linked above will help you understand exactly what performance indicators are important for startups and tech companies as they think about revenue and growth.
Your interviewer is trying to gauge if you’d be a valuable member of their team and whether you grasp the nuances of why certain things are set the way they are in the company’s data process based on company- or industry-specific conditions.
This overview of deep learning in Nature by the scions of deep learning themselves (from Hinton to Bengio to LeCun) can be a good reference paper and an overview of what’s happening in deep learning.
More reading: Mastering the game of Go with deep neural networks and tree search (Nature) AlphaGo beating Lee Sedol, the best human player at Go, in a best-of-five series was a truly seminal event in the history of machine learning and deep learning.
The Nature paper above describes how this was accomplished with “Monte-Carlo tree search with deep neural networks that have been trained by supervised learning, from human expert games, and by reinforcement learning from games of self-play.”
45 questions to test a Data Scientist on Regression (Skill test – Regression Solution)
This skill test was designed to test your conceptual and practical knowledge of various regression techniques.
I am sure they all will agree it was the best skill assessment test on regression they have come across.
Around 530 people participated in the skill test and the highest score was 38. Here are a few statistics about the distribution.
Which of the following steps / assumptions in regression modeling impacts the trade-off between under-fitting and over-fitting the most?
The use of a constant-term
Solution: A
Choosing the right degree of polynomial plays a critical role in the fit of the regression.
If we choose a higher degree of polynomial, the chances of overfitting increase significantly.
What is the leave-one-out cross-validation mean squared error in the case of linear regression (Y = bX+c)?
Each time, we fit the line with 2 points and leave 1 point out for cross-validation.
Leave-one-out cross-validation mean squared error = (2^2 + (2/3)^2 + 1^2) / 3 = 49/27
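The arithmetic above can be reproduced with a short sketch. The original data points are not shown in this excerpt, so the three points below are a hypothetical set chosen because they yield exactly the held-out errors 2, 2/3 and 1; the helper names are also made up.

```python
def line_through(p, q):
    """Slope and intercept of the line through two points."""
    slope = (q[1] - p[1]) / (q[0] - p[0])
    return slope, p[1] - slope * p[0]

def loocv_mse(points):
    """Leave-one-out CV for exactly three points: fit on two, test on the third."""
    squared_errors = []
    for i, (x_held, y_held) in enumerate(points):
        train = points[:i] + points[i + 1:]          # the two remaining points
        slope, intercept = line_through(*train)
        squared_errors.append((y_held - (slope * x_held + intercept)) ** 2)
    return sum(squared_errors) / len(points)

points = [(0, 2), (2, 2), (3, 1)]  # hypothetical data consistent with the errors in the text
mse = loocv_mse(points)
print(mse, 49 / 27)
```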
Let’s say, a “Linear regression” model perfectly fits the training data (train error is zero).
In other words, the test error will be zero if the test data is a perfect representative of the training data, but this is not always the case.
Solution: C
“R squared” by itself can’t tell whether a variable is significant or not, because each time we add a feature, “R squared” can either increase or stay constant.
On the other hand, the p-value and t-statistic merely measure how strong the evidence is that there is a non-zero association.
To test the linear relationship between continuous variables y (dependent) and x (independent), which of the following plots is best suited?
A correlation between age and health of a person is found to be -1.09. On the basis of this, you would tell the doctors that: A.
Suppose we have generated the data with the help of polynomial regression of degree 3 (degree 3 will perfectly fit this data).
2 and 4
Solution: C
If we fit a polynomial of degree higher than 3, it will overfit the data because the model becomes more complex.
If we fit a polynomial of degree lower than 3, we have a less complex model, so in this case the bias is high and the variance is low.
Both are True
Solution: C
1. With a small training dataset, it’s easier to find a hypothesis to fit the training data exactly i.e.
So with a small hypothesis space, it’s less likely to find a hypothesis to fit the data exactly i.e.
Suppose we fit “Lasso Regression” to a data set which has 100 features (X1, X2 … X100). Now, we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter.
None of these
Solution: B
Big feature values ⇒ smaller coefficients ⇒ less lasso penalty ⇒ more likely to be kept
None of above
Solution: B
“Ridge regression” will use all predictors in the final model, whereas “Lasso regression” can be used for feature selection because coefficient values can be zero.
Which of the following statement(s) can be true after adding a variable to a linear regression model?
None of the above
Solution: A
Each time you add a feature, R squared always either increases or stays constant, but this is not true of Adjusted R squared.
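To see how Adjusted R squared can fall while R squared rises, here is a sketch using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). The R squared values, sample size and predictor counts are hypothetical.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Adjusted R^2 penalizes each extra predictor through the degrees of freedom."""
    return 1 - (1 - r_squared) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical: adding a fourth predictor nudges R^2 from 0.800 to 0.801 on 20 samples.
before = adjusted_r_squared(0.800, n_samples=20, n_predictors=3)
after = adjusted_r_squared(0.801, n_samples=20, n_predictors=4)
print(round(before, 4), round(after, 4))
```

The tiny gain in R squared does not pay for the extra predictor, so the adjusted value drops.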
The following visualization shows the fit of three different models (blue lines) on the same training data.
Only 5
Solution: C
The trend of the data looks like a quadratic trend over the independent variable X.
A higher degree polynomial (right graph) might have a very high accuracy on the training population but is expected to fail badly on the test dataset.
= β0 + β1 X1 + β2 X2……+ βn Xn Which of the following statement(s) are true?
Suppose I applied a logistic regression model on data and got training accuracy X and testing accuracy Y.
None of these
Solution: A
SSE is the sum of the squared errors of prediction, so SSE = (-.2)^2 + (.4)^2 + (-.8)^2 + (1.3)^2 + (-.7)^2 = 3.02
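The same sum can be checked in a couple of lines of Python, using the five residuals quoted in the solution:

```python
residuals = [-0.2, 0.4, -0.8, 1.3, -0.7]   # prediction errors from the solution above
sse = sum(r ** 2 for r in residuals)       # sum of squared errors
print(round(sse, 2))
```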
Ignoring the plot scales (the variables have been standardized), which of the two scatter plots (plot1, plot2) is more likely to be a plot showing the values of height (Var1 –
As individuals get taller, they take up more volume, which leads to an increase in weight, so a positive relationship is expected.
The plot on the right has this positive relationship while the plot on the left shows a negative relationship.
Suppose the distribution of salaries in a company X has median $35,000, and 25th and 75th percentiles are $21,000 and $53,000 respectively.
Suppose you have n datasets with two continuous variables (y is dependent variable and x is independent variable).
None of these
Solution: A
In particular, if we have very few observations, our models can rapidly overfit the data.
Because we have only a few points and as we’re increasing in our model complexity like the order of the polynomial, it becomes very easy to hit all of our observations.
On the other hand, if we have lots and lots of observations, even with really, really complex models, it is difficult to overfit because we have dense observations across our input.
bias is high, variance is high
Solution: C
If lambda is very large, it means the model is less complex.
bias is high, variance is high
Solution: B
If lambda is very small, it means the model is complex.
Out of the three residual plots given below, which of the following represent worse model(s) compared to others?
Which bold point, if removed will have the largest effect on fitted regression line as shown in above figure(dashed)?
Although c is also an outlier in the given data space, it is close to the regression line (its residual is small), so it will not affect the fit much.
In a simple linear regression model (One independent variable), If we change the input variable by 1 unit.
Which of the following functions is used by logistic regression to convert its output into a probability in the range [0,1]?
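For reference, the function usually meant here is the logistic (sigmoid) function, which squashes any real number into the open interval (0, 1). A minimal sketch follows; it is not hardened against overflow for very large negative inputs.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

probabilities = [sigmoid(z) for z in (-4.0, 0.0, 4.0)]
print(probabilities)
```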
Q43: Which of the following statements is true about the partial derivatives of the cost functions w.r.t. weights / coefficients in linear regression and logistic regression?
Solution: A
If there are n classes, then n separate logistic regressions have to be fit, where the probability of each category is predicted over the rest of the categories combined.
Note: consider Y = β0 + β1*X. Here, β0 is intercept and β1 is coefficient.
Solution: B β0 and β1: β0 = 0, β1 = 1 is in X1 color(black) and β0 = 0, β1 = −1 is in X4 color (green)
If you have any suggestions or improvements you think we should make in the next skilltest, let us know by dropping your feedback in the comments section.
30 Questions to test a data scientist on Linear Regression [Solution: Skilltest – Linear Regression]
Linear Regression is still the most prominently used statistical technique in the data science industry and in academia to explain relationships between features.
If you missed the real-time test, you can read this article to find out how many questions you could have answered correctly.
More than 800 people participated in the skill test and the highest score obtained was 28.
A Neural network can be used as a universal approximator, so it can definitely implement a linear regression algorithm.
Both A and B
Solution: (A)
In linear regression, we try to minimize the least square errors of the model to identify the line of best fit.
5) Which of the following evaluation metrics can be used to evaluate a model while modeling a continuous output variable?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
Solution: (D)
Since linear regression gives continuous output values, we use the mean squared error metric to evaluate the model’s performance.
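A minimal sketch of the metric, with made-up actual and predicted values:

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical continuous targets and model outputs.
mse = mean_squared_error([3.0, 5.0, 2.5], [2.5, 5.0, 4.0])
print(mse)
```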
A) TRUE
B) FALSE
Solution: (A)
True. In the case of lasso regression, we apply an absolute penalty, which makes some of the coefficients zero.
Now imagine that you are applying linear regression by fitting the best fit line using least square error on this data.
can’t judge the relationship
Solution: (B)
The absolute value of the correlation coefficient denotes the strength of the relationship.
A) TRUE
B) FALSE
Solution: (B)
With a small training dataset, it’s easier to find a hypothesis to fit the training data exactly i.e.
13) We can also compute the coefficient of linear regression with the help of an analytical method called “Normal Equation”.
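For a single feature plus an intercept, the normal equation θ = (XᵀX)⁻¹Xᵀy reduces to the familiar closed-form slope and intercept, so no iterative optimization is needed. The sketch below covers only that one-feature case; the function name `fit_line` and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = intercept + slope * x (one-feature normal equation)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope

# Hypothetical data generated from y = 1 + 2x, so the fit should recover those values.
intercept, slope = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)
```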
of these
Solution: (B)
In lasso, some of the coefficient values become zero, but in the case of Ridge, the coefficients become close to zero but not exactly zero.
None of these
Solution: (A)
As already discussed, lasso applies an absolute penalty, so some of the coefficients will become zero.
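The difference shows up clearly in the simplest possible setting: a single coefficient whose unpenalized estimate is b. Under that simplifying assumption, the lasso solution is a soft-threshold of b, while the ridge solution just scales b down. The function names and numbers below are illustrative.

```python
def lasso_shrink(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + lam*abs(beta): soft-thresholding."""
    if abs(b) <= lam:
        return 0.0                      # small coefficients are set exactly to zero
    return b - lam if b > 0 else b + lam

def ridge_shrink(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + 0.5*lam*beta**2: proportional shrinkage."""
    return b / (1.0 + lam)

unpenalized = [3.0, 0.3, -0.2]          # hypothetical unpenalized estimates
lasso = [lasso_shrink(b, 0.5) for b in unpenalized]
ridge = [ridge_shrink(b, 0.5) for b in unpenalized]
print(lasso)
print(ridge)
```

Lasso zeroes out the two small coefficients, whereas ridge keeps all three, only smaller.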
19) Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and you found that there is a relationship between them.
If there is any relationship between them, it means that the model has not perfectly captured the information in the data.
Question Context 20-22: Suppose that you have a dataset D1 and you design a linear regression model with a degree 3 polynomial, and you find that the training and testing error is “0”.
of these
Solution: (A)
Since a degree 4 model is more complex than the degree 3 model (and prone to overfitting), it will again perfectly fit the data.
of these
Solution: (B)
If a degree 3 polynomial fits the data perfectly, it’s highly likely that a simpler model (degree 2 polynomial) will underfit the data.
will be low, variance will be low
Solution: (C)
Since a degree 2 polynomial is less complex than degree 3, the bias will be high and the variance will be low.
Question Context 23: Which of the following is true about the graphs below (A, B, C, left to right) of the cost function against the number of iterations?
of these
Solution: (A)
With a high learning rate, the step size is large: the objective function decreases quickly at first, but it does not find the global minimum, and the objective function starts increasing after a few iterations.
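A toy sketch of the learning-rate effect, minimizing f(w) = w² with plain gradient descent. The objective, the two learning rates and the function name are assumptions chosen for illustration; this simple case only shows the overshooting behaviour, where the cost grows from the first step.

```python
def gradient_descent_costs(learning_rate, steps=10, start=1.0):
    """Cost history when minimizing f(w) = w**2, whose gradient is 2*w."""
    w, costs = start, []
    for _ in range(steps):
        costs.append(w ** 2)
        w -= learning_rate * 2 * w      # gradient step
    return costs

small_rate = gradient_descent_costs(0.1)   # each step multiplies w by 0.8: cost shrinks
large_rate = gradient_descent_costs(1.1)   # each step multiplies w by -1.2: cost grows
print(small_rate[-1], large_rate[-1])
```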
Question Context 24-25: We have been given a dataset with n records in which we have input attribute as x and output attribute as y.
To test our linear regressor, we randomly split the data into a training set and a test set.
As the training set size increases, what do you expect will happen with the mean training error?
25) What do you expect will happen with bias and variance as you increase the size of training data?
26) What would be the root mean square training error for this data if you run a Linear Regression model of the form (Y = A0+A1X)?
2 and 3
Solution: (A)
In the case of underfitting, you need to introduce more variables into the variable space, or add some polynomial degree variables, to make the model more complex so that it can fit the data better.
I tried my best to make the solutions as comprehensive as possible, but if you have any questions or doubts, please drop them in the comments below.
Machine Learning Online Test
Machine Learning is the science of getting computers to act without being explicitly programmed.
Machine Learning is a part of artificial intelligence which uses statistical techniques to give computers the ability to learn with data. Machine
recruiters to find the best suitable candidate by assessing his/her ability to work on Machine Learning.
powerful reporting helps you to analyze section wise performance of candidate to gauge his/her strengths and weaknesses.
- On Sunday, June 16, 2019
Machine Learning Interview Questions and Answers | Machine Learning Interview Preparation | Edureka
Machine Learning Training with Python: This Machine Learning Interview Questions and Answers video will help you to ..
FAQ Answers -1 : Analytics Interview Q&A Discussion | Data Science
In this video I shall discuss ten important and basic interview questions asked in technical round of Analytics or Data Science interviews. In data science ...
Data Science Interview Questions | Data Science Tutorial | Data Science Interviews | Edureka
Data Science Training: This Data Science Interview Questions and Answers video will help you to prepare yourself for ..
Data Science - Scenario Based Practical Interview Questions with Answers - Part -1
Practical interview questions with answers Data Science - Scenario Based Practical Interview Questions with Answers - Machine Learning, Neural Nets.
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview Questions | Simplilearn
This Deep Learning interview questions and answers video will help you prepare for Deep Learning interviews. This video is ideal for both beginners as well as ...
Scenario Based Practical Data Science Interview Questions with Answers - Part -2
Practical interview questions with answers Data Science - Scenario Based Practical Interview Questions with Answers - Machine Learning, Neural Nets ...
Rachel Thomas - Really Quick Questions with a Fast.AI Researcher
In this interview, I ask Fast.AI researcher Rachel Thomas 67 questions about machine learning and her day to day life. She was selected by Forbes as one of “20 ...
Crack analytics and machine learning interviews at campus placements
Data Science is the future. Data Science is already making a massive impact in diverse fields and is rapidly being adopted by organizations, big and small.
Lessons Learned the Hard Way: Hacking the Data Science Interview
Galvanize Graduate Greg Kamradt shares tips and tricks for mastering Data Science Job interviews. Read notes about this talk here: ...
How to Prepare Data for Machine Learning and A.I.
In this video, Alina discusses how to Prepare data for Machine Learning and AI. Artificial Intelligence is only as powerful as the quality of the data collection, ...