AI News, BOOK REVIEW: How to Ace a Data Science Interview

How to Ace a Data Science Interview

As I mentioned in my first post, I have just finished an extensive tech job search, which featured eight on-sites, along with countless phone screens and informal chats. I was interviewing for a combination of data science and software engineering (machine learning) positions, and I got a pretty good sense of what those interviews are like.

During the interview phase of the process, your recruiter is on your side and can usually tell you what types of interviews you’ll have. Even if the recruiter is reluctant to share that, common practices in the industry are a good guide to what you’re likely to see.

Data science roles generally fall into two broad areas of focus: statistics and machine learning. I only applied to the latter category, so that’s the type of position discussed in this post.

You will encounter a similar set of interviews for a machine learning software engineering position, though more of the questions will fall in the coding category.

figuring out which product to recommend to a user, which users are going to stop using the site, which ad to display, etc.), but can also be a toy example (e.g.

recommending board games to a friend). This type of interview doesn’t depend on much background knowledge, other than having a general understanding of machine learning concepts (see below).

In the latter case, you’ll generally have a discussion with the interviewer about some plausible definitions (e.g., what does it mean for a user to “stop using the site”?).

Think about what might be predictive of the variable you are trying to predict, and what information you would actually have available. I’ve found it helpful to give context around what I’m trying to capture, and to what extent the features I’m proposing reflect that information.

But maybe some purchases were mistakes, and you vowed to never buy a book like that again. Well, Amazon knows how you’ve interacted with your Kindle books. If there’s a book you started but never finished, it might be a positive signal for general areas you’re interested in, but a negative signal for the particular author.

The project doesn’t have to be directly related to the position you’re interviewing for (though it can’t hurt), but it needs to be the kind of work you can have an in-depth technical discussion about.

Should I look over the syntax for training a model in scikit-learn?) I also had one recruiter tell me I’d be analyzing “big data”, which was a bit intimidating (am I going to be working with distributed databases or something?) until I discovered at the interview that the “big data”

41 Essential Machine Learning Interview Questions (with answers)

We’ve traditionally seen machine learning interview questions pop up in several categories.

The third has to do with your general interest in machine learning: you’ll be asked about what’s going on in the industry and how you keep up with the latest machine learning trends.

Finally, there are company or industry-specific questions that test your ability to take your general machine learning knowledge and turn it into actionable points to drive the bottom line forward.

We’ve divided this guide to machine learning interview questions into the categories we mentioned above so that you can more easily find the information you need.

This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set.

The bias-variance decomposition essentially decomposes the learning error from any algorithm by adding the bias, the variance and a bit of irreducible error due to noise in the underlying dataset.

For example, in order to do classification (a supervised learning task), you’ll need to first label the data you’ll use to train the model to classify data into your labeled groups.

K-means clustering requires only a set of unlabeled points and a threshold: the algorithm will take unlabeled points and gradually learn how to cluster them into groups by computing the mean of the distance between different points.

It’s often used as a proxy for the trade-off between the sensitivity of the model (true positives) vs the fall-out or the probability it will trigger a false alarm (false positives).

More reading: Precision and recall (Wikipedia) Recall is also known as the true positive rate: the number of positives your model correctly identifies compared to the actual number of positives throughout the data.

Precision is also known as the positive predictive value: the number of positives your model correctly identifies compared to the total number of positives it claims.

It can be easier to think of recall and precision in the context of a case where you’ve predicted that there were 10 apples and 5 oranges in a case of 10 apples.
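The apple example can be worked through numerically. This is a minimal sketch that assumes every one of the 15 predictions is a claim of "apple", so the 5 "oranges" count as false positives; the variable names are illustrative.

```python
# Apple example: the case really holds 10 apples (the positives).
# Assumption: the model makes 15 positive claims, 10 right and 5 wrong.
true_positives = 10   # claims that really are apples
false_positives = 5   # claims that turned out not to be apples
false_negatives = 0   # real apples the model failed to claim

# Recall: of the real positives, how many did the model find?
recall = true_positives / (true_positives + false_negatives)

# Precision: of the positives the model claimed, how many were right?
precision = true_positives / (true_positives + false_positives)

print(f"recall={recall:.2f}, precision={precision:.2f}")
```

Under that reading, recall is perfect (10 of 10) while precision is only two thirds (10 of 15).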

Mathematically, it’s expressed as the true positive rate of a condition sample divided by the sum of the false positive rate of the population and the true positive rate of a condition.

Say you had a 60% chance of actually having the flu after a flu test, but out of people who had the flu, the test will be false 50% of the time, and the overall population only has a 5% chance of having the flu.
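The flu numbers can be plugged into Bayes' theorem. The sketch below is one possible reading of the example, not a definitive one: it assumes 60% is the chance of a positive test given flu, 50% is the chance of a positive test given no flu, and 5% is the prior. The function name `posterior` is hypothetical.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive test) from Bayes' theorem."""
    true_pos = true_positive_rate * prior           # has condition, tests positive
    false_pos = false_positive_rate * (1 - prior)   # no condition, tests positive
    return true_pos / (true_pos + false_pos)

# Assumed interpretation of the example's figures (see the note above).
p_flu = posterior(prior=0.05, true_positive_rate=0.60, false_positive_rate=0.50)
print(f"P(flu | positive test) = {p_flu:.4f}")
```

With these assumptions the posterior comes out to roughly 6%, far lower than the 60% figure might suggest, because the condition is rare in the population.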

(Quora) Despite its practical applications, especially in text mining, Naive Bayes is considered “Naive” because it makes an assumption that is virtually impossible to see in real-life data: the conditional probability is calculated as the pure product of the individual probabilities of components.

A clever way to think about this is to think of Type I error as telling a man he is pregnant, while Type II error means you tell a pregnant woman she isn’t carrying a baby.

More reading: Deep learning (Wikipedia) Deep learning is a subset of machine learning that is concerned with neural networks: how to use backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data.

More reading: Using k-fold cross-validation for time-series model selection (CrossValidated) Instead of using standard k-folds cross-validation, you have to pay attention to the fact that a time series is not randomly distributed data —

More reading: Pruning (decision trees) Pruning is what happens in decision trees when branches that have weak predictive power are removed in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model.

For example, if you wanted to detect fraud in a massive dataset with a sample of millions, a more accurate model would most likely predict no fraud at all if only a vast minority of cases were fraud.

More reading: Regression vs Classification (Math StackExchange) Classification produces discrete values and maps the dataset to strict categories, while regression gives you continuous results that allow you to better distinguish differences between individual points.

You would use classification over regression if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories (e.g., if you wanted to know whether a name was male or female rather than just how correlated it was with male and female names).

Q21- Name an example where ensemble techniques might be useful.

They typically reduce overfitting in models and make the model more robust (unlikely to be influenced by small changes in the training data).  You could list some examples of ensemble methods, from bagging to boosting to a “bucket of models” method and demonstrate how they could increase predictive power.

(Quora) This is a simple restatement of a fundamental problem in machine learning: the possibility of overfitting training data and carrying the noise of that data through to the test set, thereby providing inaccurate generalizations.

There are three main methods to avoid overfitting: 1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data.

More reading: How to Evaluate Machine Learning Algorithms (Machine Learning Mastery) You would first split the dataset into training and test sets, or perhaps use cross-validation techniques to further segment the dataset into composite sets of training and test sets within the data.

More reading: Kernel method (Wikipedia) The kernel trick involves kernel functions that let an algorithm operate in a higher-dimensional space without explicitly calculating the coordinates of points within that space: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space.

This gives them the very useful attribute of working in higher dimensions while being computationally cheaper than the explicit calculation of said coordinates. Many algorithms can be expressed in terms of inner products.
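A small numeric check can make the idea concrete. The sketch below uses a degree-2 polynomial kernel on 2-D points as an illustrative choice (the text does not name a specific kernel) and compares it with the inner product of an explicit 3-D feature map.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel (x . y)^2, computed in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit 3-D feature map whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

x, y = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(x, y)                     # never leaves 2-D
explicit = dot(feature_map(x), feature_map(y))   # maps to 3-D first
print(implicit, explicit)
```

Both routes give the same inner product, but the kernel version never builds the higher-dimensional coordinates.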

More reading: Writing pseudocode for parallel programming (Stack Overflow) This kind of question demonstrates your ability to think in parallelism and how you could handle concurrency in programming implementations dealing with big data.

For example, if you were interviewing for music-streaming startup Spotify, you could remark that your skills at developing a better recommendation model would increase user retention, which would then increase revenue in the long run.

The startup metrics Slideshare linked above will help you understand exactly what performance indicators are important for startups and tech companies as they think about revenue and growth.

Your interviewer is trying to gauge if you’d be a valuable member of their team and whether you grasp the nuances of why certain things are set the way they are in the company’s data process based on company- or industry-specific conditions.

This overview of deep learning in Nature by the scions of deep learning themselves (from Hinton to Bengio to LeCun) can be a good reference paper and an overview of what’s happening in deep learning —

More reading: Mastering the game of Go with deep neural networks and tree search (Nature) AlphaGo beating Lee Sedol, one of the best human players at Go, in a best-of-five series was a truly seminal event in the history of machine learning and deep learning.

The Nature paper above describes how this was accomplished with “Monte-Carlo tree search with deep neural networks that have been trained by supervised learning, from human expert games, and by reinforcement learning from games of self-play.”

The Most Comprehensive Data Science & Machine Learning Interview Guide You’ll Ever Need

Are you aspiring to become a data scientist, but struggling to crack the interviews?

In this article, we provide you with a comprehensive list of questions, case studies and guesstimates asked in data science and machine learning interviews.

We have also listed additional resources including handy tips and tricks to guide you through your interview process and come out on the other side successfully.

From probability to correlation, linear regression to logistic regression, your concepts will be set in stone by the time you reach the last question!

If you can answer and understand these questions, rest assured, you will give a tough fight in your job interview.

It seems easy from the outside but it has its own tricky features. If you are learning statistical concepts, you are bound to face these questions, which most people try to avoid.

Your statistical concepts should be rock solid before you go for an interview in this field. To help you improve and test your knowledge on statistics, we have put together this list of questions.

Linear Regression is still one of the most prominently used statistical techniques in the data science industry and in academia to explain relationships between features.

Logistic Regression is likely the most commonly used algorithm for solving all classification problems. The questions in this article are especially designed for you to test your knowledge on logistic regression and its nuances.

If you are a data scientist (or an aspiring one), then you need to be good at Machine Learning – no two ways about it. These questions have been designed to test your conceptual knowledge of machine learning and make you industry ready.

This trait is particularly important in business context when it comes to explaining a decision to stakeholders, which makes an integral part of the interview process as well.

‘Support Vector Machines’ is like a sharp knife – it works on smaller datasets, but on them, it can be much stronger and more powerful in building models.

One of the most common questions in interviews is based on how you will deal with a massive dataset that consists of millions of rows and thousands of columns.

Deep learning is the hottest research field in the industry right now. It has led to amazing innovations, incredible breakthroughs, and we are only just getting started!

With big players like Google and IBM launching automated platforms to build image classification models, the interest in this field is pretty high. The questions in this article are especially designed for you to test your knowledge on how to handle image data, with an emphasis on image processing.

Taxi aggregators have become a MASSIVE deal in certain parts of the country.  In this article, we’ll solve a case study of taxi aggregators.

You are given data about alternate roads and have to figure out possible routes that minimize the time taken to travel.

In this article, we will look at a real life case in the form of a call center optimization problem. This case study will give you a good feel of how to simulate an entire environment in such an operation intensive function.

If you aspire to become a data scientist, your out-of-the-box thinking and ability to quickly calculate and structure your thoughts is critical.

You will be given a puzzle or a guesstimate question (sometimes both) to see how quickly and logically you solve a challenging problem.

In this article, the author has covered some of the trickiest and most challenging puzzles he was given while interviewing for data science roles. These questions have been asked at companies like Goldman Sachs, Amazon, Google, JP Morgan, etc.

In case you were not able to crack two of the puzzles within the given time limits, you might need to solve a wider variety of puzzles to get the hang of these types of questions.

The puzzles are divided into 3 stages, and you have not been given solutions to the first stage. If you don’t get those answers yourself, you might need to go through puzzle solving from scratch!

Questions on tools are a mandatory part of every data science interview and you should have certain things already in your mind before you face the panel.

is one of the most popular languages in use today, thanks to its open source nature and an excellent user community.

These questions are widely asked in companies that have a broad analytics base and deal with big data on a daily basis.

A lot of candidates often take these tips for granted, and end up getting disappointed when the offer letter fails to materialize.

However, you can train yourself to make sure that you present your best when it matters the most. This article provides some tips using which you can blaze through any analytics interview.

Things like body language, the way you structure your thoughts, your awareness of the industry, domain knowledge and how caught up you are with all the latest developments in machine learning –

As an analyst, getting into details and studying them carefully almost becomes second nature to you. In an interview, you will likely be interviewed by someone who has been an analyst for longer than you have. Hence, you should expect a thorough and close examination of minute details.

It covers aspects like the different points the employer judges you on, the different stages of an interview, how a technical interview is conducted, etc. This guide is meant to help you ace the next analytics interview you sit for!

On a weekday, an average working person spends 25-30% of their time sleeping, 40-60% working, 10% eating and 15-25% idle.

This is a list of questions that you should ask your prospective employer before taking up a job in Analytics. The aim of these questions is to make sure you know what you are getting into.

After 8 years of working in the Quality Assurance field, she managed to carve out a career in the data science field through hard work, application and some luck.

He has also given some nifty tips and a heavy dose of inspiration and experience which everyone in his position can lean on to get their first break.

Data Science and Machine Learning Interview Questions

Ah the dreaded machine learning interview.

To give you a bit of perspective, I was in graduate school in the last few months of my masters in machine learning and computer vision with most of my previous experience being research/academic, but with 8 months at an early stage startup (unrelated to ML).

I’m going to simply list the most common ones, since there are many resources about them online, and go more in depth into some of the less common and trickier ones.

40 Interview Questions asked at Startups in Machine Learning / Data Science

Machine learning and data science are being seen as the drivers of the next industrial revolution happening in the world today.

This also means that there are numerous exciting startups looking for data scientists.  What could be a better start for your aspiring career!

You might also find some really difficult technical questions on your way. The set of questions asked depends on what the startup does.

If you can answer and understand these questions, rest assured, you will give a tough fight in your job interview.

(You are free to make practical assumptions.) Answer: Processing high-dimensional data on a limited-memory machine is a strenuous task, and your interviewer would be fully aware of that.

Not to forget, that’s the motive of doing PCA, where we aim to select fewer components (than features) which can explain the maximum variance in the data set.

By doing rotation, the relative location of the components doesn’t change; it only changes the actual coordinates of the points.

If we don’t rotate the components, the effect of PCA will diminish and we’ll have to select more components to explain the variance in the data set.

Answer: This question has enough hints for you to start thinking! Since the data is spread across the median, let’s assume it’s a normal distribution.

We know that, in a normal distribution, ~68% of the data lies within 1 standard deviation of the mean (or mode, median), which leaves ~32% of the data unaffected.
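The ~68% figure can be checked directly from the normal distribution's error function. This is a minimal sketch using only the standard library; `fraction_within` is an illustrative helper name.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

within_one_sd = fraction_within(1)
outside_one_sd = 1 - within_one_sd
print(f"within: {within_one_sd:.4f}, outside: {outside_one_sd:.4f}")
```

The same helper gives about 95% for two standard deviations, which completes the familiar 68-95-99.7 rule of thumb.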

In an imbalanced data set, accuracy should not be used as a measure of performance because 96% (as given) might only be predicting majority class correctly, but our class of interest is minority class (4%) which is the people who actually got diagnosed with cancer.

Hence, in order to evaluate model performance, we should use Sensitivity (True Positive Rate), Specificity (True Negative Rate), F measure to determine class wise performance of the classifier.
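To see why accuracy misleads here, consider a hypothetical classifier that labels every patient as healthy. The sketch below assumes an illustrative cohort of 1,000 patients with 4% positives; the counts and the `class_metrics` helper are assumptions for demonstration only.

```python
def class_metrics(tp, fp, tn, fn):
    """Compute accuracy plus class-wise metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical cohort: 40 cancer cases, 960 healthy; model predicts "healthy" for all.
always_negative = class_metrics(tp=0, fp=0, tn=960, fn=40)
print(always_negative)
```

The model scores 96% accuracy while finding none of the cancer cases, which is exactly what sensitivity and the F measure expose.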

On the other hand, a decision tree algorithm is known to work best to detect non –

The decision tree failed to provide robust predictions because it couldn’t map the linear relationship as well as a regression model did.

Therefore, we learned that a linear regression model can provide robust predictions, given that the data set satisfies its linearity assumptions.

You are assigned a new project which involves helping a food delivery company save more money.

A machine learning problem consists of three things: Always look for these three factors to decide if machine learning is a tool to solve a particular problem.

Discarding correlated variables has a substantial effect on PCA because, in the presence of correlated variables, the variance explained by a particular component gets inflated.

For example: You have 3 variables in a data set, of which 2 are correlated. If you run PCA on this data set, the first principal component would exhibit twice the variance that it would exhibit with uncorrelated variables.
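The factor of two can be reproduced under a simplifying assumption: the three variables are standardized and two of them are perfectly correlated. The variance of the first principal component is then the largest eigenvalue of the correlation matrix, estimated here by power iteration. The helper name and matrices are illustrative.

```python
def largest_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    vector = [1.0] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sum(value * value for value in product) ** 0.5
        vector = [value / norm for value in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient; the vector already has unit length.
    return sum(vector[i] * product[i] for i in range(n))

# Correlation matrices for three standardized variables.
uncorrelated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
two_correlated = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # r = 1 assumed

first_pc_uncorrelated = largest_eigenvalue(uncorrelated)
first_pc_correlated = largest_eigenvalue(two_correlated)
print(first_pc_uncorrelated, first_pc_correlated)
```

With a weaker correlation r between the two variables, the same matrix structure gives a first-component variance of 1 + r, so the doubling is the limiting case.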

Answer: As we know, ensemble learners are based on the idea of combining weak learners to create strong learners.

For example: If model 1 has classified User1122 as 1, there are high chances model 2 and model 3 would have done the same, even if its actual value is 0.

The k-means algorithm partitions a data set into clusters such that each cluster formed is homogeneous and the points in each cluster are close to each other.

The kNN algorithm tries to classify an unlabeled observation based on its k (can be any number) surrounding neighbors.
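The kNN side of the comparison fits in a few lines. This is a minimal sketch, assuming 2-D points, Euclidean distance and a simple majority vote; the toy data and the `knn_predict` name are made up for illustration.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Labeled training points: kNN is supervised, unlike k-means.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

near_a = knn_predict(train, (1.1, 1.0))
near_b = knn_predict(train, (5.0, 4.8))
print(near_a, near_b)
```

Note that the labels are required up front, which is the key contrast with k-means, where the groups are discovered from unlabeled points.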

In the absence of an intercept term (ymean), the model can make no such evaluation; with a larger denominator, the value of ∑(y - y´)²/∑(y)² becomes smaller than it should be, resulting in a higher R².

In addition, we can calculate VIF (variance inflation factor) to check for the presence of multicollinearity. A VIF value <= 4 suggests no multicollinearity whereas a value of >= 10 implies serious multicollinearity.

Answer: You can quote ISLR’s authors Hastie, Tibshirani who asserted that, in the presence of a few variables with medium or large effects, use lasso regression.

Conceptually, we can say that lasso regression (L1) does both variable selection and parameter shrinkage, whereas ridge regression only does parameter shrinkage and ends up including all the coefficients in the model.

No, we can’t conclude that the decrease in the number of pirates caused the climate change because there might be other factors (lurking or confounding variables) influencing this phenomenon.

Therefore, there might be a correlation between global average temperature and the number of pirates, but based on this information we can’t say that pirates died because of the rise in global average temperature.

For example: if we calculate the covariances of salary ($) and age (years), we’ll get different covariances which can’t be compared because of having unequal scales.
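The scale problem is easy to demonstrate. In the sketch below, the same salaries are expressed in dollars and in thousands of dollars; the data values are invented for illustration.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / (covariance(xs, xs) ** 0.5 * covariance(ys, ys) ** 0.5)

ages_years = [25, 32, 41, 50, 58]
salary_usd = [40000, 52000, 61000, 75000, 90000]
salary_kusd = [s / 1000 for s in salary_usd]   # same salaries, different unit

cov_usd = covariance(ages_years, salary_usd)
cov_kusd = covariance(ages_years, salary_kusd)
corr_usd = correlation(ages_years, salary_usd)
corr_kusd = correlation(ages_years, salary_kusd)
print(cov_usd, cov_kusd, corr_usd, corr_kusd)
```

The covariance changes by a factor of 1,000 with the unit, while the correlation stays the same, which is why correlation is the comparable quantity.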

In boosting, after the first round of predictions, the algorithm weighs misclassified predictions higher, such that they can be corrected in the succeeding round.

This sequential process of giving higher weights to misclassified predictions continues until a stopping criterion is reached.
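One round of this reweighting might look like the sketch below. It uses the AdaBoost update as an assumed concrete scheme, since the text only says "boosting"; `reweight` is a hypothetical helper and it assumes the weighted error is strictly between 0 and 1.

```python
import math

def reweight(weights, misclassified):
    """One AdaBoost-style round: raise weights of misclassified points, then renormalize."""
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    alpha = 0.5 * math.log((1 - error) / error)   # weight given to this round's learner
    updated = [
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated], alpha

weights = [0.2] * 5                                   # uniform starting weights
misclassified = [False, False, True, False, False]    # the third point was wrong
new_weights, alpha = reweight(weights, misclassified)
print(new_weights, alpha)
```

After the update the single misclassified point carries half of the total weight, so the next learner is pushed to get it right.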

In simple words, the tree algorithm finds the best possible feature which can divide the data set into the purest possible children nodes.

Training error 0.00 means the classifier has mimicked the training data patterns to such an extent that they are not available in the unseen data.

Hence, when this classifier was run on an unseen sample, it couldn’t find those patterns and returned predictions with higher error.

Answer: In such high dimensional data sets, we can’t use classical regression techniques, since their assumptions tend to fail.

n, we can no longer calculate a unique least square coefficient estimate, the variances become infinite, so OLS cannot be used at all.

To combat this situation, we can use penalized regression methods like lasso, LARS, ridge which can shrink the coefficients to reduce variance.

(Hint: Think SVM) Answer: In case of linearly separable data, the convex hull represents the outer boundaries of the two groups of data points.

Using one hot encoding, the dimensionality (i.e., the number of features) of a data set increases because it creates a new variable for each level present in the categorical variables.

In label encoding, the levels of a categorical variable get encoded as integers (0, 1, and so on), so no new variable is created.
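The difference in dimensionality shows up immediately on a toy column. This is a minimal sketch in plain Python; the helper names and the alphabetical ordering of levels are illustrative choices, not a library API.

```python
def label_encode(values):
    """Map each level to an integer code; the column stays one-dimensional."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values]

def one_hot_encode(values):
    """Create one 0/1 column per level; width grows with the number of levels."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels] for v in values]

colors = ["red", "green", "blue", "green"]
labels = label_encode(colors)      # one column of integer codes
one_hot = one_hot_encode(colors)   # three columns: blue, green, red
print(labels)
print(one_hot)
```

One column with three levels stays one column under label encoding but becomes three columns under one hot encoding.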

In time series problem, k fold can be troublesome because there might be some pattern in year 4 or 5 which is not in year 3.
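One common alternative is forward chaining, where each fold trains only on periods that precede the test period. The sketch below assumes five yearly periods indexed 0 to 4; `forward_chaining_splits` is a hypothetical helper, not a library function.

```python
def forward_chaining_splits(n_periods):
    """Yield (train, test) index lists: train on periods 0..i-1, test on period i."""
    for i in range(1, n_periods):
        yield list(range(i)), [i]

splits = list(forward_chaining_splits(5))
for train_idx, test_idx in splits:
    print(f"train on {train_idx}, test on {test_idx}")
```

Every test period lies strictly after its training periods, so the model is never evaluated on data older than what it learned from.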

They exploit behavior of other users and items in terms of transaction history, ratings, selection and purchase information.

In the context of confusion matrix, we can say Type I error occurs when we classify a value as positive (1) when it is actually negative (0).

On the contrary, stratified sampling helps to maintain the distribution of target variable in the resultant distributed samples also.

We will consider adjusted R² as opposed to R² to evaluate model fit because R² increases irrespective of improvement in prediction accuracy as we add more variables.
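The standard adjusted R² formula makes the penalty explicit. In the sketch below, the R² values, sample size and predictor counts are invented to illustrate the effect; `adjusted_r2` is an illustrative helper.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R²: penalizes R² for the number of predictors used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical models fit on the same 50 observations.
small_model = adjusted_r2(r2=0.800, n_samples=50, n_predictors=5)
large_model = adjusted_r2(r2=0.805, n_samples=50, n_predictors=10)
print(f"{small_model:.3f} vs {large_model:.3f}")
```

Here five extra variables nudge R² up from 0.800 to 0.805, yet adjusted R² falls, signalling that the additions did not earn their place.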

For example: a gene mutation data set might result in lower adjusted R² and still provide fairly good predictions, as compared to a stock market data where lower adjusted R² implies that model is not good.

Example: Think of a chess board, the movement made by a bishop or a rook is calculated by manhattan distance because of their respective vertical &

If the business requirement is to build a model which can be deployed, then we’ll use regression or a decision tree model (easy to interpret and explain) instead of black box algorithms like SVM, GBM etc.

A high bias error means we have an under-performing model which keeps on missing important trends. Variance, on the other hand, quantifies how much the predictions made on the same observation differ from each other.

A high variance model will over-fit on your training population and perform badly on any observation beyond training.

Answer: OLS and Maximum likelihood are the methods used by the respective regression methods to approximate the unknown parameter (coefficient) value.

In simple words, ordinary least squares (OLS) is a method used in linear regression which approximates the parameters resulting in minimum distance between actual and predicted values. Maximum likelihood helps in choosing the values of the parameters which maximize the likelihood of producing the observed data.

These questions are meant to give you a wide exposure on the types of questions asked at startups in machine learning.

I interviewed at five top companies in Silicon Valley in five days, and luckily got five job offers

After talking with my wife and gaining her full support, I decided to take actions and make my first ever career change.

Although I’m interested in machine learning positions, the positions at the five companies are slightly different in the title and the interviewing process.

While I agree that coding interviews might not be the best way to assess all your skills as a developer, there is arguably no better way to tell if you are a good engineer in a short period of time.

I spent several weeks going over common data structures and algorithms, then focused on areas I wasn’t too familiar with, and finally did some frequently seen problems.

Many questions can be asked during system design interviews, including but not limited to system architecture, object oriented design, database schema design, distributed system design, scalability, etc.

For the most part I read articles on system design interviews, architectures of large-scale systems, and case studies.

Here are some resources that I found really helpful: Although system design interviews can cover a lot of topics, there are some general guidelines for how to approach the problem: With all that said, the best way to practice for system design interviews is to actually sit down and design a system, i.e.

For example, if you use HBase, rather than simply using the client to run some DDL and do some fetches, try to understand its overall architecture, such as the read/write flow, how HBase ensures strong consistency, what minor/major compactions do, and where LRU cache and Bloom Filter are used in the system.

Many blogs are also a great source of knowledge, such as Hacker Noon and engineering blogs of some companies, as well as the official documentation of open source projects.

Make sure you understand basic concepts such as bias-variance trade-off, overfitting, gradient descent, L1/L2 regularization, Bayes Theorem, bagging/boosting, collaborative filtering, dimension reduction, etc.

Try not to merely use the API for Spark MLlib or XGBoost and call it done, but try to understand why stochastic gradient descent is appropriate for distributed training, or understand how XGBoost differs from traditional GBDT, e.g.

After a failed attempt at a rock star startup (which I will touch upon later), I prepared hard for several months, and with help from my recruiters, I scheduled a full week of onsites in the Bay Area.

I flew in on Sunday, had five full days of interviews with around 30 interviewers at some of the best tech companies in the world, and very luckily, got job offers from all five of them.

The upside is that you might benefit from the hot hand and the downside is that the later ones might be affected if the first one does not go well, so I don’t recommend it for everyone.

More surprisingly, Google even let me skip their phone screening entirely and schedule my onsite to fill the vacancy after learning I had four onsites coming in the next week.

New products such as Experiences and restaurant reservation, high end niche market, and expansion into China all contribute to a positive prospect.

Airbnb’s coding interview is a bit unique because you’ll be coding in an IDE instead of whiteboarding, so your code needs to compile and give the right answer.

With its product lines dominating the social network market and big investments in AI and VR, I can only see more growth potential for Facebook in the future.

I was extremely thrilled because 1) I use Spark and love Scala, 2) Databricks engineers are top-notch, and 3) Spark is revolutionizing the whole big data world.

The bar is very high and the process is quite long, including one pre-screening questionnaire, one phone screening, one coding assignment, and one full onsite.

For several weeks I was on a regular schedule of preparing for the interview till 1am, getting up at 8:30am the next day and fully devoting myself to another day at work.

Interviewing at five companies in five days was also highly stressful and risky, and I don’t recommend doing it unless you have a very tight schedule.

I’d like to thank all my recruiters who patiently walked me through the process, the people who spent their precious time talking to me, and all the companies that gave me the opportunities to interview and extended me offers.

Lastly but most importantly, I want to thank my family for their love and support: my parents for watching me take the first and every step, my dear wife for everything she has done for me, and my daughter for her warm smile.
