AI News, 2014 Data Science Salary Survey


Contributed by John King and Roger Magoulas

If there’s one thing folks in the data community are good at, it’s using their analytic skills to find high-paying jobs.

For the last three years, O’Reilly Media has run an anonymous Data Science Salary Survey in concert with the Strata + Hadoop World conference to look at which factors most affect the salaries of data analysts and data engineers.

Looking at which tools best correlate with the highest salaries helps identify market conditions where demand for skills is greater than the supply of workers. Geography also plays a role in supply and demand: we see the highest salaries in tech-intensive California, Texas, the Northwest and the Northeast (MA to VA).

The clusters do show that using one tool in a cluster increases the probability of using another tool in the cluster, and many respondents tended towards one or two clusters, using few tools from the others.

That Cluster 2 (Hadoop Ecosystem) contains a large set of tools reflects two characteristics of the cluster: respondents in the cluster use the highest number of tools (18-19, almost double what we see for some of the tools in Cluster 1);

When looking at the clusters and how they fit into the regression model we built from the data, we see respondents in Cluster 1 (Proprietary Analytics) with slightly lower salaries, and respondents in Cluster 2 (Hadoop Ecosystem) and Cluster 3 (Data Science) with slightly higher salaries, relative to the mean.

Looking at results from the salary survey can provide some guidance on tools in the greatest demand, information you can use to help set your learning path and career choices.

That’s the conclusion of a couple of recent surveys that found that data analysts and engineers with big data chops are earning more than $120,000, compared with the reported average IT salary of $89,450.

than median salaries earned by data analysts and engineers across other industries, according to a recent salary survey by O’Reilly Media, which also analyzed the tools used by data professionals.

It uses in-memory primitives and other enhanced technologies to outperform MapReduce and offers more computational options, with tool libraries for enhanced SQL querying, streaming data analytics, machine learning and more.

The more tools a data professional used, the higher the salary: those using up to 10 tools earned a median salary of $82,000, rising to $110,000 for those using 11 to 20 tools and $143,000 for those using more than 20.

These clusters were: After discarding clusters 4 and 5 because they were not significant indicators of salary, O’Reilly determined that users of Cluster 2 and 3 tools earn more, with each tool from Cluster 2 contributing $1,645 to the expected total salary and each tool from Cluster 3 contributing $1,900.
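As a rough illustration of the per-tool figures above, the sketch below adds up the contribution from Cluster 2 and Cluster 3 tools for a hypothetical respondent. The function name, the dictionary keys and the example tool counts are assumptions made for illustration. Only the two dollar amounts come from the text, and clusters without a quoted figure are simply ignored here.

```python
# Per-tool salary contributions quoted in the text (USD).
PER_TOOL_CONTRIBUTION = {"cluster_2": 1645, "cluster_3": 1900}


def cluster_tool_contribution(tool_counts):
    """Sum the salary contribution from tools in Clusters 2 and 3.

    tool_counts maps a cluster key to the number of tools used from it.
    Clusters without a coefficient quoted in the text are ignored in
    this sketch; that does not mean they have no effect in the model.
    """
    return sum(
        PER_TOOL_CONTRIBUTION.get(cluster, 0) * count
        for cluster, count in tool_counts.items()
    )


# Hypothetical respondent: 4 tools from Cluster 2 and 3 from Cluster 3.
example = cluster_tool_contribution({"cluster_2": 4, "cluster_3": 3})
print(example)  # 4 * 1645 + 3 * 1900 = 12280
```

This covers only the cluster terms; the full regression also includes an intercept and other factors that are not reproduced here.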

The report confirms trends that have been evolving for some time: Hadoop is on the rise, cloud-based data services are important and those who know how to use the advanced, recently developed tools of big data typically earn high salaries.


The median base salary of all respondents was $91k, rising to $98k for total salary (this includes the respondents’ estimates of their non-salary compensation). Following standard practice, median figures are given (the right skew of the salary distribution means that individuals with particularly high salaries will push up the average).
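To see why a right-skewed distribution favors the median, consider the small sketch below. The salary values are made up for illustration and are not survey data. A single very high earner pulls the mean well above the median.

```python
from statistics import mean, median

# Made-up salaries in $k; one very high earner creates the right skew.
salaries_k = [60, 75, 85, 90, 95, 110, 400]

print(median(salaries_k))          # 90
print(round(mean(salaries_k), 1))  # 130.7
```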

However, since respondents were asked to report their salary to the nearest $10k, the median (and other quantile) calculations are based on a piecewise linear map that uses points at the centers and borders of the respondents’ salary values.
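The report does not spell out the exact interpolation, so the following is only a sketch of one plausible reading. It assumes that responses rounded to a value v are spread evenly over the $10k interval centered on v, which makes the cumulative count piecewise linear between interval borders. The function name `interpolated_quantile` and the sample responses are hypothetical.

```python
from collections import Counter


def interpolated_quantile(reported_k, q=0.5, step=10):
    """Estimate a quantile from salaries rounded to the nearest `step` ($k).

    Assumption: responses rounded to value v are spread evenly over
    [v - step/2, v + step/2], so the cumulative count is piecewise linear.
    """
    if not reported_k:
        raise ValueError("need at least one response")
    counts = Counter(reported_k)
    target = q * len(reported_k)
    cumulative = 0
    for value in sorted(counts):
        n = counts[value]
        if cumulative + n >= target:
            lower = value - step / 2
            # Linear interpolation inside the bin that contains the target.
            return lower + step * (target - cumulative) / n
        cumulative += n
    # Fallback for floating-point edge cases: upper border of the top bin.
    return max(counts) + step / 2


# Made-up responses in $k, each rounded to the nearest 10.
responses = [80, 90, 90, 90, 90, 100, 100, 110]
print(interpolated_quantile(responses))  # 92.5
```

With these responses the plain median is 90, while the interpolated estimate is 92.5, because the interpolation accounts for where the halfway point falls inside the 85 to 95 interval.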

Government employees earn relatively low salaries (the government, science and technology, and education sectors had the lowest median salaries), although respondents who work for government vendors reported higher salaries.

Employees from larger companies reported higher salaries than those from smaller companies, while public companies and late startups had higher median salaries ($106k and $112k) than private companies ($90k) and early startups ($89k).

The interquartile range of early startups was huge – $34k to $135k – so while many early startup employees do make a fraction of what their counterparts at more established companies do, others earn comparable salaries.

Which Data Scientists Earn The Most Money?

The median salary in the U.S. including non-salary compensation was $144,000.

This being a data science survey, the authors created a regression model to determine how much different factors affected salary.

Regression models can be used to predict the value of one variable based on the values of others, e.g., predict salary based on demographic data or tool usage.
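As a minimal sketch of the idea, the example below fits a one-variable ordinary least squares line that predicts salary from the number of tools used. The data points are made up and the helper `fit_line` is hypothetical. The survey's own model used many more variables and is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: number of tools used and total salary in $k.
tools = [2, 5, 8, 12, 15, 20]
salary_k = [70, 78, 85, 100, 108, 125]

intercept, slope = fit_line(tools, salary_k)
predicted = intercept + slope * 10  # predicted salary ($k) for 10 tools
print(round(slope, 2), round(predicted, 1))
```

The slope plays the same role as the per-tool dollar figures reported in the survey: it is the expected change in salary for each additional unit of the predictor, holding the rest of the model fixed.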

Being female (only 15% of respondents were women) means you earn $17,294 less than your colleagues, an amount consistent with the gender gap as a whole and similar to the $17,318 toll that working at an early-stage startup takes from a data professional’s paycheck.
