When people asked me what it means to be a data scientist, I used to answer, “it means you don’t have to hold my hand.”

By which I meant that as a data scientist (a consulting data scientist), I can handle the data collection, the data cleaning and wrangling, the analysis, and the final presentation of results (for both technical and business audiences) with minimal assistance from my clients or their people.

This used to be a key selling point, because people with all the necessary skills used to be relatively rare.

So to the emphasis that Mason and Wiggins place on scripting languages and Unix tools, I would add knowledge of SQL and a tool like R that can access data directly from the database for analysis.
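To make the "SQL plus an analysis tool" point concrete, here is a minimal sketch of the workflow. It uses Python's built-in `sqlite3` module as a stand-in for R and a production database, purely for illustration. The `loans` table, its columns, and the five rows are hypothetical, made up so the example can run on its own.

```python
import sqlite3
import statistics

# Hypothetical in-memory database standing in for a client's data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (region TEXT, balance REAL, defaulted INTEGER)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [
        ("east", 120000.0, 0),
        ("east", 250000.0, 1),
        ("west", 90000.0, 0),
        ("west", 310000.0, 0),
        ("west", 175000.0, 1),
    ],
)

# Let the database do the aggregation it is good at...
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(defaulted) "
    "FROM loans GROUP BY region ORDER BY region"
).fetchall()
default_rate_by_region = {region: rate for region, _count, rate in rows}

# ...and pull raw columns straight into the analysis environment when needed.
balances = [balance for (balance,) in conn.execute("SELECT balance FROM loans")]
median_balance = statistics.median(balances)

conn.close()
```

The point is less the particular tools than the habit: the analyst queries the data where it lives instead of waiting for someone else to export it.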

I would add that a solid understanding of statistics fundamentals is essential (and the whole Win-Vector blog attests to how much time we spend thinking about fundamentals), but statistics and machine learning are not the core of the job.

Does gross national product really predict mortgage defaults, or is it just a proxy variable for time (and in the recent economy, time predicts mortgage default rate pretty well)?
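One way to probe a question like this is to check whether the apparent relationship survives once time is accounted for. The sketch below does that on purely synthetic data: both series are constructed to trend with time and to have no direct link to each other, so the numbers, the variable names, and the linear trends are all assumptions made for illustration, not real economic data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, xs):
    """Residuals of a least-squares line fit of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
periods = list(range(200))

# Synthetic series: each depends on time plus independent noise, nothing else.
gnp = [100 + 0.5 * t + rng.gauss(0, 5) for t in periods]
default_rate = [2 + 0.03 * t + rng.gauss(0, 0.5) for t in periods]

# Raw correlation looks impressive because both series share a time trend.
raw_corr = pearson(gnp, default_rate)

# Partial correlation: remove the linear time trend from each, then correlate.
partial_corr = pearson(residuals(gnp, periods), residuals(default_rate, periods))

print(f"raw correlation:               {raw_corr:.2f}")
print(f"correlation controlling for t: {partial_corr:.2f}")
```

In this constructed case the raw correlation is strong and the time-adjusted one is near zero, which is the signature of a proxy variable. Real data is rarely this clean, and a linear detrend is only one of several ways to control for time.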

We all come into the job hoping to wield support vector machines or neural nets like Wonder Woman wields her magic lasso: we capture the data, and then wrest the truth out of it, willy-nilly.

