AI News, Statistics is Dead – Long Live Data Science…
I keep hearing Data Scientists say that ‘Statistics is Dead’, and they even have big debates about it attended by the good and great of Data Science.
If Data Scientists are correct (well, at least some of them) and statistics is dead, then either (1) we don’t need to quantify the uncertainty or (2) we have better tools than statistics to measure it.
I don’t believe so and, as far as I can tell, with the explosion of data that we’re experiencing – the amount of data that currently exists doubles every 18 months – the level of uncertainty in data is on the increase.
It may be true that most statistical measures were developed decades ago when ‘Big Data’ just didn’t exist, and that the ‘old’ statistical tests often creak at the hinges when faced with enormous volumes of data, but there simply isn’t a better way of measuring uncertainty than with statistics – at least not yet, anyway.
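As a rough illustration of what 'measuring uncertainty' can look like in practice, here is a minimal sketch of a percentile-bootstrap confidence interval for a mean, using only the Python standard library. The function name `bootstrap_mean_ci`, the made-up normal data and the choice of 2,000 resamples are assumptions for the example, not anything prescribed by the article.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `sample`.

    Hypothetical helper written for this illustration.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Cut off an equal share of resampled means in each tail.
    cut = int(((1.0 - confidence) / 2.0) * n_resamples)
    return means[cut], means[n_resamples - 1 - cut]


# Made-up data: 200 draws from a normal distribution (mean 50, sd 10).
data_rng = random.Random(42)
data = [data_rng.gauss(50, 10) for _ in range(200)]

low, high = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}")
print(f"95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

The width of the interval is the uncertainty estimate: more data narrows it, but it never goes away, which is why some statistical tool is still needed however large the data set becomes.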
Data Science is all about extracting knowledge from data (I think just about everyone agrees with this very vague description), and it incorporates many diverse skills, such as mathematics, statistics, artificial intelligence, computer programming, visualisation, image analysis, and much more.
On the other hand – as seems to be happening in universities here in the UK and over the pond in the good old US of A – there are Data Science courses full of computer programmers who are learning how to handle data, use Hadoop and R, program in Python and plug their data into Artificial Neural Networks.
It’s easy to learn how to use a few tools, but much, much harder to use those tools intelligently to extract valuable, actionable information in a specialised field.
There aren’t nearly as many automated statistical tools as there are visualisation tools or predictive tools, so the Data Scientists have to actually do the statistics themselves.
Turning his back on a promising academic career to do something more satisfying, as the CEO and co-founder of Chi-Squared Innovations he now works double the hours for half the pay and 10 times the stress - but 100 times the fun!
Data scientists weigh in: 5 data science tools to consider
Not so much a distinct piece of software as a programmatic means for creating custom algorithms, Python is the go-to for many data scientists.
Katie Malone, who started out as a particle physicist before she moved on to co-leading the data science research team at Civis Analytics Inc., said Python was her data science tool of choice as a physicist, and she's kept on using it in the business world.
For her, one of the big draws is the strong open source ecosystem surrounding Python, which has led her to a wide variety of data science libraries to help her solve specific analytical problems.
'We have successfully deployed Python data science models for optimizing direct-to-customer marketing campaigns and life insurance underwriting and improving real-time bidding for online advertising,' Krishnan said.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured.
Data science is a 'concept to unify statistics, data analysis, machine learning and their related methods' in order to 'understand and analyze actual phenomena' with data.
Turing award winner Jim Gray imagined data science as a 'fourth paradigm' of science (empirical, theoretical, computational and now data-driven) and asserted that 'everything about science is changing because of the impact of information technology' and the data deluge.
In many cases, earlier approaches and solutions are now simply rebranded as 'data science' to be more attractive, which can cause the term to become 'dilute[d] beyond usefulness.'
In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods that are used in a wide range of applications.
In his report, Cleveland established six technical areas which he believed to encompass the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.
In 2005, the National Science Board published 'Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century' defining data scientists as 'the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection' whose primary activity is to 'conduct creative inquiry and analysis.'
Turing award winner Jim Gray envisioned 'data-driven science' as a 'fourth paradigm' of science that uses the computational analysis of large data as the primary scientific method.
Similarly, in the business sector, multiple researchers and analysts state that data scientists alone are far from sufficient to give companies a real competitive advantage, and consider data scientists only one of the four greater job families companies require to leverage big data effectively, namely: data analysts, data scientists, big data developers and big data engineers.
Now the data in those disciplines and applied fields that lacked solid theories, like health science and social science, could be sought and utilized to generate powerful predictive models.
In an effort similar to Dhar's, Stanford professor David Donoho, in September 2015, took the proposition further by rejecting three simplistic and misleading definitions of data science in response to criticisms.
Second, data science is not defined by the computing skills of sorting big data sets, in that these skills are already generally used for analyses across all disciplines.
Third, data science is a heavily applied field where academic programs right now do not sufficiently prepare data scientists for the jobs, in that many graduate programs misleadingly advertise their analytics and statistics training as the essence of a data science program.
This way, the future of data science not only exceeds the boundary of statistical theories in scale and methodology, but will also revolutionize current academia and research paradigms.
As Donoho concludes, 'the scope and impact of data science will continue to expand enormously in coming decades as scientific data and data about science itself become ubiquitously available.'
Data Analytics vs Data Science: The Breakdown
Well-known Duke Economics professor Dan Ariely once said about big data: “Everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” This concept applies to much of data terminology.
Data analysts have a range of fields and titles, including (but not limited to) database analyst, market research analyst, sales analyst, financial analyst, marketing analyst, advertising analyst, customer success analyst, operations analyst, pricing analyst, and international strategy analyst.
“The Type A Data Scientist is very similar to a statistician (and may be one) but knows all the practical details of working with data that aren’t taught in the statistics curriculum: data cleaning, methods for dealing with very large data sets, visualization, deep knowledge of a particular domain, writing well about data, and so on.”
“The Type B Data Scientist is mainly interested in using data ‘in production.’ They build models which interact with users, often serving recommendations (products, people you may know, ads, movies, search results).” No matter how you phrase it, people are talking about data.
- On Monday, December 16, 2019
Tools that statisticians need to work as a Data Scientist
Aimee Gott - Mango Solutions What does a statistician need to know about machine learning? The rise of data science has changed the way in which we, as ...
Data Science and Statistics: different worlds?
Chris Wiggins (Chief Data Scientist, New York Times) David Hand (Emeritus Professor of Mathematics, Imperial College) Francine Bennett (Founder, ...
Learn Basic statistics for Business Analytics
Learn Basic statistics for Business Analytics Business ..
Data Science Tutorial for Beginners - 1 | What is Data Science? | Data Analytics Tools | Edureka
Introduction to Data Science with R - Data Analysis Part 1
Part 1 in an in-depth hands-on tutorial introducing the viewer to Data Science with R programming. The video provides end-to-end data science training, including ...
The Role of Statisticians in Research
The CCTS presented “Tools of the Trade: Advancing Your Career in Biomedical Research," a program designed to introduce researchers to the resources that ...
What's the Difference Between Data Science and Analytics?
Dr. Goutam Chakraborty, professor, Oklahoma State University gives his take on the difference between data science and analytics. He also shares skills you ...
The three facets of data science
When people talk about big data, they nearly always talk about data science, and data scientists, as well. Now, just as the definition of big data is still debated, ...
Data science and data system: Dr Amy Braveman, NASA Jet Propulsion Laboratory
Dr Amy Braveman is Principal Statistician ..
R programming for beginners – statistic with R (t-test and linear regression) and dplyr and ggplot
R programming for beginners - This video is an introduction to R programming in which I provide a tutorial on some statistical analysis (specifically using the ...