Should you teach Python or R for data science?

Last week, I published a post titled 'Lessons learned from teaching an 11-week data science course', detailing my experiences and recommendations from teaching General Assembly's 66-hour introductory data science course.

Here are some questions that might help you (as an educator or curriculum developer) assess which language is a better fit for your students.

If your students have some programming experience, Python may be the better choice because its syntax resembles that of other languages, whereas many programmers find R's syntax unintuitive.

If your students don't have any programming experience, I think both languages have an equivalent learning curve, though many people would argue that Python is easier to learn because its code reads more like regular human language.

One contributing factor is that companies using a Python-based application stack can more easily integrate a data scientist who writes Python code, since that eliminates a key hurdle in 'productionizing' a data scientist's work.

The line between machine learning and statistical learning is blurry, but machine learning prioritizes predictive accuracy over model interpretability, whereas statistical learning places a greater priority on interpretability and statistical inference.

(For example, scikit-learn makes it very easy to tune and cross-validate your models and switch between different models, but makes it much harder than R to actually 'examine' your models.) Thus, R is probably the better choice if you are teaching statistical learning, though Python also has a nice package for statistical modeling (Statsmodels) that duplicates some of R's functionality.

In R, getting started with your first model is easy: read your data into a data frame, use a built-in model (such as linear regression) along with R's easy-to-read formula language, and then review the model's summary output.

Because Python is a general-purpose programming language whereas R specializes in a smaller subset of statistically oriented tasks, those tasks tend to be easier to do (at least initially) in R.

(caret is an excellent R package that attempts to provide a consistent interface for machine learning models in R, but it's nowhere near as elegant a solution as scikit-learn.) In summary, machine learning in R tends to be a more tiresome experience than machine learning in Python once you have moved beyond the basics.

Installing new packages or upgrading existing packages from CRAN (R's package repository) is a trivial process within RStudio, and even installing packages hosted on GitHub is a simple process thanks to the devtools package.

I find data cleaning to be easier in Python because of its rich set of data structures, as well as its far superior implementation of regular expressions (which are often necessary for cleaning text).
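As a small illustration of that combination, the sketch below cleans a free-text name field with the standard library's `re` module and then uses a set and a `Counter` to deduplicate and tally the results. The sample records and the `clean_name` helper are hypothetical, invented for this example.

```python
import re
from collections import Counter

# Hypothetical messy records, invented for illustration.
raw_names = [
    "  Alice   SMITH ",
    "bob   LEE",
    "Alice Smith",
    "CAROL\tjones (temp)",
]


def clean_name(text):
    """Normalize one free-text name field."""
    text = re.sub(r"\(.*?\)", "", text)       # drop parenthetical notes
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text.title()                       # consistent capitalization


cleaned = [clean_name(name) for name in raw_names]
unique_names = sorted(set(cleaned))  # a set removes duplicates
name_counts = Counter(cleaned)       # a Counter tallies repeats

print(unique_names)
print(name_counts.most_common(1))
```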

Once you understand the core principles of R's ggplot2 package (its 'grammar of graphics'), it feels like the most natural way to build your plots, and it becomes easy to produce sophisticated and attractive plots.

Scalable Machine Learning in R and Python with H2O

BIDS Data Science Lecture Series | March 11, 2016 | 1:10-2:30 p.m. | 190 Doe Library, UC Berkeley Speaker: Erin LeDell, Statistician and Machine Learning ...

Getting started with Python and R for Data Science

In this video tutorial, we will take you through some common Python and R packages used for machine learning and data analysis, and go through a simple ...

Deploying Predictive Models in Python and R

By Nick Elprin. Full post with slides here:

Why Python over R for data science analytics quant research

Why Python over R for data science analysis quant research From a cool informative infographics ...

2016 SAS, R, or Python Survey Results – Which Do Analytics & Data Science Pros Prefer?

Every year Burtch Works has sent out an immensely popular and conversation-starting SAS vs. R survey, and this year by popular request we added Python to ...

Python: Top 5 Data Science Libraries

Top 5 Python data science analysis modules for developers.

R vs Python? Best Programming Language for Data Science?

R vs Python. Here I argue why Python is the best language for doing data science. Answering the question 'What is the best programming language for' is never ...

Ending the R vs Python war

Data Science Studio Free Training #6 with Eric Kramer (Dataiku's data scientist). This Free Training was recorded on September 09th, 2015. You can try Data ...

R you Ready to Python? An Introduction to Working with Land Remote Sensing Data in R and Python

Webinar quick summary: Want to learn how to use R and Python to work with remote sensing data? Join us as we demonstrate how to perform basic data ...

Implementing and Training Predictive Customer Lifetime Value Models in Python

Implementing and Training Predictive Customer Lifetime Value Models in Python by Jean-Rene Gauthier, Ben Van Dyke Customer lifetime value models (CLVs) ...