AI News, Revolutions


If your interests lean more towards traditional statistical analysis and inference as used within industries like manufacturing, finance, and the life sciences, I'd lean towards R.

But even that's not a hard-and-fast rule: R has excellent support for machine learning and deep learning frameworks, and Python is often used for traditional data science applications.

Brian Ray recently posted a good overview of the factors that may lead you towards R or Python for data science: their history, the community, performance, third-party support, use cases, and even how to use them together.

My needs generally fall on the statistics / data science end of the spectrum, and my interest in deep learning has been served well by the keras support from RStudio.

Learn Data Science - Resources for Python & R

Even though it’s still hard to agree on a precise definition of data science or the role of a data scientist, interest in the field keeps rising: numerous blogs prescribe how to “really” learn data science, and hot topics in forums such as Quora revolve around “becoming a data scientist”.

This post contains links to projects, news sources, books, talks, podcasts, webinars, tutorials, community pages and courses that you need to check out to learn data science.

With the popularity of the field comes a whole variety of recommendations from all sides: beginners as well as experts, all with different backgrounds, give their view on what it means to actually learn data science.

That means that the mystic square includes resources that are all complementary to the ones that you have already encountered and registered for, as learning data science doesn’t limit itself to just one resource.

This open data community is perfect for those of you who want to join forces to solve data science problems together or easily find data.

You can also apply to become a volunteer at DataKind to boost your project experience: you can pick the timespan of your adventure, ranging from networking and quick consultation to long-term projects.

The news is maybe not the first thing that beginning data science learners are aware of, but it is certainly worth taking into account… As a beginner, subscribing to one of the newsletters can give you certain advantages: newsletters offer you the possibility to stay up to date with the latest news, the newest case studies, and projects or job offerings.

And, if you’re also a big believer in language baths to learn a language, you will understand that really “bathing” yourself in the data science world is necessary for you to learn quickly and to make your learning as high-quality as possible.

Besides the newsletters that you might already know and receive on a regular basis, such as the bi-monthly KDNuggets newsletter or the weekly Data Elixir newsletter, we have listed some others for you to keep an eye out for:

For more language-specific newsletters, you can check out:

There are also some blogs that give you regular updates (and some extras):

Just like with the other types of learning resources, there has been a huge increase in the number of books published over the past years.

They are a great resource for you to learn data science because they can help you to get inspiration to do your data storytelling better, or if you’re new to data science, they can give you tips on how to approach this topic.

If you’ve already listened to the three that I listed, and you’re desperate to find a (good) talk quickly, just head over to the TED website and search for anything related to data, statistics, machine learning, … For talks on more specific topics, you can also head over to the R User Conference 2016 page and look into their searchable video archive.

The top videos from the Strata + Hadoop World conferences include:

For those of you who are fans of talks, we have also listed some interesting podcasts that you can listen to:

RStudio offers webinars on a variety of topics for those who want to learn data science with R.

The courses that really let you do data science in a high-quality way are:

In the end, the number of resources will still be overwhelming, but the mystic square can definitely offer you a great place to start.

Big Data Learning Path for all Engineers and Data Scientists out there

The field of big data is quite vast and it can be a very daunting task for anyone who starts learning big data &

This article provides you with a guided path to start your journey to learn big data and will help you land a job in the big data industry.

To tackle this problem, I have explained each big data role in detail and also considered the different job roles of engineers and computer science graduates.

I have tried to answer all the questions that you have or will encounter while learning big data. To help you choose a path according to your interest, I have added a tree map which will help you identify the right path.

One of the very first questions that people ask me when they want to start studying Big data is, “Do I learn Hadoop, Distributed computing, Kafka, NoSQL or Spark?” Well, I always have one answer: “It depends on what you actually want to do”.

Big data engineering revolves around the design, deployment, acquisition and maintenance (storage) of large amounts of data.

The systems which Big data engineers are required to design and deploy make relevant data available to various consumer-facing and internal applications.

Big data analytics, by contrast, revolves around utilizing the large amounts of data from the systems designed by big data engineers.

Broadly, based on your educational background and industry experience we can categorize each person as follows: (This includes interests and doesn’t necessarily point towards your college education).

Thus, by using the above categories you can define your profile as follows: E.g. 1: “I am a computer science grad with no experience and fairly solid math skills”.

In order to define your needs, you must know the common big data jargon. So let’s find out what big data actually means.

Scenario 1: Design a system for analyzing the sales performance of a company by creating a data lake from multiple data sources like customer data, leads data, call center data, sales data, product data, weblogs etc.

Solution for Scenario 1: Data Lake for sales data (This is my personal solution; you may come up with a more elegant one, and if you do, please share it below.) So, how does a data engineer go about solving the problem?

A point to remember is that a big data system must not only be designed to seamlessly integrate data from various sources and make it available all the time, but must also be designed to make analyzing the data and using it to develop applications easy, fast and always available (an intelligent dashboard in this case).
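As a toy illustration of the integration step in Scenario 1, the sketch below combines records from three of the named sources into one summary row per customer. Everything in it is hypothetical: the field names, the sample records and the `build_customer_view` helper are assumptions made for illustration, and a real data lake would sit on distributed storage rather than in-memory lists.

```python
# Hypothetical records from three of the sources named in Scenario 1.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
sales = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 75.0},
]
calls = [{"customer_id": 2, "minutes": 12}]


def build_customer_view(customers, sales, calls):
    """Combine per-source records into one summary row per customer."""
    # Start with one row per known customer, keyed by the shared id.
    view = {
        c["customer_id"]: {"name": c["name"], "total_sales": 0.0, "call_minutes": 0}
        for c in customers
    }
    # Fold each source into the matching customer row.
    for s in sales:
        view[s["customer_id"]]["total_sales"] += s["amount"]
    for c in calls:
        view[c["customer_id"]]["call_minutes"] += c["minutes"]
    return view


view = build_customer_view(customers, sales, calls)
```

A dashboard could then read `view` directly instead of querying each source separately, which is the kind of "easy, fast" access the scenario asks for.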

But data sources like weblogs, customer interactions/call center data, image data from the sales catalog, product advertising data.

This is a bit different from conventional domains like data science and machine learning, where you start at something and endeavor to complete everything in the field.

Even though some of the technologies in the tree are marked as a data scientist’s forte, it is always good to know all the technologies down to the leaf nodes if you embark on a path.

But don’t worry: if you do not want to code in these languages, you can choose Python or R, because most of the big data technologies now support Python and R extensively.

To be a data scientist capable of working with big data, you need to add a couple of machine learning pipelines to the tree below and concentrate on those pipelines more than on the tree itself.

And to give a definitive answer on which type of NoSQL database you need, you must take into account your system requirements like latency, availability, resilience, accuracy and of course the type of data that you are dealing with.

How To Ace Data Science Interviews: R & Python

A big part of data scientists’ day-to-day involves manipulating, analyzing, and visualizing data in an interactive programming environment.

In my view, Python’s strength for data science is in its ability to serve as a real backend language for production systems, meaning any modeling you do as a data scientist can potentially be implemented with little effort on a live website or software product.

While R supports all the standard CS data structures and techniques such as arrays and for loops, it really excels when you’re working on a rectangular data set, like you’d see in a typical spreadsheet program.

Unlike a spreadsheet though, you can still take advantage of computer science concepts like iteration and abstraction (more on these below) which makes it orders of magnitude more powerful than something like Excel.

Additionally, R is the de facto language for quantitative researchers in academia, meaning that the most cutting edge statistical techniques are often available as R packages long before they make their way to any other place, including Python.

So if your primary workflow involves doing offline analysis and data visualization, and especially if you want access to state-of-the-art statistical packages, R is where you want to be.

But again, you can’t really go wrong — both are amazingly useful, and chances are you can make either language do whatever data science task you’re trying to accomplish.

It manages packages, provides access to help files, displays visualizations, and gives you a nice, customizable text editor along with your console.

Instead of a local IDE, Jupyter provides a browser-based notebook that lets you separate your code into executable chunks, so you run each piece of code and analysis one at a time.

Iteration is an important concept in computer science and is deeply connected to data structures: essentially, it’s a way to perform an operation on each item within a data structure.

This might seem like a lot, but once you learn the concept you’ll find that all these different options are just various ways of applying the same fundamental concept: take a data structure and do something with each of its elements.

While this example might not save much time or code, functions can get much more complex, at which point writing a function for repetitive tasks can make your code much more readable and concise.
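As a minimal Python sketch of that idea, the three snippets below all apply the same operation to each element of a list. The `prices` list and the 10% markup are made-up illustration values, not taken from the original article.

```python
# Hypothetical data: a list of prices (illustration only).
prices = [10.0, 20.0, 30.0]

# 1. Explicit for loop: visit each element and build a new list.
loop_result = []
for p in prices:
    loop_result.append(round(p * 1.1, 2))

# 2. List comprehension: the same loop written as one expression.
comp_result = [round(p * 1.1, 2) for p in prices]

# 3. map(): hand over a function that is applied to each element.
map_result = list(map(lambda p: round(p * 1.1, 2), prices))
```

All three produce the same list; which one to use is mostly a matter of readability.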
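To make the point about functions concrete, here is a small, hypothetical Python example. The name `standardize` and the sample values are assumptions for illustration and are not the article's original example. It wraps a repetitive calculation in a function so it can be reused on several columns.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Reuse the same function on different columns instead of repeating the math.
heights = [150, 160, 170]
weights = [50, 60, 70]
z_heights = standardize(heights)
z_weights = standardize(weights)
```

If the calculation ever needs to change (say, to use the sample standard deviation), it only has to change in one place.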

In Python this means getting to know Pandas, a package that provides an entire framework for operating on data frames, rectangular data sets with rows and columns.
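The kind of data frame manipulation an interviewer might ask for can be sketched as follows. The sales data is made up for illustration, and the sketch assumes pandas is installed; it shows three common steps: deriving a column, filtering rows, and aggregating by group.

```python
import pandas as pd

# Hypothetical sales data (made up for illustration).
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "units": [10, 20, 5, 15],
    "price": [2.0, 2.0, 3.0, 3.0],
})

# Derive a new column from existing ones: revenue per row.
df["revenue"] = df["units"] * df["price"]

# Filter rows with a boolean mask, then aggregate revenue by region.
big_orders = df[df["units"] >= 10]
revenue_by_region = big_orders.groupby("region")["revenue"].sum()
```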

While R has native support for rectangular data sets in the form of matrices and data frames, you’ll still make your life much easier by learning either dplyr or data.table.

Each of these packages provides a great interface for manipulating data frames that is better than base R: dplyr is far more intuitive and readable, while data.table is faster and has more concise syntax.

As with SQL, help out your interviewer by talking through your code as you write, so they know what you’re thinking and can give partial credit in the case that you don’t quite get to a final answer.

For example, a simple linear regression that might take you hours if you were to code it from scratch can be executed with just:

In Python, you’ll need at least the NumPy and SciPy packages to make sure you have your basic statistical functions covered, but like R, once those are installed you’ll be good to go.

Besides whiteboarding questions, lots of data science interviews will have a take-home component that asks you to take a sample data set, analyze it and draw some conclusions.

A few tips for visualizations in a take-home assignment: title your graphs, zero and label your axes, include error bars if applicable, and pick a few colors and use them consistently.

But really, once you know the basics of your preferred programming language, it’s all about getting comfortable with a few key tools: data manipulation, statistics, and visualization.

Python Overtakes R for Data Science and Machine Learning

This article summarizes a trend in programming language usage, based on a number of proxy metrics.

This change started to be more pronounced in early 2017: Python became the language of choice, over R, for data science and machine learning applications.

Statistics from Google

Google has an app called Google Trends that lets you find trends for specific subjects and compare interest across a number of search topics, broken down by region or time period.

Search index for Python Data Science (blue) versus R Data Science (red) over the last 5 years, in the US

We used the app in question to compare search interest for R Data Science versus Python Data Science; see the chart above.

Top cities in US are:

7,533 full time jobs

Our internal statistics

We have 83 fresh, active job ads, relevant to data science and mostly in US and London, for Python: you can check them out here.

R Programming For Beginners | Data Science Tutorial | Simplilearn

Become an expert in the various data analytics techniques using R. Master the data exploration, data visualization, predictive analytics, and descriptive analytics ...

Natural Language Processing (NLP) Tutorial | Data Science Tutorial | Simplilearn

Natural language processing (NLP) is a field of computer science, artificial intelligence and computational linguistics concerned with the interactions between ...

NetCD what? An Ecologist’s Guide to Working with Daymet and other NetCDF formatted Data

During this webinar, we will provide an overview of the netCDF data format and show participants how to open, read, write and plot data that is in the netCDF ...

Beginner - Data Science Project

The best way to learn data science and showcase your skills is by doing some actual projects – we learn best by doing. So, how do we choose a project to work ...

Building dataset - p.4 Data Analysis with Python and Pandas Tutorial

In this part of Data Analysis with Python and Pandas tutorial series, we're going to expand things a bit. Let's consider that we're multi-billionaires, ...

Data Analytics Overview | Data Science With Python Tutorial

The Data Science with Python course is designed to impart an in-depth knowledge of the various libraries and packages required to perform data analysis, data ...

Data Science With Python | Data Science Tutorial | Simplilearn

The Data Science with Python course is designed to impart an in-depth knowledge of the various libraries and packages required to perform data analysis, data ...

Alexander Müller - Spatial Range Queries Using Python In-Memory Indices

When you're working with a spatial dataset a common use case is that you need to get points of interests that are within a certain radius of a reference point, also ...

Python Tutorial: Exploratory Data Analysis

Learn more about exploratory data analysis with Python: Yogi Berra said, "You can ..

Why R - Data Analysis with R

This video is part of an online course, Data Analysis with R. Check out the course here: This course was designed as part ..