# AI News, The real prerequisite for machine learning isn’t math, it’s data analysis

- On Wednesday, June 27, 2018

## The real prerequisite for machine learning isn’t math, it’s data analysis

When beginners get started with machine learning, the inevitable question is “what are the prerequisites? What do I need to know to get started?”

And once they start researching, beginners frequently find well-intentioned but disheartening advice, like the following: You need to master math.

If you’re intimidated by the math, I have some good news for you: in order to get started building machine learning models (as opposed to doing machine learning theory), you need less math background than you think (and almost certainly less math than you’ve been told that you need).

In fact, even if you can get by without having a masterful understanding of calculus and linear algebra, there are other prerequisites that you absolutely need to know (thankfully, the real prerequisites are much easier to master).

If you’re a beginner and your goal is to work in industry or business, math is not the primary prerequisite for machine learning.

Moreover, the incentives shape the training of people entering academia: students in an academic environment are trained to be productive largely as scholars and researchers.

In an academic environment, individuals are rewarded (largely) for producing novel research, and in the context of ML, that truly does require a deep understanding of the mathematics that underlies machine learning and statistics.

They imagine that data scientists spend their days pensively standing at a whiteboard, scribbling math equations between sips of coffee.

If we’re talking about entry level data scientists to intermediate level data scientists, I’d estimate that they spend less than 5% of their time actually doing mathematics.

(And quite frankly, most entry-level data scientists won’t spend much of their time on ML.) When you build a model, you will spend very, very little time doing any math.

But most data scientists do spend a huge amount of their time getting data, cleaning data, and exploring data.

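For beginning practitioners (i.e., hackers, coders, software engineers, and people working as data scientists in business and industry) you don’t need to know that much calculus, linear algebra, or other college-level math to get things done.

(Note that as this post continues, I’m going to use the term “data analysis” as a shorthand for “getting data, cleaning data, aggregating data, exploring data, and visualizing data.”)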

Although at high levels there are some data scientists who need deep mathematical skill, at a beginning level – I repeat – you do not need to know calculus and linear algebra in order to build a model that makes accurate predictions.

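Even if you use “off the shelf” tools like R’s caret and Python’s scikit-learn – tools that do much of the hard math for you – you won’t be able to make these tools work without a solid understanding of exploratory data analysis and data visualization.

It’s common knowledge among data scientists that “80% of your work will be data preparation.” This is true, although I want to clarify what this means.

When people say that “80% of your work will be data preparation,” that’s sort of a shorthand way of saying “80% of your work will be getting data (from databases, spreadsheets, flat-files), performing exploratory data analysis, reshaping data, visualizing data to find insights, and using EDA.”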

While this figure is about data science in general, it also applies to machine learning specifically: when you’re building machine learning models, 80% of your time will be spent getting data, exploring it, cleaning it, and analyzing results (using data visualization).
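To make the “getting, cleaning, and exploring” work above a little more concrete, here is a minimal sketch in Python using only the standard library. The inline dataset, the column names (`age`, `plan`, `spend`), and the function names are all hypothetical, made up for illustration; a real project would more likely lean on a data-frame library.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical raw data: note the missing age and the inconsistent casing.
RAW_CSV = """age,plan,spend
34,basic,20.0
51,Premium,75.5
,basic,18.0
29,premium,80.0
42,BASIC,22.5
"""


def load_rows(text):
    """Get the data: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean the data: drop rows with no age, normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["age"].strip():
            continue  # skip records with a missing age
        cleaned.append({
            "age": int(row["age"]),
            "plan": row["plan"].strip().lower(),
            "spend": float(row["spend"]),
        })
    return cleaned


def mean_spend_by_plan(rows):
    """Explore the data: average spend for each plan."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["plan"]].append(row["spend"])
    return {plan: statistics.mean(values) for plan, values in groups.items()}


rows = clean_rows(load_rows(RAW_CSV))
summary = mean_spend_by_plan(rows)
for plan, avg in sorted(summary.items()):
    print(f"{plan}: {avg:.2f}")
```

Nothing here involves calculus or linear algebra, yet every step (parsing, handling missing values, normalizing categories, aggregating) has to be right before a model sees the data.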

To be a little more blunt about it, if you don’t know calculus and linear algebra, you can still build useful models, but if you aren’t really proficient with data analysis, you’re screwed.

Many, if not most, of the best data scientists and model-builders I know at several Fortune 500 companies aren’t particularly masterful at calculus, linear algebra or advanced math.

Based on working with her and talking with her for several years, I’m confident that her knowledge of calculus and linear algebra was very, very limited.

So before I overstate my case, and potentially alienate a large group of people that I respect and admire, let me be clear: math is important.

In particular, there are people at companies like Google and Facebook who are pushing the boundaries of machine learning – people working on bleeding edge tools.

I’ll write my full advice in another blog post, but I’ll briefly summarize it here: to get started learning practical machine learning, an entry level data scientist needs to have basic comfort working with numbers, calculating percentages, etc.

However, when people tell you that you absolutely need to know calculus, differential equations, optimization theory, linear algebra, and more just to get started building machine learning models, this is flat out wrong.

If you’re a beginner, and you want to get started with machine learning, you can get by without knowing calculus and linear algebra, but you absolutely can’t get by without data analysis.

- On Wednesday, June 27, 2018

## The Mathematics of Machine Learning

In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products.

Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications.

Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and getting good results.

Some online MOOCs and materials for studying some of the Mathematics topics needed for Machine Learning are:

Finally, the main aim of this blog post is to give well-intentioned advice about the importance of Mathematics in Machine Learning, along with the necessary topics and useful resources for mastering them.

- On Wednesday, June 27, 2018

## Mathematics for Machine Learning Specialization

Learn from world class experts and be part of a global community, sharing ideas, expertise and technology to find answers to the big scientific questions and tackle global challenges.

Imperial is a multidisciplinary space for education, research, translation and commercialisation, harnessing science and innovation to tackle global challenges.

- On Wednesday, June 27, 2018

I was fascinated by artificial intelligence at the end of the '80s and spent numerous hours working on Natural Language Processing challenges using a weird and wonderful programming language called Lisp.

Though F-Secure didn’t immediately enjoy much success with it - as often happens when you’re just a little bit too early with a new technology - it was my second brush with AI and my first with machine learning.

The current renaissance with machine learning took off around 2012, and I continued to feed my fascination with the promise of intelligent machines through books and meetings with researchers on the topic.

But I also began to doubt my own ability to understand and became frustrated with my discussion partners, some of whom seemed more intent on showing off their own advanced understanding of the topic than explaining what they knew in plain, comprehensible language.

Sometimes CEOs and Chairmen may feel that understanding the nuts and bolts of technology is in some way beneath their role, that it’s enough for them to focus on 'creating shareholder value'.

Over time I gained enough understanding to explain what I felt were the most important aspects of machine learning to CEOs, politicians, academics (in other fields) and frankly, any decision makers.

I strongly believe that all organizations should take the following next steps:

Machine learning is a fundamental new technology that can create immense value for humankind.

- On Wednesday, June 27, 2018

## Ask a Data Scientist: What’s Machine Learning?

Phrases like “neural nets” and “deep learning” tap into our sense of fantasy, but when we jump from new tech to robot takeover, we miss the beauty and power of what machine learning actually is, and the groundbreaking new developments that are pushing industries forward.

Machine learning evolved from pattern recognition and applying algorithms that can learn from data and then make predictions, and it's closely related to computational statistics (thank you Wikipedia).

For instance, we’ve trained computers to accurately predict letters and numbers, the base logic for handwriting recognition used by the postal service.

“A machine’s learning algorithm enables it to identify patterns in observed data, build models that explain the world, and predict things without having explicit pre-programmed rules and models,” says Vishal Maini.

Almost as soon as someone realizes what machine learning can do, they want to ask the crystal ball a question: What’s going to be the next big programming language?

“You would need a huge data set, so at least a thousand examples of each type of person who bought each product, and this increases exponentially with the more features you want to analyze.

A feature is something like age, thing they clicked on previously, etc.” Applying machine learning to your business requires huge data sets that aren’t always accessible, but even if they are, it’s key that that data is in a format that a machine can read.
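The “increases exponentially” point can be sketched with some rough arithmetic. The snippet below assumes, purely for illustration, that you want about a thousand examples for every combination of feature values; the function name `examples_needed` and the choice of three levels per feature are hypothetical, and real data requirements depend heavily on the model and the problem.

```python
def examples_needed(levels_per_feature, per_group=1000):
    """Rough count: per_group examples for every combination of feature values."""
    combos = 1
    for levels in levels_per_feature:
        combos *= levels  # each extra feature multiplies the number of groups
    return combos * per_group


# Assume each feature (e.g. an age bracket) takes one of 3 values.
for n_features in range(1, 6):
    print(n_features, "features ->", examples_needed([3] * n_features), "examples")
```

Under this simplified counting, each added feature multiplies the data you need rather than adding to it, which is why wide feature sets get expensive quickly.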

“You do need to work the data into a format where each row is a data point, the kind of thing you'd want to pick.” For example: If you want your algorithm to look at Customer X, who did or didn't buy things, you need to assign values for “bought” and “didn’t buy.” This means a lot of cleaning data.
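As a minimal sketch of what “each row is a data point” can look like, the Python below turns a few raw records into numeric feature rows plus a label. The records, the field names (`age`, `clicked_ad`, `outcome`), and the 1/0 encoding are hypothetical choices made for illustration, not a prescribed scheme.

```python
# Hypothetical purchase log: one raw record per customer.
raw_records = [
    {"customer": "X", "age": "34", "clicked_ad": "yes", "outcome": "bought"},
    {"customer": "Y", "age": "51", "clicked_ad": "no", "outcome": "didn't buy"},
    {"customer": "Z", "age": "29", "clicked_ad": "yes", "outcome": "didn't buy"},
]

# Assign machine-readable values to the text categories.
OUTCOME_VALUES = {"bought": 1, "didn't buy": 0}
YES_NO = {"yes": 1, "no": 0}


def to_row(record):
    """Convert one raw record into (numeric features, label)."""
    features = [int(record["age"]), YES_NO[record["clicked_ad"]]]
    label = OUTCOME_VALUES[record["outcome"]]
    return features, label


rows = [to_row(r) for r in raw_records]
X = [features for features, _ in rows]  # one row per data point
y = [label for _, label in rows]        # the outcome to predict

print(X)
print(y)
```

Lists shaped like `X` and `y` are the kind of input most off-the-shelf modeling tools expect, which is why so much of the effort goes into producing them.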

“The answer she would have given would have been ‘seven’—the value assigned to an outcome.” That said, it’s easy to understand why a phrase like “neural networks” is such a buzzword.

A neural net is a set of little machine learning algorithms (i.e., little logistic regressions) that are combined to mimic neural activity.
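The “little logistic regressions” description can be sketched in a few lines of Python. This is a toy forward pass only, under stated assumptions: the weights are hand-picked rather than learned, the two-unit hidden layer is arbitrary, and the names `logistic_unit` and `tiny_network` are hypothetical.

```python
import math


def logistic_unit(inputs, weights, bias):
    """One 'little logistic regression': weighted sum squashed into (0, 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def tiny_network(inputs, hidden_params, output_params):
    """Feed the inputs through several logistic units, then combine their outputs."""
    hidden = [logistic_unit(inputs, w, b) for w, b in hidden_params]
    out_weights, out_bias = output_params
    return logistic_unit(hidden, out_weights, out_bias)


# Hypothetical, hand-picked weights (a real network would learn these from data).
hidden_params = [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)]
output_params = ([1.2, -0.7], 0.05)

prediction = tiny_network([1.0, 2.0], hidden_params, output_params)
print(round(prediction, 3))
```

Training, which is where most of the mathematics lives, is deliberately left out; libraries handle that part, as the articles above point out.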

“Suddenly we had mountains of data and a fast, affordable means of drawing insight from it.” Added Hillary, “If you think about it, Excel couldn't handle the number of rows of data five years ago that we need today to do machine learning.”

“Data scientist roles have grown over 650% since 2012, but currently, 35,000 people in the US have data science skills, while hundreds of companies are hiring for those roles.” This is confirmed by LinkedIn’s 2017 U.S. Emerging Jobs Report which lists “Machine Learning Engineer” as the fastest-growing job.

Building a facial recognition system or teaching a robot to recognize feelings requires an extremely advanced level of math, but if you’re just trying to learn enough about machine learning to apply simple clustering or regression methods, that’s a different story.

“I would say, a semester of college-level computer science, or a program where you’re doing it every single day for like six to eight weeks,” she said.

- On Thursday, January 17, 2019

**How to Do Mathematics Easily - Intro to Deep Learning #4**

Let's learn about some key math concepts behind deep learning shall we? We'll build a 3 layer neural network and dive into some key concepts that makes ...

**Lecture: Mathematics of Big Data and Machine Learning**

MIT RES.LL-005 D4M: Signal Processing on Databases, Fall 2012. Instructor: Jeremy Kepner ..

**How Machines Learn**

How do all the algorithms around us learn to do their jobs?

**Just Enough Math: Advanced Math for Business People to Leverage Big Data**

With the commercial successes of machine learning and cloud computing, many business people need just enough math to take advantage of open source ...

**What is machine learning and how to learn it ?**

Machine learning is just to give trained data to a program and get better result for complex problems. It is very close to data ..

**How To Read Text In Binary**

- @tomscott - No, seriously. Here's how to read text when all you can see is a bunch of 0s and 1s. It's easier than it seems. I... I think I might ..

**But what *is* a Neural Network? | Chapter 1, deep learning**


**Excel Magic Trick 783: Date Functions & Formulas (17 Examples)**

Download file: 1. DAY function 2. TEXT Function to get Day spelled out, like "Monday" 3. TEXT Function to get ..

**Thinking visually about higher dimensions**

How do you think about a sphere in four dimensions? What about ten dimensions?

**"Basic Statistical Arbitrage: Understanding the Math Behind Pairs Trading" by Max Margenot**

This talk was given by Max Margenot at the Quantopian Meetup in Santa Clara on July 17th, 2017. To learn more about Quantopian, visit: ...