# AI News, Machine learning — Is the emperor wearing clothes?

## Machine learning — Is the emperor wearing clothes?

These days, no data science hipster is into the humble straight line.

Flexible, squiggly shapes are all the rage among today’s fashionable crowd (you may know these as neural networks, though there isn’t much that’s neural about them — they were named rather aspirationally more than half a century ago and no one seems to like my suggestion that we rename them to “yoga networks” or “many-layers-of-mathematical-operations”).

If you’re an applied machine learning enthusiast, it’s okay if you don’t memorize them — in practice you’ll just shove your data through as many algorithms as you can and iterate on what seems promising.

(A visual summary appears in another of my articles.) If that was confusing, maybe you’ll like this analogy more: a poet picks an approach (algorithm) to putting words on paper.

And don’t go around saying that retraining — jargon for rerunning the algorithm to adjust the boundary as new examples are gathered — makes it creature-like or inherently different from your programmer’s standard work product.

If you’re worried that your machine learning system now does the update much faster, invest in good testing or avail yourself of a long time.sleep().

Often you’ll land on one that turns out not to work when you evaluate its performance on new data… and you’re back to the drawing board, over and over, until finally the heavens open up and your solution stops embarrassing itself.

## Extracting Structured Data From Recipes Using Conditional Random Fields

It took the company almost 20 years, several failed starts and a massive data cleanup effort, but the idea of cooking as a “digital service” (read: web app) is finally a reality.

The product was designed and built from scratch over the course of a year, but it relies heavily on nearly six years of effort to clean, catalogue and structure our massive recipe archive.

As of yesterday, the database contained 17,507 recipes, 67,578 steps, 142,533 tags and 171,244 ingredients broken down by name, quantity and unit.

In practical terms, this means that if you make Melissa Clark’s pasta with fried lemons and chile flakes recipe, we know how many cups of Parmigiano-Reggiano you need, how long it will take you to cook and how many people you can serve.

That finely structured data, while invisible to the end user, has allowed us to quickly iterate on designs, add granular HTML markup to improve our SEO, build a customized search engine and spin up a simple recipe recommendation system.

Since the database breaks down each ingredient by name, unit, quantity and comment, an average recipe requires over 50 fields, and that number can climb above 100 for more complicated recipes.

For an internal hack week last summer, a colleague and I decided to test our faith in statistical NLP to automatically convert unstructured recipe text into structured data.

We chose to use a discriminative structured prediction model called a linear-chain conditional random field (CRF), which has been successful on similar tasks such as part-of-speech tagging and named entity recognition.

For example, if $x^i = [x_1^i, x_2^i, x_3^i] = [\text{``pinch''}, \text{``of''}, \text{``salt''}]$ then $y^i = [y_1^i, y_2^i, y_3^i] = [\text{UNIT}, \text{UNIT}, \text{NAME}]$.

We approach this task by modeling the conditional probability of a sequence of tags given the input, denoted $p(\text{tag sequence} \mid \text{ingredient phrase})$ or using the above notation, $p(y \mid x)$.

The process of learning that probability model is described in detail below, but first imagine that someone handed us the perfect probability model $p(y \mid x)$ that returns the “true” probability of a sequence of labels given an ingredient phrase.

In the end, we are able to find the best tag sequence in time that is quadratic in the number of tags and linear in the number of words, i.e. $O(|\text{tags}|^2 \cdot |\text{words}|)$.
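That dynamic program is the Viterbi algorithm. Here is a minimal sketch in plain Python; the toy tag set and scoring function below are made-up stand-ins for the learned CRF potentials, included only to show the $O(|\text{tags}|^2 \cdot |\text{words}|)$ recursion:

```python
def viterbi(words, tags, score):
    """Find the best tag sequence in O(|tags|^2 * |words|) time.

    score(prev_tag, tag, words, t) returns a log-space potential for
    assigning `tag` at position t after `prev_tag` (None at t=0).
    """
    # best[t][tag] = score of the best sequence ending in `tag` at position t
    best = [{tag: score(None, tag, words, 0) for tag in tags}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            prev = max(tags, key=lambda p: best[t - 1][p] + score(p, tag, words, t))
            best[t][tag] = best[t - 1][prev] + score(prev, tag, words, t)
            back[t][tag] = prev
    # Trace the best path back from the final position
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy scorer: prefer NAME on "salt", UNIT elsewhere
def toy_score(prev, tag, words, t):
    if words[t] == "salt":
        return 1.0 if tag == "NAME" else 0.0
    return 1.0 if tag == "UNIT" else 0.0

print(viterbi(["pinch", "of", "salt"], ["UNIT", "NAME"], toy_score))
# ['UNIT', 'UNIT', 'NAME']
```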

So given a model $p(y \mid x)$ that encodes whether a particular tag sequence is a good fit for an ingredient phrase, we can return the best tag sequence.

We construct $\psi$ so that it returns a large, non-negative number if the labels $y_t$ and $y_{t-1}$ are a good match for the $t^{th}$ and $(t-1)^{th}$ words in the sentence respectively, and a small, non-negative number if not.

Each feature function, $f_k(y_t, y_{t-1}, x)$, is chosen by the person who creates the model, based on what information might be useful to determine the relationship between words and labels.
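As an illustration, hand-written feature functions for this task might look like the following; the feature names, unit list and weights here are invented for the example, not taken from the original system:

```python
# Hypothetical feature functions f_k(y_t, y_prev, x, t) for ingredient
# tagging; each returns 1.0 when its pattern fires, else 0.0.
def is_numeric_quantity(y_t, y_prev, x, t):
    # Tokens that start with a digit usually mark a QTY label
    return 1.0 if y_t == "QTY" and x[t][0].isdigit() else 0.0

def unit_after_quantity(y_t, y_prev, x, t):
    # A known unit word right after a quantity is almost always UNIT
    units = {"cup", "cups", "tablespoon", "tablespoons", "pinch"}
    return 1.0 if y_t == "UNIT" and y_prev == "QTY" and x[t] in units else 0.0

features = [is_numeric_quantity, unit_after_quantity]
weights = [2.3, 1.7]  # learned during training; made up here

def log_potential(y_t, y_prev, x, t):
    # psi in log space: the weighted sum of the features that fire
    return sum(w * f(y_t, y_prev, x, t) for w, f in zip(weights, features))

x = ["2", "cups", "flour"]
print(log_potential("UNIT", "QTY", x, 1))  # 1.7
```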

By modeling the conditional probability of labels given words the following way, we have reduced our task of learning $p(y \mid x)$ to the problem of learning “good” weights on each of the feature functions.

By good, I mean that we want to learn large positive weights on features that capture highly likely patterns in the data, large negative weights on features that capture highly unlikely patterns in the data and small weights on features that don’t capture any patterns in the data.

But there is an ever-increasing appetite from developers and designers for finely structured data to power our digital products and at some point, we will need to develop algorithmic solutions to help with these tasks.

## Machine Learning Algorithm Recipes in scikit-learn

You can read all of the blog posts and watch all the videos in the world, but you’re not actually going to get machine learning until you start practicing.

In this blog post I want to give a few very simple examples of using scikit-learn for some supervised classification algorithms.

In this post you will see 5 recipes of supervised classification algorithms applied to small standard datasets that are provided with the scikit-learn library.

They provide a skeleton that you can copy and paste into your file, project or python REPL and start to play with immediately.

Logistic regression fits a logistic model to data and makes predictions about the probability of an event (between 0 and 1).

Because this is a multi-class classification problem and logistic regression makes predictions between 0 and 1, a one-vs-all scheme is used (one model per class).
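A minimal sketch of that one-vs-all setup with scikit-learn, using the bundled iris data; wrapping the model in `OneVsRestClassifier` is one way to make the per-class scheme explicit:

```python
# One-vs-rest logistic regression on the iris data that ships with scikit-learn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
# One binary logistic model per class; each predicts P(class) in [0, 1]
model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict_proba(X[:1]).shape)  # one probability per class: (1, 3)
print(model.score(X, y))
```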

The k-Nearest Neighbor (kNN) method makes predictions by locating similar cases to a given data instance (using a similarity function) and returning the average or majority of the most similar data instances.
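A corresponding kNN recipe might look like this (iris again; k=5 is an arbitrary choice, and the similarity function defaults to Euclidean distance):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Predict by majority vote among the 5 nearest training instances
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X, y)
print(model.predict(X[:3]))
print(model.score(X, y))
```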

Classification and Regression Trees (CART) are constructed from a dataset by making splits that best separate the data for the classes or predictions being made.
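A minimal CART sketch along the same lines; by default scikit-learn chooses splits that minimize Gini impurity:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Each split is chosen to best separate the classes (Gini impurity by default)
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
print(model.score(X, y))  # an unpruned tree essentially memorizes the training set
```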

In this post you have seen 5 self-contained recipes demonstrating some of the most popular and powerful supervised classification algorithms.

## How To Prepare Your Data For Machine Learning in Python with Scikit-Learn

It is often a very good idea to prepare your data in such a way as to best expose the structure of the problem to the machine learning algorithms that you intend to use.

Each recipe follows the same structure: the transforms are calculated in such a way that they can be applied to your training data and to any samples of data you may have in the future.

When your data is comprised of attributes with varying scales, many machine learning algorithms can benefit from rescaling the attributes to all have the same scale.
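For instance, rescaling every attribute to a common [0, 1] range with `MinMaxScaler` (the data here is a toy example):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
# Rescale each column independently to the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
rescaled = scaler.fit_transform(X)
print(rescaled)  # [[0. 0.], [0.5 0.5], [1. 1.]]
```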

Standardization is a useful technique to transform attributes with a Gaussian distribution and differing means and standard deviations to a standard Gaussian distribution with a mean of 0 and a standard deviation of 1.

It is most suitable for techniques that assume a Gaussian distribution in the input variables and work better with rescaled data, such as linear regression, logistic regression and linear discriminant analysis.
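A small standardization sketch with `StandardScaler` (toy data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
# Center each column to mean 0 and scale to unit standard deviation
standardized = StandardScaler().fit_transform(X)
print(standardized.mean(axis=0))  # approximately [0, 0]
print(standardized.std(axis=0))   # approximately [1, 1]
```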

Normalizing in scikit-learn refers to rescaling each observation (row) to have a length of 1 (called a unit norm in linear algebra).

This preprocessing can be useful for sparse datasets (lots of zeros) with attributes of varying scales when using algorithms that weight input values such as neural networks and algorithms that use distance measures such as K-Nearest Neighbors.
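A brief normalization sketch with `Normalizer` (toy rows):

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0], [1.0, 0.0]])
# Rescale each row (observation) to unit L2 length
normalized = Normalizer(norm="l2").fit_transform(X)
print(normalized)  # [[0.6 0.8], [1. 0.]]
print(np.linalg.norm(normalized, axis=1))  # every row now has length 1
```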

You now have recipes for each of these transforms. Your action step for this post is to type or copy-and-paste each recipe and get familiar with data preprocessing in scikit-learn.

## Data Algorithms: Recipes for Scaling Up with Hadoop and Spark 1st Edition


Hello World - Machine Learning Recipes #1

Six lines of Python is all it takes to write your first machine learning program! In this episode, we'll briefly introduce what machine learning is and why it's ...
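The classic six-line program that episode refers to is roughly the following; the fruit features and labels are illustrative stand-ins, not necessarily the video's exact code:

```python
from sklearn import tree
features = [[140, 1], [130, 1], [150, 0], [170, 0]]  # [weight_g, smooth_skin?]
labels = [0, 0, 1, 1]                                # 0 = apple, 1 = orange
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[160, 0]]))  # a heavy, bumpy fruit -> orange
```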

Let’s Write a Decision Tree Classifier from Scratch - Machine Learning Recipes #8

Hey everyone! Glad to be back! Decision Tree classifiers are intuitive, interpretable, and one of my favorite supervised learning algorithms. In this episode, I'll ...

Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorithms | Edureka

Machine Learning with Python: This Edureka video on Decision Tree Algorithm in Python will ..

The Best Way to Prepare a Dataset Easily

In this video, I go over the 3 steps you need to prepare a dataset to be fed into a machine learning model. (selecting the data, processing it, and transforming it).

🐍🎓 Algorithms and Data Structures Knowledge to Get a Python Job?

Dive into Python data structures with simple code examples. How much Computer Science data structures and algorithms ..

Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Science Training | Edureka

Data Science Training: This Edureka Decision Tree tutorial will help you understand all the basics of Decision Tree.

Fibonacci series algorithm(using simple code)

Write a program to generate and print the Fibonacci series up to n terms.

Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Data Science |Simplilearn

This Decision Tree algorithm in Machine Learning tutorial video will help you understand all the basics of Decision Tree along with what is Machine Learning, ...

Visualizing a Decision Tree - Machine Learning Recipes #2

Last episode, we treated our Decision Tree as a blackbox. In this episode, we'll build one on a real dataset, add code to visualize it, and practice reading it - so ...

Let’s Write a Pipeline - Machine Learning Recipes #4

In this episode, we'll write a basic pipeline for supervised learning with just 12 lines of code. Along the way, we'll talk about training and testing data. Then, we'll ...
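A pipeline in that spirit might look like the following; the choice of classifier and split size are assumptions, not necessarily the video's exact code:

```python
# A basic supervised-learning pipeline: split the data, train, then test
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
# Hold out half the data so the score measures generalization, not recall
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```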