AI News

Locklin on science

The standard machine learning textbooks all mention regression models, logistic regression, neural nets, trees, ensemble methods, graphical models and SVM type things.

Sometimes I am definitely just whining that people don’t pay enough attention to the things I find interesting, or that I don’t have a good book or review article on the topic.

If you’re not thinking about how you’re exposing your learners to sequentially generated data, you’re probably leaving information on the table, or overfitting to irrelevant data.

These online-learning ideas strike me as being of extreme importance, though the best reference I know of is a presentation of new ideas, rather than an exposition of established ones.  Vowpal Wabbit is a useful and interesting piece of software with OK documentation, but there should be a book which takes you from online versions of linear regression (they exist!) up through the current state of the art.
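
To make "online versions of linear regression" concrete, here is a minimal sketch: weights updated one observation at a time by stochastic gradient descent, never storing the dataset. The names and the fixed learning rate are illustrative, not how Vowpal Wabbit actually does it.

```python
import numpy as np

# Minimal sketch of online (streaming) linear regression: the model sees
# one observation at a time and never stores the dataset. The fixed
# learning rate and all names here are illustrative.

def sgd_update(w, x, y, lr=0.01):
    """One online update: nudge weights against the squared-error gradient."""
    y_hat = w @ x                 # current prediction
    grad = (y_hat - y) * x        # gradient of 0.5 * (y_hat - y)^2 w.r.t. w
    return w - lr * grad

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 0.5])
w = np.zeros(3)
for _ in range(10_000):           # simulate a data stream
    x = rng.normal(size=3)
    y = true_w @ x + rng.normal(scale=0.1)
    w = sgd_update(w, x, y)
print(w)                          # should land close to true_w
```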

Hell, I am at a loss to think of a decent review article, and the subject is unfortunately un-googleable, thanks to the hype over the BFD of “watching lectures and taking tests over the freaking internets.”

The problem is, the guys who do reinforcement learning are generally in control systems theory and robotics, making the literature impenetrable to machine learning researchers and engineers.
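
For readers coming from the ML side, the core of much of that literature is a one-line value update. Here is a hedged sketch of tabular Q-learning on a toy five-state corridor; the environment, constants, and names are all illustrative.

```python
import random

# Hedged sketch of tabular Q-learning on a toy 5-state corridor: move left
# or right; reaching the rightmost state pays reward 1. All parameters
# here are illustrative choices.
N_STATES, ACTIONS = 5, (0, 1)             # actions: 0 = left, 1 = right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.1         # learning rate, discount, exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

for _ in range(2000):                     # episodes
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r = step(s, a)
        # The Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2
```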

I can’t give you a good reference for this subject in general, though Ron Begleiter and friends wrote a very good paper on some classical compression learning implementations and their uses.
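
To give a flavor of what Begleiter and friends survey (variable-order Markov models used as sequence predictors), here is a drastically simplified, fixed-order sketch; the real methods in the paper (PPM, context tree weighting, and friends) are considerably more sophisticated.

```python
from collections import Counter, defaultdict

# Drastically simplified flavor of compression-style sequence learning:
# an order-2 context model that predicts the next symbol from counts of
# what followed each 2-symbol context in the training sequence.

def train(seq, order=2):
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts

def predict(counts, context):
    c = counts.get(context)
    return c.most_common(1)[0][0] if c else None

model = train("abracadabra abracadabra")
print(predict(model, "ab"))   # 'r': the symbol that usually follows "ab"
```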

Even in marketing problems dealt with using survival-analysis techniques, there is a time component, and you should know about it. In situations where there are non-linear relationships in the time series, classical regression and time-series techniques will fail.

In situations where you must discover the underlying non-linear model yourself, well, you’re in deep shit if you don’t know some time-series oriented machine learning techniques.  There was much work done in the 80s and 90s on tools like recurrent ANNs and feedforward ANNs for starters, and there has been much work in this line since then.
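
One workhorse trick for exposing a generic nonlinear learner to temporal structure is windowing: regress each value on its recent lags. A minimal sketch, assuming scikit-learn is available and using synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch: expose a nonlinear learner to temporal structure by building
# lagged windows (x_{t-k}, ..., x_{t-1}) -> x_t. Assumes scikit-learn.
def make_windows(series, k=5):
    X = np.array([series[i:i + k] for i in range(len(series) - k)])
    y = series[k:]
    return X, y

t = np.arange(500)
series = np.sin(0.1 * t) + 0.1 * np.random.default_rng(0).normal(size=500)
X, y = make_windows(series)
split = 400                                   # train on the past only
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:split], y[:split])
print(model.score(X[split:], y[split:]))      # R^2 on the held-out future
```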

There are plenty of other useful tools and techniques.  Once in a while someone will mention dynamic time warping in a book, but nobody seems real happy about this technique.  Many books mention Hidden Markov Models, which are important, but they’re only useful when the data is at least semi-Markov, and you have some idea of how to characterize it as a sequence of well defined states.

Even in this case, I daresay not even the speech recognition textbooks are real helpful (though Rabiner and Juang is OK, it’s also over 20 years old).
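
Since dynamic time warping gets mentioned more often than it gets explained, here is the whole algorithm as a short dynamic program; a minimal sketch that aligns two sequences while letting time stretch and compress.

```python
import numpy as np

# Minimal dynamic time warping: D[i, j] is the cost of the best alignment
# of a[:i] with b[:j], allowing one sequence to stretch against the other.
def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # a advances
                                 D[i, j - 1],      # b advances
                                 D[i - 1, j - 1])  # both advance (match)
    return D[n, m]

# The same shape at different speeds aligns cheaply; a different shape doesn't.
print(dtw([0, 1, 2, 3, 2, 1], [0, 1, 1, 2, 2, 3, 3, 2, 1]))  # small cost
print(dtw([0, 1, 2, 3, 2, 1], [3, 2, 1, 0, 1, 2]))           # larger cost
```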

This isn’t exactly a cookbook or exposition, mind you: more of a thematic manifesto with a few applications.  Obviously, signal processing has something to say about the subject, but what about learners which are designed to function usefully when we know that most of the data is noise?  Fields such as natural language processing and image processing are effectively ML in the presence of lots of noise and confounding signal, but the solutions you will find in their textbooks are specifically oriented to the problems at hand.  Once in a while something like vector quantization will be reused across fields, but it would be nice if we had an “elements of statistical learning in the presence of lots of noise.”

Feature engineering: feature engineering is another topic which doesn’t seem to merit any review papers or books, or even chapters in books, but it is absolutely vital to ML success.

A review article or a book chapter on this sort of thing, thinking through the relationships of these ideas, and helping the practitioner to engineer new kinds of feature for broad problems would be great.

Unsupervised and semi-supervised learning in general: almost all books, and even tools like R, inherently assume that you are doing supervised learning, or else you’re doing something real simple, like hierarchical clustering, k-means or PCA.  In the presence of a good set of features, or an interesting set of data, unsupervised techniques can be very helpful.
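
For contrast, here is what that "real simple" unsupervised baseline looks like in practice: compress with PCA, then group with k-means, no labels anywhere. A sketch assuming scikit-learn, on synthetic data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# The simple unsupervised baseline: compress with PCA, group with k-means.
# No labels anywhere. Assumes scikit-learn; the data is synthetic.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0, size=(100, 10)),
               rng.normal(loc=3, size=(100, 10))])   # two latent groups

X2 = PCA(n_components=2).fit_transform(X)            # 10-D -> 2-D
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X2)
print(np.bincount(labels))                           # roughly 100 / 100
```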

Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists


Discover Feature Engineering, How to Engineer Features and How to Get Good at It

Feature engineering is an informal topic, but one that is absolutely known and agreed to be key to success in applied machine learning.

“feature engineering is another topic which doesn’t seem to merit any review papers or books, or even chapters in books, but it is absolutely vital to ML success.” — Scott Locklin, in “Neglected machine learning ideas”

The flexibility of good features will allow you to use less complex models that are faster to run, easier to understand and easier to maintain.

With good features, you are closer to the underlying problem and a representation of all the data you have available and could use to best characterize that underlying problem.

— Xavier Conort, on winning the Flight Quest challenge on Kaggle

Here is how I define feature engineering: feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

You can see the dependencies in this definition: “feature engineering is manually designing what the input x’s should be” — Tomasz Malisiewicz, answer to “What is feature engineering?”

In this context, feature engineering asks: what is the best representation of the sample data to learn a solution to your problem?

I would think to myself “I’m doing feature engineering now,” and I would pursue the question “How can I decompose or aggregate raw data to better describe the underlying problem?” The goal was right, but the approach was one of many.

Feature importance scores can also provide you with information that you can use to extract or construct new features, similar but different to those that have been estimated to be useful.

More complex predictive modeling algorithms perform feature importance and selection internally while constructing their model.
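
As a concrete instance of a model computing feature importance internally while it builds itself: tree ensembles in scikit-learn expose importance scores after fitting. The data below is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Sketch: tree ensembles estimate feature importance as a side effect of
# model construction. Only feature 0 actually drives the label here.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)          # label depends on feature 0 only

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(model.feature_importances_)      # feature 0 should dominate
```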

Common examples include image, audio, and textual data, but feature extraction could just as easily apply to tabular data with millions of attributes.

Key to feature extraction is that the methods are automatic (although may need to be designed and constructed from simpler methods) and solve the problem of unmanageably high dimensional data, most typically used for analog observations stored in digital formats.

Feature selection algorithms may use a scoring method to rank and choose features, such as correlation or other feature importance methods.

More advanced methods may search subsets of features by trial and error, creating and evaluating models automatically in pursuit of the objectively most predictive sub-group of features.

Stepwise regression is an example of an algorithm that automatically performs feature selection as part of the model construction process.

Regularization methods like LASSO and ridge regression may also be considered algorithms with feature selection baked in, as they actively seek to remove or discount the contribution of features as part of the model building process.
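
A sketch of that "baked in" selection: the L1 penalty in LASSO drives the coefficients of unhelpful features to exactly zero, so the surviving nonzero coefficients are the selected features. scikit-learn assumed, data synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch: LASSO's L1 penalty zeroes out coefficients of unhelpful features,
# performing selection as part of fitting. Only features 0 and 1 matter here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)
print(np.round(model.coef_, 2))        # nonzero for 0 and 1, ~0 elsewhere
selected = np.flatnonzero(model.coef_) # indices of the surviving features
print(selected)
```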

This requires spending a lot of time with actual sample data (not aggregates) and thinking about the underlying form of the problem, structures in the data and how best to expose them to predictive modeling algorithms.

With tabular data, it often means a mixture of aggregating or combining features to create new features, and decomposing or splitting features to create new features.

With image data, it can often mean enormous amounts of time prescribing automatic filters to pick out relevant structures.

This is the part of feature engineering that is most often talked about as an artform, the part that is credited with importance and signalled as the differentiator in competitive machine learning.

Deep learning methods have been shown to automatically, and in an unsupervised or semi-supervised way, learn abstract representations of features (a compressed form) that in turn have supported state-of-the-art results in domains such as speech recognition, image classification, object recognition and other areas.

They cannot (yet, or easily) tell you how to create more features, similar to but different from those that are doing well, on a given problem or on similar problems in the future.

The process of applied machine learning (for lack of a better name), in a broad-brush sense, involves lots of activities.

Up front is problem definition; next is data selection and preparation; in the middle is model preparation, evaluation and tuning; and at the end is the presentation of results.

The traditional idea of “transforming data” from a raw state to a state suitable for modeling is where feature engineering fits in.

You can see that before feature engineering, we are munging our data into a format we can even look at, and just before that we are collating and denormalizing data from databases into some kind of central picture.

It suggests a strong interaction with modeling, reminding us of the interplay of devising features and testing them against the coalface of our test harness and final performance measures.

This also suggests we may need to leave the data in a form suitable for the chosen modeling algorithm, such as normalizing or standardizing the features as a final step.

This sounds like a preprocessing step, and it probably is, but it helps us consider what types of finishing touches the data needs before effective modeling.

You need a well-defined problem so that you know when to stop this process and move on to trying other models, other model configurations, ensembles of models, and so on.

You could create a new binary feature called “Has_Color” and assign it a value of “1” when an item has a color and “0” when the color is unknown.

These additional features could be used instead of the Item_Color feature (if you wanted to try a simpler linear model) or in addition to it (if you wanted to get more out of something like a decision tree).
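
In pandas, that indicator is a one-liner; the column names follow the article's running example and the data is made up.

```python
import pandas as pd

# The Has_Color indicator from the running example: 1 when a color is
# recorded, 0 when it is unknown. Column names per the article; data made up.
df = pd.DataFrame({"Item_Color": ["red", None, "blue", None, "green"]})
df["Has_Color"] = df["Item_Color"].notna().astype(int)
print(df)
```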

If you suspect there are relationships between times and other attributes, you can decompose a date-time into constituent parts that may allow models to discover and exploit these relationships.

You could create a new ordinal feature called Part_Of_Day with 4 values Morning, Midday, Afternoon, Night with whatever hour boundaries you think are relevant.

You can use similar approaches to pick out time of week relationships, time of month relationships and various structures of seasonality across a year.
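
A sketch of that decomposition in pandas, pulling out the hour and day of week and binning hours into the Part_Of_Day feature; the hour boundaries are arbitrary choices, as noted above.

```python
import pandas as pd

# Decomposing a date-time into parts a model can exploit. The Part_Of_Day
# boundaries below are arbitrary choices, as the article notes.
df = pd.DataFrame({"ts": pd.to_datetime(
    ["2017-01-02 07:30", "2017-01-02 12:10",
     "2017-01-02 18:45", "2017-01-02 23:05"])})
df["Hour_Of_Day"] = df["ts"].dt.hour
df["Day_Of_Week"] = df["ts"].dt.dayofweek
df["Part_Of_Day"] = pd.cut(df["Hour_Of_Day"],
                           bins=[0, 6, 12, 17, 21, 24],
                           labels=["Night", "Morning", "Midday",
                                   "Afternoon", "Night"],
                           right=False, ordered=False)
print(df)
```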

That magic domain number could be used to create a new binary feature Item_Above_4kg with a value of “1” for our example of 6289 grams.
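
And the threshold indicator, using the 4,000-gram cutoff and the 6,289-gram example item from the article; the column names are illustrative.

```python
import pandas as pd

# The Item_Above_4kg indicator: the 4000-gram cutoff is the "magic domain
# number"; 6289 g is the article's example item. Column names illustrative.
df = pd.DataFrame({"Item_Weight_Grams": [1200, 6289, 3999, 8050]})
df["Item_Above_4kg"] = (df["Item_Weight_Grams"] > 4000).astype(int)
print(df)
```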

In this case you may want to go back to the data collection step and create new features in addition to this aggregate and try to expose more temporal structure in the purchases, like perhaps seasonality.

The simple structure allowed the team to use highly performant but very simple linear methods to achieve the winning predictive model.

The paper provides details of how specific temporal and other non-linearities in the problem structure were reduced to simple composite binary indicators.

The Heritage Health Prize was a $3 million prize awarded to the team that could best predict which patients would be admitted to hospital within the next year.

If you are working with digital representations of analog observations like images, video, sound or text, you might like to dive deeper into some feature extraction literature.


Deep Learning

Kurzweil told Page, who had read an early draft, that he wanted to start a company to develop his ideas about how to build a truly intelligent computer: one that could understand language and then make inferences and decisions on its own.

The basic idea—that software can simulate the neocortex’s large array of neurons in an artificial “neural network”—is decades old, and it has led to as many disappointments as breakthroughs.

Last June, a Google deep-learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image recognition effort at identifying objects such as cats.

In October, Microsoft chief research officer Rick Rashid wowed attendees at a lecture in China with a demonstration of speech software that transcribed his spoken words into English text with an error rate of 7 percent, translated them into Chinese-language text, and then simulated his own voice uttering them in Mandarin.

Hinton, who will split his time between the university and Google, says he plans to “take ideas out of this field and apply them to real problems” such as image recognition, search, and natural-language understanding.

Extending deep learning into applications beyond speech and image recognition will require more conceptual and software breakthroughs, not to mention many more advances in processing power.

Neural networks, developed in the 1950s not long after the dawn of AI research, looked promising because they attempted to simulate the way the brain worked, though in greatly simplified form.

These weights determine how each simulated neuron responds—with a mathematical output between 0 and 1—to a digitized feature such as an edge or a shade of blue in an image, or a particular energy level at one frequency in a phoneme, the individual unit of sound in spoken syllables.
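
As a sketch of the simulated neuron being described (the weights and inputs below are made up, not from any real system): a weighted sum of inputs squashed by a logistic function to an output between 0 and 1.

```python
import numpy as np

# One simulated neuron, as described: weights determine how strongly each
# input feature drives the response, and a sigmoid squashes the weighted
# sum to an output between 0 and 1. Weights and inputs here are made up.
def neuron(x, w, b):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))   # logistic squashing

x = np.array([0.9, 0.1, 0.4])     # e.g. digitized features such as edges
w = np.array([2.0, -1.0, 0.5])    # learned connection weights
print(neuron(x, w, b=-0.5))       # a value strictly between 0 and 1
```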

Programmers would train a neural network to detect an object or phoneme by blitzing the network with digitized versions of images containing those objects or sound waves containing those phonemes.

The eventual goal of this training was to get the network to consistently recognize the patterns in speech or sets of images that we humans know as, say, the phoneme “d” or the image of a dog.

This is much the same way a child learns what a dog is by noticing the details of head shape, behavior, and the like in furry, barking animals that other people call dogs.

Once that layer accurately recognizes those features, they’re fed to the next layer, which trains itself to recognize more complex features, like a corner or a combination of speech sounds.

Because the multiple layers of neurons allow for more precise training on the many variants of a sound, the system can recognize scraps of sound more reliably, especially in noisy environments such as subway platforms.

Hawkins, author of On Intelligence, a 2004 book on how the brain works and how it might provide a guide to building intelligent machines, says deep learning fails to account for the concept of time.

Brains process streams of sensory data, he says, and human learning depends on our ability to recall sequences of patterns: when you watch a video of a cat doing something funny, it’s the motion that matters, not a series of still images like those Google used in its experiment.

In high school, Kurzweil wrote software that enabled a computer to create original music in various classical styles, which he demonstrated in a 1965 appearance on the TV show I’ve Got a Secret.

Since then, his inventions have included several firsts—a print-to-speech reading machine, software that could scan and digitize printed text in any font, music synthesizers that could re-create the sound of orchestral instruments, and a speech recognition system with a large vocabulary.

This isn’t his immediate goal at Google, but it matches that of Google cofounder Sergey Brin, who said in the company’s early days that he wanted to build the equivalent of the sentient computer HAL in 2001: A Space Odyssey—except one that wouldn’t kill people.

“My mandate is to give computers enough understanding of natural language to do useful things—do a better job of search, do a better job of answering questions,” he says.

Watson famously handled Jeopardy! queries as quirky as “a long, tiresome speech delivered by a frothy pie topping.” (Watson’s correct answer: “What is a meringue harangue?”) Kurzweil isn’t focused solely on deep learning, though he says his approach to speech recognition is based on similar theories about how the brain works.

“That’s not a project I think I’ll ever finish.” Though Kurzweil’s vision is still years from reality, deep learning is likely to spur other applications beyond speech and image recognition in the nearer term.

Microsoft’s Peter Lee says there’s promising early research on potential uses of deep learning in machine vision—technologies that use imaging for applications such as industrial inspection and robot guidance.

The Business of Artificial Intelligence

For more than 250 years the fundamental drivers of economic growth have been technological innovations.

The internal combustion engine, for example, gave rise to cars, trucks, airplanes, chain saws, and lawnmowers, along with big-box retailers, shopping centers, cross-docking warehouses, new supply chains, and, when you think about it, suburbs.

The most important general-purpose technology of our era is artificial intelligence, particularly machine learning: that is, the machine’s ability to keep improving its performance without humans having to explain exactly how to accomplish all the tasks it’s given.

The effects of AI will be magnified in the coming decade, as manufacturing, retailing, transportation, finance, health care, law, advertising, insurance, entertainment, education, and virtually every other industry transform their core processes and business models to take advantage of machine learning.

We see business plans liberally sprinkled with references to machine learning, neural nets, and other forms of the technology, with little connection to its real capabilities.

The term artificial intelligence was coined in 1955 by John McCarthy, a math professor at Dartmouth who organized the seminal conference on the topic the following year.

A study by the Stanford computer scientist James Landay and colleagues found that speech recognition is now about three times as fast, on average, as typing on a cell phone.

Vision systems, such as those used in self-driving cars, formerly made a mistake when identifying a pedestrian as often as once in 30 frames (the cameras in these systems record about 30 frames a second); those error rates have since fallen dramatically.

The error rate for recognizing images from a large database called ImageNet, with several million photographs of common, obscure, or downright weird images, fell from higher than 30% in 2010 to about 4% in 2016 for the best systems.

Google’s DeepMind team has used ML systems to improve the cooling efficiency at data centers by more than 15%, even after they were optimized by human experts.

A system using IBM technology automates the claims process at an insurance company in Singapore, and a system from Lumidatum, a data science platform firm, offers timely advice to improve customer support.

Infinite Analytics developed one ML system to predict whether a user would click on a particular ad, improving online ad placement for a global consumer packaged goods company, and another to improve customers’ …

For instance, Aptonomy and Sanbot, makers respectively of drones and robots, are using improved vision systems to automate much of the work of security guards.

More fundamentally, we can marvel at a system that understands Chinese speech and translates it into English, but we don’t expect such a system to know what a particular Chinese character means.

The fallacy that a computer’s narrow understanding implies broader understanding is perhaps the biggest source of confusion, and exaggerated claims, about AI’s progress.

The most important thing to understand about ML is that it represents a fundamentally different approach to creating software: The machine learns from examples, rather than being explicitly programmed for a particular outcome.

For most of the past 50 years, advances in information technology and its applications have focused on codifying existing knowledge and procedures and embedding them in machines.

In this second wave of the second machine age, machines built by humans are learning from examples and using structured feedback to solve, on their own, problems such as Polanyi’s classic one of recognizing a face.

Artificial intelligence and machine learning come in many flavors, but most of the successes in recent years have been in one category: supervised learning systems, in which the machine is given lots of examples of the correct answer to a particular problem.

Hello World - Machine Learning Recipes #1

Six lines of Python is all it takes to write your first machine learning program! In this episode, we'll briefly introduce what machine learning is and why it's ...
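
The six lines from that episode look roughly like this (reconstructed from memory of the video, so treat the exact values as approximate): a scikit-learn decision tree classifying fruit by weight and texture.

```python
from sklearn import tree

# Roughly the episode's six lines, reconstructed from memory: classify
# fruit from weight (grams) and texture (1 = smooth, 0 = bumpy).
features = [[140, 1], [130, 1], [150, 0], [170, 0]]  # apples, then oranges
labels = [0, 0, 1, 1]                                # 0 = apple, 1 = orange
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[160, 0]]))                       # -> [1], an orange
```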

11. Introduction to Machine Learning

MIT 6.0002 Introduction to Computational Thinking and Data Science, Fall 2016. Instructor: Eric Grimson.


Building a Recommendation Engine with Machine Learning Techniques (Brian Sam-Bodden) - FSF 2016

In this talk Brian will walk you through the ideas, techniques and technologies used to build a SaaS Recommendation Engine. From building an efficient ...


Machine Learning Techniques and Applications in Finance, Healthcare and Recommendation Systems

David Vogel, Trustee (Voloridge Investment Management, LLC) Abstract: The introductory portion of this talk will review some state-of-the-art machine learning ...


Machine Learning & Artificial Intelligence: Crash Course Computer Science #34

So we've talked a lot in this series about how computers fetch and display data, but how do they make decisions on this data? From spam filters and self-driving ...