AI News, Locklin on science


They all mention regression models, logistic regression, neural nets, trees, ensemble methods, graphical models and SVM type things.

Sometimes I am definitely just whining that people don’t pay enough attention to the things I find interesting, or that I don’t have a good book or review article on the topic.

If you’re not thinking about how you’re exposing your learners to sequentially generated data, you’re probably leaving information on the table, or overfitting to irrelevant data.

which strike me as being of extreme importance, though this is a presentation of new ideas, rather than an exposition of established ones.  Vowpal Wabbit is a useful and interesting piece of software with OK documentation, but there should be a book which takes you from online versions of linear regression (they exist!)
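To make the idea concrete, here is a minimal sketch of online (streaming) linear regression, learned one observation at a time via stochastic gradient descent, which is the family of learners that tools like Vowpal Wabbit industrialize. All names and the toy data stream here are illustrative, not taken from any particular library.

```python
def sgd_update(w, b, x, y, lr=0.05):
    """See a single (x, y) pair, nudge the weights, and discard the pair."""
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = pred - y
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    b = b - lr * err
    return w, b

# Stream observations drawn from y = 2*x + 1; the learner never holds
# more than one example in memory, which is the whole point of online ML.
w, b = [0.0], 0.0
for step in range(5000):
    x = [(step % 10) / 10.0]
    w, b = sgd_update(w, b, x, 2.0 * x[0] + 1.0)
```

After a few thousand updates the weights settle near the true slope and intercept, despite each example being seen briefly and in sequence.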

Hell, I am at a loss to think of a decent review article, and the subject is unfortunately un-googleable, thanks to the hype over the BFD of “watching lectures and taking tests over the freaking internets.”

The problem is, the guys who do reinforcement learning are generally in control systems theory and robotics, making the literature impenetrable to machine learning researchers and engineers.

I can’t give you a good reference for this subject in general, though Ron Begleiter and friends wrote a very good paper on some classical compression learning implementations and their uses.

Even in marketing problems dealing with survival techniques, there is a time component, and you should know about it. In situations where there are non-linear relationships in the time series, classical regression and time-series techniques will fail.

In situations where you must discover the underlying non-linear model yourself, well, you’re in deep shit if you don’t know some time-series oriented machine learning techniques.  There was much work done in the 80s and 90s on tools like recurrent ANNs and feedforward ANNs for starters, and there has been much work in this line since then.

There are plenty of other useful tools and techniques.  Once in a while someone will mention dynamic time warping in a book, but nobody seems real happy about this technique.  Many books mention Hidden Markov Models, which are important, but they’re only useful when the data is at least semi-Markov, and you have some idea of how to characterize it as a sequence of well defined states.
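Since dynamic time warping rarely gets more than a passing mention, here is the classic dynamic-programming formulation in a few lines: it measures the distance between two sequences while allowing elastic stretching along the time axis. This is a textbook sketch, not any particular library's implementation.

```python
def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance between two
    numeric sequences, using absolute difference as the local cost."""
    INF = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of best warping path aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # a step may advance either sequence or both (the "warp")
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Note that a sequence and a time-stretched copy of itself ([1, 2, 3] vs. [1, 2, 2, 3]) have distance zero, which is exactly the invariance Euclidean distance lacks.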

Even in this case, I daresay not even the natural language recognition textbooks are real helpful (though Rabiner and Juang is OK, it’s also over 20 years old).

This isn’t exactly a cookbook or exposition, mind you: more of a thematic manifesto with a few applications.  Obviously, signal processing has something to say about the subject, but what about learners which are designed to function usefully when we know that most of the data is noise?  Fields such as natural language processing and image processing are effectively ML in the presence of lots of noise and confounding signal, but the solutions you will find in their textbooks are specifically oriented to the problems at hand.  Once in a while something like vector quantization will be reused across fields, but it would be nice if we had an “elements of statistical learning in the presence of lots of noise”.

Feature engineering: feature engineering is another topic which doesn’t seem to merit any review papers or books, or even chapters in books, but it is absolutely vital to ML success.

A review article or a book chapter on this sort of thing, thinking through the relationships of these ideas, and helping the practitioner to engineer new kinds of feature for broad problems would be great.

Unsupervised and semi-supervised learning in general: almost all books, and even tools like R inherently assume that you are doing supervised learning, or else you’re doing something real simple, like hierarchical clustering, kmeans or PCA.  In the presence of a good set of features, or an interesting set of data, unsupervised techniques can be very helpful.

Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists 1st Edition


Machine Learning

Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known.

The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors.

Through methods like classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data.

Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition.
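Of the unsupervised techniques listed, k-means is the simplest to see end to end. The sketch below runs a few Lloyd iterations on a hypothetical set of 2-D points; the data and initial centers are made up for illustration.

```python
def kmeans(points, centers, iters=10):
    """A few Lloyd iterations of k-means on 2-D points: assign each point
    to its nearest center, then move each center to its cluster's mean."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)),
                      key=lambda k: (p[0] - centers[k][0]) ** 2
                                  + (p[1] - centers[k][1]) ** 2)
            clusters[idx].append(p)
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[k]          # keep an empty cluster's center
            for k, c in enumerate(clusters)
        ]
    return centers

# Two obvious blobs; no labels are ever supplied.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The centers converge to the means of the two blobs without any labeled examples, which is what makes the method unsupervised.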

Discover Feature Engineering, How to Engineer Features and How to Get Good at It

Feature engineering is an informal topic, but one that is absolutely known and agreed to be key to success in applied machine learning.

As Locklin put it: “feature engineering is another topic which doesn’t seem to merit any review papers or books, or even chapters in books, but it is absolutely vital to ML success.”

The flexibility of good features will allow you to use less complex models that are faster to run, easier to understand and easier to maintain.

With good features, you are closer to the underlying problem and a representation of all the data you have available and could use to best characterize that underlying problem.

on winning the Flight Quest challenge on Kaggle. Here is how I define feature engineering: feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

You can see the dependencies in this definition: feature engineering is manually designing what the input x’s should be — Tomasz Malisiewicz, answer to “What is feature engineering?”

In this context, feature engineering asks: what is the best representation of the sample data to learn a solution to your problem?

I would think to myself “I’m doing feature engineering now” and I would pursue the question “How can I decompose or aggregate raw data to better describe the underlying problem?” The goal was right, but the approach was one of many.

Feature importance scores can also provide you with information that you can use to extract or construct new features, similar but different to those that have been estimated to be useful.

More complex predictive modeling algorithms perform feature importance and selection internally while constructing their model.

Common examples include image, audio, and textual data, but could just as easily include tabular data with millions of attributes.

Key to feature extraction is that the methods are automatic (although may need to be designed and constructed from simpler methods) and solve the problem of unmanageably high dimensional data, most typically used for analog observations stored in digital formats.

Feature selection algorithms may use a scoring method to rank and choose features, such as correlation or other feature importance methods.

More advanced methods may search subsets of features by trial and error, creating and evaluating models automatically in pursuit of the objectively most predictive sub-group of features.

Stepwise regression is an example of an algorithm that automatically performs feature selection as part of the model construction process.

Regularization methods like LASSO and ridge regression may also be considered algorithms with feature selection baked in, as they actively seek to remove or discount the contribution of features as part of the model building process.
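The simplest scoring approach mentioned above, ranking features by their correlation with the target, can be sketched in a few lines. The feature names and data below are hypothetical, purely for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical data: f0 tracks the target perfectly, f1 is mostly noise.
features = {
    "f0": [1, 2, 3, 4, 5],
    "f1": [3, 1, 4, 1, 5],
}
target = [2, 4, 6, 8, 10]

# Rank features by absolute correlation with the target.
ranked = sorted(features,
                key=lambda f: abs(pearson(features[f], target)),
                reverse=True)
```

Filter methods like this are fast but blind to feature interactions, which is why the wrapper and embedded methods described above exist.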

This requires spending a lot of time with actual sample data (not aggregates) and thinking about the underlying form of the problem, structures in the data and how best to expose them to predictive modeling algorithms.

With tabular data, it often means a mixture of aggregating or combining features to create new features, and decomposing or splitting features to create new features.

With image data, it can often mean enormous amounts of time prescribing automatic filters to pick out relevant structures.

This is the part of feature engineering that is most often talked about as an art form, the part that is credited with the importance and signalled as the differentiator in competitive machine learning.

They have been shown to learn, automatically and in an unsupervised or semi-supervised way, abstract representations of features (a compressed form) that in turn have supported state-of-the-art results in domains such as speech recognition, image classification, object recognition and other areas.

They cannot (yet, or easily) tell you how to create more features, similar to or different from those that are doing well, on a given problem or on similar problems in the future.

The process of applied machine learning (for lack of a better name) involves, in a broad brush sense, lots of activities.

Up front is problem definition, next is data selection and preparation, in the middle are model preparation, evaluation and tuning, and at the end is the presentation of results.

It might look something like the following: The traditional idea of “Transforming Data” from a raw state to a state suitable for modeling is where feature engineering fits in.

You can see that before feature engineering, we are munging our data into a format we can even look at, and just before that we are collating and denormalizing data from databases into some kind of central picture.

It suggests a strong interaction with modeling, reminding us of the interplay of devising features and testing them against the coalface of our test harness and final performance measures.

This also suggests we may need to leave the data in a form suitable for the chosen modeling algorithm, such as normalize or standardize the features as a final step.

This sounds like a preprocessing step, and it probably is, but it helps us consider what types of finishing touches the data needs before effective modeling.
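Standardization is the most common of those finishing touches: rescale each feature to zero mean and unit variance so that distance- and gradient-based models treat features on comparable scales. A minimal sketch:

```python
def standardize(column):
    """Rescale a numeric feature column to zero mean and unit variance
    (z-scores), using the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]

z = standardize([10, 20, 30])
```

In practice the mean and standard deviation must be computed on the training split only and then reused on test data, or the evaluation leaks information.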

The process might look as follows: you need a well-defined problem so that you know when to stop this process and move on to trying other models, other model configurations, ensembles of models, and so on.

You could create a new binary feature called “Has_Color” and assign it a value of “1” when an item has a color and “0” when the color is unknown.

These additional features could be used instead of the Item_Color feature (if you wanted to try a simpler linear model) or in addition to it (if you wanted to get more out of something like a decision tree).
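The Has_Color indicator described above is a one-liner in practice. The item records below are hypothetical, just to show the shape of the transformation.

```python
# Hypothetical item records with a sometimes-missing color attribute.
items = [
    {"Item_Color": "red"},
    {"Item_Color": None},
    {"Item_Color": "blue"},
]

# New binary feature: 1 when a color is present, 0 when it is unknown.
for item in items:
    item["Has_Color"] = 1 if item["Item_Color"] is not None else 0
```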

If you suspect there are relationships between times and other attributes, you can decompose a date-time into constituent parts that may allow models to discover and exploit these relationships.

You could create a new ordinal feature called Part_Of_Day with 4 values Morning, Midday, Afternoon, Night with whatever hour boundaries you think are relevant.

You can use similar approaches to pick out time of week relationships, time of month relationships and various structures of seasonality across a year.
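A sketch of that date-time decomposition, including the Part_Of_Day bucket described above (the hour boundaries here are arbitrary choices, not a standard):

```python
from datetime import datetime

def decompose(ts):
    """Split a timestamp into parts a model can latch onto, including a
    coarse ordinal Part_Of_Day bucket with arbitrary hour boundaries."""
    if ts.hour < 6:
        part = "Night"
    elif ts.hour < 12:
        part = "Morning"
    elif ts.hour < 15:
        part = "Midday"
    elif ts.hour < 21:
        part = "Afternoon"
    else:
        part = "Night"
    return {
        "Hour": ts.hour,
        "Day_Of_Week": ts.weekday(),  # 0 = Monday, for day-of-week effects
        "Month": ts.month,            # exposes seasonality across a year
        "Part_Of_Day": part,
    }

feats = decompose(datetime(2020, 6, 15, 9, 30))
```

Each derived part lets a model find relationships (rush-hour effects, weekend effects, seasonality) that a raw timestamp hides.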

That magic domain number could be used to create a new binary feature Item_Above_4kg with a value of “1” for our example of 6289 grams.
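Thresholding on a domain number like that is a single comparison per record. The gram weights below are made up; only the 6289-gram example comes from the text.

```python
# Hypothetical item weights in grams; 6289 is the example from the text.
weights_g = [1200, 6289, 4050, 3999]

# New binary feature: 1 when the item exceeds the 4 kg domain threshold.
item_above_4kg = [1 if w > 4000 else 0 for w in weights_g]
```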

In this case you may want to go back to the data collection step and create new features in addition to this aggregate and try to expose more temporal structure in the purchases, like perhaps seasonality.

The simple structure allowed the team to use highly performant but very simple linear methods to achieve the winning predictive model.

The paper provides details of how specific temporal and other non-linearities in the problem structure were reduced to simple composite binary indicators.

The Heritage Health Prize was a $3 million prize awarded to the team that could best predict which patients would be admitted to hospital within the next year.

If you are working with digital representations of analog observations like images, video, sound or text, you might like to dive deeper into some feature extraction literature.

Deep Learning

He told Page, who had read an early draft, that he wanted to start a company to develop his ideas about how to build a truly intelligent computer: one that could understand language and then make inferences and decisions on its own.

The basic idea—that software can simulate the neocortex’s large array of neurons in an artificial “neural network”—is decades old, and it has led to as many disappointments as breakthroughs.

Last June, a Google deep-learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image recognition effort at identifying objects such as cats.

In October, Microsoft chief research officer Rick Rashid wowed attendees at a lecture in China with a demonstration of speech software that transcribed his spoken words into English text with an error rate of 7 percent, translated them into Chinese-language text, and then simulated his own voice uttering them in Mandarin.

Hinton, who will split his time between the university and Google, says he plans to “take ideas out of this field and apply them to real problems” such as image recognition, search, and natural-language understanding.

Extending deep learning into applications beyond speech and image recognition will require more conceptual and software breakthroughs, not to mention many more advances in processing power.

Neural networks, developed in the 1950s not long after the dawn of AI research, looked promising because they attempted to simulate the way the brain worked, though in greatly simplified form.

These weights determine how each simulated neuron responds—with a mathematical output between 0 and 1—to a digitized feature such as an edge or a shade of blue in an image, or a particular energy level at one frequency in a phoneme, the individual unit of sound in spoken syllables.
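That weighted-sum-then-squash behavior is small enough to write out. The sketch below is the textbook sigmoid neuron, a simplification of what production networks use, with made-up weights for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One simulated neuron: a weighted sum of its inputs squashed into
    (0, 1) by a sigmoid. The weights decide how strongly it responds to
    each input feature; training is just the search for good weights."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# With weights [5, -5], the unit fires strongly for the first feature
# (e.g. an edge being present) and is suppressed by the second.
out = neuron([1.0, 0.0], [5.0, -5.0], 0.0)
```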

Programmers would train a neural network to detect an object or phoneme by blitzing the network with digitized versions of images containing those objects or sound waves containing those phonemes.

The eventual goal of this training was to get the network to consistently recognize the patterns in speech or sets of images that we humans know as, say, the phoneme “d” or the image of a dog.

This is much the same way a child learns what a dog is by noticing the details of head shape, behavior, and the like in furry, barking animals that other people call dogs.

Once that layer accurately recognizes those features, they’re fed to the next layer, which trains itself to recognize more complex features, like a corner or a combination of speech sounds.

Because the multiple layers of neurons allow for more precise training on the many variants of a sound, the system can recognize scraps of sound more reliably, especially in noisy environments such as subway platforms.

Hawkins, author of On Intelligence, a 2004 book on how the brain works and how it might provide a guide to building intelligent machines, says deep learning fails to account for the concept of time.

Brains process streams of sensory data, he says, and human learning depends on our ability to recall sequences of patterns: when you watch a video of a cat doing something funny, it’s the motion that matters, not a series of still images like those Google used in its experiment.

In high school, he wrote software that enabled a computer to create original music in various classical styles, which he demonstrated in a 1965 appearance on the TV show I’ve Got a Secret.

Since then, his inventions have included several firsts—a print-to-speech reading machine, software that could scan and digitize printed text in any font, music synthesizers that could re-create the sound of orchestral instruments, and a speech recognition system with a large vocabulary.

This isn’t his immediate goal at Google, but it matches that of Google cofounder Sergey Brin, who said in the company’s early days that he wanted to build the equivalent of the sentient computer HAL in 2001: A Space Odyssey—except one that wouldn’t kill people.

“My mandate is to give computers enough understanding of natural language to do useful things—do a better job of search, do a better job of answering questions,” he says.

queries as quirky as “a long, tiresome speech delivered by a frothy pie topping.” (Watson’s correct answer: “What is a meringue harangue?”) Kurzweil isn’t focused solely on deep learning, though he says his approach to speech recognition is based on similar theories about how the brain works.

“That’s not a project I think I’ll ever finish.” Though Kurzweil’s vision is still years from reality, deep learning is likely to spur other applications beyond speech and image recognition in the nearer term.

Microsoft’s Peter Lee says there’s promising early research on potential uses of deep learning in machine vision—technologies that use imaging for applications such as industrial inspection and robot guidance.

Top 25 Best Machine Learning Books You Should Read

There are loads of free resources available online (such as Solutions Review’s buyer’s guides and best practices), and those are great, but sometimes it’s best to do things the old fashioned way.

We’ve carefully selected the top machine learning books based on relevance, popularity, review ratings, publish date, and ability to add business value.

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems “By using concrete examples, minimal theory, and two production-ready Python frameworks—scikit-learn and TensorFlow—author Aurélien Géron helps you gain an intuitive understanding of the concepts and tools for building intelligent systems.

Deep Learning (Adaptive Computation and Machine Learning series) “The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning.

It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology;

Understanding just how these programs and processes function can help you to navigate this new technology. If you’re not familiar with the possibilities of machine learning, you’ll be surprised to see the variety of ways it can be utilized beyond the much-publicized aspects like speech recognition.

The book addresses real-life problems faced by practitioners on a daily basis, and explains scientifically sound solutions using math, supported by code and examples.

Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way.”

Many examples are given, with a liberal use of color graphics. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression &

Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow, 2nd Edition “Thoroughly updated using the latest Python open source libraries, this book offers the practical knowledge and techniques you need to create and contribute to machine learning, deep learning, and modern data analysis. Fully extended and modernized, Python Machine Learning Second Edition now includes the popular TensorFlow deep learning library.

Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies (MIT Press) “This introductory textbook offers a detailed and focused treatment of the most important machine learning approaches used in predictive data analytics, covering both theoretical concepts and practical applications.

An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics) “An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years.

Machine Learning: The Art and Science of Algorithms that Make Sense of Data “Peter Flach’s clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss.

Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts.

Deep Learning: A Practitioner’s Approach “This hands-on guide not only provides the most practical information available on the subject, but also helps you get started building efficient deep learning networks. Authors Adam Gibson and Josh Patterson provide theory on deep learning before introducing their open-source Deeplearning4j (DL4J) library for developing production-class workflows.

Second Edition: Expert techniques for predictive modeling to solve all your data analysis problems “With this book you’ll discover all the analytical tools you need to gain insights from complex data and learn how to choose the correct algorithm for your specific needs.

Through full engagement with the sort of real-world problems data-wranglers face, you’ll learn to apply machine learning methods to deal with common tasks, including classification, prediction, forecasting, market analysis, and clustering.

Python Machine Learning, 1st Edition “If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning – whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.

Hello World - Machine Learning Recipes #1

Six lines of Python is all it takes to write your first machine learning program! In this episode, we'll briefly introduce what machine learning is and why it's ...


11. Introduction to Machine Learning

MIT 6.0002 Introduction to Computational Thinking and Data Science, Fall 2016. Instructor: Eric Grimson.

Machine Learning & Artificial Intelligence: Crash Course Computer Science #34

So we've talked a lot in this series about how computers fetch and display data, but how do they make decisions on this data? From spam filters and self-driving ...
