AI News, Learning from users faster using machine learning
- On Sunday, September 30, 2018
Learning from users faster using machine learning
I had an interesting idea a few weeks ago, best explained through an example.
I don’t quite follow it, but my understanding is that it’s more of a way to reduce noise caused by uneven assignment between the test and control group.
Instead of using say “purchased a widget” as an outcome metric, try to predict based on user attributes whether the user is going to purchase the widget.
My idea is: create a model that predicts whether someone is going to purchase a widget given a lot of additional data.
And instead of using the actual target metric (what fraction of people bought widgets) we use the predicted metric, using our machine learning model.
So, for instance, as inputs to the model we throw in all kinds of features, and then try to predict the target (did the user buy the widget or not?).
I “simulated” a conversion rate A/B test by picking three random subsets of users to our site in some way I’m not going to disclose.
We look at the fraction of users who make it through the entire conversion flow, and we plot the conversion rate with a confidence interval.
In this case, it looks like we can actually get a confidence interval that’s almost 50% smaller, which means we can get to statistical significance about 4x faster.
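The step from "50% smaller" to "4x faster" follows from the width of a confidence interval shrinking with the square root of the sample size. Here is a minimal sketch of that arithmetic, assuming a normal-approximation interval for a proportion; the conversion rate and group size are made-up values, not numbers from the experiment.

```python
import math

def ci_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def samples_needed(p, half_width, z=1.96):
    """Sample size required to reach a target half-width at rate p."""
    return math.ceil(p * (1 - p) * (z / half_width) ** 2)

p = 0.10      # hypothetical conversion rate
n = 10_000    # hypothetical number of users per group

width = ci_half_width(p, n)
# How many users would the plain metric need to match an interval half as wide?
n_for_half_width = samples_needed(p, width / 2)
ratio = n_for_half_width / n

print(f"half-width at n={n}: {width:.5f}")
print(f"users needed for half that width: {n_for_half_width} ({ratio:.2f}x)")
```

In other words, a metric whose interval is already half as wide at the same sample size is worth roughly four times as much traffic.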
If we plot the conversion rates and the confidence intervals for a larger set of groups, we can see that the uncertainty is consistently smaller using the predicted values:
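One way to see why the predicted values can have less uncertainty is a toy simulation. The sketch below is not the setup used in this post: it assumes each user has a hidden purchase probability and that the model predicts that probability perfectly, which a real model will not do. The distribution parameters and group size are invented for illustration.

```python
import random
import statistics

random.seed(0)

n = 5000  # hypothetical number of users in one test group

# Hypothetical per-user purchase probabilities (average around 10%).
probs = [random.betavariate(2, 18) for _ in range(n)]
# Actual outcomes: each user buys (1) or does not buy (0).
outcomes = [1 if random.random() < p else 0 for p in probs]

def standard_error(values):
    """Standard error of the mean of a list of numbers."""
    return statistics.stdev(values) / len(values) ** 0.5

se_actual = standard_error(outcomes)     # uses the 0/1 purchases
se_predicted = standard_error(probs)     # uses the "oracle" predictions

print(f"actual:    {statistics.mean(outcomes):.4f} +/- {1.96 * se_actual:.4f}")
print(f"predicted: {statistics.mean(probs):.4f} +/- {1.96 * se_predicted:.4f}")
```

The 0/1 outcomes carry the coin-flip noise of each individual purchase on top of the variation between users, while the predictions only carry the variation between users, so their mean is estimated more tightly.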
Consider this blog post a bit of a wacky experiment – I think the outcome is super interesting, and worth thinking more about.
The predicted conversion rate might have a tighter confidence interval, but it’s no longer guaranteed to converge to the “correct” value.
I haven’t spent enough time understanding this, and I haven’t made up my mind if this tool is going to be something I’m planning to use for real data.
How to improve breeding value prediction for feed conversion ratio in the case of incomplete longitudinal body weights.
With the development of automatic self-feeders, repeated measurements of feed intake are becoming easier in an increasing number of species.
On the basis of the missing BW profile in French Large White pigs (male pigs weighed weekly, females and castrated males weighed monthly), we compared 2 different ways of predicting missing BW, 1 using a Gompertz model and 1 using a linear interpolation.
We performed a simulation study on this data set to mimic missing BW values according to the pattern of weekly proportions of incomplete BW data in females and castrated males.
In French Large White pigs, in the growing period extending from d 65 to 168, prediction of missing BW using a Gompertz growth model slightly improved the estimations, but the linear interpolation improved the estimation to a greater extent.
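As a rough illustration of the two approaches being compared, the sketch below fills missing body weights by linear interpolation between the nearest recorded weights, and defines a Gompertz curve in one common parameterization. The weights, ages, and curve parameters are hypothetical, and the study's actual model fitting is not reproduced here.

```python
import math

def gompertz(t, a, b, k):
    """Gompertz growth curve (one common form): weight at age t in days."""
    return a * math.exp(-b * math.exp(-k * t))

def interpolate_missing(days, weights):
    """Replace None entries by linear interpolation between known neighbours."""
    known = [(d, w) for d, w in zip(days, weights) if w is not None]
    filled = []
    for d, w in zip(days, weights):
        if w is not None:
            filled.append(w)
            continue
        before = [(kd, kw) for kd, kw in known if kd < d]
        after = [(kd, kw) for kd, kw in known if kd > d]
        if not before or not after:
            filled.append(None)  # no known weight on one side, leave the gap
            continue
        (d0, w0), (d1, w1) = before[-1], after[0]
        filled.append(w0 + (w1 - w0) * (d - d0) / (d1 - d0))
    return filled

# Hypothetical animal weighed monthly, with the weekly values missing.
days = [65, 72, 79, 86, 93]
weights = [25.0, None, None, None, 49.0]
filled = interpolate_missing(days, weights)
print(filled)
```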
Why Conversion Rate Isn’t The Whole Story: Using Customer Data To Predict Value And Optimize Media Spend
There have been some tremendous advances in understanding the value of digital (and offline) marketing initiatives in recent years.
For those doing this well, it means knowing just how much it costs to acquire a new lead or customer, sell a product or service, or get people to interact and engage in upper funnel activities.
But while all of the latest tools and technologies are helping us to figure out how our different marketing channels are helping us achieve X, Y or Z and at what cost, many of them are missing a critical component to the equation: the customer or prospect.
We’ve been fortunate at Cardinal Path (my employer) to see and work with the collective data and analytics technology stacks of some leading organizations, and a common theme has been the missing bridge between a goldmine of customer data and all of this marketing performance data.
To illustrate the concepts here, let’s say that we’ve got ourselves a coffee shop (watch out, Starbucks!), and our marketing strategy is focused on getting new leads that hopefully become long-term customers.
We’ve implemented a holistic analytics plan that can help us tie each of these marketing initiatives to the business objective of lead generation, and at a very high level, we might end up spending quite a bit of time reviewing data like this: Look familiar?
With data like this, you can quickly compare how each of your marketing investments is performing, you can calculate ROI and understand the volume of new leads you’re getting, and of course, you can drill down into any one of these and perform this analysis at varying levels of granularity.
The point is, whatever business you’re in, there’s probably a pretty wide array of customer lifetime values in your database, and it’s also pretty likely that you’ve done some work to figure out who your most (and least) valuable customers are.
By looking into all the data points captured in even a relatively simple customer relationship management (CRM) system, businesses can begin to understand the lifetime value of their customers based on the products they’re purchasing, the average selling prices (ASPs), average order values (AOVs), the frequency of their purchases, and even the likelihood that they’ve ceased to be a customer (or churned).
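To make the ingredients concrete, here is a minimal sketch of one simple textbook-style lifetime value estimate built from average order value, purchase frequency, and churn. It is an assumed heuristic for illustration, not the method used by any company mentioned here, and the coffee shop figures are invented.

```python
def simple_clv(avg_order_value, purchases_per_year, annual_churn_rate):
    """Rough lifetime revenue: yearly spend times expected lifetime in years.

    Expected lifetime is approximated as 1 / annual churn rate.
    """
    if not 0 < annual_churn_rate <= 1:
        raise ValueError("annual_churn_rate must be in (0, 1]")
    expected_years = 1 / annual_churn_rate
    return avg_order_value * purchases_per_year * expected_years

# Hypothetical coffee shop customers.
regular = simple_clv(avg_order_value=4.50, purchases_per_year=150, annual_churn_rate=0.25)
occasional = simple_clv(avg_order_value=6.00, purchases_per_year=10, annual_churn_rate=0.50)

print(f"regular customer:    {regular:.2f}")
print(f"occasional customer: {occasional:.2f}")
```

Even with a lower order value, the frequent and loyal customer is worth far more, which is why lead counts alone can mislead.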
The truth is, every ad that’s served and clicked, every interaction on a website or a mobile app and every social interaction has the potential to become a rich part of that customer data set and can be used to predict the value of future prospects.
To this end, e-commerce trailblazer 1stdibs (disclosure: client) recently undertook a project to gain a holistic view of its business integrating marketing data, customer data, e-commerce data and others onto a single stack.
Using prediction models with CoreML
Apple released CoreML, a new framework for integrating machine learning models into any iOS app so predictions can happen on the device, without using any external service.
However, this tutorial won’t explain how to choose an algorithm to create a prediction model, how to preprocess the data, train the model, test it, and tune the process, all essential parts of a machine learning project.
The sliders change different values that represent six elements of a wine’s chemical composition, and they are used to predict which cultivar (1, 2, or 3) a wine comes from.
This is a good beginner’s dataset because it has no missing values and since the target is a discrete value (1, 2, or 3), it represents a classic classification problem, where we have to identify the category an observation belongs to, unlike a regression problem, where we have to predict a continuous value (for example, predicting the price of a house based on certain factors).
macOS Sierra includes that version of Python out of the box. However, if you are going to use Python for data science projects, it’s often a good idea to use Anaconda, which helps ensure that we are using the correct version of Python and have all of the necessary dependencies installed.
You can reload .bash_profile without having to open a new terminal with the command: Make sure your environment is up-to-date: Verify your installation, you should see something similar to this: Now that Anaconda is installed, let’s build the script for the model.
Now create a file, let’s say wine_model.py, and start by importing Pandas, which we’ll use to load the data set. Also import the model we’re going to use to make predictions, random forest, from scikit-learn. Random forest is a regression and classification algorithm that belongs to a class of machine learning algorithms called ‘ensemble methods’.
In this case, we’ll use the read_csv() function. Pandas assumes the first row should be used as the column names, but the file doesn’t have a header, so we specify the column names and explicitly tell it that there’s no header.
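A minimal sketch of loading a headerless CSV this way is shown below. The inline rows and the three column names are illustrative stand-ins; the real file has more columns and the tutorial's own names may differ.

```python
import io

import pandas as pd

# Stand-in for the data file: no header row, first value is the cultivar.
raw = io.StringIO("1,14.23,1.71\n2,12.37,0.94\n")

# Hypothetical column names for this sketch.
columns = ["cultivar", "alcohol", "malic_acid"]

# header=None tells pandas the first row is data, names supplies the labels.
df = pd.read_csv(raw, header=None, names=columns)
print(df)
```

Without header=None, pandas would consume the first data row as column labels and the data set would silently lose an observation.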
Note: In the data set, the cultivar is a numeric value. However, when using the Core ML format in Xcode, the generated class interprets it as an Int64 type, and it threw an error when unwrapping the dictionary with the classification probability.
This way, we can compute the model accuracy using cross-validation (with five folds, just as an example) and then print the mean of the scores. Using the actual predictions, we can also compute the score with the methods cross_val_predict and metrics.accuracy_score. Next, we fit the model with the data. Finally, we convert the model to the Core ML format, specifying the input feature names we’re going to use in our Swift code, and save it. Optionally, we can also specify the name of the output attribute that will hold the predicted cultivar with the output_feature_names parameter.
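The scoring and fitting steps can be sketched as follows. This is an approximation of the tutorial's script, not a copy: it assumes scikit-learn's bundled wine data set (load_wine, which uses all 13 features and labels the cultivars 0, 1, 2) instead of the downloaded file with six features, and it leaves out the Core ML conversion.

```python
from sklearn import metrics
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

# Assumption: bundled copy of the wine data instead of the CSV file.
X, y = load_wine(return_X_y=True)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Accuracy estimated with five-fold cross-validation.
scores = cross_val_score(model, X, y, cv=5)
mean_score = scores.mean()
print("Mean cross-validation score:", mean_score)

# Same idea, but from the out-of-fold predictions themselves.
predicted = cross_val_predict(model, X, y, cv=5)
accuracy = metrics.accuracy_score(y, predicted)
print("Accuracy from predictions:", accuracy)

# Fit on all the data before exporting the model.
model.fit(X, y)
```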
Run this program with: The output should be something similar to this: You’ll get slightly different results because the data will be grouped in different ways, but we can see that one method estimated a 0.88 accuracy and the other, 0.89 (also, notice in the output the predicted values).
Our model has six inputs, and they all are continuous values, so drag into the stack view six sliders with corresponding labels for each input and an additional label for the predicted cultivar.
Select the stack view, control-click inside of it, drag a line to the left and while you keep pressing the alt key, you’ll see that the option Leading Space to Safe Area changes to Leading Space to Container Margin (this is a new option in iOS 11).
Add a variable to ViewController: And a formatter for the input values: The method updateValues will update the labels according to the values of the sliders and call the wine class to predict the cultivar asynchronously: This is the definition of the method updateLabels: Updating all the labels at once probably isn’t optimal, but to keep things simple in this example let’s do it that way.
The method predictCultivar passes the value of all the sliders as Double to the predict method of the model instance and sets the text of the cultivar label with the number and its probability: Finally, let’s add a call to the method updateValues() in viewDidLoad: And that’s it.
How to Make Predictions with scikit-learn
Once you choose and fit a final machine learning model in scikit-learn, you can use it to make predictions on new data instances.
In this tutorial, you will discover exactly how you can make classification and regression predictions with a finalized machine learning model in the scikit-learn Python library.
Classification problems are those where the model learns a mapping between input features and an output feature that is a label, such as “spam”.
A class prediction is: given the finalized model and one or more data instances, predict the class for the data instances.
We can predict the class for new data instances using our finalized classification model in scikit-learn using the predict() function.
Running the example predicts the class for the three new data instances, then prints the data and the predictions together.
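A minimal sketch of such a class prediction is shown below, assuming a synthetic two-class data set from make_blobs and a logistic regression as the finalized model; both are stand-ins chosen for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic two-class training data (assumption for this sketch).
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)

# Fit the final model on all available data.
model = LogisticRegression()
model.fit(X, y)

# Three new instances whose class we do not know.
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)

# Predict the class label for each new instance.
ynew = model.predict(Xnew)
for features, label in zip(Xnew, ynew):
    print("X=%s, Predicted=%s" % (features, label))
```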
This is called a probability prediction where given a new instance, the model returns the probability for each outcome class as a value between 0 and 1.
You can make these types of predictions in scikit-learn by calling the predict_proba() function, for example: This function is only available on those classification models capable of making a probability prediction, which is most, but not all, models.
Running the example makes the probability predictions and then prints the input data instance and the probability of each instance belonging to the first and second classes (0 and 1).
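A probability prediction can be sketched like this, again assuming synthetic make_blobs data and a logistic regression model as illustrative stand-ins.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic two-class training data (assumption for this sketch).
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=1)
model = LogisticRegression()
model.fit(X, y)

# Three new instances to score.
Xnew, _ = make_blobs(n_samples=3, centers=2, n_features=2, random_state=1)

# One row per instance, one column per class (0 and 1).
probabilities = model.predict_proba(Xnew)
for features, probs in zip(Xnew, probabilities):
    print("X=%s, P(class 0)=%.3f, P(class 1)=%.3f" % (features, probs[0], probs[1]))
```

Each row of the result sums to 1, and the column order follows model.classes_.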
Regression is a supervised learning problem where, given input examples, the model learns a mapping to suitable output quantities, such as “0.1”.
The same function can be used to make a prediction for a single data instance as long as it is suitably wrapped in a surrounding list or array.
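The wrapping matters because predict() expects a two-dimensional input with one row per instance. A minimal sketch, assuming synthetic data from make_regression and a linear regression model, with made-up feature values for the single instance:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic regression data (assumption for this sketch).
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)

model = LinearRegression()
model.fit(X, y)

# A single new instance with hypothetical feature values.
single = [0.5, -1.0]

# Wrap the instance in a list so the input has shape (1, n_features).
ynew = model.predict([single])
print("X=%s, Predicted=%s" % (single, ynew[0]))
```

Passing the bare list without the outer brackets would raise an error, because scikit-learn would see a one-dimensional array.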
- On Wednesday, January 16, 2019
Predicting the Winning Team with Machine Learning
Can we predict the outcome of a football game given a dataset of past games? That's the question that we'll answer in this episode by using the scikit-learn ...
AI for Marketing & Growth #1 - Predictive Analytics in Marketing
Download our list of the world's best AI Newsletters. Welcome to our ..
Improving customer value estimation by predicting conversion - Allan Dieguez
The objective of this presentation is to describe the challenges of modeling a customer conversion predictor using real leads data observed on different levels of ...
Office Tutorials - Determining the Concentration of an Unknown Sample (Microsoft Excel 2010)
In this tutorial I show you how to (again) generate a standard curve, and use that standard curve to determine the concentration of an unknown solution ...
Forecast Function in MS Excel
The forecast function in MS Excel can be used to forecast sales, consumer trends and even weight loss! For more details: ...
Categorical Variables or Factors in Linear Regression in R (R Tutorial 5.7)
Learn how to include a categorical variable (a factor or qualitative variable) in a regression model in R. You will also learn how to interpret the model coefficients.
Forecasting Time Series Data in R | Facebook's Prophet Package 2017 & Tom Brady's Wikipedia data
An example of using Facebook's recently released open source package prophet including, - data scraped from Tom Brady's Wikipedia page - getting Wikipedia ...
How to Create a Cash Flow Forecast using Microsoft Excel - Basic Cashflow Forecast
Create a basic cash flow forecast using excel. If you need help get in contact. Support this channel ..
convert annual data to quarterly in eviews (English)
Stats: Percentile, z-score, and Area Under Normal Curve
Finding the z-score that corresponds to a given Percentile (area shaded to the left of the z-score).