AI News, The Two Cultures: statistics vs. machine learning?
- On Friday, August 17, 2018
If somebody claims a particular estimator is an unbiased estimator for $\theta$, then we try many values of $\theta$ in turn, generate many samples from each based on some assumed model, push them through the estimator, and find the average estimated $\theta$.
If we can prove that the expected estimate equals the true value for all values of $\theta$, then we say the estimator is unbiased.
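The simulation check described above can be sketched in a few lines. This is only an illustration under assumptions that are not in the original: the assumed model is a zero-mean Gaussian whose variance plays the role of $\theta$, and the two candidate estimators (`var_divide_by_n`, `var_divide_by_n_minus_1`) and the helper `average_estimate` are hypothetical names chosen for the example.

```python
import random

def var_divide_by_n(sample):
    """Sample variance dividing by n (known to be biased for the variance)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / len(sample)

def var_divide_by_n_minus_1(sample):
    """Sample variance dividing by n - 1 (known to be unbiased)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)

def average_estimate(estimator, theta, n=20, trials=5000, seed=0):
    """Average the estimator over many samples from the assumed model N(0, theta)."""
    rng = random.Random(seed)
    sigma = theta ** 0.5  # theta is the true variance in this sketch
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += estimator(sample)
    return total / trials

# Try several values of theta in turn and compare the average estimates.
for theta in (0.5, 1.0, 4.0):
    biased = average_estimate(var_divide_by_n, theta)
    unbiased = average_estimate(var_divide_by_n_minus_1, theta)
    print(f"theta={theta}: divide-by-n avg={biased:.3f}, "
          f"divide-by-(n-1) avg={unbiased:.3f}")
```

With 20 observations per sample, the divide-by-n average should sit near 95% of the true $\theta$, while the divide-by-(n-1) average should sit near $\theta$ itself. A simulation like this can only suggest unbiasedness for the values tried; the claim for all values still needs the proof.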
The empirical data you use might have all sorts of problems, and might not behave according to the model we agreed upon for evaluation.
While your method might have worked on one dataset (the dataset with train and test data) that you used in your evaluation, I can prove that mine will always work.
Your 'proof' is only valid if the entire dataset behaves according to the model you assumed.
I'd love to step in and balance things up, perhaps demonstrating some other issues, but I really love watching my frequentist colleague squirm.
Whereas I will do an evaluation that is more general (because it involves a broadly-applicable proof) and also more limited (because I don't know if your dataset is actually drawn from the modelling assumptions I use while designing my evaluation).
ML: 'What evaluation do you use, B?'
Then we can use the idea that none of us care what's in the black box; we care only about different ways to evaluate.
The frequentist will calculate these for each blood testing method that's under consideration and then recommend that we use the test that got the best pair of scores.
They will want to know 'of those that get a Positive result, how many will get Sick?' and 'of those that get a Negative result, how many are Healthy?'
ML: 'Ah yes, that seems like a better pair of questions to ask.'
One option is to run the tests on lots of people and just observe the relevant proportions.
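As a rough illustration of "observing the relevant proportions", the sketch below computes both pairs of scores from one table of counts. The function name `evaluate_test` and the counts are hypothetical, invented for this example. It assumes the frequentist's pair of scores conditions on the true state (Sick or Healthy), while the other pair conditions on the test result (Positive or Negative).

```python
def evaluate_test(tp, fp, fn, tn):
    """Return both pairs of scores from a 2x2 table of observed counts.

    tp: Sick and Positive      fp: Healthy and Positive
    fn: Sick and Negative      tn: Healthy and Negative
    """
    return {
        # Conditioned on the true state.
        "p_positive_given_sick": tp / (tp + fn),
        "p_negative_given_healthy": tn / (tn + fp),
        # Conditioned on the observed test result.
        "p_sick_given_positive": tp / (tp + fp),
        "p_healthy_given_negative": tn / (tn + fn),
    }

# Hypothetical counts: 1000 people tested, 10 of them Sick.
scores = evaluate_test(tp=9, fp=99, fn=1, tn=891)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In these made-up counts the test is right 90% of the time for both Sick and Healthy people, yet only 9 of the 108 Positive results belong to Sick people, because Sick people are rare. The two pairs of questions can rank the same test very differently.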
Your 'proven' coverage probabilities won't stack up in the real world unless all your assumptions stand up.
You call me crazy, yet you pretend your assumptions are the work of a conservative, solid, assumption-free analysis.
But the interesting thing is that, once we decide on this form of evaluation, and once we choose our prior, we have an automatic 'recipe' to create an appropriate estimator.
If he wants an unbiased estimator for a complex model, he doesn't have any automated way to build a suitable estimator.
I don't have an automatic way to create an unbiased estimator, because I think bias is a bad way to evaluate an estimator.
But given the conditional-on-data estimation that I like, and the prior, I can combine the prior and the likelihood to give me the estimator.
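A minimal sketch of that 'recipe', under assumptions that are not in the original: the data are successes out of a number of independent trials (a binomial likelihood), the prior on the success probability is a Beta distribution, and the estimator is taken to be the posterior mean. The function name `posterior_mean` is hypothetical.

```python
def posterior_mean(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a success probability.

    Assumes a Beta(prior_a, prior_b) prior and a binomial likelihood.
    The posterior is then Beta(prior_a + successes,
    prior_b + trials - successes), and its mean is returned.
    """
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a / (post_a + post_b)

# Same data, two different priors: the recipe is the same, the estimate moves.
print(posterior_mean(7, 10))                             # flat Beta(1, 1) prior
print(posterior_mean(7, 10, prior_a=10.0, prior_b=10.0)) # prior centred on 0.5
```

Once the prior and the likelihood are fixed, nothing else has to be designed by hand; changing the prior changes the estimate but not the procedure.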
We all have different ways to evaluate our methods, and we'll probably never agree on which methods are best.
And some 'frequentist' proofs might be fun too, predicting the performance under some presumed model of data generation.
Sometimes, you have great difficulty finding unbiased estimators, and even when you do you have a stupid estimator (for some really complex model) that will say the variance is negative.
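A negative variance estimate does not need a very complex model to appear. The sketch below swaps in a simpler, well-known case as an illustration: the ANOVA (method-of-moments) estimator of the between-group variance in a balanced one-way random-effects model, which is unbiased under that model but can fall below zero. The function name `between_group_variance` and the toy data are hypothetical.

```python
def between_group_variance(groups):
    """ANOVA (method-of-moments) estimate of the between-group variance.

    Assumes a balanced one-way random-effects model: every group has the
    same number of observations n.
    """
    k = len(groups)      # number of groups
    n = len(groups[0])   # observations per group (assumed equal)
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))
    return (ms_between - ms_within) / n

# Two groups with identical means but noisy observations inside each group.
data = [[0.0, 2.0], [0.0, 2.0]]
print(between_group_variance(data))  # prints -1.0
```

Here the group means coincide, so the between-group mean square is 0 while the within-group mean square is 2, and the estimate comes out as (0 - 2) / 2 = -1 even though a variance cannot be negative.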
ML: 'The lesson here is that, while we disagree a little on evaluation, none of us has a monopoly on how to create estimators that have the properties we want.'
- On Wednesday, June 26, 2019
Maximum Likelihood Estimation Examples
for more great signal processing content, including concept/screenshot files, quizzes, MATLAB and data files. Three examples of ..
Efficiency of estimators
This video details what is meant by the efficiency of an estimator, and why it is a desirable property for an econometric estimator to have. Check out ...
6. Maximum Likelihood Estimation (cont.) and the Method of Moments
MIT 18.650 Statistics for Applications, Fall 2016 View the complete course: Instructor: Philippe Rigollet In this lecture, Prof. Rigollet ..
Moment method estimation: Uniform distribution
Estimation of the parameters of a uniform distribution using the method of moments.
Instrumental-variables regression using Stata®
Learn how to fit instrumental-variables models for endogenous covariates using -ivregress-. Created using Stata 13; largely applicable to Stata 14. Copyright ...
Estimating production function - linear regression using R
Estimation of a production function to illustrate computing a linear regression. I have used R for this purpose. Specification comes from Greene, W. H. (2003): ...
K-Fold Cross Validation - Intro to Machine Learning
This video is part of an online course, Intro to Machine Learning. Check out the course here: This course was designed ..
Video 8: Logistic Regression - Interpretation of Coefficients and Forecasting
This video discusses the interpretation of a logistic regression's coefficients and, more specifically, the slope of the independent variables when all other ...
Understanding the p-value - Statistics Help
With Spanish subtitles. This video explains how to use the p-value to draw conclusions from statistical output. It includes the story of Helen, making sure that the ...
Machine Learning - Supervised VS Unsupervised Learning
Enroll in the course for free at: Machine Learning can be an incredibly beneficial tool to ..