AI News, Machine Learning (Theory)
- On Thursday, June 7, 2018
- By Read More
Machine Learning (Theory)
I have recently completed a 500+ page-book on MDL, the first comprehensive overview of the field (yes, this is a sneak advertisement ). Chapter
One method (but by no means the only method) for designing a universal code relative to model M is by taking some prior W on M and using the corresponding Shannon-Fano code, i.e.
But if M is nonparametric (infinite dimensional, such as in Gaussian process regression, or histogram density estimation with an arbitrary nr of components) then many priors which are perfectly fine according to Bayesian theory are ruled out by MDL theory.
In essence, they imply that estimation based on universal codes must always be statistically consistent (the theorems also directly connect the convergence rates to the amount of compression obtained).
Answer: because the corresponding code has excellent universal coding properties, as shown by Kakade, Seeger and Foster (NIPS 2005): it has only logarithmic coding overhead if the underlying data generating process satisfies some smoothness properties;
Thus, Gaussian processes combined with RBF kernels lead to substantial compression of the data, and therefore, by Barron’s theorem, predictions based on such Gaussian processes converge fast to the optimal predictions that one could only make make if one had access to the unknown imagined “true”
This is not to say that all’s well for MDL in terms of consistency: as John and I showed in a paper that appeared earlier this year (but is really much older), if the true distribution P is not contained in the model class M under consideration but contains a good approximation P’
In our forthcoming NIPS 2007 paper, Steven de Rooij, Tim van Erven and I provide a universal-coding based procedure which converges faster than Bayes in those cases, but does not suffer from the disadvantages of leave-one-out-cross validation.
- On Sunday, July 21, 2019
Weka Tutorial 06: Discretization (Data Preprocessing)
An important feature of Weka is Discretization where you group your feature values into a defined set of interval values. Experiments showed that algorithms like ...
Minimum Description Length - Georgia Tech - Machine Learning
Watch on Udacity: Check out the full Advanced Operating Systems course for free ..
Lecture - 22 Bayesian Networks
Lecture Series on Artificial Intelligence by Prof. P. Dasgupta, Department of Computer Science & Engineering, IIT Kharagpur. For more Courses visit ...
Mod-07 Lec-19 Learning and Generalization; PAC learning framework
Pattern Recognition by Prof. P.S. Sastry, Department of Electronics & Communication Engineering, IISc Bangalore. For more details on NPTEL visit ...