AI News, Justin Domke's Weblog


There are many dimensions on which we might compare a machine learning or data mining algorithm.

A few of the first that come to mind are: 1) Sample complexity, convergence: how much predictive power is the algorithm able to extract from a given number of examples?
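A learning curve is one way to make sample complexity concrete: fit a model on n examples and measure held-out error as n grows. The sketch below uses a hypothetical toy problem (y = 2x - 1 plus Gaussian noise, fitted by least squares). The function name and the data-generating process are assumptions chosen for illustration, not anything from the post.

```python
import numpy as np

def learning_curve(sizes, n_test=2000, noise=0.5, seed=0):
    """Held-out mean squared error of least-squares regression versus training-set size.

    Hypothetical setup: x ~ Uniform(0, 1), y = 2*x - 1 + Normal(0, noise**2).
    """
    rng = np.random.default_rng(seed)

    def sample(n):
        x = rng.uniform(0.0, 1.0, size=n)
        y = 2.0 * x - 1.0 + noise * rng.standard_normal(n)
        return np.column_stack([np.ones(n), x]), y  # design matrix with intercept column

    X_test, y_test = sample(n_test)
    errors = []
    for n in sizes:
        X, y = sample(n)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit on n training examples
        errors.append(float(np.mean((X_test @ w - y_test) ** 2)))
    return errors

errors = learning_curve([5, 50, 500, 5000])
```

With enough data the held-out error approaches the noise variance (0.25 here). An algorithm with better sample complexity gets close to that floor with fewer examples.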

However, if a faster algorithm comes at the expense of sample complexity, one would need to weigh the expense of running a program longer against the expense of gathering more data.

Some examples of theory versus practice are 1) Upper-Confidence Bounds versus Thompson Sampling for bandit algorithms, 2) running a convex optimization algorithm with a theoretically derived Lipschitz constant versus a smaller one that still seems to work, and 3) doing model selection via VC-dimension generalization bounds versus using K-fold cross-validation.
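To make the first pair concrete, here is a sketch of both bandit strategies on a toy Bernoulli bandit. It assumes the classic UCB1 index (empirical mean plus sqrt(2 ln n / n_a), where n is the number of plays so far) and Thompson sampling with uniform Beta(1, 1) priors. The arm probabilities, horizon, and the helper `run_bandit` are hypothetical, and this is an illustration rather than a benchmark.

```python
import math
import random

def run_bandit(policy, probs, horizon=2000, seed=0):
    """Simulate a Bernoulli bandit under `policy`; return (total_reward, pull_counts)."""
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k
    wins = [0] * k
    total = 0
    for t in range(1, horizon + 1):
        if policy == "ucb1":
            if t <= k:
                arm = t - 1  # play each arm once so every empirical mean is defined
            else:
                # UCB1 index: empirical mean plus a bonus from a concentration bound
                arm = max(range(k), key=lambda a: wins[a] / pulls[a]
                          + math.sqrt(2.0 * math.log(t - 1) / pulls[a]))
        elif policy == "thompson":
            # Draw each arm's mean from its Beta(1 + wins, 1 + losses) posterior
            arm = max(range(k), key=lambda a: rng.betavariate(1 + wins[a],
                                                              1 + pulls[a] - wins[a]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total, pulls

probs = [0.2, 0.5, 0.8]  # hypothetical arm success probabilities
ucb_reward, ucb_pulls = run_bandit("ucb1", probs)
ts_reward, ts_pulls = run_bandit("thompson", probs)
```

UCB1's exploration bonus is derived directly from a concentration bound, while Thompson sampling just samples from a posterior; its regret guarantees were established years after UCB1's, yet it is widely used in practice.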

In many compute-heavy applications, memory is almost a wall: we don’t care about memory usage until we run out of it, after which we care a great deal.
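Code that materializes intermediate results hits this wall quickly. As a small, hypothetical illustration (not from the post), Python's standard `tracemalloc` module can report the peak memory of two ways of computing the same sum: one builds a list of all n terms first, the other streams them through a generator. The helper `peak_bytes` is introduced here only for the example.

```python
import tracemalloc

def peak_bytes(fn):
    """Run fn() and return the peak memory (bytes) traced by Python during the call."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) since start()
    finally:
        tracemalloc.stop()
    return peak

n = 200_000
# Materializes every term before summing: peak memory grows linearly with n.
materialized = peak_bytes(lambda: sum([i * i for i in range(n)]))
# Streams terms one at a time: peak memory stays roughly constant.
streamed = peak_bytes(lambda: sum(i * i for i in range(n)))
```

Both versions compute the same value; only the peak memory differs.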

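In principle, for a wide range of inputs, convergence is guaranteed by iteratively setting $x_{t+1} = x_t - \lambda_t g_t$, where $g_t$ is a noisy unbiased estimate of the gradient at $x_t$ and $\lambda_t$ is some sequence of step sizes that obeys the Robbins-Monro conditions ($\sum_t \lambda_t = \infty$ and $\sum_t \lambda_t^2 < \infty$) [1].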
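A minimal sketch of this stochastic-approximation scheme, assuming a scalar parameter and the step sizes lambda_t = a / (t + 1). That is one standard sequence satisfying the Robbins-Monro conditions, and the text does not say which one it has in mind. The test problem and helper names are hypothetical.

```python
import random

def sgd_robbins_monro(noisy_grad, x0, n_steps, a=1.0, seed=0):
    """Iterate x <- x - lambda_t * g_t with lambda_t = a / (t + 1).

    sum_t lambda_t diverges while sum_t lambda_t**2 converges,
    so this schedule satisfies the Robbins-Monro conditions.
    """
    rng = random.Random(seed)
    x = x0
    for t in range(n_steps):
        step = a / (t + 1)
        x = x - step * noisy_grad(x, rng)  # noisy_grad returns an unbiased gradient estimate
    return x

# Hypothetical problem: minimize E[(x - Z)**2] / 2 with Z ~ Normal(3, 1).
# g = x - z is an unbiased estimate of the true gradient x - 3.
def noisy_grad(x, rng):
    return x - rng.gauss(3.0, 1.0)

x_hat = sgd_robbins_monro(noisy_grad, x0=0.0, n_steps=20000)
```

With a = 1 on this particular problem, the iterate is exactly the running average of the samples, which makes convergence easy to check. On harder problems, practical behavior can be very sensitive to the constant a.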

These days, I’d consider an algorithm that consists of a few moderate-dimensional matrix multiplications or singular value decompositions “simple”.

However, that’s due to a huge amount of effort spent designing reliable algorithms for these problems, and to the ubiquity of easy-to-use libraries.
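Principal component analysis is one example of an algorithm that is "simple" only because the linear algebra is delegated to a library: it reduces to centering the data and calling a single SVD. This sketch assumes NumPy and a data matrix with one example per row. The function name `pca` is just for illustration.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components using one SVD."""
    Xc = X - X.mean(axis=0)                              # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)    # the library does the hard part
    components = Vt[:k]                                  # top-k right singular vectors
    scores = Xc @ components.T                           # coordinates in the new basis
    explained = s**2 / np.sum(s**2)                      # fraction of variance per component
    return scores, components, explained[:k]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # correlated toy data
scores, components, explained = pca(X, k=2)
```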

If a domain expert can understand the predictive mechanism, they may be able to assess whether it will continue to hold in the future or only captures something true during the training period.

However, this often comes at a cost: a more general-purpose algorithm cannot exploit the extra structure present in a specialized problem (or, at least, has more difficulty doing so).
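A classic instance of this tradeoff, chosen here as an illustration rather than taken from the post: a general dense linear solver costs O(n^3), while the Thomas algorithm exploits tridiagonal structure to solve the same system in O(n). The sketch assumes no pivoting is needed (true, for example, for strictly diagonally dominant matrices). The function name and argument layout are hypothetical.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system, O(n) time and memory.

    Row i reads: lower[i-1]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    Assumes no pivoting is needed (e.g. a strictly diagonally dominant matrix).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(n):
        denom = diag[i] - (lower[i - 1] * c[i - 1] if i > 0 else 0.0)
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - (lower[i - 1] * d[i - 1] if i > 0 else 0.0)) / denom
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = d[i] - (c[i] * x[i + 1] if i < n - 1 else 0.0)
    return x

n = 6
lower = [-1.0] * (n - 1)
diag = [4.0] * n
upper = [-1.0] * (n - 1)
rhs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

A = np.diag(diag) + np.diag(lower, k=-1) + np.diag(upper, k=1)
x_general = np.linalg.solve(A, rhs)                     # general solver, ignores the structure
x_special = solve_tridiagonal(lower, diag, upper, rhs)  # specialized solver, exploits it
```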

Obviously, all else being equal, we would prefer an algorithm that still does something reasonable when the expected value of $y$ is not actually linear in $x$.
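Ordinary least squares is one algorithm that behaves sensibly in this situation: when the true regression function is not linear, it still returns the best linear approximation in mean squared error. The sketch assumes a hypothetical setting with E[y | x] = x^2 and x spread uniformly over [0, 1]. Solving the normal equations with E[x] = 1/2, E[x^2] = 1/3, and E[x^3] = 1/4 gives slope 1 and intercept -1/6, which the fit should recover.

```python
import numpy as np

# Hypothetical misspecified setting: E[y | x] = x**2, but we fit y ≈ a + b*x.
x = np.linspace(0.0, 1.0, 10001)          # dense uniform grid standing in for Uniform(0, 1)
y = x**2
X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
```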

The original paper pointed out that the pseudolikelihood will have somewhat worse sample complexity than the full likelihood, and (much!) better time complexity.

Many papers seem to attribute the poor practical performance of the pseudolikelihood to this sample complexity, when the true cause is that under model mis-specification the likelihood still does something reasonable (it minimizes KL divergence) while the pseudolikelihood does not.
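To make the time-complexity contrast concrete, here is a sketch on a tiny Ising chain (spins in {-1, +1}, with a single coupling theta between neighbors). The model and function names are assumptions for illustration, not the setting of the original paper. The exact likelihood needs the partition function, a sum over all 2^n configurations. By contrast, each pseudolikelihood term p(x_i | x_-i) only involves x_i's neighbors, so it costs O(n) per example.

```python
import itertools
import math

def energy(x, theta):
    """Unnormalized log-probability of a spin chain: theta * sum_i x_i * x_{i+1}."""
    return theta * sum(x[i] * x[i + 1] for i in range(len(x) - 1))

def log_likelihood(data, theta):
    """Exact average log-likelihood; the partition function sums over all 2**n states."""
    n = len(data[0])
    log_z = math.log(sum(math.exp(energy(s, theta))
                         for s in itertools.product([-1, 1], repeat=n)))
    return sum(energy(x, theta) - log_z for x in data) / len(data)

def log_pseudolikelihood(data, theta):
    """Average of sum_i log p(x_i | x_-i); each term only needs x_i's neighbors."""
    total = 0.0
    for x in data:
        n = len(x)
        for i in range(n):
            # Local field on spin i from its (at most two) neighbors.
            field = theta * ((x[i - 1] if i > 0 else 0) + (x[i + 1] if i < n - 1 else 0))
            # p(x_i | rest) = exp(x_i * field) / (2 * cosh(field))
            total += x[i] * field - math.log(2.0 * math.cosh(field))
    return total / len(data)

data = [(1, -1, 1, 1), (1, 1, 1, 1), (-1, -1, 1, -1)]  # hypothetical observations
exact_ll = log_likelihood(data, 0.5)
pseudo_ll = log_pseudolikelihood(data, 0.5)
```

At theta = 0 the spins are independent and the two objectives coincide. For nonzero theta they differ, and only the likelihood keeps the KL-minimizing interpretation discussed above.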

For example, take this decision tree for choosing an algorithm for unconstrained optimization, due to Dianne O’Leary:

[Figure: Dianne O’Leary’s decision tree for choosing an unconstrained optimization algorithm]

Essentially, this amounts to the principle that one should use the least general algorithm available, so that it can exploit as much of the problem’s structure as possible.

This doesn’t seem possible with machine learning, since there is no single hierarchy. Rather, ML problems are a tangle of model specification, computational and architecture requirements, implementation constraints, user risk tolerances, and so on.
