
Machine Learning FAQ

For practical reasons (combinatorial explosion), most libraries implement decision trees with binary splits.

(Finding an optimal decision tree is known to be NP-complete; see Hyafil, Laurent, and Ronald L. Rivest. "Constructing optimal binary decision trees is NP-complete." Information Processing Letters 5.1 (1976): 15-17.) Our objective function (e.g., in CART) is to maximize the information gain (IG) at each split:
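(Written here in the common CART notation for a binary split; the exact symbols are an assumption on my part.)

$$ IG(D_p, f) = I(D_p) - \frac{N_{left}}{N_p}\, I(D_{left}) - \frac{N_{right}}{N_p}\, I(D_{right}) $$

Here, f is the feature the split is performed on, I is the impurity measure, N_p is the number of samples at the parent node, and N_left and N_right are the numbers of samples in the left and right child nodes.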

In practice, both Gini impurity and entropy typically yield very similar results, and it is often not worth spending much time evaluating trees with different impurity criteria rather than experimenting with different pruning cut-offs.

Another impurity measure is the classification error, which is a useful criterion for pruning but is not recommended for growing a decision tree, since it is less sensitive to changes in the class probabilities of the nodes.
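For reference, the usual definitions of these three criteria at a node t with class-membership proportions p(i|t) are:

$$ I_H(t) = -\sum_{i=1}^{c} p(i \mid t) \log_2 p(i \mid t), \qquad I_G(t) = 1 - \sum_{i=1}^{c} p(i \mid t)^2, \qquad I_E(t) = 1 - \max_i \, p(i \mid t), $$

where I_H denotes entropy, I_G the Gini impurity, and I_E the classification error.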

So let me illustrate what I mean by “the classification error is less sensitive to changes in the class probabilities” by looking at the two possible splitting scenarios shown in the figure below.

We start with a dataset D_p at the parent node that consists of 40 samples from class 1 and 40 samples from class 2, which we split into two child datasets, D_left and D_right.
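To make the comparison concrete, here is a minimal sketch that computes the information gain under entropy, Gini impurity, and classification error for two hypothetical splits of this 40/40 parent node (the split counts below are illustrative assumptions, not necessarily the scenarios in the figure):

```python
from math import log2

def entropy(p):
    """Entropy impurity for a node with class-1 proportion p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def gini(p):
    """Gini impurity for a node with class-1 proportion p."""
    return 1.0 - p ** 2 - (1 - p) ** 2

def classification_error(p):
    """Classification-error impurity for a node with class-1 proportion p."""
    return 1.0 - max(p, 1 - p)

def information_gain(impurity, parent, left, right):
    """IG = I(D_p) - N_left/N_p * I(D_left) - N_right/N_p * I(D_right).

    Each node is given as a (n_class1, n_class2) tuple of sample counts.
    """
    def prop(node):
        return node[0] / sum(node)
    n_p, n_l, n_r = sum(parent), sum(left), sum(right)
    return (impurity(prop(parent))
            - n_l / n_p * impurity(prop(left))
            - n_r / n_p * impurity(prop(right)))

parent = (40, 40)                  # 40 samples of class 1, 40 of class 2
split_a = ((30, 10), (10, 30))     # hypothetical scenario A
split_b = ((20, 40), (20, 0))      # hypothetical scenario B

for name, imp in [("Entropy", entropy),
                  ("Gini impurity", gini),
                  ("Classification error", classification_error)]:
    ig_a = information_gain(imp, parent, *split_a)
    ig_b = information_gain(imp, parent, *split_b)
    print(f"{name:>20}: IG(A) = {ig_a:.3f}, IG(B) = {ig_b:.3f}")
```

With these numbers, the classification error assigns the same gain (0.25) to both splits, whereas Gini impurity and entropy prefer the second split; that indifference is what "less sensitive to changes in the class probabilities" means in practice.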

(Embedded video: "Decision Tree 1: how it works" — a full lecture on how a decision tree recursively splits the training data into subsets based on the value of a single attribute.)