Last time

Decision Trees

First decision: at root of tree—which attribute to split?

Choose the one that gives you more information:

If splitting on an attribute leaves the class probabilities unchanged, it gives no information: don't choose it

Can use misclassification error to decide which attribute gives more information; however, it's not a great heuristic

A split could give a definite answer on one branch, yet score the same misclassification error as a split that leaves every branch uncertain
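A small sketch of that failure mode (the class counts below are made up for illustration): two splits of the same parent node tie on misclassification error, but entropy prefers the one whose second branch gives a definite answer.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a class-count vector."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def misclass_error(counts):
    """Misclassification error of a class-count vector."""
    return 1 - max(counts) / sum(counts)

def weighted(impurity, children):
    """Size-weighted impurity over a split's child nodes."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * impurity(c) for c in children)

# Hypothetical parent node: 4 positives, 4 negatives.
# Split A: (3+,1-) and (1+,3-)  -- both branches stay uncertain.
# Split B: (2+,4-) and (2+,0-)  -- second branch is a definite answer.
split_a = [(3, 1), (1, 3)]
split_b = [(2, 4), (2, 0)]

# Misclassification error cannot tell them apart...
print(weighted(misclass_error, split_a))  # 0.25
print(weighted(misclass_error, split_b))  # 0.25

# ...but entropy scores split B as less uncertain.
print(weighted(entropy, split_a))  # ~0.811
print(weighted(entropy, split_b))  # ~0.689
```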

Entropy

Can use entropy to measure uncertainty:

$-\sum_{k=1}^K P(X=a_k)\log P(X=a_k)$

Example:

$X\in \{0,1\}$

$P(X=0) = p$

$P(X=1)= 1- p$

$H_2 (X) = p\cdot\log \dfrac{1}{p} + (1-p)\log \dfrac{1}{1-p}$

(Not a quadratic, but a concave curve: the closer $p$ is to $0.5$, the more uncertainty, with the maximum at $p = 0.5$.)

Max entropy: uniform distribution
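Both facts can be checked numerically (a minimal sketch; the probe values of $p$ and the choice $K = 8$ are arbitrary): binary entropy peaks at $p = 0.5$, and a uniform distribution over $K$ outcomes attains the maximum $\log_2 K$ bits.

```python
import math

def binary_entropy(p):
    """H2(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# Uncertainty peaks at p = 0.5 and falls off toward the endpoints.
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"H2({p}) = {binary_entropy(p):.3f}")

# For K equally likely outcomes, entropy is maximal: log2(K) bits.
K = 8
uniform = [1 / K] * K
H = -sum(q * math.log2(q) for q in uniform)
print(H, math.log2(K))  # both 3.0
```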

Choosing feature over feature:
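One standard way to compare features, consistent with "choose the one that gives you more information," is information gain: parent entropy minus the size-weighted entropy of the children. A sketch with made-up child counts (the features and counts here are hypothetical, not from the lecture):

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a class-count vector."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

# Hypothetical node with 6 positive / 6 negative examples.
parent = (6, 6)
# Splitting on feature 1 vs feature 2 (made-up child counts):
gain_f1 = information_gain(parent, [(4, 2), (2, 4)])  # both branches uncertain
gain_f2 = information_gain(parent, [(6, 0), (0, 6)])  # perfect separation
print(gain_f1, gain_f2)  # pick the feature with the larger gain
```

A perfect split removes all uncertainty, so its gain equals the full parent entropy (here 1 bit).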