Overview

Decision Trees

Data: pairs $(x, y)$ where each feature is binary ($x_i \in \{0,\,1\}$) or real-valued ($x_i \in \mathbb{R}$).

Each internal node can split directly on a binary feature's value, or by thresholding a real-valued feature.

How to build a Decision Tree

We want to build a tree whose splits reduce the uncertainty about the label.

Mutual information measures how much knowing a random variable $X$ tells us about the value of another random variable $Y$:

$$ I(X;Y)=H(Y)-H(Y|X) $$

At each node, split on the feature that maximizes mutual information with the label.

Entropy: $H(X) = -\sum_x P(X=x)\log_2 P(X=x)$. If $X$ takes one value with certainty, the entropy is 0; for a binary $X$ with $P(X=1)=0.5$, the entropy is 1 bit.
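The two extreme cases can be checked with a small sketch (the function name `entropy` is mine, not from the notes):

```python
import math

def entropy(p):
    """Entropy in bits of a binary variable with P(X = 1) = p."""
    if p in (0.0, 1.0):
        return 0.0  # certain outcome: no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(entropy(1.0))  # certain value -> 0.0
print(entropy(0.5))  # fair coin -> 1.0
```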

Example

$\mathcal{X}=\{0,1\}^3$ and $\mathcal{Y} = \{0,1\}$

$((x_1,x_2,x_3),y)$

$((1,1,1),1)$

$((1,0,0),1)$

$((1,1,0),0)$

$((0,0,1),0)$
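A minimal sketch to check which feature the greedy rule would pick on these four examples, estimating $I(X;Y)=H(Y)-H(Y|X)$ from the empirical frequencies (helper names are mine):

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(Y) - H(Y|X), estimated from paired samples."""
    n = len(ys)
    h_cond = 0.0
    for v in set(xs):
        subset = [y for x, y in zip(xs, ys) if x == v]
        h_cond += len(subset) / n * entropy(subset)
    return entropy(ys) - h_cond

# The four training examples ((x1, x2, x3), y) from above.
data = [((1, 1, 1), 1), ((1, 0, 0), 1), ((1, 1, 0), 0), ((0, 0, 1), 0)]
ys = [y for _, y in data]
for i in range(3):
    xs = [x[i] for x, _ in data]
    print(f"I(x{i + 1};Y) = {mutual_information(xs, ys):.3f}")
```

Here $x_1$ gives the largest gain ($\approx 0.311$ bits, versus $0$ for $x_2$ and $x_3$), so the root would split on $x_1$.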