
Last lecture

Empirical risk minimization

Supervised Learning

Aim to build a function $h(x)$ to predict the true value $y$ associated with $x$.

Example: the squared-error loss $L(y, h(x)) = (y-h(x))^2$, i.e. the squared $\ell_2$ norm of the prediction error.

For classification: the indicator (0-1) loss $\mathbf{1}[y \neq h(x)]$, or the logistic loss $-y\log h(x) - (1-y)\log[1-h(x)]$ (which assumes $y \in \{0,1\}$ and that $h(x)$ lies strictly between 0 and 1).
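As a rough illustration, the three losses above could be written as plain Python functions. The function names and the `eps` clipping constant are assumptions made for this sketch, not part of the lecture.

```python
import math


def squared_loss(y, pred):
    """Squared-error loss (y - h(x))^2."""
    return (y - pred) ** 2


def zero_one_loss(y, pred):
    """Indicator (0-1) loss: 1 if the prediction is wrong, else 0."""
    return 0.0 if y == pred else 1.0


def logistic_loss(y, pred, eps=1e-12):
    """Logistic loss for a label y in {0, 1} and a prediction in (0, 1).

    The prediction is clipped to [eps, 1 - eps] (an assumption of this
    sketch) so that log(0) is never evaluated.
    """
    p = min(max(pred, eps), 1.0 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1.0 - p)
```

Note that the logistic loss penalizes a confident wrong prediction much more heavily than a confident correct one, e.g. `logistic_loss(1, 0.1)` is larger than `logistic_loss(1, 0.9)`.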

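Risk/expected loss: assume we know the true joint distribution $p(x,y)$. The risk is $R[h(x)] = \sum_{x,y}L(y, h(x)) \cdot p(x,y)$, i.e. the loss averaged over all $(x,y)$ pairs, weighted by their probability.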
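To make the sum concrete, here is a minimal sketch that evaluates the risk of two classifiers under a small, made-up joint distribution. The table `joint` and the classifiers `identity` and `always_zero` are hypothetical and chosen only for illustration.

```python
def risk(h, loss, joint):
    """Expected loss of h, where joint[(x, y)] = p(x, y)."""
    return sum(p * loss(y, h(x)) for (x, y), p in joint.items())


def zero_one_loss(y, pred):
    return 0.0 if y == pred else 1.0


# Hypothetical joint distribution over x in {0, 1} and y in {0, 1};
# the probabilities sum to 1.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}


def identity(x):
    """Predict y = x."""
    return x


def always_zero(x):
    """Predict y = 0 regardless of x."""
    return 0


risk_identity = risk(identity, zero_one_loss, joint)        # 0.1 + 0.2
risk_always_zero = risk(always_zero, zero_one_loss, joint)  # 0.1 + 0.3
```

Under the 0-1 loss the risk is simply the probability of a misclassification, so `identity` (risk about 0.3) is better than `always_zero` (risk about 0.4) on this toy distribution.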

Claim: given an observation $x$, the Bayes optimal classifier (the minimizer of the risk under the 0-1 loss) is $h^{opt}(x) = \mathrm{arg\:max}_y\, p(y|x)$, known as the maximum a posteriori (MAP) classifier.
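A short justification of the claim, assuming the loss is the 0-1 (indicator) loss and writing $p(x,y) = p(x)\,p(y|x)$:

$$
\begin{aligned}
R[h(x)] &= \sum_{x} p(x) \sum_{y} \mathbf{1}[y \neq h(x)]\, p(y|x) \\
        &= \sum_{x} p(x)\,\bigl[1 - p(h(x)\,|\,x)\bigr].
\end{aligned}
$$

Each term of the outer sum depends on $h$ only through its value at that particular $x$, so the risk is minimized by choosing, for every $x$ separately, the label that maximizes $p(h(x)\,|\,x)$. That is exactly $h^{opt}(x) = \mathrm{arg\:max}_y\, p(y|x)$.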

However, computing this classifier requires knowing the true probability distribution $p(x,y)$.

Since the underlying probability distribution is unknown in practice, we instead use the empirical risk (the training error): the average loss over the observed training examples.
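In the usual formulation, the empirical risk over $n$ training pairs $(x_i, y_i)$ is $\hat{R}_n[h] = \frac{1}{n}\sum_{i=1}^{n} L(y_i, h(x_i))$; the symbols $\hat{R}_n$, $n$, and $(x_i, y_i)$ are notation assumed here rather than taken from the lecture. A minimal sketch, with a hypothetical three-point training set and the squared-error loss:

```python
def empirical_risk(h, loss, samples):
    """Average loss of h over a list of (x, y) training pairs."""
    if not samples:
        raise ValueError("need at least one training example")
    return sum(loss(y, h(x)) for x, y in samples) / len(samples)


def squared_loss(y, pred):
    return (y - pred) ** 2


# Hypothetical training set of (x, y) pairs, for illustration only.
samples = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.3)]


def h(x):
    """Candidate predictor: simply predict y = x."""
    return x


train_error = empirical_risk(h, squared_loss, samples)
```

Unlike the true risk, this quantity needs only the training data, not $p(x,y)$, which is why it can be computed and minimized in practice.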