Announcements
- Makeup lecture on 4/14, Eng. IV, Shannon Room 1-3pm
- Reading: course reader on Bruinlearn
- Perceptron & logistic + linear regression
Last lecture
- Issue: don’t know the distribution $p(x,y)$
- Approach: use a proxy for the risk, the empirical risk:
- $R_{emp}(h) = \dfrac{1}{N} \sum^N_{i=1} L(y_i, h(x_i))$
- Intuitively, the empirical risk approaches the true risk as $N \to \infty$ (by the law of large numbers).
- Empirical Risk Minimization
- $\hat h_{emp} = \mathrm{arg\,min}_{h \in \mathcal{H}} R_{emp}(h) = \mathrm{arg\,min}_{h \in \mathcal{H}} \dfrac{1}{N} \sum^N_{i=1} L(y_i, h(x_i))$
- where $\mathcal{H}$ is the hypothesis class, i.e. the set of candidate functions (see the code sketch after this list)
- The true function $f$ with $f(x) = y$: its domain is the observable space (inputs $x$) and its range is the space of labels $y$
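To make the recap concrete, here is a minimal sketch of ERM over a tiny, finite hypothesis class of threshold classifiers under the 0-1 loss. The dataset, the threshold grid, and the helper names (`empirical_risk`, `zero_one_loss`) are illustrative choices, not anything specified in the lecture.

```python
# Minimal ERM sketch: enumerate a tiny hypothesis class of threshold
# classifiers, score each one by its empirical risk under the 0-1 loss,
# and return the minimizer. All data and names here are illustrative.
import numpy as np

def zero_one_loss(y_true, y_pred):
    """0-1 loss: 1 if the prediction is wrong, 0 otherwise."""
    return float(y_true != y_pred)

def empirical_risk(h, X, y, loss):
    """R_emp(h) = (1/N) * sum_i L(y_i, h(x_i))."""
    return np.mean([loss(yi, h(xi)) for xi, yi in zip(X, y)])

# Toy 1-D dataset: larger x tends to mean label 1.
X = np.array([0.5, 1.0, 1.5, 2.5, 3.0, 3.5])
y = np.array([0, 0, 1, 1, 1, 1])

# Hypothesis class H: threshold classifiers h_t(x) = 1[x > t].
thresholds = [0.0, 1.0, 2.0, 3.0]
H = [lambda x, t=t: int(x > t) for t in thresholds]

# ERM: pick the hypothesis with the smallest empirical risk.
risks = [empirical_risk(h, X, y, zero_one_loss) for h in H]
best = int(np.argmin(risks))
print(f"best threshold t = {thresholds[best]}, R_emp = {risks[best]:.3f}")
```

Here $\mathcal{H}$ is small enough to enumerate; for parametric hypothesis classes (e.g. linear or logistic regression) one instead minimizes $R_{emp}$ over the parameters.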
Empirical risk minimization
Supervised Learning
Aim to build a function $h(x)$ to predict the true value $y$ associated with $x$.
Example (regression): squared loss $L(y, h(x)) = (y-h(x))^2$, i.e. the squared $\ell_2$ distance.
For classification: the 0-1 (indicator) loss, or the logistic loss $-y\log h(x) - (1-y)\log[1-h(x)]$ (assumes $y \in \{0,1\}$ and $h(x) \in (0,1)$).
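As a quick illustration of the two losses above, here is a small sketch; the function names and the clipping constant `eps` are my own choices rather than lecture notation.

```python
# Sketches of the two example losses; names are illustrative.
import numpy as np

def squared_loss(y, h_x):
    """Regression loss: L(y, h(x)) = (y - h(x))^2."""
    return (y - h_x) ** 2

def logistic_loss(y, h_x, eps=1e-12):
    """Classification loss for y in {0, 1}, h(x) in (0, 1):
    L(y, h(x)) = -y log h(x) - (1 - y) log(1 - h(x))."""
    h_x = np.clip(h_x, eps, 1 - eps)  # guard against log(0)
    return -y * np.log(h_x) - (1 - y) * np.log(1 - h_x)

print(squared_loss(2.0, 1.5))   # 0.25
print(logistic_loss(1, 0.9))    # -log(0.9) ≈ 0.105
print(logistic_loss(0, 0.9))    # -log(0.1) ≈ 2.303
```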
Risk/expected loss: assume we know the true $p(x,y)$. The risk is $R[h] = \sum_{x,y} L(y, h(x)) \, p(x,y)$.
Claim: given an observation $x$ (and the 0-1 loss), the optimal Bayes classifier is $h^{opt}(x) = \mathrm{arg\,max}_y\, p(y|x)$, known as the maximum a posteriori (MAP) classifier.
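As a small sanity check of this claim (assuming the 0-1 loss and a made-up discrete joint distribution), the sketch below compares the MAP rule against a brute-force search over every deterministic classifier.

```python
# Sanity check with a made-up discrete joint p(x, y), assuming the 0-1 loss:
# compare the MAP rule against a brute-force search over all deterministic
# classifiers h: {0, 1, 2} -> {0, 1}.
from itertools import product

# Hypothetical joint distribution over x in {0, 1, 2}, y in {0, 1}.
p = {(0, 0): 0.25, (0, 1): 0.05,
     (1, 0): 0.10, (1, 1): 0.20,
     (2, 0): 0.05, (2, 1): 0.35}
xs, ys = [0, 1, 2], [0, 1]

def risk(h):
    """R[h] = sum_{x,y} L(y, h(x)) p(x,y) with the 0-1 loss."""
    return sum(p[(x, y)] * (h[x] != y) for x, y in p)

# MAP / Bayes classifier: for each x, pick the y maximizing p(y|x) ∝ p(x, y).
h_map = {x: max(ys, key=lambda y: p[(x, y)]) for x in xs}

# Exhaustive search over all 2^3 deterministic classifiers.
h_best = min((dict(zip(xs, labels)) for labels in product(ys, repeat=len(xs))),
             key=risk)

print("MAP :", h_map, " risk =", round(risk(h_map), 3))
print("best:", h_best, " risk =", round(risk(h_best), 3))
```

The brute-force minimizer matches the MAP rule: for each $x$, the contribution to the risk is minimized by choosing the $y$ with the largest $p(x,y)$.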
However, this is only possible if we know the probability distribution.
Since we don’t know the underlying distribution, we use the empirical risk (training error) instead, as the sketch below illustrates.
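The sketch below, using an invented distribution with a 10% label-flip rate, illustrates the point: for a fixed classifier, the empirical risk computed from i.i.d. samples settles near the true risk as $N$ grows.

```python
# Sketch of why the training error is a usable proxy: for a fixed classifier,
# the empirical risk computed from i.i.d. samples of an (invented) p(x, y)
# settles near the true risk as N grows.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, flip_prob=0.1):
    """x ~ N(0, 1); y = 1[x > 0], flipped with probability flip_prob."""
    x = rng.normal(size=n)
    y = (x > 0).astype(int)
    flip = rng.random(n) < flip_prob
    return x, np.where(flip, 1 - y, y)

h = lambda x: (x > 0).astype(int)   # fixed classifier; its true risk is 0.1

for n in [10, 100, 10_000, 1_000_000]:
    x, y = sample(n)
    r_emp = np.mean(h(x) != y)      # empirical risk under the 0-1 loss
    print(f"N = {n:>9}: R_emp = {r_emp:.4f}   (true risk = 0.1)")
```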