Last time

How do we put soft-margin SVM into the empirical risk minimization framework?

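Use the hinge loss $\ell^{\mathrm{hinge}}(y, h(x)) = \max(0, 1 - y\,h(x))$. The soft-margin SVM is then equivalent to the following convex regularized problem: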

$\min_{w,b} \left\{ \sum_n \ell^{\mathrm{hinge}}(y_n, h(x_n)) + \dfrac{\lambda}{2}||w||^2_2 \right\}$
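
As a rough illustration of the objective above, the sketch below evaluates the summed hinge loss plus the $\frac{\lambda}{2}||w||^2_2$ penalty for a fixed $(w, b)$. The function names, the linear predictor $h(x) = w^T x + b$, and the toy data are assumptions made for this example, not part of the lecture.

```python
def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def linear_score(w, b, x):
    """Assumed linear predictor h(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_objective(w, b, X, y, lam):
    """Sum of hinge losses over the data plus (lam / 2) * ||w||^2."""
    data_term = sum(hinge_loss(yn, linear_score(w, b, xn)) for xn, yn in zip(X, y))
    reg_term = 0.5 * lam * sum(wi * wi for wi in w)
    return data_term + reg_term


# Toy data (hypothetical): three points in 2D with labels in {-1, +1}.
X = [[2.0, 0.0], [0.0, 0.5], [-1.0, -1.0]]
y = [1.0, 1.0, -1.0]
w = [1.0, 1.0]
b = 0.0

objective_value = svm_objective(w, b, X, y, lam=1.0)
print(objective_value)
```

Only the second point sits inside the margin ($y\,h(x) = 0.5 < 1$), so it is the only one that contributes a nonzero hinge loss; the other two contribute zero.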

How do we kernelize SVM?

SVM primal problem: $\min_{w,b,\xi} \dfrac{1}{2}||w||^2_2 + C\sum_n \xi_n$ subject to $y_n(w^T\phi(x_n) + b) \geq 1 - \xi_n$ and $\xi_n \geq 0$ for all $n$.

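The dual problem is obtained from the Lagrangian by swapping the order of the min and the max. By weak duality, the dual optimum $d^*$ never exceeds the primal optimum $p^*$: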

$d^* \leq p^*$
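
A short justification of this inequality, written with a generic Lagrangian $L(w, \alpha)$ (this notation is an assumption here: $w$ stands for all primal variables and $\alpha \geq 0$ for all multipliers). For any fixed $w'$ and $\alpha' \geq 0$,

$\min_{w} L(w, \alpha') \leq L(w', \alpha') \leq \max_{\alpha \geq 0} L(w', \alpha)$

The left side does not depend on $w'$ and the right side does not depend on $\alpha'$, so maximizing the left over $\alpha'$ and minimizing the right over $w'$ preserves the inequality:

$d^* = \max_{\alpha \geq 0} \min_{w} L(w, \alpha) \leq \min_{w} \max_{\alpha \geq 0} L(w, \alpha) = p^*$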

The dual formulation yields a kernelizable SVM, since the training inputs enter only through inner products $\phi(x_m)^T\phi(x_n)$.

We can interpret the solution as follows:

The support vectors are the points with $\alpha_n > 0$: those on the margin, those inside the margin, and those that are misclassified.

Summary: Primal and Dual for SVM

Dual: $\max_\alpha \sum_n \alpha_n - \dfrac{1}{2}\sum_{m,n} y_my_n\alpha_m\alpha_n \phi(x_m)^T\phi(x_n)$ subject to $0 \leq \alpha_n \leq C$ and $\sum_n \alpha_n y_n = 0$.

$w = \sum_n y_n\alpha_n \phi(x_n)$

$b = y_n - w^T\phi(x_n)$ for any $n$ with $0 < \alpha_n < C$ (a point exactly on the margin)

Prediction: $h(x) = \mathrm{sgn}(\sum_n y_n\alpha_n k(x_n,x) + b)$
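
To make the kernelized prediction rule concrete, here is a minimal sketch that recovers $b$ from one margin point and evaluates $h(x)$ using only kernel evaluations. The function names, the linear kernel, and the two-point toy problem (with $\alpha$ values worked out by hand) are assumptions for illustration. Mapping a score of exactly zero to $+1$ is also an arbitrary convention chosen here.

```python
def linear_kernel(u, v):
    """k(u, v) = u^T v, i.e. phi is the identity map (an assumed choice)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def decision_score(x, train_x, train_y, alphas, b, kernel):
    """sum_n y_n * alpha_n * k(x_n, x) + b; w is never formed explicitly."""
    return sum(a * yn * kernel(xn, x)
               for xn, yn, a in zip(train_x, train_y, alphas)) + b


def bias_from_margin_point(n, train_x, train_y, alphas, kernel):
    """b = y_n - w^T phi(x_n), with w^T phi(x_n) expanded through the kernel."""
    return train_y[n] - decision_score(train_x[n], train_x, train_y,
                                       alphas, 0.0, kernel)


def predict(x, train_x, train_y, alphas, b, kernel):
    """Sign of the decision score; a score of 0 is mapped to +1 by convention."""
    return 1 if decision_score(x, train_x, train_y, alphas, b, kernel) >= 0 else -1


# Toy problem (hypothetical): one positive and one negative point.
train_x = [[1.0, 0.0], [-1.0, 0.0]]
train_y = [1.0, -1.0]
# Hand-derived multipliers: they satisfy sum_n alpha_n y_n = 0 and give w = (1, 0).
alphas = [0.5, 0.5]

b = bias_from_margin_point(0, train_x, train_y, alphas, linear_kernel)
label_a = predict([2.0, 3.0], train_x, train_y, alphas, b, linear_kernel)
label_b = predict([-0.5, 1.0], train_x, train_y, alphas, b, linear_kernel)
print(b, label_a, label_b)
```

Swapping `linear_kernel` for another kernel function changes the feature map without touching `predict`, which is the practical payoff of the dual form.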

Ensemble Learning

Consider a set of predictors and combine them into a single, more accurate predictor.

The combination can be a majority vote or a weighted majority vote.
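
A minimal sketch of both voting schemes for binary labels in $\{-1, +1\}$ follows. The function name, the example votes, and the weights are hypothetical, and ties are broken toward $+1$ as an arbitrary convention.

```python
def weighted_majority_vote(predictions, weights=None):
    """Combine labels in {-1, +1} from several predictors.

    With weights=None every predictor counts equally (plain majority vote).
    Ties (weighted sum of exactly 0) are mapped to +1 by convention.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(w * p for w, p in zip(weights, predictions))
    return 1 if total >= 0 else -1


# Three hypothetical predictors vote on a single input.
votes = [1, -1, -1]

plain = weighted_majority_vote(votes)                      # two of three say -1
weighted = weighted_majority_vote(votes, [3.0, 1.0, 1.0])  # first predictor dominates
print(plain, weighted)
```

With equal weights the two $-1$ votes win. Once the first predictor carries weight 3, its single $+1$ vote outweighs the other two.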