How do we put soft-margin SVM into the empirical risk minimization framework?
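Use the hinge loss $\ell^{\mathrm{hinge}}(y, h(x)) = \max(0, 1 - y\,h(x))$; minimizing it with an $\ell_2$ penalty on $w$ is a convex problem equivalent to the soft-margin SVM.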
$\mathrm{min}_{w,b} \{ \sum_n \ell^{\mathrm{hinge}}(y_n, h(x_n)) + \dfrac{\lambda}{2}||w||^2_2\}$
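A minimal sketch of this objective, assuming a linear predictor $h(x) = w^Tx + b$, labels in $\{-1,+1\}$, and NumPy. The function names `hinge_objective` and `hinge_subgradient` are illustrative and do not come from the notes.

```python
import numpy as np

def hinge_objective(w, b, X, y, lam):
    """Sum of hinge losses plus (lam / 2) * ||w||^2 for h(x) = w^T x + b."""
    margins = y * (X @ w + b)               # y_n * h(x_n) for every sample
    hinge = np.maximum(0.0, 1.0 - margins)  # max(0, 1 - y_n h(x_n))
    return hinge.sum() + 0.5 * lam * np.dot(w, w)

def hinge_subgradient(w, b, X, y, lam):
    """One subgradient of the objective with respect to (w, b)."""
    margins = y * (X @ w + b)
    active = margins < 1.0                  # samples with nonzero hinge loss
    grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0)
    grad_b = -y[active].sum()
    return grad_w, grad_b
```

Because the objective is convex, repeatedly stepping against such a subgradient with a decreasing step size is one simple (though not the fastest) way to minimize it.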
How do we kernelize SVM?
SVM primal problem: $\min_{w,b,\xi} \dfrac{1}{2}||w||^2_2 + C\sum_n \xi_n$ subject to $y_n(w^T\phi(x_n) + b) \geq 1 - \xi_n$ and $\xi_n \geq 0$ for all $n$.
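The dual problem swaps the order of the min (over the primal variables) and the max (over the Lagrange multipliers) in the Lagrangian.
Weak duality always gives $d^* \leq p^*$; the SVM primal is a convex quadratic program, so strong duality holds and $d^* = p^*$.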
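For reference, the two orderings can be written out as follows. This is a sketch assuming nonnegative multipliers $\alpha_n$ for the margin constraints and $\mu_n$ for the constraints $\xi_n \geq 0$; the symbol $\mu$ is introduced here and is not in the notes.

$$
L(w,b,\xi,\alpha,\mu) = \dfrac{1}{2}||w||^2_2 + C\sum_n \xi_n - \sum_n \alpha_n\left[y_n(w^T\phi(x_n) + b) - 1 + \xi_n\right] - \sum_n \mu_n \xi_n
$$

$$
p^* = \min_{w,b,\xi}\ \max_{\alpha \geq 0,\, \mu \geq 0} L, \qquad d^* = \max_{\alpha \geq 0,\, \mu \geq 0}\ \min_{w,b,\xi} L
$$

Setting the derivatives of $L$ with respect to $w$, $b$, and $\xi_n$ to zero gives $w = \sum_n \alpha_n y_n \phi(x_n)$, $\sum_n \alpha_n y_n = 0$, and $\alpha_n + \mu_n = C$, which together imply $0 \leq \alpha_n \leq C$.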
The dual formulation yields a kernelizable SVM, because the data appear only through inner products $\phi(x_m)^T\phi(x_n)$.
We can interpret the solution as follows:
The support vectors are the points with $\alpha_n > 0$: points on the margin, points inside the margin, and misclassified points.
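Dual: $\max_\alpha \sum_n \alpha_n - \dfrac{1}{2}\sum_{m,n} y_my_n\alpha_m\alpha_n \phi(x_m)^T\phi(x_n)$ subject to $0 \leq \alpha_n \leq C$ and $\sum_n \alpha_n y_n = 0$
$w = \sum_n y_n\alpha_n \phi(x_n)$
$b = y_n - w^T\phi(x_n)$ for any $n$ with $0 < \alpha_n < C$ (a point exactly on the margin)
Prediction: $h(x) = \mathrm{sgn}(\sum_n y_n\alpha_n k(x_n,x) + b)$, where $k(x_n,x) = \phi(x_n)^T\phi(x)$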
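A minimal sketch of kernelized prediction, assuming the dual variables $\alpha$ have already been obtained from some QP solver (not shown) and that NumPy is available. The names `rbf_kernel`, `recover_bias`, and `predict` are illustrative. Averaging $b$ over all points on the margin is a common numerical choice, not something the notes prescribe.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2), shape (len(A), len(B))."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from round-off

def recover_bias(alpha, X, y, C, kernel, tol=1e-8):
    """Average y_n - sum_m alpha_m y_m k(x_m, x_n) over points with 0 < alpha_n < C.

    Assumes at least one such point exists; otherwise the mean is undefined.
    """
    on_margin = (alpha > tol) & (alpha < C - tol)
    K = kernel(X, X[on_margin])                  # shape (N, number of margin points)
    return np.mean(y[on_margin] - (alpha * y) @ K)

def predict(X_new, alpha, b, X, y, kernel):
    """h(x) = sgn(sum_n y_n alpha_n k(x_n, x) + b) for each row of X_new."""
    return np.sign((alpha * y) @ kernel(X, X_new) + b)
```

Note that $w$ is never formed explicitly: both the bias and the prediction use only kernel evaluations against the training points, and terms with $\alpha_n = 0$ contribute nothing.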
Consider a set of predictors and combine them into a single, more accurate predictor.
The combination could be a majority vote or a weighted majority vote.
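A minimal sketch of both voting schemes, assuming binary labels in $\{-1,+1\}$ and predictions stacked in a NumPy array with one row per predictor. The function names are illustrative, and how the weights are chosen is left open here.

```python
import numpy as np

def majority_vote(predictions):
    """predictions: array of shape (T, N), entries in {-1, +1}.

    Returns the sign of the vote total per sample; a tie yields 0.
    """
    return np.sign(predictions.sum(axis=0))

def weighted_majority_vote(predictions, weights):
    """Same as majority_vote, but predictor t contributes weights[t] per vote."""
    return np.sign(weights @ predictions)
```

With three predictors and equal weights the two functions agree; giving one predictor a weight larger than the others combined lets it override them.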