Lecture Slides
Last time
Kernel Methods
- Mapped inputs into a higher-dimensional feature space
- If the dimension of the mapping is higher, explicitly computing the features is expensive and the model is more prone to overfitting
- How do we get the advantages of the richer feature space while keeping the computation the same?
- Can we compute the inner product in feature space cheaply?
- If the algorithm can be expressed purely in terms of inner products, we can substitute a kernel function and gain efficiency (see the sketch below)
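A minimal sketch of this idea (assuming a degree-2 homogeneous polynomial kernel on 2-D inputs): the kernel returns the same value as the inner product of the explicitly expanded features, without ever materializing them.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))  # 16.0 -- inner product in feature space
print(poly_kernel(x, z))       # 16.0 -- same value, no feature expansion
```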
How do we kernelize algorithms?
- Kernelized ridge regression: let $\Phi$ be the matrix whose rows are the transformed features $\phi(x_i)$
- We find that the optimal weights are a linear combination of the training features: $w = \Phi^\top \alpha$
- So we can solve for the coefficients directly: $\alpha = (\Phi \Phi^\top + \lambda I)^{-1} y = (K + \lambda I)^{-1} y$, where $K_{ij} = k(x_i, x_j)$
- The weights still live in the higher-dimensional space
- But prediction can also be done purely in terms of inner products: $\hat{y}(x) = w^\top \phi(x) = \sum_i \alpha_i \, k(x_i, x)$ (see the sketch below)
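A minimal NumPy sketch of kernelized ridge regression (the RBF kernel, the regularizer `lam`, and all names here are illustrative assumptions, not the lecture's exact setup):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Kernel matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def fit(X, y, lam=1e-2):
    """Solve (K + lam*I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new):
    """y_hat(x) = sum_i alpha_i * k(x_i, x): only kernel evaluations needed."""
    return rbf_kernel(X_new, X_train) @ alpha

# Toy 1-D regression: fit noisy sin(x), then predict at new points
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
alpha = fit(X, y)
print(predict(X, alpha, np.array([[0.0], [1.5]])))  # close to sin(0), sin(1.5)
```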
Other kernel functions
- The RBF kernel corresponds to an infinite-dimensional feature space
- The polynomial kernel of degree $d$ on $D$-dimensional inputs corresponds to a feature space of dimension $\binom{D+d}{d}$ (all monomials of degree up to $d$)
- A valid kernel function must be positive semi-definite: every Gram matrix $K$ it produces satisfies $z^\top K z \ge 0$ for all $z$
- This characterization is known as Mercer's theorem: such a function is an inner product in some feature space (quick check below)
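A quick numerical sanity check of the PSD condition (a sketch, not a proof; the kernels below are chosen only for illustration): build the Gram matrix on sample points and verify its eigenvalues are non-negative.

```python
import numpy as np

def gram(kernel, X):
    """Gram matrix K[i, j] = k(x_i, x_j)."""
    return np.array([[kernel(a, b) for b in X] for a in X])

def looks_psd(K, tol=1e-9):
    """Symmetric PSD check: all eigenvalues >= 0 up to tolerance."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

X = np.random.default_rng(1).standard_normal((20, 3))
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
bad = lambda a, b: -np.dot(a, b)        # negated inner product: not a kernel
print(looks_psd(gram(rbf, X)))          # True
print(looks_psd(gram(bad, X)))          # False
```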
Rules of composing kernel functions
- Linearity: $a\,k_1 + b\,k_2$ is a kernel for scalars $a, b > 0$
- Multiplication: $k_1 \cdot k_2$ is a kernel
- Exponentiation: $e^{k}$ is a kernel whenever $k$ is a kernel (demonstrated below)
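A small demonstration of these rules (the base kernels are assumed for illustration): each composition yields a Gram matrix that is still positive semi-definite.

```python
import numpy as np

k1 = lambda a, b: np.dot(a, b)                     # linear kernel
k2 = lambda a, b: np.exp(-np.sum((a - b) ** 2))    # RBF kernel

k_lin  = lambda a, b: 2.0 * k1(a, b) + 0.5 * k2(a, b)  # positive combination
k_prod = lambda a, b: k1(a, b) * k2(a, b)               # product
k_exp  = lambda a, b: np.exp(k1(a, b))                  # e^k

X = np.random.default_rng(2).standard_normal((15, 4))
for k in (k_lin, k_prod, k_exp):
    K = np.array([[k(a, b) for b in X] for a in X])
    print(np.linalg.eigvalsh(K).min() >= -1e-8)    # True for each rule
```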
Support Vector Machines