Overview

Kernel methods

$\phi(x) = [1, \sqrt{2}x_1, \dots, \sqrt{2}x_D, x_1^2, x_1x_2, \dots, x_1x_D, \dots, x_D^2]$, which has $M = 1 + D + D^2 = O(D^2)$ entries because it contains all pairs $x_ix_j$.

We want to compute $\phi(u)^T\phi(x)$.

Done naively, this would need $O(M) = O(D^2)$ operations, right?

How can we do it in $O(D)$, using only $u^Tx$?

$\phi(x) = [1, \sqrt{2}x_1, \dots, \sqrt{2}x_D, x_1^2, x_1x_2, \dots, x_1x_D, \dots, x_D^2]$

$\phi(u) = [1, \sqrt{2}u_1, \dots, \sqrt{2}u_D, u_1^2, u_1u_2, \dots, u_1u_D, \dots, u_D^2]$

$\phi(u)^T\phi(x) = 1 + 2u_1x_1 + 2u_2x_2 + \dots + 2u_Dx_D + u_1^2x_1^2 + u_1u_2x_1x_2 + \dots + u_D^2x_D^2$

$=1 + 2\sum u_ix_i + \sum_i \sum_j x_ix_ju_iu_j$

Using $\sum_i \sum_j x_ix_ju_iu_j = \left(\sum_i x_iu_i\right)\left(\sum_j x_ju_j\right) = \left(\sum_l x_lu_l\right)^2$:

$=1 + 2\sum u_i x_i + (\sum u_i x_i)^2$

$= (1+ \sum u_i x_i) ^2$

$= (1+u^Tx)^2$

We constructed a particular feature map whose inner product $\phi(u)^T\phi(x)$ can be computed in $O(D)$ instead of $O(M) = O(D^2)$.
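As a quick numerical check of the identity, here is a minimal NumPy sketch. The helper names `phi` and `poly_kernel` are made up for illustration, and the feature map is assumed to list all ordered pairs $x_ix_j$, so $M = 1 + D + D^2$.

```python
import numpy as np

def phi(x):
    """Explicit feature map: [1, sqrt(2)*x_i, x_i*x_j for all ordered pairs (i, j)]."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([1.0], np.sqrt(2.0) * x, np.outer(x, x).ravel()))

def poly_kernel(u, x):
    """Same inner product computed from u^T x alone."""
    return (1.0 + np.dot(u, x)) ** 2

rng = np.random.default_rng(0)
D = 5
u = rng.standard_normal(D)
x = rng.standard_normal(D)

explicit = phi(u) @ phi(x)    # builds M = 1 + D + D^2 features: O(D^2) work
implicit = poly_kernel(u, x)  # one D-dimensional dot product: O(D) work

print(phi(x).shape[0], np.isclose(explicit, implicit))
```

The two values agree up to floating-point error, while the second never forms the $M$-dimensional vectors.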

Kernelized Ridge Regression

$\tilde J(\theta) = ||y-\Phi\theta||^2 + \lambda ||\theta||^2$
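Setting the gradient of this objective to zero gives the standard closed form $\theta = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T y$, which can equivalently be written as $\theta = \Phi^T\alpha$ with $\alpha = (K + \lambda I)^{-1}y$ and $K = \Phi\Phi^T$. The sketch below checks this equivalence on random data. The sizes `N`, `M` and the value of `lam` are arbitrary choices for illustration, and the random `Phi` stands in for a real feature matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, lam = 20, 6, 0.1               # illustrative sizes and regularization strength
Phi = rng.standard_normal((N, M))    # stand-in feature matrix, one row per example
y = rng.standard_normal(N)

# Primal form: solve an M x M system in feature space.
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)

# Kernel form: solve an N x N system that touches the features only through K.
K = Phi @ Phi.T
alpha = np.linalg.solve(K + lam * np.eye(N), y)
theta_dual = Phi.T @ alpha

print(np.allclose(theta, theta_dual))
```

In the kernel form, $K_{ij} = \phi(x_i)^T\phi(x_j)$, so each entry could be computed with a kernel such as $(1 + x_i^Tx_j)^2$ without building $\Phi$ explicitly.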