$J(\theta) = -\sum_n \big\{ y_n\log h_\theta (x_n) + (1-y_n)\log[1-h_\theta(x_n)] \big\}$ — empirical risk minimization (binary cross-entropy)
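A minimal numpy sketch of this loss; the function and variable names are illustrative, not from the notes:

```python
import numpy as np

def cross_entropy(h, y, eps=1e-12):
    """Binary cross-entropy: -sum_n [y_n log h_n + (1 - y_n) log(1 - h_n)].

    h : predicted probabilities h_theta(x_n) in (0, 1); y : labels in {0, 1}.
    """
    h = np.clip(h, eps, 1 - eps)  # guard against log(0)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

print(cross_entropy(np.array([0.9, 0.2, 0.7]), np.array([1, 0, 1])))
```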
Numerical optimization: Stationary point → $\nabla J(\theta) = 0$
Gradient descent: $\theta_{t+1} = \theta_t - \eta\nabla J(\theta_t)$ ($\eta$: step size, $t$: iteration index)
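A toy sketch of the update rule on a hypothetical objective $f(\theta) = (\theta - 3)^2$ (not from the notes), whose gradient is $2(\theta - 3)$:

```python
def gradient_descent(grad, theta0, eta=0.1, n_iters=100):
    """Iterate theta_{t+1} = theta_t - eta * grad(theta_t)."""
    theta = theta0
    for _ in range(n_iters):
        theta = theta - eta * grad(theta)
    return theta

theta_hat = gradient_descent(lambda th: 2 * (th - 3), theta0=0.0)
print(theta_hat)  # approaches the minimizer theta* = 3
```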
Convex functions: $f(\lambda u + (1-\lambda)v) \leq \lambda f(u) + (1-\lambda) f(v), \, \lambda \in [0,1]$ (the function lies on or below every chord)
Sufficient condition for convexity: the Hessian $\nabla^2 f \succeq 0$ (positive semi-definite), i.e. $z^T \nabla^2 f \, z \geq 0 \ \forall z$
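A small numpy check of positive semi-definiteness via eigenvalues; the matrix here is a hypothetical outer product $xx^T$, which is always PSD:

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5])
A = np.outer(x, x)               # x x^T is symmetric and PSD
eigvals = np.linalg.eigvalsh(A)  # a symmetric matrix is PSD iff all eigenvalues >= 0
print(eigvals, np.all(eigvals >= -1e-12))
```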
Question: is $J(\theta)$ convex? Yes, its Hessian is positive semi-definite.
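A sketch of why, assuming the logistic model $h_\theta(x) = \sigma(\theta^Tx)$ with $\sigma(z) = 1/(1+e^{-z})$ (consistent with the gradient in the next line):

$\nabla J(\theta) = \sum_n [h_\theta(x_n) - y_n]\,x_n$

$\nabla^2 J(\theta) = \sum_n h_\theta(x_n)[1-h_\theta(x_n)]\,x_n x_n^T$

Each term is a nonnegative scalar times the PSD outer product $x_n x_n^T$, so $z^T\nabla^2 J(\theta)\,z = \sum_n h_\theta(x_n)[1-h_\theta(x_n)](z^Tx_n)^2 \geq 0$ for all $z$.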
Gradient descent for this loss: $\theta_{t+1} = \theta_t - \eta\sum_n [h_{\theta_t}(x_n)-y_n]\,x_n$ (the gradient $\nabla J$ written out)
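A minimal numpy sketch of this loop; the synthetic data, step size, and iteration count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, eta=0.01, n_iters=1000):
    """Batch gradient descent on the cross-entropy loss.

    X : (N, D) inputs; y : (N,) labels in {0, 1}.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)  # predictions h_theta(x_n)
        grad = X.T @ (h - y)    # sum_n [h_theta(x_n) - y_n] x_n
        theta -= eta * grad     # theta_{t+1} = theta_t - eta * grad
    return theta

# tiny synthetic check: two well-separated clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
theta = logistic_gd(X, y)
print("train accuracy:", np.mean((sigmoid(X @ theta) > 0.5) == y))
```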
Linear regression: the labels $y_n$ are now real numbers; the cost measures how close the predictions are to $y_n$ in Euclidean distance.
Predictor: $h_\theta (x) = \theta^Tx$
$J(\theta) = \dfrac{1}{N}\sum_n (y_n - \theta^Tx_n)^2$ — quadratic cost criterion
Optimization: $\hat \theta = \mathrm{arg\,min}_\theta J(\theta)$, $J(\theta) = \|y-X\theta\|_2^2$ (dropping the constant $1/N$ does not change the minimizer; see the least-squares sketch at the end of the section)
Input: $x \in \mathbb{R}^D$
Output: $y\in\mathbb{R}$
Training data: $\mathcal{D} = \{(x_n, y_n),n = 1, 2,…,N\}$
Error: $(\hat y - y)^2$, $\hat y = \theta^Tx$
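A minimal numpy sketch of the least-squares problem $\hat\theta = \mathrm{arg\,min}_\theta \|y-X\theta\|_2^2$ above, where the rows of $X$ are the $x_n$; the normal equations and `np.linalg.lstsq` are standard, but the synthetic data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))                    # design matrix, rows are the x_n
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=N)  # noisy targets

# Closed form via the normal equations: (X^T X) theta = X^T y
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, more numerically robust least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta_hat)    # both are close to theta_true
print(theta_lstsq)
```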