Probabilistic Generative Models (cont.)

To calculate $p(x|c_k)$, assume it follows a particular parametric distribution, e.g. a Gaussian for each class:

$p(x|c_1) = \mathcal{N}(\mu_1, \Sigma_1)$

$p(x|c_2) = \mathcal{N}(\mu_2, \Sigma_2)$
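As a concrete illustration, here is a minimal sketch (Python/NumPy, with made-up data; the names `X` and `t` are illustrative assumptions, not from the notes) of fitting one Gaussian per class by maximum likelihood:

```python
import numpy as np

# Hypothetical data: X is an (N, D) array of feature vectors and t an (N,)
# array of class labels in {1, 2}; both are made up for illustration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(3.0, 1.0, size=(50, 2))])
t = np.repeat([1, 2], 50)

params = {}
for k in (1, 2):
    Xk = X[t == k]
    mu_k = Xk.mean(axis=0)                         # ML estimate of mu_k
    Sigma_k = np.cov(Xk, rowvar=False, bias=True)  # ML (1/N) covariance estimate
    params[k] = (mu_k, Sigma_k)
```

Note that `bias=True` gives the maximum-likelihood $1/N$ covariance estimate rather than the unbiased $1/(N-1)$ version.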

Maximum Likelihood

Example: is a coin weighted? Determine the weighting of the coin from observed flips.

Flipping it 10 times gives: H H T H T H H H H T (7 heads out of 10, so intuitively the weighting is 70%?)

Choose the value of $\pi$ (the probability of H) that maximizes the likelihood, i.e. the probability of having observed this data.

$\mathcal{L}$ = $\mathrm{Pr}\{\mathrm{data}\} = \pi^7\cdot (1-\pi)^3$

$\log\mathcal{L}=7\log\pi+3\log(1-\pi)$ (note that maximizing $\log\mathcal{L}$ is the same as maximizing $\mathcal{L}$ since $\log$ is monotonic)

$\dfrac{\partial\log\mathcal{L}}{\partial\pi} = 7\cdot\dfrac{1}{\pi} + 3\cdot\dfrac{1}{1-\pi}(-1)$

Setting the derivative to zero:

$\dfrac{7}{\pi} = \dfrac{3}{1-\pi}$

$7(1-\pi) = 3\pi \quad\Rightarrow\quad 7 = 10\pi$

$\pi = 0.7$
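As a quick sanity check, here is a minimal sketch (Python/NumPy, assumed here rather than taken from the notes) that maximizes the same log-likelihood numerically over a grid of $\pi$ values:

```python
import numpy as np

# Grid search over pi for the log-likelihood 7*log(pi) + 3*log(1 - pi).
pi_grid = np.linspace(0.001, 0.999, 999)
log_L = 7 * np.log(pi_grid) + 3 * np.log(1 - pi_grid)

pi_hat = pi_grid[np.argmax(log_L)]
print(pi_hat)  # ~0.7, matching the analytic solution
```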

Now assume two classes: plan right and plan left.

Training data: $N$ trials, the $i$th trial being the pair $(x^{(i)},\, t^{(i)})$,

where $x^{(i)}$ is the vector of firing rates of $D$ neurons, $t=0$ means plan right (class $\mathcal{C}_0$), and $t=1$ means plan left (class $\mathcal{C}_1$).

$t^{(i)}\sim \mathrm{Bernoulli}(\pi)$, $\mathrm{Pr}(\mathcal{C}_1) = \pi$, $\mathrm{Pr}(\mathcal{C}_0) = 1 - \pi$
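The ML estimate of $\pi$ follows the same logic as the coin example: it is simply the fraction of trials labeled $\mathcal{C}_1$. A minimal sketch, with a made-up label sequence:

```python
import numpy as np

# Hypothetical labels for N = 10 trials: 1 = plan left (C_1), 0 = plan right (C_0).
t = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 1])

pi_hat = t.mean()  # ML estimate of Pr(C_1): fraction of class-1 trials
print(pi_hat)      # 0.6 for this made-up sequence
```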