Lecture 6: Neural Networks & Backpropagation
“The gradient of the cost function must be large and predictable enough to serve as a good guide for the learning algorithm” — Goodfellow et al., *Deep Learning*
Nonlinear Units
- Sigmoid: $\sigma(x) = \dfrac{1}{1+e^{-x}}$, $\sigma'(x) = \sigma(x)(1-\sigma(x))$
- Outputs are never negative (not zero-centered), so the gradients on a layer's incoming weights all share the same sign and SGD updates “zigzag”
- Tanh: $\tanh(x) = 2\sigma(2x) - 1$ (zero-centered, but still saturates)
- ReLU: $\mathrm{ReLU}(x) = \max(0, x)$; cheap and non-saturating for $x > 0$, but units can “die” since the gradient is zero for $x \le 0$
- Leaky ReLU: $f(x) = \max(\alpha x, x)$ with a small fixed $\alpha$; PReLU learns $\alpha$ instead (all units and their derivatives are sketched in the code below)
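As a concrete reference for the list above, here is a minimal NumPy sketch of each unit and its derivative (the function names are my own, not from the lecture). It also checks the $\tanh(x) = 2\sigma(2x) - 1$ identity and shows how $\sigma'(x)$ reuses the forward value.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x}); output in (0, 1), never negative
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x)); reuses the forward pass value
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh(x):
    # tanh(x) = 2 * sigma(2x) - 1; a zero-centered rescaling of the sigmoid
    return 2.0 * sigmoid(2.0 * x) - 1.0

def d_tanh(x):
    # tanh'(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    # ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

def d_relu(x):
    # gradient is 0 wherever x <= 0 -- the "dying ReLU" region
    return (x > 0).astype(x.dtype)

def leaky_relu(x, alpha=0.01):
    # alpha is a small fixed slope here; PReLU instead learns alpha
    return np.where(x > 0, x, alpha * x)

x = np.linspace(-3.0, 3.0, 7)
assert np.allclose(tanh(x), np.tanh(x))  # verifies tanh(x) = 2*sigma(2x) - 1
print(d_sigmoid(np.array(0.0)))          # 0.25, the sigmoid's maximum slope
```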