Last time

Nearest Neighbor Classification

Idea: Close points have the same or similar labels

|     .   .  X   X
|     .    X      X
|       .      X 
|   .     *    X
|     .     .
|
|      .
---------------------------

What label should * take?

Questions: What distance function to use? What features?
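The choice of distance function can change which point counts as "nearest". A minimal sketch, assuming Euclidean (L2) and Manhattan (L1) distance as two candidates; the points and function names are made up for illustration:

```python
import math

def euclidean(a, b):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

# Hypothetical points, chosen so that the two distances disagree.
query = (0.0, 0.0)
p = (3.0, 3.0)  # L2 is about 4.24, L1 is 6
q = (0.0, 5.0)  # L2 is 5, L1 is 5

nearest_l2 = min([p, q], key=lambda pt: euclidean(query, pt))
nearest_l1 = min([p, q], key=lambda pt: manhattan(query, pt))

print("nearest under L2:", nearest_l2)
print("nearest under L1:", nearest_l1)
```

Here p is the nearest neighbor of the query under L2 but q is nearest under L1, so the predicted label can depend on the distance chosen.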

Select several candidate features, compare them against each other, and see which ones cluster the classes most clearly.

Inductive bias: label of point (instance) is similar to the label of nearby points.

Instance feature vectors: $x\in \mathbb{R}^D$

Label: $y\in [C] = \{1,2,\dots,C\}$

Function: $y=h(x)$

Training data:

N samples used for learning
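With this notation, one way to write the nearest neighbor rule is the following sketch. It assumes the training samples are indexed as pairs $(x_i, y_i)$ for $i = 1, \dots, N$ and that some distance function $d$ has been chosen; neither name appears in the notes above.

$$
h(x) = y_{i^*}, \qquad i^* = \arg\min_{i \in \{1,\dots,N\}} d(x, x_i)
$$

If several training points are equally close, the tie has to be broken by some convention, for example by taking the smallest index.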

Validation data:

M samples used for assessing how well the function $h$ will do on unseen $x$

Training data and validation/test data should not overlap.
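One simple way to guarantee no overlap is to shuffle the sample indices once and cut the list in two. This is a sketch under that assumption; `split_indices` and its parameters are hypothetical names, not something from the lecture:

```python
import random

def split_indices(num_samples, num_validation, seed=0):
    """Shuffle indices 0..num_samples-1 and split them into two disjoint lists."""
    if not 0 <= num_validation <= num_samples:
        raise ValueError("num_validation must be between 0 and num_samples")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)   # fixed seed for reproducibility
    validation = indices[:num_validation]  # M held-out samples
    training = indices[num_validation:]    # remaining N samples
    return training, validation

# Example: 10 samples in total, M = 3 held out, N = 7 used for learning.
train_idx, val_idx = split_indices(num_samples=10, num_validation=3)
print("disjoint:", set(train_idx).isdisjoint(val_idx))
```

Because every index lands in exactly one of the two lists, no sample is used both for learning and for assessment.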

Nearest neighbor stores the entire training set; there is no explicit model.
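A minimal sketch of what storing the dataset means in practice, assuming Euclidean distance and a single nearest neighbor. The class name, method names, and toy data are illustrative only:

```python
import math

class NearestNeighbor:
    """1-nearest-neighbor classifier that simply memorizes its training data."""

    def fit(self, features, labels):
        # "Training" only stores the samples; no parameters are estimated.
        self.features = [tuple(x) for x in features]
        self.labels = list(labels)
        return self

    def predict(self, x):
        # All the work happens at query time: scan every stored point
        # and return the label of the closest one (Euclidean distance).
        best = min(
            range(len(self.features)),
            key=lambda i: math.dist(x, self.features[i]),
        )
        return self.labels[best]

# Hypothetical toy data: two well-separated groups with labels 1 and 2.
train_x = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [1, 1, 2, 2]

clf = NearestNeighbor().fit(train_x, train_y)
print(clf.predict((0.5, 0.2)))
print(clf.predict((5.5, 4.0)))
```

The trade-off is that `fit` is trivial, while each prediction costs one distance computation per stored sample.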