Idea: close points have the “same/similar” labels
|     .   .  X   X
|     .    X      X
|       .      X 
|   .     *    X
|     .     .
|
|      .
---------------------------
What label should the query point * take? (The . and X marks are training points from two different classes.)
Questions: which distance function should we use, and which features should it compare?
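The choice of distance function can change which point counts as "nearest". As a minimal illustration (toy points chosen for this sketch, standard library only), the same query has a different nearest candidate under Euclidean (L2) and Manhattan (L1) distance:

```python
import math

def euclidean(a, b):
    """L2 distance: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    """L1 distance: sum of the absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

# Toy query and two candidate neighbours (values chosen for illustration).
query = (0.0, 0.0)
candidates = {"a": (3.0, 3.0), "b": (0.0, 4.5)}

for dist in (euclidean, manhattan):
    nearest = min(candidates, key=lambda name: dist(query, candidates[name]))
    print(dist.__name__, nearest)
# euclidean a   (sqrt(18) ≈ 4.24 < 4.5)
# manhattan b   (4.5 < 6.0)
```

L2 penalises the single large coordinate difference of "b" more heavily, while L1 penalises the two moderate differences of "a" more heavily.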
Feature selection: try several candidate features, plot them against each other, and keep those under which points with the same label cluster most tightly.
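One way to make "greater clustering" measurable, offered as an assumption rather than the method these notes prescribe, is leave-one-out 1-NN accuracy on each candidate feature subset: predict every point from its nearest *other* point and count how often the labels agree. The helper name `loo_1nn_accuracy` and the toy data are hypothetical.

```python
import math

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out 1-NN accuracy using only the feature indices in `features`."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, math.inf
        for j in range(len(X)):
            if j == i:
                continue  # point i may not be its own neighbour
            d = math.dist([X[i][f] for f in features], [X[j][f] for f in features])
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

# Hypothetical data: feature 0 separates the classes, feature 1 interleaves them.
X = [(0.0, 5.0), (0.2, 1.0), (0.1, 3.0), (2.0, 4.9), (2.1, 1.1), (1.9, 3.1)]
y = [1, 1, 1, 2, 2, 2]

print(loo_1nn_accuracy(X, y, [0]))  # 1.0
print(loo_1nn_accuracy(X, y, [1]))  # 0.0
```

Under feature 0 every point's nearest neighbour shares its label; under feature 1 every nearest neighbour belongs to the other class, so feature 0 is the better choice here.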
Inductive bias: the label of a point (instance) is similar to the labels of nearby points.
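Turning this bias into a decision rule gives k-nearest-neighbours classification: find the k training points closest to the query and take a majority vote over their labels. A minimal sketch, assuming Euclidean distance and the hypothetical name `knn_predict`; the toy points loosely mimic the sketch above, with class 1 on the left and class 2 on the right.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict a label for x by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    # Count labels of the k closest points. For equal counts, Counter.most_common
    # keeps first-encountered order, so ties go to the label seen nearest first.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: class 1 on the left, class 2 on the right.
train_X = [(1, 1), (1, 3), (2, 2), (5, 4), (6, 3), (5, 2)]
train_y = [1, 1, 1, 2, 2, 2]

print(knn_predict(train_X, train_y, (4, 3)))  # 2
print(knn_predict(train_X, train_y, (2, 3)))  # 1
```

Larger k smooths the decision over more neighbours; an odd k avoids ties in two-class problems.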
Instance feature vectors: $x\in \mathbb{R}^D$
Label: $y \in [C] = \{1, 2, \dots, C\}$
Classifier (hypothesis): $h : \mathbb{R}^D \to [C]$, predicting $y = h(x)$
Training data: $N$ labeled samples used for learning
Validation data: $M$ held-out samples used to estimate how well $h$ will perform on unseen $x$
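Training and held-out (validation/test) data must not overlap; otherwise the performance estimate is overly optimistic (a 1-NN classifier, for instance, finds each training point as its own nearest neighbor).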
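A minimal sketch of a disjoint split followed by held-out evaluation, assuming a single random shuffle with a fixed seed and a 1-NN classifier; the function names and the toy data are hypothetical.

```python
import math
import random

def train_holdout_split(X, y, m, seed=0):
    """Shuffle once, send m samples to the held-out set and the remaining N to training."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    held, train = idx[:m], idx[m:]  # disjoint by construction
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in held], [y[i] for i in held])

def holdout_accuracy(train_X, train_y, held_X, held_y):
    """Fraction of held-out samples that a 1-NN classifier labels correctly."""
    hits = 0
    for x, true_label in zip(held_X, held_y):
        j = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
        hits += train_y[j] == true_label
    return hits / len(held_X)

# Hypothetical data: two well-separated clusters of five points each.
X = [(0.1 * i, 0.0) for i in range(5)] + [(10 + 0.1 * i, 0.0) for i in range(5)]
y = [1] * 5 + [2] * 5

tr_X, tr_y, ho_X, ho_y = train_holdout_split(X, y, m=4)
print(len(tr_X), len(ho_X))                      # 6 4
print(holdout_accuracy(tr_X, tr_y, ho_X, ho_y))  # 1.0
```

Splitting indices rather than samples guarantees that no sample lands in both sets, even if two samples happen to share the same feature values.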
kNN stores the entire training set; no explicit model is learned.
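Because nothing is learned up front, a kNN "model" can be written as a class whose `fit` only memorises the data and whose `predict` does all the work. The class name `KNNClassifier` and its methods are illustrative only, not any particular library's API; Euclidean distance is assumed.

```python
import math
from collections import Counter

class KNNClassifier:
    """Lazy learner: fit() only memorises the data; predict() does all the work."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Nothing is estimated: the stored training set *is* the model.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        # Each query computes the distance to all N stored samples, then sorts them.
        order = sorted(range(len(self.X)), key=lambda i: math.dist(self.X[i], x))
        votes = Counter(self.y[i] for i in order[: self.k])
        return votes.most_common(1)[0][0]

clf = KNNClassifier(k=1).fit([(0, 0), (0, 1), (9, 9)], [1, 1, 2])
print(len(clf.X))           # 3: the whole training set is kept
print(clf.predict((1, 1)))  # 1
print(clf.predict((8, 9)))  # 2
```

The trade-off: "training" is essentially free, but memory grows with N and every prediction has to touch all N stored samples.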