Lab 1: Nearest neighbors
In-class lab, to be turned in as part of Homework 1.
Before starting this lab, make sure to finish Lab 0.
An important part of many machine learning methods is the concept of "distance" between examples, which we often phrase as a "metric" on our inputs. Create a Python function that takes two examples as input and returns the distance between them. The function should work for any two examples, although in this lab we will use it to compare one test example with one training example. For now, use Euclidean distance.
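Here is a minimal sketch of such a distance function. It assumes each example is a sequence of numbers of the same length. The name `euclidean_distance` is our own choice, not one the lab prescribes. Euclidean distance is the square root of the sum of squared differences between corresponding features.

```python
import math


def euclidean_distance(x1, x2):
    # Reject mismatched inputs: zip() would silently drop extra features
    if len(x1) != len(x2):
        raise ValueError("examples must have the same number of features")
    # Square root of the summed squared feature-wise differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


# Usage: one hypothetical test example and one hypothetical training example
test_example = [1.0, 2.0]
train_example = [4.0, 6.0]
print(euclidean_distance(test_example, train_example))  # 5.0
```

If your examples are stored as NumPy arrays, `np.linalg.norm(x1 - x2)` computes the same quantity.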
Loop through all the test examples, using your classification function to predict the label of each one. During this loop, determine whether each prediction is correct by comparing it with the corresponding label in the test data. Compute the fraction (or percentage) of test examples that were predicted correctly. How does this accuracy change as k varies? Try several values of k and record the accuracy for each. In the next part of Homework 1, these results can be compared directly with linear regression.
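The evaluation loop above can be sketched as follows. This is a minimal sketch, not a reference solution. Because the lab text does not spell out the classification function, the sketch assumes a plain majority-vote k-nearest-neighbor classifier using Euclidean distance. It also assumes each example is a list of numbers and each label is hashable. The helper names `knn_predict` and `accuracy`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def euclidean_distance(x1, x2):
    # Straight-line distance between two equal-length feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def knn_predict(train_X, train_y, query, k):
    # Hypothetical helper: majority-vote k-NN classifier.
    # Order training examples by their distance to the query point.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean_distance(pair[0], query),
    )
    # Count labels among the k closest examples. Counter keeps insertion
    # order, so among labels with equal vote counts, the one that appears
    # earliest in distance order wins.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    # Hypothetical helper: fraction of test examples predicted correctly
    correct = 0
    for x, true_label in zip(test_X, test_y):
        if knn_predict(train_X, train_y, x, k) == true_label:
            correct += 1
    return correct / len(test_y)


# Hypothetical toy data: two well-separated clusters labeled 0 and 1
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.5, 0.5], [5.5, 5.5], [1, 1]]
test_y = [0, 1, 0]

# Record the accuracy for several values of k
for k in (1, 3, 5):
    print(f"k={k}: accuracy={accuracy(train_X, train_y, test_X, test_y, k):.2f}")
```

On the lab's real training and test data, replace the toy lists and try a wider range of k. Sorting every training example for each query is simple but slow for large datasets. Vectorizing the distance computation with NumPy is a common speed-up.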
Make sure to save your work, since this lab will be turned in as part of Homework 1.
Credit: based on Exercise 2.8 from "The Elements of Statistical Learning"