Sara Sheehan

CSC 390: Topics in Artificial Intelligence

Homework 1: Supervised Learning

Due: Monday Sept 19, 11:59pm on Moodle

The goal of this homework is to briefly introduce two core methods in supervised learning: nearest neighbors and linear regression. You'll start to learn numpy and sklearn (if you haven't already) and gain experience working with labeled data.

Part 1: Complete Lab 1

Make sure you have completed Lab 1. Make sure that when I run your program on the command line, it prints the accuracy for several values of k in a readable format. Also print out (and comment on) which value of k gave the best accuracy.

For this homework, I recommend copying your Lab 1 code into a new file called "hw1.py" and building on it for the second part.

Part 2: Linear regression

We will be using linear regression as implemented in sklearn. First read the documentation here:

sklearn Linear Regression documentation

Then import the "linear_model" module:

from sklearn import linear_model

Now create an instance of the LinearRegression class. Then, using the training data you created for nearest neighbors, fit the model using the "fit" method.
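As a rough illustration of the instantiate-and-fit step, here is a minimal sketch. The arrays X_train and y_train are made-up toy data standing in for your Lab 1 training set (their names and values are assumptions, not part of the assignment); only LinearRegression and its "fit" method come from sklearn.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical toy training data: 6 points, 2 features each, with 0/1 labels.
# Replace these with the training arrays you built in Lab 1.
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                    [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Create the model, then fit it to the training features and labels.
model = linear_model.LinearRegression()
model.fit(X_train, y_train)

# After fitting, the learned weights live in coef_ and intercept_.
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

Note that the labels are passed in as plain numbers, so the model treats the 0/1 classes as real-valued targets.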

Use the "predict" method to predict the labels of the TEST data. Make sure to note what this function returns. It might be helpful to create a prediction function of your own that wraps this function and outputs either a 0 or a 1. (For fun: alternatively, do the entire prediction step as one list comprehension.)
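One possible shape for such a wrapper is sketched below. It assumes the labels are coded 0 and 1 and uses a cutoff of 0.5, which is a common choice but still an assumption; predict_labels and the toy arrays are hypothetical names, not something the assignment prescribes.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical toy data standing in for the Lab 1 train/test split.
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                    [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.1, 0.1], [1.0, 1.0]])

model = linear_model.LinearRegression().fit(X_train, y_train)

def predict_labels(model, X, threshold=0.5):
    """Turn real-valued regression outputs into 0/1 labels.

    The 0.5 default is an assumption that suits 0/1 label coding.
    """
    raw = model.predict(X)  # numpy array of floats, one per row of X
    return (raw > threshold).astype(int)

raw = model.predict(X_test)
labels = predict_labels(model, X_test)

# The same step written as a single list comprehension.
labels_lc = np.array([1 if v > 0.5 else 0 for v in model.predict(X_test)])

print("raw predictions:", raw)
print("thresholded labels:", labels)
```

The raw predictions are not restricted to the range 0 to 1, which is why some explicit rule for converting them to labels is needed.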

Finally, use the same method for assessing accuracy as you did for nearest neighbors (it should take as input the predicted labels and the true labels, both as numpy arrays). Print this result as well, distinguishing it from your nearest neighbor results. So when I run your program, I should get something like this:

Nearest Neighbors:
k=value1, xx%
k=value2, xx% (highest accuracy with this value)
...
k=value5, xx% 

Linear Regression:
xx%
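A minimal sketch of an accuracy function and a report in roughly the format above follows. The function names (accuracy, format_report) are hypothetical, and the nearest neighbor accuracies are made-up placeholder numbers, not real results; substitute the values your own code computes.

```python
import numpy as np

def accuracy(predicted, true):
    """Fraction of positions where the predicted label equals the true label."""
    predicted = np.asarray(predicted)
    true = np.asarray(true)
    return float(np.mean(predicted == true))

def format_report(knn_acc, lr_acc):
    """Build the printed report; knn_acc maps each k to its accuracy."""
    best_k = max(knn_acc, key=knn_acc.get)
    lines = ["Nearest Neighbors:"]
    for k, acc in knn_acc.items():
        note = " (highest accuracy with this value)" if k == best_k else ""
        lines.append(f"k={k}, {acc:.1%}{note}")
    lines += ["", "Linear Regression:", f"{lr_acc:.1%}"]
    return "\n".join(lines)

# Placeholder numbers for illustration only (not real results).
knn_acc = {1: 0.70, 3: 0.75, 5: 0.72}
lr_acc = accuracy(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0]))

report = format_report(knn_acc, lr_acc)
print(report)
```

Comparing the two arrays elementwise and taking the mean avoids an explicit loop, and the same accuracy function can be reused for both methods.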

Part 3: Reflection

For this homework, please submit a short reflection (called "readme.txt", similar to CSC 212). What parts of this homework went well? What parts did you find challenging? Do you feel that you have a good software development environment set up? If it's been a while since you used Python, is it feeling familiar?

Submit these two files (hw1.py and readme.txt) on Moodle.

Credit: based on Exercise 2.8 from "The Elements of Statistical Learning"