CS66: Machine Learning

(Spring 2019)


Course Information

Course: MWF 10:30–11:20, Science Center 181
Professor: Sara Mathieson
Office: Science Center 249
Office hours: Monday 12:30-2pm and Friday 1-3pm
Piazza: CS66 Q&A forum

The prerequisite for this course is CS35.

Machine Learning as a field has grown considerably over the past few decades. In this course, we will explore both classical and modern approaches, with an emphasis on theoretical understanding. There will be a significant math component (statistics and probability in particular), as well as a substantial implementation component (as opposed to using high-level libraries). However, during the last part of the course we will use a few modern libraries such as TensorFlow and Keras. By the end of this course, you should be able to form a hypothesis about a dataset of interest, use a variety of methods and approaches to test your hypothesis, and interpret the results to form a meaningful conclusion. We will focus on real-world, publicly available datasets rather than generating new data.

The language for this course is Python 3.

Textbook:

We will primarily be using the book An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. It is free and available online.


See the Schedule for each week's reading assignment, which will often be supplemented with other material and optional research papers. The schedule is tentative and subject to change throughout the semester.

Schedule (Tentative)

Week 1 (Jan 21, Jan 23, Jan 25)
Announcements: Jan 21: MLK Day - NO CLASS
Topic: Introduction to Machine Learning

  • Machine learning terminology
  • Notation
  • K-nearest neighbors

Lab 1: K-nearest neighbors

Week 2 (Jan 28, Jan 30, Feb 01)
Announcements: Jan 30: Out of town - NO CLASS; Feb 01: Out of town - NO CLASS, Drop/add ends
Topic: Decision Trees

  • Decision trees

Lab 2: Decision trees

Week 3 (Feb 04, Feb 06, Feb 08)
Topic: Linear Regression

  • Linear regression

Reading:

  • ISL: Sections 3.1, 3.2, 3.5
  • (optional) ISL: Sections 3.3, 3.4
  • (optional) Simple linear regression by James Kirchner (2001)

Week 4 (Feb 11, Feb 13, Feb 15)
Topic: Probabilistic Models 1

  • Introduction to probability
  • Logistic regression

Lab 3: Regression

Week 5 (Feb 18, Feb 20, Feb 22)
Topic: Probabilistic Models 2

  • Naive Bayes

Lab 4: Probabilistic Models

Week 6 (Feb 25, Feb 27, Mar 01)
Topic: Evaluation Metrics

  • Confusion matrices
  • Precision and recall
  • ROC curves
  • Relationship to probabilistic models
  • Cross-Validation

In-lab Midterm 1

Week 7 (Mar 04, Mar 06, Mar 08)
Topic: Ensemble Methods

  • Bagging
  • Random forests
  • Boosting

Lab 4: (cont)

Spring Break (Mar 11, Mar 13, Mar 15)

Week 8 (Mar 18, Mar 20, Mar 22)
Topic: Support Vector Machines

  • Perceptron
  • Support vector machines

Lab 5: Ensemble methods

Week 9 (Mar 25, Mar 27, Mar 29)
Announcements: Mar 29: CR/NC/W Deadline
Topic: SVMs (continued)

  • Lagrange multipliers
  • SVM optimization problems
  • Kernels

Reading:

  • (see previous week)

Lab 6: Support vector machines

Week 10 (Apr 01, Apr 03, Apr 05)
Topic: Topics in Deep Learning 1

  • Introduction to neural networks
  • Fully connected architectures

Lab 6: (cont)

Week 11 (Apr 08, Apr 10, Apr 12)
Topic: Topics in Deep Learning 2

  • Convolutional neural networks (CNNs)
  • Generative adversarial networks (GANs)

Lab 7: Neural Networks

Week 12 (Apr 15, Apr 17, Apr 19)
Topic: Unsupervised Learning

  • K-means clustering
  • Gaussian mixture models
  • Hierarchical clustering
  • Dimensionality Reduction
  • Principal components analysis

Reading:

  • ISL: Sections 10.1-10.3

Project: Proposal
Lab 8: (optional) Unsupervised Learning

Week 13 (Apr 22, Apr 24, Apr 26)
Topic: Midterm Review and Special Topics

  • Midterm 2 review
  • Guest Lecture by Prof. Matt Zucker

In-lab Midterm 2

Week 14 (Apr 29, May 01, May 03)
Topic: Special Topic: Machine Learning and Ethics

  • Deep learning in biology
  • Learning from biased datasets

Project: Presentation

Grading Policies

Grades will be weighted as follows:
35% Lab assignments
40% In-class midterms (20% each)
15% Final Project
10% Participation

Exams

There will be two midterms, given in lab, as shown on the Schedule. Let me know as soon as possible if you have a conflict with one of the exams.

We will not have a final exam, but we will be using the final exam slot for project presentations. The final exam slot will be released later in the semester.

Lab Policy

Our labs are on Wednesdays, and lab assignments will generally be due the following Tuesday at midnight. Lab attendance is required, and missing labs will quickly affect your participation grade. Note that Tuesday is my research day and I will be off campus and unable to answer lab questions. Make use of office hours on Monday and Piazza anytime.

Weekly Lab Sessions
CS66 A 1:15–2:45pm Wednesdays Mathieson Clothier 016
CS66 B 3–4:30pm Wednesdays Mathieson Clothier 016

Handing in labs: Lab assignments are submitted electronically and managed using git. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Most of the programming/lab assignments will be in pairs. There may also be some written assignments that will have specific instructions for handing in.


Late Policy: Each individual will be given 2 late days for the semester. A late day is a 24-hour extension from the original deadline. You can use one day on each of two assignments or both days on one assignment. This will encompass any reason - illness, interviews, many midterms in the same week, etc. Past these days, late assignments will not be accepted. You should budget your days to account for future illnesses or assignment deadlines for other courses. Even if you do not fully complete a lab assignment you should submit what you have done to receive partial credit. Late days count against both partners in a group lab.

For extensions beyond these 2 late days (in the case of an emergency or ongoing personal issue), please contact your Class Dean. If your Class Dean notifies me of the issues, then we can arrange an accommodation.


Academic Integrity

Academic honesty is required in all your work. Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment.

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate on which parts of the assignment you received help, and what your sources were.

Failure to abide by these rules constitutes academic dishonesty and will lead to a hearing of the College Judiciary Committee. According to the Faculty Handbook: "Because plagiarism is considered to be so serious a transgression, it is the opinion of the faculty that for the first offense, failure in the course and, as appropriate, suspension for a semester or deprivation of the degree in that year is suitable; for a second offense, the penalty should normally be expulsion."

The spirit of this policy applies to all course work, including code, homework solutions (e.g., proofs, analysis, written reports), and exams. Please contact me if you have any questions about what is permissible in this course.


Piazza

This semester we’ll be using Piazza, an online Q&A forum for class discussion, help with labs, clarifications, and announcements. You should have received an email invitation to join CS66 on Piazza. If you didn't, please let me know.

Piazza is meant for questions outside of regular meeting times such as office hours, class, and lab. Please do not hesitate to ask and answer questions on Piazza, but keep in mind the following guidelines:

  1. Piazza should be used for ALL content and logistics questions outside of class, lab, and office hours. Please do not email me your code or questions about the assignments.
  2. If there is a personal issue that relates only to you, please email me.
  3. We encourage non-anonymous posts, but you may post anonymously (to your classmates, not the instructors).
  4. Do NOT post long blocks of code on Piazza - if you can distill the problem to 1-2 lines of code and an error message, that’s fine, but try to avoid giving out key components of your work.
  5. By the same token, when answering a question, try to give some guiding help but do not post code fixes or explicit solutions to the problem.
  6. Posting on Piazza counts toward your participation grade, both asking and answering!

Academic Accommodations

If you believe that you need accommodations for a disability, please contact the Office of Student Disability Services (Parrish 113W) or email studentdisabilityservices at swarthmore.edu to arrange an appointment to discuss your needs. As appropriate, the Office will issue students with documented disabilities a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact the Office as soon as possible. For details about the accommodations process, visit the Student Disability Services website.

To receive an accommodation for a course activity, you must have an Accommodation Authorization letter from the Office of Student Disability Services and you need to meet with me to work out the details of your accommodation at least one week prior to the activity.

You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged through the Office of Student Disability Services.


Links

Python style guide from Prof. Tia Newhall
Official Python style guide
Python 3.5 Documentation
Atom editor
Remote access with Atom