CMSC H360: Machine Learning

(Fall 2019)


Course Information

Course: TuTh 10–11:30am, Stokes 102
Professor: Sara Mathieson
Office: KINSC L302
Office hours: Mondays 2-3:45pm, Tuesdays 12:30-1:30pm, Fridays 4-5pm (all in H110)
TAs: Charles Marx, Pablo Thiel
TA Office hours: Sundays 7-8pm (Pablo), Mondays 7-8pm (Charlie), H204
Student Consultant: Mary Cott, mcott [at] haverford [dot] edu
Piazza: CMSC H360 Q&A forum

The prerequisites for this course are Data Structures, Discrete Mathematics, and Linear Algebra.

Machine Learning as a field has grown considerably over the past few decades. In this course, we will explore both classical and modern approaches, with an emphasis on theoretical understanding. There will be a significant math component (statistics and probability in particular), as well as a substantial implementation component (as opposed to using high-level libraries). However, during the last part of the course we will use a few modern libraries such as TensorFlow and Keras. By the end of this course, you should be able to form a hypothesis about a dataset of interest, use a variety of methods and approaches to test your hypothesis, and interpret the results to form a meaningful conclusion. We will focus on real-world, publicly available datasets rather than generating new data.

The language for this course is Python 3.

Textbook:

You do not need to purchase a textbook for this course. We will draw from two online textbooks, as well as supplemental online readings and research papers.
See the Schedule for each week's reading assignment. The schedule is tentative and subject to change throughout the semester.

Schedule (Tentative)

Week 1: Sep 03, Sep 05
Topic: Introduction to Machine Learning
  • Machine learning terminology
  • Notation
  • K-nearest neighbors
  • Featurization
Lab: Lab 1: K-nearest neighbors

Week 2: Sep 10, Sep 12
Topic: Decision Trees
  • ML workflow
  • Entropy
  • Decision trees
Lab: Lab 2: Decision trees

Week 3: Sep 17, Sep 19
Announcements: Last day to drop (Sep 20)
Topic: Linear Regression
  • Loss functions
  • Bias-Variance tradeoff
  • Linear regression
  • Polynomial regression
Lab: Lab 3: Polynomial Regression

Week 4: Sep 24, Sep 26
Topic: Probabilistic Models 1
  • Introduction to probability
  • Naive Bayes
Lab: Lab 4: Naive Bayes

Week 5: Oct 01, Oct 03
Topic: Probabilistic Models 2
  • Likelihood functions
  • Logistic regression
Lab: Midterm 1 (in-lab and take-home)

Week 6: Oct 08, Oct 10
Announcements: Last day to pass/fail (Oct 11)
Topic: Evaluation Metrics
  • Confusion matrices
  • Precision and recall
  • ROC curves
  • Relationship to probabilistic models
  • Cross-Validation
Lab: Lab 5: Logistic Regression

Fall Break: Oct 15, Oct 17

Week 7: Oct 22, Oct 24
Topic: Ensemble Methods
  • Bagging
  • Random forests
  • Boosting
Lab: Lab 6: Ensemble methods

Week 8: Oct 29, Oct 31
Topic: Support Vector Machines
  • Perceptron
  • Support vector machines
Lab: Lab 6: (cont.)

Week 9: Nov 05, Nov 07
Topic: SVMs (continued)
  • Lagrange multipliers
  • SVM optimization problems
  • Kernels
Reading:
  • (see previous week)
Lab: Lab 7: Support vector machines
Project: Proposal

Week 10: Nov 12, Nov 14
Announcements: No class on Nov 12 (at a conference)
Topic: Topics in Deep Learning 1
  • Introduction to neural networks
  • Fully connected architectures
Lab: Lab 8: Neural Networks

Week 11: Nov 19, Nov 21
Topic: Topics in Deep Learning 2
  • Convolutional neural networks (CNNs)
  • Midterm 2 review
Lab: Midterm 2 (in-lab and take-home)

Week 12: Nov 26, Nov 28
Announcements: Thanksgiving (no class on Nov 28)
Topic: Unsupervised Learning 1
  • K-means clustering
Reading:
  • ISL: Sections 10.1-10.3
Lab: No lab: Thanksgiving

Week 13: Dec 03, Dec 05
Topic: Unsupervised Learning 2
  • Gaussian mixture models
  • Hierarchical clustering
  • Dimensionality Reduction
  • Principal components analysis
Lab: Project: check-ins during lab
Presentation

Week 14: Dec 10, Dec 12
Topic: Special Topic: Machine Learning and Ethics
  • Deep learning in biology
  • Learning from biased datasets

Grading Policies

Grades will be weighted as follows:
35% Lab assignments
40% Midterms (20% each)
15% Final Project (including presentation)
10% Participation (including reading quizzes)
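
As a rough illustration of how these weights combine, here is a minimal Python 3 sketch. Only the percentages come from the list above. The function name, category keys, and sample scores are hypothetical, and the sketch does not model any other course policy (for example, the requirement to pass at least one exam).

```python
# Weights copied from the grading breakdown; all names and scores are illustrative.
WEIGHTS = {
    "labs": 0.35,
    "midterm1": 0.20,
    "midterm2": 0.20,
    "project": 0.15,
    "participation": 0.10,
}


def weighted_grade(scores):
    """Combine per-category percentages (0-100) into one overall percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: these numbers are made up for the example.
example = {
    "labs": 90,
    "midterm1": 80,
    "midterm2": 85,
    "project": 95,
    "participation": 100,
}
overall = weighted_grade(example)
print(round(overall, 2))
```

Each category contributes its score multiplied by its weight, so a low score on a single midterm moves the total by at most 20 percentage points.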

Quizzes and Exams

There will be occasional reading quizzes. These are graded largely on completion, not correctness. Reading quizzes count toward participation. If you have made a solid attempt at the reading and show up for class, you will likely receive full credit.

There will be two midterms. Each will have an in-lab portion (see Schedule) with shorter questions and a take-home portion with longer questions. Let me know as soon as possible if you have a conflict with one of the exams. In lieu of a final exam, there will be a final project and associated presentation (to be scheduled during the final exam period). You must pass at least one exam to pass the course overall.

Labs

Our labs are on Thursdays. Lab assignments will generally be released Tuesday night and due the following Tuesday at midnight. You are expected to read and begin the lab before your lab section on Thursday. Lab attendance is required, and missing labs will quickly affect your participation grade. There will often be pair-programming warm-up exercises as part of the lab, and lab in general is a time to build community around the course and the material. Note that Wednesday is my research day, so I will be off campus and unable to answer lab questions. Make use of office hours (both mine and the TAs') and Piazza.

Weekly Lab Sessions
Lab A: Thursdays 11:30am-12:30pm, Mathieson, Hilles 110
Lab B: Thursdays 2:30-3:30pm, Mathieson, Hilles 110

Handing in labs: Lab assignments are submitted electronically and managed using git. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Some of the programming/lab assignments may be in pairs. There may also be some written assignments that will have specific instructions for handing in.


Late Policy: Each individual will be given 2 late days for the semester. A late day is a 24-hour extension from the original deadline. You can use one day on each of two assignments or both days on one assignment. This covers any reason: illness, interviews, many midterms in the same week, etc. Once these days are used, late assignments will not be accepted. You should budget your days to account for future illnesses or assignment deadlines for other courses. Even if you do not fully complete a lab assignment, you should submit what you have done to receive partial credit. Late days count against both partners in a group lab.

For extensions beyond these 2 late days (in the case of an emergency or ongoing personal issue), please contact your Class Dean. If your Class Dean notifies me of the issues, then we can arrange an accommodation.


Academic Integrity

From the faculty:

In a community that thrives on relationships between students and faculty that are based on trust and respect, it is crucial that students understand a professor's expectations and what it means to do academic work with integrity. Plagiarism and cheating, even if unintentional, undermine the values of the Honor Code and the ability of all students to benefit from the academic freedom and relationships of trust the Code facilitates. Plagiarism is using someone else's work or ideas and presenting them as your own without attribution. Plagiarism can also occur in more subtle forms, such as inadequate paraphrasing, failure to cite another person's idea even if not directly quoted, failure to attribute the synthesis of various sources in a review article to that author, or accidental incorporation of another's words into your own paper as a result of careless note-taking. Cheating is another form of academic dishonesty, and it includes not only copying, but also inappropriate collaboration, exceeding the time allowed, and discussion of the form, content, or degree of difficulty of an exam. Please be conscientious about your work, and check with me if anything is unclear.

Please also note the CS Department Collaboration Policy.

More details for this course:

Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment.

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. All code you submit must be your own, with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment you received help on and what your sources were.


Piazza

This semester we’ll be using Piazza, an online Q&A forum for class discussion, help with labs, clarifications, and announcements. You should have received an email invitation to join CMSC H360 on Piazza. If you didn't, please let me know.

Piazza is meant for questions outside of regular meeting times such as office hours, class, and lab. Please do not hesitate to ask and answer questions on Piazza, but keep in mind the following guidelines:

  1. Piazza should be used for ALL content and logistics questions outside of class, lab, and office hours. Please do not email me your code or extended questions about the assignments.
  2. If there is a personal issue that relates only to you, please email me.
  3. We encourage non-anonymous posts, but you may post anonymously (to your classmates, not the instructors).
  4. Do not post long blocks of code on Piazza - if you can distill the problem to 1-2 lines of code and an error message, that’s fine, but try to avoid giving out key components of your work.
  5. By the same token, when answering a question, try to give some guiding help but do not post code fixes or explicit solutions to the problem.
  6. Posting on Piazza counts toward your participation grade, both asking and answering!

Academic Accommodations

For details about the accommodations process, visit the Access and Disability Services website.

Haverford College is committed to providing equal access to students with a disability. If you have (or think you have) a learning difference or disability, including mental health, medical, or physical impairment, please contact the Office of Access and Disability Services (ADS) at hc-ads@haverford.edu. The Coordinator will confidentially discuss the process to establish reasonable accommodations.

Students who have already been approved to receive academic accommodations and want to use their accommodations in this course should share their verification letter with me and also make arrangements to meet with me as soon as possible to discuss their specific accommodations. Please note that accommodations are not retroactive and require advance notice to implement.

It is a state law in Pennsylvania that individuals must be given advance notice if they are to be recorded. Therefore, any student who has a disability-related need to audio record this class must first be approved for this accommodation by the Coordinator of Access and Disability Services and then must speak with me. Other class members will need to be aware that this class may be recorded.


Links

Machine Learning notation (we will follow this loosely)
Official Python style guide
Python 3 Documentation
Atom editor