CMSC H360: Machine Learning

(Fall 2020)


Course Information

Course: TuFri 9:30–11am
Professor: Sara Mathieson
Office: KINSC L302
Office hours: Mon 9:45-11am, Tues 4:30-6pm
TAs: Jason Ngo, Fiona Xu
TA hours: Sun 8:30-10pm (Fiona), Thurs 8-9:30pm (Jason)

The prerequisites for this course are Data Structures, Discrete Mathematics, and Linear Algebra.

Machine Learning as a field has grown considerably over the past few decades. In this course, we will explore both classical and modern approaches, with an emphasis on theoretical understanding. There will be a significant math component (statistics and probability in particular), as well as a substantial implementation component (as opposed to using high-level libraries). However, during the last part of the course we will use a few modern libraries such as TensorFlow and Keras. By the end of this course, you should be able to form a hypothesis about a dataset of interest, use a variety of methods and approaches to test your hypothesis, and interpret the results to form a meaningful conclusion. We will focus on real-world, publicly available datasets rather than generating new data.

The language for this course is Python 3.

Textbook:

You do not need to purchase a textbook for this course. We will draw from two online textbooks, as well as supplemental online readings and research papers.

Format:

See the Computer Science department teaching plans for Fall 2020.


See the Schedule for each week's reading assignment. The schedule is tentative and subject to change throughout the semester.

Schedule (Tentative)

Week 1 (Sep 08, Sep 11): Introduction to Machine Learning

  • Machine learning terminology
  • Notation
  • K-nearest neighbors
  • Featurization

Lab 1: K-nearest neighbors

Week 2 (Sep 15, Sep 18): Decision Trees

  • ML workflow
  • Entropy
  • Decision trees

Lab 2: Decision trees
Announcement: Last day to drop (Sep 20)

Week 3 (Sep 22, Sep 25): Linear Regression

  • Loss functions
  • Bias-Variance tradeoff
  • Linear regression
  • Polynomial regression

Lab 3: Polynomial Regression

Week 4 (Sep 29, Oct 02): Probabilistic Models 1

  • Introduction to probability
  • Naive Bayes

Lab 4: Naive Bayes

Week 5 (Oct 06, Oct 09): Probabilistic Models 2

  • Likelihood functions
  • Logistic regression

Midterm 1
Announcement: Last day to pass/fail (Oct 11)

Week 6 (Oct 13, Oct 16): Evaluation Metrics

  • Confusion matrices
  • Precision and recall
  • ROC curves
  • Relationship to probabilistic models
  • Cross-Validation

Lab 5: Logistic Regression

Week 7 (Oct 20, Oct 23): Ensemble Methods

  • Bagging
  • Random forests
  • Boosting

Lab 6: Ensemble methods

Week 8 (Oct 27, Oct 30): Support Vector Machines

  • Perceptron
  • Support vector machines
  • Lagrange multipliers
  • SVM optimization problems
  • Kernels

Lab 7: sklearn

Week 9 (Nov 03, Nov 06): Topics in Deep Learning 1

  • Introduction to neural networks
  • Fully connected architectures

Week 10 (Nov 10, Nov 13): Topics in Deep Learning 2

  • Convolutional neural networks (CNNs)
  • Midterm 2 review

Lab 8: Neural Networks

Week 11 (Nov 17, Nov 20): Unsupervised Learning 1

  • K-means clustering

Reading:

  • ISL: Sections 10.1-10.3

Lab 8 (cont)

Nov 24, Nov 27: Thanksgiving Break

Week 12 (Dec 01, Dec 04): Unsupervised Learning 2

  • Gaussian mixture models
  • Hierarchical clustering
  • Dimensionality Reduction
  • Principal components analysis

Midterm 2
Project

Week 13 (Dec 08, Dec 11): Special Topic: Machine Learning and Ethics

  • Deep learning in biology
  • Learning from biased datasets

Week 14 (Dec 15, Dec 18)

Grading Policies

Grades will be weighted as follows (UPDATED BELOW):
40% Lab assignments
25% Midterm I
25% Midterm II *OR* Final Project (including presentation)
10% Participation

Quizzes and Exams

In lieu of reading quizzes this semester, we will have short exercises during class (to work on and discuss, not turn in). Be ready to work on these exercises by completing the weekly reading before class on Fridays.

There will be two midterms (with limited time, but you will have several days to choose a window). In lieu of a final exam, there will be a final project and associated presentation. You must pass at least one exam to pass the course overall.

Labs

Our labs are on Thursdays. Lab assignments will generally be released Tuesday night and due the following Tuesday at midnight (with a grace period before collection due to timezone differences). You are expected to read and begin the lab before your lab section on Thursday. Lab attendance is required, and missing labs will quickly affect your participation grade. There will often be pair-programming warm-up exercises as part of the lab, and lab in general is a time to build community around the course and the material. Note that Wednesday is my research day and I will be off campus and unable to answer lab questions. Make use of office hours (both mine and the TAs') and Piazza.

Weekly Lab Sessions
Lab A 9:30–10:30am Thursdays Mathieson Zoom
Lab B 11am–12pm Thursdays Mathieson Zoom

Handing in labs: Lab assignments are submitted electronically and managed using git. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Some of the programming/lab assignments may be in pairs. There may also be some written assignments that will have specific instructions for handing in.


Late Policy: Each individual will be given 2 late days for the semester. A late day is a 24-hour extension from the original deadline plus the grace period. You can use one day on each of two assignments or both days on one assignment. This will encompass any reason: illness, interviews, many midterms in the same week, etc. Past these days, late assignments will not be accepted. You should budget your days to account for future illnesses or assignment deadlines for other courses. Even if you do not fully complete a lab assignment, you should submit what you have done to receive partial credit. Late days count against both partners in a group lab.

For extensions beyond these 2 late days (in the case of an emergency or ongoing personal issue), please contact your Class Dean. If your Class Dean notifies me of the issues, then we can arrange an accommodation.


Academic Integrity

From the faculty:

In a community that thrives on relationships between students and faculty that are based on trust and respect, it is crucial that students understand a professor's expectations and what it means to do academic work with integrity. Plagiarism and cheating, even if unintentional, undermine the values of the Honor Code and the ability of all students to benefit from the academic freedom and relationships of trust the Code facilitates. Plagiarism is using someone else's work or ideas and presenting them as your own without attribution. Plagiarism can also occur in more subtle forms, such as inadequate paraphrasing, failure to cite another person's idea even if not directly quoted, failure to attribute the synthesis of various sources in a review article to that author, or accidental incorporation of another's words into your own paper as a result of careless note-taking. Cheating is another form of academic dishonesty, and it includes not only copying, but also inappropriate collaboration, exceeding the time allowed, and discussion of the form, content, or degree of difficulty of an exam. Please be conscientious about your work, and check with me if anything is unclear.

Please also note the CS Department Collaboration Policy.

More details for this course:

Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment.

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate on which parts of the assignment you received help, and what your sources were.


Piazza

This semester we’ll be using Piazza, an online Q&A forum for class discussion, help with labs, clarifications, and announcements. You should have received an email invitation to join CMSC H360 on Piazza. If you didn't, please let me know.

Piazza is meant for questions outside of regular meeting times such as office hours, class, and lab. Please do not hesitate to ask and answer questions on Piazza, but keep in mind the following guidelines:

  1. Piazza should be used for ALL content and logistics questions outside of class, lab, and office hours. Please do not email me your code or extended questions about the assignments.
  2. If there is a personal issue that relates only to you, please email me.
  3. We encourage non-anonymous posts, but you may post anonymously (to your classmates, not the instructors).
  4. Do not post long blocks of code on Piazza - if you can distill the problem to 1-2 lines of code and an error message, that’s fine, but try to avoid giving out key components of your work.
  5. By the same token, when answering a question, try to give some guiding help but do not post code fixes or explicit solutions to the problem.
  6. Posting on Piazza counts toward your participation grade, both asking and answering!

Academic Accommodations

For details about the accommodations process, visit the Access and Disability Services website.

Haverford College is committed to providing equal access to students with a disability. If you have (or think you have) a learning difference or disability, including mental health, medical, or physical impairment, please contact the Office of Access and Disability Services (ADS) at hc-ads@haverford.edu. The Coordinator will confidentially discuss the process to establish reasonable accommodations.

Students who have already been approved to receive academic accommodations and want to use their accommodations in this course should share their verification letter with me and also make arrangements to meet with me as soon as possible to discuss their specific accommodations. Please note that accommodations are not retroactive and require advance notice to implement.

It is a state law in Pennsylvania that individuals must be given advance notice if they are to be recorded. Therefore, any student who has a disability-related need to audio record this class must first be approved for this accommodation from the Coordinator of Access and Disability Services and then must speak with me. Other class members will need to be aware that this class may be recorded.


Links

Machine Learning notation (we will follow this loosely)
Official Python style guide
Python 3 Documentation
Atom editor