CMSC H360: Machine Learning

(Spring 2024)


Course Information

Course: TuTh 11:30am–1pm, H108
Professors: Sara Mathieson, Sorelle Friedler
Office: KINSC L302
Office hours: Mondays 4-5pm (Mathieson), Thursdays 4-5pm (Friedler)
TAs: Trinity Kleckner, Ben Menko, Grace Proebsting
TA hours: TBD
Peer tutors: Wahub Ahmed, Seun Eisape

The prerequisite for this course is CS260.

Machine Learning as a field has grown considerably over the past few decades. In this course, we will explore both classical and modern approaches, with an emphasis on theoretical understanding. There will be a significant math component (statistics and probability in particular), as well as a substantial implementation component (as opposed to using high-level libraries). However, during the last part of the course we will use a few modern libraries such as TensorFlow and Keras. By the end of this course, you should be able to form a hypothesis about a dataset of interest, use a variety of methods and approaches to test your hypothesis, and interpret the results to form a meaningful conclusion. We will focus on real-world, publicly available datasets rather than generating new data.

The language for this course is Python 3.

Textbook:

There is one required textbook for this course, cited as "Geron" in the Schedule. We will also draw from several online textbooks, as well as supplemental online readings and research papers.


See the Schedule for each week's reading assignment. The schedule is tentative and subject to change throughout the semester.

Schedule (Tentative)

Week 1 (Jan 23, Jan 25): Review of CS260

  • Classification vs. regression
  • Evaluation metrics
  • Custom data structures and OOP
  • sklearn review

Reading:

  • Geron Chap 3 through pg 119 (binary classification and evaluation metrics)
  • Geron Chap 4 through pg 151 (linear regression and gradient descent)
  • Geron Chap 4 pg 164-169 (logistic regression)

Week 2 (Jan 30, Feb 01): Nearest Neighbors and KD Trees

  • K-nearest neighbors
  • Finish KNN with KD trees
  • Evaluation beyond 260: AUC, precision-recall curves

Lab 1: Classification Review
Due Monday Jan 29

Week 3 (Feb 06, Feb 08): Evaluation, Error, and Data

  • The machine learning pipeline, viewed holistically
  • Source of data, documentation, training vs. testing data
  • Different sources of error in a pipeline
  • Model cards
  • Begin decision trees
  • Review entropy and information gain
  • Recursive implementation of decision trees

Lab 2: KD-trees
Due Thursday Feb 8

Last day to drop (Feb 09)

Week 4 (Feb 13, Feb 15): Ensemble Learning

  • Finish decision trees
  • Bagging
  • Random forests
  • AdaBoost
  • Gradient Boosting

Lab 3: Decision Trees
Due Thursday Feb 15

Week 5 (Feb 20, Feb 22): Advanced regression

  • Review: logistic regression
  • Review: gradient descent
  • Softmax regression
  • Regularization
  • Fairness regularization

Lab 4: Ensemble Methods
Due Thursday Feb 22

Week 6 (Feb 27, Feb 29): Midterm 1 review

  • Midterm 1 review

Lab 5: Advanced regression
Due Thursday Feb 29

Week 7 (Mar 05, Mar 07): Fairness and Ethics

  • Fairness in ML
  • Ethics in ML
  • Explainability

Midterm 1 (March 5 in-class)

Spring Break (Mar 12, Mar 14)

Week 8 (Mar 19, Mar 21): Support Vector Machines

  • Perceptron
  • SVMs
  • Kernels
  • VC-dimension

Week 9 (Mar 26, Mar 28): Neural Nets 1

  • Introduction to neural networks
  • Backpropagation
  • Fully connected architectures
  • CNNs

Lab 6: Support Vector Machines
Due Monday March 25

Week 10 (Apr 02, Apr 04): Neural Nets 2

  • GANs
  • RNNs
  • Transfer learning
  • Diffusion

Reading:

  • Geron Chap 14 (Deep Computer Vision Using Convolutional Neural Networks)

Lab 7: Neural Networks
Due Thursday April 4

Week 11 (Apr 09, Apr 11): Transformers

  • Overview of ML in NLP
  • Attention
  • Transformers

Reading:

  • Geron Chap 16 (Natural Language Processing with RNNs and Attention)

Project Proposal
Due Monday April 8

Week 12 (Apr 16, Apr 18): Unsupervised Learning 1

  • Gaussian Mixture Models, EM
  • Dimensionality Reduction (t-SNE, VAE)
  • Midterm 2 review

Reading:

  • Geron Chap 9 (Unsupervised Learning Techniques)

Lab 8: Transformers and NLP
Due Thursday April 18

Week 13 (Apr 23, Apr 25): Unsupervised Learning 2

  • K-means, K-median, K-center
  • Hierarchical clustering
  • Clustering visualization techniques

Reading:

  • Geron Chap 9 (Unsupervised Learning Techniques)

Midterm 2 (April 25 in-class)

Week 14 (Apr 30, May 02): Project Presentations

  • Final project presentations

Final Project

Last day to pass/fail (May 03)


Grading Policies

Grades will be weighted as follows:
35% Lab assignments
20% Midterm I
20% Midterm II
15% Final Project (including presentation)
10% Participation (including attendance)
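
As an illustration of how these weights combine, here is a minimal Python 3 sketch. The category names, the 0-100 scale for each category, and the example scores are assumptions made for illustration only; the syllabus does not specify how scores within a category are computed or how the total maps to a letter grade.

```python
# Weights (percent of the final grade) copied from the list above.
WEIGHTS = {
    "labs": 35,
    "midterm_1": 20,
    "midterm_2": 20,
    "final_project": 15,
    "participation": 10,
}


def weighted_grade(scores):
    """Combine per-category scores (each assumed to be 0-100) into one percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, not real data.
example = {
    "labs": 90,
    "midterm_1": 80,
    "midterm_2": 85,
    "final_project": 95,
    "participation": 100,
}
print(weighted_grade(example))  # 88.75
```

Note that this sketch covers only the weighted total; the separate requirement to pass at least one exam is not modeled.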

Quizzes and Exams

In lieu of reading quizzes this semester, we will have short exercises during class (to work on and discuss, not turn in). Be ready to work on these exercises by completing the weekly reading before class on Thursdays.

There will be two midterms (in-class). In lieu of a final exam, there will be a final project, with an associated presentation and writeup. You must pass at least one exam to pass the course overall.

Labs

Our labs are on Tuesdays in H204. The machines in this classroom are equipped with the necessary software for this course. You are welcome to use your own machine, but we will not have the bandwidth to troubleshoot personal laptop issues. Lab assignments will generally be released Thursday night and due the following Thursday at midnight. There will be an introduction to the assignment on Tuesday during lab. Lab attendance is required, and missing labs will quickly affect your participation grade. There will sometimes be pair-programming warm-up exercises as part of the lab, and lab in general is a time to build community around the course and the material. Note that on Fridays I will be doing research off campus and unable to answer lab questions. Make use of office hours (both mine and the TAs') and Piazza.

Weekly Lab Sessions
Lab A: 1:30–2:30pm Tuesdays, Friedler, H110
Lab B: 2:30–3:30pm Tuesdays, Friedler, H110

Handing in labs: Lab assignments are submitted electronically and managed using GitHub Classroom. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Some of the programming/lab assignments may be in pairs. There may also be some written assignments that will have specific instructions for handing in.


Late Policy: Each individual will be given 2 late days for the semester. A late day is a 24-hour extension from the original deadline. You can use up to one late day on any one assignment. This will encompass any reason - illness, interviews, many midterms in the same week, etc. Past these days, late assignments will not be accepted. You should budget your days to account for future illnesses or assignment deadlines for other courses. Even if you do not fully complete a lab assignment you should submit what you have done to receive partial credit. Late days count against both partners in a group lab.

For extensions beyond these late days (in the case of an emergency or ongoing personal issue), please contact your Class Dean. If your Class Dean notifies me of the issues, then we can arrange an accommodation.
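
To make the late-day arithmetic concrete, here is a small Python 3 sketch. It assumes the budget of 2 late days per semester and the cap of one late day per assignment stated in the Late Policy; the function name and the 11:59pm deadline are hypothetical choices for illustration, not part of the official policy.

```python
from datetime import datetime, timedelta

LATE_DAYS_PER_SEMESTER = 2  # semester budget stated in the Late Policy
MAX_LATE_DAYS_PER_LAB = 1   # cap on any one assignment


def extended_deadline(deadline, late_days, remaining=LATE_DAYS_PER_SEMESTER):
    """Return (new_deadline, remaining_late_days) after spending late days."""
    if not 0 <= late_days <= MAX_LATE_DAYS_PER_LAB:
        raise ValueError("at most one late day per assignment")
    if late_days > remaining:
        raise ValueError("no late days left this semester")
    # Each late day is a 24-hour extension from the original deadline.
    return deadline + timedelta(hours=24 * late_days), remaining - late_days


# Hypothetical lab due Thursday Feb 8, 2024 (11:59pm is an assumed cutoff).
due = datetime(2024, 2, 8, 23, 59)
new_due, left = extended_deadline(due, late_days=1)
print(new_due, left)  # 2024-02-09 23:59:00 1
```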


Academic Integrity

From the faculty:

In a community that thrives on relationships between students and faculty that are based on trust and respect, it is crucial that students understand a professor's expectations and what it means to do academic work with integrity. Plagiarism and cheating, even if unintentional, undermine the values of the Honor Code and the ability of all students to benefit from the academic freedom and relationships of trust the Code facilitates. Plagiarism is using someone else's work or ideas and presenting them as your own without attribution. Plagiarism can also occur in more subtle forms, such as inadequate paraphrasing, failure to cite another person's idea even if not directly quoted, failure to attribute the synthesis of various sources in a review article to that author, or accidental incorporation of another's words into your own paper as a result of careless note-taking. Cheating is another form of academic dishonesty, and it includes not only copying, but also inappropriate collaboration, exceeding the time allowed, and discussion of the form, content, or degree of difficulty of an exam. Please be conscientious about your work, and check with me if anything is unclear.

Please also note the CS Department Collaboration Policy.

More details for this course:

Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment.

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate on which parts of the assignment you received help, and what your sources were.

GitHub Copilot (or any other software for automatically generating code) is not allowed for this course until the final project. The reasoning behind this decision is that code generation tools often create code that is not well understood by the user. Often this code becomes incorrect in the larger context of the program. However, for the final project you are welcome to use GitHub Copilot, and you'll be asked to reflect on your experience.


Piazza

This semester we'll be using Piazza, an online Q&A forum for class discussion, help with labs, clarifications, and announcements. You will receive an email invitation to join CMSC H360 on Piazza. If you don't, please let me know.

Piazza is meant for questions outside of regular meeting times such as office hours, class, and lab. Please do not hesitate to ask and answer questions on Piazza, but keep in mind the following guidelines:

  1. Piazza should be used for ALL content and logistics questions outside of class, lab, and office hours. Please do not email me your code or extended questions about the assignments.
  2. If there is a personal issue that relates only to you, please email me.
  3. We encourage non-anonymous posts, but you may post anonymously (to your classmates, not the instructors).
  4. Do not post long blocks of code on Piazza - if you can distill the problem to 1-2 lines of code and an error message, that’s fine, but try to avoid giving out key components of your work.
  5. By the same token, when answering a question, try to give some guiding help but do not post code fixes or explicit solutions to the problem.
  6. Posting on Piazza counts toward your participation grade, both asking and answering!

Haverford Academic Accommodations Statement

For details about the accommodations process, visit the Access and Disability Services website.

We are committed to partnering with you on your academic and intellectual journey. We also recognize that your ability to thrive academically can be impacted by your personal well-being and that stressors may impact you over the course of the semester. If the stressors are academic, we welcome the opportunity to discuss and address those stressors with you in order to find solutions together. If you are experiencing challenges or questions related to emotional health, finances, physical health, relationships, learning strategies or differences, or other potential stressors, we hope you will consider reaching out to the many resources available on campus. These resources include CAPS (free and unlimited counseling is available), the Office of Academic Resources, Health Services, Professional Health Advocate, Religious and Spiritual Life, the Office of Multicultural Affairs, the GRASE Center, and the Dean’s Office. Additional information can be found here.

Additionally, Haverford College is committed to creating a learning environment that meets the needs of its diverse student body and providing equal access to students with a disability. If you have (or think you have) a learning difference or disability – including mental health, medical, or physical impairment – please contact the Office of Access and Disability Services (ADS) at hc-ads@haverford.edu. The Director will confidentially discuss the process to establish reasonable accommodations. It is never too late to request accommodations – our bodies and circumstances are continuously changing. Students who have already been approved to receive academic accommodations and want to use their accommodations in this course should share their accommodation letter and make arrangements to meet with me as soon as possible to discuss how their accommodations will be implemented in this course. Please note that accommodations are not retroactive and require advance notice in order to successfully implement.

If, at any point in the semester, a disability or personal circumstances affect your learning in this course or if there are ways in which the overall structure of the course and general classroom interactions could be adapted to facilitate full participation, please do not hesitate to reach out to us.

It is a state law in Pennsylvania that individuals must be given advance notice that they may be recorded. Therefore, any student who has a disability-related need to audio record this class must first be approved for this accommodation from the Director of Access and Disability Services and then must speak to me. Other class members need to be aware that this class may be recorded.

Haverford Title IX Statement

Haverford College is committed to fostering a safe and inclusive living and learning environment where all can feel secure and free from harassment. All forms of sexual misconduct, including sexual assault, sexual harassment, stalking, domestic violence, and dating violence are violations of Haverford’s policies, whether they occur on or off campus. Haverford faculty are committed to helping to create a safe learning environment for all students and for the College community as a whole. If you have experienced any form of gender or sex-based discrimination, harassment, or violence, know that help and support are available. Staff members are trained to support students in navigating campus life, accessing health and counseling services, providing academic and housing accommodations, and more.

The College strongly encourages all students to report any incidents of sexual misconduct. Please be aware that all Haverford employees (other than those designated as confidential resources such as counselors, clergy, and healthcare providers) are required to report information about such discrimination and harassment to the Bi-College Title IX Coordinator.

Information about the College’s Sexual Misconduct policy, reporting options, and a list of campus and local resources can be found on the College’s website here.


Official Python style guide
Python 3 Documentation
Atom editor