CMSC H260: Foundations of Data Science

(Fall 2021)


Course Information

Course: TuTh 10–11:30am
Professor: Sara Mathieson
Office: KINSC L302
Office hours: Tuesdays 3:30-5pm in H204
TAs: Trang Dang, Yuxuan Sun, Nasanbayar Ulzii-Orshikh
TA hours: TBD

The prerequisites for this course are Calculus I, Data Structures, and Discrete Mathematics (the last may be taken concurrently as a co-requisite).

This course will introduce core principles of learning from data. More and more decisions are being made by algorithms that operate on large datasets, and this course will give students the tools to understand and contribute to this process. Throughout, we will emphasize the ethical use of data and analyze case studies of how data science has intersected with society. This course will have a significant theory component, covering introductory linear algebra, probability, statistics, modeling, information theory, and optimization. However, we will also implement these ideas (in Python) and apply them to concrete datasets from a variety of fields (including images, video, text, DNA, music, art, etc.).

The language for this course is Python 3.

Textbook:

You do not need to purchase a textbook for this course. We will draw from several online textbooks, as well as supplemental online readings and research papers.


See the Schedule for each week's reading assignment. The schedule is tentative and subject to change throughout the semester.

Schedule (Tentative)

WEEK DAY ANNOUNCEMENTS TOPIC & READING LABS
Week 1 (Aug 31, Sep 02)

Introduction to Data Science and Python

  • What can we learn from data?
  • Representing data
  • Crash course on Python
  • Numpy
  • Matplotlib (plotting in Python)
  • Classes and objects in Python
  • Dictionaries

Reading:

  • MML Chap 1

Lab 1: Computing and plotting in Python

 
Week 2 (Sep 07, Sep 09)

Introduction to Modeling

  • What is a model?
  • Linear models
  • Polynomial models
  • Assessing model fit and complexity
  • Using models for prediction

Reading:

  • MML Chap 8.1, 8.2.1-8.2.3

Lab 2: Modeling climate change

 
Week 3 (Sep 14, Sep 16)

Announcement: Last day to drop (Sep 17)

Applied Linear Algebra and Optimization

  • Matrices and vectors
  • Representing data as a matrix
  • Matrix operations including dot products
  • Analytic solution for linear regression
  • Model fitting as a numerical optimization problem
  • Introduction to gradients
  • Gradient descent
  • Application to linear regression
  • Discussion of optimization in other contexts

Reading:

  • Daumé Chap 7.1-7.6
  • (optional) MML Chap 2.1-2.2, 2.5, 2.7.1
  • (optional) MML Chap 9-9.2
  • (optional) MML Chap 7-7.1

Lab 3: Gradient descent

Week 4 (Sep 21, Sep 23)

Evaluation Metrics

  • Precision and recall
  • Specificity and sensitivity
  • Confusion matrices
  • ROC curves
  • Introduction to probability
  • Bayes rule

Lab 4: Evaluation Metrics

 
Week 5 (Sep 28, Sep 30)

Ethics: Disparate Impact (+ review)

  • Probability in clinical trials
  • Introduction to algorithmic bias
  • Redundant encoding of protected features
  • Midterm I review

Midterm 1

 
Week 6 (Oct 05, Oct 07)

Announcement: Last day to pass/fail (Oct 08)

Probabilistic modeling I

  • Continue review
  • Begin: Naive Bayes algorithm

Reading:

  • MML Chap 6.1-6.3
  • Daumé Chap 9.1-9.4

Lab 5: Naive Bayes

 

Fall Break (Oct 12, Oct 14)

Week 7 (Oct 19, Oct 21)

Probabilistic modeling II

  • Continue Naive Bayes
  • Logistic regression

Lab 5 (cont.)

 
Week 8 (Oct 26, Oct 28)

Information theory

  • Introduction to information theory
  • Entropy
  • Coding theory
  • Discuss applications of entropy in machine learning

Lab 6: Information Theory

 
Week 9 (Nov 02, Nov 04)

Visualization

  • Principles of visualizing data
  • Discrete vs. continuous data
  • Types of graphs (bar chart, scatter plot, heatmap, etc.)
  • Visualizing graphs
  • Principal components analysis
  • Interactive visualization basics

Lab 7: Visualization + Project Proposal

 
Week 10 (Nov 09, Nov 11)

Introduction to statistics

  • Introduction to statistics
  • Hypothesis testing
  • p-values
  • Normal distributions

Lab 8: Statistics

 
Week 11 (Nov 16, Nov 18)

Midterm II review

  • Review

Midterm 2

 
Week 12 (Nov 23, Nov 25)

Unsupervised learning

  • Clustering
  • Dimensionality Reduction

Nov 25: Thanksgiving (no class)

Week 13 (Nov 30, Dec 02)

Intro to neural networks

  • Neural networks
  • Deep learning
  • Applications

Final Project

 
Week 14 (Dec 07, Dec 09)

Project Presentations

  • Final project presentations

 

Grading Policies

Grades will be weighted as follows:
35% Lab assignments
20% Midterm I
20% Midterm II
15% Final Project (including presentation)
10% Participation (including attendance and note-taking)
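
As a rough illustration of how these weights combine, the sketch below computes an overall percentage from per-component scores. Only the weights come from the list above. The component names, the weighted_grade helper, and the example scores are hypothetical, and the sketch assumes each component is scored on a 0-100 scale, which this page does not state.

```python
# Weights taken from the grading breakdown above. The dictionary keys are
# illustrative names, not official labels.
WEIGHTS = {
    "labs": 0.35,
    "midterm_1": 0.20,
    "midterm_2": 0.20,
    "final_project": 0.15,
    "participation": 0.10,
}


def weighted_grade(scores):
    """Return the weighted overall percentage.

    Assumes every component score is a percentage between 0 and 100
    (an assumption made for this sketch).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical scores, for illustration only.
example_scores = {
    "labs": 90,
    "midterm_1": 80,
    "midterm_2": 85,
    "final_project": 95,
    "participation": 100,
}
print(round(weighted_grade(example_scores), 2))  # prints 88.75
```

The weighted total is not the only requirement: as stated under Quizzes and Exams, you must also pass at least one exam to pass the course.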

Quizzes and Exams

In lieu of reading quizzes this semester, we will have short exercises during class (to work on and discuss, not turn in). Be ready to work on these exercises by completing the weekly reading before class on Thursdays.

There will be two midterms (each is timed, but you will have several days in which to choose your window). In lieu of a final exam, there will be a final project and associated presentation. You must pass at least one exam to pass the course overall.

Labs

Our labs are on Thursdays. Lab assignments will generally be released Tuesday night and due the following Tuesday at midnight. You are expected to read/begin the lab before your lab section on Thursday. Lab attendance is required, and missing labs will quickly affect your participation grade. There will sometimes be pair-programming warm-up exercises as part of the lab, and lab in general is a time to build community around the course and the material. Note that Wednesday is my research day and I will be off campus and unable to answer lab questions. Make use of office hours (both mine and the TAs') and Piazza.

Weekly Lab Sessions
Lab A 12–1pm Thursdays Mathieson H110
Lab B 1–2pm Thursdays Mathieson H110

Handing in labs: Lab assignments are submitted electronically and managed using git. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Some of the programming/lab assignments may be in pairs. There may also be some written assignments that will have specific instructions for handing in.


Late Policy: Each individual will be given 4 late days for the semester. A late day is a 24-hour extension from the original deadline. You can use up to two late days on any one assignment. These late days cover any reason: illness, interviews, many midterms in the same week, etc. Once your late days are used up, late assignments will not be accepted. You should budget your days to account for future illnesses or assignment deadlines for other courses. Even if you do not fully complete a lab assignment, you should submit what you have done to receive partial credit. Late days count against both partners in a group lab.

For extensions beyond these 4 late days (in the case of an emergency or ongoing personal issue), please contact your Class Dean. If your Class Dean notifies me of the issues, then we can arrange an accommodation.
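
The late-day rules above can be sketched in Python as a quick way to check your own budget. This is only an illustration, not an official tool: the function names and the example deadline are hypothetical, and the sketch assumes that any fraction of a 24-hour period counts as a full late day, which the policy does not spell out.

```python
import math
from datetime import datetime, timedelta

MAX_LATE_DAYS_TOTAL = 4            # per semester, from the policy above
MAX_LATE_DAYS_PER_ASSIGNMENT = 2   # per assignment, from the policy above
SECONDS_PER_DAY = 24 * 60 * 60


def late_days_needed(deadline, submitted):
    """Number of late days a submission would use.

    Assumption for this sketch: a partial 24-hour period is rounded up
    to a whole late day.
    """
    if submitted <= deadline:
        return 0
    seconds_late = (submitted - deadline).total_seconds()
    return math.ceil(seconds_late / SECONDS_PER_DAY)


def is_accepted(deadline, submitted, late_days_used_so_far):
    """True if the submission fits both the per-assignment and semester limits."""
    needed = late_days_needed(deadline, submitted)
    if needed > MAX_LATE_DAYS_PER_ASSIGNMENT:
        return False
    return late_days_used_so_far + needed <= MAX_LATE_DAYS_TOTAL


# Hypothetical deadline, for illustration only.
deadline = datetime(2021, 9, 21, 23, 59)
three_hours_late = deadline + timedelta(hours=3)
print(late_days_needed(deadline, three_hours_late))   # prints 1
print(is_accepted(deadline, three_hours_late, 3))     # prints True
```

For a group lab, the same count would be charged to each partner, since late days count against both.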


Academic Integrity

From the faculty:

In a community that thrives on relationships between students and faculty that are based on trust and respect, it is crucial that students understand a professor's expectations and what it means to do academic work with integrity. Plagiarism and cheating, even if unintentional, undermine the values of the Honor Code and the ability of all students to benefit from the academic freedom and relationships of trust the Code facilitates. Plagiarism is using someone else's work or ideas and presenting them as your own without attribution. Plagiarism can also occur in more subtle forms, such as inadequate paraphrasing, failure to cite another person's idea even if not directly quoted, failure to attribute the synthesis of various sources in a review article to that author, or accidental incorporation of another's words into your own paper as a result of careless note-taking. Cheating is another form of academic dishonesty, and it includes not only copying, but also inappropriate collaboration, exceeding the time allowed, and discussion of the form, content, or degree of difficulty of an exam. Please be conscientious about your work, and check with me if anything is unclear.

Please also note the CS Department Collaboration Policy.

More details for this course:

Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment.

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment you received help on, and what your sources were.


Piazza

This semester we'll be using Piazza, an online Q&A forum for class discussion, help with labs, clarifications, and announcements. You will receive an email invitation to join CMSC H260 on Piazza. If you don't, please let me know.

Piazza is meant for questions outside of regular meeting times such as office hours, class, and lab. Please do not hesitate to ask and answer questions on Piazza, but keep in mind the following guidelines:

  1. Piazza should be used for ALL content and logistics questions outside of class, lab, and office hours. Please do not email me your code or extended questions about the assignments.
  2. If there is a personal issue that relates only to you, please email me.
  3. We encourage non-anonymous posts, but you may post anonymously (to your classmates, not the instructors).
  4. Do not post long blocks of code on Piazza - if you can distill the problem to 1-2 lines of code and an error message, that’s fine, but try to avoid giving out key components of your work.
  5. By the same token, when answering a question, try to give some guiding help but do not post code fixes or explicit solutions to the problem.
  6. Posting on Piazza counts toward your participation grade, both asking and answering!

Haverford Academic Accommodations Statement

For details about the accommodations process, visit the Access and Disability Services website.

We are committed to partnering with you on your academic and intellectual journey. We also recognize that your ability to thrive academically can be impacted by your personal well-being and that stressors may impact you over the course of the semester. If the stressors are academic, we welcome the opportunity to discuss and address those stressors with you in order to find solutions together. If you are experiencing challenges or questions related to emotional health, finances, physical health, relationships, learning strategies or differences, or other potential stressors, we hope you will consider reaching out to the many resources available on campus. These resources include CAPS (free and unlimited counseling is available), the Office of Academic Resources, Health Services, Professional Health Advocate, Religious and Spiritual Life, the Office of Multicultural Affairs, the GRASE Center, and the Dean’s Office. Additional information can be found here.

Additionally, Haverford College is committed to creating a learning environment that meets the needs of its diverse student body and providing equal access to students with a disability. If you have (or think you have) a learning difference or disability – including mental health, medical, or physical impairment – please contact the Office of Access and Disability Services (ADS) at hc-ads@haverford.edu. The Director will confidentially discuss the process to establish reasonable accommodations. It is never too late to request accommodations – our bodies and circumstances are continuously changing. Students who have already been approved to receive academic accommodations and want to use their accommodations in this course should share their accommodation letter and make arrangements to meet with me as soon as possible to discuss how their accommodations will be implemented in this course. Please note that accommodations are not retroactive and require advance notice in order to successfully implement.

If, at any point in the semester, a disability or personal circumstances affect your learning in this course or if there are ways in which the overall structure of the course and general classroom interactions could be adapted to facilitate full participation, please do not hesitate to reach out to us.

It is a state law in Pennsylvania that individuals must be given advance notice that they may be recorded. Therefore, any student who has a disability-related need to audio record this class must first be approved for this accommodation from the Director of Access and Disability Services and then must speak to me. Other class members need to be aware that this class may be recorded.

Haverford Title IX Statement

Haverford College is committed to fostering a safe and inclusive living and learning environment where all can feel secure and free from harassment. All forms of sexual misconduct, including sexual assault, sexual harassment, stalking, domestic violence, and dating violence are violations of Haverford’s policies, whether they occur on or off campus. Haverford faculty are committed to helping to create a safe learning environment for all students and for the College community as a whole. If you have experienced any form of gender or sex-based discrimination, harassment, or violence, know that help and support are available. Staff members are trained to support students in navigating campus life, accessing health and counseling services, providing academic and housing accommodations, and more.

The College strongly encourages all students to report any incidents of sexual misconduct. Please be aware that all Haverford employees (other than those designated as confidential resources such as counselors, clergy, and healthcare providers) are required to report information about such discrimination and harassment to the Bi-College Title IX Coordinator.

Information about the College’s Sexual Misconduct policy, reporting options, and a list of campus and local resources can be found on the College’s website here.


Links

Official Python style guide
Python 3 Documentation
Atom editor