CMSC H364: Computational Biology

(Fall 2024)


Course Information

Course: TuTh 1–2:30pm
Professor: Sara Mathieson
Office: KINSC L302
Office hours: Mondays 4-5pm in H110
TA: Ella Manning

The prerequisite for this course is CS260 Foundations of Data Science. There are no biology prerequisites for this course. The goal of this course is to introduce foundational algorithms that have become essential for learning from biological data. With the genome sequencing revolution of the last 20 years, it has become easier and cheaper to obtain genetic data, but often overwhelming to store, analyze, and make sense of this data. These issues have both driven new algorithm development and repurposed existing algorithms for biology.

We will study both types of algorithms, with a focus on the scientific method. By the end of this course, you should be able to ask a biological question, form a hypothesis about the answer, design a computational experiment to test your hypothesis, implement and execute the experiment, iterate your design and implementation based on the results, and finally interpret and visualize the results to form a biologically relevant conclusion. We will focus on synthetic and publicly available datasets, not generating new data.

The language for this course is Python 3.

Textbook:

We will primarily be using the book Biological Sequence Analysis by Durbin, Eddy, Krogh, and Mitchison.



Goals for the course:


See the Schedule for each week's reading assignment, which will often be supplemented with other material and optional research papers. The schedule is subject to change throughout the semester.

Schedule (Tentative)

Week 1 (Sep 03, Sep 05)
Topic: Introduction to Bioinformatics and Molecular Biology
  • Central Dogma of molecular biology
  • Basics of evolution
  • History of sequencing
  • Example applications and goals of computational biology
  • Begin: string search

Week 2 (Sep 10, Sep 12)
Topic: BWT and Read Mapping
  • Burrows-Wheeler transform
  • Application to read mapping
  • Variant calling
Labs: Lab 1: String search (due Monday Sept 9 at 10pm)

Week 3 (Sep 17, Sep 19)
Topic: Genome Assembly
  • Graph algorithms for genome assembly
  • Overlap graphs
  • De Bruijn graphs
Labs: Lab 2: BWT and read mapping (due Monday Sept 16 at 10pm)
Announcements: Drop ends (Sep 20)

Week 4 (Sep 24, Sep 26)
Topic: Pairwise Sequence Alignment
  • Dynamic programming
  • Global pairwise sequence alignment
  • Local pairwise sequence alignment
Reading:
  • (required) Durbin: 2.1-2.3
  • (optional) MUMmer paper
  • (optional) Durbin: rest of Ch. 2
Labs: Lab 3: Genome assembly (due Monday Sept 23 at 10pm)

Week 5 (Oct 01, Oct 03)
Topic: Phylogenetic Trees 1
  • Phylogenetic trees
  • Mutation generates variation
  • UPGMA algorithm
Reading:
  • (required) Durbin: Chap 7 (p. 161-170)
Labs: Lab 4: Pairwise sequence alignment (due Tuesday Oct 1 at 11:59pm)

Week 6 (Oct 08, Oct 10)
Topic: Midterm Review
  • Midterm review
In-class Midterm 1: Thursday Oct 10

Fall Break (Oct 15, Oct 17)

Week 7 (Oct 22, Oct 24)
Topic: Phylogenetic Trees 2
  • Neighbor joining algorithm
  • Bayesian phylogenetic methods
Reading:
  • (required) Durbin: Chap 7 (p. 170-173)

Week 8 (Oct 29, Oct 31)
Topic: Ancestral State Reconstruction
  • Fitch's algorithm
  • Sankoff's algorithm
  • Perfect phylogeny
Labs: Lab 5: Phylogenetic trees (due Monday Oct 28 at 11:59pm)

Week 9 (Nov 05, Nov 07)
Topic: Population Genetics
  • Wright-Fisher model
  • Measures of sequence diversity
  • The Coalescent
  • Natural selection inference
Labs: Lab 6: Perfect phylogeny (due Monday Nov 4)

Week 10 (Nov 12, Nov 14)
Topic: Hidden Markov Models 1
  • Markov chains
  • Conditional probability
  • Viterbi algorithm
Reading:
  • (required) Durbin: Chap 3 (p. 47-58)
Labs: Lab 7: Population genetics (due Monday Nov 11)

Week 11 (Nov 19, Nov 21)
Topic: Hidden Markov Models 2
  • Forward-backward algorithm
  • EM for HMMs (Baum-Welch algorithm)
  • Applications of HMMs in biology
Reading:
  • (required) Durbin: Chap 3 (p. 58-73)
Labs: Lab 8: Hidden Markov Models (due Monday Nov 18)

Week 12 (Nov 26, Nov 28)
Topic: Visualizing Genomes
  • Principal Components Analysis (PCA)
  • Variational Autoencoders (VAE)
  • Applications of PCA and VAE in genomics
In-class midterm 2
Announcements: Thanksgiving (no class) on Nov 28

Week 13 (Dec 03, Dec 05)
Topic: Deep Learning in Genomics
  • Applications of neural networks in genomics
  • Inferring evolutionary parameters
  • Approximate Bayesian Computation (ABC)
Labs: Final Project

Week 14 (Dec 10, Dec 12)
Topic: Ethics and the Genome + Project Presentations
  • Genomic privacy
  • Genome sequencing companies
  • Pre-natal genomic testing
  • Final project presentations

Grading Policies

Grades will be weighted as follows:
35% Lab assignments
20% Midterm I
20% Midterm II
15% Final Project (including presentation)
10% Participation (including attendance and note-taking)
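
As a rough illustration of how these weights combine, the sketch below computes a weighted course percentage in Python. The function name, the dictionary keys, and the example scores are hypothetical. It assumes every component is scored on a 0-100 scale, and it does not model the requirement to pass at least one exam or any rounding applied to final grades.

```python
# Weights taken from the grading breakdown above; they sum to 100%.
WEIGHTS = {
    "labs": 0.35,
    "midterm1": 0.20,
    "midterm2": 0.20,
    "final_project": 0.15,
    "participation": 0.10,
}


def weighted_score(scores):
    """Return the weighted course percentage.

    `scores` maps each component name in WEIGHTS to a percentage (0-100).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "labs": 90,
    "midterm1": 80,
    "midterm2": 85,
    "final_project": 95,
    "participation": 100,
}
print(round(weighted_score(example), 2))
```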

Quizzes and Exams

In lieu of reading quizzes this semester, we will have short exercises during class (to work on and discuss, not turn in). Be ready to work on these exercises by completing the weekly reading.

There will be two midterms (in-class). In lieu of a final exam, there will be a final project, with an associated presentation and writeup. You must pass at least one exam to pass the course overall.

Labs

Our labs are on Thursdays in H110. The machines in this classroom are equipped with the necessary software for this course. You are welcome to use your own machine, but we will not have the bandwidth to troubleshoot personal laptop issues. Lab assignments will generally be released Monday night and due the following Monday at 10pm, though there may be some variation. There will be an introduction to the assignment on Thursday during lab. Lab attendance is required, and missing labs will quickly affect your participation grade. There will sometimes be pair-programming warm-up exercises as part of the lab, and lab in general is a time to build community around the course and the material. Note that on Fridays I will be doing research off campus and will be unable to answer lab questions. Make use of office hours and Piazza.

Weekly Lab Sessions
Lab A 10:30–11:30am Thursdays Mathieson H110
Lab B 11:30am–12:30pm Thursdays Mathieson H110

Handing in labs: Lab assignments are submitted electronically and managed using GitHub Classroom. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Some of the programming/lab assignments may be in pairs. There may also be some written assignments that will have specific instructions for handing in.


Late Policy: Each individual will be given 2 late days for the semester. A late day is a 24-hour extension from the original deadline. You can use up to one late day on any one assignment. This will encompass any reason - illness, interviews, many midterms in the same week, etc. Past these days, late assignments will not be accepted. You should budget your days to account for future illnesses or assignment deadlines for other courses. Even if you do not fully complete a lab assignment, you should submit what you have done to receive partial credit. Late days count against both partners in a group lab.

For extensions beyond these 2 late days (in the case of an emergency or ongoing personal issue), please contact your Class Dean. If your Class Dean notifies me of the issues, then we can arrange an accommodation.


Academic Integrity

From the faculty:

In a community that thrives on relationships between students and faculty that are based on trust and respect, it is crucial that students understand a professor's expectations and what it means to do academic work with integrity. Plagiarism and cheating, even if unintentional, undermine the values of the Honor Code and the ability of all students to benefit from the academic freedom and relationships of trust the Code facilitates. Plagiarism is using someone else's work or ideas and presenting them as your own without attribution. Plagiarism can also occur in more subtle forms, such as inadequate paraphrasing, failure to cite another person's idea even if not directly quoted, failure to attribute the synthesis of various sources in a review article to that author, or accidental incorporation of another's words into your own paper as a result of careless note-taking. Cheating is another form of academic dishonesty, and it includes not only copying, but also inappropriate collaboration, exceeding the time allowed, and discussion of the form, content, or degree of difficulty of an exam. Please be conscientious about your work, and check with me if anything is unclear.

Please also note the CS Department Collaboration Policy.

More details for this course:

Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment.

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. All code you submit must be your own, with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate on which parts of the assignment you received help, and what your sources were.

GitHub Copilot (or any other software for automatically generating code) *is allowed* for this course, but you must still understand the code you are submitting. You should also include a comment in your code indicating any AI tools you used. We will be talking about how to best make use of these types of tools, and I recommend using them to help complete short code fragments, not generate entire solutions. All submitted code must be thoroughly understood, and exams will include demonstrating that you deeply understand the algorithms we're implementing.
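
A minimal sketch of what such attribution comments might look like, using a naive string search as a stand-in for lab code. The header layout, the function name `naive_search`, and the listed sources are all hypothetical. If a lab's instructions specify a format, follow that instead.

```python
"""
Lab 1: String search (illustrative file header; this layout is an assumption)
Author: <your name>

Sources and help received:
  - Partner: <assigned partner's name>, worked together on naive_search
  - AI tools: GitHub Copilot suggested the loop skeleton in naive_search;
    I reviewed and tested it myself
"""


def naive_search(text, pattern):
    """Return every start index at which pattern occurs in text."""
    matches = []
    # AI-assisted (hypothetical note): loop skeleton suggested by Copilot,
    # then checked by hand against small examples.
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i + len(pattern)] == pattern:
            matches.append(i)
    return matches


print(naive_search("GATTACAGATTACA", "TTA"))
```

Placing the attribution next to the code it applies to, as well as in the file header, makes it easier to see which parts received help.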


Piazza

This semester we'll be using Piazza, an online Q&A forum for class discussion, help with labs, clarifications, and announcements. You will receive an email invitation to join CMSC H364 on Piazza. If you don't, please let me know.

Piazza is meant for questions outside of regular meeting times such as office hours, class, and lab. Please do not hesitate to ask and answer questions on Piazza, but keep in mind the following guidelines:

  1. Piazza should be used for ALL content and logistics questions outside of class, lab, and office hours. Please do not email me your code or extended questions about the assignments.
  2. If there is a personal issue that relates only to you, please email me.
  3. We encourage non-anonymous posts, but you may post anonymously (to your classmates, not the instructors).
  4. Do not post long blocks of code on Piazza - if you can distill the problem to 1-2 lines of code and an error message, that’s fine, but try to avoid giving out key components of your work.
  5. By the same token, when answering a question, try to give some guiding help but do not post code fixes or explicit solutions to the problem.
  6. Posting on Piazza counts toward your participation grade, both asking and answering!

Haverford Academic Accommodations Statement

For details about the accommodations process, visit the Access and Disability Services website.

We are committed to partnering with you on your academic and intellectual journey. We also recognize that your ability to thrive academically can be impacted by your personal well-being and that stressors may impact you over the course of the semester. If the stressors are academic, we welcome the opportunity to discuss and address those stressors with you in order to find solutions together. If you are experiencing challenges or questions related to emotional health, finances, physical health, relationships, learning strategies or differences, or other potential stressors, we hope you will consider reaching out to the many resources available on campus. These resources include CAPS (free and unlimited counseling is available), the Office of Academic Resources, Health Services, Professional Health Advocate, Religious and Spiritual Life, the Office of Multicultural Affairs, the GRASE Center, and the Dean’s Office. Additional information can be found here.

Additionally, Haverford College is committed to creating a learning environment that meets the needs of its diverse student body and providing equal access to students with a disability. If you have (or think you have) a learning difference or disability – including mental health, medical, or physical impairment – please contact the Office of Access and Disability Services (ADS) at hc-ads@haverford.edu. The Director will confidentially discuss the process to establish reasonable accommodations. It is never too late to request accommodations – our bodies and circumstances are continuously changing. Students who have already been approved to receive academic accommodations and want to use their accommodations in this course should share their accommodation letter and make arrangements to meet with me as soon as possible to discuss how their accommodations will be implemented in this course. Please note that accommodations are not retroactive and require advance notice in order to successfully implement.

If, at any point in the semester, a disability or personal circumstances affect your learning in this course or if there are ways in which the overall structure of the course and general classroom interactions could be adapted to facilitate full participation, please do not hesitate to reach out to us.

It is a state law in Pennsylvania that individuals must be given advance notice that they may be recorded. Therefore, any student who has a disability-related need to audio record this class must first be approved for this accommodation from the Director of Access and Disability Services and then must speak to me. Other class members need to be aware that this class may be recorded.

Haverford Title IX Statement

Haverford College is committed to fostering a safe and inclusive living and learning environment where all can feel secure and free from harassment. All forms of sexual misconduct, including sexual assault, sexual harassment, stalking, domestic violence, and dating violence are violations of Haverford’s policies, whether they occur on or off campus. Haverford faculty are committed to helping to create a safe learning environment for all students and for the College community as a whole. If you have experienced any form of gender or sex-based discrimination, harassment, or violence, know that help and support are available. Staff members are trained to support students in navigating campus life, accessing health and counseling services, providing academic and housing accommodations, and more.

The College strongly encourages all students to report any incidents of sexual misconduct. Please be aware that all Haverford employees (other than those designated as confidential resources such as counselors, clergy, and healthcare providers) are required to report information about such discrimination and harassment to the Bi-College Title IX Coordinator.

Information about the College’s Sexual Misconduct policy, reporting options, and a list of campus and local resources can be found on the College’s website here.


Official Python style guide
Python 3 Documentation
Atom editor