CS68: Bioinformatics

(Spring 2018)


Course Information

Course: MWF 10:30–11:20, Science Center 181
Professor: Sara Mathieson
Office: Science Center 260
Office hours: Monday 3-5pm and Wednesday 1-3pm
Piazza: CS68 Q&A forum

The prerequisite for this course is CS35. There are no biology prerequisites for this course. The goal of this course is to introduce foundational algorithms that have become essential for learning from biological data. With the genome sequencing revolution of the last 20 years, it has become easier and cheaper to obtain genetic data, but storing, analyzing, and making sense of these data can be overwhelming. These challenges have driven both the development of new algorithms and the repurposing of existing algorithms for biology.

We will study both types of algorithms, with a focus on the scientific method. By the end of this course, you should be able to ask a biological question, form a hypothesis about the answer, design a computational experiment to test your hypothesis, implement and execute the experiment, iterate your design and implementation based on the results, and finally interpret the results to form a biologically relevant conclusion. We will focus on synthetic and publicly available datasets, not generating new data.

The language for this course is Python 3.

Textbook:

We will primarily be using the book Biological Sequence Analysis by Durbin, Eddy, Krogh, and Mitchison.


See the Schedule for each week's reading assignment, which will often be supplemented with other material and optional research papers.

Goals for the course:


Schedule (Tentative)

WEEK | DAY | ANNOUNCEMENTS | TOPIC & READING | LABS
1

Jan 22

 

Introduction to Bioinformatics and Molecular Biology

  • Central Dogma of molecular biology
  • Basics of evolution
  • History of sequencing
  • Example applications and goals of computational biology

Reading:

Mon:

Wed:

Fri:

Lab 1: Working with sequences

Jan 24

 

Jan 26

 
2

Jan 29

 

Genome Assembly

  • Graph algorithms for genome assembly

Reading:

Mon:

Wed:

Fri:

Lab 2: Genome assembly

Jan 31

 

Feb 02

Drop/add ends

3

Feb 05

 

Pairwise Sequence Alignment

  • Dynamic programming
  • Global pairwise sequence alignment
  • Local pairwise sequence alignment

Reading:

  • (required) Durbin: 2.1-2.3
  • (optional) MUMmer paper
  • (optional) Durbin: rest of Ch. 2

Mon:

Wed:

Fri:

Lab 3: Pairwise sequence alignment

Feb 07

 

Feb 09

 
4

Feb 12

 

BWT and Read Mapping

  • Burrows-Wheeler transform
  • Application to read mapping
  • Variant calling

Reading:

Mon:

Wed:

Fri:

Lab 4: BWT and read mapping

Feb 14

 

Feb 16

 
5

Feb 19

 

Phylogenetic Trees

  • Phylogenetic trees
  • Mutation generates variation
  • UPGMA

Reading:

  • (required) Durbin: Chap 7 (p. 161-170)

Mon:

Wed:

Fri:

  • No class: at conference

In-lab practice midterm

Feb 21

 

Feb 23

 
6

Feb 26

 

Phylogenetic Trees

  • Neighbor joining

Reading:

  • (required) Durbin: Chap 7 (p. 170-173)

Mon:

Wed:

Fri:

In-lab Midterm 1

Feb 28

 

Mar 02

 
7

Mar 05

 

Ancestral state reconstruction

  • Fitch's algorithm
  • Sankoff's algorithm
  • Perfect phylogeny

Reading:

Mon:

Wed:

Fri:

Lab 5: Phylogenetic Trees

Mar 07

 

Mar 09

 
 

Mar 12

Spring Break

Mar 14

Mar 16

8

Mar 19

 

Population Genetics

  • Wright-Fisher model
  • Measures of sequence diversity
  • The Coalescent
  • Tajima's D

Reading:

Mon:

Wed:

Fri:

Lab 6: Perfect Phylogeny

Mar 21

 

Mar 23

 
9

Mar 26

 

Hidden Markov Models 1

  • Markov chains
  • Conditional probability
  • Viterbi algorithm

Reading:

  • (required) Durbin: Chap 3 (p. 47-58)

Mon:

Wed:

Fri:

Lab 7: Population Genetics

Mar 28

 

Mar 30

CR/NC/W Deadline

10

Apr 02

 

Hidden Markov Models 2

  • Forward-backward algorithm
  • EM for HMMs (Baum-Welch algorithm)
  • Applications of HMMs in biology

Reading:

  • (required) Durbin: Chap 3 (p. 58-73)

Mon:

Wed:

Fri:

Lab 8: Hidden Markov Models
In-lab notes

Apr 04

 

Apr 06

 
11

Apr 09

 

Principal Components Analysis

  • Human evolution overview
  • PCA method details
  • Application of PCA to human data

Reading:

Mon:

Wed:

Fri:

Lab 9: PCA
In-lab notes

Apr 11

 

Apr 13

 
12

Apr 16

 

Midterm Review

  • Midterm 2 review

Mon:

Wed:

Fri:

Project: Proposal

Apr 18

 

Apr 20

 
13

Apr 23

 

Special topics: GWAS and Deep Learning

  • Disease association studies
  • Applications of neural networks in genomics
  • Inferring evolutionary parameters
  • Approximate Bayesian Computation (ABC)

Mon:

Wed:

Fri:

In-lab Midterm 2

Apr 25

 

Apr 27

 
14

Apr 30

 

Special topic: Ethics and the Genome

  • Prenatal genomic testing
  • Genomic privacy
  • Genome sequencing companies

Reading:

Mon:

Wed:

Fri:

Project: Presentation

May 02

 

May 04
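For a sense of the algorithms on the schedule, here is a minimal Python 3 sketch of the dynamic programming recurrence for global pairwise alignment (Week 3; Durbin 2.1-2.3). It computes only the optimal score. The scoring scheme (match +1, mismatch -1, linear gap penalty -2) is an illustrative assumption, not the substitution matrices used in Durbin or in the labs, and the function name is hypothetical rather than lab starter code.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of x and y with a linear gap penalty.

    Hypothetical helper; scoring parameters are illustrative assumptions.
    """
    n, m = len(x), len(y)
    # F[i][j] = best score aligning prefix x[:i] with prefix y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align x[i-1] with y[j-1]
                          F[i - 1][j] + gap,    # x[i-1] against a gap
                          F[i][j - 1] + gap)    # y[j-1] against a gap
    return F[n][m]

print(global_alignment_score("GATTACA", "GATCA"))  # 1
```

Recovering the alignment itself requires a traceback through F from the bottom-right cell, which this sketch omits.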

 

Grading Policies

Grades will be weighted as follows:
35%   Lab assignments
40%   In-lab midterms (20% each)
15%   Final project
10%   Participation
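The weighted average behind this table can be sketched in Python 3. The category scores below are hypothetical, and the sketch assumes each category is already expressed as a percentage, which this policy does not specify.

```python
# Weights from the grading table, as fractions of the final grade.
WEIGHTS = {
    "labs": 0.35,
    "midterm1": 0.20,  # two in-lab midterms, 20% each
    "midterm2": 0.20,
    "project": 0.15,
    "participation": 0.10,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights cover 100%

def weighted_grade(scores):
    """Combine per-category percentages (0-100) into one weighted grade."""
    return sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS)

# Hypothetical category scores, for illustration only.
example = {"labs": 90, "midterm1": 75, "midterm2": 85,
           "project": 95, "participation": 100}
print(round(weighted_grade(example), 2))  # 87.75
```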

Exams

There will be two midterms, given in lab, as shown on the Schedule. Let me know as soon as possible if you have a conflict with one of the exams.

We will not have a final exam, but we will use the final exam slot for project presentations. The date and time of the final exam slot will be announced later in the semester.

Lab Policy

Our labs are on Thursdays, and lab assignments will be due the following Wednesday at midnight. Lab attendance is required, and missed labs will quickly lower your participation grade.

Weekly Lab Sessions
Section  Time           Day        Instructor  Room
CS68 A   1:05–2:35pm    Thursdays  Mathieson   Clothier 016
CS68 B   2:45–4:15pm    Thursdays  Mathieson   Clothier 016

Handing in labs: Lab assignments are submitted electronically and managed using git. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Most of the programming/lab assignments will be in pairs. There may also be some written assignments that will have specific instructions for handing in.


Late Policy: Each student has 2 late days for the semester, as per the CS department policy. A late day is a 24-hour extension from the original deadline. You can use one day on each of two assignments or both days on one assignment. Late days cover any reason: illness, interviews, several midterms in the same week, etc. Once your late days are used up, late assignments will not be accepted, so budget your days to account for future illnesses or assignment deadlines in other courses. Even if you do not fully complete a lab assignment, you should submit what you have done to receive partial credit. Late days count against both partners in a group lab.
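The late-day arithmetic can be sketched in Python 3. This is a hypothetical helper, not part of any course tooling. It assumes that submitting even slightly into a 24-hour period uses a whole late day, which is one reading of the policy above; the deadline, submission time, and partner names are made up.

```python
import math
from datetime import datetime, timedelta

LATE_DAY_BUDGET = 2  # late days per student for the whole semester

def late_days_used(deadline, submitted):
    """Return how many 24-hour late days a submission consumes.

    Assumption: any part of a 24-hour period past the deadline
    counts as one full late day.
    """
    delay = submitted - deadline
    if delay <= timedelta(0):
        return 0  # on time: no late days used
    return math.ceil(delay / timedelta(hours=24))

# Hypothetical deadline and submission time, for illustration only.
deadline = datetime(2018, 2, 14, 0, 0)
submitted = datetime(2018, 2, 15, 1, 0)  # 25 hours late
used = late_days_used(deadline, submitted)

# Late days count against both partners in a group lab.
remaining = {"partner_a": LATE_DAY_BUDGET, "partner_b": LATE_DAY_BUDGET}
for partner in remaining:
    remaining[partner] -= used
print(used, remaining)  # 2 {'partner_a': 0, 'partner_b': 0}
```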

For extensions beyond these 2 late days (in the case of an emergency or ongoing personal issue), please contact your Class Dean. If your Class Dean notifies me of the issue, we can arrange an accommodation.


Academic Integrity

Academic honesty is required in all your work. Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course and using code found online. You may not share solutions after the due date of the assignment.

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. All code you submit must be your own, with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate which parts of the assignment you received help on and what your sources were.

Failure to abide by these rules constitutes academic dishonesty and will lead to a hearing of the College Judiciary Committee. According to the Faculty Handbook: "Because plagiarism is considered to be so serious a transgression, it is the opinion of the faculty that for the first offense, failure in the course and, as appropriate, suspension for a semester or deprivation of the degree in that year is suitable; for a second offense, the penalty should normally be expulsion."

The spirit of this policy applies to all course work, including code, homework solutions (e.g., proofs, analysis, written reports), and exams. Please contact me if you have any questions about what is permissible in this course.


Piazza

This semester we’ll be using Piazza, an online Q&A forum for class discussion, help with labs, clarifications, and announcements. You should have received an email invitation to join CS68 on Piazza. If you didn't, please let me know.

Piazza is meant for questions outside of regular meeting times such as office hours, class, and lab. Please do not hesitate to ask and answer questions on Piazza, but keep in mind the following guidelines:

  1. Piazza should be used for ALL content and logistics questions outside of class, lab, and office hours. Please do not email me your code or questions about the assignments.
  2. If there is a personal issue that relates only to you, please email me.
  3. We encourage non-anonymous posts, but you may post anonymously (your name will be hidden from your classmates, but not from the instructors).
  4. Do NOT post long blocks of code on Piazza. If you can distill the problem to 1-2 lines of code and an error message, that’s fine, but try to avoid giving out key components of your work.
  5. By the same token, when answering a question, try to give some guiding help but do not post code fixes or explicit solutions to the problem.
  6. Posting on Piazza counts toward your participation grade, both asking and answering!

Academic Accommodations

If you believe that you need accommodations for a disability, please contact the Office of Student Disability Services (Parrish 113W) or email studentdisabilityservices at swarthmore.edu to arrange an appointment to discuss your needs. As appropriate, the Office will issue students with documented disabilities a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact the Office as soon as possible. For details about the accommodations process, visit the Student Disability Service website.

To receive an accommodation for a course activity, you must have an Accommodation Authorization letter from the Office of Student Disability Services and you need to meet with me to work out the details of your accommodation at least one week prior to the activity.

You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged through the Office of Student Disability Services.


Links

Python style guide (from Prof. Tia Newhall)
Official Python style guide
Python 3.5 Documentation
Atom editor
Remote access with atom