CS68: Bioinformatics

(Spring 2018)


Course Information

Course: MWF 10:30–11:20, Science Center 181
Professor: Sara Mathieson
Office: Science Center 260
Office hours: Monday 3-5pm and Wednesday 1-3pm
Piazza: CS68 Q&A forum

The prerequisite for this course is CS35; there are no biology prerequisites. The goal of this course is to introduce foundational algorithms that have become essential for learning from biological data. With the genome sequencing revolution of the last 20 years, it has become easier and cheaper to obtain genetic data, but storing, analyzing, and making sense of this data is often overwhelming. These challenges have both driven new algorithm development and repurposed existing algorithms for biology.

We will study both types of algorithms, with a focus on the scientific method. By the end of this course, you should be able to ask a biological question, form a hypothesis about the answer, design a computational experiment to test your hypothesis, implement and execute the experiment, iterate on your design and implementation based on the results, and finally interpret the results to form a biologically relevant conclusion. We will focus on synthetic and publicly available datasets rather than generating new data.

The language for this course is Python 3.


We will primarily be using the book Biological Sequence Analysis by Durbin, Eddy, Krogh, and Mitchison.

See the Schedule for each week's reading assignment, which will often be supplemented with other material and optional research papers.

Goals for the course:

Schedule (Tentative)


Week of Jan 22 (Jan 22, Jan 24, Jan 26)
Topic: Introduction to Bioinformatics and Molecular Biology
  • Central Dogma of molecular biology
  • Basics of evolution
  • History of sequencing
  • Example applications and goals of computational biology
Lab: Lab 1: Working with sequences

Week of Jan 29 (Jan 29, Jan 31, Feb 02)
Topic: Genome Assembly
  • Graph algorithms for genome assembly
Lab: Lab 2: Genome assembly
Note: Drop/add ends Feb 02

Week of Feb 05 (Feb 05, Feb 07, Feb 09)
Topic: Pairwise Sequence Alignment
  • Dynamic programming
  • Global pairwise sequence alignment
  • Local pairwise sequence alignment
Reading:
  • (required) Durbin: 2.1-2.3
  • (optional) MUMmer paper
  • (optional) Durbin: rest of Ch. 2
Lab: Lab 3: Pairwise sequence alignment

Week of Feb 12 (Feb 12, Feb 14, Feb 16)
Topic: BWT and Read Mapping
  • Burrows-Wheeler transform
  • Application to read mapping
  • Variant calling
Lab: Lab 4: BWT and read mapping

Week of Feb 19 (Feb 19, Feb 21, Feb 23)
Topic: Phylogenetic Trees
  • Phylogenetic trees
  • Mutation generates variation
Reading:
  • (required) Durbin: Chap 7 (p. 161-170)
Notes:
  • No class: at conference
Lab: In-lab practice midterm

Week of Feb 26 (Feb 26, Feb 28, Mar 02)
Topic: Phylogenetic Trees
  • Neighbor joining
Reading:
  • (required) Durbin: Chap 7 (p. 170-173)
Lab: In-lab Midterm 1

Week of Mar 05 (Mar 05, Mar 07, Mar 09)
Topic: Ancestral state reconstruction
  • Fitch's algorithm
  • Sankoff's algorithm
  • Perfect phylogeny
Lab: Lab 5: Phylogenetic Trees

Week of Mar 12 (Mar 12, Mar 14, Mar 16)
Spring Break

Week of Mar 19 (Mar 19, Mar 21, Mar 23)
Topic: Population Genetics
  • Wright-Fisher model
  • Measures of sequence diversity
  • The Coalescent
  • Tajima's D
Lab: Lab 6: Perfect Phylogeny

Week of Mar 26 (Mar 26, Mar 28, Mar 30)
Topic: Hidden Markov Models 1
  • Markov chains
  • Conditional probability
  • Viterbi algorithm
Reading:
  • (required) Durbin: Chap 3 (p. 47-58)
Lab: Lab 7: Population Genetics
Note: CR/NC/W Deadline Mar 30

Week of Apr 02 (Apr 02, Apr 04, Apr 06)
Topic: Hidden Markov Models 2
  • Forward-backward algorithm
  • EM for HMMs (Baum-Welch algorithm)
  • Applications of HMMs in biology
Reading:
  • (required) Durbin: Chap 3 (p. 58-73)
Lab: Lab 8: Hidden Markov Models
  • In-lab notes

Week of Apr 09 (Apr 09, Apr 11, Apr 13)
Topic: Principal Components Analysis
  • Human evolution overview
  • PCA method details
  • Application of PCA to human data
Lab: Lab 9: PCA
  • In-lab notes

Week of Apr 16 (Apr 16, Apr 18, Apr 20)
Topic: Midterm Review
  • Midterm 2 review
Lab: Project: Proposal

Week of Apr 23 (Apr 23, Apr 25, Apr 27)
Topic: Special topics: GWAS and Deep Learning
  • Disease association studies
  • Applications of neural networks in genomics
  • Inferring evolutionary parameters
  • Approximate Bayesian Computation (ABC)
Lab: In-lab Midterm 2

Week of Apr 30 (Apr 30, May 02, May 04)
Topic: Special topic: Ethics and the Genome
  • Prenatal genomic testing
  • Genomic privacy
  • Genome sequencing companies
Lab: Project: Presentation


Grading Policies

Grades will be weighted as follows:
35% Lab assignments
40% In-class midterms (20% each)
15% Final Project


There will be two midterms, given in lab, as shown on the Schedule. Let me know as soon as possible if you have a conflict with one of the exams.

We will not have a final exam, but we will be using the final exam slot for project presentations. The final exam slot will be released later in the semester.

Lab Policy

Our labs are on Thursdays, and lab assignments will be due the following Wednesday at midnight. Lab attendance is required, and missing labs will quickly affect your participation grade.

Weekly Lab Sessions
CS68 A 1:05–2:35pm Thursdays Mathieson Clothier 016
CS68 B 2:45–4:15pm Thursdays Mathieson Clothier 016

Handing in labs: Lab assignments are submitted electronically and managed using git. You may submit your assignment multiple times, but each submission overwrites the previous one and only the final submission will be graded. Most of the programming/lab assignments will be in pairs. There may also be some written assignments that will have specific instructions for handing in.

Late Policy: Each individual will be given 2 late days for the semester, as per the CS department policy. A late day is a 24-hour extension from the original deadline. You can use one day on each of two assignments or both days on one assignment. Late days cover any reason: illness, interviews, many midterms in the same week, etc. Once these days are used, late assignments will not be accepted. You should budget your days to account for future illnesses or assignment deadlines for other courses. Even if you do not fully complete a lab assignment, you should submit what you have done to receive partial credit. Late days count against both partners in a group lab.

For extensions beyond these 2 late days (in the case of an emergency or ongoing personal issue), please contact your Class Dean. If your Class Dean notifies me of the issue, then we can arrange an accommodation.

Academic Integrity

Academic honesty is required in all your work. Under no circumstances may you hand in work done with (or by) someone else under your own name. Your code should never be shared with anyone; you may not examine or use code belonging to someone else, nor may you let anyone else look at or make a copy of your code. This includes, but is not limited to, obtaining solutions from students who previously took the course or code that can be found online. You may not share solutions after the due date of the assignment.

Discussing ideas and approaches to problems with others on a general level is fine (in fact, we encourage you to discuss general strategies with each other), but you should never read anyone else's code or let anyone else read your code. All code you submit must be your own with the following permissible exceptions: code distributed in class, code found in the course textbook, and code worked on with an assigned partner. In these cases, you should always include detailed comments that indicate on which parts of the assignment you received help, and what your sources were.

Failure to abide by these rules constitutes academic dishonesty and will lead to a hearing of the College Judiciary Committee. According to the Faculty Handbook: "Because plagiarism is considered to be so serious a transgression, it is the opinion of the faculty that for the first offense, failure in the course and, as appropriate, suspension for a semester or deprivation of the degree in that year is suitable; for a second offense, the penalty should normally be expulsion."

The spirit of this policy applies to all course work, including code, homework solutions (e.g., proofs, analysis, written reports), and exams. Please contact me if you have any questions about what is permissible in this course.


Piazza

This semester we’ll be using Piazza, an online Q&A forum for class discussion, help with labs, clarifications, and announcements. You should have received an email invitation to join CS68 on Piazza. If you didn't, please let me know.

Piazza is meant for questions outside of regular meeting times such as office hours, class, and lab. Please do not hesitate to ask and answer questions on Piazza, but keep in mind the following guidelines:

  1. Piazza should be used for ALL content and logistics questions outside of class, lab, and office hours. Please do not email me your code or questions about the assignments.
  2. If there is a personal issue that relates only to you, please email me.
  3. We encourage non-anonymous posts, but you may post anonymously (to your classmates, not the instructors).
  4. Do NOT post long blocks of code on Piazza. If you can distill the problem to 1-2 lines of code and an error message, that’s fine, but try to avoid giving out key components of your work.
  5. By the same token, when answering a question, try to give some guiding help but do not post code fixes or explicit solutions to the problem.
  6. Posting on Piazza counts toward your participation grade, both asking and answering!

Academic Accommodations

If you believe that you need accommodations for a disability, please contact the Office of Student Disability Services (Parrish 113W) or email studentdisabilityservices at swarthmore.edu to arrange an appointment to discuss your needs. As appropriate, the Office will issue students with documented disabilities a formal Accommodations Letter. Since accommodations require early planning and are not retroactive, please contact the Office as soon as possible. For details about the accommodations process, visit the Student Disability Services website.

To receive an accommodation for a course activity, you must have an Accommodation Authorization letter from the Office of Student Disability Services and you need to meet with me to work out the details of your accommodation at least one week prior to the activity.

You are also welcome to contact me privately to discuss your academic needs. However, all disability-related accommodations must be arranged through the Office of Student Disability Services.

Links

Python style guide (from Prof. Tia Newhall)
Official Python style guide
Python 3.5 Documentation
Atom editor
Remote access with Atom