CSC 334: Topics in Computational Biology

"Algorithms for Genomic Data"

Fall 2015


Prerequisites

  • CSC 111: Intro to Computer Science
  • MATH 111: Calculus I
  • CSC 212: Data Structures (preferred but not required)

There is no biology prerequisite. Students majoring outside CS who have some programming experience are encouraged to participate (with the instructor's permission).

Course Description and Goals

This course begins with an overview of how DNA sequence data is generated, assembled, and corrected for use in downstream analysis. We will then focus on what can be learned from such data, with an emphasis on population genetics. At each stage of the genomic pipeline, novel algorithms have been developed to handle the size and special properties of sequencing data. During the first part of the course, we will learn about these methods, spending additional time on algorithms that are particularly relevant to the undergraduate computer science curriculum. Class sessions will include some interactive lecture and some discussion of research papers. Homeworks will be a mix of small programming assignments, pencil-and-paper exercises, and readings aimed at understanding the biological challenges described in recent literature. Homeworks can optionally be extended in a more CS direction or a more biological direction. During the last third of the course, students will explore a topic of their choice and present it in an oral presentation to the class and a written report. Throughout the course, we will focus on communicating ideas and questions clearly in an interdisciplinary context.

Assignment Notes

The programming portions of assignments will generally be in Python, but any language is welcome for the final project. Homeworks will be submitted online through Moodle. There is no required textbook for this course; any readings will be made available online.

Software Links

Tentative Topics

  • Biology overview and goals of computational biology
  • Sequencing data and genome assembly
  • Alignment and string matching with dynamic programming
  • Burrows-Wheeler transform (BWT)
  • Phylogenetics and tree building
  • Ancestral inference using Fitch's algorithm
  • Measures of sequence diversity and population analysis
  • PCA in genomics
  • Modeling ancient populations
  • Signatures of natural selection
  • Population growth and decay
  • Genome-wide association studies (GWAS)
  • Ethics and the genome

Collaboration and the Honor Code

Collaboration is encouraged in this course, especially because different backgrounds and skill sets are necessary in an interdisciplinary field like computational biology. Additionally, one of the goals of this capstone course is to learn where to look for information and how to use available resources. For each assignment, please cite your classmate collaborators, books, and online resources, as required by the Smith College Honor Code:

"Smith College expects all students to be honest and committed to the principles of academic and intellectual integrity in their preparation and submission of course work and examinations. All submitted work of any kind must be the original work of the student who must cite all the sources used in its preparation."


Grading

  • Homeworks: 35%
  • Midterm assignment: 20%
  • Project presentation: 20%
  • Project writeup: 15%
  • Participation: 10%

Additional Resources