CSC 334: Topics in Computational Biology

Homework 4: Phylogenetic Tree Building

Due: Friday, Oct. 16, 11:59pm on Moodle

The goal of this assignment is to implement an algorithm for creating phylogenetic trees (UPGMA in this case), and use it on a real-world example. In this assignment we will also be using/learning numpy. There are download instructions and tutorials on this site: numpy.

  1. Download the current version of Velvet from hw4.py and make sure you can run
    python hw4.py 
    Nothing should happen if you have numpy installed correctly. In this file is a skeleton for UPGMA. The first step is to implement
    def min_dist(distance): 
    which should return a tuple (i,j) of the cell with the minimum distance (i.e. the clusters to merge first).
  2. The next step is to implement
    def merge_clusters(clusters, distance): 
    There is a step-by-step outline in the code, but you can do this however you like. The output should be a set of clusters that is one less than the input clusters, and a square distance matrix that had one less row and column.
  3. The last step is to iteratively call merge_clusters in a loop, until there are no more clusters to merge. At each interation of the algorithm, print out the clusters and the distance matrix.
  4. Finally, for the three examples in the code (two we have done in class, so you can check your work), use the output of your program to draw out a tree for each example, labeling all the branch lengths. You can turn this part in on paper, or scan it, or draw out the tree in a different program like power point or ms paint. Submit your code on Moodle.
EXTRA CREDIT 1: Write another function to return the distance matrix that is created by UPGMA. In which cases is this matrix similar to the input matrix? In which cases is this matrix very different?

EXTRA CREDIT 2: Implement Neighbor-Joining in a similar framework (i.e. using numpy arrays for the distance matrices and Si,j tables). Apply NJ to the same three examples, draw out the trees, and compare the results.