Saturday, 17th December 2011

# Convert distance matrix to phylip format

I wrote a function to create a Euclidean distance matrix from some amino acid substitution matrices, and I wanted a built-in method for finding the Spearman's rank correlation of two lists so I could build a distance matrix that way too. It turns out that BioPython already has a method that builds distance matrices using various distance metrics, including Euclidean and Spearman's rank:

    import Bio.Cluster

    dm = Bio.Cluster.distancematrix(data, dist="s")

If you change the dist to "e", then it will calculate the Euclidean distance.
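To make the shape of the result concrete, here is a small pure-Python sketch that does not use BioPython. As far as I understand it, `distancematrix` returns a ragged lower triangle: row 0 is empty, row 1 has one value, row 2 has two, and so on. The helper name `lower_triangle_distances` and the sample `data` are made up for illustration, and the numbers are plain Euclidean distances, so they are not guaranteed to match what Bio.Cluster reports for `dist="e"`.

```python
import math


def lower_triangle_distances(rows):
    """Return a ragged lower-triangle distance matrix (row i has i entries).

    Hypothetical stand-in that only mimics the layout of the Bio.Cluster
    result; it is not the Bio.Cluster implementation.
    """
    dm = []
    for i, a in enumerate(rows):
        # Distance from row i to every earlier row j < i (no diagonal).
        dm.append([
            math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            for b in rows[:i]
        ])
    return dm


# Made-up data: three items described by two values each.
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dm = lower_triangle_distances(data)
print(dm)  # [[], [5.0], [10.0, 5.0]]
```

The first row is empty because the first item has nothing before it to be compared with.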

I thought there might be a way to output this in phylip format so I could use quicktree, but if there is, I wasn't able to find it. So here's mine:

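    # names: labels for the sequences or matrices, in the same order as the rows of dm
    with open(filename, 'w') as fout:
        # First line: number of entries in the matrix
        fout.write('%d\n' % len(names))
        for name, row in zip(names, dm):
            fout.write(name)
            # Row i holds the i distances to the earlier entries (lower triangle)
            for value in row:
                fout.write('\t%s' % value)
            fout.write('\n')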

It assumes you have the distance matrix in the format created by the Bio.Cluster distancematrix function, and have a list of names for the sequences or matrices.
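If you want to reuse this, the same loop can be wrapped in a function. The following is only a sketch under a few assumptions: the name `write_phylip_lower` is mine, the length check is an extra I added, and the names and distances are invented so the snippet runs without BioPython.

```python
import os
import tempfile


def write_phylip_lower(names, dm, filename):
    """Write a ragged lower-triangle matrix as tab-separated text.

    Assumes dm[i] holds the i distances from entry i to entries 0..i-1.
    """
    if len(names) != len(dm):
        raise ValueError('need exactly one name per row of the matrix')
    with open(filename, 'w') as fout:
        fout.write('%d\n' % len(names))
        for name, row in zip(names, dm):
            fout.write(name + ''.join('\t%s' % value for value in row) + '\n')


# Invented example values, not real results.
names = ['A', 'B', 'C']
dm = [[], [1.2], [0.8, 3.2]]

path = os.path.join(tempfile.mkdtemp(), 'example.dist')
write_phylip_lower(names, dm, path)

with open(path) as fin:
    text = fin.read()
print(text)
```

Because each value is written with `%s`, this works whether the rows are plain lists or arrays.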

An example output would be:

    3
    A
    B	1.2
    C	0.8	3.2

The first value is the number of sequences in the distance matrix and the following lines are the lower triangle of a distance matrix, not including the diagonal (for which all the values would be 0).
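One way to sanity-check a file like this is to read it back and expand the lower triangle into a full symmetric matrix. This is a rough sketch rather than a proper PHYLIP parser: the function name `read_phylip_lower` is made up, the input text is an invented example, and it only handles the whitespace-separated layout described above.

```python
def read_phylip_lower(text):
    """Parse lower-triangle text into (names, full symmetric matrix)."""
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0])
    if len(lines) < n + 1:
        raise ValueError('expected %d rows, found %d' % (n, len(lines) - 1))
    names = []
    full = [[0.0] * n for _ in range(n)]  # the diagonal stays at 0
    for i, line in enumerate(lines[1:n + 1]):
        fields = line.split()
        names.append(fields[0])
        values = [float(v) for v in fields[1:]]
        if len(values) != i:
            raise ValueError('row %d should have %d values' % (i, i))
        for j, value in enumerate(values):
            # Mirror each distance across the diagonal.
            full[i][j] = value
            full[j][i] = value
    return names, full


# Invented example input in the layout described above.
example = '3\nA\nB\t1.2\nC\t0.8\t3.2\n'
names, full = read_phylip_lower(example)
print(names)  # ['A', 'B', 'C']
for row in full:
    print(row)
```

A row with the wrong number of values raises an error, which catches a names list that is out of step with the matrix.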

