Constructing and counting phylogenetic invariants


Report Number
519
Authors
Ben Hansen and Jim Pitman
Citation
Electronic Journal of Probability, Vol. 5 (2000), Paper no. 2, pages 1-18
Abstract

The method of invariants is an approach to the problem of reconstructing the phylogenetic tree of a collection of $m$ taxa using nucleotide sequence data. Models for the respective probabilities of the $4^m$ possible vectors of bases at a given site will have unknown parameters that describe the random mechanism by which substitution occurs along the branches of a putative phylogenetic tree. An invariant is a polynomial in these probabilities that, for a given phylogeny, is zero for all choices of the substitution mechanism parameters. If the invariant is typically nonzero for another phylogenetic tree, then estimates of the invariant can be used as evidence to support one phylogeny over another.
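As a rough illustration of what "estimates of the invariant" could mean in practice, the sketch below enumerates the $4^m$ site patterns, computes their empirical frequencies from a small alignment, and plugs those frequencies into a polynomial. Everything here is an assumption made for illustration: the function names, the representation of a polynomial as a list of (coefficient, patterns) monomials, the toy alignment, and in particular `toy_poly`, which is an arbitrary polynomial and not an invariant of any model discussed in the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def site_patterns(m):
    """Return all 4**m possible vectors of bases at one site for m taxa."""
    return ["".join(p) for p in product(BASES, repeat=m)]


def empirical_frequencies(alignment):
    """Relative frequency of every site pattern (alignment column).

    `alignment` is a list of m equal-length strings, one per taxon.
    Patterns that never occur get frequency 0.
    """
    m = len(alignment)
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("sequences must have equal length")
    counts = Counter(
        "".join(seq[i] for seq in alignment) for i in range(n_sites)
    )
    return {pat: counts.get(pat, 0) / n_sites for pat in site_patterns(m)}


def evaluate_polynomial(poly, freqs):
    """Evaluate a polynomial in the pattern frequencies.

    `poly` is a list of monomials (coefficient, [pattern, ...]); each
    monomial is the coefficient times the product of the listed frequencies.
    """
    total = 0.0
    for coeff, patterns in poly:
        term = coeff
        for pat in patterns:
            term *= freqs[pat]
        total += term
    return total


# Hypothetical toy data: 3 taxa, 4 sites (columns AAA, CCC, GGT, TAT).
alignment = ["ACGT", "ACGA", "ACTT"]
freqs = empirical_frequencies(alignment)

# Arbitrary illustrative polynomial, NOT a genuine phylogenetic invariant.
toy_poly = [(1.0, ["AAA", "CCC"]), (-1.0, ["GGT", "TAT"])]
value = evaluate_polynomial(toy_poly, freqs)
```

With a genuine invariant for one tree in place of `toy_poly`, a value near zero would be consistent with that tree, while a value far from zero would count as evidence against it; how far is "far" depends on sampling variability, which this sketch does not address.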

Previous work of Evans and Speed showed that, for certain commonly used substitution models, the problem of finding a minimal generating set for the ideal of invariants can be reduced to the linear algebra problem of finding a basis for a certain lattice (that is, a free $\mathbb{Z}$-module). They also conjectured that the cardinality of such a generating set can be computed using a simple "degrees of freedom" formula. We verify this conjecture. Along the way, we explain in detail how the observations of Evans and Speed lead to a simple, computationally feasible algorithm for constructing a minimal generating set.
