Groves of phylogenetic trees

Cécile Ané, Oliver Eulenstein, Raul Piaggio-Talice, Michael J. Sanderson

Research output: Contribution to journal (Article)

5 Scopus citations

Abstract

A major challenge in the biological sciences is the reconstruction of the Tree of Life. To this end, large genomic databases such as GenBank and SwissProt are being mined for clusters from which phylogenies can be inferred. Systematists and comparative biologists commonly combine such phylogenies into informative supertrees that reveal information not explicitly displayed in any of the original phylogenies. However, whether a supertree is informative depends on particular overlap properties among the clusters from which it originates. In this work we formally introduce the concept of groves: sets of clusters with the potential to construct informative supertrees. Thus, through groves, maximal potential candidate clusters for informative supertree construction can be identified in large databases before trees are inferred for each cluster. Groves also have the potential to lead to informative supermatrix construction. We develop methods that (i) efficiently identify particular types of groves and (ii) find lower and upper bounds on the minimal number of groves needed to cover all the trees or data sets in a database. Finally, we apply our methods to the green plant sequences from GenBank.

Original language: English (US)
Pages (from-to): 139-167
Number of pages: 29
Journal: Annals of Combinatorics
Volume: 13
Issue number: 2
State: Published - Jun 26 2009

Keywords

  • Clustering
  • Evolution
  • Supermatrix
  • Supertree
  • Triplets

ASJC Scopus subject areas

  • Discrete Mathematics and Combinatorics
