Groves of phylogenetic trees

Cécile Ané, Oliver Eulenstein, Raul Piaggio-Talice, Michael Sanderson

Research output: Contribution to journal (article)

4 Citations (Scopus)

Abstract

A major challenge in biological sciences is the reconstruction of the Tree of Life. To this effect, large genomic databases like GenBank and SwissProt are being mined for clusters from which phylogenies can be inferred. Systematists and comparative biologists commonly combine such phylogenies into informative supertrees that reveal information which was not explicitly displayed in any of the original phylogenies. However, whether a supertree is informative depends on particular overlap properties among the clusters from which it originates. In this work we formally introduce the concept of groves - sets of clusters with the potential to construct informative supertrees. Thus maximal potential candidate clusters for informative supertree construction can be identified in large databases through groves, prior to inferring trees for each cluster. Groves also have the potential to lead to informative supermatrix construction. We developed methods that (i) efficiently identify particular types of groves and (ii) find lower and upper bounds on the minimal number of groves needed to cover all the trees or data sets in a database. Finally, we apply our methods to the green plant sequences from GenBank.

Original language: English (US)
Pages (from-to): 139-167
Number of pages: 29
Journal: Annals of Combinatorics
Volume: 13
Issue number: 2
DOI: https://doi.org/10.1007/s00026-009-0017-x
State: Published - 2009

Fingerprint

Phylogenetic Tree
Phylogeny
Genomics
Overlap
Upper and Lower Bounds
Cover

Keywords

  • Clustering
  • Evolution
  • Supermatrix
  • Supertree
  • Triplets

ASJC Scopus subject areas

  • Discrete Mathematics and Combinatorics

Cite this

Ané, C., Eulenstein, O., Piaggio-Talice, R., & Sanderson, M. (2009). Groves of phylogenetic trees. Annals of Combinatorics, 13(2), 139-167. https://doi.org/10.1007/s00026-009-0017-x

@article{45b1fbf847d1455ea5721e12a5157281,
title = "Groves of phylogenetic trees",
abstract = "A major challenge in biological sciences is the reconstruction of the Tree of Life. To this effect, large genomic databases like GenBank and SwissProt are being mined for clusters from which phylogenies can be inferred. Systematists and comparative biologists commonly combine such phylogenies into informative supertrees that reveal information which was not explicitly displayed in any of the original phylogenies. However, whether a supertree is informative depends on particular overlap properties among the clusters from which it originates. In this work we formally introduce the concept of groves - sets of clusters with the potential to construct informative supertrees. Thus maximal potential candidate clusters for informative supertree construction can be identified in large databases through groves, prior to inferring trees for each cluster. Groves also have the potential to lead to informative supermatrix construction. We developed methods that (i) efficiently identify particular types of groves and (ii) find lower and upper bounds on the minimal number of groves needed to cover all the trees or data sets in a database. Finally, we apply our methods to the green plant sequences from GenBank.",
keywords = "Clustering, Evolution, Supermatrix, Supertree, Triplets",
author = "C{\'e}cile An{\'e} and Oliver Eulenstein and Raul Piaggio-Talice and Michael Sanderson",
year = "2009",
doi = "10.1007/s00026-009-0017-x",
language = "English (US)",
volume = "13",
pages = "139--167",
journal = "Annals of Combinatorics",
issn = "0218-0006",
publisher = "Birkh{\"a}user Verlag Basel",
number = "2",

}

TY - JOUR

T1 - Groves of phylogenetic trees

AU - Ané, Cécile

AU - Eulenstein, Oliver

AU - Piaggio-Talice, Raul

AU - Sanderson, Michael

PY - 2009

Y1 - 2009

AB - A major challenge in biological sciences is the reconstruction of the Tree of Life. To this effect, large genomic databases like GenBank and SwissProt are being mined for clusters from which phylogenies can be inferred. Systematists and comparative biologists commonly combine such phylogenies into informative supertrees that reveal information which was not explicitly displayed in any of the original phylogenies. However, whether a supertree is informative depends on particular overlap properties among the clusters from which it originates. In this work we formally introduce the concept of groves - sets of clusters with the potential to construct informative supertrees. Thus maximal potential candidate clusters for informative supertree construction can be identified in large databases through groves, prior to inferring trees for each cluster. Groves also have the potential to lead to informative supermatrix construction. We developed methods that (i) efficiently identify particular types of groves and (ii) find lower and upper bounds on the minimal number of groves needed to cover all the trees or data sets in a database. Finally, we apply our methods to the green plant sequences from GenBank.

KW - Clustering

KW - Evolution

KW - Supermatrix

KW - Supertree

KW - Triplets

UR - http://www.scopus.com/inward/record.url?scp=70349895336&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70349895336&partnerID=8YFLogxK

U2 - 10.1007/s00026-009-0017-x

DO - 10.1007/s00026-009-0017-x

M3 - Article

AN - SCOPUS:70349895336

VL - 13

SP - 139

EP - 167

JO - Annals of Combinatorics

JF - Annals of Combinatorics

SN - 0218-0006

IS - 2

ER -