L. A. S. Johnson review no. 9. Construction and annotation of large phylogenetic trees

Research output: Contribution to journal (Article)

17 Citations (Scopus)

Abstract

Broad availability of molecular sequence data allows construction of phylogenetic trees with 1000s or even 10 000s of taxa. This paper reviews methodological, technological and empirical issues raised in phylogenetic inference at this scale. Numerous algorithmic and computational challenges have been identified surrounding the core problem of reconstructing large trees accurately from sequence data, but many other obstacles, both upstream and downstream of this step, are less well understood. Before phylogenetic analysis, data must be generated de novo or extracted from existing databases, compiled into blocks of homologous data with controlled properties, aligned, examined for the presence of gene duplications or other kinds of complicating factors, and finally, combined with other evidence via supermatrix or supertree approaches. After phylogenetic analysis, confidence assessments are usually reported, along with other kinds of annotations, such as clade names, or annotations requiring additional inference procedures, such as trait evolution or divergence time estimates. Prospects for partial automation of large-tree construction are also discussed, as well as risks associated with 'outsourcing' phylogenetic inference beyond the systematics community.

Original language: English (US)
Pages (from-to): 287-301
Number of pages: 15
Journal: Australian Systematic Botany
Volume: 20
Issue number: 4
DOI: 10.1071/SB07006
State: Published - 2007

Fingerprint

phylogenetics
phylogeny
outsourcing
automation
gene duplication
divergence
taxonomy
gene

ASJC Scopus subject areas

  • Plant Science

Cite this

L. A. S. Johnson review no. 9. Construction and annotation of large phylogenetic trees. / Sanderson, Michael.

In: Australian Systematic Botany, Vol. 20, No. 4, 2007, p. 287-301.

@article{fce3e5971bd84b51bd58580b7ea6e68e,
title = "L. A. S. Johnson review no. 9. Construction and annotation of large phylogenetic trees",
author = "Michael Sanderson",
year = "2007",
doi = "10.1071/SB07006",
language = "English (US)",
volume = "20",
pages = "287--301",
journal = "Australian Systematic Botany",
issn = "1030-1887",
publisher = "CSIRO",
number = "4",
}

TY - JOUR

T1 - L. A. S. Johnson review no. 9. Construction and annotation of large phylogenetic trees

AU - Sanderson, Michael

PY - 2007

Y1 - 2007

N2 - Broad availability of molecular sequence data allows construction of phylogenetic trees with 1000s or even 10 000s of taxa. This paper reviews methodological, technological and empirical issues raised in phylogenetic inference at this scale. Numerous algorithmic and computational challenges have been identified surrounding the core problem of reconstructing large trees accurately from sequence data, but many other obstacles, both upstream and downstream of this step, are less well understood. Before phylogenetic analysis, data must be generated de novo or extracted from existing databases, compiled into blocks of homologous data with controlled properties, aligned, examined for the presence of gene duplications or other kinds of complicating factors, and finally, combined with other evidence via supermatrix or supertree approaches. After phylogenetic analysis, confidence assessments are usually reported, along with other kinds of annotations, such as clade names, or annotations requiring additional inference procedures, such as trait evolution or divergence time estimates. Prospects for partial automation of large-tree construction are also discussed, as well as risks associated with 'outsourcing' phylogenetic inference beyond the systematics community.

UR - http://www.scopus.com/inward/record.url?scp=34548403121&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548403121&partnerID=8YFLogxK

U2 - 10.1071/SB07006

DO - 10.1071/SB07006

M3 - Article

AN - SCOPUS:34548403121

VL - 20

SP - 287

EP - 301

JO - Australian Systematic Botany

JF - Australian Systematic Botany

SN - 1030-1887

IS - 4

ER -