Prospects for building the tree of life from large sequence databases

Amy C. Driskell, Cécile Ané, J. Gordon Burleigh, Michelle M. McMahon, Brian C. O'Meara, Michael Sanderson

Research output: Contribution to journal (Article)

187 Citations (Scopus)

Abstract

We assess the phylogenetic potential of ∼300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two "supermatrices" suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.
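To make the "missing entries" figure concrete, here is a minimal sketch of how the missing-data fraction of a taxon-by-gene supermatrix could be computed. It is not the authors' pipeline: the function name `missing_fraction`, the dictionary layout, and the toy taxa and genes are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def missing_fraction(presence: Dict[str, Set[str]], genes: List[str]) -> float:
    """Return the fraction of taxon-by-gene cells that have no sequence.

    `presence` maps each taxon to the set of genes sampled for it
    (a hypothetical layout, assumed for this sketch).
    """
    total_cells = len(presence) * len(genes)
    if total_cells == 0:
        raise ValueError("supermatrix has no cells")
    gene_set = set(genes)
    # Count only cells that belong to the matrix columns.
    filled = sum(len(sampled & gene_set) for sampled in presence.values())
    return 1.0 - filled / total_cells


# Made-up toy data: 5 taxa x 4 genes, 8 of 20 cells filled.
toy = {
    "taxon_A": {"gene1", "gene2", "gene3", "gene4"},
    "taxon_B": {"gene1"},
    "taxon_C": {"gene2"},
    "taxon_D": set(),
    "taxon_E": {"gene1", "gene3"},
}
genes = ["gene1", "gene2", "gene3", "gene4"]

frac = missing_fraction(toy, genes)
print(f"missing data: {frac:.0%}")
```

On this toy matrix the function reports 60% missing data; the 92% figure in the abstract refers to the authors' own supermatrices, not to anything computed here.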

Original language: English (US)
Pages (from-to): 1172-1174
Number of pages: 3
Journal: Science
Volume: 306
Issue number: 5699
DOI: 10.1126/science.1102036
State: Published - Nov 12 2004
Externally published: Yes

Fingerprint

Databases
Selection Bias
Nucleic Acid Databases
Proteins
Datasets

ASJC Scopus subject areas

  • General

Cite this

Prospects for building the tree of life from large sequence databases. / Driskell, Amy C.; Ané, Cécile; Burleigh, J. Gordon; McMahon, Michelle M.; O'Meara, Brian C.; Sanderson, Michael.

In: Science, Vol. 306, No. 5699, 12.11.2004, p. 1172-1174.

@article{cf2dddee9fdb4797a75f8a5e9300cc85,
title = "Prospects for building the tree of life from large sequence databases",
abstract = "We assess the phylogenetic potential of ∼300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two {"}supermatrices{"} suggests that even data sets with as much as 92{\%} missing data can provide insights into broad sections of the tree of life.",
author = "Driskell, {Amy C.} and An{\'e}, C{\'e}cile and Burleigh, {J. Gordon} and McMahon, {Michelle M.} and O'Meara, {Brian C.} and Sanderson, Michael",
year = "2004",
month = "11",
day = "12",
doi = "10.1126/science.1102036",
language = "English (US)",
volume = "306",
pages = "1172--1174",
journal = "Science",
issn = "0036-8075",
publisher = "American Association for the Advancement of Science",
number = "5699",

}

TY  - JOUR
T1  - Prospects for building the tree of life from large sequence databases
AU  - Driskell, Amy C.
AU  - Ané, Cécile
AU  - Burleigh, J. Gordon
AU  - McMahon, Michelle M.
AU  - O'Meara, Brian C.
AU  - Sanderson, Michael
PY  - 2004/11/12
Y1  - 2004/11/12
AB  - We assess the phylogenetic potential of ∼300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two "supermatrices" suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.
UR  - http://www.scopus.com/inward/record.url?scp=8444222732&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=8444222732&partnerID=8YFLogxK
U2  - 10.1126/science.1102036
DO  - 10.1126/science.1102036
M3  - Article
C2  - 15539599
AN  - SCOPUS:8444222732
VL  - 306
SP  - 1172
EP  - 1174
JO  - Science
JF  - Science
SN  - 0036-8075
IS  - 5699
ER  -