Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants

Michael Sanderson, M. F. Wojciechowski, J. M. Hu, T. Sher Khan, S. G. Brady

Research output: Contribution to journal, Article

139 Citations (Scopus)

Abstract

Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and several outgroups among other vascular plants. Strongly supported, but significantly conflicting, phylogenetic signals were obtained in parsimony analyses from partitions of the data into first and second codon positions versus third positions. In the former, both genes agreed on a monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern 'anthophyte hypothesis,' which places Gnetales as the sister group of flowering plants. A series of simulation studies were undertaken to examine the error rate for parsimony inference. Three kinds of errors were examined: random error, systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data for psbB. Regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed for these data. None of the combinations of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate the existence of long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. This is also true for the anthophyte tree using either psaA third positions or psbB first and second positions. A factor contributing to bias and inconsistency is extremely short branches at the base of the seed plant radiation, coupled with extremely high rates in Gnetales and nonseed plant outgroups.
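The partitioning of protein-coding data into first and second codon positions versus third positions, as described in the abstract, can be illustrated with a minimal sketch. It assumes in-frame aligned sequences held in a plain dictionary; the function name `partition_codon_positions` and the toy taxa are hypothetical and do not come from the paper, which does not describe its software pipeline here.

```python
from typing import Dict, Tuple


def partition_codon_positions(
    alignment: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split in-frame aligned coding sequences into two partitions:
    (first + second codon positions, third codon positions)."""
    first_second: Dict[str, str] = {}
    third: Dict[str, str] = {}
    for taxon, seq in alignment.items():
        # Assumption: the alignment starts at codon position 1 and
        # contains only whole codons.
        if len(seq) % 3 != 0:
            raise ValueError(f"{taxon}: length {len(seq)} is not a multiple of 3")
        # Keep sites whose 0-based index is 0 or 1 within each codon.
        first_second[taxon] = "".join(
            base for i, base in enumerate(seq) if i % 3 != 2
        )
        # Every third site, starting from 0-based index 2.
        third[taxon] = seq[2::3]
    return first_second, third


# Toy data for illustration only (not real psaA or psbB sequence).
toy = {"taxonA": "ATGGCTAAA", "taxonB": "ATGGCCAAG"}
pos12, pos3 = partition_codon_positions(toy)
```

Each partition can then be analysed separately, which is how conflicting signal between the slower first and second positions and the faster third positions becomes visible.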

Original language: English (US)
Pages (from-to): 782-797
Number of pages: 16
Journal: Molecular Biology and Evolution
Volume: 17
Issue number: 5
State: Published - May 2000
Externally published: Yes

Keywords

  • Maximum likelihood
  • Parsimony
  • Statistical consistency

ASJC Scopus subject areas

  • Genetics
  • Biochemistry
  • Genetics (clinical)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Ecology, Evolution, Behavior and Systematics
  • Agricultural and Biological Sciences (miscellaneous)
  • Molecular Biology

Cite this

Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. / Sanderson, Michael; Wojciechowski, M. F.; Hu, J. M.; Khan, T. Sher; Brady, S. G.

In: Molecular Biology and Evolution, Vol. 17, No. 5, 05.2000, p. 782-797.

Sanderson, Michael ; Wojciechowski, M. F. ; Hu, J. M. ; Khan, T. Sher ; Brady, S. G. / Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. In: Molecular Biology and Evolution. 2000 ; Vol. 17, No. 5. pp. 782-797.
@article{cedd82419bdf46c4987c6c1e379210c6,
title = "Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants",
abstract = "Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and several outgroups among other vascular plants. Strongly supported, but significantly conflicting, phylogenetic signals were obtained in parsimony analyses from partitions of the data into first and second codon positions versus third positions. In the former, both genes agreed on a monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern 'anthophyte hypothesis,' which places Gnetales as the sister group of flowering plants. A series of simulation studies were undertaken to examine the error rate for parsimony inference. Three kinds of errors were examined: random error, systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data for psbB. Regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed for these data. None of the combinations of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate the existence of long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. This is also true for the anthophyte tree using either psaA third positions or psbB first and second positions. A factor contributing to bias and inconsistency is extremely short branches at the base of the seed plant radiation, coupled with extremely high rates in Gnetales and nonseed plant outgroups.",
keywords = "Maximum likelihood, Parsimony, Statistical consistency",
author = "Sanderson, Michael and Wojciechowski, {M. F.} and Hu, {J. M.} and Khan, {T. Sher} and Brady, {S. G.}",
year = "2000",
month = "5",
language = "English (US)",
volume = "17",
pages = "782--797",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "5",

}

TY  - JOUR
T1  - Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants
AU  - Sanderson, Michael
AU  - Wojciechowski, M. F.
AU  - Hu, J. M.
AU  - Khan, T. Sher
AU  - Brady, S. G.
PY  - 2000/5
Y1  - 2000/5
N2  - Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and several outgroups among other vascular plants. Strongly supported, but significantly conflicting, phylogenetic signals were obtained in parsimony analyses from partitions of the data into first and second codon positions versus third positions. In the former, both genes agreed on a monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern 'anthophyte hypothesis,' which places Gnetales as the sister group of flowering plants. A series of simulation studies were undertaken to examine the error rate for parsimony inference. Three kinds of errors were examined: random error, systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data for psbB. Regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed for these data. None of the combinations of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate the existence of long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. This is also true for the anthophyte tree using either psaA third positions or psbB first and second positions. A factor contributing to bias and inconsistency is extremely short branches at the base of the seed plant radiation, coupled with extremely high rates in Gnetales and nonseed plant outgroups.
AB  - Sequences of two chloroplast photosystem genes, psaA and psbB, together comprising about 3,500 bp, were obtained for all five major groups of extant seed plants and several outgroups among other vascular plants. Strongly supported, but significantly conflicting, phylogenetic signals were obtained in parsimony analyses from partitions of the data into first and second codon positions versus third positions. In the former, both genes agreed on a monophyletic gymnosperms, with Gnetales closely related to certain conifers. In the latter, Gnetales are inferred to be the sister group of all other seed plants, with gymnosperms paraphyletic. None of the data supported the modern 'anthophyte hypothesis,' which places Gnetales as the sister group of flowering plants. A series of simulation studies were undertaken to examine the error rate for parsimony inference. Three kinds of errors were examined: random error, systematic bias (both properties of finite data sets), and statistical inconsistency owing to long-branch attraction (an asymptotic property). Parsimony reconstructions were extremely biased for third-position data for psbB. Regardless of the true underlying tree, a tree in which Gnetales are sister to all other seed plants was likely to be reconstructed for these data. None of the combinations of genes or partitions permits the anthophyte tree to be reconstructed with high probability. Simulations of progressively larger data sets indicate the existence of long-branch attraction (statistical inconsistency) for third-position psbB data if either the anthophyte tree or the gymnosperm tree is correct. This is also true for the anthophyte tree using either psaA third positions or psbB first and second positions. A factor contributing to bias and inconsistency is extremely short branches at the base of the seed plant radiation, coupled with extremely high rates in Gnetales and nonseed plant outgroups.
KW  - Maximum likelihood
KW  - Parsimony
KW  - Statistical consistency
UR  - http://www.scopus.com/inward/record.url?scp=0034069504&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0034069504&partnerID=8YFLogxK
M3  - Article
C2  - 10779539
AN  - SCOPUS:0034069504
VL  - 17
SP  - 782
EP  - 797
JO  - Molecular Biology and Evolution
JF  - Molecular Biology and Evolution
SN  - 0737-4038
IS  - 5
ER  -