Missing data and the accuracy of Bayesian phylogenetics

John J. Wiens, Daniel S. Moen

Research output: Contribution to journal (article)

130 Citations (Scopus)

Abstract

The effect of missing data on phylogenetic methods is a potentially important issue in our attempts to reconstruct the Tree of Life. If missing data are truly problematic, then it may be unwise to include species in an analysis that lack data for some characters (incomplete taxa) or to include characters that lack data for some species. Given the difficulty of obtaining data from all characters for all taxa (e.g., fossils), missing data might seriously impede efforts to reconstruct a comprehensive phylogeny that includes all species. Fortunately, recent simulations and empirical analyses suggest that missing data cells are not themselves problematic, and that incomplete taxa can be accurately placed as long as the overall number of characters in the analysis is large. However, these studies have so far only been conducted on parsimony, likelihood, and neighbor-joining methods. Although Bayesian phylogenetic methods have become widely used in recent years, the effects of missing data on Bayesian analysis have not been adequately studied. Here, we conduct simulations to test whether Bayesian analyses can accurately place incomplete taxa despite extensive missing data. In agreement with previous studies of other methods, we find that Bayesian analyses can accurately reconstruct the position of highly incomplete taxa (i.e., 95% missing data), as long as the overall number of characters in the analysis is large. These results suggest that highly incomplete taxa can be safely included in many Bayesian phylogenetic analyses.

Original language: English (US)
Pages (from-to): 307-314
Number of pages: 8
Journal: Journal of Systematics and Evolution
Volume: 46
Issue number: 3
DOI: 10.3724/SP.J.1002.2008.08040
State: Published - May 2008
Externally published: Yes

Keywords

  • Accuracy
  • Bayesian analysis
  • Missing data
  • Phylogenetic analysis

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Plant Science

Cite this

Missing data and the accuracy of Bayesian phylogenetics. / Wiens, John J; Moen, Daniel S.

In: Journal of Systematics and Evolution, Vol. 46, No. 3, 05.2008, p. 307-314.

@article{624ca6d1b1484f539dfd05cb2e2910e3,
title = "Missing data and the accuracy of Bayesian phylogenetics",
abstract = "The effect of missing data on phylogenetic methods is a potentially important issue in our attempts to reconstruct the Tree of Life. If missing data are truly problematic, then it may be unwise to include species in an analysis that lack data for some characters (incomplete taxa) or to include characters that lack data for some species. Given the difficulty of obtaining data from all characters for all taxa (e.g., fossils), missing data might seriously impede efforts to reconstruct a comprehensive phylogeny that includes all species. Fortunately, recent simulations and empirical analyses suggest that missing data cells are not themselves problematic, and that incomplete taxa can be accurately placed as long as the overall number of characters in the analysis is large. However, these studies have so far only been conducted on parsimony, likelihood, and neighbor-joining methods. Although Bayesian phylogenetic methods have become widely used in recent years, the effects of missing data on Bayesian analysis have not been adequately studied. Here, we conduct simulations to test whether Bayesian analyses can accurately place incomplete taxa despite extensive missing data. In agreement with previous studies of other methods, we find that Bayesian analyses can accurately reconstruct the position of highly incomplete taxa (i.e., 95{\%} missing data), as long as the overall number of characters in the analysis is large. These results suggest that highly incomplete taxa can be safely included in many Bayesian phylogenetic analyses.",
keywords = "Accuracy, Bayesian analysis, Missing data, Phylogenetic analysis",
author = "Wiens, {John J} and Moen, {Daniel S.}",
year = "2008",
month = "5",
doi = "10.3724/SP.J.1002.2008.08040",
language = "English (US)",
volume = "46",
pages = "307--314",
journal = "Journal of Systematics and Evolution",
issn = "1674-4918",
publisher = "Ke xue chu ban she",
number = "3",
}

TY - JOUR

T1 - Missing data and the accuracy of Bayesian phylogenetics

AU - Wiens, John J

AU - Moen, Daniel S.

PY - 2008/5

Y1 - 2008/5

AB - The effect of missing data on phylogenetic methods is a potentially important issue in our attempts to reconstruct the Tree of Life. If missing data are truly problematic, then it may be unwise to include species in an analysis that lack data for some characters (incomplete taxa) or to include characters that lack data for some species. Given the difficulty of obtaining data from all characters for all taxa (e.g., fossils), missing data might seriously impede efforts to reconstruct a comprehensive phylogeny that includes all species. Fortunately, recent simulations and empirical analyses suggest that missing data cells are not themselves problematic, and that incomplete taxa can be accurately placed as long as the overall number of characters in the analysis is large. However, these studies have so far only been conducted on parsimony, likelihood, and neighbor-joining methods. Although Bayesian phylogenetic methods have become widely used in recent years, the effects of missing data on Bayesian analysis have not been adequately studied. Here, we conduct simulations to test whether Bayesian analyses can accurately place incomplete taxa despite extensive missing data. In agreement with previous studies of other methods, we find that Bayesian analyses can accurately reconstruct the position of highly incomplete taxa (i.e., 95% missing data), as long as the overall number of characters in the analysis is large. These results suggest that highly incomplete taxa can be safely included in many Bayesian phylogenetic analyses.

KW - Accuracy

KW - Bayesian analysis

KW - Missing data

KW - Phylogenetic analysis

UR - http://www.scopus.com/inward/record.url?scp=56849107234&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=56849107234&partnerID=8YFLogxK

U2 - 10.3724/SP.J.1002.2008.08040

DO - 10.3724/SP.J.1002.2008.08040

M3 - Article

VL - 46

SP - 307

EP - 314

JO - Journal of Systematics and Evolution

JF - Journal of Systematics and Evolution

SN - 1674-4918

IS - 3

ER -