Does adding characters with missing data increase or decrease phylogenetic accuracy?

Research output: Contribution to journal › Article › peer-review

225 Scopus citations

Abstract

Missing data are a widely recognized nuisance factor in phylogenetic analyses, and the fear of missing data may deter systematists from including characters that are highly incomplete. In this paper, I used simulations to explore the consequences of including sets of characters that contain missing data. More specifically, I tested whether the benefits of increasing the number of characters outweigh the costs of adding missing data cells to a matrix. The results show that the addition of a set of characters with missing data is generally more likely to increase phylogenetic accuracy than decrease it, but the potential benefits of adding these characters quickly disappear as the proportion of missing data increases. Furthermore, despite the overall trend, adding characters with missing data does decrease accuracy in some cases. In these situations, the missing data entries are not themselves misleading, but their presence may mimic the effects of limited taxon sampling, which can positively mislead. Criteria are discussed for predicting whether adding characters with missing data may increase or decrease accuracy. The results of this study also suggest that accuracy can be increased to a surprising degree by (1) "filling the holes" in a data matrix as much as possible (even when relatively few taxa are missing data), and (2) adding fewer characters scored for all taxa rather than adding a larger number of characters known for fewer taxa. Missing data can also be eliminated from an analysis through the exclusion of incomplete taxa rather than incomplete characters, but this approach may reduce the usefulness of the analysis and (in some cases) the accuracy of the estimated trees.

Original language: English (US)
Pages (from-to): 625-640
Number of pages: 16
Journal: Systematic Biology
Volume: 47
Issue number: 4
State: Published - Dec 1998
Externally published: Yes

Keywords

  • Accuracy
  • Missing data
  • Parsimony
  • Simulations

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
