Combining data sets with different phylogenetic histories

Research output: Contribution to journal, Article

451 Scopus citations

Abstract

The possibility that two data sets may have different underlying phylogenetic histories (such as gene trees that deviate from species trees) has become an important argument against combining data in phylogenetic analysis. However, two data sets sampled for a large number of taxa may differ in only part of their histories. This is a realistic scenario and one in which the relative advantages of combined, separate, and consensus analysis become much less clear. I propose a simple methodology for dealing with this situation that involves (1) partitioning the available data to maximize detection of different histories, (2) performing separate analyses of the data sets, and (3) combining the data but considering questionable or unresolved those parts of the combined tree that are strongly contested in the separate analyses (and which therefore may have different histories) until a majority of unlinked data sets support one resolution over another. In support of this methodology, computer simulations suggest that (1) the accuracy of combined analysis for recovering the true species phylogeny may exceed that of either of two separately analyzed data sets under some conditions, particularly when the mismatch between phylogenetic histories is small and the estimates of the underlying histories are imperfect (few characters, high homoplasy, or both) and (2) combined analysis provides a poor estimate of the species tree in areas of the phylogenies with different histories but gives an improved estimate in regions that share the same history. Thus, when there is a localized mismatch between the histories of two data sets, the separate, consensus, and combined analyses may all give unsatisfactory results in certain parts of the phylogeny. Similarly, approaches that allow data combination only after a global test of heterogeneity will suffer from the potential failings of either separate or combined analysis, depending on the outcome of the test. Excision of conflicting taxa is also problematic, in that doing so may obfuscate the position of conflicting taxa within a larger tree, even when their placement is congruent between data sets. Application of the proposed methodology to molecular and morphological data sets for Sceloporus lizards is discussed.
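
Step (3) of the methodology described in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation; it rests on several assumptions that do not come from the abstract: each tree is represented as a mapping from rooted clades (sets of taxon names) to a support value, "strongly contested" is taken to mean that two incompatible clades each reach a hypothetical support threshold of 70, and the taxa A to E and all support values are invented for illustration.

```python
from itertools import product


def incompatible(a, b):
    """Two rooted clades conflict if they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def contested_clades(tree_a, tree_b, threshold=70):
    """Pairs of strongly supported clades that conflict between two separate analyses.

    The threshold is a hypothetical cutoff, not a value given in the abstract.
    """
    strong_a = [c for c, s in tree_a.items() if s >= threshold]
    strong_b = [c for c, s in tree_b.items() if s >= threshold]
    return [(a, b) for a, b in product(strong_a, strong_b) if incompatible(a, b)]


def flag_combined(combined, conflicts):
    """Mark a combined-tree clade 'questionable' if it conflicts with a contested clade."""
    contested = {c for pair in conflicts for c in pair}
    return {
        clade: "questionable"
        if any(incompatible(clade, c) for c in contested)
        else "accepted"
        for clade in combined
    }


# Invented example: clade -> support from two separately analyzed data sets.
molecular = {frozenset("AB"): 95, frozenset("ABC"): 88, frozenset("DE"): 99}
morphological = {frozenset("BC"): 92, frozenset("ABC"): 85, frozenset("DE"): 97}

# Clades found in the (invented) combined-data tree.
combined = [frozenset("AB"), frozenset("ABC"), frozenset("DE")]

conflicts = contested_clades(molecular, morphological)
labels = flag_combined(combined, conflicts)

for clade, label in labels.items():
    print(sorted(clade), label)
```

In this toy case the two data sets disagree strongly only about the relationships among A, B, and C, so only the combined clade in that region is treated as questionable, while the clades both data sets share are accepted. The abstract's further condition, that a contested region stays unresolved until a majority of unlinked data sets support one resolution, is not modeled here.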

Original language: English (US)
Pages (from-to): 568-581
Number of pages: 14
Journal: Systematic Biology
Volume: 47
Issue number: 4
State: Published - Dec 1998
Externally published: Yes

Keywords

  • Combined analysis
  • Computer simulation
  • Consensus analysis
  • Phylogenetic accuracy
  • Sceloporus
  • Separate analysis

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
