How many taxa must be sampled to identify the root node of a large clade?

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

The importance of choice of taxa in phylogenetic analysis has been explored mainly with reference to its effect on the accuracy of tree estimation. Taxon sampling can also introduce other kinds of errors. Even if the sampled topology agrees with the true topology, it may not include the true root node of a clade, a node that is of interest for many reasons. Using a simple Yule model for the diversification process, the probability of identifying this node is derived under random sampling of taxa. For large clades, the minimum sample size needed to be 95% confident of identifying the root node is approximately 40 and is independent of the size of the clade. If rates of diversification differ in the two sister groups descended from the root node, the minimum sample size needed increases markedly. If these two sister groups are so different in diversity that a Yule model would be rejected by conventional diversification tests, then the necessary sample size is an order of magnitude greater than when diversification is homogeneous.
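The homogeneous-diversification result in the abstract can be illustrated with a short calculation. What follows is a minimal sketch, not the derivation given in the paper. It rests on two assumptions: that under a Yule model the number of taxa in one of the two sister groups at the root of a clade of N taxa is uniformly distributed on 1..N-1, and that "identifying the root node" means a random sample of n taxa (drawn without replacement) contains at least one taxon from each sister group. The function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import comb


def p_root_sampled(N: int, n: int) -> Fraction:
    """Probability that n of N taxa, sampled at random, span the root node.

    Closed form under the assumptions stated above (uniform root split):
    1 - 2(N - n) / ((n + 1)(N - 1)), which tends to 1 - 2/(n + 1) as N grows.
    """
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return 1 - Fraction(2 * (N - n), (n + 1) * (N - 1))


def p_root_sampled_bruteforce(N: int, n: int) -> Fraction:
    """Same probability by direct enumeration over root splits (a check)."""
    total = comb(N, n)
    # For a split of sizes k and N - k, the sample misses the root when all
    # n sampled taxa fall inside one of the two sister groups.
    miss = sum(comb(k, n) + comb(N - k, n) for k in range(1, N))
    return 1 - Fraction(miss, total * (N - 1))


def min_sample_size(N: int, confidence: Fraction = Fraction(95, 100)) -> int:
    """Smallest n whose probability of spanning the root is >= confidence."""
    for n in range(1, N + 1):
        if p_root_sampled(N, n) >= confidence:
            return n
    raise ValueError("confidence must be at most 1")


if __name__ == "__main__":
    for clade_size in (100, 10_000, 1_000_000):
        print(clade_size, min_sample_size(clade_size))
```

Under these assumptions, min_sample_size(10**6) returns 39, in line with the abstract's figure of approximately 40, and the large-clade limit 1 - 2/(n + 1) does not depend on N. The sketch covers only the equal-rates case; it does not model unequal diversification rates in the two sister groups.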

Original language: English (US)
Pages (from-to): 168-173
Number of pages: 6
Journal: Systematic Biology
Volume: 45
Issue number: 2
State: Published - Jun 1996
Externally published: Yes

Fingerprint

Sample Size
topology
sampling
phylogenetics
phylogeny
testing

Keywords

  • Branching
  • Diversification
  • Phylogeny
  • Speciation
  • Taxon sampling
  • Yule model

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics

Cite this

How many taxa must be sampled to identify the root node of a large clade? / Sanderson, Michael.

In: Systematic Biology, Vol. 45, No. 2, 06.1996, p. 168-173.


@article{1187e698b3d84f31bf8b0f7aea0f8b41,
title = "How many taxa must be sampled to identify the root node of a large clade?",
abstract = "The importance of choice of taxa in phylogenetic analysis has been explored mainly with reference to its effect on the accuracy of tree estimation. Taxon sampling can also introduce other kinds of errors. Even if the sampled topology agrees with the true topology, it may not include the true root node of a clade, a node that is of interest for many reasons. Using a simple Yule model for the diversification process, the probability of identifying this node is derived under random sampling of taxa. For large clades, the minimum sample size needed to be 95{\%} confident of identifying the root node is approximately 40 and is independent of the size of the clade. If rates of diversification differ in the two sister groups descended from the root node, the minimum sample size needed increases markedly. If these two sister groups are so different in diversity that a Yule model would be rejected by conventional diversification tests, then the necessary sample size is an order of magnitude greater than when diversification is homogeneous.",
keywords = "Branching, Diversification, Phylogeny, Speciation, Taxon sampling, Yule model",
author = "Michael Sanderson",
year = "1996",
month = "6",
language = "English (US)",
volume = "45",
pages = "168--173",
journal = "Systematic Biology",
issn = "1063-5157",
publisher = "Oxford University Press",
number = "2"
}

TY  - JOUR
T1  - How many taxa must be sampled to identify the root node of a large clade?
AU  - Sanderson, Michael
PY  - 1996/6
Y1  - 1996/6
N2  - The importance of choice of taxa in phylogenetic analysis has been explored mainly with reference to its effect on the accuracy of tree estimation. Taxon sampling can also introduce other kinds of errors. Even if the sampled topology agrees with the true topology, it may not include the true root node of a clade, a node that is of interest for many reasons. Using a simple Yule model for the diversification process, the probability of identifying this node is derived under random sampling of taxa. For large clades, the minimum sample size needed to be 95% confident of identifying the root node is approximately 40 and is independent of the size of the clade. If rates of diversification differ in the two sister groups descended from the root node, the minimum sample size needed increases markedly. If these two sister groups are so different in diversity that a Yule model would be rejected by conventional diversification tests, then the necessary sample size is an order of magnitude greater than when diversification is homogeneous.
KW  - Branching
KW  - Diversification
KW  - Phylogeny
KW  - Speciation
KW  - Taxon sampling
KW  - Yule model
UR  - http://www.scopus.com/inward/record.url?scp=0030310770&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0030310770&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:0030310770
VL  - 45
SP  - 168
EP  - 173
JO  - Systematic Biology
JF  - Systematic Biology
SN  - 1063-5157
IS  - 2
ER  -