A novel approach to detecting and measuring recombination

New insights into evolution in viruses, bacteria, and mitochondria

Research output: Contribution to journal › Article

97 Citations (Scopus)

Abstract

An accurate estimate of the extent of recombination is important whenever phylogenetic methods are applied to potentially recombining nucleotide sequences. Here, data sets from viruses, bacteria, and mitochondria were examined for deviations from clonality using a new approach for detecting and measuring recombination. The apparent rate heterogeneity (ARH) among sites in a sequence alignment can be inflated as an artifact of recombination. However, the composition of polymorphic sites will differ in a data set with recombination-generated ARH versus a clonal data set that exhibits the equivalent degree of rate heterogeneity. This is because recombinant data sets, encompassing regions of conflicting phylogenetic history, tend to yield "starlike" trees that are superficially similar to those inferred from clonal data sets with weak phylogenetic signal throughout. Specifically, a recombinant data set will be unexpectedly rich in conflicting phylogenetic information compared with clonally generated data sets supporting the same tree shape. Its value of q, defined as the proportion of two-state parsimony-informative sites to all polymorphic sites, will be greater than that expected for nonrecombinant data. The method proposed here, the informative-sites test, compares the value of q against a null distribution of values found using Monte Carlo-simulated data evolved under the null hypothesis of clonality. A significant excess of q indicates that the assumption of clonality is not valid and hence that the ARH in the data is at least partly an artifact of recombination. Investigations of the procedure using simulated sequences indicated that it can successfully detect and measure recombination and that it is unlikely to produce "false positives." Simulations also showed that for recombinant data, naïve use of maximum-likelihood models incorporating rate heterogeneity can lead to overestimation of the time to the most recent common ancestor. Application of the test to real data revealed for the first time that populations of viruses, like those of bacteria, can be brought close to complete linkage equilibrium by pervasive recombination. On the other hand, the test did not reject the hypothesis of clonality when applied to a data set from the coding region of human mitochondrial DNA, despite its high level of ARH and homoplasy.
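
The statistic q described in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation. It assumes that a polymorphic site is an alignment column with more than one nucleotide, that a two-state parsimony-informative site is a column with exactly two nucleotides each present in at least two sequences, and that columns containing gaps or ambiguity codes are skipped. The function names (q_statistic, empirical_p_value) are hypothetical, and the null values of q are taken as given: simulating clonal evolution along a tree, as the informative-sites test requires, is outside the scope of this sketch. The +1 correction in the p-value is a common Monte Carlo convention, not something stated in the abstract.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def classify_site(column):
    """Return (is_polymorphic, is_two_state_informative) for one column."""
    counts = Counter(column)
    is_polymorphic = len(counts) > 1
    # Two states, each carried by at least two sequences.
    is_informative = len(counts) == 2 and min(counts.values()) >= 2
    return is_polymorphic, is_informative


def q_statistic(alignment):
    """Proportion of two-state parsimony-informative sites among polymorphic sites."""
    seqs = [s.upper() for s in alignment]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    polymorphic = 0
    informative = 0
    for column in zip(*seqs):
        if not set(column) <= NUCLEOTIDES:
            continue  # assumption: skip gaps and ambiguity codes
        poly, info = classify_site(column)
        polymorphic += poly
        informative += info
    if polymorphic == 0:
        raise ValueError("alignment has no polymorphic sites")
    return informative / polymorphic


def empirical_p_value(observed_q, null_qs):
    """One-sided p-value: share of null q values at least as large as observed."""
    exceed = sum(1 for value in null_qs if value >= observed_q)
    return (exceed + 1) / (len(null_qs) + 1)  # assumption: +1 correction


# Toy alignment: 4 polymorphic columns, 2 of them two-state informative.
toy_alignment = ["AACGT", "AACGA", "ATCCA", "ATGCA"]
toy_q = q_statistic(toy_alignment)

# Placeholder null values standing in for q computed on simulated clonal data.
toy_null_qs = [0.1, 0.2, 0.3, 0.6]
toy_p = empirical_p_value(toy_q, toy_null_qs)
```

In practice the null values would come from many data sets simulated under clonality, and a small p-value would correspond to the "significant excess of q" that the abstract treats as evidence of recombination.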

Original language: English (US)
Pages (from-to): 1425-1434
Number of pages: 10
Journal: Molecular Biology and Evolution
Volume: 18
Issue number: 8
State: Published - 2001
Externally published: Yes

Keywords

  • Clonal
  • GB virus C
  • Maximum likelihood
  • Mitochondria
  • Rate heterogeneity
  • Recombination

ASJC Scopus subject areas

  • Genetics
  • Biochemistry
  • Genetics (clinical)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Ecology, Evolution, Behavior and Systematics
  • Agricultural and Biological Sciences (miscellaneous)
  • Molecular Biology

Cite this

@article{452d0a94b6da4488b0fd1beaa515d7f6,
title = "A novel approach to detecting and measuring recombination: New insights into evolution in viruses, bacteria, and mitochondria",
abstract = "An accurate estimate of the extent of recombination is important whenever phylogenetic methods are applied to potentially recombining nucleotide sequences. Here, data sets from viruses, bacteria, and mitochondria were examined for deviations from clonality using a new approach for detecting and measuring recombination. The apparent rate heterogeneity (ARH) among sites in a sequence alignment can be inflated as an artifact of recombination. However, the composition of polymorphic sites will differ in a data set with recombination-generated ARH versus a clonal data set that exhibits the equivalent degree of rate heterogeneity. This is because recombinant data sets, encompassing regions of conflicting phylogenetic history, tend to yield {"}starlike{"} trees that are superficially similar to those inferred from clonal data sets with weak phylogenetic signal throughout. Specifically, a recombinant data set will be unexpectedly rich in conflicting phylogenetic information compared with clonally generated data sets supporting the same tree shape. Its value of q, defined as the proportion of two-state parsimony-informative sites to all polymorphic sites, will be greater than that expected for nonrecombinant data. The method proposed here, the informative-sites test, compares the value of q against a null distribution of values found using Monte Carlo-simulated data evolved under the null hypothesis of clonality. A significant excess of q indicates that the assumption of clonality is not valid and hence that the ARH in the data is at least partly an artifact of recombination. Investigations of the procedure using simulated sequences indicated that it can successfully detect and measure recombination and that it is unlikely to produce {"}false positives.{"} Simulations also showed that for recombinant data, na{\"i}ve use of maximum-likelihood models incorporating rate heterogeneity can lead to overestimation of the time to the most recent common ancestor. Application of the test to real data revealed for the first time that populations of viruses, like those of bacteria, can be brought close to complete linkage equilibrium by pervasive recombination. On the other hand, the test did not reject the hypothesis of clonality when applied to a data set from the coding region of human mitochondrial DNA, despite its high level of ARH and homoplasy.",
keywords = "Clonal, GB virus C, Maximum likelihood, Mitochondria, Rate heterogeneity, Recombination",
author = "Michael Worobey",
year = "2001",
language = "English (US)",
volume = "18",
pages = "1425--1434",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "8",

}

TY - JOUR

T1 - A novel approach to detecting and measuring recombination

T2 - New insights into evolution in viruses, bacteria, and mitochondria

AU - Worobey, Michael

PY - 2001

Y1 - 2001

AB - An accurate estimate of the extent of recombination is important whenever phylogenetic methods are applied to potentially recombining nucleotide sequences. Here, data sets from viruses, bacteria, and mitochondria were examined for deviations from clonality using a new approach for detecting and measuring recombination. The apparent rate heterogeneity (ARH) among sites in a sequence alignment can be inflated as an artifact of recombination. However, the composition of polymorphic sites will differ in a data set with recombination-generated ARH versus a clonal data set that exhibits the equivalent degree of rate heterogeneity. This is because recombinant data sets, encompassing regions of conflicting phylogenetic history, tend to yield "starlike" trees that are superficially similar to those inferred from clonal data sets with weak phylogenetic signal throughout. Specifically, a recombinant data set will be unexpectedly rich in conflicting phylogenetic information compared with clonally generated data sets supporting the same tree shape. Its value of q, defined as the proportion of two-state parsimony-informative sites to all polymorphic sites, will be greater than that expected for nonrecombinant data. The method proposed here, the informative-sites test, compares the value of q against a null distribution of values found using Monte Carlo-simulated data evolved under the null hypothesis of clonality. A significant excess of q indicates that the assumption of clonality is not valid and hence that the ARH in the data is at least partly an artifact of recombination. Investigations of the procedure using simulated sequences indicated that it can successfully detect and measure recombination and that it is unlikely to produce "false positives." Simulations also showed that for recombinant data, naïve use of maximum-likelihood models incorporating rate heterogeneity can lead to overestimation of the time to the most recent common ancestor. Application of the test to real data revealed for the first time that populations of viruses, like those of bacteria, can be brought close to complete linkage equilibrium by pervasive recombination. On the other hand, the test did not reject the hypothesis of clonality when applied to a data set from the coding region of human mitochondrial DNA, despite its high level of ARH and homoplasy.

KW - Clonal

KW - GB virus C

KW - Maximum likelihood

KW - Mitochondria

KW - Rate heterogeneity

KW - Recombination

UR - http://www.scopus.com/inward/record.url?scp=0034897826&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034897826&partnerID=8YFLogxK

M3 - Article

VL - 18

SP - 1425

EP - 1434

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

SN - 0737-4038

IS - 8

ER -