Assessing the Performance of Ks Plots for Detecting Ancient Whole Genome Duplications

George P. Tiley, Michael S Barker, J. Gordon Burleigh

Research output: Contribution to journal, Article

7 Citations (Scopus)

Abstract

Genomic data have provided evidence of previously unknown ancient whole genome duplications (WGDs) and highlighted the role of WGDs in the evolution of many eukaryotic lineages. Ancient WGDs often are detected by examining distributions of synonymous substitutions per site (Ks) within a genome, or "Ks plots." For example, WGDs can be detected from Ks plots by using univariate mixture models to identify peaks in Ks distributions. We performed gene family simulation experiments to evaluate the effects of different Ks estimation methods and mixture models on our ability to detect ancient WGDs from Ks plots. The simulation experiments, which accounted for variation in substitution rates and gene duplication and loss rates across gene families, tested the effects of WGD age and gene retention rates following WGD on inferring WGDs from Ks plots. Our simulations reveal limitations of Ks plot analyses. Strict interpretations of mixture model analyses often overestimate the number of WGD events, and Ks plot analyses typically fail to detect WGDs when ≤10% of the duplicated genes are retained following the WGD. However, WGDs can accurately be characterized over an intermediate range of Ks. The simulation results are supported by empirical analyses of transcriptomic data, which also suggest that biases in gene retention likely affect our ability to detect ancient WGDs. Although our results indicate mixture model results should be interpreted with great caution, using node-averaged Ks estimates and applying more appropriate mixture models can improve the accuracy of detecting WGDs.
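As a rough illustration of the mixture-model approach mentioned in the abstract, the sketch below fits one- and two-component Gaussian mixtures to log-transformed Ks values with a plain EM loop and compares them by BIC. It is not the authors' pipeline. The log-normal assumption, the quantile-based initialization, the synthetic data, and every function name here are assumptions made for illustration only.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, means, sds):
    """Weighted density of each mixture component at x."""
    return [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]


def fit_gmm_1d(xs, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return (weights, means, sds)."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumed initialization: means at evenly spaced quantiles, shared sd.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    grand_mean = sum(xs) / n
    sd0 = max(math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n), 1e-3)
    sds = [sd0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = mixture_density(x, weights, means, sds)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # floor avoids degenerate components
    return weights, means, sds


def bic(xs, weights, means, sds):
    """Bayesian information criterion of a fitted mixture (lower is better)."""
    loglik = sum(
        math.log(max(sum(mixture_density(x, weights, means, sds)), 1e-300))
        for x in xs
    )
    n_params = 3 * len(weights) - 1  # k means + k sds + (k - 1) free weights
    return n_params * math.log(len(xs)) - 2.0 * loglik


def select_components(xs, max_k=2):
    """Fit mixtures with 1..max_k components and pick the lowest-BIC one."""
    fits = {}
    for k in range(1, max_k + 1):
        weights, means, sds = fit_gmm_1d(xs, k)
        fits[k] = (weights, means, sds, bic(xs, weights, means, sds))
    best_k = min(fits, key=lambda kk: fits[kk][3])
    return best_k, fits


def simulate_log_ks(seed=1):
    """Synthetic log(Ks): recent duplicates near Ks = 0.05 plus a peak near Ks = 0.8."""
    rng = random.Random(seed)
    recent = [rng.gauss(math.log(0.05), 0.5) for _ in range(400)]
    peak = [rng.gauss(math.log(0.8), 0.25) for _ in range(200)]
    return recent + peak


if __name__ == "__main__":
    best_k, fits = select_components(simulate_log_ks())
    print("components chosen by BIC:", best_k)
    for w, m, s in zip(*fits[best_k][:3]):
        print(f"weight={w:.2f}  median Ks={math.exp(m):.3f}  sd(log Ks)={s:.2f}")
```

Picking the component count strictly by an information criterion, as this sketch does, is the kind of strict interpretation the abstract warns can overestimate the number of WGD events, so any extra component should be read as a candidate peak rather than as evidence of a duplication.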

Original language: English (US)
Pages (from-to): 2882-2898
Number of pages: 17
Journal: Genome Biology and Evolution
Volume: 10
Issue number: 11
DOI: 10.1093/gbe/evy200
State: Published - Nov 1 2018

Fingerprint

genome
Genome
gene
Gene Duplication
genes
Genes
simulation
substitution
gene duplication
estimation method
transcriptomics
genomics
experiment

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics

Cite this

Assessing the Performance of Ks Plots for Detecting Ancient Whole Genome Duplications. / Tiley, George P.; Barker, Michael S; Burleigh, J. Gordon.

In: Genome Biology and Evolution, Vol. 10, No. 11, 01.11.2018, p. 2882-2898.


@article{364767ebbd324720934e22abe5592268,
title = "Assessing the Performance of Ks Plots for Detecting Ancient Whole Genome Duplications",
abstract = "Genomic data have provided evidence of previously unknown ancient whole genome duplications (WGDs) and highlighted the role of WGDs in the evolution of many eukaryotic lineages. Ancient WGDs often are detected by examining distributions of synonymous substitutions per site (Ks) within a genome, or {"}Ks plots.{"} For example, WGDs can be detected from Ks plots by using univariate mixture models to identify peaks in Ks distributions. We performed gene family simulation experiments to evaluate the effects of different Ks estimation methods and mixture models on our ability to detect ancient WGDs from Ks plots. The simulation experiments, which accounted for variation in substitution rates and gene duplication and loss rates across gene families, tested the effects of WGD age and gene retention rates following WGD on inferring WGDs from Ks plots. Our simulations reveal limitations of Ks plot analyses. Strict interpretations of mixture model analyses often overestimate the number of WGD events, and Ks plot analyses typically fail to detect WGDs when ≤10{\%} of the duplicated genes are retained following the WGD. However, WGDs can accurately be characterized over an intermediate range of Ks. The simulation results are supported by empirical analyses of transcriptomic data, which also suggest that biases in gene retention likely affect our ability to detect ancient WGDs. Although our results indicate mixture model results should be interpreted with great caution, using node-averaged Ks estimates and applying more appropriate mixture models can improve the accuracy of detecting WGDs.",
author = "Tiley, {George P.} and Barker, {Michael S} and Burleigh, {J. Gordon}",
year = "2018",
month = "11",
day = "1",
doi = "10.1093/gbe/evy200",
language = "English (US)",
volume = "10",
pages = "2882--2898",
journal = "Genome Biology and Evolution",
issn = "1759-6653",
publisher = "Oxford University Press",
number = "11",

}

TY - JOUR

T1 - Assessing the Performance of Ks Plots for Detecting Ancient Whole Genome Duplications

AU - Tiley, George P.

AU - Barker, Michael S

AU - Burleigh, J. Gordon

PY - 2018/11/1

Y1 - 2018/11/1

N2 - Genomic data have provided evidence of previously unknown ancient whole genome duplications (WGDs) and highlighted the role of WGDs in the evolution of many eukaryotic lineages. Ancient WGDs often are detected by examining distributions of synonymous substitutions per site (Ks) within a genome, or "Ks plots." For example, WGDs can be detected from Ks plots by using univariate mixture models to identify peaks in Ks distributions. We performed gene family simulation experiments to evaluate the effects of different Ks estimation methods and mixture models on our ability to detect ancient WGDs from Ks plots. The simulation experiments, which accounted for variation in substitution rates and gene duplication and loss rates across gene families, tested the effects of WGD age and gene retention rates following WGD on inferring WGDs from Ks plots. Our simulations reveal limitations of Ks plot analyses. Strict interpretations of mixture model analyses often overestimate the number of WGD events, and Ks plot analyses typically fail to detect WGDs when ≤10% of the duplicated genes are retained following the WGD. However, WGDs can accurately be characterized over an intermediate range of Ks. The simulation results are supported by empirical analyses of transcriptomic data, which also suggest that biases in gene retention likely affect our ability to detect ancient WGDs. Although our results indicate mixture model results should be interpreted with great caution, using node-averaged Ks estimates and applying more appropriate mixture models can improve the accuracy of detecting WGDs.

UR - http://www.scopus.com/inward/record.url?scp=85056403615&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056403615&partnerID=8YFLogxK

U2 - 10.1093/gbe/evy200

DO - 10.1093/gbe/evy200

M3 - Article

C2 - 30239709

AN - SCOPUS:85056403615

VL - 10

SP - 2882

EP - 2898

JO - Genome Biology and Evolution

JF - Genome Biology and Evolution

SN - 1759-6653

IS - 11

ER -