Demographic history and rare allele sharing among human populations

Simon Gravel, Brenna M. Henn, Ryan N. Gutenkunst, Amit R. Indap, Gabor T. Marth, Andrew G. Clark, Fuli Yu, Richard A. Gibbs, Carlos D. Bustamante

Research output: Contribution to journal › Article

296 Citations (Scopus)

Abstract

High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4x coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.

Original language: English (US)
Pages (from-to): 11983-11988
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 108
Issue number: 29
DOI: https://doi.org/10.1073/pnas.1019276108
State: Published - Jul 19 2011

Fingerprint

Alleles
Demography
Population
Genome
HapMap Project
Chromosomes, Human, Pair 2
Population Growth
Gene Frequency
Sample Size
Exons
Chromosomes
Joints
Technology
Genes

Keywords

  • Demographic inference
  • Genetic drift
  • Human evolution
  • Population genetics

ASJC Scopus subject areas

  • General

Cite this

Demographic history and rare allele sharing among human populations. / Gravel, Simon; Henn, Brenna M.; Gutenkunst, Ryan N.; Indap, Amit R.; Marth, Gabor T.; Clark, Andrew G.; Yu, Fuli; Gibbs, Richard A.; Bustamante, Carlos D.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 108, No. 29, 19.07.2011, p. 11983-11988.

Gravel, S, Henn, BM, Gutenkunst, RN, Indap, AR, Marth, GT, Clark, AG, Yu, F, Gibbs, RA & Bustamante, CD 2011, 'Demographic history and rare allele sharing among human populations', Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 29, pp. 11983-11988. https://doi.org/10.1073/pnas.1019276108
Gravel, Simon ; Henn, Brenna M. ; Gutenkunst, Ryan N. ; Indap, Amit R. ; Marth, Gabor T. ; Clark, Andrew G. ; Yu, Fuli ; Gibbs, Richard A. ; Bustamante, Carlos D. / Demographic history and rare allele sharing among human populations. In: Proceedings of the National Academy of Sciences of the United States of America. 2011 ; Vol. 108, No. 29. pp. 11983-11988.
@article{0db88976e90746b896e885965321cf28,
title = "Demographic history and rare allele sharing among human populations",
abstract = "High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4x coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.",
keywords = "Demographic inference, Genetic drift, Human evolution, Population genetics",
author = "Gravel, Simon and Henn, {Brenna M.} and Gutenkunst, {Ryan N.} and Indap, {Amit R.} and Marth, {Gabor T.} and Clark, {Andrew G.} and Yu, Fuli and Gibbs, {Richard A.} and Bustamante, {Carlos D.}",
year = "2011",
month = jul,
day = "19",
doi = "10.1073/pnas.1019276108",
language = "English (US)",
volume = "108",
pages = "11983--11988",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "29",

}

TY - JOUR

T1 - Demographic history and rare allele sharing among human populations

AU - Gravel, Simon

AU - Henn, Brenna M.

AU - Gutenkunst, Ryan N.

AU - Indap, Amit R.

AU - Marth, Gabor T.

AU - Clark, Andrew G.

AU - Yu, Fuli

AU - Gibbs, Richard A.

AU - Bustamante, Carlos D.

PY - 2011/7/19

Y1 - 2011/7/19

N2 - High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4x coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.

AB - High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4x coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.

KW - Demographic inference

KW - Genetic drift

KW - Human evolution

KW - Population genetics

UR - http://www.scopus.com/inward/record.url?scp=79961091828&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79961091828&partnerID=8YFLogxK

U2 - 10.1073/pnas.1019276108

DO - 10.1073/pnas.1019276108

M3 - Article

C2 - 21730125

AN - SCOPUS:79961091828

VL - 108

SP - 11983

EP - 11988

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 29

ER -