Assessment of whole-genome regression for type II diabetes

Ana I. Vazquez, Yann C Klimentidis, Emily J. Dhurandhar, Yogasudha C. Veturi, Paulino Pérez-Rodríguez

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Lifestyle and genetic factors play a large role in the development of Type 2 Diabetes (T2D). Despite the important role of genetic factors, genetic information is not incorporated into the clinical assessment of T2D risk. We assessed and compared Whole Genome Regression methods to predict the T2D status of 5,245 subjects from the Framingham Heart Study. To evaluate each method, we constructed the following set of regression models. A clinical baseline model (CBM) included non-genetic covariates only. The CBM was extended by adding the first two marker-derived principal components and 65 SNPs identified by a recent GWAS consortium for T2D (M-65SNPs). Subsequently, it was further extended by adding 249,798 genome-wide SNPs from a high-density array. The Bayesian models used to incorporate genome-wide marker information as predictors were Bayes A, Bayes Cπ, Bayesian LASSO (BL), and Genomic Best Linear Unbiased Prediction (G-BLUP). Results included estimates of the genetic variance and heritability, genetic scores for T2D, and predictive ability evaluated in a 10-fold cross-validation. The predictive AUC estimates for CBM and M-65SNPs were 0.668 and 0.684, respectively. We found evidence of a contribution of genetic effects to T2D, as reflected in the genomic heritability estimates (0.492±0.066). The highest predictive AUC among the genome-wide marker Bayesian models was 0.681, for the Bayesian LASSO. Overall, the improvement in predictive ability was moderate and did not differ greatly among models that included genetic information. Approximately 58% of the total number of genetic variants was found to contribute to the overall genetic variation, indicating a complex genetic architecture for T2D. Our results suggest that the Bayes Cπ and the G-BLUP models with a large set of genome-wide markers could be used for predicting risk of T2D, as an alternative to using high-density arrays when selected markers from large consortia for a given complex trait or disease are unavailable.
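
The abstract reports predictive ability as an AUC evaluated in a 10-fold cross-validation. The following is a minimal sketch of that evaluation scheme, not the authors' code: the data are simulated, the one-covariate "model" is a hypothetical stand-in for the regression models named above, and pooling the held-out scores before computing a single AUC is an assumption (averaging per-fold AUCs is another common choice).

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(x, y, fit, predict, k=10, seed=0):
    """Score each subject with a model trained on the other k-1 folds, then pool."""
    scores = [0.0] * len(y)
    for test in kfold(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = predict(model, x[i])
    return auc(y, scores)


# Hypothetical toy model: the direction of the case-control mean difference.
def fit(xs, ys):
    cases = [v for v, lab in zip(xs, ys) if lab == 1]
    controls = [v for v, lab in zip(xs, ys) if lab == 0]
    diff = sum(cases) / len(cases) - sum(controls) / len(controls)
    return 1.0 if diff >= 0 else -1.0


def predict(direction, value):
    return direction * value


# Simulated data (assumption): about 30% cases, one covariate shifted by 1 SD in cases.
rng = random.Random(1)
y = [1 if rng.random() < 0.3 else 0 for _ in range(400)]
x = [label + rng.gauss(0.0, 1.0) for label in y]

cv_auc = cross_validated_auc(x, y, fit, predict, k=10)
print(f"10-fold cross-validated AUC: {cv_auc:.3f}")
```

Because every subject is scored by a model that never saw that subject's label, the resulting AUC estimates out-of-sample discrimination, which is the sense in which values such as 0.668 and 0.684 are compared in the abstract.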

Original language: English (US)
Article number: e0123818
Journal: PLoS One
Volume: 10
Issue number: 4
DOI: https://doi.org/10.1371/journal.pone.0123818
State: Published - Apr 17 2015

Fingerprint

Medical problems
noninsulin-dependent diabetes mellitus
Type 2 Diabetes Mellitus
Genes
Genome
Aptitude
genomics
Area Under Curve
Single Nucleotide Polymorphism
heritability
prediction
Genome-Wide Association Study
Genetic Models
genetic variance
lifestyle
Life Style
heart
genetic variation
methodology

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Vazquez, A. I., Klimentidis, Y. C., Dhurandhar, E. J., Veturi, Y. C., & Pérez-Rodríguez, P. (2015). Assessment of whole-genome regression for type II diabetes. PLoS One, 10(4), [e0123818]. https://doi.org/10.1371/journal.pone.0123818

@article{7aef86a6f72344afa80c051a8193cb18,
title = "Assessment of whole-genome regression for type II diabetes",
author = "Vazquez, {Ana I.} and Klimentidis, {Yann C} and Dhurandhar, {Emily J.} and Veturi, {Yogasudha C.} and P{\'e}rez-Rodr{\'i}guez, Paulino",
year = "2015",
month = "4",
day = "17",
doi = "10.1371/journal.pone.0123818",
language = "English (US)",
volume = "10",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "4",

}

TY - JOUR

T1 - Assessment of whole-genome regression for type II diabetes

AU - Vazquez, Ana I.

AU - Klimentidis, Yann C

AU - Dhurandhar, Emily J.

AU - Veturi, Yogasudha C.

AU - Pérez-Rodríguez, Paulino

PY - 2015/4/17

Y1 - 2015/4/17

UR - http://www.scopus.com/inward/record.url?scp=84929486170&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84929486170&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0123818

DO - 10.1371/journal.pone.0123818

M3 - Article

VL - 10

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 4

M1 - e0123818

ER -