High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis

Zhongyin J Daye, Jinbo Chen, Hongzhe Li

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

We consider the problem of high-dimensional regression under nonconstant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows nonconstant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
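To make the "doubly regularized" idea concrete, here is a minimal sketch. It is not the authors' published algorithm or software. It assumes a Gaussian model y_i = x_i'β + σ_i ε_i with a log-linear variance model log σ_i² = z_i'θ. It then minimizes the penalized negative log-likelihood (1/2) Σ_i [z_i'θ + (y_i - x_i'β)² exp(-z_i'θ)] + λ‖β‖₁ + γ‖θ‖₁ by alternating two convex steps. The first is a weighted lasso for β, with weights 1/σ_i², solved by coordinate descent. The second is an ℓ1-penalized fit of the log-variance coefficients θ, solved by proximal gradient with backtracking. The function names (fit_heteroscedastic_lasso, weighted_lasso, variance_step), the alternating scheme, and the tuning values are hypothetical illustrations. In practice λ and γ would be chosen by cross-validation or an information criterion.

```python
# Hypothetical helpers: an illustrative sketch of doubly regularized
# heteroscedastic regression, NOT the authors' implementation.
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def weighted_lasso(X, y, w, lam, beta, n_sweeps=50):
    """Coordinate descent for 0.5 * sum_i w_i (y_i - x_i'beta)^2 + lam * ||beta[1:]||_1.

    Column 0 of X is an intercept and is not penalized. `beta` is updated in place.
    """
    r = y - X @ beta  # current residuals
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            xj = X[:, j]
            denom = np.sum(w * xj * xj)
            rho = np.sum(w * xj * (r + xj * beta[j]))  # weighted fit to partial residual
            new = rho / denom if j == 0 else soft_threshold(rho, lam) / denom
            r += xj * (beta[j] - new)  # keep residuals in sync with beta
            beta[j] = new
    return beta


def variance_step(Z, r2, gamma, theta, n_iter=200):
    """Proximal gradient with backtracking for
    0.5 * sum_i [z_i'theta + r2_i * exp(-z_i'theta)] + gamma * ||theta[1:]||_1.
    """
    def smooth(th):
        eta = Z @ th
        with np.errstate(over="ignore"):  # overflow -> inf, which backtracking rejects
            return 0.5 * np.sum(eta + r2 * np.exp(-eta))

    step = 1.0
    for _ in range(n_iter):
        grad = 0.5 * Z.T @ (1.0 - r2 * np.exp(-(Z @ theta)))
        f_old = smooth(theta)
        while True:
            cand = theta - step * grad
            cand[1:] = soft_threshold(cand[1:], step * gamma)  # intercept unpenalized
            diff = cand - theta
            if smooth(cand) <= f_old + grad @ diff + diff @ diff / (2.0 * step):
                break
            step *= 0.5  # shrink step until the quadratic upper bound holds
        theta = cand
    return theta


def fit_heteroscedastic_lasso(X, Z, y, lam, gamma, n_outer=20):
    """Alternate a weighted-lasso mean step and an l1-penalized log-variance step.

    Returns (beta, theta); element 0 of each is the unpenalized intercept.
    """
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    Z1 = np.column_stack([np.ones(n), Z])
    beta = np.zeros(X1.shape[1])
    theta = np.zeros(Z1.shape[1])
    for _ in range(n_outer):
        w = np.exp(-(Z1 @ theta))  # w_i = 1 / sigma_i^2 under the log-linear model
        beta = weighted_lasso(X1, y, w, lam, beta)
        r2 = (y - X1 @ beta) ** 2
        theta = variance_step(Z1, r2, gamma, theta)
    return beta, theta
```

Each step exactly minimizes the joint objective over one block with the other held fixed, so the objective never increases. With Z equal to X, a predictor can enter the mean model, the variance model, or both. A predictor that only inflates the error variance appears as a nonzero θ_j with β_j near zero. An ordinary homoscedastic lasso has no parameter that can represent that signal.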

Original language: English (US)
Pages (from-to): 316-326
Number of pages: 11
Journal: Biometrics
Volume: 68
Issue number: 1
DOI: 10.1111/j.1541-0420.2011.01652.x
State: Published - Mar 2012
Externally published: Yes

Fingerprint

Heteroscedastic Regression
Quantitative Trait Loci
Heteroscedasticity
Data analysis
High-dimensional
Biological Phenomena
Gene expression
Yeast
Variance Components
Dimensional Analysis
Prediction Error
Variable Selection
methodology
Model Selection
Demonstrate
Outlier

Keywords

  • Generalized least squares
  • Heteroscedasticity
  • Large p small n
  • Model selection
  • Sparse regression
  • Variance estimation

ASJC Scopus subject areas

  • Applied Mathematics
  • Statistics and Probability
  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Medicine (all)

Cite this

High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis. / Daye, Zhongyin J; Chen, Jinbo; Li, Hongzhe.

In: Biometrics, Vol. 68, No. 1, 03.2012, p. 316-326.


Daye, Zhongyin J; Chen, Jinbo; Li, Hongzhe. / High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis. In: Biometrics. 2012; Vol. 68, No. 1. pp. 316-326.
@article{4f3100278d924e8ba5f17a68c6dde312,
title = "High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis",
abstract = "We consider the problem of high-dimensional regression under nonconstant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows nonconstant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.",
keywords = "Generalized least squares, Heteroscedasticity, Large p small n, Model selection, Sparse regression, Variance estimation",
author = "Daye, {Zhongyin J} and Jinbo Chen and Hongzhe Li",
year = "2012",
month = "3",
doi = "10.1111/j.1541-0420.2011.01652.x",
language = "English (US)",
volume = "68",
pages = "316--326",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "1",

}

TY  - JOUR
T1  - High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis
AU  - Daye, Zhongyin J
AU  - Chen, Jinbo
AU  - Li, Hongzhe
PY  - 2012/3
Y1  - 2012/3
N2  - We consider the problem of high-dimensional regression under nonconstant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows nonconstant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
AB  - We consider the problem of high-dimensional regression under nonconstant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows nonconstant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
KW  - Generalized least squares
KW  - Heteroscedasticity
KW  - Large p small n
KW  - Model selection
KW  - Sparse regression
KW  - Variance estimation
UR  - http://www.scopus.com/inward/record.url?scp=84858863029&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84858863029&partnerID=8YFLogxK
U2  - 10.1111/j.1541-0420.2011.01652.x
DO  - 10.1111/j.1541-0420.2011.01652.x
M3  - Article
VL  - 68
SP  - 316
EP  - 326
JO  - Biometrics
JF  - Biometrics
SN  - 0006-341X
IS  - 1
ER  - 