Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

The rapidly decreasing cost of next-generation sequencing has led to the recent availability of large-scale RNA-seq data, which empowers the analysis of gene expression variability in addition to gene expression means. In this paper, we present the MDSeq, based on the coefficient of dispersion, to provide robust and computationally efficient analysis of both gene expression means and variability on RNA-seq counts. The MDSeq utilizes a novel reparametrization of the negative binomial to provide flexible generalized linear models (GLMs) on both the mean and dispersion. We address challenges of analyzing large-scale RNA-seq data via several new developments to provide a comprehensive toolset that models technical excess zeros, identifies outliers efficiently, and evaluates differential expression at biologically interesting levels. We evaluated the performance of the MDSeq using simulated data for which the ground truth is known. Results suggest that the MDSeq often outperforms current methods for the analysis of gene expression mean and variability. Moreover, the MDSeq is applied in two real RNA-seq studies, in which we identified functionally relevant genes and gene pathways. Specifically, the analysis of gene expression variability with the MDSeq on the GTEx human brain tissue data has identified pathways associated with common neurodegenerative disorders when gene expression means were conserved.
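As a rough illustration of what a mean/dispersion parameterization of the negative binomial can look like, the sketch below assumes that the coefficient of dispersion is the variance-to-mean ratio (phi = Var(Y) / E(Y)) and maps a mean mu and dispersion phi > 1 onto the classical size and probability parameters. The function names are hypothetical, this is not the MDSeq package's API, and the exact parameterization used in the paper may differ.

```python
import math


def nb_size_prob(mu, phi):
    """Map (mean, variance-to-mean ratio) to classical NB (size r, probability p).

    Assumption for illustration: phi = Var(Y) / E(Y), which must exceed 1
    for the negative binomial (overdispersion relative to the Poisson).
    """
    if mu <= 0 or phi <= 1:
        raise ValueError("requires mu > 0 and phi > 1")
    p = 1.0 / phi          # success probability
    r = mu / (phi - 1.0)   # size parameter
    return r, p


def nb_moments(mu, phi):
    """Mean and variance implied by the (r, p) obtained from (mu, phi)."""
    r, p = nb_size_prob(mu, phi)
    mean = r * (1.0 - p) / p
    var = r * (1.0 - p) / p ** 2
    return mean, var


def nb_log_pmf(y, mu, phi):
    """Log-probability of observing count y under mean mu and dispersion phi."""
    r, p = nb_size_prob(mu, phi)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log1p(-p))


if __name__ == "__main__":
    # The implied moments recover mu and mu * phi.
    print(nb_moments(10.0, 3.0))
    print(nb_log_pmf(4, 10.0, 3.0))
```

Under this assumed parameterization the variance is mu * phi, so separate regression models can be attached to mu and to phi, which is the kind of structure the abstract describes for GLMs on both the mean and the dispersion.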

Original language: English (US)
Article number: e127
Journal: Nucleic Acids Research
Volume: 45
Issue number: 13
DOI: 10.1093/nar/gkx456
State: Published - Jul 1 2017
Externally published: Yes

Fingerprint

RNA
Gene Expression
Neurodegenerative Diseases
Genes
Linear Models
Costs and Cost Analysis
Brain

ASJC Scopus subject areas

  • Genetics

Cite this

Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq. / Ran, Di; Daye, Zhongyin J.

In: Nucleic Acids Research, Vol. 45, No. 13, e127, 01.07.2017.

@article{b08dd70a090142c7bc46dbd895971b03,
title = "Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq",
abstract = "The rapidly decreasing cost of next-generation sequencing has led to the recent availability of large-scale RNA-seq data, which empowers the analysis of gene expression variability in addition to gene expression means. In this paper, we present the MDSeq, based on the coefficient of dispersion, to provide robust and computationally efficient analysis of both gene expression means and variability on RNA-seq counts. The MDSeq utilizes a novel reparametrization of the negative binomial to provide flexible generalized linear models (GLMs) on both the mean and dispersion. We address challenges of analyzing large-scale RNA-seq data via several new developments to provide a comprehensive toolset that models technical excess zeros, identifies outliers efficiently, and evaluates differential expression at biologically interesting levels. We evaluated the performance of the MDSeq using simulated data for which the ground truth is known. Results suggest that the MDSeq often outperforms current methods for the analysis of gene expression mean and variability. Moreover, the MDSeq is applied in two real RNA-seq studies, in which we identified functionally relevant genes and gene pathways. Specifically, the analysis of gene expression variability with the MDSeq on the GTEx human brain tissue data has identified pathways associated with common neurodegenerative disorders when gene expression means were conserved.",
author = "Ran, Di and Daye, {Zhongyin J}",
year = "2017",
month = "7",
day = "1",
doi = "10.1093/nar/gkx456",
language = "English (US)",
volume = "45",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "13",
pages = "e127",
}

TY - JOUR

T1 - Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq

AU - Ran, Di

AU - Daye, Zhongyin J

PY - 2017/7/1

Y1 - 2017/7/1

AB - The rapidly decreasing cost of next-generation sequencing has led to the recent availability of large-scale RNA-seq data, which empowers the analysis of gene expression variability in addition to gene expression means. In this paper, we present the MDSeq, based on the coefficient of dispersion, to provide robust and computationally efficient analysis of both gene expression means and variability on RNA-seq counts. The MDSeq utilizes a novel reparametrization of the negative binomial to provide flexible generalized linear models (GLMs) on both the mean and dispersion. We address challenges of analyzing large-scale RNA-seq data via several new developments to provide a comprehensive toolset that models technical excess zeros, identifies outliers efficiently, and evaluates differential expression at biologically interesting levels. We evaluated the performance of the MDSeq using simulated data for which the ground truth is known. Results suggest that the MDSeq often outperforms current methods for the analysis of gene expression mean and variability. Moreover, the MDSeq is applied in two real RNA-seq studies, in which we identified functionally relevant genes and gene pathways. Specifically, the analysis of gene expression variability with the MDSeq on the GTEx human brain tissue data has identified pathways associated with common neurodegenerative disorders when gene expression means were conserved.

UR - http://www.scopus.com/inward/record.url?scp=85026466568&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85026466568&partnerID=8YFLogxK

U2 - 10.1093/nar/gkx456

DO - 10.1093/nar/gkx456

M3 - Article

C2 - 28535263

AN - SCOPUS:85026466568

VL - 45

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 13

M1 - e127

ER -