Regression Models for Multivariate Count Data

Yiwen Zhang, Hua Zhou, Jin Zhou, Wei Sun

Research output: Contribution to journal (Article)

19 Scopus citations

Abstract

Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of overdispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly because they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. Supplementary materials for this article are available online.
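
The abstract's point about the multinomial's restrictive mean-variance structure can be illustrated with a small simulation. The sketch below is not taken from the article: the function name `simulate_variances`, the parameter values, and the use of NumPy are all assumptions chosen for illustration. It draws counts from a plain multinomial and from a Dirichlet-multinomial (one of the models listed in the keywords) with the same mean proportions, then compares the per-category variances.

```python
import numpy as np


def simulate_variances(n_trials=50, probs=(0.2, 0.3, 0.5),
                       concentration=5.0, n_samples=5000, seed=0):
    """Compare per-category variances of multinomial and
    Dirichlet-multinomial counts that share the same mean.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)

    # Plain multinomial: variance is fixed at n * p * (1 - p).
    mult = rng.multinomial(n_trials, probs, size=n_samples)

    # Dirichlet-multinomial: first draw proportions from a Dirichlet
    # with mean `probs`, then draw counts given those proportions.
    alpha = concentration * probs
    drawn_probs = rng.dirichlet(alpha, size=n_samples)
    dm = np.array([rng.multinomial(n_trials, p) for p in drawn_probs])

    return mult.var(axis=0), dm.var(axis=0)


if __name__ == "__main__":
    mult_var, dm_var = simulate_variances()
    print("multinomial variances:          ", mult_var)
    print("Dirichlet-multinomial variances:", dm_var)
    print("variance ratio:                 ", dm_var / mult_var)
```

Under these assumed settings both samples have the same expected counts, but the Dirichlet-multinomial variances are inflated by roughly (n + α₀)/(1 + α₀), where α₀ is the sum of the Dirichlet parameters (here 55/6, about 9). A multinomial-logit fit cannot represent that extra spread, which is the kind of overdispersion the abstract refers to.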

Original language: English (US)
Pages (from-to): 1-13
Number of pages: 13
Journal: Journal of Computational and Graphical Statistics
Volume: 26
Issue number: 1
State: Published - Jan 2 2017

Keywords

  • Analysis of deviance
  • Categorical data analysis
  • Dirichlet-multinomial
  • Generalized Dirichlet-multinomial
  • Iteratively reweighted Poisson regression (IRPR)
  • Negative multinomial
  • Reduced rank GLM
  • Regularization

ASJC Scopus subject areas

  • Statistics and Probability
  • Discrete Mathematics and Combinatorics
  • Statistics, Probability and Uncertainty
