Variance component selection with applications to microbiome taxonomic data

Jing Zhai, Juhyun Kim, Kenneth S Knox, Homer L. Twigg, Hua Zhou, Jin Zhou

Research output: Contribution to journal, Article

3 Citations (Scopus)

Abstract

High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Microbiome data are summarized as counts or composition of the bacterial taxa at different taxonomic levels. An important problem is to identify the bacterial taxa that are associated with a response. One approach tests the association of a specific taxon with phenotypes in a linear mixed-effects model, which incorporates phylogenetic information among bacterial communities. Another type of approach considers all taxa in a joint model and achieves selection via a penalization method, but ignores phylogenetic information. In this paper, we consider regression analysis by treating bacterial taxa at different levels as multiple random effects. For each taxon, a kernel matrix is calculated based on distance measures in the phylogenetic tree and acts as one variance component in the joint model. Taxonomic selection is then achieved by the lasso (least absolute shrinkage and selection operator) penalty on variance components. Our method integrates biological information into the variable selection problem and greatly improves selection accuracy. Simulation studies demonstrate the superiority of our method over existing methods such as group lasso. Finally, we apply our method to a longitudinal microbiome study of Human Immunodeficiency Virus (HIV)-infected patients. We implement our method using the high-performance computing language Julia. Software and detailed documentation are freely available at https://github.com/JingZhai63/VCselection.
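The modeling idea in the abstract (one kernel per taxon acting as a variance component, with a lasso penalty on those components) can be illustrated with a minimal sketch. This is not the authors' Julia implementation. The function names `distance_to_kernel` and `penalized_neg_loglik` are hypothetical, the Gower-centering step is an assumed (though common) way to turn a distance matrix into a kernel, and placing the penalty on the standard deviations `sigma` rather than on the variances is also an assumption made for illustration.

```python
import numpy as np


def distance_to_kernel(D):
    """Turn a pairwise distance matrix into a kernel by Gower centering.

    Assumption: K = -1/2 * J * D^2 * J with J = I - 11'/n. The abstract only
    says the kernel is based on phylogenetic distance measures.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * J @ (D ** 2) @ J
    # Clip negative eigenvalues so the kernel is positive semidefinite.
    w, V = np.linalg.eigh((K + K.T) / 2)
    return (V * np.clip(w, 0.0, None)) @ V.T


def penalized_neg_loglik(y, X, beta, sigma, kernels, sigma0, lam):
    """Gaussian negative log-likelihood plus a lasso penalty on sigma.

    Covariance: sigma0^2 * I + sum_i sigma_i^2 * K_i, one K_i per taxon.
    """
    n = y.shape[0]
    cov = sigma0 ** 2 * np.eye(n)
    for s, K in zip(sigma, kernels):
        cov = cov + s ** 2 * K
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(cov)
    quad = r @ np.linalg.solve(cov, r)
    nll = 0.5 * (n * np.log(2 * np.pi) + logdet + quad)
    # Lasso penalty: shrinks some sigma_i exactly to zero, dropping that taxon.
    return nll + lam * np.sum(np.abs(sigma))
```

Minimizing this objective over `beta`, `sigma`, and `sigma0` for a grid of `lam` values would yield a selection path; taxa whose `sigma_i` is driven to zero are excluded. The keywords mention an MM algorithm for the fitting step, which this sketch does not attempt to reproduce.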

Original language: English (US)
Article number: 509
Journal: Frontiers in Microbiology
Volume: 9
Issue number: MAR
DOI: https://doi.org/10.3389/fmicb.2018.00509
State: Published - Mar 28 2018

Keywords

  • Human Immunodeficiency Virus (HIV)
  • Lasso
  • Longitudinal study
  • Lung microbiome
  • MM-algorithm
  • Variable selection
  • Variance component models

ASJC Scopus subject areas

  • Microbiology
  • Microbiology (medical)

Cite this

Variance component selection with applications to microbiome taxonomic data. / Zhai, Jing; Kim, Juhyun; Knox, Kenneth S; Twigg, Homer L.; Zhou, Hua; Zhou, Jin.

In: Frontiers in Microbiology, Vol. 9, No. MAR, 509, 28.03.2018.

Zhai, Jing ; Kim, Juhyun ; Knox, Kenneth S ; Twigg, Homer L. ; Zhou, Hua ; Zhou, Jin. / Variance component selection with applications to microbiome taxonomic data. In: Frontiers in Microbiology. 2018 ; Vol. 9, No. MAR.
@article{592331c292ba4c52a7df144f9b9a081c,
title = "Variance component selection with applications to microbiome taxonomic data",
abstract = "High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Microbiome data are summarized as counts or composition of the bacterial taxa at different taxonomic levels. An important problem is to identify the bacterial taxa that are associated with a response. One approach tests the association of a specific taxon with phenotypes in a linear mixed-effects model, which incorporates phylogenetic information among bacterial communities. Another type of approach considers all taxa in a joint model and achieves selection via a penalization method, but ignores phylogenetic information. In this paper, we consider regression analysis by treating bacterial taxa at different levels as multiple random effects. For each taxon, a kernel matrix is calculated based on distance measures in the phylogenetic tree and acts as one variance component in the joint model. Taxonomic selection is then achieved by the lasso (least absolute shrinkage and selection operator) penalty on variance components. Our method integrates biological information into the variable selection problem and greatly improves selection accuracy. Simulation studies demonstrate the superiority of our method over existing methods such as group lasso. Finally, we apply our method to a longitudinal microbiome study of Human Immunodeficiency Virus (HIV)-infected patients. We implement our method using the high-performance computing language Julia. Software and detailed documentation are freely available at https://github.com/JingZhai63/VCselection.",
keywords = "Human Immunodeficiency Virus (HIV), Lasso, Longitudinal study, Lung microbiome, MM-algorithm, Variable selection, Variance component models",
author = "Jing Zhai and Juhyun Kim and Knox, {Kenneth S} and Twigg, {Homer L.} and Hua Zhou and Jin Zhou",
year = "2018",
month = "3",
day = "28",
doi = "10.3389/fmicb.2018.00509",
language = "English (US)",
volume = "9",
journal = "Frontiers in Microbiology",
issn = "1664-302X",
publisher = "Frontiers Media S. A.",
number = "MAR",

}

TY - JOUR

T1 - Variance component selection with applications to microbiome taxonomic data

AU - Zhai, Jing

AU - Kim, Juhyun

AU - Knox, Kenneth S

AU - Twigg, Homer L.

AU - Zhou, Hua

AU - Zhou, Jin

PY - 2018/3/28

Y1 - 2018/3/28

AB - High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Microbiome data are summarized as counts or composition of the bacterial taxa at different taxonomic levels. An important problem is to identify the bacterial taxa that are associated with a response. One approach tests the association of a specific taxon with phenotypes in a linear mixed-effects model, which incorporates phylogenetic information among bacterial communities. Another type of approach considers all taxa in a joint model and achieves selection via a penalization method, but ignores phylogenetic information. In this paper, we consider regression analysis by treating bacterial taxa at different levels as multiple random effects. For each taxon, a kernel matrix is calculated based on distance measures in the phylogenetic tree and acts as one variance component in the joint model. Taxonomic selection is then achieved by the lasso (least absolute shrinkage and selection operator) penalty on variance components. Our method integrates biological information into the variable selection problem and greatly improves selection accuracy. Simulation studies demonstrate the superiority of our method over existing methods such as group lasso. Finally, we apply our method to a longitudinal microbiome study of Human Immunodeficiency Virus (HIV)-infected patients. We implement our method using the high-performance computing language Julia. Software and detailed documentation are freely available at https://github.com/JingZhai63/VCselection.

KW - Human Immunodeficiency Virus (HIV)

KW - Lasso

KW - Longitudinal study

KW - Lung microbiome

KW - MM-algorithm

KW - Variable selection

KW - Variance component models

UR - http://www.scopus.com/inward/record.url?scp=85044824277&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044824277&partnerID=8YFLogxK

U2 - 10.3389/fmicb.2018.00509

DO - 10.3389/fmicb.2018.00509

M3 - Article

AN - SCOPUS:85044824277

VL - 9

JO - Frontiers in Microbiology

JF - Frontiers in Microbiology

SN - 1664-302X

IS - MAR

M1 - 509

ER -