Machine learning-based differential network analysis

A study of stress-responsive transcriptomes in Arabidopsis

Chuang Ma, Mingming Xin, Kenneth A Feldmann, Xiangfeng Wang

Research output: Contribution to journal › Article

39 Citations (Scopus)

Abstract

Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning-based differential network analysis (mlDNA), and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. The mlDNA first used an ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive "noninformative" genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained "informative" genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing-based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress-related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes.
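
The network-comparison idea in the abstract can be illustrated with a minimal Python sketch. This is not the mlDNA R package or its API: the function names, the correlation threshold of 0.8, and the toy expression values are all assumptions made for illustration, and node degree is used only as a stand-in for a network topological characteristic (the abstract does not list the 33 characteristics mlDNA uses). The sketch builds one coexpression network per condition by linking genes whose expression profiles are strongly correlated, then reports how each gene's degree changes between control and stress.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def coexpression_degree(expr, threshold=0.8):
    """Degree of each gene in a network linking genes with |r| >= threshold.

    The threshold is an assumed value, not one taken from the study.
    """
    genes = sorted(expr)
    degree = {g: 0 for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                degree[g] += 1
                degree[h] += 1
    return degree

def degree_change(control, stress, threshold=0.8):
    """Stress-minus-control degree for every gene present in both conditions."""
    shared = sorted(set(control) & set(stress))
    c = coexpression_degree({g: control[g] for g in shared}, threshold)
    s = coexpression_degree({g: stress[g] for g in shared}, threshold)
    return {g: s[g] - c[g] for g in shared}

# Toy, made-up expression values (4 samples per condition); not data from the study.
control = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
stress = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.5, 2.5, 3.5, 4.5],
}
changes = degree_change(control, stress)
print(changes)
```

In this toy input, geneC is uncorrelated with the other two genes under control conditions but tracks them under stress, so it shows the largest gain in connectivity. In an ML-based approach such as the one the abstract describes, per-gene differences like this would serve as input features for a classifier trained on known stress-related genes, rather than being ranked directly.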

Original language: English (US)
Pages (from-to): 520-537
Number of pages: 18
Journal: Plant Cell
Volume: 26
Issue number: 2
DOIs: 10.1105/tpc.113.121913
State: Published - 2014

Fingerprint

artificial intelligence
Transcriptome
Arabidopsis
Gene Regulatory Networks
genes
prediction
learning
Salts
Data Mining
Machine Learning
Gene Expression Profiling
methodology
transcriptomics
mutagenesis
abiotic stress
salt stress

ASJC Scopus subject areas

  • Plant Science
  • Cell Biology

Cite this

Machine learning-based differential network analysis: A study of stress-responsive transcriptomes in Arabidopsis. / Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A; Wang, Xiangfeng.

In: Plant Cell, Vol. 26, No. 2, 2014, p. 520-537.

@article{aeafe62c6fed44d1b068ad4575c0db81,
title = "Machine learning-based differential network analysis: A study of stress-responsive transcriptomes in Arabidopsis",
abstract = "Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning-based differential network analysis (mlDNA), and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. The mlDNA first used an ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive {"}noninformative{"} genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained {"}informative{"} genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing-based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress-related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes.",
author = "Chuang Ma and Mingming Xin and Feldmann, {Kenneth A} and Xiangfeng Wang",
year = "2014",
doi = "10.1105/tpc.113.121913",
language = "English (US)",
volume = "26",
pages = "520--537",
journal = "Plant Cell",
issn = "1040-4651",
publisher = "American Society of Plant Biologists",
number = "2",

}

TY - JOUR

T1 - Machine learning-based differential network analysis

T2 - A study of stress-responsive transcriptomes in Arabidopsis

AU - Ma, Chuang

AU - Xin, Mingming

AU - Feldmann, Kenneth A

AU - Wang, Xiangfeng

PY - 2014

Y1 - 2014

AB - Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning-based differential network analysis (mlDNA), and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. The mlDNA first used an ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive "noninformative" genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained "informative" genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing-based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress-related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes.

UR - http://www.scopus.com/inward/record.url?scp=84897076534&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897076534&partnerID=8YFLogxK

U2 - 10.1105/tpc.113.121913

DO - 10.1105/tpc.113.121913

M3 - Article

VL - 26

SP - 520

EP - 537

JO - Plant Cell

JF - Plant Cell

SN - 1040-4651

IS - 2

ER -