Improved sparse multi-class SVM and its application for gene selection in cancer classification

Lingkang Huang, Hao Zhang, Zhao Bang Zeng, Pierre R. Bushel

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

Background: Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high-dimensional, low-sample-size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity.

Results: The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high-dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes.

Conclusions: High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and for defining targets of therapeutic intervention.
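To illustrate how a soft-thresholding penalty can produce sparse multi-class rules, here is a minimal sketch. It is not the authors' algorithm: it assumes a plain L1 penalty on a linear Crammer and Singer loss without intercepts, fitted by proximal subgradient descent on toy data. The names `soft_threshold`, `fit_sparse_msvm`, `make_toy_data` and the values of `lam`, `lr` and `epochs` are hypothetical choices for illustration only.

```python
import random


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def scores(W, x):
    """Linear score of sample x for every class (one weight row per class)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(W, x):
    s = scores(W, x)
    return max(range(len(s)), key=s.__getitem__)


def fit_sparse_msvm(X, y, n_classes, lam=0.05, lr=0.1, epochs=200):
    """L1-penalized Crammer-Singer loss, fitted by proximal subgradient steps."""
    n, p = len(X), len(X[0])
    W = [[0.0] * p for _ in range(n_classes)]
    for _ in range(epochs):
        G = [[0.0] * p for _ in range(n_classes)]
        for x, yi in zip(X, y):
            s = scores(W, x)
            # Class maximizing score plus margin; the margin is 1 for wrong classes.
            r = max(range(n_classes), key=lambda k: s[k] + (k != yi))
            if r != yi:  # a wrong class attains the max: take the hinge subgradient
                for j in range(p):
                    G[r][j] += x[j] / n
                    G[yi][j] -= x[j] / n
        # Gradient step on the loss, then soft-threshold (the L1 proximal map).
        W = [[soft_threshold(W[k][j] - lr * G[k][j], lr * lam) for j in range(p)]
             for k in range(n_classes)]
    return W


def make_toy_data(seed=0, per_class=20, n_noise=4):
    """Three classes separated by two informative features, plus noise features."""
    rng = random.Random(seed)
    centers = [(2.0, 0.0), (-1.0, 1.7), (-1.0, -1.7)]
    X, y = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(per_class):
            informative = [cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]
            noise = [rng.gauss(0, 0.3) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(k)
    return X, y


def accuracy(W, X, y):
    return sum(predict(W, x) == yi for x, yi in zip(X, y)) / len(y)


X, y = make_toy_data()
W = fit_sparse_msvm(X, y, n_classes=3)
n_zero = sum(w == 0.0 for row in W for w in row)
print("training accuracy:", accuracy(W, X, y))
print("exactly-zero weights:", n_zero, "of", len(W) * len(W[0]))
```

In this sketch a feature (a gene, in the microarray setting) counts as unselected when its weight is exactly zero for every class. Larger values of `lam` push more weights to zero at some cost in fit.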

Original language: English (US)
Pages (from-to): 143-153
Number of pages: 11
Journal: Cancer Informatics
Volume: 12
DOI: 10.4137/CIN.S10212
State: Published - Aug 4 2013

Fingerprint

Genes
Neoplasms
Sample Size
Learning
Gene Expression
Aptitude
Neoplasm Genes
Support Vector Machine
Transcriptome
Therapeutics

Keywords

  • Cancer classification
  • Classification
  • Microarray
  • Multi-class SVM
  • Shrinkage methods
  • Support vector machine (SVM)
  • Variable selection

ASJC Scopus subject areas

  • Cancer Research
  • Oncology

Cite this

Improved sparse multi-class SVM and its application for gene selection in cancer classification. / Huang, Lingkang; Zhang, Hao; Zeng, Zhao Bang; Bushel, Pierre R.

In: Cancer Informatics, Vol. 12, 04.08.2013, p. 143-153.

Huang, Lingkang; Zhang, Hao; Zeng, Zhao Bang; Bushel, Pierre R. / Improved sparse multi-class SVM and its application for gene selection in cancer classification. In: Cancer Informatics. 2013; Vol. 12. pp. 143-153.
@article{4626e663aec9430f9248e227be68bf34,
title = "Improved sparse multi-class SVM and its application for gene selection in cancer classification",
keywords = "Cancer classification, Classification, Microarray, Multi-class SVM, Shrinkage methods, Support vector machine (SVM), Variable selection",
author = "Lingkang Huang and Hao Zhang and Zeng, {Zhao Bang} and Bushel, {Pierre R.}",
year = "2013",
month = "8",
day = "4",
doi = "10.4137/CIN.S10212",
language = "English (US)",
volume = "12",
pages = "143--153",
journal = "Cancer Informatics",
issn = "1176-9351",
publisher = "Libertas Academica Ltd.",

}

TY - JOUR

T1 - Improved sparse multi-class SVM and its application for gene selection in cancer classification

AU - Huang, Lingkang

AU - Zhang, Hao

AU - Zeng, Zhao Bang

AU - Bushel, Pierre R.

PY - 2013/8/4

Y1 - 2013/8/4

AB - Background: Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results: The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions: High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention.

KW - Cancer classification

KW - Classification

KW - Microarray

KW - Multi-class SVM

KW - Shrinkage methods

KW - Support vector machine (SVM)

KW - Variable selection

UR - http://www.scopus.com/inward/record.url?scp=84881281027&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84881281027&partnerID=8YFLogxK

U2 - 10.4137/CIN.S10212

DO - 10.4137/CIN.S10212

M3 - Article

C2 - 23966761

AN - SCOPUS:84881281027

VL - 12

SP - 143

EP - 153

JO - Cancer Informatics

JF - Cancer Informatics

SN - 1176-9351

ER -