Sparse Penalized Forward Selection for Support Vector Classification

Subhashis Ghosal, Bradley Turnbull, Hao Zhang, Wook Yeon Hwang

Research output: Contribution to journal › Article

Abstract

We propose a new binary classification and variable selection technique especially designed for high-dimensional predictors. Among many predictors, typically only a small fraction have a significant impact on prediction. In such a situation, more interpretable models with better prediction accuracy can be obtained by performing variable selection along with classification. By adding an ℓ1-type penalty to the loss function, common classification methods such as logistic regression or support vector machines (SVM) can perform variable selection. Existing penalized SVM methods all attempt to solve for every parameter of the penalized problem jointly. When the data dimension is very high, this joint optimization problem is very complex and requires a large amount of memory. In this article, we propose a new penalized forward-search technique that reduces the high-dimensional optimization problem to a sequence of one-dimensional optimizations by iterating the selection steps. The new algorithm can be regarded as a forward selection version of the penalized SVM and its variants. The advantage of optimizing in one dimension is that the location of the optimum can be found by an intelligent search that exploits the convexity and the piecewise linear or quadratic structure of the criterion function. In each step, the predictor that best predicts the outcome is added to the model. The search is then repeated iteratively until convergence. Comparisons of our new classification rule with ℓ1-SVM and other common methods show very promising performance, in that the proposed method leads to much leaner models without compromising misclassification rates, particularly for high-dimensional predictors.
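
To make the forward-search idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It greedily minimizes a penalized hinge-loss criterion by re-optimizing one coefficient at a time with all others held fixed, and at each pass applies the single update that most reduces the criterion. The names (penalized_hinge, fit_sparse_svm) and tuning choices (lam, tol, max_iter) are hypothetical, and a generic SciPy scalar minimizer stands in for the exact piecewise-linear search described in the abstract.

import numpy as np
from scipy.optimize import minimize_scalar

def penalized_hinge(beta0, beta, X, y, lam):
    # Average hinge loss plus an l1 penalty on the slope coefficients.
    margins = y * (beta0 + X @ beta)
    return np.mean(np.maximum(0.0, 1.0 - margins)) + lam * np.sum(np.abs(beta))

def fit_sparse_svm(X, y, lam=0.1, max_iter=100, tol=1e-6):
    # Greedy forward search: each coefficient is optimized in one dimension with
    # the others held fixed, and the single best update is applied per pass.
    p = X.shape[1]
    beta0, beta = 0.0, np.zeros(p)
    current = penalized_hinge(beta0, beta, X, y, lam)
    for _ in range(max_iter):
        best_drop, best_j, best_val = 0.0, None, None
        # Candidate update for the (unpenalized) intercept.
        res0 = minimize_scalar(lambda b0: penalized_hinge(b0, beta, X, y, lam))
        if current - res0.fun > best_drop:
            best_drop, best_j, best_val = current - res0.fun, -1, res0.x
        # Candidate update for each predictor's coefficient (a convex 1-D problem;
        # Brent's method replaces the paper's exact piecewise-linear search here).
        for j in range(p):
            res = minimize_scalar(
                lambda bj, j=j: penalized_hinge(
                    beta0, np.where(np.arange(p) == j, bj, beta), X, y, lam))
            if current - res.fun > best_drop:
                best_drop, best_j, best_val = current - res.fun, j, res.x
        if best_drop <= tol:   # no single-coefficient move improves the criterion
            break
        if best_j == -1:
            beta0 = best_val
        else:
            beta[best_j] = best_val
        current -= best_drop
    return beta0, beta

# Hypothetical usage on synthetic data: only the first two predictors are informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = np.sign(X[:, 0] - X[:, 1] + 0.2 * rng.standard_normal(200))
b0, b = fit_sparse_svm(X, y, lam=0.05)
print("selected predictors:", np.flatnonzero(b))

Because each inner problem is one-dimensional and convex, the cost per step stays modest even when the number of predictors is large; in the paper the generic scalar minimizer used above would be replaced by an exact search over the kinks of the piecewise linear or quadratic criterion.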

Original language: English (US)
Pages (from-to): 493-514
Number of pages: 22
Journal: Journal of Computational and Graphical Statistics
Volume: 25
Issue number: 2
DOIs: 10.1080/10618600.2015.1023395
State: Published - Apr 2 2016

Keywords

  • High dimension
  • Penalization
  • Sparsity
  • SVM
  • Variable selection

ASJC Scopus subject areas

  • Discrete Mathematics and Combinatorics
  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Sparse Penalized Forward Selection for Support Vector Classification. / Ghosal, Subhashis; Turnbull, Bradley; Zhang, Hao; Hwang, Wook Yeon.

In: Journal of Computational and Graphical Statistics, Vol. 25, No. 2, 02.04.2016, p. 493-514.

Research output: Contribution to journal › Article

Ghosal, Subhashis ; Turnbull, Bradley ; Zhang, Hao ; Hwang, Wook Yeon. / Sparse Penalized Forward Selection for Support Vector Classification. In: Journal of Computational and Graphical Statistics. 2016 ; Vol. 25, No. 2. pp. 493-514.
@article{fcb20f1e31e2430ba54893fa1ae435b2,
title = "Sparse Penalized Forward Selection for Support Vector Classification",
keywords = "High dimension, Penalization, Sparsity, SVM, Variable selection",
author = "Subhashis Ghosal and Bradley Turnbull and Hao Zhang and Hwang, {Wook Yeon}",
year = "2016",
month = "4",
day = "2",
doi = "10.1080/10618600.2015.1023395",
language = "English (US)",
volume = "25",
pages = "493--514",
journal = "Journal of Computational and Graphical Statistics",
issn = "1061-8600",
publisher = "American Statistical Association",
number = "2",

}

TY - JOUR

T1 - Sparse Penalized Forward Selection for Support Vector Classification

AU - Ghosal, Subhashis

AU - Turnbull, Bradley

AU - Zhang, Hao

AU - Hwang, Wook Yeon

PY - 2016/4/2

Y1 - 2016/4/2

KW - High dimension

KW - Penalization

KW - Sparsity

KW - SVM

KW - Variable selection

UR - http://www.scopus.com/inward/record.url?scp=84971449318&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84971449318&partnerID=8YFLogxK

U2 - 10.1080/10618600.2015.1023395

DO - 10.1080/10618600.2015.1023395

M3 - Article

AN - SCOPUS:84971449318

VL - 25

SP - 493

EP - 514

JO - Journal of Computational and Graphical Statistics

JF - Journal of Computational and Graphical Statistics

SN - 1061-8600

IS - 2

ER -