Iterative selection using orthogonal regression techniques

Bradley Turnbull, Subhashis Ghosal, Hao Zhang

Research output: Contribution to journal › Article

Abstract

High-dimensional data are nowadays encountered in various branches of science. Variable selection techniques play a key role in analyzing high-dimensional data. Generally, two approaches for variable selection in the high-dimensional setting are considered: forward selection methods and penalization methods. In the former, variables are introduced into the model one at a time depending on their ability to explain variation, and the procedure is terminated at some stage following a stopping rule. In penalization techniques such as the least absolute shrinkage and selection operator (LASSO), an optimization procedure is carried out with a carefully chosen added penalty function, so that the solutions have a sparse structure. Recently, the idea of penalized forward selection has been introduced. The motivation comes from the fact that penalization techniques like the LASSO give rise to closed-form expressions when used in one dimension, just like the least squares estimator. Hence one can repeat such a procedure in a forward selection setting until it converges. The resulting procedure selects sparser models than comparable methods without compromising predictive power. However, when the regressor is high dimensional, it is typical that many predictors are highly correlated. We show that in such situations, it is possible to further improve the stability and computational efficiency of the procedure by introducing an orthogonalization step. At each selection step, variables potentially available to be selected are screened on the basis of their correlation with variables already in the model, thus preventing unnecessary duplication. The new strategy, called the Selection Technique in Orthogonalized Regression Models (STORM), turns out to be extremely successful in further reducing the model dimension and also leads to improved predictive power.
We also consider an aggressive version of STORM, in which a potential predictor is permanently removed from further consideration if its regression coefficient is estimated as zero at any stage. We carry out a detailed simulation study to compare the newly proposed method with existing ones and analyze a real dataset.
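The one-dimensional closed form the abstract refers to is the soft-thresholding operator. Below is a minimal sketch of penalized forward selection with a correlation screen in the spirit of STORM's orthogonalization step; the function names, the threshold `corr_cut`, and the exact screening rule are illustrative assumptions, not the paper's precise algorithm.

```python
import numpy as np

def soft_threshold(z, lam):
    """Closed-form solution of the one-dimensional lasso problem
    argmin_b 0.5*(z - b)**2 + lam*|b|, namely sign(z)*max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def penalized_forward_selection(X, y, lam=0.1, corr_cut=0.9,
                                max_sweeps=100, tol=1e-6):
    """Sketch of penalized forward selection with a correlation screen.
    Columns are scaled to unit norm so each coordinate update has the
    closed soft-thresholding form; `corr_cut` and this screening rule
    are illustrative stand-ins for STORM's orthogonalization step."""
    n, p = X.shape
    Xs = X / np.linalg.norm(X, axis=0)   # unit-norm columns
    beta = np.zeros(p)                   # coefficients on the scaled columns
    selected = []                        # variables currently in the model
    r = y.copy()                         # residual y - Xs @ beta
    for _ in range(max_sweeps):
        old = beta.copy()
        for j in range(p):
            # Screen new candidates: skip any that are nearly collinear
            # with variables already in the model.
            if j not in selected and selected:
                if np.max(np.abs(Xs[:, selected].T @ Xs[:, j])) > corr_cut:
                    continue
            r += Xs[:, j] * beta[j]                  # remove j's contribution
            beta[j] = soft_threshold(Xs[:, j] @ r, lam)
            r -= Xs[:, j] * beta[j]                  # restore with new value
            if beta[j] != 0.0 and j not in selected:
                selected.append(j)
        if np.max(np.abs(beta - old)) < tol:         # converged
            break
    return beta, selected
```

The aggressive variant described in the abstract would additionally delete a candidate from consideration permanently once its coefficient is thresholded to zero, rather than revisiting it on later sweeps.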

Original language: English (US)
Pages (from-to): 557-564
Number of pages: 8
Journal: Statistical Analysis and Data Mining
Volume: 6
Issue number: 6
DOI: 10.1002/sam.11212
State: Published - Dec 2013


Keywords

  • Forward selection
  • High dimensional regression
  • LASSO
  • Orthogonalization

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Analysis

Cite this

Turnbull, Bradley; Ghosal, Subhashis; Zhang, Hao. Iterative selection using orthogonal regression techniques. In: Statistical Analysis and Data Mining, Vol. 6, No. 6, Dec 2013, pp. 557-564.
@article{4464f9d65bbe4ef1adf06cdd32f756de,
title = "Iterative selection using orthogonal regression techniques",
abstract = "High-dimensional data are nowadays encountered in various branches of science. Variable selection techniques play a key role in analyzing high-dimensional data. Generally, two approaches for variable selection in the high-dimensional setting are considered: forward selection methods and penalization methods. In the former, variables are introduced into the model one at a time depending on their ability to explain variation, and the procedure is terminated at some stage following a stopping rule. In penalization techniques such as the least absolute shrinkage and selection operator (LASSO), an optimization procedure is carried out with a carefully chosen added penalty function, so that the solutions have a sparse structure. Recently, the idea of penalized forward selection has been introduced. The motivation comes from the fact that penalization techniques like the LASSO give rise to closed-form expressions when used in one dimension, just like the least squares estimator. Hence one can repeat such a procedure in a forward selection setting until it converges. The resulting procedure selects sparser models than comparable methods without compromising predictive power. However, when the regressor is high dimensional, it is typical that many predictors are highly correlated. We show that in such situations, it is possible to further improve the stability and computational efficiency of the procedure by introducing an orthogonalization step. At each selection step, variables potentially available to be selected are screened on the basis of their correlation with variables already in the model, thus preventing unnecessary duplication. The new strategy, called the Selection Technique in Orthogonalized Regression Models (STORM), turns out to be extremely successful in further reducing the model dimension and also leads to improved predictive power.
We also consider an aggressive version of STORM, in which a potential predictor is permanently removed from further consideration if its regression coefficient is estimated as zero at any stage. We carry out a detailed simulation study to compare the newly proposed method with existing ones and analyze a real dataset.",
keywords = "Forward selection, High dimensional regression, LASSO, Orthogonalization",
author = "Bradley Turnbull and Subhashis Ghosal and Hao Zhang",
year = "2013",
month = "12",
doi = "10.1002/sam.11212",
language = "English (US)",
volume = "6",
pages = "557--564",
journal = "Statistical Analysis and Data Mining",
issn = "1932-1864",
publisher = "John Wiley and Sons Inc.",
number = "6",

}

TY - JOUR

T1 - Iterative selection using orthogonal regression techniques

AU - Turnbull, Bradley

AU - Ghosal, Subhashis

AU - Zhang, Hao

PY - 2013/12

Y1 - 2013/12

N2 - High-dimensional data are nowadays encountered in various branches of science. Variable selection techniques play a key role in analyzing high-dimensional data. Generally, two approaches for variable selection in the high-dimensional setting are considered: forward selection methods and penalization methods. In the former, variables are introduced into the model one at a time depending on their ability to explain variation, and the procedure is terminated at some stage following a stopping rule. In penalization techniques such as the least absolute shrinkage and selection operator (LASSO), an optimization procedure is carried out with a carefully chosen added penalty function, so that the solutions have a sparse structure. Recently, the idea of penalized forward selection has been introduced. The motivation comes from the fact that penalization techniques like the LASSO give rise to closed-form expressions when used in one dimension, just like the least squares estimator. Hence one can repeat such a procedure in a forward selection setting until it converges. The resulting procedure selects sparser models than comparable methods without compromising predictive power. However, when the regressor is high dimensional, it is typical that many predictors are highly correlated. We show that in such situations, it is possible to further improve the stability and computational efficiency of the procedure by introducing an orthogonalization step. At each selection step, variables potentially available to be selected are screened on the basis of their correlation with variables already in the model, thus preventing unnecessary duplication. The new strategy, called the Selection Technique in Orthogonalized Regression Models (STORM), turns out to be extremely successful in further reducing the model dimension and also leads to improved predictive power.
We also consider an aggressive version of STORM, in which a potential predictor is permanently removed from further consideration if its regression coefficient is estimated as zero at any stage. We carry out a detailed simulation study to compare the newly proposed method with existing ones and analyze a real dataset.

AB - High-dimensional data are nowadays encountered in various branches of science. Variable selection techniques play a key role in analyzing high-dimensional data. Generally, two approaches for variable selection in the high-dimensional setting are considered: forward selection methods and penalization methods. In the former, variables are introduced into the model one at a time depending on their ability to explain variation, and the procedure is terminated at some stage following a stopping rule. In penalization techniques such as the least absolute shrinkage and selection operator (LASSO), an optimization procedure is carried out with a carefully chosen added penalty function, so that the solutions have a sparse structure. Recently, the idea of penalized forward selection has been introduced. The motivation comes from the fact that penalization techniques like the LASSO give rise to closed-form expressions when used in one dimension, just like the least squares estimator. Hence one can repeat such a procedure in a forward selection setting until it converges. The resulting procedure selects sparser models than comparable methods without compromising predictive power. However, when the regressor is high dimensional, it is typical that many predictors are highly correlated. We show that in such situations, it is possible to further improve the stability and computational efficiency of the procedure by introducing an orthogonalization step. At each selection step, variables potentially available to be selected are screened on the basis of their correlation with variables already in the model, thus preventing unnecessary duplication. The new strategy, called the Selection Technique in Orthogonalized Regression Models (STORM), turns out to be extremely successful in further reducing the model dimension and also leads to improved predictive power.
We also consider an aggressive version of STORM, in which a potential predictor is permanently removed from further consideration if its regression coefficient is estimated as zero at any stage. We carry out a detailed simulation study to compare the newly proposed method with existing ones and analyze a real dataset.

KW - Forward selection

KW - High dimensional regression

KW - LASSO

KW - Orthogonalization

UR - http://www.scopus.com/inward/record.url?scp=84890230844&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890230844&partnerID=8YFLogxK

U2 - 10.1002/sam.11212

DO - 10.1002/sam.11212

M3 - Article

VL - 6

SP - 557

EP - 564

JO - Statistical Analysis and Data Mining

JF - Statistical Analysis and Data Mining

SN - 1932-1864

IS - 6

ER -