A Bootstrap Based Neyman-Pearson Test for Identifying Variable Importance

Gregory Ditzler, Robi Polikar, Gail Rosen

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

Selection of the most informative features that lead to a small loss on future data is arguably one of the most important steps in classification, data analysis, and model selection. Several feature selection (FS) algorithms are available; however, due to noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining whether a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.
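The abstract describes the method only at a high level: bootstrap a base FS algorithm, then test whether each feature is selected more often than chance alone would produce. The following is a minimal Python sketch of that idea, assuming the hypothesis test reduces to a one-sided binomial comparison (under the null, an irrelevant feature lands among the k selected out of K by chance with probability k/K). The names npfs_relevance and corr_selector, and the correlation-based base selector, are illustrative assumptions, not the authors' released implementation.

import numpy as np
from scipy.stats import binom

def npfs_relevance(X, y, base_selector, k, n_bootstraps=100, alpha=0.01, seed=None):
    # Run the base FS algorithm on n_bootstraps bootstrap resamples and
    # count how often each feature is selected. Under the null hypothesis
    # that a feature is irrelevant, it appears in the selected set of size
    # k (out of K features) with probability k / K, so its selection count
    # follows a Binomial(n_bootstraps, k / K) distribution.
    rng = np.random.default_rng(seed)
    n_samples, K = X.shape
    counts = np.zeros(K, dtype=int)
    for _ in range(n_bootstraps):
        idx = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
        counts[np.asarray(base_selector(X[idx], y[idx], k))] += 1
    # Smallest count zeta with P(Binomial >= zeta) <= alpha under the null.
    zeta = int(binom.ppf(1.0 - alpha, n_bootstraps, k / K)) + 1
    return np.flatnonzero(counts >= zeta)

def corr_selector(X, y, k):
    # Illustrative base FS algorithm: top-k features by absolute Pearson
    # correlation with the labels (stand-in for any FS criterion).
    scores = np.abs(np.corrcoef(X, y[:, None], rowvar=False)[-1, :-1])
    return np.argsort(np.nan_to_num(scores))[-k:]

# Example: 5 informative features out of 50; the relevant set returned can
# be smaller or larger than k, which is how the wrapper determines the
# number of relevant features rather than taking k at face value.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, :5].sum(axis=1) > 0).astype(float)
print(npfs_relevance(X, y, corr_selector, k=10, n_bootstraps=100))

In this sketch the choices of k, the number of bootstraps, and alpha govern the trade-off between falsely flagging irrelevant features and missing relevant ones, which is the error trade-off a Neyman-Pearson criterion is designed to control.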

Original language: English (US)
Article number: 6823119
Pages (from-to): 880-886
Number of pages: 7
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 26
Issue number: 4
DOI: 10.1109/TNNLS.2014.2320415
State: Published - Apr 1 2015
Externally published: Yes

Fingerprint

  • Feature extraction
  • Statistical tests
  • Data structures

Keywords

  • Feature selection (FS)
  • Neyman-Pearson

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

A Bootstrap Based Neyman-Pearson Test for Identifying Variable Importance. / Ditzler, Gregory; Polikar, Robi; Rosen, Gail.

In: IEEE Transactions on Neural Networks and Learning Systems, Vol. 26, No. 4, 6823119, 01.04.2015, p. 880-886.

Research output: Contribution to journal › Article

@article{bb4aa98189684cc1a02281e30a0d5828,
title = "A Bootstrap Based Neyman-Pearson Test for Identifying Variable Importance",
abstract = "Selection of most informative features that leads to a small loss on future data are arguably one of the most important steps in classification, data analysis and model selection. Several feature selection (FS) algorithms are available; however, due to noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining if a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.",
keywords = "Feature selection (FS), Neyman-Pearson",
author = "Gregory Ditzler and Robi Polikar and Gail Rosen",
year = "2015",
month = "4",
day = "1",
doi = "10.1109/TNNLS.2014.2320415",
language = "English (US)",
volume = "26",
pages = "880--886",
journal = "IEEE Transactions on Neural Networks and Learning Systems",
issn = "2162-237X",
publisher = "IEEE Computational Intelligence Society",
number = "4",

}

TY - JOUR

T1 - A Bootstrap Based Neyman-Pearson Test for Identifying Variable Importance

AU - Ditzler, Gregory

AU - Polikar, Robi

AU - Rosen, Gail

PY - 2015/4/1

Y1 - 2015/4/1

N2 - Selection of the most informative features that lead to a small loss on future data is arguably one of the most important steps in classification, data analysis, and model selection. Several feature selection (FS) algorithms are available; however, due to noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining whether a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.

AB - Selection of the most informative features that lead to a small loss on future data is arguably one of the most important steps in classification, data analysis, and model selection. Several feature selection (FS) algorithms are available; however, due to noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining whether a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.

KW - Feature selection (FS)

KW - Neyman-Pearson

UR - http://www.scopus.com/inward/record.url?scp=85028170986&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028170986&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2014.2320415

DO - 10.1109/TNNLS.2014.2320415

M3 - Article

AN - SCOPUS:85028170986

VL - 26

SP - 880

EP - 886

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

SN - 2162-237X

IS - 4

M1 - 6823119

ER -