Multiple comparisons in induction algorithms

David D. Jensen, Paul R. Cohen

Research output: Contribution to journal > Article

117 Citations (Scopus)

Abstract

A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
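
The effect described in the abstract can be illustrated with a small numerical sketch. It is not the authors' analysis: it assumes the compared items have independent scores with no true signal, models each score as a uniform p-value, and uses hypothetical function names (`familywise_error_rate`, `bonferroni_alpha`, `simulate_best_passes`) chosen only for this example. Under those assumptions, selecting the best of k items and testing it as if it were a single comparison inflates the false-positive rate, and a Bonferroni adjustment brings it back under the nominal level.

```python
import random


def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance that at least one of k independent null comparisons
    passes a test calibrated for a single comparison at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-comparison level that keeps the familywise rate at or below alpha."""
    return alpha / k


def simulate_best_passes(alpha: float, k: int,
                         trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: draw k uniform p-values (no real signal),
    keep the best one, and count how often it passes the unadjusted test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        best_p = min(rng.random() for _ in range(k))
        if best_p < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    alpha, k = 0.05, 20
    print("analytic, unadjusted:", familywise_error_rate(alpha, k))
    print("simulated, unadjusted:", simulate_best_passes(alpha, k))
    print("analytic, Bonferroni:",
          familywise_error_rate(bonferroni_alpha(alpha, k), k))
```

With 20 candidate items and a nominal 5% level, the unadjusted rate is about 64%, while the Bonferroni-adjusted rate stays just under 5%. Real induction algorithms compare correlated items, so the independence assumption here is a simplification.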

Original language: English (US)
Pages (from-to): 309-338
Number of pages: 30
Journal: Machine Learning
Volume: 38
Issue number: 3
DOIs: 10.1023/A:1007631014630
State: Published - 2000
Externally published: Yes

Fingerprint

Pathology
Function evaluation
Testing

ASJC Scopus subject areas

  • Artificial Intelligence
  • Control and Systems Engineering

Cite this

Multiple comparisons in induction algorithms. / Jensen, David D.; Cohen, Paul R.

In: Machine Learning, Vol. 38, No. 3, 2000, p. 309-338.

Jensen, David D. ; Cohen, Paul R. / Multiple comparisons in induction algorithms. In: Machine Learning. 2000 ; Vol. 38, No. 3. pp. 309-338.
@article{a5e06b0188404aa891ab5013b06c0abd,
title = "Multiple comparisons in induction algorithms",
abstract = "A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.",
author = "Jensen, {David D.} and Cohen, {Paul R.}",
year = "2000",
doi = "10.1023/A:1007631014630",
language = "English (US)",
volume = "38",
pages = "309--338",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "3",

}

TY - JOUR

T1 - Multiple comparisons in induction algorithms

AU - Jensen, David D.

AU - Cohen, Paul R.

PY - 2000

Y1 - 2000

AB - A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.

UR - http://www.scopus.com/inward/record.url?scp=0033907286&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033907286&partnerID=8YFLogxK

U2 - 10.1023/A:1007631014630

DO - 10.1023/A:1007631014630

M3 - Article

VL - 38

SP - 309

EP - 338

JO - Machine Learning

JF - Machine Learning

SN - 0885-6125

IS - 3

ER -