Minimum description length principle: Generators are preferable to closed patterns

Jinyan Li, Haiquan Li, Limsoon Wong, Jian Pei, Guozhu Dong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

41 Citations (Scopus)

Abstract

The generators and the unique closed pattern of an equivalence class of itemsets share a common set of transactions. The generators are the minimal itemsets in the equivalence class, while the closed pattern is the maximum one. Because a generator usually has smaller cardinality than the closed pattern, the Minimum Description Length Principle makes the generator preferable to the closed pattern in inductive inference and classification. To discover frequent generators efficiently from a large dataset, we develop a depth-first algorithm called Gr-growth, a novel departure from the traditional breadth-first, bottom-up generator-mining algorithms. Our extensive performance study shows that Gr-growth is significantly faster than the existing generator-mining algorithms, by one or even two orders of magnitude when the support thresholds are low. It can also be faster than state-of-the-art frequent closed itemset mining algorithms such as FPclose and CLOSET+.
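
The definitions in the abstract can be made concrete on a toy dataset. The sketch below is a minimal illustration under stated assumptions: the transaction data and every function name are hypothetical, the brute-force grouping into equivalence classes only works on tiny inputs, and the depth-first miner is a plain set-enumeration search pruned with the standard anti-monotone generator property (an itemset is a generator exactly when removing any single item strictly increases its support, so no superset of a non-generator is a generator). It is not the paper's Gr-growth algorithm, whose details are not given in this record.

```python
from itertools import combinations

# Hypothetical toy transaction database (not taken from the paper).
TRANSACTIONS = [
    frozenset("abcd"),
    frozenset("abc"),
    frozenset("acd"),
    frozenset("bc"),
    frozenset("ac"),
]


def tidset(itemset, transactions):
    """Ids of the transactions that contain every item of `itemset`."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)


def equivalence_classes(transactions, min_sup):
    """Brute force: group every frequent itemset (including the empty set) by tidset."""
    items = sorted(frozenset().union(*transactions))
    classes = {}
    for k in range(0, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            tids = tidset(itemset, transactions)
            if len(tids) >= min_sup:
                classes.setdefault(tids, []).append(itemset)
    return classes


def closed_and_generators(members):
    """Closed pattern = union of the class (its unique maximum); generators = minimal members."""
    closed = frozenset().union(*members)
    generators = [g for g in members if not any(m < g for m in members)]
    return closed, generators


def dfs_generators(transactions, min_sup):
    """Depth-first set-enumeration search for frequent generators (illustrative, not Gr-growth)."""
    items = sorted(frozenset().union(*transactions))
    support_cache = {}

    def sup(itemset):
        # Memoised support count of an itemset.
        if itemset not in support_cache:
            support_cache[itemset] = len(tidset(itemset, transactions))
        return support_cache[itemset]

    def is_generator(itemset):
        # Dropping any single item must strictly raise the support.
        s = sup(itemset)
        return all(sup(itemset - {x}) > s for x in itemset)

    found = []

    def expand(prefix, start):
        found.append(prefix)
        for j in range(start, len(items)):
            cand = prefix | {items[j]}
            # Prune: infrequent or non-generator itemsets have no frequent-generator supersets.
            if sup(cand) >= min_sup and is_generator(cand):
                expand(cand, j + 1)

    if len(transactions) >= min_sup:
        expand(frozenset(), 0)
    return found


if __name__ == "__main__":
    min_sup = 2
    for tids, members in equivalence_classes(TRANSACTIONS, min_sup).items():
        closed, gens = closed_and_generators(members)
        print(sorted(tids), "closed:", sorted(closed),
              "generators:", [sorted(g) for g in gens])
    print("DFS generators:", [sorted(g) for g in dfs_generators(TRANSACTIONS, min_sup)])
```

On this toy data with a minimum support of 2, the class of transactions containing d has the single-item generator {d} but the three-item closed pattern {a, c, d}, which is the kind of size gap the MDL argument relies on. The brute-force grouping is kept only as a cross-check for the depth-first search.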

Original language: English (US)
Title of host publication: Proceedings of the National Conference on Artificial Intelligence
Pages: 409-414
Number of pages: 6
Volume: 1
State: Published - 2006
Externally published: Yes
Event: 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06, Boston, MA, United States
Duration: Jul 16 2006 to Jul 20 2006

ASJC Scopus subject areas

  • Software

Cite this

Li, J., Li, H., Wong, L., Pei, J., & Dong, G. (2006). Minimum description length principle: Generators are preferable to closed patterns. In Proceedings of the National Conference on Artificial Intelligence (Vol. 1, pp. 409-414).

@inproceedings{398b909e73e84afebd8c3ebde2ffe526,
title = "Minimum description length principle: Generators are preferable to closed patterns",
author = "Jinyan Li and Haiquan Li and Limsoon Wong and Jian Pei and Guozhu Dong",
year = "2006",
language = "English (US)",
isbn = "1577352815",
volume = "1",
pages = "409--414",
booktitle = "Proceedings of the National Conference on Artificial Intelligence",

}

TY - GEN

T1 - Minimum description length principle

T2 - Generators are preferable to closed patterns

AU - Li, Jinyan

AU - Li, Haiquan

AU - Wong, Limsoon

AU - Pei, Jian

AU - Dong, Guozhu

PY - 2006

Y1 - 2006

UR - http://www.scopus.com/inward/record.url?scp=33750743089&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33750743089&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33750743089

SN - 1577352815

SN - 9781577352815

VL - 1

SP - 409

EP - 414

BT - Proceedings of the National Conference on Artificial Intelligence

ER -