Relative risk and odds ratio: A data mining perspective

Haiquan Li, Jinyan Li, Limsoon Wong, Mengling Feng, Yap Peng Tan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

44 Citations (Scopus)

Abstract

We are often interested in testing whether a given cause has a given effect. If we cannot specify the nature of the factors involved, such tests are called model-free studies. There are two major strategies to demonstrate associations between risk factors (i.e., patterns) and outcome phenotypes (i.e., class labels). The first is that of prospective study designs, and the analysis is based on the concept of "relative risk": What fraction of the exposed (i.e., having the pattern) or unexposed (i.e., lacking the pattern) individuals have the phenotype (i.e., the class label)? The second is that of retrospective designs, and the analysis is based on the concept of "odds ratio": The odds that a case has been exposed to a risk factor are compared with the odds for a case that has not been exposed. The efficient extraction of patterns that have good relative risk and/or odds ratio has not been previously studied in the data mining context. In this paper, we investigate such patterns. We show that this pattern space can be systematically stratified into plateaus of convex spaces based on their support levels. Exploiting convexity, we formulate a number of sound and complete algorithms to extract the most general and the most specific of such patterns at each support level. We compare these algorithms. We further demonstrate that the most efficient among these algorithms is able to mine these sophisticated patterns at a speed comparable to that of mining frequent closed patterns, which are patterns that satisfy considerably simpler conditions.
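To make the two measures in the abstract concrete, here is a minimal sketch that computes relative risk and odds ratio for one pattern from a 2x2 contingency table. It uses the standard textbook definitions rather than anything taken from the paper itself; the `Contingency` class, its field names, and the counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """2x2 counts for one pattern against one class label (hypothetical layout)."""
    exposed_cases: int       # has the pattern, has the class label
    exposed_controls: int    # has the pattern, lacks the class label
    unexposed_cases: int     # lacks the pattern, has the class label
    unexposed_controls: int  # lacks the pattern, lacks the class label


def relative_risk(t: Contingency) -> float:
    """Fraction of exposed individuals with the label, divided by the same
    fraction among unexposed individuals. No handling of zero cells here."""
    risk_exposed = t.exposed_cases / (t.exposed_cases + t.exposed_controls)
    risk_unexposed = t.unexposed_cases / (t.unexposed_cases + t.unexposed_controls)
    return risk_exposed / risk_unexposed


def odds_ratio(t: Contingency) -> float:
    """Cross-product ratio of the 2x2 table. No handling of zero cells here."""
    return (t.exposed_cases * t.unexposed_controls) / (
        t.exposed_controls * t.unexposed_cases
    )


# Illustrative counts only; they are not data from the paper.
table = Contingency(
    exposed_cases=40, exposed_controls=60,
    unexposed_cases=20, unexposed_controls=80,
)
print(f"relative risk = {relative_risk(table):.2f}")
print(f"odds ratio    = {odds_ratio(table):.2f}")
```

With these counts, the exposed group is twice as likely to carry the label as the unexposed group, while the odds ratio is somewhat larger (8/3). The two measures diverge in this way whenever the outcome is not rare.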

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
Pages: 368-377
Number of pages: 10
State: Published - 2005
Externally published: Yes
Event: Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005 - Baltimore, MD, United States
Duration: Jun 13 2005 to Jun 15 2005

Other

Other: Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005
Country: United States
City: Baltimore, MD
Period: 6/13/05 to 6/15/05

ASJC Scopus subject areas

  • Software

Cite this

Li, H., Li, J., Wong, L., Feng, M., & Tan, Y. P. (2005). Relative risk and odds ratio: A data mining perspective. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 368-377)

Relative risk and odds ratio: A data mining perspective. / Li, Haiquan; Li, Jinyan; Wong, Limsoon; Feng, Mengling; Tan, Yap Peng.

Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 2005. p. 368-377.

Li, H, Li, J, Wong, L, Feng, M & Tan, YP 2005, Relative risk and odds ratio: A data mining perspective. in Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. pp. 368-377, Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005, Baltimore, MD, United States, 6/13/05.
Li H, Li J, Wong L, Feng M, Tan YP. Relative risk and odds ratio: A data mining perspective. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 2005. p. 368-377
Li, Haiquan; Li, Jinyan; Wong, Limsoon; Feng, Mengling; Tan, Yap Peng. / Relative risk and odds ratio: A data mining perspective. Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 2005. pp. 368-377
@inproceedings{7a545f91169444f98c37d6ee8a4ba763,
title = "Relative risk and odds ratio: A data mining perspective",
abstract = "We are often interested in testing whether a given cause has a given effect. If we cannot specify the nature of the factors involved, such tests are called model-free studies. There are two major strategies to demonstrate associations between risk factors (i.e., patterns) and outcome phenotypes (i.e., class labels). The first is that of prospective study designs, and the analysis is based on the concept of {"}relative risk{"}: What fraction of the exposed (i.e., having the pattern) or unexposed (i.e., lacking the pattern) individuals have the phenotype (i.e., the class label)? The second is that of retrospective designs, and the analysis is based on the concept of {"}odds ratio{"}: The odds that a case has been exposed to a risk factor are compared with the odds for a case that has not been exposed. The efficient extraction of patterns that have good relative risk and/or odds ratio has not been previously studied in the data mining context. In this paper, we investigate such patterns. We show that this pattern space can be systematically stratified into plateaus of convex spaces based on their support levels. Exploiting convexity, we formulate a number of sound and complete algorithms to extract the most general and the most specific of such patterns at each support level. We compare these algorithms. We further demonstrate that the most efficient among these algorithms is able to mine these sophisticated patterns at a speed comparable to that of mining frequent closed patterns, which are patterns that satisfy considerably simpler conditions.",
author = "Li, Haiquan and Li, Jinyan and Wong, Limsoon and Feng, Mengling and Tan, Yap Peng",
year = "2005",
language = "English (US)",
pages = "368--377",
booktitle = "Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems",

}

TY - GEN

T1 - Relative risk and odds ratio

T2 - A data mining perspective

AU - Li, Haiquan

AU - Li, Jinyan

AU - Wong, Limsoon

AU - Feng, Mengling

AU - Tan, Yap Peng

PY - 2005

Y1 - 2005

N2 - We are often interested in testing whether a given cause has a given effect. If we cannot specify the nature of the factors involved, such tests are called model-free studies. There are two major strategies to demonstrate associations between risk factors (i.e., patterns) and outcome phenotypes (i.e., class labels). The first is that of prospective study designs, and the analysis is based on the concept of "relative risk": What fraction of the exposed (i.e., having the pattern) or unexposed (i.e., lacking the pattern) individuals have the phenotype (i.e., the class label)? The second is that of retrospective designs, and the analysis is based on the concept of "odds ratio": The odds that a case has been exposed to a risk factor are compared with the odds for a case that has not been exposed. The efficient extraction of patterns that have good relative risk and/or odds ratio has not been previously studied in the data mining context. In this paper, we investigate such patterns. We show that this pattern space can be systematically stratified into plateaus of convex spaces based on their support levels. Exploiting convexity, we formulate a number of sound and complete algorithms to extract the most general and the most specific of such patterns at each support level. We compare these algorithms. We further demonstrate that the most efficient among these algorithms is able to mine these sophisticated patterns at a speed comparable to that of mining frequent closed patterns, which are patterns that satisfy considerably simpler conditions.

UR - http://www.scopus.com/inward/record.url?scp=33244461081&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33244461081&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33244461081

SP - 368

EP - 377

BT - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

ER -