Relative risk and odds ratio: A data mining perspective

Haiquan Li, Jinyan Li, Limsoon Wong, Mengling Feng, Yap Peng Tan

Research output: Contribution to conferencePaper

Abstract

We are often interested to test whether a given cause has a given effect. If we cannot specify the nature of the factors involved, such tests are called model-free studies. There are two major strategies to demonstrate associations between risk factors (ie. patterns) and outcome phenotypes (ie. class labels). The first is that of prospective study designs, and the analysis is based on the concept of "relative risk": What fraction of the exposed (ie. has the pattern) or unexposed (ie. lacks the pattern) individuals have the phenotype (ie. the class label)? The second is that of retrospective designs, and the analysis is based on the concept of "odds ratio": The odds that a case has been exposed to a risk factor is compared to the odds for a case that has not been exposed. The efficient extraction of patterns that have good relative risk and/or odds ratio has not been previously studied in the data mining context. In this paper, we investigate such patterns. We show that this pattern space can be systematically stratified into plateaus of convex spaces based on their support levels. Exploiting convexity, we formulate a number of sound and complete algorithms to extract the most general and the most specific of such patterns at each support level. We compare these algorithms. We further demonstrate that the most efficient among these algorithms is able to mine these sophisticated patterns at a speed comparable to that of mining frequent closed patterns, which are patterns that satisfy considerably simpler conditions.

Original languageEnglish (US)
Pages368-377
Number of pages10
StatePublished - Dec 1 2005
EventTwenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005 - Baltimore, MD, United States
Duration: Jun 13 2005Jun 15 2005

Other

OtherTwenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005
CountryUnited States
CityBaltimore, MD
Period6/13/056/15/05

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this

Li, H., Li, J., Wong, L., Feng, M., & Tan, Y. P. (2005). Relative risk and odds ratio: A data mining perspective. 368-377. Paper presented at Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005, Baltimore, MD, United States.