### Abstract

We are often interested in testing whether a given cause has a given effect. If we cannot specify the nature of the factors involved, such tests are called model-free studies. There are two major strategies to demonstrate associations between risk factors (i.e., patterns) and outcome phenotypes (i.e., class labels). The first is that of prospective study designs, where the analysis is based on the concept of "relative risk": what fraction of the exposed (i.e., having the pattern) or unexposed (i.e., lacking the pattern) individuals have the phenotype (i.e., the class label)? The second is that of retrospective designs, where the analysis is based on the concept of "odds ratio": the odds that a case has been exposed to a risk factor are compared to the odds for a case that has not been exposed. The efficient extraction of patterns that have good relative risk and/or odds ratio has not been previously studied in the data mining context. In this paper, we investigate such patterns. We show that this pattern space can be systematically stratified into plateaus of convex spaces based on their support levels. Exploiting convexity, we formulate a number of sound and complete algorithms to extract the most general and the most specific of such patterns at each support level. We compare these algorithms. We further demonstrate that the most efficient among these algorithms is able to mine these sophisticated patterns at a speed comparable to that of mining frequent closed patterns, which are patterns that satisfy considerably simpler conditions.
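To make the two measures in the abstract concrete, the following is a minimal sketch that computes them for a single pattern over a small labeled dataset. It uses the standard textbook definitions from a 2×2 contingency table, which may differ in detail from the exact formulation in the paper. The names `contingency`, `relative_risk`, `odds_ratio`, and the toy `DATA` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import FrozenSet, Iterable, Tuple

# A record is (set of items, has_phenotype). This layout is an assumption
# made for illustration only.
Record = Tuple[FrozenSet[str], bool]


def contingency(pattern: FrozenSet[str],
                records: Iterable[Record]) -> Tuple[int, int, int, int]:
    """Count (a, b, c, d) for a pattern:
    a = exposed with phenotype,   b = exposed without phenotype,
    c = unexposed with phenotype, d = unexposed without phenotype.
    A record is 'exposed' when it contains every item of the pattern."""
    a = b = c = d = 0
    for items, positive in records:
        exposed = pattern <= items
        if exposed and positive:
            a += 1
        elif exposed:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    return a, b, c, d


def _ratio(num: float, den: float) -> float:
    """Divide, returning inf for x/0 (x != 0) and nan for 0/0."""
    if den == 0:
        return math.nan if num == 0 else math.inf
    return num / den


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk of the phenotype among the exposed over risk among the unexposed."""
    return _ratio(_ratio(a, a + b), _ratio(c, c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio (a*d)/(b*c) of the 2x2 table."""
    return _ratio(a * d, b * c)


# Toy data (hypothetical): eight individuals, items are risk factors.
DATA = [
    (frozenset({"x", "y"}), True),
    (frozenset({"x", "y"}), True),
    (frozenset({"x"}), True),
    (frozenset({"x", "z"}), False),
    (frozenset({"y"}), True),
    (frozenset({"z"}), False),
    (frozenset({"z"}), False),
    (frozenset({"y", "z"}), False),
]

if __name__ == "__main__":
    table = contingency(frozenset({"x"}), DATA)
    print(table)                  # (3, 1, 1, 3)
    print(relative_risk(*table))  # 3.0
    print(odds_ratio(*table))     # 9.0
```

In this toy example, three of the four individuals carrying item `x` have the phenotype, against one of the four who lack it, so the pattern `{x}` has a relative risk of 3 and an odds ratio of 9. Mining all patterns above a threshold on these measures, as the paper does, requires the search-space structure described in the abstract and is not attempted here.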

Original language | English (US) |
---|---|
Title of host publication | Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems |
Pages | 368-377 |
Number of pages | 10 |
State | Published - 2005 |
Externally published | Yes |
Event | Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005 - Baltimore, MD, United States. Duration: Jun 13 2005 → Jun 15 2005 |

### Other

Other | Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005 |
---|---|
Country | United States |
City | Baltimore, MD |
Period | 6/13/05 → 6/15/05 |

### ASJC Scopus subject areas

- Software

### Cite this

**Relative risk and odds ratio: A data mining perspective.** / Li, Haiquan; Li, Jinyan; Wong, Limsoon; Feng, Mengling; Tan, Yap Peng.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.* pp. 368-377, Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005, Baltimore, MD, United States, 6/13/05.

TY - GEN

T1 - Relative risk and odds ratio

T2 - A data mining perspective

AU - Li, Haiquan

AU - Li, Jinyan

AU - Wong, Limsoon

AU - Feng, Mengling

AU - Tan, Yap Peng

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=33244461081&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33244461081&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33244461081

SP - 368

EP - 377

BT - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

ER -