Recognizing ontology-applicable multiple-record web documents

David W. Embley, Yiu Kai Ng, Li Xu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

6 Citations (Scopus)

Abstract

Automatically recognizing which Web documents are “of interest” for some specified application is non-trivial. As a step toward solving this problem, we propose a technique for recognizing which multiple-record Web documents apply to an ontologically specified application. Given the values and kinds of values recognized by an ontological specification in an unstructured Web document, we apply three heuristics: (1) a density heuristic that measures the percentage of the document that appears to apply to an application ontology, (2) an expected-value heuristic that compares the number and kind of values found in a document to the number and kind expected by the application ontology, and (3) a grouping heuristic that considers whether the values in the document appear to be grouped as application-ontology records. Then, based on machine-learned rules over these heuristic measurements, we determine whether a Web document is applicable to a given ontology. In our experiments, both recall and precision exceeded 90%, with an F-measure of about 95%.
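
The three heuristics and the final decision can be illustrated with a minimal Python sketch. It is not the authors' implementation, and this record does not give their formulas: the `Match` type, the regex "recognizers", the use of cosine similarity for the expected-value comparison, the window-based grouping score, and the fixed thresholds in `is_applicable` (standing in for the machine-learned rules) are all assumptions made for illustration. The `f_measure` helper shows how an F-measure combines precision and recall.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Match:
    """A value recognized in the document by an ontology recognizer (hypothetical type)."""
    object_set: str  # ontology object set the value belongs to, e.g. "Year"
    start: int       # character offset where the match begins
    end: int         # character offset one past where the match ends


def density(doc: str, matches: list[Match]) -> float:
    """Density heuristic: fraction of document characters covered by recognized values."""
    covered: set[int] = set()
    for m in matches:
        covered.update(range(m.start, m.end))  # a set, so overlapping matches count once
    return len(covered) / len(doc) if doc else 0.0


def expected_value(matches: list[Match], expected: dict[str, float]) -> float:
    """Expected-value heuristic (assumed form): cosine similarity between the
    observed count per object set and the expected count per record."""
    found = Counter(m.object_set for m in matches)
    keys = set(found) | set(expected)
    dot = sum(found[k] * expected.get(k, 0.0) for k in keys)
    norm = math.sqrt(sum(found[k] ** 2 for k in keys)) * math.sqrt(
        sum(v ** 2 for v in expected.values()))
    return dot / norm if norm else 0.0


def grouping(matches: list[Match], one_per_record: list[str]) -> float:
    """Grouping heuristic (assumed form): cut the document-ordered values of the
    one-per-record object sets into windows of size n and average the share of
    distinct object sets per window; 1.0 means values cluster record by record."""
    n = len(one_per_record)
    if n == 0:
        return 0.0
    seq = [m.object_set for m in sorted(matches, key=lambda m: m.start)
           if m.object_set in one_per_record]
    windows = [seq[i:i + n] for i in range(0, len(seq), n)]
    if not windows:
        return 0.0
    return sum(len(set(w)) / n for w in windows) / len(windows)


def is_applicable(d: float, e: float, g: float) -> bool:
    """Stand-in for the paper's machine-learned rules: these thresholds are
    hypothetical and hand-picked, not learned from data."""
    return d >= 0.2 and e >= 0.5 and g >= 0.5


def f_measure(precision: float, recall: float) -> float:
    """F-measure: the harmonic mean 2PR / (P + R) of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Toy car-ad snippet; the regex "recognizers" are hypothetical stand-ins for an ontology.
DOC = "1998 Ford $4,500. 2001 Honda $9,900."
PATTERNS = {"Year": r"\b(?:19|20)\d\d\b", "Make": r"\b(?:Ford|Honda)\b", "Price": r"\$[\d,]+"}
matches = [Match(obj, m.start(), m.end())
           for obj, pat in PATTERNS.items() for m in re.finditer(pat, DOC)]
d = density(DOC, matches)
e = expected_value(matches, {"Year": 1, "Make": 1, "Price": 1})
g = grouping(matches, ["Year", "Make", "Price"])
print(f"density={d:.2f} expected={e:.2f} grouping={g:.2f} applicable={is_applicable(d, e, g)}")
```

Cosine similarity is used in this sketch because it ignores scale: a page listing 40 car ads and one listing 2 both point in the same direction as the one-value-per-record expectation vector, so the score reflects the mix of values found rather than the length of the document.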

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 555-570
Number of pages: 16
Volume: 2224
ISBN (Print): 3540428666, 9783540428664
State: Published, 2001
Externally published: Yes
Event: 20th International Conference on Conceptual Modeling, ER 2001, Yokohama, Japan
Duration: Nov 27 2001 to Nov 30 2001

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2224
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Embley, D. W., Ng, Y. K., & Xu, L. (2001). Recognizing ontology-applicable multiple-record web documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2224, pp. 555-570). Springer Verlag.

@inproceedings{7ad8902faa074ecf812e828c7287a348,
title = "Recognizing ontology-applicable multiple-record web documents",
author = "Embley, {David W.} and Ng, {Yiu Kai} and Xu, Li",
year = "2001",
language = "English (US)",
isbn = "3540428666",
volume = "2224",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "555--570",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Recognizing ontology-applicable multiple-record web documents

AU - Embley, David W.

AU - Ng, Yiu Kai

AU - Xu, Li

PY - 2001

Y1 - 2001

UR - http://www.scopus.com/inward/record.url?scp=84884496282&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84884496282&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540428666

SN - 9783540428664

VL - 2224

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 555

EP - 570

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -