Recognizing ontology-applicable multiple-record web documents

David W. Embley, Yiu Kai Ng, Li Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Automatically recognizing which Web documents are “of interest” for a specified application is non-trivial. As a step toward solving this problem, we propose a technique for recognizing which multiple-record Web documents apply to an ontologically specified application. Given the values, and the kinds of values, that an ontological specification recognizes in an unstructured Web document, we apply three heuristics: (1) a density heuristic that measures the percentage of the document that appears to apply to an application ontology, (2) an expected-value heuristic that compares the number and kind of values found in a document to the number and kind expected by the application ontology, and (3) a grouping heuristic that considers whether the values in the document appear to be grouped as application-ontology records. Then, based on machine-learned rules over these heuristic measurements, we determine whether a Web document is applicable to a given ontology. Our experimental results show that we achieved over 90% for both recall and precision, with an F-measure of about 95%.
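
The three heuristics and the final decision can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex recognizer, the car-ad object sets (`Model`, `Price`, `Year`), the cosine-similarity form of the expected-value heuristic, the grouping formula, and the fixed thresholds in `is_applicable` are all hypothetical stand-ins for the ontology-based recognition and machine-learned rules described in the abstract.

```python
import re
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Match:
    """One recognized value (hypothetical stand-in for ontology recognizer output)."""
    object_set: str  # ontology object set, e.g. "Price"
    start: int       # offset of the first matched character
    end: int         # offset just past the last matched character


def recognize(text, patterns):
    """Stand-in recognizer: one regex per object set (an assumption, not the paper's)."""
    return [Match(name, m.start(), m.end())
            for name, pattern in patterns.items()
            for m in re.finditer(pattern, text)]


def density(matches, doc_length):
    """H1: fraction of the document's characters covered by recognized values."""
    if doc_length == 0:
        return 0.0
    covered = set()
    for m in matches:
        covered.update(range(m.start, m.end))  # overlapping matches count once
    return len(covered) / doc_length


def expected_value(matches, expected):
    """H2 (assumed cosine form): similarity of expected vs. found counts per object set."""
    found = {name: 0 for name in expected}
    for m in matches:
        if m.object_set in found:
            found[m.object_set] += 1
    dot = sum(expected[name] * found[name] for name in expected)
    norm_e = sqrt(sum(v * v for v in expected.values()))
    norm_f = sqrt(sum(v * v for v in found.values()))
    return dot / (norm_e * norm_f) if norm_e and norm_f else 0.0


def grouping(matches, one_per_record):
    """H3 (illustrative formula): share of distinct object sets per record-sized group."""
    n = len(one_per_record)
    seq = [m.object_set for m in sorted(matches, key=lambda m: m.start)
           if m.object_set in one_per_record]
    if n == 0 or len(seq) < n:
        return 0.0
    # Split the ordered values into full groups of n; a perfect record has n distinct kinds.
    groups = [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]
    return sum(len(set(g)) for g in groups) / (n * len(groups))


def is_applicable(h1, h2, h3):
    """Hypothetical fixed thresholds standing in for the paper's machine-learned rules."""
    return h1 >= 0.1 and h2 >= 0.5 and h3 >= 0.5


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical car-ad object sets and patterns, for illustration only.
PATTERNS = {"Model": r"Civic|Escort", "Price": r"\$\d+", "Year": r"\b(?:19|20)\d{2}\b"}
EXPECTED = {"Model": 1, "Price": 1, "Year": 1}  # assumed: one value of each per record


def score(text):
    """Return the (H1, H2, H3) measurements and the applicability decision."""
    matches = recognize(text, PATTERNS)
    h = (density(matches, len(text)),
         expected_value(matches, EXPECTED),
         grouping(matches, set(EXPECTED)))
    return h, is_applicable(*h)


if __name__ == "__main__":
    print(score("Civic $5000 1998; Escort $3200 1995"))
    print(score("The weather today is sunny."))
```

In this sketch the two-record ad text covers 29 of its 35 characters, matches the expected mix of kinds exactly, and groups cleanly into records, so it is judged applicable; the weather sentence matches nothing and is rejected. In the described approach, a classifier learned from labeled documents would take the place of the hand-set thresholds in `is_applicable`. The `f_measure` helper shows why precision and recall both above 90% can yield an F-measure of about 95%: the harmonic mean always lies between the two values and stays close to both when they are close to each other.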

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Sushil Jajodia, Hideko S. Kunii, Arne Solvberg
Publisher: Springer-Verlag
Pages: 555-570
Number of pages: 16
ISBN (Print): 3540428666, 9783540428664
State: Published - Jan 1 2001
Externally published: Yes
Event: 20th International Conference on Conceptual Modeling, ER 2001 - Yokohama, Japan
Duration: Nov 27 2001 to Nov 30 2001

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2224
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 20th International Conference on Conceptual Modeling, ER 2001
Country: Japan
City: Yokohama
Period: 11/27/01 to 11/30/01

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
