Building a large-scale testing dataset for conceptual semantic annotation of text

Xiao Wei, Dajun Zeng, Xiangfeng Luo, Wei Wu

Research output: Contribution to journal › Article

Abstract

One major obstacle facing research on semantic annotation is the lack of large-scale testing datasets. In this paper, we develop a systematic approach to constructing such datasets. The approach is based on guided ontology auto-construction and annotation methods that require little a priori domain knowledge and little user knowledge of the documents. We demonstrate the efficacy of the proposed approach by developing a large-scale testing dataset from information available in MeSH and PubMed. The resulting testing dataset consists of a large-scale ontology, a large-scale set of annotated documents, and baselines for evaluating target algorithms; it can be used to evaluate both ontology construction algorithms and semantic annotation algorithms.
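The abstract only summarises the pipeline (ontology from MeSH, annotated PubMed documents, evaluation baselines). As a purely illustrative sketch, not the authors' implementation, the Python snippet below shows the kind of dictionary-based annotation and set-based precision/recall baseline such a dataset supports. The three-concept ontology fragment, the sample abstract, and the gold annotation are all invented for illustration; the ids merely mimic MeSH descriptor ids.

# Purely illustrative sketch (not the paper's implementation): a toy concept
# dictionary standing in for a MeSH-derived ontology, a naive string-matching
# annotator applied to a made-up PubMed-style abstract, and set-based
# precision/recall of the kind an evaluation baseline might report.

# Hypothetical ontology fragment: concept id -> preferred term plus entry terms.
ONTOLOGY = {
    "D003920": {"preferred": "diabetes mellitus", "entry_terms": ["diabetes"]},
    "D007333": {"preferred": "insulin resistance", "entry_terms": []},
    "D009765": {"preferred": "obesity", "entry_terms": ["obese"]},
}

def annotate(text):
    """Return ids of all concepts whose terms occur in the text (case-insensitive)."""
    lowered = text.lower()
    hits = set()
    for cid, entry in ONTOLOGY.items():
        terms = [entry["preferred"], *entry["entry_terms"]]
        if any(term in lowered for term in terms):
            hits.add(cid)
    return hits

def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted concept ids against a gold annotation."""
    if not predicted or not gold:
        return 0.0, 0.0
    tp = len(predicted & gold)
    return tp / len(predicted), tp / len(gold)

if __name__ == "__main__":
    # Invented document and gold annotation, for illustration only.
    abstract = "Obesity is a major risk factor for insulin resistance and type 2 diabetes."
    gold = {"D003920", "D007333", "D009765"}
    predicted = annotate(abstract)
    p, r = precision_recall(predicted, gold)
    print(f"predicted={sorted(predicted)}  precision={p:.2f}  recall={r:.2f}")

On this toy input the annotator recovers all three concepts, so both scores are 1.0; on real data such dictionary matching is only a weak baseline against which learned annotation methods can be compared.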

Original language: English (US)
Pages (from-to): 63-72
Number of pages: 10
Journal: International Journal of Computational Science and Engineering
Volume: 16
Issue number: 1
ISSN: 1742-7185
DOI: 10.1504/IJCSE.2018.089582
Publisher: Inderscience Enterprises Ltd
State: Published - Jan 1, 2018
Externally published: Yes

Keywords

  • evaluation baseline
  • evaluation parameters
  • guided annotation method
  • MeSH
  • ontology auto-construction
  • ontology concept learning
  • priori knowledge
  • PubMed
  • semantic annotation
  • testing dataset

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Hardware and Architecture
  • Computational Mathematics
  • Computational Theory and Mathematics

Cite this

Building a large-scale testing dataset for conceptual semantic annotation of text. / Wei, Xiao; Zeng, Dajun; Luo, Xiangfeng; Wu, Wei.

In: International Journal of Computational Science and Engineering, Vol. 16, No. 1, 01.01.2018, p. 63-72.
