Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Digitizing and repurposing taxonomic descriptions of living organisms is an urgent task facing biodiversity informatics researchers. Semantic annotation is the essential technology that makes the reuse and repurposing of taxonomic descriptions possible. However, the performance of annotation systems often varies from one collection to another. Given the large content and structural variations inherent in different collections of taxonomic descriptions, this paper examines corpus characteristic measures in an attempt to establish a performance prediction model that, given a small set of samples, predicts a system's performance on a collection. The prediction model helps deepen our understanding of the strengths and weaknesses of an annotation system, but, more importantly, it provides a valuable decision-making tool for end users. We started this research using the MARTT (Markuper for Taxonomic Treatments) system as a base. Although a universal performance prediction model for all systems and all corpora may not be possible at this time, we hope that more and more individual systems will offer such tools as a regular component of their delivery package.

Original language: English (US)
Title of host publication: ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology
Pages: 92-96
Number of pages: 5
DOIs: https://doi.org/10.1109/ICBBT.2010.5479002
State: Published - 2010
Event: 2010 International Conference on Bioinformatics and Biomedical Technology, ICBBT 2010 - Chengdu, China
Duration: Apr 16 2010 to Apr 18 2010

Other

Other: 2010 International Conference on Bioinformatics and Biomedical Technology, ICBBT 2010
Country: China
City: Chengdu
Period: 4/16/10 to 4/18/10

Fingerprint

Informatics
Biodiversity
Semantics
Decision Making
Research Personnel
Technology
Research

Keywords

  • Corpus characteristics
  • Performance evaluation
  • Performance prediction
  • Semantic annotation systems

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Cite this

Cui, H. (2010). Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions. In ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology (pp. 92-96). [5479002] https://doi.org/10.1109/ICBBT.2010.5479002

Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions. / Cui, Hong.

ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology. 2010. p. 92-96 5479002.

Cui, H 2010, Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions. in ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology., 5479002, pp. 92-96, 2010 International Conference on Bioinformatics and Biomedical Technology, ICBBT 2010, Chengdu, China, 4/16/10. https://doi.org/10.1109/ICBBT.2010.5479002
Cui H. Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions. In ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology. 2010. p. 92-96. 5479002 https://doi.org/10.1109/ICBBT.2010.5479002
Cui, Hong. / Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions. ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology. 2010. pp. 92-96
@inproceedings{86ae4937ad4f4720be0d133409b65651,
title = "Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions",
abstract = "Digitizing and repurposing taxonomic descriptions of living organisms is an urgent task facing biodiversity informatics researchers. Semantic annotation is the essential technology that makes the reuse and repurposing of taxonomic descriptions possible. However, the performance of annotation systems often varies from one collection to another. Given the large content and structural variations inherent in different collections of taxonomic descriptions, this paper examines corpus characteristic measures in an attempt to establish a performance prediction model that, given a small set of samples, predicts a system's performance on a collection. The prediction model helps deepen our understanding of the strengths and weaknesses of an annotation system, but, more importantly, it provides a valuable decision-making tool for end users. We started this research using the MARTT (Markuper for Taxonomic Treatments) system as a base. Although a universal performance prediction model for all systems and all corpora may not be possible at this time, we hope that more and more individual systems will offer such tools as a regular component of their delivery package.",
keywords = "Corpus characteristics, Performance evaluation, Performance prediction, Semantic annotation systems",
author = "Hong Cui",
year = "2010",
doi = "10.1109/ICBBT.2010.5479002",
language = "English (US)",
isbn = "9781424467761",
pages = "92--96",
booktitle = "ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology",

}

TY - GEN

T1 - Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions

AU - Cui, Hong

PY - 2010

Y1 - 2010

N2 - Digitizing and repurposing taxonomic descriptions of living organisms is an urgent task facing biodiversity informatics researchers. Semantic annotation is the essential technology that makes the reuse and repurposing of taxonomic descriptions possible. However, the performance of annotation systems often varies from one collection to another. Given the large content and structural variations inherent in different collections of taxonomic descriptions, this paper examines corpus characteristic measures in an attempt to establish a performance prediction model that, given a small set of samples, predicts a system's performance on a collection. The prediction model helps deepen our understanding of the strengths and weaknesses of an annotation system, but, more importantly, it provides a valuable decision-making tool for end users. We started this research using the MARTT (Markuper for Taxonomic Treatments) system as a base. Although a universal performance prediction model for all systems and all corpora may not be possible at this time, we hope that more and more individual systems will offer such tools as a regular component of their delivery package.

AB - Digitizing and repurposing taxonomic descriptions of living organisms is an urgent task facing biodiversity informatics researchers. Semantic annotation is the essential technology that makes the reuse and repurposing of taxonomic descriptions possible. However, the performance of annotation systems often varies from one collection to another. Given the large content and structural variations inherent in different collections of taxonomic descriptions, this paper examines corpus characteristic measures in an attempt to establish a performance prediction model that, given a small set of samples, predicts a system's performance on a collection. The prediction model helps deepen our understanding of the strengths and weaknesses of an annotation system, but, more importantly, it provides a valuable decision-making tool for end users. We started this research using the MARTT (Markuper for Taxonomic Treatments) system as a base. Although a universal performance prediction model for all systems and all corpora may not be possible at this time, we hope that more and more individual systems will offer such tools as a regular component of their delivery package.

KW - Corpus characteristics

KW - Performance evaluation

KW - Performance prediction

KW - Semantic annotation systems

UR - http://www.scopus.com/inward/record.url?scp=77954480345&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77954480345&partnerID=8YFLogxK

U2 - 10.1109/ICBBT.2010.5479002

DO - 10.1109/ICBBT.2010.5479002

M3 - Conference contribution

SN - 9781424467761

SP - 92

EP - 96

BT - ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology

ER -