Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Digitizing and repurposing taxonomic descriptions of living organisms is an urgent task facing biodiversity informatics researchers. Semantic annotation is the essential technology that makes the reuse and repurposing of taxonomic descriptions possible. However, the performance of annotation systems often varies by collection. Given the large content and structural variations inherent in different collections of taxonomic descriptions, this paper looks into corpus characteristic measures in an attempt to establish a performance prediction model which, when given a small set of samples, predicts a system's performance for a collection. The prediction model helps deepen our understanding of the strengths and weaknesses of an annotation system but, more importantly, provides a valuable decision-making tool for end users. We started this research using the MARTT (Markuper for Taxonomic Treatments) system as a base. Although a universal performance prediction model for all systems and all corpora may not be possible at this time, we hope more and more individual systems will offer such tools as a regular component of their delivery package.

Original language: English (US)
Title of host publication: ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology
Pages: 92-96
Number of pages: 5
DOI: https://doi.org/10.1109/ICBBT.2010.5479002
State: Published - Jul 16 2010
Event: 2010 International Conference on Bioinformatics and Biomedical Technology, ICBBT 2010 - Chengdu, China
Duration: Apr 16 2010 to Apr 18 2010

Publication series

Name: ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology

Other

Other: 2010 International Conference on Bioinformatics and Biomedical Technology, ICBBT 2010
Country: China
City: Chengdu
Period: 4/16/10 to 4/18/10

Keywords

  • Corpus characteristics
  • Performance evaluation
  • Performance prediction
  • Semantic annotation systems

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

  • Cite this

    Cui, H. (2010). Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions. In ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology (pp. 92-96). [5479002] (ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology). https://doi.org/10.1109/ICBBT.2010.5479002