CharaParser+EQ: Performance evaluation without gold standard

Hong Cui, Wasila Dahdul, Alexander T. Dececchi, Nizar Ibrahim, Paula Mabee, James P. Balhoff, Hariharan Gopalakrishnan

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

To make phenotypic characters of organisms widely useful for computerized biology research, biocurators manually convert character descriptions to a structured format, for example the Entity-Quality (EQ) format. The manual approach is time-consuming and affected by inter-curator variation. In this paper we report a software application, CharaParser+EQ, to our knowledge the first software that produces EQ statements from textual character descriptions. We report a recent experiment that evaluates the performance of the software against three experienced biocurators. While the software is still far from being able to compete with biocurators on this highly intellectual task, the results show that (1) CharaParser+EQ's performance (precision and recall) is greatly improved compared to a previous version, (2) the completeness of the ontologies used in the process has a significant impact both on the software's EQ generation performance and on the agreement among curators, and (3) unlimited access to external knowledge (published papers, books) by curators has no significant impact on inter-curator agreement. A detailed error analysis that compares machine-generated and curator-generated EQs is included.
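As a rough illustration of what precision and recall mean for EQ statements, the sketch below scores a set of machine-generated EQs against one curator's EQs. It is only a minimal sketch under stated assumptions: the abstract does not describe how matches are counted, so exact matching on the (entity, quality) pair is assumed here, and the `EQ` type, the `precision_recall` function, and all term labels are hypothetical.

```python
from typing import NamedTuple


class EQ(NamedTuple):
    """One Entity-Quality statement, reduced to two term labels (hypothetical type)."""
    entity: str
    quality: str


def precision_recall(machine: set[EQ], curator: set[EQ]) -> tuple[float, float]:
    """Exact-match precision and recall of machine EQs against one curator's EQs.

    Assumption: two EQs match only if entity and quality are both identical.
    """
    matched = len(machine & curator)
    precision = matched / len(machine) if machine else 0.0
    recall = matched / len(curator) if curator else 0.0
    return precision, recall


# Made-up EQ statements for a single character description.
machine_eqs = {EQ("dorsal fin", "absent"), EQ("pelvic fin", "elongated")}
curator_eqs = {
    EQ("dorsal fin", "absent"),
    EQ("pelvic fin", "shortened"),
    EQ("pectoral fin", "present"),
}

precision, recall = precision_recall(machine_eqs, curator_eqs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With no gold standard available, the same function could be applied once per curator, treating each curator's EQs in turn as the reference set.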

Original language: English (US)
Pages (from-to): 1-10
Number of pages: 10
Journal: Proceedings of the Association for Information Science and Technology
Volume: 52
Issue number: 1
DOI: 10.1002/pra2.2015.145052010020
State: Published - 2015

Keywords

  • curation inconsistency
  • EQ statements
  • Natural Language Processing
  • ontology search
  • Phenotype character curation

ASJC Scopus subject areas

  • Computer Science (all)
  • Library and Information Sciences

Cite this

CharaParser+EQ: Performance evaluation without gold standard. / Cui, Hong; Dahdul, Wasila; Dececchi, Alexander T.; Ibrahim, Nizar; Mabee, Paula; Balhoff, James P.; Gopalakrishnan, Hariharan.

In: Proceedings of the Association for Information Science and Technology, Vol. 52, No. 1, 2015, p. 1-10.

@article{f3b337c159ab42648a9d6aeaf464e0ee,
title = "CharaParser+EQ: Performance evaluation without gold standard",
abstract = "To make phenotypic characters of organisms widely useful for computerized biology research, biocurators manually convert character descriptions to a structured format, for example the Entity-Quality (EQ) format. The manual approach is time consuming and affected by inter-curator variations. In this paper we report a software application, CharaParser+EQ, to our knowledge the first software that produces EQ statements from textual character descriptions. We report a recent experiment that evaluates the performance of the software against three experienced biocurators. While the software is still far from being able to compete with biocurators on this highly intellectual task, the results show (1) CharaParser+EQ's performance (precision and recall) is greatly improved compared to a previous version, (2) the completeness of the ontologies used in the process has significant impact both on the software's EQ generation performance and on the agreement among curators, and (3) unlimited access to external knowledge (published papers, books) by curators has no significant impact on inter-curator agreements. A detailed error analysis that compares machine and curator generated EQs is included.",
keywords = "curation inconsistency, EQ statements, Natural Language Processing, ontology search, Phenotype character curation",
author = "Hong Cui and Wasila Dahdul and Dececchi, {Alexander T.} and Nizar Ibrahim and Paula Mabee and Balhoff, {James P.} and Hariharan Gopalakrishnan",
year = "2015",
doi = "10.1002/pra2.2015.145052010020",
language = "English (US)",
volume = "52",
pages = "1--10",
journal = "Proceedings of the Association for Information Science and Technology",
issn = "2373-9231",
publisher = "John Wiley and Sons Inc.",
number = "1",

}

TY - JOUR
T1 - CharaParser+EQ
T2 - Performance evaluation without gold standard
AU - Cui, Hong
AU - Dahdul, Wasila
AU - Dececchi, Alexander T.
AU - Ibrahim, Nizar
AU - Mabee, Paula
AU - Balhoff, James P.
AU - Gopalakrishnan, Hariharan
PY - 2015
Y1 - 2015
AB - To make phenotypic characters of organisms widely useful for computerized biology research, biocurators manually convert character descriptions to a structured format, for example the Entity-Quality (EQ) format. The manual approach is time consuming and affected by inter-curator variations. In this paper we report a software application, CharaParser+EQ, to our knowledge the first software that produces EQ statements from textual character descriptions. We report a recent experiment that evaluates the performance of the software against three experienced biocurators. While the software is still far from being able to compete with biocurators on this highly intellectual task, the results show (1) CharaParser+EQ's performance (precision and recall) is greatly improved compared to a previous version, (2) the completeness of the ontologies used in the process has significant impact both on the software's EQ generation performance and on the agreement among curators, and (3) unlimited access to external knowledge (published papers, books) by curators has no significant impact on inter-curator agreements. A detailed error analysis that compares machine and curator generated EQs is included.
KW - curation inconsistency
KW - EQ statements
KW - Natural Language Processing
KW - ontology search
KW - Phenotype character curation
UR - http://www.scopus.com/inward/record.url?scp=84987732864&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84987732864&partnerID=8YFLogxK
U2 - 10.1002/pra2.2015.145052010020
DO - 10.1002/pra2.2015.145052010020
M3 - Article
AN - SCOPUS:84987732864
VL - 52
SP - 1
EP - 10
JO - Proceedings of the Association for Information Science and Technology
JF - Proceedings of the Association for Information Science and Technology
SN - 2373-9231
IS - 1
ER -