An overview of the BioCreative 2012 Workshop Track III

Interactive text mining task

Cecilia N. Arighi, Ben Carterette, K. Bretonnel Cohen, Martin Krallinger, W. John Wilbur, Petra Fey, Robert Dodson, Laurel Cooper, Ceri E. Van Slyke, Wasila Dahdul, Paula Mabee, Donghui Li, Bethany Harris, Marc Gillespie, Silvia Jimenez, Phoebe Roberts, Lisa Matthews, Kevin Becker, Harold Drabkin, Susan Bello, Luana Licata, Andrew Chatr-aryamontri, Mary L. Schaeffer, Julie Park, Melissa Haendel, Kimberly Van Auken, Yuling Li, Juancarlos Chan, Hans Michael Muller, Hong Cui, James P. Balhoff, Johnny Chi Yang Wu, Zhiyong Lu, Chih Hsuan Wei, Catalina O. Tudor, Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan, Juan Miguel Cejuela, Pratibha Dubey & Cathy Wu

Research output: Contribution to journal › Article

39 Citations (Scopus)

Abstract

In many databases, biocuration primarily involves literature curation, which usually includes retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature grows, text mining becomes increasingly relevant as an aid to biocuration. A number of groups have developed text mining tools from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have been few broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems addressing diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems improved curation efficiency, speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems improved annotation accuracy compared with performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the biocurator's expertise on the given curation task, the inherent difficulty of the curation and attention to the annotation guidelines. After the task, annotators completed a survey to help identify strengths and weaknesses of the various systems.
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, even when the system scores highly on design, learnability and usability. In addition, this task included analysis of strategies to refine the annotation guidelines and system documentation, to adapt the tools to the needs and query types of end users, and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics. This analysis will help to plan a more intensive study in BioCreative IV.
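The abstract reports inter-annotator agreement but does not name the metric used. Cohen's kappa is a common choice for agreement between two annotators on the same items, so the following is only an illustrative sketch of how such agreement between two curators might be computed; the function name and the sample labels are hypothetical, not taken from the paper:

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators' labels over the same items."""
    assert len(ann_a) == len(ann_b) and ann_a
    n = len(ann_a)
    # Observed agreement: fraction of items both annotators labeled identically
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Chance agreement: expected overlap given each annotator's label distribution
    counts_a, counts_b = Counter(ann_a), Counter(ann_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical example: two curators marking six abstracts relevant (R) or not (N)
curator_1 = ["R", "R", "N", "R", "N", "N"]
curator_2 = ["R", "N", "N", "R", "N", "R"]
print(round(cohens_kappa(curator_1, curator_2), 3))  # → 0.333
```

Kappa corrects raw agreement (4/6 here) for the agreement expected by chance, which is why it is preferred over simple percent overlap when label distributions are skewed.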

Original language: English (US)
Article number: bas056
Journal: Database
Volume: 2013
DOI: 10.1093/database/bas056
ISSN: 1758-0463
Publisher: Oxford University Press
State: Published - 2013


ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Information Systems
  • Medicine (all)

Cite this

Arighi, C. N., Carterette, B., Cohen, K. B., Krallinger, M., Wilbur, W. J., Fey, P., ... Wu, C. (2013). An overview of the BioCreative 2012 Workshop Track III: Interactive text mining task. Database, 2013, [bas056]. https://doi.org/10.1093/database/bas056

