An investigation of coreference phenomena in the biomedical domain

Dane Bell, Gus Hahn-Powell, Marco A. Valenzuela-Escárcega, Mihai Surdeanu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution. Domain-general coreference resolution algorithms perform poorly on biomedical documents, because the cues they rely on such as gender are largely absent in this domain, and because they do not encode domain-specific knowledge such as the number and type of participants required in chemical reactions. Moreover, it is difficult to directly encode this knowledge into most coreference resolution algorithms because they are not rule-based. Our rule-based architecture uses sequentially applied hand-designed "sieves", with the output of each sieve informing and constraining subsequent sieves. This architecture provides a 3.2% increase in throughput to our Reach event extraction system with precision parallel to that of the stricter system that relies solely on syntactic patterns for extraction.
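
The sieve idea in the abstract can be illustrated with a minimal sketch. This is not the Reach implementation. The `Mention` fields, both sieve functions, and the toy mentions are hypothetical and chosen only to show sieves applied in sequence, with links from earlier sieves visible to (and not overridden by) later ones. It assumes each mention's `idx` equals its position in the list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Mention:
    idx: int          # position in document order (assumed == list index)
    text: str
    sem_type: str     # hypothetical semantic label, e.g. "protein"
    is_generic: bool  # True for mentions such as "the protein"

Links = Dict[int, int]  # anaphor idx -> antecedent idx
Sieve = Callable[[List[Mention], Links], Links]

def resolve(links: Links, idx: int) -> int:
    """Follow existing links back to the earliest mention in the chain."""
    while idx in links:
        idx = links[idx]
    return idx

def exact_match_sieve(mentions: List[Mention], links: Links) -> Links:
    """High-precision sieve: link case-insensitive string matches."""
    new: Links = {}
    for m in mentions:
        if m.idx in links or m.is_generic:
            continue
        for cand in reversed(mentions[:m.idx]):
            if not cand.is_generic and cand.text.lower() == m.text.lower():
                new[m.idx] = cand.idx
                break
    return new

def generic_np_sieve(mentions: List[Mention], links: Links) -> Links:
    """Looser sieve: link a generic mention to the nearest preceding
    specific mention of the same semantic type, reusing earlier links."""
    new: Links = {}
    for m in mentions:
        if m.idx in links or not m.is_generic:
            continue
        for cand in reversed(mentions[:m.idx]):
            if not cand.is_generic and cand.sem_type == m.sem_type:
                new[m.idx] = resolve(links, cand.idx)
                break
    return new

def run_sieves(mentions: List[Mention], sieves: List[Sieve]) -> Links:
    """Apply sieves in order; a link made by an earlier sieve is never replaced."""
    links: Links = {}
    for sieve in sieves:
        for anaphor, antecedent in sieve(mentions, links).items():
            links.setdefault(anaphor, antecedent)
    return links

mentions = [
    Mention(0, "Ras", "protein", False),
    Mention(1, "MEK", "gene", False),
    Mention(2, "ras", "protein", False),
    Mention(3, "the protein", "protein", True),
]
result = run_sieves(mentions, [exact_match_sieve, generic_np_sieve])
```

Here the second sieve links "the protein" to the chain already built by the first sieve, which is one simple way for earlier output to inform later decisions.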

Original language: English (US)
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Publisher: European Language Resources Association (ELRA)
Pages: 177-183
Number of pages: 7
ISBN (Electronic): 9782951740891
State: Published - Jan 1 2016
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: May 23 2016 → May 28 2016

Other

Other: 10th International Conference on Language Resources and Evaluation, LREC 2016
Country: Slovenia
City: Portoroz
Period: 5/23/16 → 5/28/16

Fingerprint

event
knowledge
gender
Coreference
Informing
Syntax
Entity

Keywords

  • Biomedical text mining
  • Coreference resolution
  • Information extraction

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Language and Linguistics
  • Education

Cite this

Bell, D., Hahn-Powell, G., Valenzuela-Escárcega, M. A., & Surdeanu, M. (2016). An investigation of coreference phenomena in the biomedical domain. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp. 177-183). European Language Resources Association (ELRA).

An investigation of coreference phenomena in the biomedical domain. / Bell, Dane; Hahn-Powell, Gus; Valenzuela-Escárcega, Marco A.; Surdeanu, Mihai.

Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. European Language Resources Association (ELRA), 2016. p. 177-183.

Bell, D, Hahn-Powell, G, Valenzuela-Escárcega, MA & Surdeanu, M 2016, An investigation of coreference phenomena in the biomedical domain. in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. European Language Resources Association (ELRA), pp. 177-183, 10th International Conference on Language Resources and Evaluation, LREC 2016, Portoroz, Slovenia, 5/23/16.
Bell D, Hahn-Powell G, Valenzuela-Escárcega MA, Surdeanu M. An investigation of coreference phenomena in the biomedical domain. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. European Language Resources Association (ELRA). 2016. p. 177-183
Bell, Dane ; Hahn-Powell, Gus ; Valenzuela-Escárcega, Marco A. ; Surdeanu, Mihai. / An investigation of coreference phenomena in the biomedical domain. Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016. European Language Resources Association (ELRA), 2016. pp. 177-183
@inproceedings{546cc0f38a744fd2a5c3fcc1ff22ad3a,
title = "An investigation of coreference phenomena in the biomedical domain",
abstract = "We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution. Domain-general coreference resolution algorithms perform poorly on biomedical documents, because the cues they rely on such as gender are largely absent in this domain, and because they do not encode domain-specific knowledge such as the number and type of participants required in chemical reactions. Moreover, it is difficult to directly encode this knowledge into most coreference resolution algorithms because they are not rule-based. Our rule-based architecture uses sequentially applied hand-designed {"}sieves{"}, with the output of each sieve informing and constraining subsequent sieves. This architecture provides a 3.2{\%} increase in throughput to our Reach event extraction system with precision parallel to that of the stricter system that relies solely on syntactic patterns for extraction.",
keywords = "Biomedical text mining, Coreference resolution, Information extraction",
author = "Bell, Dane and Hahn-Powell, Gus and Valenzuela-Esc{\'a}rcega, {Marco A.} and Surdeanu, Mihai",
year = "2016",
month = "1",
day = "1",
language = "English (US)",
pages = "177--183",
booktitle = "Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016",
publisher = "European Language Resources Association (ELRA)",
}

TY - GEN
T1 - An investigation of coreference phenomena in the biomedical domain
AU - Bell, Dane
AU - Hahn-Powell, Gus
AU - Valenzuela-Escárcega, Marco A.
AU - Surdeanu, Mihai
PY - 2016/1/1
Y1 - 2016/1/1
AB - We describe challenges and advantages unique to coreference resolution in the biomedical domain, and a sieve-based architecture that leverages domain knowledge for both entity and event coreference resolution. Domain-general coreference resolution algorithms perform poorly on biomedical documents, because the cues they rely on such as gender are largely absent in this domain, and because they do not encode domain-specific knowledge such as the number and type of participants required in chemical reactions. Moreover, it is difficult to directly encode this knowledge into most coreference resolution algorithms because they are not rule-based. Our rule-based architecture uses sequentially applied hand-designed "sieves", with the output of each sieve informing and constraining subsequent sieves. This architecture provides a 3.2% increase in throughput to our Reach event extraction system with precision parallel to that of the stricter system that relies solely on syntactic patterns for extraction.
KW - Biomedical text mining
KW - Coreference resolution
KW - Information extraction
UR - http://www.scopus.com/inward/record.url?scp=85021672691&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021672691&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85021672691
SP - 177
EP - 183
BT - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
PB - European Language Resources Association (ELRA)
ER -