Disambiguating ambiguous biomedical terms in biomedical narrative text

An unsupervised method

Hongfang Liu, Yves A Lussier, Carol Friedman

Research output: Contribution to journal › Article

53 Citations (Scopus)

Abstract

With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.
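The two-phase idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sense-tagged corpus is derived or which classifier is trained, so everything below is an assumption made for illustration: the hypothetical `sense_cues` mapping (unambiguous cue words per sense), the toy sentences, and the choice of a bag-of-words naive Bayes classifier with add-one smoothing. It is not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_sense_tagged_corpus(term, sense_cues, sentences):
    """Phase 1 (assumed heuristic): tag a sentence containing `term` with a
    sense only when cue words of exactly one sense occur in it."""
    corpus = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if term not in tokens:
            continue
        present = set(tokens)
        matched = [s for s, cues in sense_cues.items() if cues & present]
        if len(matched) != 1:
            continue  # no cue or conflicting cues: leave the sentence untagged
        sense = matched[0]
        # Drop the term and the cue words so the classifier learns from context.
        context = [t for t in tokens if t != term and t not in sense_cues[sense]]
        corpus.append((context, sense))
    return corpus


def train_naive_bayes(corpus):
    """Phase 2 (assumed classifier): collect counts for a naive Bayes model."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in corpus:
        sense_counts[sense] += 1
        word_counts[sense].update(context)
        vocab.update(context)
    return sense_counts, word_counts, vocab


def classify(model, term, text):
    """Return the most probable sense of `term` in `text` (add-one smoothing)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    tokens = [t for t in tokenize(text) if t != term and t in vocab]
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)
        for t in tokens:
            score += math.log((word_counts[sense][t] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy data for the ambiguous term "cold".
SENSE_CUES = {
    "illness": {"rhinovirus", "infection"},
    "temperature": {"temperature", "freezing"},
}
SENTENCES = [
    "The patient caught a cold from a rhinovirus infection and has a cough",
    "Cough and sore throat followed the cold caused by rhinovirus",
    "Exposure to cold freezing water numbed the skin",
    "Skin exposure to cold temperature caused numbness in water",
    "He felt cold",  # no cue word, so phase 1 skips it
]

corpus = build_sense_tagged_corpus("cold", SENSE_CUES, SENTENCES)
model = train_naive_bayes(corpus)
```

With this toy data, `classify(model, "cold", "cold with cough and sore throat")` selects the illness sense and `classify(model, "cold", "cold water on the skin")` selects the temperature sense. The point of the sketch is only that no manually tagged examples are needed: the labels come from the phase 1 heuristic.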

Original language: English (US)
Pages (from-to): 249-261
Number of pages: 13
Journal: Journal of Biomedical Informatics
Volume: 34
Issue number: 4
DOIs: 10.1006/jbin.2001.1023
State: Published - 2001
Externally published: Yes

Fingerprint

Classifiers
Natural Language Processing
Vocabulary
Information Storage and Retrieval
Maintenance
Processing
Experiments

Keywords

  • Corpus-based machine learning
  • MedLEE
  • MEDLINE
  • Natural language processing
  • UMLS
  • Word sense disambiguation

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Disambiguating ambiguous biomedical terms in biomedical narrative text: An unsupervised method. / Liu, Hongfang; Lussier, Yves A; Friedman, Carol.

In: Journal of Biomedical Informatics, Vol. 34, No. 4, 2001, p. 249-261.

@article{6460be39a7e343599dddfca411e4f783,
title = "Disambiguating ambiguous biomedical terms in biomedical narrative text: An unsupervised method",
abstract = "With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97{\%}, with greater than 90{\%} accuracy for each individual ambiguous term.",
keywords = "Corpus-based machine learning, MedLEE, MEDLINE, Natural language processing, UMLS, Word sense disambiguation",
author = "Liu, Hongfang and Lussier, Yves A and Friedman, Carol",
year = "2001",
doi = "10.1006/jbin.2001.1023",
language = "English (US)",
volume = "34",
pages = "249--261",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
number = "4",
}

TY - JOUR

T1 - Disambiguating ambiguous biomedical terms in biomedical narrative text

T2 - An unsupervised method

AU - Liu, Hongfang

AU - Lussier, Yves A

AU - Friedman, Carol

PY - 2001

Y1 - 2001

N2 - With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.

KW - Corpus-based machine learning

KW - MedLEE

KW - MEDLINE

KW - Natural language processing

KW - UMLS

KW - Word sense disambiguation

UR - http://www.scopus.com/inward/record.url?scp=0035564886&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035564886&partnerID=8YFLogxK

U2 - 10.1006/jbin.2001.1023

DO - 10.1006/jbin.2001.1023

M3 - Article

VL - 34

SP - 249

EP - 261

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

IS - 4

ER -