Automated encoding of clinical documents based on natural language processing

Carol Friedman, Lyudmila Shagina, Yves A Lussier, George Hripcsak

Research output: Contribution to journal › Article

258 Citations (Scopus)

Abstract

The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.

Original language: English (US)
Pages (from-to): 392-402
Number of pages: 11
Journal: Journal of the American Medical Informatics Association
Volume: 11
Issue number: 5
DOI: 10.1197/jamia.M1552
State: Published - Sep 2004
Externally published: Yes

Fingerprint

Natural Language Processing
Unified Medical Language System

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Automated encoding of clinical documents based on natural language processing. / Friedman, Carol; Shagina, Lyudmila; Lussier, Yves A; Hripcsak, George.

In: Journal of the American Medical Informatics Association, Vol. 11, No. 5, 09.2004, p. 392-402.

@article{620d6e8ff245450180b75596cad193f5,
title = "Automated encoding of clinical documents based on natural language processing",
abstract = "The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95{\%} CI .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.",
author = "Carol Friedman and Lyudmila Shagina and Lussier, {Yves A} and George Hripcsak",
year = "2004",
month = "9",
doi = "10.1197/jamia.M1552",
language = "English (US)",
volume = "11",
pages = "392--402",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - Automated encoding of clinical documents based on natural language processing

AU - Friedman, Carol

AU - Shagina, Lyudmila

AU - Lussier, Yves A

AU - Hripcsak, George

PY - 2004/9

Y1 - 2004/9

N2 - The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.

AB - The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.

UR - http://www.scopus.com/inward/record.url?scp=4544280638&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544280638&partnerID=8YFLogxK

U2 - 10.1197/jamia.M1552

DO - 10.1197/jamia.M1552

M3 - Article

C2 - 15187068

AN - SCOPUS:4544280638

VL - 11

SP - 392

EP - 402

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 5

ER -