Automated terminology networks for the integration of heterogeneous databases

Xiaoyan Wang, Hui Nar Quek, Michael Cantor, Pauline Kra, Aylit Schultz, Yves A Lussier

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Citation (Scopus)

Abstract

As cross-disciplinary research grows, researchers face the challenge of linking disparate biomedical databases that were developed without common indexes. Manually indexing these large-scale databases is laborious and often impractical. Solutions involving mediating terminologies have been proposed, but mapping terms from the databases of interest to these mediating terminologies is also laborious, and keeping the indexes synchronized is an additional problem. In this study we describe a novel method of linking heterogeneous databases using terminology networks constructed with automated mapping methods. Linkage was established between two disparate biomedical databases (SNOMED-CT and HDG) using two relevant intermediating databases (UMLS and OMIM). A gold standard of 514 distinct matches was used as proof of principle. The fully manually curated network (baseline index) and one automated terminological pathway (HDG-OMIM-SNOMED) performed at high precision and low recall, while the direct automated terminological pathway (HDG-SNOMED) provided higher recall and lower precision. In conclusion, as hypothesized, 1) manually curated pathways provide high precision but low recall, and 2) automated terminology pathways can significantly increase recall at acceptable precision. Taken together, these results suggest that combining manual and automated terminology networks could offer incremental gains in recall and precision.
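
The evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the mapping algorithm or per-pathway figures, so everything concrete here is an assumption made for illustration: the identifiers (`hdg:1`, `omim:10`, `sct:A`), the toy mappings, the four-pair gold standard, and the helper names `compose`, `pairs`, and `precision_recall`. The sketch only shows how an indirect pathway (through an intermediating terminology) and a direct pathway might each be scored against a gold standard, and how their union could trade precision for recall.

```python
from typing import Dict, Set, Tuple

Mapping = Dict[str, Set[str]]   # source term -> set of target terms
Pair = Tuple[str, str]          # one (source, target) link


def compose(first: Mapping, second: Mapping) -> Mapping:
    """Chain two mappings through a shared intermediating terminology."""
    linked: Mapping = {}
    for source, intermediates in first.items():
        targets: Set[str] = set()
        for mid in intermediates:
            targets |= second.get(mid, set())
        if targets:
            linked[source] = targets
    return linked


def pairs(mapping: Mapping) -> Set[Pair]:
    """Flatten a mapping into a set of (source, target) links."""
    return {(s, t) for s, ts in mapping.items() for t in ts}


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Score predicted links against a gold standard of correct links."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data; the study's gold standard had 514 distinct matches.
gold = {("hdg:1", "sct:A"), ("hdg:2", "sct:B"),
        ("hdg:3", "sct:C"), ("hdg:4", "sct:D")}

# Indirect pathway: source -> intermediate -> target (few but reliable links).
hdg_to_omim: Mapping = {"hdg:1": {"omim:10"}, "hdg:2": {"omim:20"}}
omim_to_sct: Mapping = {"omim:10": {"sct:A"}, "omim:20": {"sct:B"}}
indirect = pairs(compose(hdg_to_omim, omim_to_sct))

# Direct pathway: more links, some of them wrong.
direct = pairs({"hdg:1": {"sct:A"},
                "hdg:3": {"sct:C", "sct:X"},
                "hdg:4": {"sct:D", "sct:Y"}})

# Combined network: union of the links found by both pathways.
combined = indirect | direct

for name, links in [("indirect", indirect), ("direct", direct), ("combined", combined)]:
    p, r = precision_recall(links, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy setup the indirect pathway is precise but misses half of the gold standard, the direct pathway recovers more at lower precision, and the union reaches full recall. That mirrors the qualitative pattern reported in the abstract, not the study's actual numbers.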

Original language: English (US)
Title of host publication: Studies in Health Technology and Informatics
Pages: 555-559
Number of pages: 5
Volume: 107
DOI: https://doi.org/10.3233/978-1-60750-949-3-555
State: Published - 2004
Externally published: Yes

Fingerprint

Terminology
Systematized Nomenclature of Medicine
Databases
Genetic Databases
Unified Medical Language System
Synchronization
Research Personnel
Research

Keywords

  • databases
  • networks
  • system integration
  • terminologies

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Cite this

Wang, X., Quek, H. N., Cantor, M., Kra, P., Schultz, A., & Lussier, Y. A. (2004). Automated terminology networks for the integration of heterogeneous databases. In Studies in Health Technology and Informatics (Vol. 107, pp. 555-559) https://doi.org/10.3233/978-1-60750-949-3-555

@inbook{4cf89bbc1b13415c8531d7dbbb4227b6,
title = "Automated terminology networks for the integration of heterogeneous databases",
keywords = "databases, networks, system integration, terminologies",
author = "Xiaoyan Wang and Quek, {Hui Nar} and Michael Cantor and Pauline Kra and Aylit Schultz and Lussier, {Yves A}",
year = "2004",
doi = "10.3233/978-1-60750-949-3-555",
language = "English (US)",
volume = "107",
pages = "555--559",
booktitle = "Studies in Health Technology and Informatics",

}

TY - CHAP

T1 - Automated terminology networks for the integration of heterogeneous databases

AU - Wang, Xiaoyan

AU - Quek, Hui Nar

AU - Cantor, Michael

AU - Kra, Pauline

AU - Schultz, Aylit

AU - Lussier, Yves A

PY - 2004

Y1 - 2004

KW - databases

KW - networks

KW - system integration

KW - terminologies

UR - http://www.scopus.com/inward/record.url?scp=84863245151&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863245151&partnerID=8YFLogxK

U2 - 10.3233/978-1-60750-949-3-555

DO - 10.3233/978-1-60750-949-3-555

M3 - Chapter

VL - 107

SP - 555

EP - 559

BT - Studies in Health Technology and Informatics

ER -