Integration of neuroimaging and microarray datasets through mapping and model-theoretic semantic decomposition of unstructured phenotypes

Spiro P. Pantazatos, Jianrong Li, Paul Pavlidis, Yves A. Lussier

Research output: Contribution to journal, Article

Abstract

An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as "List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes". Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets.
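The kind of query quoted in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the concept names (`region_X`, `disorder_A`, and so on), the database labels, and the function names are hypothetical placeholders, not SNOMED CT content, not the PhenOS implementation, and not data from the paper. It shows only the general idea of combining a "finding site" attribute of a disorder with is-a hierarchy traversal to retrieve records coded in different databases.

```python
# Hypothetical toy hierarchy: child -> parent ("is-a") links between anatomical concepts.
IS_A = {
    "region_X_subfield": "region_X",
    "region_X": "brain",
    "region_Y": "brain",
}

# Hypothetical disorder definitions, each decomposed into a "finding site" attribute.
FINDING_SITE = {
    "disorder_A": "region_X",
    "disorder_B": "region_X_subfield",
    "disorder_C": "region_Y",
}

# Hypothetical records from participating databases, each already coded to one concept.
RECORDS = [
    ("imaging_db", "study_1", "disorder_A"),
    ("expression_db", "series_7", "region_X_subfield"),
    ("atlas_db", "plate_3", "region_Y"),
]


def ancestors_or_self(concept):
    """Yield the concept itself, then every ancestor reached through is-a links."""
    while concept is not None:
        yield concept
        concept = IS_A.get(concept)


def disorders_with_finding_site(region):
    """Disorders whose finding site is the region or one of its descendants."""
    return sorted(
        disorder
        for disorder, site in FINDING_SITE.items()
        if region in ancestors_or_self(site)
    )


def related_records(region):
    """Records coded to a matching disorder, or to the region or a descendant of it."""
    disorders = set(disorders_with_finding_site(region))
    return [
        (database, record_id)
        for database, record_id, concept in RECORDS
        if concept in disorders or region in ancestors_or_self(concept)
    ]


if __name__ == "__main__":
    print(disorders_with_finding_site("region_X"))
    print(related_records("region_X"))
```

In this sketch a record matches either through the disorder it is coded to or through its anatomical concept, which mirrors the abstract's "ontological model of the disease or its anatomical and morphological attributes" only loosely; morphological attributes are left out for brevity.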

Original language: English (US)
Pages (from-to): 75-94
Number of pages: 20
Journal: Cancer Informatics
Volume: 8
State: Published - 2009
Externally published: Yes

Fingerprint

Semantics
Neuroimaging
Theoretical Models
Phenotype
Natural Language Processing
Systematized Nomenclature of Medicine
Databases
Atlases
Brain
Neurosciences
Datasets

Keywords

  • Computational ontologies
  • Database interoperability
  • Mediated schema
  • Phenotypes
  • SNOMED

ASJC Scopus subject areas

  • Cancer Research
  • Oncology

Cite this

Integration of neuroimaging and microarray datasets through mapping and model-theoretic semantic decomposition of unstructured phenotypes. / Pantazatos, Spiro P.; Li, Jianrong; Pavlidis, Paul; Lussier, Yves A.

In: Cancer Informatics, Vol. 8, 2009, p. 75-94.


@article{0ff41715819f41a39cfa83e64df4edeb,
title = "Integration of neuroimaging and microarray datasets through mapping and model-theoretic semantic decomposition of unstructured phenotypes",
abstract = "An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT{\circledR}). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as {"}List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes{"}. Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88{\%} (n = 50), and precision of the semantic mapping between these terms across datasets was 98{\%} (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets.",
keywords = "Computational ontologies, Database interoperability, Mediated schema, Phenotypes, SNOMED",
author = "Pantazatos, {Spiro P.} and Jianrong Li and Paul Pavlidis and Lussier, {Yves A}",
year = "2009",
language = "English (US)",
volume = "8",
pages = "75--94",
journal = "Cancer Informatics",
issn = "1176-9351",
publisher = "Libertas Academica Ltd."
}

TY - JOUR

T1 - Integration of neuroimaging and microarray datasets through mapping and model-theoretic semantic decomposition of unstructured phenotypes

AU - Pantazatos, Spiro P.

AU - Li, Jianrong

AU - Pavlidis, Paul

AU - Lussier, Yves A

PY - 2009

Y1 - 2009

N2 - An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as "List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes". Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets.


KW - Computational ontologies

KW - Database interoperability

KW - Mediated schema

KW - Phenotypes

KW - SNOMED

UR - http://www.scopus.com/inward/record.url?scp=77649249572&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77649249572&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:77649249572

VL - 8

SP - 75

EP - 94

JO - Cancer Informatics

JF - Cancer Informatics

SN - 1176-9351

ER -