PhenoGO

Assigning phenotypic context to gene ontology annotations with natural language processing

Yves A Lussier, Tara Borlawsky, Daniel Rappaport, Yang Liu, Carol Friedman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

39 Citations (Scopus)

Abstract

Natural language processing (NLP) is a high-throughput technology because it can process vast quantities of text within a reasonable time period. It has the potential to substantially facilitate biomedical research by extracting, linking, and organizing massive amounts of information that occur in biomedical journal articles as well as in textual fields of biological databases. Until recently, much of the work in biological NLP and text mining has revolved around recognizing the occurrence of biomolecular entities in articles and extracting particular relationships among the entities. Now, researchers have recognized a need to link the extracted information to ontologies or knowledge bases, which is a more difficult task. One such knowledge base is Gene Ontology annotations (GOA), which significantly increases semantic computations over the function, cellular components, and processes of genes. For multicellular organisms, these annotations can be refined with phenotypic context, such as the cell type, tissue, and organ, because establishing the phenotypic contexts in which a gene is expressed is a crucial step for understanding the development and the molecular underpinning of the pathophysiology of diseases. In this paper, we propose a system, PhenoGO, which automatically augments annotations in GOA with additional context. PhenoGO utilizes an existing NLP system, called BioMedLEE, and an existing knowledge-based phenotype organizer system (PhenOS), in conjunction with MeSH indexing and established biomedical ontologies. More specifically, PhenoGO adds phenotypic contextual information to existing associations between gene products and GO terms as specified in GOA. The system also maps the context to identifiers that are associated with different biomedical ontologies, including the UMLS, Cell Ontology, Mouse Anatomy, NCBI taxonomy, GO, and Mammalian Phenotype Ontology.
In addition, PhenoGO was evaluated for coding of anatomical and cellular information and for assigning the coded phenotypes to the correct GOA; the results show that PhenoGO has a precision of 91% and a recall of 92%, demonstrating that the PhenoGO NLP system can accurately encode a large number of anatomical and cellular ontologies to GO annotations. The PhenoGO Database may be accessed at the following URL: http://www.phenoGO.org.
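
The precision and recall figures quoted in the abstract follow the standard definitions over true positives, false positives, and false negatives. The short Python sketch below is illustrative only. The counts passed to `precision_recall` are hypothetical and are not taken from the paper's evaluation, and the F1 value is derived here from the reported 91% and 92% rather than reported by the authors.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("need at least one predicted and one reference item")
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r = precision_recall(tp=90, fp=10, fn=30)

# F1 implied by the abstract's figures (derived here, not stated in the source).
reported_f1 = f1_score(0.91, 0.92)
```

Precision here is the fraction of assigned codes that are correct, and recall is the fraction of correct codes that the system actually assigned.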

Original language: English (US)
Title of host publication: Proceedings of the Pacific Symposium on Biocomputing 2006, PSB 2006
Pages: 64-75
Number of pages: 12
State: Published - 2006
Externally published: Yes
Event: 11th Pacific Symposium on Biocomputing 2006, PSB 2006 - Maui, HI, United States
Duration: Jan 3 2006 → Jan 7 2006

Other

Other: 11th Pacific Symposium on Biocomputing 2006, PSB 2006
Country: United States
City: Maui, HI
Period: 1/3/06 → 1/7/06

Fingerprint

Natural Language Processing
Molecular Sequence Annotation
Gene Ontology
Ontology
Biological Ontologies
Genes
Knowledge Bases
Processing
Phenotype
Unified Medical Language System
Databases
Gene Components
Natural language processing systems
Data Mining
Semantics
Biomedical Research
Anatomy
Research Personnel
Technology
Taxonomies

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Biomedical Engineering
  • Medicine(all)

Cite this

Lussier, Y. A., Borlawsky, T., Rappaport, D., Liu, Y., & Friedman, C. (2006). PhenoGO: Assigning phenotypic context to gene ontology annotations with natural language processing. In Proceedings of the Pacific Symposium on Biocomputing 2006, PSB 2006 (pp. 64-75).

@inproceedings{7f09f31478e3440291c853df59a56313,
title = "PhenoGO: Assigning phenotypic context to gene ontology annotations with natural language processing",
author = "Lussier, {Yves A} and Tara Borlawsky and Daniel Rappaport and Yang Liu and Carol Friedman",
year = "2006",
language = "English (US)",
isbn = "9812564632",
pages = "64--75",
booktitle = "Proceedings of the Pacific Symposium on Biocomputing 2006, PSB 2006",

}

TY - GEN

T1 - PhenoGO

T2 - Assigning phenotypic context to gene ontology annotations with natural language processing

AU - Lussier, Yves A

AU - Borlawsky, Tara

AU - Rappaport, Daniel

AU - Liu, Yang

AU - Friedman, Carol

PY - 2006

Y1 - 2006

UR - http://www.scopus.com/inward/record.url?scp=39049187314&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=39049187314&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9812564632

SN - 9789812564634

SP - 64

EP - 75

BT - Proceedings of the Pacific Symposium on Biocomputing 2006, PSB 2006

ER -