Comparative biological studies have led to remarkable biomedical discoveries. While genomic science and technologies are advancing rapidly, our ability to precisely specify a phenotype and compare it to related phenotypes of other organisms remains challenging. This study has examined the systematic use of terminology and knowledge based technologies to enable high-throughput comparative phenomics. More specifically, we measured the accuracy of a multi-strategy automated classification method to bridge the phenotype gap between a phenotypic terminology (MGD: Phenoslim) and a broad-coverage clinical terminology (SNOMED CT). Furthermore, we qualitatively evaluate the additional emerging properties of the combined terminological network for comparative biology and discovery science. According to the gold standard (n = 100), the accuracies (precision / recall) of the composite automated methods were 67% / 97% (mapping for identical concepts) and 85% / 98% (classification). Quantitatively, only 2% of the phenotypic concepts were missing from the clinical terminology, however, qualitatively the gap was larger: conceptual scope, granularity and subtle yet significant, homonymy problems were observed. These results suggest that, as observed in other domains, additional strategies are required for combining terminologies.
|Original language||English (US)|
|Title of host publication||Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing|
|Number of pages||12|
|Publication status||Published - 2004|