Building the “Plant glossary”—a controlled botanical vocabulary using terms extracted from the floras of North America and China

Lorena Endara, Heather A. Cole, J. Gordon Burleigh, Nathalie S. Nagalingum, James A. Macklin, Jing Liu, Sonali Ranade, Hong Cui

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Taxonomic descriptions contain valuable phenotypic data that is often not directly accessible for modern evolutionary, ecological, or biodiversity analyses. We describe a process for building a consensus-based controlled vocabulary from taxonomic descriptions for plants, which can also be applied to build controlled vocabularies for other taxon groups. Controlled vocabularies are useful as lexicons for text mining algorithms, as a source of candidate terms for ontologies, and as guides to help future authors use domain vocabulary more appropriately and consistently. We extracted phenotype-describing terms from descriptions in 30 volumes of the Flora of North America and Flora of China and merged these with terms from the Categorical Glossary of the Flora of North America. Seven contributors placed the terms into a set of categories until there was agreement among two or more categorizations per term. Term categorization makes the meaning of a term more explicit for subsequent users of the glossary. The resulting “Plant Glossary” (terms and categorization of terms) contains 9228 terms grouped in 53 categories. Differences in term categorization represented 49% of the categorization effort, and the many differences among individual classifications can be attributed to individual interpretation of terms and to the fluid nature of descriptive language used in Floras. The difficulties experienced while classifying the terms allowed us to explore cases where the use of language can hinder the accurate and detailed annotation of taxonomic descriptions. The Plant Glossary represents a significant step towards creating and enriching formal ontologies for plant phenotypes, as the semantic phenomena found through this exercise are useful background information for building ontologies. The glossary has been used by new software to parse and annotate plant taxonomic descriptions, and over 6000 new terms are available for creating ontologies.
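The consensus rule described in the abstract (a term is settled once two or more categorizations agree) can be illustrated with a minimal sketch. This is not the authors' software: the function names, the sample terms, and the category labels below are hypothetical and chosen only for illustration, and the threshold of two is taken from the abstract's wording.

```python
from collections import Counter


def agreed_categories(votes, min_agreement=2):
    """Return the categories chosen by at least `min_agreement` contributors."""
    counts = Counter(votes)
    return sorted(cat for cat, n in counts.items() if n >= min_agreement)


def split_by_consensus(term_votes, min_agreement=2):
    """Separate terms that reached consensus from terms that still need review."""
    resolved, unresolved = {}, []
    for term, votes in term_votes.items():
        agreed = agreed_categories(votes, min_agreement)
        if agreed:
            resolved[term] = agreed
        else:
            unresolved.append(term)
    return resolved, unresolved


# Hypothetical categorizations: one list entry per contributor's choice.
term_votes = {
    "ovate": ["shape", "shape", "architecture"],
    "glabrous": ["pubescence", "texture"],
    "reddish": ["coloration", "coloration"],
}

resolved, unresolved = split_by_consensus(term_votes)
print(resolved)
print(unresolved)
```

In this sketch a term with no category reaching the threshold stays in the unresolved list, which would correspond to sending it back to contributors for another round of categorization.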

Original language: English (US)
Pages (from-to): 953-966
Number of pages: 14
Journal: Taxon
Volume: 66
Issue number: 4
DOI: https://doi.org/10.12705/664.9
State: Published - Jan 1 2017

Keywords

  • Controlled vocabulary
  • Phenotypic traits
  • Plant glossary
  • Semantics
  • Taxonomic descriptions

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Plant Science

Cite this

Building the “Plant glossary”—a controlled botanical vocabulary using terms extracted from the floras of North America and China. / Endara, Lorena; Cole, Heather A.; Gordon Burleigh, J.; Nagalingum, Nathalie S.; Macklin, James A.; Liu, Jing; Ranade, Sonali; Cui, Hong.

In: Taxon, Vol. 66, No. 4, 01.01.2017, p. 953-966.

@article{c9889e0bb4c34bed8f0138ef79ce70fd,
title = "Building the “Plant glossary”—a controlled botanical vocabulary using terms extracted from the floras of North America and China",
keywords = "Controlled vocabulary, Phenotypic traits, Plant glossary, Semantics, Taxonomic descriptions",
author = "Lorena Endara and Cole, {Heather A.} and {Gordon Burleigh}, J. and Nagalingum, {Nathalie S.} and Macklin, {James A.} and Jing Liu and Sonali Ranade and Hong Cui",
year = "2017",
month = "1",
day = "1",
doi = "10.12705/664.9",
language = "English (US)",
volume = "66",
pages = "953--966",
journal = "Taxon",
issn = "0040-0262",
publisher = "International Association for Plant Taxonomy",
number = "4",

}
