Unsupervised extraction of text segments from heterogeneous document collections

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This paper describes a simple, unsupervised bootstrapping procedure that identifies morphological description segments in heterogeneous biodiversity document collections. In our project, the procedure preprocesses biodiversity literature for semantic annotation of morphological descriptions; it can also be used to crawl the Web for morphological descriptions to feed a niche search engine for biodiversity.

Original language: English (US)
Journal: Proceedings of the ASIST Annual Meeting
Volume: 47
State: Published - Nov 1 2010

Keywords

  • Biodiversity document collections
  • Morphological description
  • Segment information retrieval
  • Semantic annotation
  • Unsupervised machine learning

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
