Using sentence-selection heuristics to rank text segments in TXTRACTOR

Daniel McDonald, Hsinchun Chen

Research output: Contribution to conference (paper)

35 Scopus citations

Abstract

TXTRACTOR is a tool that uses established sentence-selection heuristics to rank text segments, producing summaries that contain a user-defined number of sentences. Text segments are identified in order to maximize topic diversity, an adaptation of the Maximal Marginal Relevance criterion used by Carbonell and Goldstein [5]. Sentence-selection heuristics are then used to rank the segments. We hypothesize that ranking text segments via traditional sentence-selection heuristics produces a balanced summary with more useful information than one produced by segmentation alone. The summary is created in a three-step process: 1) sentence evaluation, 2) segment identification, and 3) segment ranking. As the required length of the summary changes, low-ranking segments can be dropped from (or higher-ranking segments added to) the summary. To validate the approach, we compared the output of TXTRACTOR to the output of a segmentation tool based on the TextTiling algorithm.
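
The three-step process described in the abstract can be illustrated with a minimal sketch. This is not TXTRACTOR's implementation: the abstract does not name the specific heuristics, so the scoring used here (average term frequency plus a bonus for the first sentence), the word-overlap rule for segment boundaries, and all function names and the threshold value are assumptions chosen only to make the idea concrete.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "that", "it", "for", "on", "with", "as", "by", "this"}

def tokenize(sentence):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]

def score_sentences(sentences):
    """Step 1 (sentence evaluation): assumed heuristics, namely the
    average document-wide term frequency plus a first-sentence bonus."""
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for t in tokens for w in t)
    scores = []
    for i, t in enumerate(tokens):
        tf = sum(freq[w] for w in t) / len(t) if t else 0.0
        position_bonus = 1.0 if i == 0 else 0.0
        scores.append(tf + position_bonus)
    return scores

def identify_segments(sentences, threshold=0.1):
    """Step 2 (segment identification): start a new segment wherever
    adjacent sentences share few words (Jaccard overlap below threshold).
    This is a crude stand-in for a real segmentation algorithm."""
    segments = [[0]] if sentences else []
    for i in range(1, len(sentences)):
        a, b = set(tokenize(sentences[i - 1])), set(tokenize(sentences[i]))
        union = a | b
        overlap = len(a & b) / len(union) if union else 0.0
        if overlap < threshold:
            segments.append([i])
        else:
            segments[-1].append(i)
    return segments

def rank_segments(segments, scores):
    """Step 3 (segment ranking): order segments by their best sentence."""
    return sorted(segments, key=lambda seg: max(scores[i] for i in seg),
                  reverse=True)

def summarize(sentences, n):
    """Pick n sentences, cycling over ranked segments so that each topic
    is represented before any segment contributes a second sentence."""
    scores = score_sentences(sentences)
    ranked = rank_segments(identify_segments(sentences), scores)
    queues = [sorted(seg, key=lambda i: scores[i], reverse=True)
              for seg in ranked]
    chosen = []
    while len(chosen) < n and any(queues):
        for q in queues:
            if q and len(chosen) < n:
                chosen.append(q.pop(0))
    # Restore original document order for readability.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, n)` with a larger or smaller `n` adds or drops sentences from lower-ranked segments first, which mirrors the behavior the abstract describes when the required summary length changes.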

Original language: English (US)
Pages: 28-35
Number of pages: 8
DOIs: https://doi.org/10.1145/544220.544226
State: Published - 2002
Event: Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries - Portland, OR, United States
Duration: Jul 14 2002 to Jul 18 2002

Other

Other: Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries
Country: United States
City: Portland, OR
Period: 7/14/02 to 7/18/02

Keywords

  • Information retrieval
  • Text extraction
  • Text segmentation
  • Text summarization

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

Cite this

McDonald, D., & Chen, H. (2002). Using sentence-selection heuristics to rank text segments in TXTRACTOR. 28-35. Paper presented at Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, Portland, OR, United States. https://doi.org/10.1145/544220.544226