Word segmentation as general chunking

Daniel Hewlett, Paul Cohen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

During language acquisition, children learn to segment speech into phonemes, syllables, morphemes, and words. We examine word segmentation specifically, and explore the possibility that children might use general-purpose chunking mechanisms to perform word segmentation. The Voting Experts (VE) and Bootstrapped Voting Experts (BVE) algorithms serve as computational models of this chunking ability. VE finds chunks by searching for a particular information-theoretic signature: low internal entropy and high boundary entropy. BVE adds to VE the ability to incorporate information about word boundaries previously found by the algorithm into future segmentations. We evaluate the general chunking model on phonemically encoded corpora of child-directed speech, and show that it is consistent with empirical results in the developmental literature. We argue that it offers a parsimonious alternative to special-purpose linguistic models.
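
The entropy signature mentioned in the abstract can be illustrated with a small sketch. This is not the authors' VE implementation: it only estimates, for each n-gram in an unsegmented stream, the entropy of the symbol that follows it. The function name `boundary_entropy` and the toy stream are assumptions made for illustration, and the data is not drawn from the paper's corpora.

```python
import math
from collections import Counter, defaultdict


def boundary_entropy(text, n):
    """Map each n-gram in `text` to the entropy (in bits) of the next symbol."""
    followers = defaultdict(Counter)
    for i in range(len(text) - n):
        # Count which symbol follows each occurrence of the n-gram.
        followers[text[i:i + n]][text[i + n]] += 1
    entropy = {}
    for gram, counts in followers.items():
        total = sum(counts.values())
        # Shannon entropy of the empirical next-symbol distribution.
        entropy[gram] = sum(
            (c / total) * math.log2(total / c) for c in counts.values()
        )
    return entropy


# Toy unsegmented stream (hypothetical): "the dog the cat the pig the dog the cow".
stream = "thedogthecatthepigthedogthecow"

# "th" sits inside a word, so the next symbol is fully predictable.
inside = boundary_entropy(stream, 2)["th"]
# "the" ends a word, so the next symbol varies across occurrences.
at_end = boundary_entropy(stream, 3)["the"]
```

In this toy stream `inside` is zero while `at_end` is well above one bit, which is the contrast a chunker looking for low internal entropy and high boundary entropy would exploit.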

Original language: English (US)
Title of host publication: CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
Pages: 39-47
Number of pages: 9
State: Published - Dec 1 2011
Event: 15th Conference on Computational Natural Language Learning, CoNLL 2011 - Portland, OR, United States
Duration: Jun 23 2011 - Jun 24 2011

ASJC Scopus subject areas

  • Artificial Intelligence
  • Linguistics and Language
  • Human-Computer Interaction

Cite this

Hewlett, D., & Cohen, P. (2011). Word segmentation as general chunking. In CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference (pp. 39-47).