Spinning straw into gold: Using free text to train monolingual alignment models for non-factoid question answering

Rebecca Sharp, Peter Jansen, Mihai Surdeanu, Peter Clark

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Monolingual alignment models have been shown to boost the performance of question answering systems by "bridging the lexical chasm" between questions and answers. The main limitation of these approaches is that they require semi-structured training data in the form of question-answer pairs, which is difficult to obtain in specialized domains or low-resource languages. We propose two inexpensive methods for training alignment models solely using free text, by generating artificial question-answer pairs from discourse structures. Our approach is driven by two representations of discourse: a shallow sequential representation, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and one from the biology domain, and two types of non-factoid questions: manner and reason. We show that these alignment models trained directly from discourse structures imposed on free text improve performance considerably over an information retrieval baseline and a neural network language model trained on the same data.
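The shallow sequential representation the abstract mentions can be illustrated with a minimal sketch: pair each sentence of a passage with its successor, treating the earlier sentence as the "question" side and the later one as the "answer" side, so an alignment model can learn question-to-answer word associations from plain prose. The function names and the exact pairing heuristic below are illustrative assumptions, not the authors' procedure.

```python
# Hypothetical sketch: generate artificial question-answer pairs from free
# text using a shallow sequential discourse heuristic. A real system would
# use a proper sentence splitter and discourse parser.

def sentences(text):
    """Naive period-based sentence splitter."""
    return [s.strip() for s in text.split(".") if s.strip()]

def artificial_qa_pairs(text):
    """Pair each sentence with its successor: the earlier sentence plays
    the 'question' role and the later one the 'answer' role."""
    sents = sentences(text)
    return [(sents[i], sents[i + 1]) for i in range(len(sents) - 1)]

passage = ("Plants absorb water through their roots. "
           "This water travels up the stem to the leaves. "
           "In the leaves it is used for photosynthesis.")

for q, a in artificial_qa_pairs(passage):
    print(q, "->", a)
```

Pairs like these can then feed a standard statistical alignment model (e.g., an IBM Model 1 style translation table) in place of genuine question-answer pairs.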

Original language: English (US)
Title of host publication: NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 231-237
Number of pages: 7
ISBN (Print): 9781941643495
State: Published - 2015
Event: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 - Denver, United States
Duration: May 31, 2015 - Jun 5, 2015



ASJC Scopus subject areas

  • Computer Science Applications
  • Language and Linguistics
  • Linguistics and Language

Cite this

Sharp, R., Jansen, P., Surdeanu, M., & Clark, P. (2015). Spinning straw into gold: Using free text to train monolingual alignment models for non-factoid question answering. In NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 231-237). Association for Computational Linguistics (ACL).

