Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading

Hoang Van, Vikas Yadav, Mihai Surdeanu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a simple and effective data augmentation strategy for low-resource machine reading comprehension (MRC). Our approach first pretrains the answer extraction components of an MRC system on augmented data that contains the approximate context of the correct answers, before training them on the exact answer spans. The approximate context helps the QA components narrow down the location of the answers. We demonstrate that our simple strategy substantially improves both document retrieval and answer extraction performance by providing larger context of the answers and additional training data. In particular, our method significantly improves the performance of a BERT-based retriever (15.12%) and answer extractor (4.33% F1) on TechQA, a complex, low-resource MRC task. Further, our data augmentation strategy yields significant improvements of up to 3.9% exact match (EM) and 2.7% F1 for answer extraction on PolicyQA, another practical but moderate-sized QA dataset that also contains long answer spans.
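The abstract does not spell out how the approximate context is constructed. As an illustration only, the sketch below assumes one plausible reading: each gold answer span is widened by a few tokens on either side, so that every augmented span still contains the exact answer. The names `Span`, `approximate_spans`, and `max_shift` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    start: int  # inclusive token index
    end: int    # exclusive token index


def approximate_spans(gold: Span, n_tokens: int, max_shift: int) -> List[Span]:
    """Return widened variants of `gold`, clipped to the passage length.

    Assumption (not from the paper): "approximate context" is modeled as
    moving the span boundaries outward by up to `max_shift` tokens.
    """
    seen = set()
    result = []
    for left in range(max_shift + 1):
        for right in range(max_shift + 1):
            candidate = Span(max(0, gold.start - left),
                             min(n_tokens, gold.end + right))
            # Skip the exact span (kept for the second training stage)
            # and any duplicates produced by clipping.
            if candidate == gold or candidate in seen:
                continue
            seen.add(candidate)
            result.append(candidate)
    return result
```

Under this assumption, a two-stage schedule would first train the answer extractor on the output of `approximate_spans` for every training example, then continue training on the exact gold spans alone.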

Original language: English (US)
Title of host publication: SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 2116-2120
Number of pages: 5
ISBN (Electronic): 9781450380379
DOIs
State: Published - Jul 11 2021
Event: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021 - Virtual, Online, Canada
Duration: Jul 11 2021 to Jul 15 2021

Publication series

Name: SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021
Country/Territory: Canada
City: Virtual, Online
Period: 7/11/21 to 7/15/21

Keywords

  • data augmentation
  • document retrieval
  • question answering

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design
  • Information Systems
