Diamonds in the rough: Event extraction from imperfect microblog data

Ander Intxaurrondo, Eneko Agirre, Oier Lopez De Lacalle, Mihai Surdeanu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Scopus citations

Abstract

We introduce a distantly supervised event extraction approach that extracts complex event templates from microblogs. We show that this near real-time data source is more challenging than news because it contains information that is both approximate (e.g., with values that are close to, but different from, the gold truth) and ambiguous (due to the brevity of the texts), which affects both the evaluation and the extraction methods. For evaluation, we propose a novel "soft" F1 metric that incorporates the similarity between extracted fillers and the gold truth, giving partial credit to different but similar values. For extraction, we propose two extensions to the distant supervision paradigm: to address approximate information, we allow positive training examples to be generated from information that is similar but not identical to gold values; to address ambiguity, we aggregate contexts across tweets discussing the same event. We evaluate our contributions on the complex domain of earthquakes, where events have up to 20 arguments. Our results indicate that, despite their simplicity, our contributions yield a statistically significant improvement of 33% (relative) over a strong distantly supervised system. The dataset containing the knowledge base, relevant tweets and manual annotations is publicly available.
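
The abstract describes the "soft" F1 metric only in words, so the following is a minimal sketch of the general idea, not the paper's actual formula. It assumes numeric slot fillers and a hypothetical `similarity` function based on relative error; the slot names (`magnitude`, `depth_km`, `deaths`) and the helper `soft_f1` are likewise illustrative assumptions.

```python
from typing import Dict


def similarity(pred: float, gold: float) -> float:
    """Hypothetical similarity in [0, 1] (an assumption, not the paper's definition).

    Returns 1.0 for an exact match and decays linearly with relative error.
    """
    if pred == gold:
        return 1.0
    denom = max(abs(pred), abs(gold))  # non-zero here because pred != gold
    return max(0.0, 1.0 - abs(pred - gold) / denom)


def soft_f1(predicted: Dict[str, float], gold: Dict[str, float]) -> float:
    """F1 in which each extracted filler earns partial credit by similarity."""
    # Sum the partial credit over slots present in both templates.
    credit = sum(similarity(predicted[s], gold[s]) for s in predicted if s in gold)
    precision = credit / len(predicted) if predicted else 0.0
    recall = credit / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Illustrative earthquake template: one exact filler, one close filler, one missing slot.
gold_event = {"magnitude": 6.0, "depth_km": 10.0, "deaths": 100.0}
extracted = {"magnitude": 6.0, "depth_km": 8.0}
score = soft_f1(extracted, gold_event)
```

Under these assumptions the close-but-wrong depth earns 0.8 credit instead of 0, giving a score of about 0.72, where a strict exact-match F1 would give 0.4.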

Original language: English (US)
Title of host publication: NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 641-650
Number of pages: 10
ISBN (Print): 9781941643495
State: Published - 2015
Event: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 - Denver, United States
Duration: May 31 2015 to Jun 5 2015

Other

Other: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Country: United States
City: Denver
Period: 5/31/15 to 6/5/15

ASJC Scopus subject areas

  • Computer Science Applications
  • Language and Linguistics
  • Linguistics and Language

Cite this

    Intxaurrondo, A., Agirre, E., De Lacalle, O. L., & Surdeanu, M. (2015). Diamonds in the rough: Event extraction from imperfect microblog data. In NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 641-650). Association for Computational Linguistics (ACL).