Event extraction using distant supervision

Kevin Reschke, Martin Jankowiak, Mihai Surdeanu, Christopher D. Manning, Daniel Jurafsky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

Distant supervision is a successful paradigm that gathers training data for information extraction systems by automatically aligning vast databases of facts with text. Previous work has demonstrated its usefulness for the extraction of binary relations such as a person's employer or a film's director. Here, we extend the distant supervision approach to template-based event extraction, focusing on the extraction of passenger counts, aircraft types, and other facts concerning airplane crash events. We present a new publicly available dataset and event extraction task in the plane crash domain based on Wikipedia infoboxes and newswire text. Using this dataset, we conduct a preliminary evaluation of four distantly supervised extraction models which assign named entity mentions in text to entries in the event template. Our results indicate that joint inference over sequences of candidate entity mentions is beneficial. Furthermore, we demonstrate that the SEARN algorithm outperforms a linear-chain CRF and strong baselines with local inference.
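The alignment step the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the simplest possible heuristic, in which a candidate mention is labeled with a template slot when its text exactly matches that slot's known value (after case and whitespace normalization), and is labeled NIL otherwise. The function name `label_mentions`, the slot names, and all values below are hypothetical and made up for illustration.

```python
from typing import Dict, List, Tuple

NIL = "NIL"  # label for mentions that match no known slot value


def label_mentions(template: Dict[str, str], mentions: List[str]) -> List[Tuple[str, str]]:
    """Label each candidate mention with the slot whose known value it matches.

    Matching is exact after lowercasing and stripping whitespace; this is an
    assumed heuristic for illustration, not the method from the paper.
    """
    # Invert the template so a normalized value maps back to its slot name.
    value_to_slot = {value.strip().lower(): slot for slot, value in template.items()}
    return [(m, value_to_slot.get(m.strip().lower(), NIL)) for m in mentions]


# Hypothetical infobox-style record for one crash event (invented values).
template = {
    "aircraft_type": "Boeing 737",
    "passengers": "120",
    "site": "Springfield",
}

# Candidate mentions, as a named entity recognizer might return them
# for one news sentence about the same event (invented values).
mentions = ["Boeing 737", "120", "Tuesday", "springfield"]

labeled = label_mentions(template, mentions)
for mention, slot in labeled:
    print(f"{mention!r} -> {slot}")
```

Labels produced this way are noisy (a matching string is not guaranteed to fill that slot in context), which is one reason a distantly supervised model is trained on them rather than the heuristic being used directly as the extractor.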

Original language: English (US)
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Publisher: European Language Resources Association (ELRA)
Pages: 4527-4531
Number of pages: 5
ISBN (Electronic): 9782951740884
State: Published - Jan 1 2014
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: May 26 2014 to May 31 2014

Other

Other: 9th International Conference on Language Resources and Evaluation, LREC 2014
Country: Iceland
City: Reykjavik
Period: 5/26/14 to 5/31/14

Fingerprint

supervision
event
aircraft
Wikipedia
director
employer
candidacy
paradigm
human being
Supervision
evaluation
Inference
Template
Entity

Keywords

  • Distant-supervision
  • Event-extraction
  • Searn

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Education
  • Language and Linguistics

Cite this

Reschke, K., Jankowiak, M., Surdeanu, M., Manning, C. D., & Jurafsky, D. (2014). Event extraction using distant supervision. In Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 (pp. 4527-4531). European Language Resources Association (ELRA).

@inproceedings{424e4303b7c641f69e13f1c514707838,
title = "Event extraction using distant supervision",
abstract = "Distant supervision is a successful paradigm that gathers training data for information extraction systems by automatically aligning vast databases of facts with text. Previous work has demonstrated its usefulness for the extraction of binary relations such as a person's employer or a film's director. Here, we extend the distant supervision approach to template-based event extraction, focusing on the extraction of passenger counts, aircraft types, and other facts concerning airplane crash events. We present a new publicly available dataset and event extraction task in the plane crash domain based on Wikipedia infoboxes and newswire text. Using this dataset, we conduct a preliminary evaluation of four distantly supervised extraction models which assign named entity mentions in text to entries in the event template. Our results indicate that joint inference over sequences of candidate entity mentions is beneficial. Furthermore, we demonstrate that the SEARN algorithm outperforms a linear-chain CRF and strong baselines with local inference.",
keywords = "Distant-supervision, Event-extraction, Searn",
author = "Kevin Reschke and Martin Jankowiak and Mihai Surdeanu and Manning, {Christopher D.} and Daniel Jurafsky",
year = "2014",
month = "1",
day = "1",
language = "English (US)",
pages = "4527--4531",
booktitle = "Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014",
publisher = "European Language Resources Association (ELRA)"
}

TY - GEN

T1 - Event extraction using distant supervision

AU - Reschke, Kevin

AU - Jankowiak, Martin

AU - Surdeanu, Mihai

AU - Manning, Christopher D.

AU - Jurafsky, Daniel

PY - 2014/1/1

Y1 - 2014/1/1

AB - Distant supervision is a successful paradigm that gathers training data for information extraction systems by automatically aligning vast databases of facts with text. Previous work has demonstrated its usefulness for the extraction of binary relations such as a person's employer or a film's director. Here, we extend the distant supervision approach to template-based event extraction, focusing on the extraction of passenger counts, aircraft types, and other facts concerning airplane crash events. We present a new publicly available dataset and event extraction task in the plane crash domain based on Wikipedia infoboxes and newswire text. Using this dataset, we conduct a preliminary evaluation of four distantly supervised extraction models which assign named entity mentions in text to entries in the event template. Our results indicate that joint inference over sequences of candidate entity mentions is beneficial. Furthermore, we demonstrate that the SEARN algorithm outperforms a linear-chain CRF and strong baselines with local inference.

KW - Distant-supervision

KW - Event-extraction

KW - Searn

UR - http://www.scopus.com/inward/record.url?scp=85021736987&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85021736987&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85021736987

SP - 4527

EP - 4531

BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

PB - European Language Resources Association (ELRA)

ER -