A scaffolding approach to coreference resolution integrating statistical and rule-based models

Heeyoung Lee, Mihai Surdeanu, Dan Jurafsky

Research output: Article (peer-reviewed research)

Abstract

We describe a scaffolding approach to the task of coreference resolution that incrementally combines statistical classifiers, each designed for a particular mention type, with rule-based models (for sub-tasks well-matched to determinism). We motivate our design by an oracle-based analysis of errors in a rule-based coreference resolution system, showing that rule-based approaches are poorly suited to tasks that require a large lexical feature space, such as resolving pronominal and common-noun mentions. Our approach combines many advantages: it incrementally builds clusters integrating joint information about entities, uses rules for deterministic phenomena, and integrates rich lexical, syntactic, and semantic features with random forest classifiers well-suited to modeling the complex feature interactions that are known to characterize the coreference task. We demonstrate that all these decisions are important. The resulting system achieves 63.2 F1 on the CoNLL-2012 shared task dataset, outperforming the rule-based starting point by over seven F1 points. Similarly, our system outperforms an equivalent sieve-based approach that relies on logistic regression classifiers instead of random forests by over four F1 points. Lastly, we show that by changing the coreference resolution system from relying on constituent-based syntax to using dependency syntax, which can be generated in linear time, we achieve a runtime speedup of 550 per cent without considerable loss of accuracy.
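The abstract describes an ordered pipeline in which rule-based steps and classifiers specialised for particular mention types take turns merging mentions into entity clusters. The Python sketch that follows illustrates only that control flow, under stated assumptions: `Mention`, `Clusters`, `exact_match_sieve`, `make_classifier_sieve`, `toy_pronoun_score`, the mention-type labels, the 0.5 threshold, and the toy mentions are all hypothetical. In particular, `toy_pronoun_score` is a hand-written placeholder standing in for the random forest classifiers the paper trains on lexical, syntactic, and semantic features; no features or settings from the paper are reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical mention-type labels; the paper's own inventory may differ.
PROPER, NOMINAL, PRONOMINAL = "PROPER", "NOMINAL", "PRONOMINAL"


@dataclass(frozen=True)
class Mention:
    idx: int    # position in document order (assumed unique)
    text: str
    mtype: str  # one of PROPER, NOMINAL, PRONOMINAL


class Clusters:
    """Entity clusters built incrementally with a union-find structure."""

    def __init__(self, mentions: List[Mention]) -> None:
        self.parent: Dict[int, int] = {m.idx: m.idx for m in mentions}

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def merge(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            # Keep the earliest mention as the cluster representative.
            self.parent[max(ri, rj)] = min(ri, rj)

    def groups(self) -> List[List[int]]:
        out: Dict[int, List[int]] = {}
        for i in self.parent:
            out.setdefault(self.find(i), []).append(i)
        return sorted(out.values())


# A sieve decides whether mention m should join the cluster of antecedent a.
Sieve = Callable[[Mention, Mention, Clusters], bool]


def exact_match_sieve(m: Mention, a: Mention, clusters: Clusters) -> bool:
    """Deterministic rule (hypothetical): link identical proper names."""
    return (m.mtype == PROPER and a.mtype == PROPER
            and m.text.lower() == a.text.lower())


def make_classifier_sieve(mtype: str,
                          score: Callable[[Mention, Mention, Clusters], float],
                          threshold: float) -> Sieve:
    """Statistical sieve restricted to one mention type.

    `score` stands in for a trained classifier (the paper uses random
    forests); here it is any function returning a probability-like value.
    """
    def sieve(m: Mention, a: Mention, clusters: Clusters) -> bool:
        return m.mtype == mtype and score(m, a, clusters) >= threshold
    return sieve


def toy_pronoun_score(m: Mention, a: Mention, clusters: Clusters) -> float:
    # Placeholder scorer, NOT the paper's model: favour proper-name antecedents.
    return 0.9 if a.mtype == PROPER else 0.1


def run_sieves(mentions: List[Mention], sieves: List[Sieve]) -> Clusters:
    clusters = Clusters(mentions)
    for sieve in sieves:  # each sieve sees the clusters built by earlier ones
        for pos, m in enumerate(mentions):
            for a in reversed(mentions[:pos]):  # nearest antecedent first
                if clusters.find(a.idx) == clusters.find(m.idx):
                    continue
                if sieve(m, a, clusters):
                    clusters.merge(m.idx, a.idx)
                    break  # link to the first accepted antecedent only
    return clusters


# Toy usage: the rule-based sieve runs before the pronoun classifier sieve.
mentions = [Mention(0, "Obama", PROPER), Mention(1, "the senator", NOMINAL),
            Mention(2, "Obama", PROPER), Mention(3, "he", PRONOMINAL)]
sieves = [exact_match_sieve,
          make_classifier_sieve(PRONOMINAL, toy_pronoun_score, 0.5)]
print(run_sieves(mentions, sieves).groups())  # [[0, 2, 3], [1]]
```

Because the deterministic sieve runs first in this sketch, the pronoun sieve attaches *he* to a cluster that already holds both mentions of *Obama*. The sieve order, threshold, and scorer are illustrative choices, not the configuration reported in the paper.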

Language: English (US)
Pages: 1-30
Number of pages: 30
Journal: Natural Language Engineering
DOI: 10.1017/S1351324917000109
State: Accepted/In press - Mar 21 2017
Externally published: Yes

Fingerprint

Scaffolding
Coreference
Classifiers
Classifier
Syntax
Sieves
Syntactics
Logistics
Semantics
Syntactic Features
Determinism
Common Noun
Interaction
Constituent
Oracles
Pronominal
Logistic Regression
Semantic Features
Entity

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence

Cite this

A scaffolding approach to coreference resolution integrating statistical and rule-based models. / Lee, Heeyoung; Surdeanu, Mihai; Jurafsky, Dan.

In: Natural Language Engineering, 21.03.2017, p. 1-30.

@article{d0d84571aba2400abc514906aa5704fd,
title = "A scaffolding approach to coreference resolution integrating statistical and rule-based models",
author = "Heeyoung Lee and Mihai Surdeanu and Dan Jurafsky",
year = "2017",
month = mar,
doi = "10.1017/S1351324917000109",
pages = "1--30",
journal = "Natural Language Engineering",
issn = "1351-3249",
publisher = "Cambridge University Press",

}

TY - JOUR

T1 - A scaffolding approach to coreference resolution integrating statistical and rule-based models

AU - Lee, Heeyoung

AU - Surdeanu, Mihai

AU - Jurafsky, Dan

PY - 2017/3/21

Y1 - 2017/3/21

AB - We describe a scaffolding approach to the task of coreference resolution that incrementally combines statistical classifiers, each designed for a particular mention type, with rule-based models (for sub-tasks well-matched to determinism). We motivate our design by an oracle-based analysis of errors in a rule-based coreference resolution system, showing that rule-based approaches are poorly suited to tasks that require a large lexical feature space, such as resolving pronominal and common-noun mentions. Our approach combines many advantages: it incrementally builds clusters integrating joint information about entities, uses rules for deterministic phenomena, and integrates rich lexical, syntactic, and semantic features with random forest classifiers well-suited to modeling the complex feature interactions that are known to characterize the coreference task. We demonstrate that all these decisions are important. The resulting system achieves 63.2 F1 on the CoNLL-2012 shared task dataset, outperforming the rule-based starting point by over seven F1 points. Similarly, our system outperforms an equivalent sieve-based approach that relies on logistic regression classifiers instead of random forests by over four F1 points. Lastly, we show that by changing the coreference resolution system from relying on constituent-based syntax to using dependency syntax, which can be generated in linear time, we achieve a runtime speedup of 550 per cent without considerable loss of accuracy.

UR - http://www.scopus.com/inward/record.url?scp=85015651775&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015651775&partnerID=8YFLogxK

U2 - 10.1017/S1351324917000109

DO - 10.1017/S1351324917000109

M3 - Article

SP - 1

EP - 30

JO - Natural Language Engineering

T2 - Natural Language Engineering

JF - Natural Language Engineering

SN - 1351-3249

ER -