Scientific discovery as link prediction in influence and citation graphs

Fan Luo, Marco Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We introduce a machine learning approach for the identification of "white spaces" in scientific knowledge. Our approach addresses this task as link prediction over a graph that contains over 2M influence statements such as "CTCF activates FOXA1", which were automatically extracted using open-domain machine reading. We model this prediction task using graph-based features extracted from the above influence graph, as well as from a citation graph that captures scientific communities. We evaluated the proposed approach through backtesting. Although the data is heavily unbalanced (50 times more negative examples than positives), our approach predicts which influence links will be discovered in the "near future" with an F1 score of 27 points, and a mean average precision of 68%.
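As a rough illustration of the setup the abstract describes, the following minimal Python sketch backtests a link predictor on a toy graph: edges observed up to a cutoff year form the training graph, unlinked node pairs are ranked by a single neighborhood feature, and the ranking is scored against links that appear after the cutoff. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy edges and years, the node names A to E (standing in for entities such as CTCF or FOXA1), the choice of Jaccard similarity as the only feature, the undirected treatment of edges, and the 0.6 decision threshold. The paper itself uses a learned model over features from both an influence graph and a citation graph, and reports mean average precision, whereas this sketch computes average precision for one ranking.

```python
from itertools import combinations

# Hypothetical toy data: (node, node, year the link was first reported).
# Invented for illustration; not drawn from the paper's influence graph.
EDGES = [
    ("A", "B", 2010),
    ("A", "C", 2011),
    ("B", "D", 2011),
    ("C", "D", 2012),
    ("D", "E", 2012),
    ("B", "C", 2015),  # appears only after the cutoff
    ("B", "E", 2016),  # appears only after the cutoff
]


def split_by_year(edges, cutoff):
    """Backtesting split: links known by `cutoff` vs. links discovered later.

    Edges are treated as undirected here, which is a simplification.
    """
    past = {frozenset((a, b)) for a, b, year in edges if year <= cutoff}
    future = {frozenset((a, b)) for a, b, year in edges if year > cutoff} - past
    return past, future


def build_adjacency(past):
    """Map each node to the set of its neighbors in the past graph."""
    adj = {}
    for edge in past:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def jaccard(adj, a, b):
    """Shared neighbors divided by all neighbors of the two nodes."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0


def rank_candidates(past):
    """Score every unlinked pair and sort by descending score, then by name."""
    adj = build_adjacency(past)
    scored = []
    for a, b in combinations(sorted(adj), 2):
        pair = frozenset((a, b))
        if pair not in past:
            scored.append((jaccard(adj, a, b), pair))
    return sorted(scored, key=lambda item: (-item[0], sorted(item[1])))


def average_precision(ranked, positives):
    """Average of precision@k taken at the rank of each true future link."""
    hits, total = 0, 0.0
    for rank, (_, pair) in enumerate(ranked, start=1):
        if pair in positives:
            hits += 1
            total += hits / rank
    return total / len(positives) if positives else 0.0


def f1_at_threshold(ranked, positives, threshold):
    """F1 when every pair scoring at least `threshold` is predicted as a link."""
    predicted = {pair for score, pair in ranked if score >= threshold}
    tp = len(predicted & positives)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(positives) if positives else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


past, future = split_by_year(EDGES, cutoff=2012)
ranked = rank_candidates(past)
ap = average_precision(ranked, future)
f1 = f1_at_threshold(ranked, future, threshold=0.6)
print(f"candidates={len(ranked)} AP={ap:.3f} F1={f1:.3f}")
```

In this toy graph two of the five candidate pairs are true future links, a far milder imbalance than the 50:1 ratio the abstract reports. Under heavy imbalance a thresholded metric such as F1 tends to be low even when the ranking is useful, which is one reason to report a ranking metric alongside it.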

Original language: English (US)
Title of host publication: NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies - Proceedings of the Student Research Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781948087261
State: Published - Jan 1 2018
Event: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018 - Student Research Workshop, SRW 2018 - New Orleans, United States
Duration: Jun 2 2018 → Jun 4 2018

Publication series

Name: NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Student Research Workshop

Conference

Conference: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018 - Student Research Workshop, SRW 2018
Country: United States
City: New Orleans
Period: 6/2/18 → 6/4/18

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Computer Science Applications

  • Cite this

    Luo, F., Valenzuela-Escárcega, M., Hahn-Powell, G., & Surdeanu, M. (2018). Scientific discovery as link prediction in influence and citation graphs. In NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Student Research Workshop (pp. 1-6). (NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Student Research Workshop). Association for Computational Linguistics (ACL).