Integrating deep learning approaches for identifying news reprint relation

Yin Luo, Fangfang Wang, Jun Chen, Lei Wang, Daniel Dajun Zeng

Research output: Contribution to journal › Article

1 Scopus citations

Abstract

With the rapid development of big data and new media technologies, a large amount of original news is generated and reprinted on the Internet via news portals. Identifying news reprint relations is important for analyzing news diffusion patterns and for copyright protection. However, the sheer volume of news data on the Internet makes identifying news reprint relations efficiently a major challenge. Some existing studies compute the similarity of the full text of news reports, which is not always effective because some reprints excerpt only a few sentences of the original report. The core challenge in improving identification accuracy is therefore uncovering the latent semantic relevance between news articles at the sentence level. Inspired by deep learning and semantic text representation models, this paper proposes an approach for identifying news reprint relations that integrates deep learning approaches. First, news reports unrelated to the topic of the original report are removed via topic correlation mining. Then, latent semantic relevance is uncovered at the sentence level by integrating semantic analysis methods, and reprint relations between news reports are identified. The performance of the approach is evaluated empirically on a real-world dataset. Experimental results show that integrating the semantic analysis models allows the approach to mine in-depth semantic associations between news stories and to identify news reprint relations accurately. These results benefit news diffusion pattern analysis and copyright protection.
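The two-stage idea in the abstract (filter off-topic reports first, then match at the sentence level so that excerpt-only reprints are still caught) can be illustrated with a minimal sketch. This is not the authors' method: it swaps in plain bag-of-words cosine similarity as a stand-in for the learned word or sentence embeddings the paper integrates, and the function names, stopword list, sample texts, and thresholds (`topic_threshold`, `sent_threshold`, `min_fraction`) are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller list or learned weights.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "on", "at",
             "for", "by", "with", "was", "were", "will", "is", "its"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def embed(text: str) -> Counter:
    """Stand-in for a learned embedding: bag-of-words term counts (an assumption)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_reprint_score(original: str, candidate: str, sent_threshold: float = 0.8) -> float:
    """Fraction of candidate sentences whose best match in the original is close enough."""
    orig_vecs = [embed(s) for s in split_sentences(original)]
    cand_sents = split_sentences(candidate)
    if not orig_vecs or not cand_sents:
        return 0.0
    matched = sum(
        1 for s in cand_sents
        if max(cosine(embed(s), o) for o in orig_vecs) >= sent_threshold
    )
    return matched / len(cand_sents)


def is_reprint(original: str, candidate: str,
               topic_threshold: float = 0.2, min_fraction: float = 0.5) -> bool:
    # Stage 1: drop reports that are not topically related to the original.
    if cosine(embed(original), embed(candidate)) < topic_threshold:
        return False
    # Stage 2: decide using sentence-level matches rather than full-text similarity.
    return sentence_reprint_score(original, candidate) >= min_fraction


# Hypothetical sample texts.
ORIGINAL = ("The city council approved a new budget on Monday. "
            "Funding for public parks will rise by ten percent. "
            "Critics say road repairs were ignored. "
            "The mayor called the vote historic.")
EXCERPT = ("Funding for public parks will rise by ten percent. "
           "The mayor called the vote historic.")
UNRELATED = ("A new smartphone was released with a faster chip. "
             "Reviewers praised its battery life.")

print(round(cosine(embed(ORIGINAL), embed(EXCERPT)), 2))  # 0.69 (full-text similarity)
print(sentence_reprint_score(ORIGINAL, EXCERPT))          # 1.0
print(is_reprint(ORIGINAL, EXCERPT))                      # True
print(is_reprint(ORIGINAL, UNRELATED))                    # False
```

The excerpt copies two of the original's four sentences verbatim, yet its full-text similarity is only about 0.69, while every one of its sentences has an exact counterpart in the original. This is the failure mode of whole-document comparison that the abstract describes, and the reason the sketch scores candidates sentence by sentence.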

Original language: English (US)
Article number: 8542722
Pages (from-to): 72163-72172
Number of pages: 10
Journal: IEEE Access
Volume: 6
State: Published (Jan 1 2018)

Keywords

  • Deep learning
  • diffusion pattern
  • news reprint relation identification
  • semantic relevance
  • word embedding

ASJC Scopus subject areas

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)
