Linking Personally Identifiable Information from the Dark Web to the Surface Web: A Deep Entity Resolution Approach

Fangyu Lin, Yizhi Liu, Mohammadreza Ebrahimi, Zara Ahmad-Post, James Lee Hu, Jingyu Xin, Sagar Samtani, Weifeng Li, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

The information privacy of Internet users has become a major societal concern. The rapid growth of online services increases the risk of unauthorized access to the Personally Identifiable Information (PII) of at-risk populations, who are unaware of their PII exposure. To proactively identify online at-risk populations and increase their privacy awareness, it is crucial to conduct a holistic privacy risk assessment across the Internet. Current privacy risk assessment studies are limited to a single platform within either the surface web or the dark web. A comprehensive privacy risk assessment requires matching exposed PII on heterogeneous online platforms across the surface web and the dark web. However, because PII records on each platform are incomplete and inaccurate, linking the exposed PII to users is a non-trivial task. While Entity Resolution (ER) techniques can facilitate this task, they often require ad-hoc, manual rule development and feature engineering. Recently, Deep Learning (DL)-based ER has outperformed manual entity matching rules by automatically extracting prominent features from incomplete or inaccurate records. In this study, we enhance existing privacy risk assessment with a DL-based ER method, namely Multi-Context Attention (MCA), to comprehensively evaluate individuals' PII exposure across different online platforms on the dark web and the surface web. Evaluation against benchmark ER models indicates the efficacy of MCA. Using MCA on a random sample of data breach victims on the dark web, we identify 4.3% of the victims on surface web platforms and calculate their privacy risk scores.
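
For readers unfamiliar with entity resolution, the following minimal sketch shows the kind of manual, rule-based record matching that the abstract contrasts with DL-based ER. It is not the MCA method and is not taken from the paper: the field names, weights, and sample records are hypothetical and chosen only for illustration. It compares two records field by field with a string similarity measure and skips fields that are missing from either record.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Return a similarity in [0, 1], or None when either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b, weights):
    """Weighted average of field similarities over fields present in both records."""
    total, weight_sum = 0.0, 0.0
    for field, weight in weights.items():
        sim = field_similarity(rec_a.get(field), rec_b.get(field))
        if sim is None:
            continue  # incomplete record: ignore fields we cannot compare
        total += weight * sim
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical hand-tuned weights (an assumption, not from the paper).
WEIGHTS = {"name": 0.4, "email": 0.4, "city": 0.2}

# Hypothetical records: one incomplete breach record and two surface web profiles.
breach_record = {"name": "Jon Smith", "email": "jsmith@example.com", "city": None}
profile_match = {"name": "John Smith", "email": "jsmith@example.com", "city": "Tucson"}
profile_other = {"name": "Alice Wong", "email": "awong@example.org", "city": "Tucson"}

score_match = match_score(breach_record, profile_match, WEIGHTS)
score_other = match_score(breach_record, profile_other, WEIGHTS)
print(f"likely same person: {score_match:.3f}")
print(f"likely different:   {score_other:.3f}")
```

The hand-picked weights and the choice of similarity function are exactly the kind of ad-hoc decisions that rule-based ER requires and that, according to the abstract, DL-based ER learns from data instead.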

Original language: English (US)
Title of host publication: Proceedings - 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020
Editors: Giuseppe Di Fatta, Victor Sheng, Alfredo Cuzzocrea, Carlo Zaniolo, Xindong Wu
Publisher: IEEE Computer Society
Pages: 488-495
Number of pages: 8
ISBN (Electronic): 9781728190129
State: Published - Nov 2020
Event: 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020 - Virtual, Sorrento, Italy
Duration: Nov 17 2020 to Nov 20 2020

Publication series

Name: IEEE International Conference on Data Mining Workshops, ICDMW
Volume: 2020-November
ISSN (Print): 2375-9232
ISSN (Electronic): 2375-9259

Conference

Conference: 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020
Country/Territory: Italy
City: Virtual, Sorrento
Period: 11/17/20 to 11/20/20

Keywords

  • Dark web
  • Data breach
  • Data collection
  • PII
  • Privacy
  • Surface web

ASJC Scopus subject areas

  • Computer Science Applications
  • Software