Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web

Yizhi Liu, Fang Yu Lin, Zara Ahmad-Post, Mohammadreza Ebrahimi, Ning Zhang, James Lee Hu, Jingyu Xin, Weifeng Li, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

Personally identifiable information (PII) has become a major target of cyber-attacks, causing severe losses to data breach victims. To protect data breach victims, researchers focus on collecting exposed PII to assess privacy risk and identify at-risk individuals. However, existing studies mostly rely on exposed PII collected from either the dark web or the surface web. Due to the wide exposure of PII on both the dark web and surface web, collecting from only the dark web or the surface web could result in an underestimation of privacy risk. Despite its research and practical value, jointly collecting PII from both sources is a non-trivial task. In this paper, we summarize our effort to systematically identify, collect, and monitor a total of 1, 212, 004, 819 exposed PII records across both the dark web and surface web. Our effort resulted in 5.8 million stolen SSNs, 845, 000 stolen credit/debit cards, and 1.2 billion stolen account credentials. From the surface web, we identified and collected over 1.3 million PII records of the victims whose PII is exposed on the dark web. To the best of our knowledge, this is the largest academic collection of exposed PII, which, if properly anonymized, enables various privacy research inquiries, including assessing privacy risk and identifying at-risk populations.

Original languageEnglish (US)
Title of host publicationProceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781728188003
DOIs
StatePublished - Nov 9 2020
Event18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020 - Virtual, Arlington, United States
Duration: Nov 9 2020Nov 10 2020

Publication series

NameProceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020

Conference

Conference18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020
Country/TerritoryUnited States
CityVirtual, Arlington
Period11/9/2011/10/20

Keywords

  • dark web
  • data breach
  • data collection
  • PII
  • privacy
  • surface web

ASJC Scopus subject areas

  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Information Systems

Fingerprint

Dive into the research topics of 'Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web'. Together they form a unique fingerprint.

Cite this