Authorship analysis in cybercrime investigation

Rong Zheng, Yi Qin, Zan Huang, Hsinchun Chen

Research output: Contribution to journalArticle

57 Citations (Scopus)

Abstract

Criminals have been using the Internet to distribute a wide range of illegal materials globally in an anonymous manner, making criminal identity tracing difficult in the cybercrime investigation process. In this study we propose to adopt the authorship analysis framework to automatically trace identities of cyber criminals through messages they post on the Internet. Under this framework, three types of message features, including style markers, structural features, and content-specific features, are extracted and inductive learning algorithms are used to build feature-based models to identify authorship of illegal messages. To evaluate the effectiveness of this framework, we conducted an experimental study on data sets of English and Chinese email and online newsgroup messages. We experimented with all three types of message features and three inductive learning algorithms. The results indicate that the proposed approach can discover real identities of authors of both English and Chinese Internet messages with relatively high accuracies.

Original languageEnglish (US)
Pages (from-to)59-73
Number of pages15
JournalLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume2665
StatePublished - 2003

Fingerprint

Authorship
Internet
Inductive Learning
Learning algorithms
Learning Algorithm
Learning
Trace Identity
Electronic mail
Electronic Mail
Tracing
Experimental Study
High Accuracy
Evaluate
Range of data
Framework
Model

ASJC Scopus subject areas

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

@article{f16e867da2434287a31d92827ea9073f,
title = "Authorship analysis in cybercrime investigation",
abstract = "Criminals have been using the Internet to distribute a wide range of illegal materials globally in an anonymous manner, making criminal identity tracing difficult in the cybercrime investigation process. In this study we propose to adopt the authorship analysis framework to automatically trace identities of cyber criminals through messages they post on the Internet. Under this framework, three types of message features, including style markers, structural features, and content-specific features, are extracted and inductive learning algorithms are used to build feature-based models to identify authorship of illegal messages. To evaluate the effectiveness of this framework, we conducted an experimental study on data sets of English and Chinese email and online newsgroup messages. We experimented with all three types of message features and three inductive learning algorithms. The results indicate that the proposed approach can discover real identities of authors of both English and Chinese Internet messages with relatively high accuracies.",
author = "Rong Zheng and Yi Qin and Zan Huang and Hsinchun Chen",
year = "2003",
language = "English (US)",
volume = "2665",
pages = "59--73",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - Authorship analysis in cybercrime investigation

AU - Zheng, Rong

AU - Qin, Yi

AU - Huang, Zan

AU - Chen, Hsinchun

PY - 2003

Y1 - 2003

N2 - Criminals have been using the Internet to distribute a wide range of illegal materials globally in an anonymous manner, making criminal identity tracing difficult in the cybercrime investigation process. In this study we propose to adopt the authorship analysis framework to automatically trace identities of cyber criminals through messages they post on the Internet. Under this framework, three types of message features, including style markers, structural features, and content-specific features, are extracted and inductive learning algorithms are used to build feature-based models to identify authorship of illegal messages. To evaluate the effectiveness of this framework, we conducted an experimental study on data sets of English and Chinese email and online newsgroup messages. We experimented with all three types of message features and three inductive learning algorithms. The results indicate that the proposed approach can discover real identities of authors of both English and Chinese Internet messages with relatively high accuracies.

AB - Criminals have been using the Internet to distribute a wide range of illegal materials globally in an anonymous manner, making criminal identity tracing difficult in the cybercrime investigation process. In this study we propose to adopt the authorship analysis framework to automatically trace identities of cyber criminals through messages they post on the Internet. Under this framework, three types of message features, including style markers, structural features, and content-specific features, are extracted and inductive learning algorithms are used to build feature-based models to identify authorship of illegal messages. To evaluate the effectiveness of this framework, we conducted an experimental study on data sets of English and Chinese email and online newsgroup messages. We experimented with all three types of message features and three inductive learning algorithms. The results indicate that the proposed approach can discover real identities of authors of both English and Chinese Internet messages with relatively high accuracies.

UR - http://www.scopus.com/inward/record.url?scp=35048871882&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35048871882&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:35048871882

VL - 2665

SP - 59

EP - 73

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -